This guide provides in-depth documentation of the analysis workflow, covering its main configuration aspects and usage tips.
Introduction
The ultimate goal of the analysis tool is to produce a list of chips on the wafer that satisfy all the necessary quality requirements.
When testing many different parameters of many chips on a single wafer, some tests can fail due to external issues unrelated to the chip quality itself. Therefore, the analysis is also used for an in-depth look into the quality of the measured data, to identify potential problems in the testing procedure and to decide whether certain chips need to be retested.
See a brief overview of the workflow in this presentation
Analysis output
For each quality criterion the corresponding measured and analysed data is compared against a predefined set of threshold values, placing each chip in one of the quality categories: green/yellow/red.
The final quality of a chip is defined as the worst category it had across all the considered criteria, e.g. if the chip falls into the yellow region for one criterion and is green for all other criteria, its final quality status will be yellow.
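As a minimal illustration of this rule (a hypothetical sketch, assuming the categories are ordered green < yellow < red; not the actual implementation):
# Worst category across all criteria wins
ORDER = {"green": 0, "yellow": 1, "red": 2}

def final_quality(categories):
    """Return the worst quality category among the per-criterion results."""
    return max(categories, key=ORDER.get)

print(final_quality(["green", "yellow", "green"]))  # -> yellow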
For each quality criterion the data is visualised in two representations:
- histogram - 1D distribution of data from all the tested chips and the corresponding quality region
- wafer map - 2D map of the wafer showing individual data and quality status for every chip
All these plots are combined into a single PDF file: <data folder>/plots/summary/wafer_wise.pdf.
Other types of plots are also produced, such as graphs showing multiple measured values overlaid for a single chip, making it possible to correlate several measured values by eye for in-depth analysis. Such plots are combined into separate PDF files, one per plot, with the data from each chip shown on a separate page.
Main analysis components
The primary tool for analysing data from a wafer test is wlt/waferanalyzer.py, which takes as input the path to the data folder and processes it according to two configuration files:
- analysis configuration - defines all the plots to be produced, using Python syntax and lambda functions
- quality-region definitions - defines the chip-quality region boundaries for each plot, together with complementary information such as axis labels, group IDs, priority, etc. (a hypothetical sketch of both files is shown below)
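For illustration only, entries in these two files might look roughly like the following (hypothetical plot name, data layout, and keys; consult the actual files under wlt/config/analyzer_v*/ for the real format):
# Hypothetical analyzer_config.py entry: a plot name mapped to a lambda
# that extracts the plotted value from the per-chip data
plots = {
    "vdda_trim": lambda data: data["vdda"]["trimmed"],
}
# Hypothetical analyzer_regions.py entry: quality-region boundaries for the
# same plot, plus complementary information (axis label, group ID, priority)
regions = {
    "vdda_trim": {
        "green": (1.15, 1.25),   # inside this range: green
        "yellow": (1.10, 1.30),  # outside green but inside this: yellow; beyond: red
        "label": "VDDA [V]",
        "group": "power",
        "priority": 1,
    },
}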
Versioning
Analysis configurations are supposed to change very rarely, only when modifications to the testing procedure are made, such as changes in hardware, the DAQ version, the logic of some tests, or the introduction of new test procedures.
Every version of the analysis configuration is kept in a separate folder wlt/config/analyzer_v*, containing a corresponding pair of analyzer_config.py and analyzer_regions.py files.
By default, the latest available version is used (i.e. the one with the greatest version number).
An older version of the analysis configuration can be selected using the -c|--config <version number> option, e.g. -c 0.
Quality-region definitions can vary between different production batches, or even between different wafers within a single batch; therefore only a single analyzer_regions.py file is kept in the repository for any given analysis version.
It is normal for the regions of some non-critical tests to be slightly adjusted after the first iteration of the analysis if a significant number of chips are disqualified by those tests. The adjusted version of the quality-region definitions actually used in the analysis is always stored in the output folder as <data folder>/plots/_region_definitions.py, and is kept in the database for future reference.
A specific file with region definitions can be used in the analysis via the -r|--regions option, e.g. -r <path>/_region_definitions.py
Detailed documentation of the individual components:
- WaferAnalyzer
- Analysis configuration
- Quality-region definitions
Data storage in DCA database
Output data from a subset of plots is stored in db/*.json files: one file per chip plus one file per wafer.
In addition, a set of three files that can be stored in the DB for this wafer is put in db/files/:
- {wafer}_map.pdf - copy of the summary wafer map for external usage
- {wafer}_region.py - region definitions used in this analysis
- {wafer}_docs.zip - multiple additional files combined into a ZIP archive
An arbitrary number of additional files can be added to {wafer}_docs.zip even after the analysis has finished, by rerunning waferanalyzer.py in --update mode without plotting:
python wlt/waferanalyzer.py <folder> -du -e <file1> <file2>
After all the plots have been validated by the operator, this data can be uploaded to the Detector Construction Assembly (DCA) database for future reference during detector construction and operation.
DB upload functionality is implemented in the py4dbupload repository, which has to be downloaded and set up independently from croc_wlt.
NOTE: The operator uploading the data must have CERN credentials (e.g. for login to lxplus.cern.ch), which are requested only once and then stored in the local .session.cache file. Furthermore, membership in the cms-tracker-constructionDB and cms-tracker-qcOperators e-groups is necessary for write access to the DB; it can be requested from Sandro Di Mattia (sandro.di.mattia@cern.ch) for every qualified operator.
DB upload instructions
Set up the py4dbupload package following the instructions in its README, including its submodules and dependencies.
Assuming the analysis output is stored in <folder>/plots/, execute the following commands:
# Set up the environment
source bin/setup.sh
# Generate XML/ZIP files to be digested by DB and upload them
python run/registerCrocData.py --verbose -d <folder>/plots/db/ --upload
This script produces two output files that are uploaded to the DB:
- <wafer_label>_results.xml - values of actual WLT measurements from the wafer and each chip
- components_update.zip|.xml - properties of the affected parts (wafer and chips) that need to be updated due to the new WLT data (e.g. grade, serial number, supporting files)
NOTE: To test that the preparation of data for DB upload works properly, you can do a dry run first by removing the --upload option. A dry run takes almost the same amount of time, because it still queries the DB for each chip being processed, to ensure that the relevant items exist in the DB and to populate the XML files with the necessary part IDs for cross-referencing between different DB tables. The actual upload of the data takes just a few seconds.
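For example, the dry-run variant of the command above is simply:
python run/registerCrocData.py --verbose -d <folder>/plots/db/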
Verifying or updating data in DB
When WLT data is already in the DB, it can be compared to the local input data using the wlt/utils/db_reader.py script:
python wlt/utils/db_reader.py -w <wafer_label> -i <folder>/plots/db/
This will download the WLT data from the DB for the wafer and all its chips and compare each value with the corresponding one in the input data.
Every detected difference, as well as any value missing from the DB, will be reported.
NOTE: Due to a bug in the DB code, the time component of the TEST_START field is always 00:00:00. This is printed in the terminal but is not counted in the total number of differences found by the script.
Usage tips
Analysis speed-up
The three most time-consuming parts of the analysis are:
- Generation of graphics (histograms, wafer maps)
- Processing of ROOT histograms (threshold-scan distributions)
- Merging of individual plots into summary PDFs
If only text output is required (e.g. for DB upload), you can disable the generation of plots (--noplots) and wafer maps (--nomaps).
If you're only interested in individual plots for presentations, without the merged summary, you can disable merging into summary PDFs: --nomerge.
If you're only interested in wafer-level distributions, without detailed chip-level graphs, you can skip them with the -w, --waferonly option.
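For example, a text-only run for preparing DB-upload files could combine the first two options:
python wlt/waferanalyzer.py <folder> --noplots --nomaps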
Gradual adding of new chips during a scan
By default, waferanalyzer.py deletes the output plots/ folder and starts the analysis from scratch. When repeating the analysis in the middle of a wafer scan, after new chips have been tested, you can significantly speed up the process by reusing the output from previous iterations.
To do this, add the following options to waferanalyzer.py: -ku.
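For example:
python wlt/waferanalyzer.py <folder> -ku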
This will --keep individual plots in the plots/chips and plots/wafer folders after merging them into summary PDFs, and it will run plotting in --update mode, reusing output cached during the last execution and saving time.
Cache detection is based on the chip JSON file name: a chip will be reanalyzed only if a file with a newer date-time in its name is provided as input; otherwise the old data will be reused from the cache, and the chip-wise plots for that chip will not be redrawn. Wafer-wise plots will be redrawn if at least one chip has been reanalyzed. PDF merging at the end happens regardless of the cache and has to be disabled explicitly with --nomerge if it is not needed during the scan.
Tuning of quality-region definitions
If you need to iteratively adjust some of the region definitions and see how this affects the yield, rerunning the analysis of all the tests from scratch would be a waste of time. As in the previous example, it can be sped up dramatically by adding the -ku option to keep the plots and run the analysis in update mode.
In this case wafer maps and histograms will be plotted only once for all the tests; in all subsequent iterations they will be redrawn only for those tests whose region definitions were modified.
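Combining the options described above, a tuning iteration with a locally modified regions file could look like this:
python wlt/waferanalyzer.py <folder> -ku -r <path>/_region_definitions.py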
Collection of large statistics
It is possible to run jsonplotter.py over an arbitrarily large number of input JSON files, which allows filling the plots configured for waferanalyzer.py with data from multiple independent tests/wafers stored in separate folders. If a path to a folder is provided as input, all JSON files from that folder will be processed, like this:
python wlt/jsonplotter.py -w <folder1/file1.json> <folder1/file2.json> <folder2>
If only a small set of plots has to be produced with high statistics, run it several times with the -n <pattern> option to process only the relevant plots, e.g. those containing "vina_shuntldo":
python wlt/jsonplotter.py -w -n "vina_shuntldo" <folder1> <folder2> ... <folderN>
Comparison of wafers within a batch
waferanalyzer.py produces a _wafer_data_*.json file containing statistical information about the scan and about each wafer-wise plot produced by it (MIN, MAX, MED, RMS, etc.). It is possible to use these JSON files as input to jsonplotter.py to produce histograms with one entry per wafer, using the dedicated configuration file:
python wlt/jsonplotter.py -c wlt/config/analyzer_config_wafers.py <folder1>/_wafer_data_*.json <folder2>/_wafer_data_*.json ... <folderN>/_wafer_data_*.json
Statistical definition of quality regions
Normally all the boundaries of the red/orange/green regions are defined in the wlt/config/analyzer_v*/analyzer_regions.py file. For certain plots it is sufficient to define the region boundaries based on their statistical distribution. This can be achieved with the --stat_regions command-line option, with boundaries defined in multiples of the RMS around the median value. Symmetric regions of ±8*RMS (red), ±6*RMS (orange) and ±4*RMS (green) can be defined for all the ring-oscillator plots using:
python wlt/waferanalyzer.py --stat_regions -8 -6 -4 4 6 8 -n "ring_oscillator" <folder>
These automatically generated regions are saved to the _region_definitions.py file in the output folder, from where they can be copied to the static configuration file for use in future analysis iterations.
Validation of analysis configuration
Analysis configuration files and region definitions are self-sufficient Python files, which can be executed directly with the Python interpreter, without launching a complete analysis sequence, like this:
python wlt/config/analyzer_v1/analyzer_config.py
python wlt/config/analyzer_v1/analyzer_regions.py
Any syntax or spelling errors will be immediately reported by the interpreter. Use the -h command-line option to display any additional functions provided by the configuration script.
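For example:
python wlt/config/analyzer_v1/analyzer_config.py -h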