cmsgemonline / gem-daq / cmsgemos-analysis / Issues / #11

Present unpacked data frame as multi-indexed

Summary

Presenting column names as tuples of the form

('FIELD NAME', 'BLOCK TYPE', 'BLOCK LEVEL', slot, link, position)

where:

  • 'FIELD NAME' - name of the data field as defined in the data format
  • 'BLOCK TYPE' - name of the data block, one of ['CDF', 'AMC13', 'AMC', 'GEB', 'VFAT'] (optional)
  • 'BLOCK LEVEL' - one of ['HEADER', 'PAYLOAD', 'TRAILER'] (optional)
  • slot, link, position - hardware topology tags, defaulting to None where they are not applicable

would allow automated conversion of the resulting data frame columns to a pandas.MultiIndex. Subsequent data analysis would then benefit from easy data slicing and selection.
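A minimal sketch of the proposed six-field labels, assuming pandas and NumPy. The helper `column_label` and the field names 'L1A ID' and 'BC ID' are hypothetical, chosen only for illustration. Note that the summary proposes `None` for inapplicable topology tags while the example below uses -1: pandas stores `None` in a MultiIndex level as a missing value, so such columns are selected with `isna()` rather than by label.

```python
import numpy as np
import pandas as pd

# Level names follow the tuple layout proposed in this issue
LEVELS = ['FIELD NAME', 'BLOCK TYPE', 'BLOCK LEVEL', 'slot', 'link', 'position']


def column_label(field, block_type=None, block_level=None,
                 slot=None, link=None, position=None):
    """Hypothetical helper: build one label, with None for inapplicable tags."""
    return (field, block_type, block_level, slot, link, position)


# Illustrative columns only; 'L1A ID' and 'BC ID' are placeholder field names
labels = [
    column_label('L1A ID', 'AMC13', 'HEADER'),
    column_label('BC ID', 'AMC', 'HEADER', slot=1),
    column_label('VFAT HIT', 'VFAT', 'PAYLOAD', slot=1, link=0, position=0),
    column_label('VFAT HIT', 'VFAT', 'PAYLOAD', slot=1, link=0, position=2),
    column_label('VFAT HIT', 'VFAT', 'PAYLOAD', slot=1, link=1, position=0),
]
columns = pd.MultiIndex.from_tuples(labels, names=LEVELS)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 128, (4, len(labels))), columns=columns)

# All VFAT-block columns; the 'BLOCK TYPE' level is dropped from the result
vfat = df.xs('VFAT', level='BLOCK TYPE', axis=1)

# All header fields, whatever block they belong to
headers = df.loc[:, df.columns.get_level_values('BLOCK LEVEL') == 'HEADER']

# Columns with no link tag: None is stored as a missing value in the level
no_link = df.loc[:, df.columns.get_level_values('link').isna()]
```

Selecting on the named levels keeps the analysis code independent of column order, which is the main benefit the summary describes.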

What is the expected correct behavior?

The performance impact is expected to be minor; the data will become multi-indexed and easily accessible for analysis.
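Since only the column labels change, the conversion should touch the column index and not the data values, which fits the expectation of a minor performance impact. A possible sketch of a converter, assuming the unpacker's output already uses tuple labels; `to_multiindex` is a hypothetical helper name used only for illustration.

```python
from typing import Sequence

import pandas as pd


def to_multiindex(df: pd.DataFrame, names: Sequence[str]) -> pd.DataFrame:
    """Hypothetical helper: turn flat tuple column labels into a named MultiIndex.

    Only the column axis is rebuilt; the values are not reshaped or recomputed.
    """
    # Every label must be a tuple with exactly one entry per level name
    bad = [c for c in df.columns if not isinstance(c, tuple) or len(c) != len(names)]
    if bad:
        raise ValueError(f"labels not matching the {len(names)}-field layout: {bad}")
    out = df.copy(deep=False)  # shallow copy, so the caller's frame keeps its labels
    out.columns = pd.MultiIndex.from_tuples(list(df.columns), names=list(names))
    return out


# Usage with the column layout from the example in this issue
flat = pd.DataFrame([[8, 7, 13]],
                    columns=[('Latency', -1, -1, -1), ('VFAT HIT', 1, 0, 0),
                             ('VFAT HIT', 1, 0, 2)])
mi = to_multiindex(flat, ['reg_name', 'slot', 'link', 'position'])
link0 = mi.xs(0, level='link', axis=1)
```

Validating the tuple length up front turns a malformed label into an explicit error instead of a silently misaligned MultiIndex.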

Relevant logs and/or screenshots

Simple example:

```python
>>> import numpy as np
>>> import pandas as pd
>>> # flat tuple labels (reg_name, slot, link, position); -1 marks "not applicable"
>>> l = [('Latency', -1, -1, -1), ('VFAT HIT', 1, 0, 0), ('VFAT HIT', 1, 0, 2),
...      ('VFAT HIT', 1, 0, 3), ('VFAT HIT', 1, 1, 0), ('VFAT HIT', 1, 1, 2)]
>>> df = pd.DataFrame(np.random.randint(0, 128, (5, 6)), columns=l)
>>> df
   (Latency, -1, -1, -1)  (VFAT HIT, 1, 0, 0)  (VFAT HIT, 1, 0, 2)  (VFAT HIT, 1, 0, 3)  (VFAT HIT, 1, 1, 0)  (VFAT HIT, 1, 1, 2)
0                      8                    7                   13                   69                   18                   97
1                      6                   94                   54                   77                   75                   52
2                     25                   24                  123                   78                   56                   14
3                     92                  103                   68                   60                    6                   98
4                     46                    1                   37                   12                   54                  124
>>> # convert the tuple labels into a named MultiIndex
>>> df.columns = pd.MultiIndex.from_tuples(df.columns, names=['reg_name', 'slot', 'link', 'position'])
>>> df
reg_name  Latency  VFAT HIT
slot           -1         1
link           -1         0            1
position       -1         0    2   3   0    2
0               8         7   13  69  18   97
1               6        94   54  77  75   52
2              25        24  123  78  56   14
3              92       103   68  60   6   98
4              46         1   37  12  54  124
>>> # keep only columns with a valid link number (drops Latency)
>>> df.iloc[:, df.columns.get_level_values('link') >= 0]
reg_name  VFAT HIT
slot             1
link             0            1
position         0    2   3   0    2
0                7   13  69  18   97
1               94   54  77  75   52
2               24  123  78  56   14
3              103   68  60   6   98
4                1   37  12  54  124
>>> # all columns for link 0; the 'link' level is dropped from the result
>>> df.xs(0, level='link', axis=1)
reg_name  VFAT HIT
slot             1
position         0    2   3
0                7   13  69
1               94   54  77
2               24  123  78
3              103   68  60
4                1   37  12
```
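Once the columns are a MultiIndex, pandas also supports label slicing over several levels at once with `pd.IndexSlice`, and per-level aggregation. A sketch using the same layout as the example above, with deterministic values (`np.arange`) instead of random ones; `hits` and `per_link` are hypothetical names. Sorting the column index first follows pandas' advice that MultiIndex slicing works best on a lexsorted index.

```python
import numpy as np
import pandas as pd

# Same layout as the example above: (reg_name, slot, link, position)
labels = [('Latency', -1, -1, -1), ('VFAT HIT', 1, 0, 0), ('VFAT HIT', 1, 0, 2),
          ('VFAT HIT', 1, 0, 3), ('VFAT HIT', 1, 1, 0), ('VFAT HIT', 1, 1, 2)]
columns = pd.MultiIndex.from_tuples(labels, names=['reg_name', 'slot', 'link', 'position'])
df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=columns)

# Slicing is most reliable on a lexsorted index (avoids PerformanceWarning /
# UnsortedIndexError), so sort the column axis first
df = df.sort_index(axis=1)
idx = pd.IndexSlice

# VFAT HIT in slot 1, any link, positions 0 and 2 only
hits = df.loc[:, idx['VFAT HIT', 1, :, [0, 2]]]

# Sum the VFAT HIT columns per link: transpose so the column levels become
# row levels, group on 'link', then transpose back
per_link = df['VFAT HIT'].T.groupby(level='link').sum().T
```

Transposing before `groupby` avoids `groupby(axis=1)`, which is deprecated in recent pandas releases.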