A word of caution, since I have already made this error in a lot of other repositories: large files (examples, test data) clutter the repository. Because git keeps the complete history of every file, they cannot simply be deleted, and they make syncs from and to the repository increasingly slow. This repository is already at 100 MB. Properly deleting them would require rewriting the history back to the beginning of this project, which is better done earlier than later.
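For reference, such a history rewrite could look like the sketch below. This assumes git-filter-repo is installed; the 10M threshold and the repository URL are placeholders, not our actual settings.

```
# Run on a fresh mirror clone; this rewrites every commit hash.
git clone --mirror https://gitlab.cern.ch/<group>/<repo>.git
cd <repo>.git
git filter-repo --strip-blobs-bigger-than 10M
# filter-repo removes the origin remote as a safety measure, so re-add it:
git remote add origin https://gitlab.cern.ch/<group>/<repo>.git
git push --mirror --force origin
```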
I recommend putting large files in another repository; the github.com large-file storage (git-lfs) feature would be perfect for this. I am not sure if something similar is offered at gitlab.cern.ch.
"In more extreme cases where files are tens of MB large, depending on how loaded the system is, GitLab could become unstable. If you observe that your project contains files of this characteristics and in order to keep GitLab service in the best shape possible for all the community please, reconsider your design if it is a software project or the need of your long files to be in a Git repository." (https://cern.service-now.com/service-portal/article.do?n=KB0003887)
The problem is the fixtures. We could agree on running the unit tests on special, smaller data sets, which would have to be created especially for testing. This is less realistic, but on the other hand it would also significantly speed up the pipeline runtime.
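As an illustration, a small synthetic fixture could be generated on the fly instead of shipping recorded data. This is only a sketch; all names in it (small_raw_data, the file name, the event count) are made up for the example:

```python
import numpy as np
import pytest

@pytest.fixture
def small_raw_data(tmp_path):
    # A few thousand synthetic data words instead of a multi-MB recorded file.
    rng = np.random.default_rng(seed=42)
    data = rng.integers(0, 2**32, size=5000, dtype=np.uint32)
    path = tmp_path / "small_fixture.npy"
    np.save(path, data)
    return path

def test_analysis_output_unchanged(small_raw_data):
    # Placeholder check; a real test would run the analysis on the small
    # fixture and compare against a stored reference output.
    data = np.load(small_raw_data)
    assert data.size == 5000
```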
This is bad. Then the only way I see is to open a github.com repository for the binary files. Unfortunately, versioning and synchronization will not work with this repository; that would only work if we were on github.com ourselves.
@hemperek The reason to be on gitlab.cern.ch is the builders, no? Otherwise github.com is just better to my mind.
I asked the author of the article the following and hope to get a sufficient answer:
Unit testing is one of the important tools of software engineering. A common way to test data analysis is to provide fixtures and check for changes in the output. These tests run on continuous integration systems on every new commit. The fixtures can be files of any format and any size. To be able to track failure modes, it is useful to have a versioning system even for binary files and to have them synchronized with the git repository. How is it possible to get the benefits of git-lfs (auto-download for CIs, versioning) with the infrastructure provided at CERN? I would be happy to hear from you.
Unless there is an unexpectedly helpful answer from CERN support, we have decided to move the fixtures to a dedicated github-lfs repository and to rebase the GitLab repository to get rid of the large files in the history. For this, we will remove / merge all open branches this week. Please do not open any new branches until this is done!
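A minimal sketch of the LFS setup in the new fixtures repository (the `*.h5` pattern and the fixtures path are just assumed examples, not our actual layout):

```
git lfs install
git lfs track "*.h5"
git add .gitattributes tests/fixtures/
git commit -m "Track large test fixtures with git-lfs"
git push origin master
```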
These solutions would require writing firmware download functions, and the download link to the actual firmware would have to be updated frequently and manually. Maybe another firmware git-lfs repository is easier?
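To make the drawback concrete, such a download function could be as simple as the sketch below; FIRMWARE_URL is a hypothetical placeholder and is exactly the link that would need frequent manual updates:

```python
import urllib.request
from pathlib import Path

# Hypothetical link; would have to be bumped by hand for every release.
FIRMWARE_URL = "https://example.org/firmware/rd53_fw_v0.2.bit"

def download_firmware(dest_dir="firmware"):
    # Fetch the current firmware build into dest_dir and return its path.
    dest = Path(dest_dir)
    dest.mkdir(exist_ok=True)
    target = dest / Path(FIRMWARE_URL).name
    urllib.request.urlretrieve(FIRMWARE_URL, target)
    return target
```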
I did a test run on removing files above 0.5 MB from the repository, and (many) more files than the RD53 data files showed up. These files would all be removed; I guess this is not OK:
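Such a listing can be reproduced with standard git plumbing; the 524288-byte cutoff below corresponds to the 0.5 MB threshold:

```
# List every blob in the history above 0.5 MB, largest first.
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" && $3 > 524288 {print $3, $4}' \
  | sort -rn
```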
We also should not keep that many firmware builds. There is no reason to flash an old version, and if you really want to, you can always build it yourself. The latest master and the latest development version might be enough. Or only one file per firmware version number (0.1 and 0.2 for now)? But then we would have to increment our versions more often to force users to update.
We have a problem. Data bandwidth via git-lfs on github.com is not free anymore for academic/research organizations. We are back to finding a decent solution... :-( Paying for bandwidth is tough ($5 per 50 GB). With about 10 builds a day at 100 MB each, $5 could be enough, but I do not like this.
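Rough estimate: 10 builds/day × 100 MB ≈ 1 GB/day, i.e. roughly 30 GB per month, so a single 50 GB data pack at $5/month would just cover the builds, with little headroom left for fixture downloads.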
At the moment, Git-LFS is not enabled in the GitLab instance at CERN, as we consider there is some overlap with other file systems available at CERN. From the description of your use case, we believe EOS [0] could be the most appropriate solution.

EOS seems to be the storage medium of choice for analysis data. It does not support versioning like git, but one could imagine storing each (immutable) version of fixture data like this:

/eos/myexperiment/path/datasample_v1.dat
/eos/myexperiment/path/datasample_v2.dat
/eos/myexperiment/path/datasample_v3.dat

Then have a version-controlled symlink in the git repository pointing to the relevant data sample:

my_git_repo/test/fixtures/datasample.dat [symlink to /eos/myexperiment/path/datasample_v1.dat]

If the test should now use fixture data v2, change the symlink in the git repo to point to /eos/myexperiment/path/datasample_v2.dat. Certain tools like git-annex can help automate this process if there are many files to manage.

If the data in EOS is not public (accessible anonymously), credentials will have to be provided for the CI job to access EOS data, see https://gitlab.cern.ch/gitlabci-examples/deploy_eos for an example. This assumes a GitLab runner with EOS access has been set up [1].

This is what we believe can be the best approach with the current set of services at CERN. In any case, your use case has been collected and will be considered in the future if we review the availability of git-LFS at CERN.
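For illustration, bumping a fixture to v2 with the symlink approach from the reply above could look like this (the paths are taken from the quoted reply; this assumes EOS is mounted on the machine, e.g. via FUSE):

```
# Re-point the version-controlled symlink at the new immutable sample.
ln -sfn /eos/myexperiment/path/datasample_v2.dat test/fixtures/datasample.dat
git add test/fixtures/datasample.dat
git commit -m "Use fixture datasample_v2"
```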
We did the rebase and successfully rewrote the history (including all merge requests). The repo size before and after the rebase, checked with git count-objects -vH, is: