A word of caution, since I have already made this error in a lot of other repositories: large files (examples, test data) clutter the repository. Because git keeps the complete history of every file, they cannot simply be deleted, and they make syncs from and to the repository increasingly slow. This repository is already at 100 MB. Properly deleting them would require rewriting the history back to the beginning of this project, which is better done earlier than later.
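For reference, such a history rewrite could look like the sketch below. This assumes git-filter-repo is installed; the 10M threshold and the repository URL are placeholders, not our actual settings.

```
# Run on a fresh mirror clone; this rewrites every commit hash.
git clone --mirror https://gitlab.cern.ch/<group>/<repo>.git
cd <repo>.git
git filter-repo --strip-blobs-bigger-than 10M
# filter-repo removes the origin remote as a safety measure, so re-add it:
git remote add origin https://gitlab.cern.ch/<group>/<repo>.git
git push --mirror --force origin
```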
I recommend putting large files in another repository; the github.com large-file storage (git-lfs) feature would be perfect for this. I am not sure if something similar is offered at gitlab.cern.ch.
"In more extreme cases where files are tens of MB large, depending on how loaded the system is, GitLab could become unstable. If you observe that your project contains files of this characteristics and in order to keep GitLab service in the best shape possible for all the community please, reconsider your design if it is a software project or the need of your long files to be in a Git repository." (https://cern.service-now.com/service-portal/article.do?n=KB0003887)
The problem is the fixtures. We could agree on running the unit tests on special, smaller data sets, which would have to be created especially for testing. This is less realistic, but on the other hand it would also significantly speed up the pipeline runtime.
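As an illustration, a small synthetic fixture could be generated on the fly instead of shipping recorded data. This is only a sketch; all names in it (small_raw_data, the file name, the event count) are made up for the example:

```python
import numpy as np
import pytest

@pytest.fixture
def small_raw_data(tmp_path):
    # A few thousand synthetic data words instead of a multi-MB recorded file.
    rng = np.random.default_rng(seed=42)
    data = rng.integers(0, 2**32, size=5000, dtype=np.uint32)
    path = tmp_path / "small_fixture.npy"
    np.save(path, data)
    return path

def test_analysis_output_unchanged(small_raw_data):
    # Placeholder check; a real test would run the analysis on the small
    # fixture and compare against a stored reference output.
    data = np.load(small_raw_data)
    assert data.size == 5000
```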
This is bad. Then the only way I see is to open a github.com repository for the binary files. Unfortunately, versioning and synchronization will not work with this repository; that would only work if we were on github.com ourselves.
@hemperek The reason to be on gitlab.cern.ch is the builders, no? Otherwise github.com is just better to my mind.
I asked the author of the article the following and hope to get a sufficient answer:
Unit testing is one of the important tools of software engineering. A common way to test data analysis is to provide fixtures and check for changes in the output. These tests run on continuous integration systems on every new commit. The fixtures can be files of any format and any size. To be able to track failure modes, it is useful to have a versioning system even for binary files and to have them synchronized with the git repository. How is it possible to get the benefits of git-lfs (auto-download for CIs, versioning) with the infrastructure provided at CERN? I would be happy to hear from you.
Unless there is an unexpectedly helpful answer from CERN support, we have decided to move the fixtures to a dedicated github-lfs repository and to rebase the GitLab repository to get rid of the large files in the history. For this, we will remove / merge all open branches this week. Please do not open any new branches until this is done!
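A minimal sketch of the LFS setup in the new fixtures repository (the `*.h5` pattern and the fixtures path are just assumed examples, not our actual layout):

```
git lfs install
git lfs track "*.h5"
git add .gitattributes tests/fixtures/
git commit -m "Track large test fixtures with git-lfs"
git push origin master
```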
These solutions would require writing firmware download functions, and the download link to the actual firmware would have to be updated frequently and manually. Maybe another firmware git-lfs repository is easier?
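To make the drawback concrete, such a download function could be as simple as the sketch below; FIRMWARE_URL is a hypothetical placeholder and is exactly the link that would need frequent manual updates:

```python
import urllib.request
from pathlib import Path

# Hypothetical link; would have to be bumped by hand for every release.
FIRMWARE_URL = "https://example.org/firmware/rd53_fw_v0.2.bit"

def download_firmware(dest_dir="firmware"):
    # Fetch the current firmware build into dest_dir and return its path.
    dest = Path(dest_dir)
    dest.mkdir(exist_ok=True)
    target = dest / Path(FIRMWARE_URL).name
    urllib.request.urlretrieve(FIRMWARE_URL, target)
    return target
```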
I did a test run on removing files above 0.5 MB from the repository, and (many) more files than the RD53 data files showed up. These files would all be removed; I guess this is not OK:
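Such a listing can be reproduced with standard git plumbing; the 524288-byte cutoff below corresponds to the 0.5 MB threshold:

```
# List every blob in the history above 0.5 MB, largest first.
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" && $3 > 524288 {print $3, $4}' \
  | sort -rn
```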
We also should not keep that many firmware builds. There is no reason to flash an old version, and if you really want to, you can always build it yourself. The latest master and the latest development version might be enough. Or only one file per firmware version number (0.1 and 0.2 for now)? But then we would have to increment our versions more often to force users to update.
We have a problem. Data bandwidth via git-lfs on github.com is not free anymore for academic/research organizations. We are back to finding a decent solution... :-( Paying for bandwidth is tough ($5 per 50 GB). With about 10 builds a day at 100 MB each, $5 could be enough, but I do not like this.
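Rough estimate: 10 builds/day × 100 MB ≈ 1 GB/day, i.e. roughly 30 GB per month, so a single 50 GB data pack at $5/month would just cover the builds, with little headroom left for fixture downloads.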
At the moment, Git-LFS is not enabled in the GitLab instance at CERN, as we consider there is some overlap with other file systems available at CERN. From the description of your use case, we believe EOS [0] could be the most appropriate solution.

EOS seems to be the storage medium of choice for analysis data. It does not support versioning like git, but one could imagine storing each (immutable) version of fixture data like this:

/eos/myexperiment/path/datasample_v1.dat
/eos/myexperiment/path/datasample_v2.dat
/eos/myexperiment/path/datasample_v3.dat

Then have a version-controlled symlink in the git repository pointing to the relevant data sample:

my_git_repo/test/fixtures/datasample.dat [symlink to /eos/myexperiment/path/datasample_v1.dat]

If the test should now use fixture data v2, change the symlink in the git repo to point to /eos/myexperiment/path/datasample_v2.dat. Certain tools like git-annex can help automate this process if there are many files to manage.

If the data in EOS is not public (accessible anonymously), credentials will have to be provided for the CI job to access EOS data, see https://gitlab.cern.ch/gitlabci-examples/deploy_eos for an example. This assumes a GitLab runner with EOS access has been set up [1].

This is what we believe can be the best approach with the current set of services at CERN. In any case, your use case has been collected and will be considered in the future if we review the availability of git-LFS at CERN.
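For illustration, bumping a fixture to v2 with the symlink approach from the reply above could look like this (the paths are taken from the quoted reply; this assumes EOS is mounted on the machine, e.g. via FUSE):

```
# Re-point the version-controlled symlink at the new immutable sample.
ln -sfn /eos/myexperiment/path/datasample_v2.dat test/fixtures/datasample.dat
git add test/fixtures/datasample.dat
git commit -m "Use fixture datasample_v2"
```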
We did the rebase and successfully rewrote the history (including all merge requests). The repo size before and after the rebase, checked with git count-objects -vH, is: