There are some subdirectories in the CTA repo which contain files of historical interest but which are not really part of the code base:
CTA/migration contains the CASTOR to CTA migration scripts and tools. We have finished with this process. RAL still have to do their Facilities migration, otherwise no-one in the world needs these tools. Maybe of use to dCache/Fermi people as example code, as they will need to create a similar migration process for their tape metadata.
CTA/LTO_RAO contains documents, data and scripts that were used for the RAO on LTO investigation. The results of the investigation have been incorporated into cta-taped, so the contents of this directory are kept for historical information only (for example, if we wanted to write a paper about it).
CTA/doc contains some early CTA design documentation, plus some more recent papers and presentations, but it is not a comprehensive repository of all documentation.
CTA/EOSCTATAPE contains one config file and some e-mails from 2017. Ask @vlado if these need to be kept. The e-mails certainly should not be part of the repo due to OC11.
CTA/python: should this be moved to the cta-operations repo?
I propose that we move these subdirectories, with all of their git history intact, out of cta/CTA into new repos.
Then delete the subdirectories (including all their history) from the CTA repo. The history will be preserved in the new repo.
I will look at the CTA/doc portion of this and post a candidate solution for how to best preserve and remove the relevant git history from the main repo.
    cd <some place that is not the location of your local CTA dev repo>
    git clone ssh://git@gitlab.cern.ch:7999/cta/CTA.git # Clone master of CTA repo
    cd CTA/
    git-filter-repo --subdirectory-filter doc/
    git remote add origin <url_of_new_repo>
    git push --set-upstream origin master
diff -r shows no difference between the doc/ directory of the original CTA repo and the docs-only repo.
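The comparison above could be scripted roughly as follows. This is only a sketch: the function name `same_tree` and the directory names in the usage note are hypothetical, and excluding `.git` is my assumption so that the metadata at the root of the docs-only checkout does not count as a difference.

```shell
#!/usr/bin/env bash
# same_tree DIR_A DIR_B
# Succeeds (exit 0) when the two directory trees have identical contents.
# Any entry named .git is excluded so repository metadata is not compared.
same_tree() {
    diff -r -x .git "$1" "$2" >/dev/null
}
```

For example, `same_tree CTA-original/doc CTA-docs-only && echo "identical"` (both paths are placeholders for wherever the two clones live).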
Recipe for the doc-less repo: (also excludes the LTO_RAO directory due to push file size limit)
    cd <some place that is not the location of your local CTA dev repo>
    git clone ssh://git@gitlab.cern.ch:7999/cta/CTA.git # Clone master of CTA repo
    cd CTA/
    git-filter-repo --invert-paths --path doc/ --path LTO_RAO/ --path migration/ --path EOSCTATAPE/
    git remote add origin <url_of_new_repo> # Or use the old one when satisfied, then push with --force
    git push --set-upstream origin master
Tags can be migrated to a new repo by adding the --tags flag to the push. I'm not sure how this will impact existing releases, still looking into this. Also note that force-pushing these changes to the real CTA repo does involve rewriting history, so we should discuss if we really want this and warn collaborators.
Recipe for the repo with all the paths mentioned above removed:
    cd <some place that is not the location of your local CTA dev repo>
    git clone ssh://git@gitlab.cern.ch:7999/cta/CTA.git # Clone master of CTA repo
    cd CTA/
    git-filter-repo --invert-paths --path doc/ --path LTO_RAO --path migration/ --path EOSCTATAPE/
    git remote add origin <url_of_new_repo> # Or use the old one when satisfied, then push with --force
    git push --tags --set-upstream origin master
I managed to get the repo import to work (import from URL works), but it turns out that it does not clone GitLab resources such as releases. Nonetheless, here is what the imported CTA repo looks like after a git-filter-repo and a force push: https://gitlab.cern.ch/rbachman/CTA
Project settings (new project settings seem to be default ones)
General settings
Branch protection rules
CI runners
Merging branches across the repos
If possible, we would want to avoid having to migrate all branches to the cloned project in one go.
Branches are copied during the git project clone, but are not updated by the filter-repo tool. Trying to merge such a branch straight into the repo is very bad https://gitlab.cern.ch/rbachman/CTA/-/merge_requests/1 .
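A quick way to tell whether a branch has already been filtered might be to ask git whether it shares any commit with the rewritten master. The sketch below relies on `git merge-base` exiting non-zero when two refs have no common ancestor; the function name `shares_history` and the ref names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# shares_history REF_A REF_B
# Succeeds when the two refs have at least one common ancestor commit.
# An unfiltered branch and a filter-repo'd master have none, because every
# commit hash was rewritten.
shares_history() {
    git merge-base "$1" "$2" >/dev/null 2>&1
}
```

After fetching the new remote, something like `shares_history my_branch newrepo/master || echo "branch still has the old history"` would flag a branch that should not be merged as-is.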
Recipe for developers migrating from one project to the other (works in simple case):
    # Commit all remaining local changes you want to include, then:
    git push # push to original repo just in case
    git remote add newrepo <git_url_of_new_project>
    git checkout -b "<my_branch_name>_filter-repo"
    git-filter-repo --force --invert-paths --path doc/ --path LTO_RAO/ --path migration/ --path EOSCTATAPE/
    git push newrepo
    # Then create merge request and merge
    # Close original branch on the new project
When trying this however it turns out that it is quite cumbersome:
Every time a patch includes changes outside of the target directory (docs/) it fails and one has to modify it manually.
For the docs repo 423 patches are produced, and a good number of the first 50 I tried needed at least some basic manual intervention.
The intervention count would then probably also have to be doubled, since the reverse changes must be done to rewrite the doc-less history repo.
WARNING git filter-branch has a plethora of pitfalls that can produce non-obvious manglings of the intended history rewrite (and can leave you with little time to investigate such problems since it has such abysmal performance). These safety and performance issues cannot be backward compatibly fixed and as such, its use is not recommended. Please use an alternative history filtering tool such as git filter-repo[1]. If you still need to use git filter-branch, please carefully read the section called “SAFETY” (and the section called “PERFORMANCE”) to learn about the land mines of filter-branch, and then vigilantly avoid as many of the hazards listed there as reasonably possible.
Instructions for testing a merge into the test-repo
Please test the following instructions with your development branch.
You will need an install of git >= 2.22.0 and to install the git-filter-repo tool (not included in git).
These commands will rewrite history in the newly created branch.
    # Commit all remaining local changes you want to include, then:
    git checkout <my_branch_name>
    git push # push to original repo just in case
    git remote add newrepo https://gitlab.cern.ch/rbachman/CTA.git
    git checkout -b <my_branch_name>_filter-repo
    git-filter-repo --force --invert-paths --path doc/ --path LTO_RAO/ --path migration/ --path EOSCTATAPE/
    git push -u newrepo <my_branch_name>_filter-repo
    # Then create merge request and merge
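The two prerequisites (git >= 2.22.0 and git-filter-repo) could be checked before running the commands with something like the sketch below. The helper name `version_ge` is hypothetical, and the check assumes git-filter-repo is on PATH, since that is how the instructions invoke it.

```shell
#!/usr/bin/env bash
# version_ge HAVE WANT
# Succeeds when dotted version HAVE is >= WANT, comparing the first three
# numeric fields; suffixes such as "-rc1" are ignored.
version_ge() {
    local IFS=.
    local -a have=($1) want=($2)
    local i h w
    for i in 0 1 2; do
        h=${have[i]:-0}; w=${want[i]:-0}
        h=${h%%[!0-9]*}; w=${w%%[!0-9]*}      # drop non-numeric suffixes
        h=$((10#${h:-0})); w=$((10#${w:-0}))  # force base 10
        if (( h > w )); then return 0; fi
        if (( h < w )); then return 1; fi
    done
    return 0
}

# "git --version" prints e.g. "git version 2.39.2"; the third word is the version.
git_version=$(git --version 2>/dev/null | awk '{print $3}')
if ! version_ge "${git_version:-0}" 2.22.0; then
    echo "git >= 2.22.0 is required (found: ${git_version:-none})" >&2
fi
if ! command -v git-filter-repo >/dev/null 2>&1; then
    echo "git-filter-repo not found on PATH" >&2
fi
```

The comparison is numeric per field on purpose, so that 2.9.x is correctly treated as older than 2.22.0.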
Note: Expiry date for old ST egroups (it-dep-st-*) has been set to 2022-09-07. See KB0008134.
I believe we are still using this e-group to grant Developer access to GitLab. When we switch to the new repo, we should replace it with the new group it-dep-sd.
This works for the purpose of splitting out the subdirectories into separate repositories.
However, we actually want to rewrite history, as this is needed to clean up the emails in CTA/EOSCTATAPE which are mentioned at the top of the issue.
With the submodule approach one still has the commits with the cta-dcache directory in the history.
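Whether a directory is really gone from history can be checked by asking git for any commit that touches it. This is a minimal sketch with a hypothetical function name, `path_in_history`; it only inspects refs present in the local clone, so it says nothing about other clones or server-side copies.

```shell
#!/usr/bin/env bash
# path_in_history PATH
# Succeeds when at least one commit reachable from any local ref touches PATH.
path_in_history() {
    [ -n "$(git log --all -n 1 --format=%H -- "$1")" ]
}
```

In a properly filtered clone, `path_in_history EOSCTATAPE/` should fail, whereas with the submodule approach it would still succeed for the directory that was split out.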
I did not find any difficulties.
In addition, after applying the command git-filter-repo to branch cta-1204_filter-repo, I tried to rebase it on top of newrepo/master. All went smoothly, which means that the rebuilt hashes of cta-1204_filter-repo match the rebuilt hashes of newrepo/master:
    (env) afonso@ctadevjoaoafonso:~/workspace/CTA_proto$ git rebase newrepo/master
    Current branch cta-1204_filter-repo is up to date.
We decided at the meeting today that I will create the proper new CTA repo and proceed as far as I can with the setup.
The new repo will be created from scratch, without exporting/importing, since there is no need to preserve the old branches.
I will then alert everyone on MM once I get to the point of doing something that warrants putting the old repo into read-only mode (such as moving over the old issues).
I've created the new project: https://gitlab.cern.ch/cta/cta-new . Once the move is complete we will rename it and change the URL to match the present CTA one.
The settings should now be copied over with only some minor changes (some were in an undefined state in the original project, which GitLab does not allow me to replicate, and I've kept the 100MB file size limit this time around to avoid future problems related to this).
@mdavis and @rbachman: CTA/python contains the code for eosfstgcd that must be distributed and versioned along with CTA: this is the eos garbage collector that runs on the EOSCTA disk servers.
CTA/migration contents can now be found in a repo of their own: https://gitlab.cern.ch/cta/castor-eoscta-migration. Please note that removing it breaks CI for existing tags in the new repo. As such I recommend getting the code for already existing tags from the original CTA repo, which we will rename and archive.
CTA/LTO_RAO has been merged into the existing similarly named project at https://gitlab.cern.ch/cta/tape-RAO . I have compressed the offending 100MB+ log artifact such that it should no longer be a problem.