irrad issues
https://gitlab.cern.ch/groups/irrad/-/issues
2023-12-07T17:55:22+01:00

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/8
ORB-8: define and structure the metadata
2023-12-07T17:55:22+01:00
Jaroslaw Szumega
The existing metadata (including the ontology) and the Vocabulary of Interlinked Datasets (VoID) should be added and organized in the proper destination inside this repository.

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/7
ORB-7: import changes that fix missing venues
2023-12-04T14:20:04+01:00
Jaroslaw Szumega
The current approach used the venue names from previously constructed grading scales. They were the keys of a Python dict that existed only when the scores existed.
However, for certain OpenReview venues, such as some workshops, there is no formal grading, so the name was not recorded.
This import fixes this issue.
ORB Continuous Maintenance Phase

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/6
ORB-6: automatic ORB's statistics calculation
2023-12-06T11:55:27+01:00
Jaroslaw Szumega
Import changes that provide automatic statistics. They will serve both as utilities and as a notification message when the ETL process finishes.
ORB Continuous Maintenance Phase

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/5
ORB-5: Add field to the OrbVenue that will help determine online portal where venue publishes submissions and reviews
2023-12-04T14:20:38+01:00
Jaroslaw Szumega
To better distinguish between data sources, a field describing the online publishing portal should be added. For venues like "NeurIPS", "ICLR"... that would be OpenReview.net; for "SciPost Physics Proceedings", "SciPost Physics Lecture Notes"... that is SciPost.org.
ORB Continuous Maintenance Phase

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/4
ORB-4: Merge with Radnext NLP repository
2023-12-06T11:55:27+01:00
Jaroslaw Szumega
The following issues were addressed [date 23.11.2023]
- fixed the OrbGrading description, as for SciPost the grade name was repeated instead of the description
- SciPost venue id is extracted from the full name; it is not "SciPost" for all the items
- added <purge_submissions_without_review> function to ScipostRawDataset
- purging submissions for the ORB dataset is now optional and parametrized
- OpenReview grading fixed when multiple colons ":" are in the grade text
- OpenReview and SciPost boolean decision flag is predicted based on multiple specific keywords (not only "accept/publish" as before)
- found and covered corner cases where the OpenReview final decision was not in the 'Decision' note but in 'Meta_Review'
- optimized export to JSON: instead of dumping the whole JSON string, the JSON Encoder is used to iteratively dump chunks of data
ORB Continuous Maintenance Phase

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/3
ORB-3: improve OpenReview notes parsing to accommodate more papers and reviews
2023-11-23T22:37:55+01:00
Jaroslaw Szumega
The extraction and processing code rejected some submissions that could not be processed (we purge submissions without reviews, as they do not serve the main purpose of the dataset).
After analysis, we managed to establish the root cause of that behaviour. The structure of a "note" differs between venues:
1) the note's keys may have different names, e.g. the final rating may be called "rating", "recommendation", "score", etc.,
2) the review may be stored as full text under "review" or split across several dict keys as sections: "strengths" + "weaknesses" + "limitations".
The code was refactored to be more modular and to perform some searches in the keys list.
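
The key search described above might look roughly like the following sketch. It is only an illustration under assumptions: the function names, the constant names, and the shape of the note content (a flat dict) are hypothetical and not taken from the ORB code; only the key names quoted in this issue come from the source.

```python
from typing import Any, Dict, Optional

# Key names quoted in the issue; the constants themselves are hypothetical.
RATING_KEYS = ("rating", "recommendation", "score")
REVIEW_SECTION_KEYS = ("strengths", "weaknesses", "limitations")


def find_rating(content: Dict[str, Any]) -> Optional[str]:
    """Return the value stored under the first known rating key, if any."""
    for key in RATING_KEYS:
        if key in content:
            return str(content[key])
    return None


def find_review_text(content: Dict[str, Any]) -> Optional[str]:
    """Return the full-text review, or join the sectioned review fields."""
    if content.get("review"):
        return str(content["review"])
    sections = [
        f"{key}: {content[key]}" for key in REVIEW_SECTION_KEYS if content.get(key)
    ]
    return "\n".join(sections) if sections else None


# Two made-up notes that mimic the two structures described in the issue.
note_fulltext = {"rating": "6", "review": "Solid work."}
note_sectioned = {
    "recommendation": "Accept",
    "strengths": "Clear",
    "weaknesses": "Small eval",
}
```

With such helpers, a submission is only rejected when neither lookup finds anything, rather than whenever one venue-specific key is missing.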
Additionally, to store a larger number of submissions in JSON format, the saving code was changed, because it used too much memory and caused a MemoryError.

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/2
ORB-2: Import ORB fixes from Radnext NLP repository
2023-12-04T14:21:23+01:00
Jaroslaw Szumega
Since we observed a few corner cases and caveats causing bugs in SciPost submissions processing (already fixed in the internal Radnext NLP branch), we import the fixes to the official ORB repository.

https://gitlab.cern.ch/irrad/orb-dataset/-/issues/1
ORB-1: extract ORB dataset code from Radnext NLP repository
2023-07-28T17:17:30+02:00
Jaroslaw Szumega
ORB import to standalone repo
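
For reference, the memory-friendly JSON export mentioned in ORB-3 and ORB-4 could be sketched as below. This is an assumption about the general technique, not the actual ORB code: the function name `dump_in_chunks` and the sample data are hypothetical. It relies on the standard library's `json.JSONEncoder.iterencode`, which yields the encoded output piece by piece instead of building one large string.

```python
import io
import json
from typing import Any, TextIO


def dump_in_chunks(data: Any, fp: TextIO) -> None:
    """Write data as JSON chunk by chunk instead of building one big string."""
    encoder = json.JSONEncoder(ensure_ascii=False)
    for chunk in encoder.iterencode(data):
        fp.write(chunk)


# Usage with an in-memory buffer; a real export would pass an open file.
submissions = [{"id": 1, "reviews": ["ok"]}, {"id": 2, "reviews": []}]
buffer = io.StringIO()
dump_in_chunks(submissions, buffer)
```

Writing each chunk as soon as it is produced keeps peak memory close to the size of the data itself, whereas `json.dumps` has to hold the complete output string in memory before anything is written.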