ORB-3: improve OpenReview notes parsing to accommodate more papers and reviews
The extraction and processing code rejected some submissions it could not process (we purge submissions without reviews, as they do not serve the main purpose of the dataset).
After analysis, we established the root cause of that behaviour. The structure of a "note" differs between venues:
- a note's keys may have different names, e.g. the final rating may be called "rating", "recommendation", "score", etc.,
- a review may be stored as full text under "review" or split across several dict keys as sections ("strengths" + "weaknesses" + "limitations").
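
A minimal sketch of how such venue-dependent keys could be handled, not the actual code from this change. The function names, the candidate key lists, and the assumption that a note's content is a plain dict of strings are all hypothetical; real venues may use other key names or wrap the values differently.

```python
from typing import Optional

# Hypothetical candidate names, taken from the examples listed above.
RATING_KEYS = ("rating", "recommendation", "score")
REVIEW_SECTION_KEYS = ("strengths", "weaknesses", "limitations")


def find_rating(content: dict) -> Optional[str]:
    """Return the first non-empty rating-like field, or None if none is present."""
    lowered = {key.lower(): value for key, value in content.items()}
    for name in RATING_KEYS:
        if name in lowered and lowered[name] not in (None, ""):
            return str(lowered[name])
    return None


def extract_review_text(content: dict) -> Optional[str]:
    """Return the full-text review, or the known sections joined together."""
    lowered = {key.lower(): value for key, value in content.items()}
    if lowered.get("review"):
        return str(lowered["review"])
    # Fall back to sectioned reviews; keep only the sections that exist.
    parts = [
        f"{name}: {lowered[name]}"
        for name in REVIEW_SECTION_KEYS
        if lowered.get(name)
    ]
    return "\n\n".join(parts) if parts else None
```

With this shape, a note is purged only when both lookups return None, so supporting a new venue is mostly a matter of extending the two tuples.
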
The code was refactored to be more modular and to search the list of keys. Additionally, the code that saves submissions in JSON format was changed so that a larger number of submissions can be stored: previously, saving used too much memory and raised MemoryError.
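
The description above does not say how saving was changed. One common way to avoid building the whole JSON document in memory is to write the array one element at a time, sketched below under that assumption. The function name `dump_submissions` and the shape of a submission (a JSON-serializable dict) are hypothetical.

```python
import json
from typing import Iterable, TextIO


def dump_submissions(submissions: Iterable[dict], fh: TextIO) -> int:
    """Write submissions as a JSON array, serializing one element at a time.

    Only a single submission is encoded at any moment, so peak memory does
    not grow with the number of submissions. Returns the number written.
    """
    fh.write("[")
    count = 0
    for submission in submissions:
        if count:
            fh.write(",\n")
        json.dump(submission, fh)
        count += 1
    fh.write("]")
    return count
```

Passing a generator as `submissions` keeps the input side lazy as well, and the output is still a regular JSON array that `json.load` can read.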