Poor query performance, even for paginated results

Creating a transient issue on the pylogbook tracker so that we can link to the discussion/outcome.

Hi *,

First of all, I do not seem to have access to the ELOG JIRA, so maybe there is already an issue for this...

I am using the Logbook API (via pylogbook) to fetch the last 10 entries from a particular activity. But this seems to be very slow. Phil did a test with the raw API and this doesn't seem to be added by pylogbook:

$ time curl -H "Authorization: RBAC ${RBAC_TOKEN_SERIALIZED}" https://cs-ccr-logdev.cern.ch/elogbook-server/GET/events?filter=%7B%22doCount%22%3A+false%2C+%22startRecord%22%3A+0%2C+%22nbRecordToFetch%22%3A+25%2C+%22activityIds%22%3A+%5B2%5D%7D -O /dev/null --silent

real    0m1.038s
user    0m0.067s
sys    0m0.078s

Any idea if this could be made a bit faster? In my case I only need entry IDs and times. So if it would speed things up, maybe there could be an option to avoid fetching the full entires?

Cheers,

@tlevens

Hi, Through REST a get the same request result in less than 250 ms.

Phil sent me the curl example some time ago which I copied/pasted. Since it is using the test server then maybe the conditions are different now?

But if I try on the SPS operational e-logbook:

time curl -H "Authorization: RBAC ${RBAC_TOKEN_SERIALIZED}" https://logbook.cern.ch/elogbook-server/GET/events?filter=%7B%22doCount%22%3A%20false%2C%20%22startRecord%22%3A%200%2C%20%22nbRecordToFetch%22%3A%2025%2C%20%22activityIds%22%3A%20%5B424%5D%7D -O /dev/null --silent

real  0m1.904s
user  0m0.087s
sys   0m0.128s

Hello guys, I had a look on the DB’s side and there is not much we can do for now. We have ~1.2 millions rows table and what happen when you do not specify a date is that oracle select all the events corresponding to the given activity, then it sorts the whole index by date to know which are the last 25 and return to you. The java server contribution is low as it is just wrapping records into json. I have noticed if you specify a date it helps a lot (I searched from 1st January 2021), maybe you could try this? I also found that the way we store the text of the events (in CLOBs) is slowing down the select, which is known drawback of using them. There might be a way to not include them in the initial select but this would then slow down queries with large datasets and increase the load of the server, so we need more time to investigate .

I hope the date trick will help you.

Thanks for the investigation and hint, yes specifying a from date does help a lot.