fix: Improve indexing of the `request_queue_records` table for `SqlRequestQueueClient` by Mantisus · Pull Request #1527 · apify/crawlee-python

@Mantisus

@Mantisus

@Mantisus

@ericvg97, thank you for your analysis and for finding this. Is it possible for you to check the new index in your environment and confirm that the queue is working correctly with it?

@ericvg97

@ericvg97, thank you for your analysis and for finding this. Is it possible for you to check the new index in your environment and confirm that the queue is working correctly with it?

It doesn't work in my environment. The query I shared was incorrect, the actual query crawlee is doing is
SELECT * FROM request_queue_records
WHERE request_queue_id = 'QXxcAytWcFELg1yFR'
AND is_handled IS false
AND (time_blocked_until IS NULL OR time_blocked_until < now())
ORDER BY sequence_number ASC LIMIT 10 FOR UPDATE SKIP LOCKED

Notice the condition "is_handled IS false" vs "is_handled = false" that I shared before. When doing the first, the planner doesn't use the index you are proposing. When doing the second, the planner does use the index. I didn't know this could happen in postgres TBH. Can you check this also happens in your environmnet

@ericvg97

Changing "postgresql_where=text('is_handled = false')", to "postgresql_where=text('is_handled is false') fixes it in my environment.

@Mantisus

@Mantisus

Changing "postgresql_where=text('is_handled = false')", to "postgresql_where=text('is_handled is false') fixes it in my environment.

@ericvg97 Thank you, I double-checked the query and updated the index. It works the same way in my environment.

I didn't know this could happen in postgres TBH.

Yes, I think it's because it's a partition index. Therefore, it is important that the expression fully matches the index.

Thanks again for double-checking this and helping to better identify the error. I really appreciate it.

janbuchar

vdusek