Tiny Search API service build on top of Fast API allowing to query data from Elastisearch with a rate-limiter
Quick start
Simply start the app via docker compose:
You can go to http://localhost:8000/docs/ and try the API from there
to import the data, you have to install dependencies locally:
poetry install python3 import.py
Design
In this section, I will outline thoughts and design decisions which were taken during implementation
Elasticsearch
Field types
I started with designing mapping, looking at the data I identified:`
id,username- these fields are unique identifiers, so they are likely to be selected by exact queries, do not have stemming requirementspost,user_bio- these fields are text fields, so we need full-text search features such as stemming, stop words filteringpost_like_count,post_diamond_count- integer values, so we can have range queries for them
Analysers
I decided to proceed with standard tokenizer, it's recommended one and demonstrated good results, comparing to default
settings of other tokenizers.
Filters included:
stop- removes stop words, by default uses English dictionarylowercase- sets all characters to be lowercase, as register doesn't matter for searchporter_stem- English language stemmer
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"post": {
"type": "text"
},
"username": {
"type": "keyword"
},
"user_bio": {
"type": "text"
},
"post_like_count": {
"type": "integer"
},
"post_diamond_count": {
"type": "integer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"stop",
"lowercase",
"porter_stem"
]
}
}
}
}
}What does this mapping support
- Stemming in English
- Multicolumn search
- Index optimizations (do not stem
id/username) - Typo support (couldn't make it work with multicolumn queries)
Python
I have chosen FastAPI as a modern framework with async support, Swagger generation and etc.
Elasticsearch adapter
Sadly, popular Elasticsearch libraries did not support all required features (async, PIT, scroll), so I decided to
write src/es_helper.py for this to save a bit of time.
Point in time API
In order to provide consistent view for API client while paginating, I decided to utilize Point in time API usage, which allows a client to scroll data in a state they were at the moment client initialized first connection, even if they were updated. It is implemented as a query parameter, and return after each request
Pydantic
In order not to have json/dict hell, all data passed within the project, are wrapped into Pydantic models, including ElasticSearch responses.
Dockerfile
Dockerfile utilizes multistage build with caching support, so final image is slimmer and recurrent builds are faster
Rate limiting
Rate limiting is implemented by slowapi library, which is based on a popular Flask implementation. It uses Redis
backend in order to allow to scale python containers without loosing global rate-limiting.
Documentation
Swagger is available at http://localhost:8000/docs