Binary database using AVL tree data structure (Balanced-Binary Tree)
An implementation of an AVL tree to store data. It also includes basic transaction handling, with commit and rollback methods backed by in-memory snapshots, and persistence to a .pkl (binary) file.
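To illustrate the balancing idea behind the storage layer, here is a minimal, self-contained AVL insert with the four classic rotations. This is a sketch of the general technique, not the code in databasestructure.py (node fields and function names are assumptions):

```python
class AVLNode:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.height = 1

def _h(n):
    return n.height if n else 0

def _update(n):
    n.height = 1 + max(_h(n.left), _h(n.right))

def _balance(n):
    return _h(n.left) - _h(n.right)

def _rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _update(y); _update(x)
    return x

def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _update(x); _update(y)
    return y

def insert(node, key, value):
    # Standard BST insert, then rebalance on the way back up
    if node is None:
        return AVLNode(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    else:
        node.right = insert(node.right, key, value)
    _update(node)
    b = _balance(node)
    if b > 1 and key < node.left.key:      # left-left case
        return _rotate_right(node)
    if b < -1 and key > node.right.key:    # right-right case
        return _rotate_left(node)
    if b > 1:                              # left-right case
        node.left = _rotate_left(node.left)
        return _rotate_right(node)
    if b < -1:                             # right-left case
        node.right = _rotate_right(node.right)
        return _rotate_left(node)
    return node
```

Because the tree rebalances on every insert, lookups by key stay O(log n) even when ids are inserted in sorted order (the worst case for a plain BST).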
The only necessary file
databasestructure.py
How to Use
Note: see main.py for a full example.
Import databasestructure
```python
import pickle

from databasestructure import *
```
Create a database object, e.g. `db = CustomDatabase()` (the same class whose `load` method is used below)
Create a table
Example of how to create a table named users, with columns id, name, and age:
```python
db.create_table('users', ['id', 'name', 'age'], key_index=0)
```
Note: key_index is the index of the column that holds the key. Here, id is the first column, which is why key_index=0.
Insert data into a table
```python
db.insert('users', [None, 'Alice', 30])
db.commit('users')
```
Note: the id column is the index; pass None to use the automatic autoincrement feature.
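One plausible way to implement that autoincrement replacement looks like this. This is a sketch under assumed names (`prepare_row`, `next_auto_id`); the actual policy in databasestructure.py may differ:

```python
def next_auto_id(existing_ids):
    # Assumed policy: highest existing id + 1, or 0 for an empty table
    return (max(existing_ids) + 1) if existing_ids else 0

def prepare_row(row, key_index, existing_ids):
    # Replace a None key with the next autoincrement value;
    # rows with an explicit key pass through unchanged
    if row[key_index] is None:
        row = list(row)
        row[key_index] = next_auto_id(existing_ids)
    return row
```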
Select (search) Data
Select data based on the index
This will search on the id column.
```python
# Select the user with id=0
headers, user = db.select('users', 0)
print(headers)
print(user)
```
Select data based on a value of a column
```python
# Select the users with age > 25
headers, user = db.select_by_column_value("users", "age", '>', 25)
print(headers)
print(user)
```
Valid operators: `>`, `>=`, `<`, `<=`, `==`, `!=`
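One common way such an operator argument is handled is to map each operator string to the matching Python comparison. This is an illustrative sketch (the function name `filter_by_column` and the `OPS` table are assumptions, not the code in databasestructure.py):

```python
import operator

# Map the documented operator strings to Python comparison functions
OPS = {
    '>': operator.gt, '>=': operator.ge,
    '<': operator.lt, '<=': operator.le,
    '==': operator.eq, '!=': operator.ne,
}

def filter_by_column(headers, rows, column, op, value):
    # Find the column position from the header, then keep matching rows
    idx = headers.index(column)
    return [row for row in rows if OPS[op](row[idx], value)]
```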
Select the whole table
```python
# Select the whole table
headers, table = db.select_all('users')
print(headers)
print(table)
```
Update Data
Update a row
```python
# You need the index of the row to be updated
user = db.update('users', 0, [0, 'Alice', 31])
print(user)
```
Update a Table
```python
# Add the column email to the users table
db.add_column('users', 'email')
print(db.select_all('users'))
```
Delete Data
```python
# You need the index of the row to be deleted
db.delete('users', 1)
```
Joining Operation
Let's say that we have these tables:
- Table users:
| id | name | age |
|---|---|---|
| 0 | Charlie | 28 |
| 1 | Alice | 30 |
- Table orders:
| order_id | user_id | product |
|---|---|---|
| 101 | 0 | Laptop |
| 102 | 1 | Smartphone |
| 103 | 0 | Phone |
If we want to join tables, we must do it based on the key_index value.
So, to join the users and orders tables on the user_id column:
```python
# Perform the join
headers, joined_data = db.join('users', 'orders', key_index1=0, key_index2=1)

# Print the joined data
print("Joined Data:\n")
print(headers)
for data in joined_data:
    print(data)
```
- key_index1=0: the join key for the first table (users) is in the first column (id).
- key_index2=1: the join key for the second table (orders) is in the second column (user_id).
Note: see the test_joining.py file.
- Result will be:
| id | name | age | order_id | user_id | product |
|---|---|---|---|---|---|
| 0 | Charlie | 28 | 101 | 0 | Laptop |
| 1 | Alice | 30 | 102 | 1 | Smartphone |
| 0 | Charlie | 28 | 103 | 0 | Phone |
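The source does not state which join algorithm db.join uses; a plain nested-loop equi-join, sketched below under assumed names, reproduces the table above (one output row per matching pair, with the two rows concatenated):

```python
def nested_loop_join(headers1, rows1, headers2, rows2, key_index1, key_index2):
    # The combined header is simply the two header lists concatenated
    joined_headers = headers1 + headers2
    joined_rows = []
    for right in rows2:            # iterate the second table (orders)
        for left in rows1:         # scan the first table (users) for matches
            if left[key_index1] == right[key_index2]:
                joined_rows.append(left + right)
    return joined_headers, joined_rows
```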
Transactions
Rollback
```python
db.insert('users', [None, 'Charlie', 28])
print("Before rollback, user with latest id:", db.select_all('users'))
db.rollback('users')
print("After rollback, user with latest id:", db.select_all('users'))
```
Commit
```python
db.insert('users', [None, 'Charlie', 28])
db.commit('users')
print("After commit, user with latest id:", db.select_all('users'))
```
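The snapshot idea behind commit/rollback can be sketched as follows. This is an illustration only (rows are held in a plain list here, not an AVL tree, and the class name is an assumption): commit records a deep copy of the current state, and rollback restores it.

```python
import copy

class SnapshotTable:
    def __init__(self):
        self.rows = []
        self._snapshot = []

    def insert(self, row):
        self.rows.append(row)

    def commit(self):
        # Record the current state as the new rollback point
        self._snapshot = copy.deepcopy(self.rows)

    def rollback(self):
        # Discard uncommitted changes by restoring the last snapshot
        self.rows = copy.deepcopy(self._snapshot)
```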
Save data to a .pkl file
```python
# Save the database in binary format
db.save('./avl_database.pkl')
```
Load data from a .pkl file
```python
# Load the database in binary format
db = CustomDatabase.load('./avl_database.pkl')
```
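Under the hood, .pkl persistence of this kind is typically a direct pickle round-trip of the database object. A minimal sketch (the standalone function names `save_db`/`load_db` are illustrative, not the API of databasestructure.py):

```python
import pickle

def save_db(obj, path):
    # Serialize the whole database object in binary mode
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

def load_db(path):
    # Deserialize and return the database object
    with open(path, 'rb') as f:
        return pickle.load(f)
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.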
Load data from a .csv file
You must have a .csv file with the same structure as the table you want to load. For the users table, for example, the .csv file should look like:
| id | name | age |
|---|---|---|
| 1 | John | 25 |
| 2 | Jane | 30 |
| 3 | Bob | 22 |
```python
db.load_from_csv("users", "./data.csv")
```
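Reading such a file with the standard library's csv module might look like this sketch (the function name `read_csv_rows` is an assumption; note that csv yields every value as a string, so numeric columns need converting before comparisons like age > 25):

```python
import csv

def read_csv_rows(path):
    # Returns (headers, rows); all values come back as strings
    with open(path, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
        rows = [row for row in reader]
    return headers, rows
```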
Multi-threading operations
Please check the main_multi_threading.py file.
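A common approach for making table operations safe across threads is to serialize mutations with a lock. This is a generic sketch (class name and strategy are assumptions, not necessarily what main_multi_threading.py does):

```python
import threading

class LockedTable:
    def __init__(self):
        self.rows = []
        self._lock = threading.Lock()

    def insert(self, row):
        # Only one thread mutates the table at a time
        with self._lock:
            self.rows.append(row)

def worker(table, n):
    # Each thread performs n inserts
    for _ in range(n):
        table.insert([None, 'x', 1])
```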
Test
Inside the dummy_data folder there is a dummy-data generator and main_test.py, which runs basic operations to test database performance.
Results may vary with CPU capability and whether multi-threading is used. For these tests I used a single thread and an 11th Gen Intel(R) Core(TM) i3-1115G4 @ 3.00GHz CPU (an ordinary CPU).
The tests used 10,000 dummy users and 20,000 orders.
| Test | AVL DB (seconds) | MySQL (seconds) |
|---|---|---|
| Creation of both tables | 0.0 | - |
| Loaded 10,000 users from .csv to memory RAM | 0.12057 | - |
| Loaded 20,000 orders from .csv to memory RAM | 0.29318 | - |
| Write 30,000 rows from RAM to .pkl file | 0.09558 | - |
| Load 30,000 rows from .pkl to memory RAM | 0.10949 | - |
| From 10,000 users select those older than 25 years old | 0.00709 | - |
| Joining users and orders tables, total 17,358 rows | 0.16644 | - |