In the last post, we implemented a simple NoSQL database in Python that stores JSON documents.

Now we're going to improve this basic database by adding indexing and more advanced query mechanisms.

These upgrades make lookups on indexed fields faster and let the database handle more intricate queries.


Getting Ready for the Upgraded Database

Make sure you have the basic NoSQL database we made earlier.

If you missed the last tutorial, you can find it here: Creating Your Own NoSQL Database in Python.

We're going to use that code as our starting point.



Improving Search Speed with Indexing

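Indexing is the key to faster queries. An index on a field maps each value of that field to the IDs of the documents that hold it. Finding matching documents then becomes a dictionary lookup instead of a scan over every document in the store.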
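To make the idea concrete, here is a small sketch that uses plain dictionaries. The helper names `build_index`, `scan_for` and `lookup` are hypothetical and are not part of the database class. Documents are assumed to be stored as `{doc_id: document}`, the same shape the class uses.

```python
# Hypothetical helpers for illustration; documents are stored as {doc_id: document}

def build_index(store, field):
    """Map each value of `field` to the set of IDs of documents that contain it."""
    index = {}
    for doc_id, document in store.items():
        if field in document:
            index.setdefault(document[field], set()).add(doc_id)
    return index

def scan_for(store, field, value):
    """Without an index: check every document (O(n) per query)."""
    return {doc_id for doc_id, doc in store.items()
            if field in doc and doc[field] == value}

def lookup(index, value):
    """With an index: a single dictionary access (O(1) on average)."""
    return index.get(value, set())

store = {
    "id1": {"name": "Alice", "age": 30},
    "id2": {"name": "Bob", "age": 24},
    "id3": {"name": "Carol", "age": 30},
}
age_index = build_index(store, "age")
print(sorted(lookup(age_index, 30)))                         # ['id1', 'id3']
print(scan_for(store, "age", 30) == lookup(age_index, 30))   # True
```

The trade-off is that every insert, update and delete must also keep the index in step with the documents.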

Let's modify the database class to support indexing:

import json
import os
import uuid


# Database class to store data in JSON format with indexing support
class JSONNoSQLDatabase:
    def __init__(self, filename='database.json', index_fields=None):
        self.filename = filename
        # Index fields are the fields that we want to index for faster search
        self.index_fields = list(index_fields) if index_fields else []
        self.store = {}
        # Each index maps a field value to the set of IDs of the documents holding it
        self.indexes = {field: {} for field in self.index_fields}
        if os.path.exists(self.filename):
            with open(self.filename, 'r') as file:
                data = json.load(file)
            # Load the documents from the JSON file
            self.store = data.get("store", {})
            # Rebuild the indexes from the loaded documents rather than reading the saved copy:
            # JSON turns every key into a string (30 becomes "30"), so saved indexes would no
            # longer match queries such as query_by_index("age", 30), and they may not cover
            # every field listed in index_fields
            for doc_id, document in self.store.items():
                self._update_indexes(doc_id, document)

    # Save the store and indexes to the JSON file
    def save(self):
        with open(self.filename, 'w') as file:
            # Convert the indexes from set to list
            data = {
                "store": self.store,
                "indexes": {field: {key: list(value) for key, value in index.items()} for field, index in
                            self.indexes.items()}
            }
            json.dump(data, file, indent=4)

    # Insert a document into the database
    def insert(self, document):
        doc_id = str(uuid.uuid4())
        self.store[doc_id] = document
        # Update the indexes with the new document
        self._update_indexes(doc_id, document)
        self.save()
        return doc_id

    # Update a document in the database
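    def update(self, doc_id, document):
        if doc_id not in self.store:
            raise KeyError("Document ID does not exist.")
        # Remove the old version from the indexes first; otherwise its previous
        # field values would still point at this document ID
        self._remove_from_indexes(doc_id, self.store[doc_id])
        self.store[doc_id] = document
        # Add the updated document to the indexes
        self._update_indexes(doc_id, document)
        self.save()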

    # Update the indexes with the new document
    def _update_indexes(self, doc_id, document):
        for field in self.index_fields:
            if field in document:
                value = document[field]
                # Add the document ID to the index
                if value not in self.indexes[field]:
                    self.indexes[field][value] = set()
                self.indexes[field][value].add(doc_id)

    # Get a document from the database
    def get(self, doc_id):
        return self.store.get(doc_id, None)

    # Delete a document from the database
    def delete(self, doc_id):
        if doc_id in self.store:
            document = self.store[doc_id]
            # Remove the document from the indexes
            self._remove_from_indexes(doc_id, document)
            del self.store[doc_id]
            self.save()
        else:
            raise KeyError("Document ID does not exist.")

    # Remove the document from the indexes
    def _remove_from_indexes(self, doc_id, document):
        for field in self.index_fields:
            if field in document:
                value = document[field]
                # Remove the document ID from the index
                if value in self.indexes[field]:
                    self.indexes[field][value].discard(doc_id)
                    if not self.indexes[field][value]:
                        del self.indexes[field][value]

    # Query the database with a condition function
    def query(self, condition):
        results = {}
        for doc_id, document in self.store.items():
            if condition(document):
                results[doc_id] = document
        return results

    # Query the database by a field value
    def query_by_index(self, field, value):
        results = {}
        # Check if the field is indexed and the value exists in the index
        if field in self.indexes and value in self.indexes[field]:
            for doc_id in self.indexes[field][value]:
                results[doc_id] = self.store[doc_id]
        return results

Here's a breakdown of the code:

  • Import necessary libraries: json, os, and uuid.
  • Define the JSONNoSQLDatabase class with the following methods:
    • __init__: Initialize the database with a filename and a list of fields to index. If the file already exists, load the saved documents from it and restore the indexes.
    • save: Save the store and indexes to the JSON file.
    • insert: Add a new document to the database, update the indexes, and save the changes.
    • update: Modify an existing document in the database, update the indexes, and save the changes.
    • _update_indexes: Update the indexes when a document is inserted or updated.
    • get: Retrieve a document from the database by its ID.
    • delete: Remove a document from the database and update the indexes accordingly.
    • _remove_from_indexes: Remove a document from the indexes when it's deleted.
    • query: Search the database using a condition function.
    • query_by_index: Search the database by a specific field value using indexes.
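There is one subtlety behind _update_indexes and _remove_from_indexes. When an indexed value changes, the document ID has to leave the set for the old value before it joins the set for the new one. If it is only added under the new value, a stale entry remains and a query for the old value still returns the document. The standalone sketch below mimics both steps on a single-field index. The function names `add_to_index` and `remove_from_index` are hypothetical.

```python
# Hypothetical stand-ins for _update_indexes / _remove_from_indexes,
# operating on a single-field index: {value: set_of_doc_ids}

def add_to_index(index, doc_id, value):
    index.setdefault(value, set()).add(doc_id)

def remove_from_index(index, doc_id, value):
    ids = index.get(value)
    if ids is not None:
        ids.discard(doc_id)
        if not ids:  # drop empty sets so old values don't linger in the index
            del index[value]

# Naive update: only add the new value
naive = {}
add_to_index(naive, "id1", 30)
add_to_index(naive, "id1", 31)          # Alice has a birthday
print(naive)                            # {30: {'id1'}, 31: {'id1'}}  <- stale entry for 30

# Correct update: remove the old value, then add the new one
correct = {}
add_to_index(correct, "id1", 30)
remove_from_index(correct, "id1", 30)
add_to_index(correct, "id1", 31)
print(correct)                          # {31: {'id1'}}
```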

The database stores documents in a dictionary called store, where keys are document IDs (generated using uuid) and values are the documents themselves.

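The indexes dictionary maps each indexed field to a second dictionary. That dictionary in turn maps every value of the field to the set of IDs of the documents containing it, which is what makes lookups on indexed fields fast.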
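Because the indexes are written to the same JSON file as the store, save converts each set of IDs to a list: the standard json module cannot serialize sets. A quick check with that module also shows a side effect worth knowing about. JSON object keys are always strings, so an integer index key such as 30 comes back from the file as "30".

```python
import json

index = {"age": {30: {"id1", "id3"}, 24: {"id2"}}}

# Sets cannot be serialized directly
try:
    json.dumps(index)
except TypeError as exc:
    print(exc)  # Object of type set is not JSON serializable

# Convert sets to lists (sorted() instead of list() only to make the output deterministic)
serializable = {field: {key: sorted(ids) for key, ids in values.items()}
                for field, values in index.items()}
text = json.dumps(serializable)
print(text)  # {"age": {"30": ["id1", "id3"], "24": ["id2"]}}

# After a round trip, the integer key 30 has become the string "30"
restored = json.loads(text)
print(30 in restored["age"], "30" in restored["age"])  # False True
```

As a result, a lookup with an integer value only matches index keys created in the current session, unless the indexes are rebuilt from the stored documents after loading.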

Example Usage with Indexing

Let's see a simple example of how to use the indexing feature:

from database import JSONNoSQLDatabase

db = JSONNoSQLDatabase(index_fields=["age"])

# Insert some documents
doc1_id = db.insert({"name": "Alice", "age": 30})
doc2_id = db.insert({"name": "Bob", "age": 24})

# Retrieve documents by index
results = db.query_by_index("age", 30)
print(results)  

# Output: {doc1_id: {"name": "Alice", "age": 30}}

Here's a step-by-step explanation of the code:

  • Import the JSONNoSQLDatabase class from the database module (the file database.py containing the class above).
  • Create an instance of the JSONNoSQLDatabase class, specifying the "age" field as the indexed field.
  • Insert two documents into the database, each containing a "name" and an "age" field. The insert method returns the ID of the inserted document.
  • Query the database to find documents with a specific "age" value (in this case, 30) using the query_by_index method.
  • Print the results of the query. The output will be a dictionary containing the document ID and the document itself. In this example, the output will be {doc1_id: {"name": "Alice", "age": 30}}, where doc1_id is the actual ID of the document.

Adding Advanced Query Mechanisms

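Advanced query mechanisms allow for more complex queries. Examples include range queries (all documents whose age is between 25 and 40) and composite queries that combine several conditions with AND or OR.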
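The existing query method already accepts any condition function, so one way to sketch range and composite queries is to build conditions from smaller pieces. The helpers below (`range_condition`, `all_of`, `any_of` and `run_query`) are hypothetical names and are not part of the class. `run_query` simply repeats the full scan that query performs, so the example runs on its own.

```python
# Hypothetical helpers; they work on a plain {doc_id: document} dict,
# the same shape as JSONNoSQLDatabase.store

def range_condition(field, low=None, high=None):
    """Build a condition matching documents whose field lies in [low, high] (inclusive)."""
    def condition(document):
        if field not in document:
            return False
        value = document[field]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
        return True
    return condition

def all_of(*conditions):
    """Composite AND: a document matches only if every condition matches."""
    return lambda document: all(cond(document) for cond in conditions)

def any_of(*conditions):
    """Composite OR: a document matches if at least one condition matches."""
    return lambda document: any(cond(document) for cond in conditions)

def run_query(store, condition):
    """The same full scan that JSONNoSQLDatabase.query performs."""
    return {doc_id: doc for doc_id, doc in store.items() if condition(doc)}

store = {
    "a": {"name": "Alice", "age": 30, "city": "Paris"},
    "b": {"name": "Bob", "age": 24, "city": "London"},
    "c": {"name": "Carol", "age": 41, "city": "Paris"},
}

aged_25_to_40 = run_query(store, range_condition("age", 25, 40))
parisians_over_35 = run_query(store, all_of(range_condition("age", low=35),
                                            lambda d: d.get("city") == "Paris"))
age_le_25_or_ge_40 = run_query(store, any_of(range_condition("age", high=25),
                                             range_condition("age", low=40)))
print(sorted(aged_25_to_40))        # ['a']
print(sorted(parisians_over_35))    # ['c']
print(sorted(age_le_25_or_ge_40))   # ['b', 'c']
```

With the class, the same conditions can be passed straight to db.query(...). These queries still scan every document. A dictionary index like the one above does not speed up a range query unless its keys are also kept in sorted order.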

Let's see the code changes for advanced query mechanisms:
