Python client guide

Learn how to create a Python application that connects to the Memgraph database and executes simple queries.

Both the Neo4j Python client and GQLAlchemy can be used to connect to Memgraph with Python. This guide will show how to use the Neo4j Python client; for more information on GQLAlchemy, check out its documentation.

Memgraph and Neo4j both support the Bolt protocol and Cypher queries, which means that the same client can be used to connect to both databases. This is very convenient if switching between the two databases is needed. This guide is based on client version v5 and above; some examples may not be supported in older versions of the client.

Quickstart

The following guide will demonstrate how to start Memgraph, connect to Memgraph, seed the database with data, and run simple read and write queries.

Necessary prerequisites that should be installed in your local environment are a running Memgraph instance and the Neo4j Python driver (`pip install neo4j`).

Python client API usage and examples

After a brief Quickstart guide, this section will go into more detail on how to use the Python client API, explain code snippets, and provide more examples. Feel free to skip to the section that interests you the most.

Database connection

Once the database is running and the client is installed or available in Python, you should be able to connect to the database in the following ways:

Connect without authentication (default)

By default, the Memgraph database is running without authentication, which means that you can connect to the database without providing any credentials (username and password). To connect to Memgraph, create a client object with the appropriate URI and credentials arguments. If you're running Memgraph locally, the URI should be similar to `bolt://localhost:7687`, and if you are running Memgraph on a remote server, replace `localhost` with the appropriate IP address. If you ran Memgraph on a port different from 7687, do not forget to update that in the URI too.

By default, you can set the username and password in the `auth` argument as empty strings. This means that you are connecting without authentication.

To connect to the Memgraph database without authentication, you can use the following snippet:

Connect with authentication

In order to set up authentication in Memgraph, you need to create a user with a username and password. In Memgraph you can do that by executing the query `CREATE USER <username> IDENTIFIED BY '<password>';`.

Then, you can connect to the database with the following snippet:

Notice that the first argument in `auth` is the username and the second is the password.

Connect with self-signed certificate

By adding `+ssc` (self-signed certificate) to the scheme before the host address in the URI argument, for example `bolt+ssc://`, the connection will be encrypted and will accept a self-signed certificate. You will need to use such a certificate if you want to connect to a Memgraph Cloud instance.

When creating a project on Memgraph Cloud, you have a username and you create a password. After the project is started, Memgraph assigns a host address. Copy those values and paste them into the username, password and host address fields in the code snippet below to connect successfully to the database.

Connect with Single sign-on (SSO)

To use SSO with the Python driver, you need to get the access token and, optionally, the ID token yourself. One simple way to do it is to use the authlib library and follow the official tutorial.

To connect to the Memgraph database, you have to use the driver's `Auth` class, with the scheme parameter set according to the SSO scheme you are using, the credentials parameter set to contain the access token and, optionally, the ID token in the format shown in the example below, and the remaining parameters left empty.

Below are examples of connecting to the Memgraph database using OIDC SSO with the custom auth scheme.

With both access and ID tokens:

With access token only (when username is configured to use access token fields):

Impersonate a user

Once logged in, a user with the correct permissions can impersonate a different user during a session. This means that any query executed during that session will run as if the impersonated user executed it. The target user can be defined during session creation, as in the following snippet:

Query the database

After connecting your client to Memgraph, you can start running queries. The simplest way to run queries is by using the `execute_query()` method, which has automatic transaction management.

Run a create query

The following query will create a node inside the database:

Due to the nature of the `execute_query()` method, transactions are handled automatically.

Run a read query

The following query will read the previously created node from the database:

In this query, each record contains a node object.

Running queries with property map

Using this approach, the queries will not contain hard-coded values and can be more dynamic.

Process the results

Processing results from the database is important: we do not want to lose any data during conversions, and we want to properly read the results and serve them back to the Python application. Python is a dynamically typed language, which means that the type of a variable is determined at runtime, which is why we need to be careful when processing results.

Process the Node result

In order to process the results, you need to read them first. You can do that by running the following query:

The `execute_query()` method returns a named tuple of three elements: `records`, `summary` and `keys`. The `records` field contains all the records returned by the query. To process the results, you can iterate over the records and access the fields you need.

For example:

In the example above, for each returned record you can access the field containing the `Node` object returned from the query.

You can access individual properties of the `Node` using one of the following options:

Keep in mind that the `id` property returns the internal ID of the node, which is not the same as a user-defined ID, and it should not be used for any application-level logic.

Process the Relationship result

You can also receive a relationship from a query. For example:

You can access the properties just like the `Node` properties. Keep in mind that `id` is the internal ID of the relationship, which is not the same as a user-defined ID, and it should not be used for any application-level logic.

Process the Path result

You can receive a path from the database using the following construct:

A path contains nodes and relationships that can be accessed in the same way as in the previous examples.
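A sketch of path processing, assuming the query returned the path under the key `path`:

```python
def process_paths(records):
    for record in records:
        path = record["path"]         # assumes the query returned it as "path"
        print(len(path))              # number of relationships in the path
        for node in path.nodes:
            print(node.labels)        # same access as for standalone nodes
        for rel in path.relationships:
            print(rel.type)           # same access as for standalone relationships
```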

Transaction management

A transaction is a unit of work executed on the database; it can be a basic read or write, or a complex series of queries. Transactions can be managed automatically by the client or manually through explicit code. Transaction management defines how to handle a transaction: when to commit it, roll it back, or terminate it.

On the driver side, if a transaction fails because of a transient error, it is retried automatically. Transient errors occur during write conflicts or network failures. The driver retries the transaction function with an exponentially increasing delay.

Automatic transaction management

To query the database, run the `execute_query()` procedure on the client object with the Cypher query argument, its parameters, and the database name. A good practice is to provide a database name, as it avoids unnecessary requests to the server and, in that way, improves performance. It's also recommended to provide parameters to protect your queries from Cypher injections. In v2.10, Memgraph added multi-tenant support to the Enterprise Edition to manage multiple isolated databases within a single instance. If you're working with more than one tenant, be sure to provide the correct database name.

The `execute_query()` procedure automatically creates a transaction that can include multiple Cypher statements as a single query. If the transaction fails, the procedure will automatically rerun it.

The Bolt protocol specifies additional metadata that can be sent along with the requested results. Metadata can be divided into two groups: query statistics and notifications. The query statistics metadata provides query counters that indicate the changes that a write query triggered on the server.

Read queries can be run in the same way, but they will not return any query counters because the read queries are not making any changes on the server. The summary will provide other information, such as the query that was run.

On update and delete queries it is again useful to get the query counters to be sure how your change affected the database.

Manual transaction

When using the `execute_query()` procedure, you don't have complete control over the transaction lifecycle, because it creates a transaction that can only be committed or rolled back on failure. If you want more control over query execution, use managed transactions to run the queries.

To run a managed transaction, you first need to create a session:

Don’t forget to close the session after usage with the `close()` method or the Python `with` statement, which will automatically close the session, as in the following code snippet:

Sessions are not thread-safe, so make sure that each thread creates its own session. On the other hand, the main client object can be shared across threads. One transaction can contain multiple queries, and either all of them will be executed or none will. This means you don’t have to worry about rolling back the part of the changes that got executed in that transaction and finding the ones that didn’t. Having more than one query in a transaction is useful when the queries work on a similar database task, usually creating graph database objects.

With sessions, you can run:

  • Managed transactions - run multiple queries with automatic retries without the possibility to roll back a query within a transaction.
  • Explicit transactions - get complete control over transactions by explicitly controlling the end of a transaction that won’t be automatically retried.
  • Implicit transactions - run a Cypher query that won’t be automatically retried.
Managed transactions

To create a managed transaction, use the `execute_read()` procedure for read queries and the `execute_write()` procedure for write queries.

If an exception is raised, the transaction will be automatically rolled back. If there is a return statement within the transaction function, the transaction will be committed. If a transaction fails, it is sometimes automatically rerun, which means you can’t be sure how many times the transaction function will be executed, so you have to be careful that it produces the same effect when run multiple times. The queries inside the transaction function will always run only once. A session can contain multiple transactions, but only one transaction is active at any given time within a session. To maintain multiple concurrent transactions, use multiple concurrent sessions.
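A sketch of a managed write transaction; note that the transaction function uses `MERGE` rather than `CREATE` so that a retry produces the same effect:

```python
def create_person(tx, name):
    # The transaction function may be rerun on transient errors,
    # so it should be idempotent; MERGE is safe to run twice.
    result = tx.run("MERGE (p:Person {name: $name}) RETURN p;", name=name)
    return result.single()

def managed_write(session, name):
    # execute_write() commits when the function returns and
    # retries automatically on transient errors.
    return session.execute_write(create_person, name=name)
```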

Explicit transactions

With explicit transactions, you can get complete control over transactions. To begin a transaction, run the `begin_transaction()` procedure on the session, and to run a query within it, use the `run()` method on the transaction object. Explicit transactions offer the possibility of explicitly controlling the end of a transaction with the `commit()`, `rollback()` or `close()` methods.

Use explicit transactions if you need to distribute Cypher execution across multiple functions for the same transaction, or if you need to run multiple queries within a single transaction without automatic retries.

The following example shows how to explicitly control the transaction of changing account balances based on a token transfer:

In the above example, if John’s account balance is changed to a number less than 10, you will be warned that he doesn’t have enough tokens and the transfer won’t happen (it will be rolled back).
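A sketch of that flow; the `Account` label, `balance` property, and the threshold of 10 tokens follow the scenario described above:

```python
def transfer_tokens(session, sender, receiver, amount):
    tx = session.begin_transaction()
    try:
        tx.run(
            "MATCH (a:Account {name: $name}) SET a.balance = a.balance - $amount;",
            name=sender, amount=amount,
        )
        balance = tx.run(
            "MATCH (a:Account {name: $name}) RETURN a.balance AS balance;",
            name=sender,
        ).single()["balance"]
        if balance < 10:
            print(f"{sender} doesn't have enough tokens!")
            tx.rollback()   # undo the debit: the transfer never happened
        else:
            tx.run(
                "MATCH (a:Account {name: $name}) SET a.balance = a.balance + $amount;",
                name=receiver, amount=amount,
            )
            tx.commit()     # make both balance changes durable together
    finally:
        tx.close()          # no-op if already committed or rolled back
```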

Implicit transactions

Implicit or auto-commit transactions are the simplest way to run a Cypher query, since they won’t be automatically retried as with the `execute_query()` procedure or managed transactions. With implicit transactions, you don’t have the same control over the transaction as with explicit transactions, so they are mostly used for quick prototyping.

To run an implicit transaction, use the `session.run()` method:

The `session.run()` method is most commonly used for large data-loading queries, such as `LOAD CSV`, to prevent timeout errors due to the size of the transaction.
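A sketch of an implicit transaction; the query auto-commits when the result is consumed:

```python
def implicit_count(session):
    # session.run() creates an auto-commit transaction:
    # no retries, no manual commit or rollback.
    result = session.run("MATCH (n) RETURN count(n) AS cnt;")
    return result.single()["cnt"]
```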

Concurrent transactions

It is possible to run concurrent transactions with the Python client by leveraging threads or processes. Using threads could cause your code to be partially locked because of the Global Interpreter Lock (GIL), resulting in slow execution. Hence, it is better to run your workloads in separate processes, where each process has its own interpreter and memory space, avoiding GIL issues. To leverage multiple concurrent processes, you can use Python’s `multiprocessing` module.

Here is an example of how to run concurrent transactions with the `multiprocessing` module:

Each process will execute a query that contains a chunk of nodes. You can control the number of concurrent transactions and processes by specifying the number of processes in the pool constructor. Each transaction will be a separate connection to the database and will be executed in parallel in a different process. This number should align with your system’s capabilities: if you have 32 cores, you can run 32 processes, but if you have only four cores, it is more efficient to run only four processes.

Running transactions in parallel can be very efficient, but it can lead to conflicting transactions. The typical scenario is when two transactions try to update the same node simultaneously, or add a relationship to the same node; this is a write-write conflict between transactions. In this case, one of the transactions will pass, the other will fail, and you will need to handle the error and retry the failed transaction.

If you are running transactions in parallel, you should avoid implicit transactions because you can’t control the execution process, and there are no retries.

You can use managed transactions or explicit transactions to handle conflicting transactions. The explicit API provides full control of the process, and it is recommended for production use and handling conflicts.

Here is an example of how to handle conflicting transactions in explicit API:

In the example above, we use the `begin_transaction()` method to start a transaction, and then we run the query inside the transaction. If the transaction fails with a `TransientError`, it is retried using the retry strategy. Otherwise, another error has occurred, and the transaction should be aborted.

The essential aspects of the retry strategy are the following arguments:

  • Maximum retries - the maximum number of retries before the transaction is aborted with a timeout error. Set it based on the expected number of conflicts and the transaction duration: if a commit takes a second and you allow 120 retries, the transaction can run for 2 minutes plus the waiting time.
  • Initial wait time - the time the transaction waits before the first retry. The first time a transaction fails with a conflict, it waits this long before retrying.
  • Backoff multiplier - the factor by which the retry delay is multiplied after each retry. It should not be too high, because it can lead to long-running transactions; if commits are small and fast, avoid aggressive exponential backoff.
  • Jitter factor - the factor by which the retry delay is randomized after each retry. With many transactions running in parallel, jitter is recommended to avoid the thundering herd problem.

If you use managed transactions, you can configure the retry behavior through the driver configuration. Here is an example:

In this case, the transaction function will be retried using the retry strategy configured on the driver object. You do not need to handle the retry logic; it is handled by the driver in the background.

The essential configuration arguments are the following:

  • `max_transaction_retry_time` - the maximum time the transaction will be retried; after that, it will be aborted with a timeout error.
  • `initial_retry_delay` - the time the transaction will wait before the first retry.
  • `retry_delay_multiplier` - the factor by which the retry delay is multiplied after each retry.
  • `retry_delay_jitter_factor` - the factor by which the retry delay is randomized after each retry.
