If you have data in CSV format, there are two ways to import it into a running Memgraph instance: using the LOAD CSV Cypher clause, or directly through Memgraph Lab using the CSV file import method.
In this section of the documentation, we’ll dive deeper into the recommended method for CSV import: the LOAD CSV Cypher clause. If you’re interested in exploring the CSV file import method, visit the import example under the CSV file import documentation.
LOAD CSV Cypher clause
The LOAD CSV clause reads a CSV file row by row, binds the contents of each parsed row to the specified variable, and populates the database if it is empty, or appends new data to an existing dataset. Memgraph supports the Excel CSV dialect, as it’s the most common one.
LOAD CSV clause syntax
The syntax of the LOAD CSV clause is:
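A sketch of the clause’s general shape (consult the current Memgraph reference for the authoritative grammar):

```cypher
LOAD CSV FROM <csv-location> ( WITH | NO ) HEADER
[IGNORE BAD]
[DELIMITER <delimiter-string>]
[QUOTE <quote-string>]
[NULLIF <nullif-string>]
AS <variable-name>
```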
- `<csv-location>` is a string specifying the location of the CSV file.
Without a URL protocol, it refers to a file path. There are no restrictions on where in your file system the file can be located, as long as the path is valid (i.e., the file exists). CSV files can also be imported from S3; in that case, you need to set the AWS authentication config options. If you are using Docker to run Memgraph, you will need to copy the files from your local directory into the Docker container where Memgraph can access them.
If the location uses the `http://`, `https://`, or `ftp://` protocol, the CSV file will be fetched over the network.
- `WITH CONFIG` represents an optional configuration map through which you can specify the following S3 configuration options:
- Region: the region in which your S3 service is located.
- Access key: the access key used to connect to the S3 service.
- Secret key: the secret key used to connect to the S3 service.
- Endpoint URL: an optional parameter that can be used to set the URL of an S3-compatible storage service.
- The `WITH HEADER` / `NO HEADER` flag specifies whether the CSV file has a header, in which case each row will be parsed as a map, or doesn’t have a header, in which case each row will be parsed as a list.
If the `WITH HEADER` option is set, the very first line in the file will be parsed as the header, and any remaining rows will be parsed as regular rows. The value bound to the row variable will be a map whose keys are the header field names and whose values are the corresponding row values.
If the `NO HEADER` option is set, each row is parsed as a list of values, and the contents of the row can be accessed using list index syntax. Note that in this mode there are no restrictions on the number of values a row contains. This isn’t recommended, as the user must manually handle the varying number of values in a row.
- The `IGNORE BAD` flag specifies whether rows containing errors should be ignored or not. If it’s set, the parser skips invalid rows and attempts to return the next valid row from the CSV file. If it isn’t set, an exception will be thrown on the first invalid row encountered.
- The `DELIMITER` option enables the user to specify the CSV delimiter character. If it isn’t set, the default delimiter `,` (comma) is assumed.
- The `QUOTE` option enables the user to specify the CSV quote character. If it isn’t set, the default quote character `"` (double quote) is assumed.
- The `NULLIF` option enables you to specify a sequence of characters that will be parsed as null. By default, all empty columns in Memgraph are treated as empty strings, so if this option is not used, no values will be treated as null.
- `<variable-name>` is a symbolic name for the variable to which the contents of each parsed row will be bound, enabling access to the row contents later in the query. The variable doesn’t have to be used in any subsequent clause.
LOAD CSV clause specificities
When using the LOAD CSV clause, please keep in mind:
- The parser parses all values as strings, so it’s up to the user to convert the parsed row values to the appropriate type. This can be done using built-in conversion functions such as `toInteger`, `toFloat`, etc. Consult the documentation on the available conversion functions.
If all values are indeed strings and the file has a header, you can import data using the following query:
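For instance, assuming a hypothetical `people.csv` file with `id` and `name` header columns, the import could look like this:

```cypher
LOAD CSV FROM "/path/to/people.csv" WITH HEADER AS row
CREATE (:Person {id: row.id, name: row.name});
```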
- The LOAD CSV clause is not a standalone clause, meaning a valid query must contain at least one more clause, for example:
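A minimal valid query, assuming a hypothetical `people.csv` file with a `name` header column, might look like this:

```cypher
LOAD CSV FROM "/path/to/people.csv" WITH HEADER AS row
CREATE (:Person {name: row.name});
```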
In this regard, the following query will throw an exception:
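For example, a query consisting of the clause alone (again with a hypothetical file path) is invalid:

```cypher
LOAD CSV FROM "/path/to/people.csv" WITH HEADER AS row;
```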
Adding a `MATCH` or `MERGE` clause before LOAD CSV allows you to match certain entities in the graph before running LOAD CSV, optimizing the process, as matched entities do not need to be searched for every row in the CSV file.
However, the `MATCH` or `MERGE` clause can be used prior to the LOAD CSV clause only if it returns a single row. Returning multiple rows before calling the LOAD CSV clause will cause a Memgraph runtime error.
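As a sketch, with a hypothetical `cities.csv` file and a pre-existing `Country` node (all names here are illustrative):

```cypher
MATCH (c:Country {name: "Croatia"})
LOAD CSV FROM "/path/to/cities.csv" WITH HEADER AS row
CREATE (:City {name: row.name})-[:IN]->(c);
```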
- The LOAD CSV clause can be used at most once per query, so queries like the one below will throw an exception:
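For example, a query that chains two LOAD CSV clauses (file names are hypothetical) is rejected:

```cypher
LOAD CSV FROM "/path/to/people.csv" WITH HEADER AS a
LOAD CSV FROM "/path/to/friendships.csv" WITH HEADER AS b
CREATE (:Person {name: a.name});
```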
Load from remote sources
The LOAD CSV clause supports loading files from HTTP/HTTPS/FTP URLs and S3 buckets.
When loading from HTTP, HTTPS, or FTP URLs, the file will be downloaded to a local directory before being imported:
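For example, with a hypothetical HTTPS URL:

```cypher
LOAD CSV FROM "https://example.com/people.csv" WITH HEADER AS row
CREATE (:Person {name: row.name});
```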
You can also use FTP URLs:
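Similarly, with a hypothetical FTP URL:

```cypher
LOAD CSV FROM "ftp://user:password@example.com/people.csv" WITH HEADER AS row
CREATE (:Person {name: row.name});
```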
To load files from S3, you can provide AWS credentials in three ways:
1. Using the WITH CONFIG clause (recommended for query-specific credentials)
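A sketch of this approach. The configuration key names shown here (`aws_region`, `aws_access_key`, `aws_secret_key`) and the clause ordering are assumptions; check the LOAD CSV reference for the exact names:

```cypher
LOAD CSV FROM "s3://my-bucket/people.csv"
WITH CONFIG {aws_region: "eu-west-1", aws_access_key: "your-access-key", aws_secret_key: "your-secret-key"}
WITH HEADER AS row
CREATE (:Person {name: row.name});
```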
For S3-compatible services (like MinIO), you can also specify the endpoint URL:
2. Using environment variables
Set environment variables before starting Memgraph:
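For example, assuming Memgraph picks up the standard AWS environment variables (the values here are placeholders):

```shell
# Standard AWS credential environment variables; set these in the
# environment Memgraph is started from (e.g., before `docker run`).
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="eu-west-1"
```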
Then you can load files without specifying credentials in the query:
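The query itself then needs only the S3 URL (bucket and file names are illustrative):

```cypher
LOAD CSV FROM "s3://my-bucket/people.csv" WITH HEADER AS row
CREATE (:Person {name: row.name});
```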
3. Using database settings
Set database-level AWS credentials:
Then load files without credentials in the query:
Credential precedence: If credentials are provided in multiple ways, the order of precedence is:
- WITH CONFIG clause in the query (highest priority)
- Environment variables
- Database settings (lowest priority)
When loading files from remote locations (HTTP, FTP, or S3), the file is first downloaded to a local directory before being loaded into memory. Ensure you have sufficient disk space for large files. The download can be interrupted using `TERMINATE TRANSACTIONS` without waiting for the full download to complete. To find the transaction ID, use `SHOW TRANSACTIONS`.
Use run-time configuration to specify the connection timeout when establishing a connection to the remote server.
Increase import speed
The LOAD CSV clause will create relationships much faster, and consequently speed up data import, if you create indexes on nodes or node properties once you import them:
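For example, with a hypothetical `Person` label and `id` property:

```cypher
CREATE INDEX ON :Person(id);
```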
If the LOAD CSV clause is merging data instead of creating it, create indexes before running the LOAD CSV clause.
You can also speed up import by switching Memgraph to the analytical storage mode. In the analytical storage mode there are no ACID guarantees besides manually created snapshots, but it increases the import speed up to 6 times with 6 times less memory consumption. After import, you can switch the storage mode back to transactional and re-enable ACID guarantees.
You can switch between modes within the session using the following query:
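The switch is a single query in each direction:

```cypher
STORAGE MODE IN_MEMORY_ANALYTICAL;
```

and, once the import is done:

```cypher
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```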
If you use the analytical storage mode and have nodes and relationships stored in separate CSV files, you can run multiple concurrent queries to import data even faster. To achieve the best import performance, split your node and relationship files into smaller files and run multiple queries in parallel. The key is to first run all queries that create nodes; after that, run all queries that create relationships.
The LOAD CSV clause can handle CSVs that are compressed with gzip or bzip2. This can speed up the time it takes to fetch and/or load the file.
If you are using on-disk storage mode, consider using Edge import mode to get the best import performance.
Import files WITH and NO HEADER
The goal of this example is to import two CSV files.
One file contains data we will use to create nodes, and the other file will be used to connect those nodes with relationships.
There are also two variations of the files: files with a header and files without a header.
Import data from a single CSV file containing both nodes and relationships
This example shows how to use a set of Cypher queries to import data into Memgraph from a single CSV file containing both nodes and relationships. The file has a header row.
The file is structured to contain both people and their relationships:
- A record type column indicates the type of record. It has two possible values: one marking individual person records and one marking friendship (relationship) records.
- A person ID column holds the unique identifier for a person. It is filled only for person records.
- A name column holds the name of the person. It is also filled only for person records.
- Two columns denote relationships: for friendship records, one holds the ID of the person from whom the friendship originates, and the other holds the ID of the friend.
This structure allows representation of both nodes (persons) and their relationships (friendships) within a single CSV file.
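A sketch of such an import in two passes, assuming hypothetical column names `type`, `id`, `name`, `from_id`, and `to_id`:

```cypher
// First pass: create a node for every person record.
LOAD CSV FROM "/path/to/data.csv" WITH HEADER AS row
WITH row WHERE row.type = "person"
CREATE (:Person {id: row.id, name: row.name});

// Second pass: connect existing nodes for every friendship record.
LOAD CSV FROM "/path/to/data.csv" WITH HEADER AS row
WITH row WHERE row.type = "friendship"
MATCH (a:Person {id: row.from_id}), (b:Person {id: row.to_id})
CREATE (a)-[:IS_FRIENDS_WITH]->(b);
```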
Import multiple CSV files with distinct graph objects
In this example, the data is split across four files; each file contains nodes of a single label or relationships of a single type. All files have a header.
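For two of the four files, the pattern could be sketched as follows (file names, labels, and columns are illustrative):

```cypher
// Create nodes from a nodes-only file.
LOAD CSV FROM "/path/to/people_nodes.csv" WITH HEADER AS row
CREATE (:Person {id: row.id, name: row.name});

// Create relationships from a relationships-only file,
// matching the previously created nodes by ID.
LOAD CSV FROM "/path/to/people_relationships.csv" WITH HEADER AS row
MATCH (a:Person {id: row.from_id}), (b:Person {id: row.to_id})
CREATE (a)-[:IS_FRIENDS_WITH]->(b);
```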