Document Oriented Query Partitioning Technique Implementation by pfu3tz · Pull Request #319 · sqlancer/sqlancer

Added 23 commits (March 22, 2021 23:03)
Adding support for MongoDB database version 4.4. With this commit, random collections are created with optional validation options that can enforce the schema on inserts. Additionally, the execution is logged as mongo shell code.
Random document insertion is now supported, with the option to either follow the schema validation or set random types. An option has also been added to randomly set an inserted value to null. Validation has to be turned off for these options, though.
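As a rough illustration, the kind of validator a generated collection might carry can be sketched like this (the collection and field names `t0`, `c0`, `c1` are hypothetical, not taken from the PR; MongoDB 4.4 accepts a `$jsonSchema` document as the `validator` option of `create_collection`):

```python
def make_validator(fields):
    """Build a $jsonSchema validator enforcing a BSON type per field."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": list(fields),
            "properties": {name: {"bsonType": t} for name, t in fields.items()},
        }
    }

validator = make_validator({"c0": "int", "c1": "string"})
# With pymongo one would pass it as:
#   db.create_collection("t0", validator=validator)
# An insert that sets c0 to null would then be rejected, which is why
# validation must be turned off for the null/random-type options above.
```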

Additionally, random indexes are created over random columns, either ascending or descending; when multiple indexes are added for a single collection, a composite index is built from them.
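A minimal sketch of such an index specification, assuming hypothetical field names (in pymongo, a list of several `(field, direction)` pairs passed to `create_index` forms a compound index):

```python
import random

def make_index_spec(fields, rng=None):
    """Give each field a random direction: 1 ascending, -1 descending.
    Multiple pairs in one spec form a compound (composite) index."""
    rng = rng or random.Random(0)
    return [(f, rng.choice((1, -1))) for f in fields]

spec = make_index_spec(["c0", "c1"])
# pymongo usage would be: collection.create_index(spec)
```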
The project stage is similar to SELECT in SQL, in that we can specify which columns are returned. The lookup stage is similar to SQL's LEFT OUTER JOIN; for this, we specifically create new random columns that specify the join column.
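The two stages can be sketched as aggregation-pipeline documents; the collection name `t1` and the join field `jc0` are made-up examples:

```python
project_stage = {"$project": {"c0": 1, "c1": 1, "_id": 0}}  # like SELECT c0, c1
lookup_stage = {
    "$lookup": {                 # like LEFT OUTER JOIN
        "from": "t1",            # joined collection
        "localField": "jc0",     # join column in the outer collection
        "foreignField": "jc0",   # join column in t1
        "as": "joined",          # array field holding the matches
    }
}
pipeline = [lookup_stage, project_stage]
# pymongo usage would be: collection.aggregate(pipeline)
```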
The query AST has multiple key differences from the SQL version. At its core is the MongoDBSelect class, which holds the projection and lookup lists. The filter is then a tree similar to the SQL version.

To execute and log queries, which differ because one side issues API calls while the other produces MongoDB shell commands, there are two visitors: the ToQueryVisitor and the ToLogVisitor.
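The dual-rendering idea can be sketched as follows (the visitor names come from the PR, but the node shape and the `t0` collection name are simplified assumptions): one visitor turns a filter node into a document ready for an API call, the other into mongo shell text for the log.

```python
import json

class Comparison:
    """Hypothetical leaf node of the filter tree, e.g. c0 > 3."""
    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value

class ToQueryVisitor:
    def visit(self, node):
        # Produces a dict usable as a collection.find(...) / $match argument.
        return {node.field: {node.op: node.value}}

class ToLogVisitor:
    def visit(self, node):
        # Produces mongo shell code for the log file.
        return "db.t0.find(%s)" % json.dumps({node.field: {node.op: node.value}})
```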
Computed functions allow performing arithmetic operations in queries. They are part of the projection pipeline stage, where a new field is added that holds the result. This update adds support for random computed fields with functions such as add, multiply, pow, sqrt, and more.
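A computed field of this kind might look like the following projection stage (field names `computed0`, `c0`, `c1` are illustrative; `$add`, `$multiply`, and `$sqrt` are standard MongoDB aggregation operators):

```python
# Adds a new field holding the result of c0 * 2 + sqrt(c1).
computed_stage = {
    "$project": {
        "computed0": {"$add": [{"$multiply": ["$c0", 2]}, {"$sqrt": "$c1"}]},
    }
}
```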
…on in MongoDB

Similar to binary comparison, a regular expression is a new leaf node that can be added to the tree. To generate the pattern, the existing random string generator is used.
… regular expression

Changes to the MongoDBComparatorHelper allow expected errors to be ignored when they are thrown during query execution. This feature is mainly necessary for the ill-formed regular expressions that the randomized strings produce.
Because the randomly generated computed-field tree has random types as leaf nodes, there are exceptions that we ignore, such as a wrong type, a non-positive value passed to sqrt, and similar issues.
The options are flags that can be set in a string such as "im". The expression generator now generates random valid options for the regular expression operator in MongoDB.
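A sketch of such an operator generator, assuming the option letters `i`, `m`, `s`, `x` that MongoDB's `$regex` accepts in its `$options` string (the function name is hypothetical):

```python
import random

_VALID_OPTIONS = "imsx"  # case-insensitive, multiline, dot-all, extended

def random_regex_operator(pattern, rng=None):
    """Attach a random valid subset of option flags to a $regex operator."""
    rng = rng or random.Random(0)
    opts = "".join(o for o in _VALID_OPTIONS if rng.random() < 0.5)
    return {"$regex": pattern, "$options": opts}

op = random_regex_operator("a+b")
```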
… documents in MongoDB

This introduces a variant where we execute the pipeline query once with a count as the last pipeline stage and compare its output to the size of the result set produced without the count stage.
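The check can be sketched as follows; `run_pipeline` stands in for an `aggregate()` call, and the tiny fake below only models the `$count` behavior (including the edge case that `$count` emits no document at all for an empty input):

```python
def count_oracle_check(run_pipeline, pipeline):
    """Run the pipeline as-is and with $count appended, then compare."""
    docs = run_pipeline(pipeline)
    counted = run_pipeline(pipeline + [{"$count": "n"}])
    n = counted[0]["n"] if counted else 0  # $count emits nothing for 0 docs
    return len(docs) == n

# Stand-in for aggregate() over a fixed document set (ignores other stages):
def fake_aggregate(pipeline, _docs=({"c0": 1}, {"c0": 2})):
    docs = list(_docs)
    if pipeline and "$count" in pipeline[-1]:
        return [{pipeline[-1]["$count"]: len(docs)}] if docs else []
    return docs
```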
Until now, not has been simulated as nor(id exists, bool_expression), which led us to believe that the underlying structure works fine. After the rework, the not is actually evaluated, and every logical operation is inverted by the new NegateVisitor. At the lowest level, type problems remain.
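The inversion the NegateVisitor performs can be sketched with De Morgan's laws on a tuple-based tree (the node shapes here are hypothetical): AND/OR are swapped and the negation pushed down, and comparisons are flipped at the leaves. Note that flipping `$gt` to `$lte` is only sound when the field holds a comparable value, which is exactly the kind of type problem that remains at the lowest level.

```python
_FLIP = {"$gt": "$lte", "$lte": "$gt", "$gte": "$lt", "$lt": "$gte",
         "$eq": "$ne", "$ne": "$eq"}

def negate(node):
    """Recursively negate a filter tree via De Morgan's laws."""
    op, args = node
    if op == "$and":
        return ("$or", [negate(child) for child in args])
    if op == "$or":
        return ("$and", [negate(child) for child in args])
    field, cmp_op, value = args  # leaf comparison
    return (op, (field, _FLIP[cmp_op], value))

tree = ("$and", [("cmp", ("c0", "$gt", 3)), ("cmp", ("c1", "$eq", 1))])
negated = negate(tree)
```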
This change reveals some issues with how we form queries for MongoDB. The core of it seems to be that neither greater-equal nor less-equal includes null values, and it is rather hard to define a query that finds the documents where the value is null. A query that only projects, however, includes all documents.
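The null gap can be illustrated in pure Python (an approximation of MongoDB's comparison semantics, not its exact type-bracketing rules): a document whose field is null matches neither side of a range comparison, yet a bare projection still returns it.

```python
def matches_gte(doc, field, operand):
    v = doc.get(field)
    return v is not None and v >= operand

def matches_lte(doc, field, operand):
    v = doc.get(field)
    return v is not None and v <= operand

doc = {"c0": None}
# Neither predicate matches doc, but a projection-only query includes it,
# so comparing the two result sets flags exactly these documents.
```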
…serting data

For the four ArangoDB data types (integer, double, string, and boolean), this commit supports creating collections, keeping track of the schema, and inserting randomized data.
ArangoDB now randomly generates queries that support binary comparison, binary logic operations such as or/and, and the unary prefix not. Everything is logged, and the results are checked with the new ComparatorHelper.
A new generator and query has been added to support the new functionality.
Similar to MongoDB, in ArangoDB variables for computed values are created and calculated with the special LET keyword.
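A sketch of how such an AQL query could be assembled (the collection name `t0`, field names, and the helper function are made up for illustration; `FOR`, `LET`, and `RETURN` are standard AQL keywords):

```python
def aql_with_let(collection, field, var_name, expr):
    """Build an AQL query that binds a computed value with LET."""
    return (
        f"FOR d IN {collection} "
        f"LET {var_name} = {expr} "
        f"RETURN {{ {field}: d.{field}, {var_name}: {var_name} }}"
    )

q = aql_with_let("t0", "c0", "computed0", "d.c0 * 2 + 1")
```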
This oracle generates a random query, executes it, and, if the result set is not empty, chooses a document at random and removes it from the collection. The query is then executed again to check that the document has really been removed; at the end, a new document is generated and inserted to make sure that the dataset does not shrink.
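The oracle's steps can be sketched over an in-memory list standing in for a collection (function names are hypothetical; the real oracle issues the same sequence of operations against the database):

```python
import random

def delete_oracle(collection, query, make_document, rng=None):
    """Pick a matching document, delete it, re-check, then re-insert one."""
    rng = rng or random.Random(0)
    result = [d for d in collection if query(d)]
    if not result:
        return True                       # nothing to check this round
    victim = rng.choice(result)
    collection[:] = [d for d in collection if d is not victim]
    # Re-run the query: the chosen document must no longer appear.
    still_there = any(d is victim for d in collection if query(d))
    collection.append(make_document())    # keep the dataset from shrinking
    return not still_there
```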
When using Cosmos, make sure to set the configuration string in CosmosProvider.


See Pull request sqlancer#319 on github.com/sqlancer for more information.