Data Format and Data Processing (VisActor/VMind tutorial documents)

Data Format and Data Processing

This tutorial describes the data formats that VMind supports and shows how to use VMind's data processing functions to produce data in these formats.

Dataset

In VMind, most functions require a dataset as input.

In VMind, a dataset is tabular data with the same structure as flattened data in VChart: an array of objects, where each object is one row and each key is a field name. Taking product sales as an example, a dataset looks like this:

// Product sales dataset
[
{
"Product name": "Coke",
"region": "south",
"Sales": 2350
},
{
"Product name": "Coke",
"region": "east",
"Sales": 1027
},
{
"Product name": "Coke",
"region": "west",
"Sales": 1027
},
{
"Product name": "Coke",
"region": "north",
"Sales": 1027
},
{
"Product name": "Sprite",
"region": "south",
"Sales": 215
},
{
"Product name": "Sprite",
"region": "east",
"Sales": 654
},
{
"Product name": "Sprite",
"region": "west",
"Sales": 159
},
{
"Product name": "Sprite",
"region": "north",
"Sales": 28
},
{
"Product name": "Fanta",
"region": "south",
"Sales": 345
},
{
"Product name": "Fanta",
"region": "east",
"Sales": 654
},
{
"Product name": "Fanta",
"region": "west",
"Sales": 2100
},
{
"Product name": "Fanta",
"region": "north",
"Sales": 1679
},
{
"Product name": "Mirinda",
"region": "south",
"Sales": 1476
},
{
"Product name": "Mirinda",
"region": "east",
"Sales": 830
},
{
"Product name": "Mirinda",
"region": "west",
"Sales": 532
},
{
"Product name": "Mirinda",
"region": "north",
"Sales": 498
}
]
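
As a rough illustration of what "flattened" tabular data means here, the sketch below checks that every row is a plain object with the same keys and only primitive values. The names `DataValue`, `DataRecord` and `isFlatDataset` are hypothetical and exist only for this example; they are not part of the VMind API.

```typescript
// Hypothetical types and helper for illustration; not part of the VMind API.
type DataValue = string | number | boolean | null;
type DataRecord = Record<string, DataValue>;

/** Returns true when every row is a flat object that shares the first row's keys. */
function isFlatDataset(data: unknown): data is DataRecord[] {
  if (!Array.isArray(data) || data.length === 0) return false;

  const isRow = (row: unknown): row is Record<string, unknown> =>
    typeof row === "object" && row !== null && !Array.isArray(row);
  const isPrimitive = (v: unknown): boolean =>
    v === null ||
    typeof v === "string" ||
    typeof v === "number" ||
    typeof v === "boolean";
  // Order-independent signature of a row's keys.
  const keySignature = (row: Record<string, unknown>): string =>
    Object.keys(row).sort().join("\u0000");

  const first: unknown = data[0];
  if (!isRow(first)) return false;
  const expected = keySignature(first);

  return data.every(
    (row: unknown) =>
      isRow(row) &&
      keySignature(row) === expected &&
      Object.values(row).every(isPrimitive)
  );
}
```

A check like this can be run before handing data to VMind to catch nested objects or rows with missing fields early.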

⚠️Note: For tasks such as chart generation and data aggregation to work well in VMind, we recommend giving each field a semantically meaningful name (such as Product name, region, Sales). We do not recommend field names that carry no meaning (column1, column2, random strings, etc.). The large language model relies on the semantic information in field names to select fields during chart generation and data aggregation.
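
If your source data uses placeholder column names, one option is to rename the fields before passing the dataset to VMind. The following is a minimal sketch; `renameFields` and `Row` are hypothetical names for this example and are not provided by VMind.

```typescript
// Hypothetical helper for illustration; not part of the VMind API.
type Row = Record<string, string | number>;

/** Returns a copy of the dataset with keys renamed according to `names`. */
function renameFields(dataset: Row[], names: Record<string, string>): Row[] {
  return dataset.map((row) => {
    const renamed: Row = {};
    for (const [key, value] of Object.entries(row)) {
      // Only use own entries of the mapping; keep the original key otherwise.
      const mapped = Object.prototype.hasOwnProperty.call(names, key)
        ? names[key]
        : undefined;
      renamed[mapped ?? key] = value;
    }
    return renamed;
  });
}

const raw: Row[] = [
  { column1: "Coke", column2: "south", column3: 2350 },
  { column1: "Sprite", column2: "east", column3: 654 },
];
const named = renameFields(raw, {
  column1: "Product name",
  column2: "region",
  column3: "Sales",
});
// named[0] is { "Product name": "Coke", region: "south", Sales: 2350 }
```

The helper returns new objects, so the original rows are left unchanged.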

Field Information (fieldInfo)

In VMind, you use fieldInfo to describe the fields in a dataset. It records the name, type, role, and optional description of each field. This information is passed to the large language model for tasks such as chart generation and data aggregation.

fieldInfo is an array of FieldInfo objects, one per field. The type definition of FieldInfo is as follows:

/** field information Of Data Table */
export interface FieldInfo {
  /** name of field */
  fieldName: string;
  /** field type, eg: time / category / numerical */
  type: DataType;
  /** field role */
  role: ROLE;
  /** alias of field */
  alias?: string;
  /** additional description of the field. This will help the model have a more comprehensive understanding of this field, improving the quality of chart generation. */
  description?: string;
}

For the dataset shown in the previous section, the corresponding fieldInfo is as follows:

[
{
"fieldName": "Product name",
"description": "Represents the name of the product, which is a string.",
"type": "string",
"role": "dimension"
},
{
"fieldName": "region",
"description": "Represents the region where the product is sold, which is a string.",
"type": "string",
"role": "dimension"
},
{
"fieldName": "Sales",
"description": "Represents the sales amount of the product, which is an integer.",
"type": "int",
"role": "measure"
}
]

⚠️Note: The large language model relies on the field names and descriptions in fieldInfo for chart generation and data aggregation. The description is optional. fieldInfo can be generated by calling getFieldInfo or parseCSVData.

Data Processing Functions

CSV is a common and relatively simple file format that stores tabular data as plain text. JSON is a lightweight data exchange format that most programming languages can parse and generate, and it is widely used in web applications. This section shows how to use VMind's built-in data processing functions to convert CSV data into a JSON dataset together with its fieldInfo, or to obtain fieldInfo directly from JSON data.
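
To make the CSV-to-dataset conversion concrete, here is a minimal sketch of turning a simple CSV string into an array of records. It is not VMind's implementation: `csvToRecords` and `CsvRecord` are hypothetical names, and the sketch assumes a header row, comma separators, and no quoted or escaped fields.

```typescript
// Illustrative sketch only; VMind's own CSV parsing may behave differently.
// Assumes a header row, comma separators, and no quoted or escaped fields.
type CsvRecord = Record<string, string | number>;

function csvToRecords(csv: string): CsvRecord[] {
  const lines = csv
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) return [];

  const headerLine = lines[0] ?? "";
  const header = headerLine.split(",").map((name) => name.trim());

  return lines.slice(1).map((line) => {
    const cells = line.split(",").map((cell) => cell.trim());
    const record: CsvRecord = {};
    header.forEach((name, i) => {
      const cell = cells[i] ?? "";
      // Cells that look like finite numbers become numbers; the rest stay strings.
      const asNumber = Number(cell);
      record[name] = cell !== "" && Number.isFinite(asNumber) ? asNumber : cell;
    });
    return record;
  });
}

const records = csvToRecords(`Product name,region,Sales
Coke,south,2350
Sprite,east,654`);
// records[0] is { "Product name": "Coke", region: "south", Sales: 2350 }
```

Real CSV files often contain quoted fields and embedded commas, which this sketch does not handle.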

parseCSVData

The parseCSVData function in VMind converts a CSV string into a dataset and generates fieldInfo by extracting field information with rule-based parsing. The function does not send any request to the large language model. Taking the product sales dataset as an example, parseCSVData is used as follows:

import VMind from '@visactor/vmind';

// CSV string; the first line is the header row
const csv = `Product name,region,Sales
Coke,south,2350
Coke,east,1027
Coke,west,1027
Coke,north,1027
Sprite,south,215
Sprite,east,654
Sprite,west,159
Sprite,north,28
Fanta,south,345
Fanta,east,654
Fanta,west,2100
Fanta,north,1679
Mirinda,south,1476
Mirinda,east,830
Mirinda,west,532
Mirinda,north,498`;

// `options` is the VMind configuration; see Creating VMind Instance
const vmind = new VMind(options);
// Rule-based parsing; the large language model is not called
const { fieldInfo, dataset } = vmind.parseCSVData(csv);

For how to create a VMind instance and the detailed configuration in options, refer to Creating VMind Instance.

In this example, the returned dataset is the same as the product sales dataset shown in the Dataset section, and the returned fieldInfo is as follows:

[
{
"fieldName": "Product name",
"type": "string",
"role": "dimension"
},
{
"fieldName": "region",
"type": "string",
"role": "dimension"
},
{
"fieldName": "Sales",
"type": "int",
"role": "measure"
}
]

The dataset and fieldInfo can be directly used for chart generation and data aggregation in VMind.

Because this function does not pass the data to the large language model, the returned fieldInfo contains no field descriptions. You can add descriptions yourself to get better chart generation results.
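
One way to supplement descriptions is to merge them into the returned fieldInfo by field name. The sketch below assumes the fieldInfo shape shown earlier; `FieldInfoLike` and `addDescriptions` are hypothetical names for this example, and `type` and `role` are simplified to plain strings instead of VMind's DataType and ROLE.

```typescript
// Local, simplified copy of the fields used here. `type` and `role` are plain
// strings in this sketch (the original definition uses DataType and ROLE).
interface FieldInfoLike {
  fieldName: string;
  type: string;
  role: string;
  description?: string;
}

/** Hypothetical helper: returns a copy of fieldInfo with descriptions filled in. */
function addDescriptions(
  fieldInfo: FieldInfoLike[],
  descriptions: Record<string, string>
): FieldInfoLike[] {
  return fieldInfo.map((field) => {
    // Prefer the supplied description; otherwise keep any existing one.
    const description = Object.prototype.hasOwnProperty.call(
      descriptions,
      field.fieldName
    )
      ? descriptions[field.fieldName]
      : field.description;
    return description === undefined ? { ...field } : { ...field, description };
  });
}

// fieldInfo as returned for the product sales CSV in the example above.
const parsed: FieldInfoLike[] = [
  { fieldName: "Product name", type: "string", role: "dimension" },
  { fieldName: "region", type: "string", role: "dimension" },
  { fieldName: "Sales", type: "int", role: "measure" },
];
const described = addDescriptions(parsed, {
  Sales: "Represents the sales amount of the product, which is an integer.",
});
```

Fields without an entry in the mapping are copied unchanged, so descriptions can be added gradually.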

For more information on parseCSVData, refer to parseCSVData API.

getFieldInfo

The getFieldInfo function in VMind parses JSON data to obtain its field information (fieldInfo). It derives fieldInfo with rule-based parsing and does not send any request to the large language model. Here is a usage example:

import VMind from '@visactor/vmind';

const dataset = [
{
"Product name": "Coke",
"region": "south",
"Sales": 2350
},
{
"Product name": "Coke",
"region": "east",
"Sales": 1027
},
{
"Product name": "Coke",
"region": "west",
"Sales": 1027
},
{
"Product name": "Coke",
"region": "north",
"Sales": 1027
},
// ... remaining rows omitted
]

// `options` is the VMind configuration; see Creating VMind Instance
const vmind = new VMind(options);
const fieldInfo = vmind.getFieldInfo(dataset);

For how to create a VMind instance and the detailed configuration in options, refer to Creating VMind Instance.

In this example, the returned fieldInfo is the same as the one returned by parseCSVData for the product sales dataset in the previous section.
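
To give a feel for what rule-based extraction of field information can look like, here is a minimal sketch that infers a type and role for each field from the values in a dataset. The rules are assumptions made for illustration, not VMind's actual implementation: all-integer fields become "int" measures, other all-numeric fields become "float" measures (the "float" label is itself an assumption), and everything else becomes a "string" dimension. `InferredField` and `inferFieldInfo` are hypothetical names.

```typescript
// Illustrative rules only; these are assumptions, not VMind's implementation.
interface InferredField {
  fieldName: string;
  type: "int" | "float" | "string";
  role: "dimension" | "measure";
}

function inferFieldInfo(
  dataset: Array<Record<string, unknown>>
): InferredField[] {
  // Collect field names in first-seen order across all rows.
  const names: string[] = [];
  for (const row of dataset) {
    for (const key of Object.keys(row)) {
      if (names.indexOf(key) === -1) names.push(key);
    }
  }

  return names.map((fieldName): InferredField => {
    // Ignore missing values when deciding the type.
    const values = dataset
      .map((row) => row[fieldName])
      .filter((v) => v !== null && v !== undefined);
    const numeric =
      values.length > 0 &&
      values.every((v) => typeof v === "number" && Number.isFinite(v));
    if (!numeric) return { fieldName, type: "string", role: "dimension" };
    const integral = values.every((v) => Number.isInteger(v));
    return { fieldName, type: integral ? "int" : "float", role: "measure" };
  });
}

const inferred = inferFieldInfo([
  { "Product name": "Coke", region: "south", Sales: 2350 },
  { "Product name": "Sprite", region: "east", Sales: 654 },
]);
// inferred[2] is { fieldName: "Sales", type: "int", role: "measure" }
```

Rules like these need only the data itself, which is why this kind of parsing can run without calling a language model.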

For more information on getFieldInfo, refer to getFieldInfo API.