To be able to work with data in a ML Project, they first need to be loaded. Data can be loaded from a CSV file, XLS file, database, data set or a temporary table. There two ways how to read data in BellaDati ML Studio - row by row or as an stream. 

    Reading Row by row

    When reading data row by row, the content of the cycle (closure) is executed for each row of of the file. For each row (cycle), these variables are set:

    • row
    • values[n] - returns value in n-th column for current row.
    • columns[n] - returns name (header) of n-th column.
    • index - returns number of the row.

    Reading from CSV File

    Function readCSVFile() is used for loading data from a CSV file. The function is defined like this:

    Parameters

    Parameters file and separator are mandatory, parameters escape, limit and closure are optional.

    • file - defines the name of file which should be read. This file needs to be uploaded to the project.
    • separator - defines the separator between values. Can be comma, semicolon etc.
    • escape - defines character which is used of escaping of text.
    • limit - defines the limit of rows which will be loaded.
    • closure - closure is a block of code that is executed for each line of the file. The closure has to be written either as a parameter or as a body of the function in curly braces. Therefore it is optional as parameter but mandatory in general. See Groovy closures for more information.

    Sample usage

    This code will print the row index and value of second column for first 10 rows of the file to the console. After finishing the loop it will display the total number of iterations, in this case 10.

    Reading from XLS File

    Function readXLSFile() is used for loading data from XLS file. The function is defined like this:

    Parameters

    Parameters file is mandatory, parameters limit and closure are optional.

    • file - defines the name of file which should be read. This file needs to be uploaded to the project.
    • limit - defines the limit of rows which will be loaded.
    • closure - closure is a block of code that is executed for each line of the file. The closure has to be written either as a parameter or as a body of the function in curly braces. Therefore it is optional as parameter but mandatory in general. See Groovy closures for more information.

    Sample usage

    This code will print the row index and value of second column for first 5 rows of the file to the console. After finishing the loop it will display the total number of iterations, in this case 5.

    Reading from SQL Database

    Function readSQL is used for loading data from an SQL database. This function uses SQL connections which were previously defined in BellaDati. See Data Sources for more information.

    The function is defined like this:

    Parameters

    Parameters id and sql are mandatory, parameters limit and closure are optional.

    • id - defines the id of the data source. It can be set by the Code builder.
    • sql - defines the sql query.
    • limit - defines the limit of rows which will be loaded.
    • closure - closure is a block of code that is executed for each line of the file. The closure has to be written either as a parameter or as a body of the function in curly braces. Therefore it is optional as parameter but mandatory in general. See Groovy closures for more information.

    Sample usage

    This code will use database connection with ID 1 and it will load all columns for 10 rows from table customers.

    Reading from Data Set

    Function readDataset() is used for loading data from a data set. The function is defined like this:

    Parameters

    Parameter id is mandatory, parameters limit and closure are optional.

    • id - defines the id of the data set. It can be set by the Code builder or it can be found in the URL of the data set.
    • limit - defines the limit of rows which will be loaded.
    • closure - closure is a block of code that is executed for each line of the file. The closure has to be written either as a parameter or as a body of the function in curly braces. Therefore it is optional as parameter but mandatory in general. See Groovy closures for more information.

    Sample usage

    This code will use data set with ID 10 and it will load name and value of first column for 5 rows.

    Reading from Table

    Function table() can be used for loading data from a temporary table which was previously stored in the project. The table is available for current session only. The function is defined like this:

    Parameters

    Parameters id is mandatory, parameter closure is optional.

    • id - defines the id (name) of the table. It is set when creating the table.
    • closure - closure is a block of code that is executed for each line of the file. The closure has to be written either as a parameter or as a body of the function in curly braces. Therefore it is optional as parameter but mandatory in general. See Groovy closures for more information.

    Sample usage

    This code will print value of first column for each row of the table to the console.

    Reading as a Stream

    When reading data as a stream, the rows are not iterated, but instead sent all at once as na input stream. In most cases, it is better to use reading row by row. We suggest using streams only when necessary, for example with some Python scripts.

    Streaming from CSV file

    Function streamCSVFile() is used for streaming data from a CSV file. The function is defined like this:

    Parameters

    Parameters file and separator are mandatory, parameters escape and limit are optional.

    • file - defines the name of file which should be read. This file needs to be uploaded to the project.
    • separator - defines the separator between values. Can be comma, semicolon etc.
    • escape - defines character which is used of escaping of text.
    • limit - defines the limit of rows which will be loaded.

    Sample usage

    This code will print the all values from the file, separated by a comma.

    Streaming from XLS file

    Function streamXLSFile() is used for streaming data from a XLS file. The function is defined like this:

    Parameters

    Parameters file is mandatory, parameter limit is optional.

    • file - defines the name of file which should be read. This file needs to be uploaded to the project.
    • limit - defines the limit of rows which will be loaded.

    Sample usage

    This code will print first five rows from the file, with values separated by a comma.

    Streaming from SQL database

    Function streamXLSFile() is used for streaming data from a XLS file. The function is defined like this:

    Parameters

    Parameters id id mandatory.

    • id - defines the id of the data set. It can be set by the Code builder or it can be found in the URL of the data set.

    Sample usage

    This code will print first five rows from the table, with values separated by a comma.

    Streaming from Data set

    Function streamDataset() is used for streaming data from a data set. The function is defined like this:

    Parameters

    Parameters id is mandatory.

    • id - defines the id of the data source. It can be set by the Code builder.

    Sample usage

    This code will print all rows from the data set.

    Streaming from Table

    Function streamTable() can be used for streaming data from a temporary table which was previously stored in the project. The table is available for current session only. The function is defined like this:

    Parameters

    Parameters id is mandatory.

    • id - defines the id (name) of the table. It is set when creating the table.

    Sample usage

    This code will print all rows from the table.

    • No labels