2.1. Data sources

2.1. Data sources
Prev	Up	Chapter 2. Data sources and queries	Home	Next

2.1. Data sources #

2.1.1. SQL based data sources
2.1.2. File based data sources
2.1.3. Application data based datasource
2.1.4. Application defined data sources

OpenCReport separates data access into two entities: a data source driver and a query.

OpenCReports supports diverse data sources:

2.1.1. SQL based data sources #

SQL is the acronym for Standard Query Language. Many database software comply with the standard to a certain extent. The standard is occasionally revised, and a certain database software version complies to a specific version of the standard to a certain extent.

In general, database software are designed to store massive amounts of data and retrieve it as fast as possible. Database software and its data can be accessed through a network connection (even if it's installed in the same machine) or a faster local connection if both the database server and client are installed on the same computer.

The SQL based data sources OpenCReports natively supports are:

MariaDB and MySQL
PostgreSQL
Any SQL database server with a compliant ODBC driver

2.1.1.1. MariaDB/MySQL data source #

MariaDB is a fork of MySQL developing in a different direction but still maintaining strong compatibility with each other. The database client library is compatible with both, therefore OpenCReports supports both with the same driver.

2.1.1.2. PostgreSQL data source #

PostgreSQL (in their own words) is The World's Most Advanced Open Source Relational Database. OpenCReports supports using PostgreSQL.

The PostgreSQL datasource driver in OpenCReports is especially economic with memory by using a WITH HOLD cursor and the PostgreSQL specific FETCH count SQL statement to retrieve a specified number of rows in one round, as opposed to retrieving every row in one round that most database software supports.

This makes the report run slower for longer query results compared to other databases, but allows generating the report from extremely large datasets when other databases may fail due to memory exhaustion.

2.1.1.3. ODBC data source #

OpenCReports supports using a generic ODBC connection to any database servers using a compliant client driver. OpenCReports uses a standard ODBC manager library, so it is able to use any ODBC DSN (Data Source Name) configured for the system or the user.

2.1.1.4. Special note for SQL datasources #

The database client libraries for MariaDB, PostgreSQL and ODBC return all the query rows from the database server at once by default. As such, it is possible that a long query result doesn't fit into the computer memory.

The report needs to traverse the query result twice to pre-compute "delayed" values (see precalculated and Precalculated variables), so it needs to be able to rewind the data set once it was read to the end.

The alternative API in MariaDB to load the rows one by one doesn't allow rewinding, so it's not usable for the report's purposes.

It is only PostgreSQL that allows using an SQL cursor as a standalone entity, i.e. outside SQL procedures as defined by the SQL standard. This PostgreSQL extension to the standard allows saving memory in such a way that it allows processing very long query results. Behind the scenes, a WITH HOLD cursor is used and 1024 rows are loaded in one go from the server.

2.1.2. File based data sources #

The file based data sources OpenCReports supports are:

Comma-separated values a.k.a. CSV
eXtensible Markup Language a.k.a. XML
JavaScript Object Notation a.k.a. JSON
spreadsheet formats, like XLS, XLSX and ODS

The XML and JSON file types expect the data presented in a certain structure syntax. The semantics is application defined. The expected format for these file types are described below.

2.1.2.1. CSV file type #

CSV ("Comma Separated Values") is a simple tabulated file format. Every line must have the same number of columns, the values are separated by commas. The first line in the file contains the column names.

Using only the CSV file, the data type cannot be determined. Because of this, every column is assumed to be a string, regardless if the values themselves are quoted or not in the file. Data conversion functions must be used, see for example Section 4.12.4, Section 4.10.16 and Section 4.10.10.

Using either report XML description or programming code, an optional set of type indicators may be added along with the CSV input file, so the explicit conversion functions may be omitted from expressions using the data.

2.1.2.2. JSON file type #

A JSON file is expected in this format:

{
	"columns": ["colname1", ... ],
	"coltypes": ["type", ... ],
	"rows": [
		{ "colname1": value1, ...  },
		...
	]
}

The JSON file is expected to list the column names in a string array called columns.

The column types are optionally listed in the string array called coltypes. If they are listed, the coltypes array must have the same number of strings as the columns array. The type names are string, number or datetime. If the column type array is missing, then all data values are assumed to be strings and data conversion functions must be used, see for example Section 4.12.4, Section 4.10.16 and Section 4.10.10.

The data rows are listed in a JSON array called rows and column data values for each row are in a JSON collection with data names from the columns and data types from the coltypes arrays.

The sections columns, coltypes and rows may appear in any order.

When the coltypes part is missing from the JSON input file, then using either report XML description or programming code, an optional set of type indicators may be added along with the JSON input file, so the explicit conversion functions may be omitted from expressions using the data.

2.1.2.3. XML file type #

An XML file datasource is expected in this format:

<?xml version="1.0"?>
<data>
	<rows>
		<row>
			<col>value</col>
			...
		</row>
		...
	</rows>
	<fields>
		<field>column1</field>
		...
	</fields>
	<coltypes>
		<col>type1</col>
		...
	</coltypes>
</data>

The XML section names <data>, <rows> and <fields> are the same as they were in RLIB for its XML data source. The order of <rows> and <fields> is not important. But the order of field names in <fields> must match the column value order in each <row>.

The optional section <coltypes> is new in OpenCReports. If it's present, then it must list the data types in the same order as section <fields>. The types may be string, number or datetime. If this section is not present, all values are assumed to be strings and data conversion functions must be used, see for example Section 4.12.4, Section 4.10.16 and Section 4.10.10.

When the coltypes part is missing from the XML input file, then using either report XML description or programming code, an optional set of type indicators may be added along with the XML input file, so the explicit conversion functions may be omitted from expressions using the data.

2.1.2.4. Spreadsheet file types #

OpenCReports also supports various spreadsheet formats as datasources. For that, it relies on the Python pandas module and the supporting modules for the actual spreadsheet file format. Such Python modules are xlrd for the older Microsoft XLS format, pyopenxl for the newer Microsoft XLSX format, and odfpy for LibreOffice ODS format. Other modules may also be used to support other spreadsheet file formats.

2.1.3. Application data based datasource #

Applications may also have internal data that can be used as input for OpenCReports.

OpenCReports supports using two-dimensional C arrays as directly accessible application data. Such arrays must be declared as

char *array[ROWS][COLUMNS]

or converted to it if using OpenCReports from a different language. Each element is a pointer to a zero-terminated C string. The first row contains the names of columns.

Optionally, a set of type indicators may be supplied, similarly to the File based data sources.

2.1.4. Application defined data sources #

OpenCReports allows application defined datasource drivers that may even override built-in datasource drivers.

An application defined data source may be any of the previously listed types: SQL, file or data based.