In my post Cooking Up a Better BI, I gave an overview of how Indicee addresses the problem of describing BI processes to allow maximum flexibility in the creation and sharing of various BI components.
Over the next few posts, I’m going to talk about some of the fixed points in the BI model – the common concepts and objects that are described and manipulated by Indicee’s modeling tools, to provide the refined materials that are used to make reports.
Every BI platform has to provide objects that describe source data, calculations, aggregations, etc. in order to produce data that gets rendered in reports. Less sophisticated BI tools let you query data directly in the reporting tool, but Indicee allows the transformation of source data into ‘reportable’ information, so report writers can tap into more refined (sometimes pre-calculated) data, providing better encapsulation and performance.
Indicee’s data processing stack is driven by a model that is as flexible and holistic as possible, while keeping the number of model concepts and objects to a minimum. The key concepts and objects in the Indicee model are:
- The source Column
- The data dictionary Field
- The data library Question Term
Today, we’ll look at the source data level and the “Column” concept of the model: how Indicee lets you add and describe data so that others can easily consume it.
Creating a data signature (Data Sets)
Indicee models all source data as tabular: a sequence of ‘rows’, each with the same data fields, where the data types are set on the Columns of the table. A particular instance of tabular data, with known structure, is called a Data Set. In fact, a Data Set may contain one or more of these tables, which allows us to model things like sheets in a spreadsheet. Basically, think of a Data Set as a specification of source data in Indicee that tells us what kind of data to expect. Any data sent to Indicee for that Data Set in the future must exactly match its table schema. So, you could say that the Data Set defines a ‘signature’ of the data.
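To make the ‘signature’ idea concrete, here is a minimal sketch in Python. It is not Indicee code: the `Column`, `Table` and `DataSet` classes, the `accepts` method and the sample names are all hypothetical, and Python types merely stand in for Column types. It only illustrates the rule that incoming data has to match the table schema exactly.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    kind: type  # hypothetical: a Python type standing in for a Column type


@dataclass(frozen=True)
class Table:
    name: str
    columns: Tuple[Column, ...]

    def accepts(self, header: Sequence[str], rows: Sequence[Sequence[Any]]) -> bool:
        """Return True only if the data has exactly this table's columns and types."""
        if list(header) != [c.name for c in self.columns]:
            return False
        return all(
            len(row) == len(self.columns)
            and all(isinstance(value, c.kind) for value, c in zip(row, self.columns))
            for row in rows
        )


@dataclass(frozen=True)
class DataSet:
    name: str
    tables: Tuple[Table, ...]  # one or more tables, like sheets in a spreadsheet


# Illustrative data only
orders = Table(
    "orders",
    (Column("order_id", int), Column("shipped_on", date), Column("store", str)),
)
fulfillment = DataSet("fulfillment", (orders,))

matching = orders.accepts(
    ["order_id", "shipped_on", "store"], [[1, date(2012, 3, 1), "Leeds"]]
)
missing_column = orders.accepts(["order_id", "store"], [[1, "Leeds"]])
wrong_type = orders.accepts(
    ["order_id", "shipped_on", "store"], [[1, "2012-03-01", "Leeds"]]
)
```

In this sketch only the first delivery matches the signature; the other two are rejected because a column is missing or a value has the wrong type.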
Adding and defining data
A Data Set is defined and updated by data sent to Indicee. Each such delivery is called a Data Contribution. Indicee remembers which Contributions it has received and allows data management at the granularity of a Contribution. In addition to the table schema representing the Data Contributions, Data Sets also include the definitions of Contribution Properties: a set of data values that will be captured per Contribution. Examples include the store to which the contributed data applies, the default timezone for all time data in the Contribution, or the name of the Sales Manager who’s uploading the latest pipeline data.
Users can easily manage the Contributions already added to a Data Set by amending Contribution Properties or even deleting Contributions, either via the API or through Indicee’s UI.
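As a rough illustration of managing data at the granularity of a Contribution, the following Python sketch keeps Contributions in memory together with their Properties. It does not use or describe Indicee’s actual API: `Contribution`, `ContributionLog` and their methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Contribution:
    contribution_id: int
    rows: List[List[Any]]
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ContributionLog:
    """Hypothetical in-memory record of the Contributions made to one Data Set."""

    property_names: List[str]  # the Contribution Properties the Data Set declares
    contributions: Dict[int, Contribution] = field(default_factory=dict)
    _next_id: int = 1

    def _check(self, properties: Dict[str, Any]) -> None:
        unknown = set(properties) - set(self.property_names)
        if unknown:
            raise ValueError(f"undeclared Contribution Properties: {sorted(unknown)}")

    def add(self, rows: List[List[Any]], **properties: Any) -> Contribution:
        self._check(properties)
        contribution = Contribution(self._next_id, rows, dict(properties))
        self.contributions[contribution.contribution_id] = contribution
        self._next_id += 1
        return contribution

    def amend(self, contribution_id: int, **properties: Any) -> None:
        self._check(properties)
        self.contributions[contribution_id].properties.update(properties)

    def delete(self, contribution_id: int) -> None:
        del self.contributions[contribution_id]


# Illustrative usage
log = ContributionLog(property_names=["store", "timezone"])
first = log.add([[1, "widget"]], store="Leeds")
second = log.add([[2, "gadget"]], store="York")
log.amend(first.contribution_id, timezone="Europe/London")
log.delete(second.contribution_id)
```

After these calls the log holds a single Contribution, whose Properties now include both the store and the timezone.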
So, that brings us to the “Define Data Set” screen in Indicee, where Data Sets can be configured.
Anyone adding new data can set up Contribution Properties and provide the source-level metadata that indicates Column types, load-time transformations and load-time filters. This is also where you can configure a Data Set to ignore header information of uploaded files, or convert certain header values into columnar data, which is particularly useful if you’re transmitting files from other business systems. Some data types, such as dates, allow load-time transformation of the source value.
It’s often the case that some textual representation of the date needs to be scanned and translated into an actual Date value, so you have to select the appropriate date format or enter a matching custom format. The Define Data Set screen displays a sample of the source data with visual indications of mismatching data, where the provided Column type or transformation/format doesn’t match some rows of data.
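The idea of parsing text into dates and flagging the rows that don’t fit can be sketched in a few lines of Python. This is only an illustration: `parse_dates` is a hypothetical helper, and the format strings are Python’s `strptime` codes, which are an assumption here and not necessarily the format syntax Indicee uses.

```python
from datetime import date, datetime
from typing import List, Optional, Sequence, Tuple


def parse_dates(
    values: Sequence[str], fmt: str
) -> Tuple[List[Optional[date]], List[int]]:
    """Parse each text value with fmt; return the dates and the mismatching row indexes."""
    parsed: List[Optional[date]] = []
    mismatched: List[int] = []
    for index, text in enumerate(values):
        try:
            parsed.append(datetime.strptime(text.strip(), fmt).date())
        except ValueError:
            # Keep the row's position, but mark it as not matching the format
            parsed.append(None)
            mismatched.append(index)
    return parsed, mismatched


# Illustrative sample: the third value uses a different format
sample = ["2012-03-01", "2012-03-02", "03/04/2012"]
parsed, mismatched = parse_dates(sample, "%Y-%m-%d")
```

Here `mismatched` holds the index of the third value, which is the kind of row a sample view would highlight.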
In addition to configuring Column types and formats, the Define Data Set screen allows you to declare ‘load-time’ row filters for a Data Set. These are important if the source data might contain extra facts that don’t logically fit with the information a Data Set is supposed to present. For instance, a particular Data Set may be designed to provide data on the fulfillment of orders from a distribution centre, but the source might also include “goods-in” information. You can easily add a filter for this information and the Define Data Set screen will provide visual feedback showing which of the rows in the source data will be affected.
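A load-time row filter with a preview of the affected rows might look something like the Python sketch below. Everything in it is assumed for illustration: the `preview_filter` helper, the `movement` field and the sample rows are not taken from Indicee.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def preview_filter(
    rows: Sequence[Row], keep: Callable[[Row], bool]
) -> Tuple[List[int], List[int]]:
    """Return the indexes of rows the filter keeps and of rows it would drop."""
    kept: List[int] = []
    dropped: List[int] = []
    for index, row in enumerate(rows):
        (kept if keep(row) else dropped).append(index)
    return kept, dropped


# Illustrative source data mixing order fulfillment with goods-in records
rows = [
    {"movement": "fulfillment", "order_id": 1},
    {"movement": "goods-in", "order_id": None},
    {"movement": "fulfillment", "order_id": 2},
]
kept, dropped = preview_filter(rows, lambda row: row["movement"] != "goods-in")
```

Returning both lists, instead of only the surviving rows, is what makes it possible to show which source rows a filter affects before any data is loaded.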
In summary, adding a new data source to Indicee is pretty simple, and something that can be achieved as a standalone activity, before it goes on to be consumed by business users.
So, that’s a brief overview of the very bottom of the Indicee data stack. Next time, we’ll look at the data dictionary layer of the model, where specific types of data can be uniquely named and bound to source Columns, and where business rules can be added to drive BI calculations.