Data Glossary

Before you start navigating through data, familiarize yourself with the most important concepts and terms, covering both data and business.

Agile BI Implementation

Based on brief, efficient sprints that generate business value and offer high flexibility. This approach significantly mitigates the risk of failure. Agile methodologies, in general, replace the formality of traditional management with intensive communication and extensive feedback loops throughout the team, including project stakeholders and end users. Agile implementation doesn't preclude adhering to a predefined roadmap in the long term.


Allocation

Allocation of limited resources to a project, process, etc. In data analytics, it is often used in conjunction with cost allocation, i.e., distributing shared costs (e.g., rent) among individual teams, customers, products, etc. It is closely related to the concept of attribution.



Analytics

Collection, measurement, analysis, tracking, evaluation, and interpretation of data to support decision-making.


Analysis

From the Greek word "analysis," which means breakdown or decomposition. It is a method of examination involving the breakdown of complex phenomena into simpler, fundamental units.


Attribution

Attribution helps find the answer to the question of how much credit individual marketing channels, for example, deserve for completed orders or other conversions.
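One common attribution rule is "last click," which gives all the credit for a conversion to the final channel the customer touched. A minimal sketch, with hypothetical customer journeys:

```python
# Last-click attribution sketch: each conversion's credit goes to the
# final marketing channel the customer touched before converting.
def last_click_attribution(journeys):
    """Count conversions per channel, crediting the last touchpoint."""
    credit = {}
    for touchpoints in journeys:
        last = touchpoints[-1]
        credit[last] = credit.get(last, 0) + 1
    return credit

# Hypothetical journeys: each list is one converting customer's touchpoints.
journeys = [
    ["seo", "email", "ppc"],
    ["social", "email"],
    ["ppc"],
]
result = last_click_attribution(journeys)
print(result)  # {'ppc': 2, 'email': 1}
```

Other rules (first click, linear, time decay) divide the credit differently over the same touchpoint data.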


Attribute

A distinguishing feature - a characteristic assigned to some data. Simplified, it's also humorously referred to as a "descriptive column in a table" :-)))

AOV (Average Order Value)

This metric expresses the average amount spent by a customer per transaction. It helps businesses understand how much customers typically spend and serves as a basis for strategies aimed at increasing order value.
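The calculation is simply total revenue divided by the number of orders. With hypothetical order values:

```python
# AOV = total revenue / number of orders.
orders = [1200.0, 450.0, 980.0, 310.0]  # hypothetical transaction values
aov = sum(orders) / len(orders)
print(round(aov, 2))  # 735.0
```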


Automation

The use of technologies that allow certain processes (e.g., monthly reporting) to occur without our active (manual) involvement. In data practice, this means that instead of someone manually inputting numbers into a table/report each month, it happens automatically. The technology automatically retrieves data at a set time and transfers it where it's needed.

Automation of processes

It allows streamlining workflows by eliminating tedious manual tasks. When using web (or "cloud") applications, you can connect them via services specifically designed for this purpose. The most well-known are IFTTT and Zapier, which support hundreds of web applications in areas such as CRM, marketing, communication, HR, BI, and others.

(Data) Architecture

It defines the individual elements of a given system, the relationships between them, and the properties of both elements and relationships. System architecture is a metaphor analogous to the architecture of a building. It functions as a blueprint for the structure of the system in question.

ARPU (Average Revenue Per User)

A metric indicating how much revenue, on average, one user or customer generates over a certain period of time. ARPU is crucial for evaluating the financial performance of companies, particularly in the service sector or in digital businesses.


Balanced scorecard

A method for strategically evaluating a company's performance. It is based on four perspectives - financial, customer, internal processes, and learning and growth. However, these may vary depending on the specific situation. The essential principle is the balance of individual objectives.

Business Intelligence (BI)

Processes, technologies, and tools that enable companies to transform data into information, which is then used for both strategic and operational business decision-making.

Big data

Extremely large datasets that can be analyzed to uncover various patterns, trends, or associations. Basically, everything that doesn't fit into Excel :-)))


BigQuery

A fully managed data warehousing service that enables scalable analysis across petabytes of data. It is a platform as a service supporting querying using ANSI SQL.


Business process management


It is used as a vivid metaphor as well as a generally accepted technical term in some specialized fields. Most commonly, it is used to denote the organization of data into various domains.


Budget

The budget of a (business) entity - its expected, estimated financial flows, typically over a certain period.


Business

The word has multiple meanings. We can understand it as a group of people established to achieve common goals, whether financial or otherwise.

Business performance management

An approach aimed at strategically enhancing the quality of decision-making processes by creating a unified, integrated management environment that supports performance improvement at all levels of the company. The foundation is the creation of a framework that allows for the integration of proven methods, approaches, or systems.


CAC (Cost of Acquiring a Customer)

This is a metric that provides the total costs associated with acquiring one new customer. These costs include all marketing and sales expenses that the company incurs to attract a customer and are divided by the total number of acquired customers over a certain period.


Cash flow

Cash flow represents the difference between income and expenditures of cash over a certain period.

CLV (Customer Lifetime Value)

Customer Lifetime Value (CLV) represents the value of a customer throughout their relationship with the company. CLV estimates the total revenue that a company can expect from one customer over the period they maintain a relationship with the company. This indicator helps companies decide how much they are willing to invest in acquiring new customers and retaining existing ones.


Controlling

Controlling is like the helmsman of a ship, who turns the helm to steer the ship toward a defined goal while constantly monitoring and reporting whether the course is correct.

CSV (Comma-separated values)

A simple text file format designed for exchanging tabular data.
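A quick round-trip through the format, using Python's standard `csv` module and an in-memory buffer (no file needed; the rows are hypothetical):

```python
import csv
import io

# A small table: header row followed by data rows.
rows = [["id", "name"], ["1", "Alice"], ["2", "Bob"]]

# Write the rows out in CSV format...
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# ...and read them back.
buf.seek(0)
parsed = list(csv.reader(buf))
print(parsed)  # [['id', 'name'], ['1', 'Alice'], ['2', 'Bob']]
```

Using a proper CSV library instead of `str.split(",")` matters because fields may themselves contain commas, quotes, or line breaks.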

CRM (Customer relationship management)

Customer Relationship Management (CRM) - an approach characterized by actively creating and maintaining mutually beneficial long-term relationships with customers.


Crawling

The process of browsing websites with an internet robot - a so-called crawler. It automatically combs through web pages, records the information it finds into a database, and indexes it. This enables searching pages by specified words and terms in full-text search engines.

(Data) Cleansing

The process of identifying incomplete, incorrect, inaccurate, or irrelevant data and subsequently correcting or deleting it.
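A tiny cleansing pass over hypothetical records, illustrating both parts of the definition: deleting incomplete rows and correcting (normalizing) the rest:

```python
# Hypothetical source records: one is missing an e-mail address.
records = [
    {"name": "  alice ", "email": "alice@example.com"},
    {"name": "Bob", "email": None},            # incomplete -> dropped
    {"name": "carol", "email": "carol@example.com"},
]

# Drop incomplete rows and normalize whitespace/casing in names.
cleaned = [
    {"name": r["name"].strip().title(), "email": r["email"]}
    for r in records
    if r["email"]
]
print([r["name"] for r in cleaned])  # ['Alice', 'Carol']
```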



Dashboard

A report that effectively and clearly presents key indicators relevant to a specific goal on one page.


Data

In IT, these are records in digital form intended for computer processing.

Data blending

The process by which data from multiple sources are merged into one. It involves not only merging different file formats or heterogeneous data sources but also various types of data.

Data-driven company

A company that employs a "data-driven" approach. Its strategic, tactical, and operational decisions are based on the analysis and interpretation of data.

Data enrichment

Data enrichment - linking the data we work with to additional external sources.

Data governance

A concept that companies use for managing, utilizing, and protecting their data.

Data lake 

A repository where a vast amount of raw data is stored in its native format until needed. It utilizes a flat architecture for data storage.

Data lineage

Data lineage tracks the data flow from the source to the end user. It describes the origin, movement, characteristics, and quality of a particular dataset.

Data mining

Data mining - an analytical method based on extracting non-trivial hidden correlations, relationships, and information from available data.

Data profiling

Data profiling is the process of inspecting source data, understanding its structure, content, and relationships. The aim is to identify further potential within the dataset.

Data science

It integrates statistics, data analysis, machine learning, and other related methods. It creates predictive models, identifies hidden patterns, and derives further meaningful information from the data.

Data warehouse

Central data repository from various sources. Data is organized in tables.


Database

A system for storing data and subsequently processing it.


Data-driven

Data-driven approach - strategic decisions are based on the analysis and interpretation of data.

Data analysis

A broad field of activities and techniques for processing and utilizing data with the aim of understanding the past, measuring the present, and predicting the future from data. (plus decision support)

Data quality

A subjective term, depending on user requirements, data usage methods, etc. The general goal is to provide users with data of such quality that they can effectively work with it.

Data type

It is determined by the domain of values that variables and constants may take in a programming language, together with the typical computational operations that can be performed on that data.

Data model

An abstraction (model) of the structure of a selected dataset. It describes individual data entities (tables), their attributes (columns), and the relationships between them.

Data repository

Data repository of a version control system from which it is possible to create additional repositories, i.e., clone its content.


Data mart

A subset of a data warehouse, which contains data prepared for a specific purpose, such as data for a particular department and/or for a specific method of consumption (visualization, training ML models, integration into specific applications, etc.).

Data source

A source of structured and unstructured data. It can be anything from SQL databases to an Excel phone directory or recordings of individual calls from a call center.

Data democratization

An organization's approach to data utilization, aiming to make data as accessible as possible to everyone who needs it within the organization. This entails ensuring that everyone in the organization has access to current data, in the required level of detail, accuracy, and timeliness.

Design of information (Information design)

A discipline that focuses on preparing information in such a way that it can be effectively utilized by people. It heavily emphasizes ergonomics, functionality, and the visual presentation of information. Information design is the foundation of modern fields such as UX (user experience) or Data Storytelling.

(Temporary) Data Storage

It serves as temporary storage for extracted data from a data source to ensure their preparation and necessary quality before entering the data warehouse.


Data quality assurance

Drill down (Drilling)

A technique that allows breaking down a selected metric according to defined dimensions.
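A drill-down sketch over hypothetical sales rows: break total revenue down by the "region" dimension, then drill into one region by "product":

```python
from collections import defaultdict

# Hypothetical fact rows: one metric (revenue), two dimensions.
sales = [
    {"region": "EU", "product": "A", "revenue": 100},
    {"region": "EU", "product": "B", "revenue": 50},
    {"region": "US", "product": "A", "revenue": 70},
]

def drill(rows, dimension):
    """Sum the revenue metric, broken down by one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["revenue"]
    return dict(totals)

by_region = drill(sales, "region")
eu_by_product = drill([r for r in sales if r["region"] == "EU"], "product")
print(by_region)      # {'EU': 150.0, 'US': 70.0}
print(eu_by_product)  # {'A': 100.0, 'B': 50.0}
```

In a BI tool this is usually a click on a chart segment; the underlying operation is the same filter-and-regroup.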

DBT (Data Build Tool)

A tool that focuses on the "T" in ELT. The foundation is a project configured by a set of YAML files and SQL files with macros.


Embedded analytics

Dashboards are displayed directly within the system where the user navigates throughout the day - for example, intranet, point of sale system, CRM system, etc.

Enterprise data warehouse

A database or collection of databases that centralizes enterprise information from multiple sources and applications and makes it accessible for analytics and use throughout the organization.

Enterprise information system

Any type of information system that enhances the functions of enterprise processes through integration, such as planning, inventory, purchasing, sales, marketing, finance, or HR. It is capable of operating in all parts and at all levels within the enterprise.

ERP - Enterprise resource planning

ERP is typically referred to as a category of software for enterprise management - usually a set of integrated applications - that an organization can use to collect, store, manage, and interpret transactional data from many business activities. Typically, it covers agendas such as financial accounting, point of sale, invoicing, purchasing, inventory management, production planning and control, logistics, and many others.

ETL (extract, transform, load)

The process of acquiring data from multiple sources, transforming it, cleaning it, and then loading it, for example, into a data warehouse. Here, it serves as the basis for subsequent analysis.
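A minimal ETL sketch with a hypothetical source and a list standing in for the warehouse table; the transform step cleans types and derives a column:

```python
def extract():
    """Pretend source system: returns raw rows with string-typed fields."""
    return [{"qty": "2", "unit_price": "10.5"}, {"qty": "1", "unit_price": "3.0"}]

def transform(rows):
    """Clean types and derive a total per row."""
    out = []
    for r in rows:
        qty, price = int(r["qty"]), float(r["unit_price"])
        out.append({"qty": qty, "unit_price": price, "total": qty * price})
    return out

warehouse = []  # stands in for the target warehouse table

def load(rows):
    warehouse.extend(rows)

load(transform(extract()))
print([r["total"] for r in warehouse])  # [21.0, 3.0]
```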

ELT (extract, load, transform)

An alternative to ETL. Unlike ETL, in ELT models, data is not transformed upon entering the data warehouse, but is stored in its original, unprocessed structure and format. Transformation occurs in real-time at the moment when the data is consumed (for example, through visualization). This often allows for faster access and lower overall operational costs for data processing.

(Data) Extraction

The process of obtaining data from primary systems for further processing.



A data processing platform and distribution visualization tool.


Granularity

The level of detail of a given dataset. High granularity means a high level of detail, and vice versa. For example, a dataset containing sales data composed of individual order items has higher granularity than the same dataset composed only of aggregate information about those orders.


Git

Git is a tool for source code management. Its strengths are versioning and support for collaboration among multiple people. In analytics, it is essential primarily for versioning and sharing: if the input structure changes and the SQL changes with it, you want to be able to look back at what the code used to be. And when someone makes a change and something stops working, you want to know who made the change and what the code looked like before it.


(Data) historization

Data in the data warehouse is typically maintained in historical form, not just in its current state, allowing for analysis focused on development over time.


Churn (model)

A mathematical model for predicting the probability of churn for specific customers. The model works with churn reasons that can be influenced, such as switching to a competitor.



Innovation

We distinguish many types of innovation; generally, innovation is an improvement of some kind. It involves a comprehensive process from the initial idea through development to implementation.

(Data) Integration

Collecting data from various sources and subsequently providing it further to users in a unified and consistent structure and format.


Joining of tables

Information is typically divided into a large number of tables, which are interconnected via keys. Joining tables allows us to connect them in various ways.
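A sketch of keyed tables and a join, using Python's built-in `sqlite3` module; the schema and data are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Two tables linked via a key (orders.customer_id -> customers.id).
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders    VALUES (10, 1, 99.0), (11, 1, 15.0), (12, 2, 20.0);
""")

# Join the tables back together and aggregate per customer.
rows = con.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 114.0), ('Bob', 20.0)]
```

The same data could also be combined with a LEFT or FULL join, which decide what happens to rows that have no match on the other side.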



Keboola

A platform providing tools and services that simplify working with data - from collection and integration to analysis and presentation.

Keboola Connection

Cloud data processing platform that allows organizing all internal and external data from various data sources in one place.

Key performance indicator (KPI)

It is used to measure the success of a particular activity within an organization.


Cluster

A cluster of records within a dataset that share similar characteristics and differ significantly from records in other clusters.


For example, an extractor or writer - functional applications/scripts that download data from external systems or write data to them.


A technical component of a larger whole.


(Data) Consolidation

Data consolidation in terms of structure, format, and meaning from various sources into one place.


Conversion

To convert means to change. For example, the process in which a website visitor performs an action we desire - such as ordering goods, etc.

Conversion ratio

A metric expressing, as a percentage, how many customers out of the total number performed the action we desired.
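The calculation, with hypothetical traffic numbers:

```python
# Conversion ratio = converted visitors / all visitors, in percent.
visitors, conversions = 2400, 96  # hypothetical numbers
ratio = conversions / visitors * 100
print(f"{ratio:.1f} %")  # 4.0 %
```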


Correlation

A linear dependency between two variables. However, this statistical dependency does not necessarily imply causality.

Data Quality

A set of characteristics used to describe the desired properties of data - for example, reliability, integrity, accuracy, completeness, availability, etc.


Location Intelligence

It focuses on visualizing diverse data layers directly onto map backgrounds, making it particularly advantageous for tasks such as identifying suitable locations for new branches or facilities.


Machine learning

It deals with the creation of algorithms that are able to predict certain tendencies and adapt to changes in the surrounding environment. Thus, it constantly improves its accuracy based on previous inputs.

Mapping table

An auxiliary table in a database, converting the values of one set of attributes to another. E.g. mapping the structure of the accounting journal to the structure of the management statement.

Master data management

It ensures data quality, uniqueness, and timeliness of records in the most important tables of so-called master data, such as code lists of customers, products, team members, etc., including the integrity of mutual links and links to transactional data. Sometimes it is also referred to as the Golden Record - a place of "one truth".

Matching of data

Identifying, comparing, and merging records that match the same entities from one or more databases.


Metadata

Data that provides information about other data.


Methodology

A summary of recommended practices and procedures covering the entire life cycle of the application or data solution being developed.


Metric

An indicator, or metric, operating with simple numerical facts. For example, the Revenue metric can be defined as the sum of the products of quantity and unit price excluding VAT of all items on issued invoices.
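The Revenue definition above, computed over hypothetical invoice items:

```python
# Revenue = sum of quantity x unit price (excl. VAT) over invoice items.
invoice_items = [
    {"qty": 2, "unit_price_ex_vat": 100.0},
    {"qty": 5, "unit_price_ex_vat": 20.0},
]
revenue = sum(i["qty"] * i["unit_price_ex_vat"] for i in invoice_items)
print(revenue)  # 300.0
```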


Migration

The process of transfer.

(Data) Migration

The process of transferring data from one location to another, from one format to another, or from one application to another.

(Data) Monitoring

The process of proactive monitoring, evaluating data and its quality to ensure that it is suitable for the intended purpose.


Mockup

A simple visual design of a future screen or page of an information system. A mockup can be, for example, a layout of a future dashboard drawn by hand on a flipchart, etc.


Driving force.

Multi-project architecture

Decomposition of a large project into smaller logical units.


Open source database capable of storing a large amount of diverse data and then retrieving it upon query.


Return (ROI, Return on investment)

Return on Investment (ROI) is a financial metric that indicates the ratio between profit (or loss) and invested funds.


OLAP cube (data cube)

Multidimensional array of values ("n-D"). The term is usually used in contexts where these arrays are large, in the order of gigabytes or terabytes. The individual data dimensions represent the dimensions of a "cube," and the values are pre-calculated values of individual metrics. The concept was used in the 1990s when analytical databases did not have sufficient performance, so metric values were "pre-calculated" into the cube for individual combinations of data dimensions.

OLAP system (online analytical processing)

Data storage technology in a database that allows organizing large volumes of data in a way that makes the data accessible and understandable to users engaged in BI (Business Intelligence).

Operational data store

Central database that provides snapshots of the latest data from multiple transactional systems. It allows combining data in its original format from various sources into one destination so that it is available for business reports.

(Data) Orchestration

The process that manages data processing - software takes data from multiple sources, combines them, and subsequently makes them available to tools for further consumption.



(Data) Pairing

The process of assigning (sometimes also referred to as "attribution") records from one dataset to another. For example, assigning bank transactions to specific receivables or liabilities items.


Parsing

Syntactic analysis of a text. According to predefined rules, a text string or file is analyzed with the aim of determining the structure of individual elements and values encoded in it. For example, your web browser parses the string you enter into the address bar to determine the protocol, server, and specific page you want to display, or whether it should initiate a search.
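The address-bar example from the definition, using the standard `urllib.parse` module on a hypothetical URL:

```python
from urllib.parse import urlparse

# Parse a URL string into its structural elements.
parts = urlparse("https://example.com/docs/page?q=data")
print(parts.scheme, parts.netloc, parts.path, parts.query)
# https example.com /docs/page q=data
```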


Planning

The essence of planning lies in setting goals and defining the procedures for achieving them.


Prediction

A prediction, or forecast, is a statement about what will or will not happen in the future. It is used for estimates supported by a certain scientific hypothesis or theory.

Predictive analysis

It utilizes historical data and predictive models to forecast a certain phenomenon.

Prescriptive analysis

It uses various tools, such as machine learning, simulation, or neural networks, for the complex analysis of an event. Whereas predictive analysis reveals what will happen and when, prescriptive analysis also determines why it will happen, thereby helping to identify future risks and opportunities, etc.

Primary key

A field or combination of data entity attributes that uniquely identifies each of its occurrences (each record in a database table). This makes it easily findable. It also allows linking two tables together and creating a connection/relation between them.

Propensity modeling

A method of predictive analytics related to the statistical analysis of clients or employees, for example. It is used, among other things, to identify individuals who are most likely to respond to an offer, etc.

Case study

Demonstrates the entire project process on a specific client example, from start to finish. A great way to show new clients what you are capable of.


Real-time analytics

The process of preparing and measuring data in real-time as soon as it enters the database. Users gain insights or can draw conclusions immediately or very quickly after the data enters their system. Real-time analysis enables businesses to respond without delay.

Relational database management system (RDBMS)

A database management system based on the relational data model. Most databases used today are based on this model.


Refactoring

The process of making changes to a software system in such a way that it does not affect the external behavior of the code but improves its internal structure, with minimal risk of introducing errors. During refactoring, small changes are made, but the overall effect is significant, resulting in cleaner, more transparent, and readable code that is also easier to maintain and extend. The overall quality of the code and architecture improves, the number of errors drops, and the speed of program development increases.

Referential integrity

It helps maintain relationships in relationally linked database tables. A foreign key in one table must refer to an existing primary key in another table, or it must contain a NULL value.
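The foreign-key rule can be demonstrated with the built-in `sqlite3` module; the schema is hypothetical, and note that SQLite enforces foreign keys only after the PRAGMA is enabled:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1);
""")

con.execute("INSERT INTO orders VALUES (10, 1)")  # OK: customer 1 exists

err = None
try:
    con.execute("INSERT INTO orders VALUES (11, 99)")  # customer 99 does not exist
except sqlite3.IntegrityError as e:
    err = e
print(err)  # FOREIGN KEY constraint failed
```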

(Data) Replication

Enables storing current data in multiple separate storage locations.


Report

A data visualization that clearly and concisely presents key values, trends, etc.


Reporting

The process of creating, maintaining, and updating reports.

Reporting of data quality

Documentation of trends, identification of problems and opportunities in data quality.


Repository

A data repository of a version control system from which it is possible to create additional repositories, i.e., its content can be cloned. This distinguishes it from a working copy, which does not allow duplication.

REST API

REST is an API architecture that allows us to access data and perform CRUD operations on it. REST is stateless, which significantly simplifies communication with the API and enables parallel processing.

Data analytics management

It is a part of data governance: the management of the process of creating, maintaining, and using data analytics outputs.



SaaS

Software as a Service - allows users to connect to and use cloud-based applications over the Internet.

Data Collection

A systematic approach to collecting and measuring information from various sources to gain an accurate picture of a particular area of interest.


Scraping

Automated data extraction directly from websites and its subsequent storage in a structure of our choice. The output can then be, for example, CSV (JSON, etc.) prepared for further use.

RFM Segmentation

A method of customer segmentation based on previous purchasing behavior. This requires three customer metrics: Recency, Frequency, and Monetary.
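A sketch of the three metrics for a single customer, computed over hypothetical purchase history:

```python
from datetime import date

# Hypothetical purchase history for one customer: (date, amount spent).
today = date(2024, 6, 1)
purchases = [
    (date(2024, 5, 20), 40.0),
    (date(2024, 3, 2), 15.0),
    (date(2024, 5, 28), 25.0),
]

recency = (today - max(d for d, _ in purchases)).days  # days since last purchase
frequency = len(purchases)                             # number of purchases
monetary = sum(v for _, v in purchases)                # total spend
print(recency, frequency, monetary)  # 4 3 80.0
```

In practice, each customer's R, F, and M values are then binned into scores (e.g., quintiles), and customers with similar score triples form the segments.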

Secondary key

Unlike a primary key, a secondary key is not unique - for example, a person's name, as opposed to a social security number, which uniquely identifies a record.

Self-service BI

An effective BI tool system, whose main idea is the maximum self-sufficiency of the end user and their independence from data analysis specialists, etc.


Sketch

A simple plan, outline, or draft presented in its main features.


SLA (Service level agreement)

An agreement on the level of services provided, agreed upon between the user and the service provider.

Slice and dice

Splitting a large amount of data into smaller parts and subsequently analyzing it from various perspectives.


Snapshotting

Taking a snapshot of the current state.


Snapshot

A frozen, precisely defined state of a given entity (such as a server or data file) to which it is possible to quickly revert. It is used, for example, for tracking history.


Spark

Apache Spark is an open-source platform for parallel data processing that supports in-memory processing to enhance the performance of applications analyzing large volumes of data. Spark processes large volumes of data in memory, which is much faster than disk-based alternatives.

Data Management

Collecting, storing, and using data securely, efficiently, and economically.


SQL (Structured Query Language)

A query language for manipulating, managing, and organizing data stored in a database.


Strategy

A long-term plan created to achieve a certain goal.


Scale

A scale, sometimes called a range, is an agreed-upon way of assigning numbers to values to indicate a certain quantity. The scale is divided into individual degrees.


Scaling

In the context of business or IT processes, this term refers to a dynamic, well-managed (often automated) process of adding/removing resources (assets) to meet requirements.


Scalability

The ability of a business or IT process to scale.



Tool for analysis and visualization, report creation, and dashboarding.


Table

The basic logical or physical structure in which data is stored. Typically, the columns of a table represent individual attributes of a given data entity, and the rows represent individual occurrences (instances, records).

Time to market

The period from the initial idea through design, implementation, to market launch and accessibility to end users.

(Data) Transformation

The key part of the data processing process, during which data cleaning, enrichment, and preparation for subsequent processing or outputs (e.g., visualization) take place.

Types of visualizations

A collective term for types of graphs and other kinds of visual data representation (table, pie chart, bar chart, "traffic light," etc.); also referred to as data visualization methods or visual representation techniques.


Use case

Functionality or use of data - it is always tied to a role and to what that role needs in order to achieve its KPIs / business growth. The goal is to name the current (and strategic) requirements of the business.



Visualization

Allows for clear and understandable interpretation of data outputs. Information is presented in graphical form, such as a graph, pie chart, etc.

Visualization tool

It allows you to "transform" vast amounts of data into understandable summaries, including key performance indicators, metrics, and other critical points. In our projects, we choose from three tools - Tableau, Power BI, and GoodData - depending on the specific project.



Whitelisting

Defining only those applications that are allowed to be launched, while all others are automatically blocked.

Web scraping

See 'Scraping'.


Worker

Hardware on which scripts run.


Data lifecycle

The process through which data goes from its creation to the resulting graph. Acquisition, cleansing, enrichment, storage, visualization and analysis, modeling.

Do you have any questions?

Don't hesitate to contact us.