Before you start navigating through data, familiarize yourself with the most important concepts and terms - from both the data and the business worlds.
A
Agile BI Implementation
Based on brief, efficient sprints that generate business value and offer high flexibility. This approach significantly mitigates the risk of failure. Agile methodologies, in general, replace the formality of traditional management with intensive communication and extensive feedback loops throughout the team, including project stakeholders and end users. Agile implementation doesn't preclude adhering to a predefined roadmap in the long term.
Allocation
Allocation of limited resources to a project, process, etc. In data analytics, it is often used in conjunction with cost allocation, i.e., distributing shared costs (e.g., rent) among individual teams, customers, products, etc. It is closely related to the concept of attribution.
Analytics
Collection, measurement, analysis, tracking, evaluation, and interpretation of data to support decision-making.
Analysis
From the Greek word "analysis," which means breakdown or decomposition. It is a method of examination involving the breakdown of complex phenomena into simpler, fundamental units.
Attribution
Attribution helps find the answer to the question of how much credit individual marketing channels, for example, deserve for completed orders or other conversions.
API
API (Application Programming Interface) is an interface that allows different software applications to communicate with each other. It provides a set of rules, protocols, and tools that define how different software components should interact.
AOV (Average Order Value)
Average order value. This metric expresses the average amount spent by a customer per transaction. It helps businesses understand how much customers typically spend and serves as a basis for strategies aimed at increasing order value.
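The calculation itself is simple; a minimal Python sketch, using made-up example figures:

```python
# AOV = total revenue / number of orders (order amounts are invented).
orders = [120.0, 80.0, 250.0, 50.0]

aov = sum(orders) / len(orders)
print(aov)  # 125.0
```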
Automation
The use of technologies that allow certain processes (e.g., monthly reporting) to occur without our active (manual) involvement. In data practice, this means that instead of someone manually inputting numbers into a table/report each month, it happens automatically. The technology automatically retrieves data at a set time and transfers it where it's needed.
Automation of processes
It allows streamlining workflows by eliminating tedious manual tasks. When using web (or "cloud") applications, you have the option to connect applications via services specifically designed for this purpose. The most well-known are IFTTT, Zapier, or Automate.io, which support hundreds of web applications in areas such as CRM, marketing, communication, HR, BI, and others.
(Data) Architecture
It defines the individual elements of a given system, the relationships between them, and the properties of both elements and relationships. System architecture is a metaphor analogous to the architecture of a building. It functions as a blueprint for the structure of the system in question.
ARPU (Average Revenue Per User)
A metric indicating how much revenue, on average, one user or customer generates over a certain period of time. ARPU is crucial for evaluating the financial performance of companies, particularly in the service sector or in digital businesses.
B
Balanced scorecard
A method for strategically evaluating a company's performance. It is based on four perspectives - financial, customer, internal processes, and learning and growth. However, these may vary depending on the specific situation. The essential principle is the balance of individual objectives.
Business Intelligence (BI)
Processes, technologies, and tools that enable companies to transform data into information, which is then used for both strategic and operational business decision-making.
Big data
Extremely large datasets that can be analyzed to uncover various patterns, trends, or associations. Basically, everything that doesn't fit into Excel :-)))
BigQuery
A fully managed data warehousing service that enables scalable analysis across petabytes of data. It is a platform as a service supporting querying using ANSI SQL.
BPM
Business process management
Bucket
A vivid metaphor that has become an accepted technical term in several fields. Most commonly it denotes a container for organizing data into domains - for example, object storage services such as Amazon S3 or Google Cloud Storage store files in named buckets.
Budget
The budget of a (business) entity - its expected financial flows, typically over a certain period.
Business
The word has multiple meanings. We can understand it as a group of people established to achieve common goals, whether financial or otherwise.
Business performance management
An approach aimed at strategically enhancing the quality of decision-making processes by creating a unified, integrated management environment that supports performance improvement at all levels of the company. The foundation is the creation of a framework that allows for the integration of proven methods, approaches, or systems.
C
CAC (Cost of Acquiring a Customer)
This is a metric that provides the total costs associated with acquiring one new customer. These costs include all marketing and sales expenses that the company incurs to attract a customer and are divided by the total number of acquired customers over a certain period.
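The definition above translates directly into a formula; a sketch with invented figures:

```python
# CAC = (marketing + sales costs) / customers acquired in the period.
marketing_costs = 40_000.0  # assumed example figures
sales_costs = 20_000.0
new_customers = 300

cac = (marketing_costs + sales_costs) / new_customers
print(cac)  # 200.0
```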
Cashflow
Cash flow represents the difference between income and expenditures of cash over a certain period.
CLV (Customer Lifetime Value)
Customer Lifetime Value (CLV) represents the value of a customer throughout their relationship with the company. CLV estimates the total revenue that a company can expect from one customer over the period they maintain a relationship with the company. This indicator helps companies decide how much they are willing to invest in acquiring new customers and retaining existing ones.
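There are many CLV formulations; one common simplification multiplies average order value, purchase frequency, and expected relationship length. A sketch with hypothetical inputs:

```python
avg_order_value = 125.0   # AOV
purchases_per_year = 4    # assumed purchase frequency
years_retained = 3        # assumed length of the customer relationship

clv = avg_order_value * purchases_per_year * years_retained
print(clv)  # 1500.0
```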
Controlling
Controlling is like the helmsman of a ship, who turns the helm to steer the ship towards a defined goal, and constantly monitors and informs whether the course is correct.
CSV (Comma-separated values)
A simple text file format designed for exchanging tabular data.
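A minimal example of reading such data with Python's standard library (the file content is inlined here for illustration):

```python
import csv
import io

# A tiny CSV document: a header row, then one record per line.
raw = "name,revenue\nAlice,1200\nBob,800\n"

rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["name"], rows[0]["revenue"])  # Alice 1200
```

Note that CSV carries no type information - `revenue` comes back as the string `"1200"` and must be converted explicitly.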
CRM (Customer relationship management)
Customer Relationship Management (CRM) - an approach characterized by actively creating and maintaining mutually beneficial long-term relationships with customers.
Crawling
The process of browsing websites through an internet robot - a so-called crawler. It automatically combs through web pages and records the information it finds into a database, indexing it. This enables searching pages by specified words and terms in full-text search engines.
(Data) Cleansing
The process of identifying incomplete, incorrect, inaccurate, or irrelevant data and subsequently correcting or deleting it.
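A toy sketch of the idea in Python - dropping incomplete records, normalizing values, and removing the resulting duplicates (the records are invented):

```python
records = [
    {"email": "ann@example.com"},
    {"email": "ANN@EXAMPLE.COM"},  # duplicate after normalization
    {"email": None},               # incomplete -> removed
]

seen, clean = set(), []
for r in records:
    if not r["email"]:
        continue  # drop incomplete record
    email = r["email"].strip().lower()  # normalize
    if email not in seen:
        seen.add(email)
        clean.append({"email": email})

print(clean)  # [{'email': 'ann@example.com'}]
```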
D
Dashboard(s)
A report that effectively and clearly presents key indicators relevant to a specific goal on one page.
Data
In IT, these are data recorded in digital form intended for computer processing.
Data blending
The process by which data from multiple sources are merged into one. It involves not only merging different file formats or heterogeneous data sources but also various types of data.
Data-driven company
A company that employs a "data-driven" approach. Its strategic, tactical, and operational decisions are based on the analysis and interpretation of data.
Data enrichment
Data enrichment - linking the data we work with to additional external sources.
Data governance
A concept that companies use for managing, utilizing, and protecting their data.
Data lake
A repository where a vast amount of raw data is stored in its native format until needed. It utilizes a flat architecture for data storage.
Data lineage
Data lineage tracks the data flow from the source to the end user. It describes the origin, movement, characteristics, and quality of a particular dataset.
Data mining
Data mining - an analytical method based on extracting non-trivial hidden correlations, relationships, and information from available data.
Data profiling
Data profiling is the process of inspecting source data, understanding its structure, content, and relationships. The aim is to identify further potential within the dataset.
Data science
It integrates statistics, data analysis, machine learning, and other related methods. It creates predictive models, identifies hidden patterns, and derives further meaningful information from the data.
Data warehouse
Central data repository from various sources. Data is organized in tables.
Database
A system for storing data and their subsequent processing.
Data-driven
Data-driven approach - strategic decisions are based on the analysis and interpretation of data.
Data analysis
A broad field of activities and techniques for processing and utilizing data with the aim of understanding the past, measuring the present, and predicting the future from data. (plus decision support)
Data quality
A subjective term, depending on user requirements, data usage methods, etc. The general goal is to provide users with data of such quality that they can effectively work with it.
Data type
Determined by the set of values that variables and constants of that type can take in a programming language, together with the operations that can typically be performed on that data.
Data model
An abstraction (model) of the structure of a selected dataset. It describes individual data entities (tables), their attributes (columns), and the relationships between them.
Data repository
Data repository of a version control system from which it is possible to create additional repositories, i.e., clone its content.
Datamart
A subset of a data warehouse, which contains data prepared for a specific purpose, such as data for a particular department and/or for a specific method of consumption (visualization, training ML models, integration into specific applications, etc.).
Data source
A source of structured and unstructured data. It can be anything from SQL databases to an Excel phone directory or recordings of individual calls from a call center.
Data democratization
An organization's approach to data utilization, aiming to make data as accessible as possible to everyone who needs it within the organization. This entails ensuring that everyone in the organization has access to current data, in the required level of detail, accuracy, and timeliness.
Design of information (Information design)
A discipline that focuses on preparing information in such a way that it can be effectively utilized by people. It heavily emphasizes ergonomics, functionality, and the visual presentation of information. Information design is the foundation of modern fields such as UX (user experience) or Data Storytelling.
(Temporary) Data Storage
It serves as temporary storage for extracted data from a data source to ensure their preparation and necessary quality before entering the data warehouse.
DQA
Data quality assurance
Drill down (Drilling)
A technique that allows breaking down a selected metric according to defined dimensions.
dbt (data build tool)
A tool that focuses on the "T" in ELT. The foundation is a project configured by a set of YAML files and SQL files with macros.
E
Embedded analytics
Dashboards are displayed directly within the system where the user navigates throughout the day - for example, intranet, point of sale system, CRM system, etc.
Enterprise data warehouse
A database or collection of databases that centralizes enterprise information from multiple sources and applications and makes it accessible for analytics and use throughout the organization.
Enterprise information system
Any type of information system that enhances the functions of enterprise processes through integration, such as planning, inventory, purchasing, sales, marketing, finance, or HR. It is capable of operating in all parts and at all levels within the enterprise.
ERP - Enterprise resource planning
ERP is typically referred to as a category of software for enterprise management - usually a set of integrated applications - that an organization can use to collect, store, manage, and interpret transactional data from many business activities. Typically, it covers agendas such as financial accounting, point of sale, invoicing, purchasing, inventory management, production planning and control, logistics, and many others.
ETL (extract, transform, load)
The process of acquiring data from multiple sources, transforming it, cleaning it, and then loading it, for example, into a data warehouse. Here, it serves as the basis for subsequent analysis.
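A minimal in-memory sketch of the three steps, with list-based stand-ins for the source system and the warehouse (all names and data are hypothetical):

```python
def extract():
    # Stand-in for reading raw rows from a source system.
    return [{"qty": "2", "unit_price": "10.0"}, {"qty": "1", "unit_price": "5.5"}]

def transform(rows):
    # Clean and reshape: cast strings to numbers, derive revenue.
    return [{"revenue": int(r["qty"]) * float(r["unit_price"])} for r in rows]

def load(rows, warehouse):
    # Stand-in for writing into the data warehouse.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'revenue': 20.0}, {'revenue': 5.5}]
```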
ELT (extract, load, transform)
An alternative to ETL. Unlike ETL, in ELT models, data is not transformed upon entering the data warehouse, but is stored in its original, unprocessed structure and format. Transformation occurs in real-time at the moment when the data is consumed (for example, through visualization). This often allows for faster access and lower overall operational costs for data processing.
(Data) Extraction
The process of obtaining data from primary systems for further processing.
G
GoodData
A platform for data processing, visualization, and distribution of analytics.
Granularity
The level of detail of a given dataset. High granularity means a high level of detail, and vice versa. For example, a dataset containing sales data composed of individual order items has higher granularity than the same dataset composed only of aggregate information about those orders.
Git
Git is a tool for source code management. Its strengths are versioning and support for collaboration among multiple people. In analytics it is essential primarily for versioning and sharing: if the input structure changes and the SQL changes with it, you want to be able to look back at what the code used to be; and if someone makes a change and something stops working, you want to know who made the change and what the code looked like before it.
H
(Data) historization
Data in the data warehouse are typically maintained in a historical form, not just in the current state, allowing for the analysis focused on the development over time.
Churn (model)
A mathematical model for predicting the probability of churn for specific customers. The model works with churn reasons that can be influenced, such as switching to a competitor.
I
Innovation
We distinguish many types of innovations; generally, innovation is a certain improvement. It involves a comprehensive process from the initial idea through development to implementation.
(Data) Integration
Collecting data from various sources and subsequently providing it further to users in a unified and consistent structure and format.
J
Joining of tables
Information is typically divided into a large number of tables, which are interconnected via keys. Joining tables allows us to connect them in various ways.
K
Keboola
A platform providing tools and services that simplify working with data - from collection and integration to analysis and presentation.
Keboola Connection
Cloud data processing platform that allows organizing all internal and external data from various data sources in one place.
Key performance indicator (KPI)
It is used to measure the success of a particular activity within an organization.
Cluster
A group of records within a dataset that share similar characteristics and differ significantly from records in other clusters.
Connector
For example, an extractor or writer - functional applications/scripts that download data from external systems or write data to them.
Component
A technical component of a larger whole.
Consolidation
Data consolidation in terms of structure, format, and meaning from various sources into one place.
Conversion
To convert means to change. In web analytics, a conversion occurs when a website visitor performs an action we desire - such as ordering goods.
Conversion ratio
A metric expressing, as a percentage, how many visitors out of the total number performed the desired action.
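The computation, sketched with invented figures:

```python
visitors = 2_000  # total visitors in the period (assumed)
orders = 50       # visitors who completed the desired action (assumed)

conversion_rate = orders / visitors * 100
print(conversion_rate)  # 2.5
```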
Correlation
Linear dependency between two variables. However, this statistical dependency may not imply causality.
Data Quality
A set of characteristics used to describe the desired properties of data - for example, reliability, integrity, accuracy, completeness, availability, etc.
L
Location Intelligence
It focuses on visualizing diverse data layers directly onto map backgrounds, making it particularly advantageous for tasks such as identifying suitable locations for new branches or facilities.
M
Machine learning
It deals with the creation of algorithms that are able to predict certain tendencies and adapt to changes in the surrounding environment. Thus, it constantly improves its accuracy based on previous inputs.
Mapping table
An auxiliary table in a database, converting the values of one set of attributes to another. E.g. mapping the structure of the accounting journal to the structure of the management statement.
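The accounting example above can be sketched as a simple lookup in Python (account numbers and line names are invented):

```python
# Mapping table: accounting journal accounts -> management statement lines.
account_to_line = {"501": "Material costs", "518": "Services", "602": "Revenue"}

journal = [
    {"account": "501", "amount": 100},
    {"account": "602", "amount": 300},
]
for entry in journal:
    entry["line"] = account_to_line[entry["account"]]

print(journal[0]["line"])  # Material costs
```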
Master data management
It ensures data quality, uniqueness and timeliness of records in the most important tables of so-called master data, such as code lists of customers, products, team members, etc., including the integrity of mutual links and links to transactional data. Sometimes it is also referred to as Golden Record, a place of "one truth".
Matching of data
Identifying, comparing, and merging records that match the same entities from one or more databases.
Metadata
Data that provides information about other data.
Methodology
A summary of recommended practices and procedures covering the entire life cycle of the application or data solution being developed.
Metric
Indicator, or metric, operating with simple numerical facts. For example, the Revenue metric can be defined as the sum of the products of quantity and unit price excluding VAT of all items on issued invoices.
Migration
Process of transfer.
(Data) Migration
The process of transferring data from one location to another, from one format to another, or from one application to another.
(Data) Monitoring
The process of proactive monitoring, evaluating data and its quality to ensure that it is suitable for the intended purpose.
Mockup
A simple visual design of a future screen or page of an information system. A mockup can be, for example, a layout of a future dashboard drawn by hand on a flipchart, etc.
Motivation
Driving force.
Multi-project architecture
Decomposition of a large project into smaller logical units.
MySQL
An open-source database capable of storing a large amount of diverse data and then retrieving it upon query.
N
Return on investment (ROI)
Return on Investment (ROI) is a financial metric that indicates the ratio between profit (or loss) and invested funds.
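The ratio is usually expressed as a percentage; a sketch with invented figures:

```python
profit = 15_000.0      # assumed net gain from the investment
investment = 60_000.0  # assumed invested funds

roi = profit / investment * 100
print(roi)  # 25.0
```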
O
OLAP cube (data cube)
Multidimensional array of values ("n-D"). The term is usually used in contexts where these arrays are large, in the order of gigabytes or terabytes. The individual data dimensions represent the dimensions of a "cube," and the values are pre-calculated values of individual metrics. The concept was used in the 1990s when analytical databases did not have sufficient performance, so metric values were "pre-calculated" into the cube for individual combinations of data dimensions.
OLAP system (online analytical processing)
Data storage technology in a database that allows organizing large volumes of data in a way that makes the data accessible and understandable to users engaged in BI (Business Intelligence).
Operational data store
Central database that provides snapshots of the latest data from multiple transactional systems. It allows combining data in its original format from various sources into one destination so that it is available for business reports.
(Data) Orchestration
The process that manages data processing - software takes data from multiple sources, combines them, and subsequently makes them available to tools for further consumption.
P
Pairing
The process of assigning (sometimes also referred to as "attribution") records from one dataset to another. For example, assigning bank transactions to specific receivables or liabilities items.
Parsing
Syntactic analysis of a text. According to predefined rules, the analysis of a text string or file is conducted with the aim to determine the structure of individual elements and values encoded in the string or file. For example, your web browser performs parsing of the string you enter into the address bar to determine the protocol, server, and specific page you want to display, or whether it should initiate a search.
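The browser example above can be illustrated with Python's standard library URL parser:

```python
from urllib.parse import urlparse, parse_qs

# Break a URL string down into its structural elements.
url = "https://example.com/search?q=data&lang=en"
parts = urlparse(url)

print(parts.scheme)           # https
print(parts.netloc)           # example.com
print(parts.path)             # /search
print(parse_qs(parts.query))  # {'q': ['data'], 'lang': ['en']}
```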
Planning
The essence of planning lies in setting goals and defining the procedures for achieving these goals.
Prediction
Prediction, forecast, a statement about what will/will not happen in the future. It is used for estimates supported by a certain scientific hypothesis or theory.
Predictive analysis
It utilizes historical data and predictive models to forecast a certain phenomenon.
Prescriptive analysis
It uses various tools, such as machine learning, simulation, or neural networks, for complex analysis of an event. Whereas predictive analysis reveals what will happen and when, prescriptive analysis also determines why, thereby helping to identify future risks, opportunities, and the like.
Primary key
A field or combination of data entity attributes that uniquely identifies each of its occurrences (each record in a database table). This makes it easily findable. It also allows linking two tables together and creating a connection/relation between them.
Propensity modeling
A method of predictive analytics related to the statistical analysis of clients or employees, for example. It is used, among other things, to identify individuals who are most likely to respond to an offer, etc.
Case study
Demonstrates the entire project process on a specific client example, from start to finish. A great way to show new clients what you are capable of.
R
Real-time analytics
The process of preparing and measuring data in real-time as soon as it enters the database. Users gain insights or can draw conclusions immediately or very quickly after the data enters their system. Real-time analysis enables businesses to respond without delay.
Relational database management system (RDBMS)
A database management system based on the relational data model. Most databases used today are based on this model.
Refactoring
The process of making changes to a software system in such a way that it does not affect the external behavior of the code but improves its internal structure with minimal risk of introducing errors. During refactoring, small changes are made, but the overall effect is significant, resulting in cleaner, more transparent, and readable code, which is also easier to maintain and extend. The overall quality of the code and architecture is improved, the number of errors is reduced, and thus the speed of program development is increased.
Referential integrity
It helps maintain relationships in relationally linked database tables. A foreign key in one table must refer to an existing primary key in another table, or it must contain a NULL value.
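A small SQLite demonstration of the rule - note that SQLite checks foreign keys only when the pragma is enabled (table names and data are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    );
    INSERT INTO customers VALUES (1);
""")

con.execute("INSERT INTO orders VALUES (10, 1)")  # OK: customer 1 exists

rejected = False
try:
    con.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
except sqlite3.IntegrityError:
    rejected = True  # the orphan record is refused

print(rejected)  # True
```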
(Data) Replication
Enables storing current data in multiple separate storage locations.
Report
Data visualization that clearly and concisely presents key values, trends, etc.
Reporting
The process of creating, maintaining, and updating reports.
Reporting of data quality
Documentation of trends, identification of problems and opportunities in data quality.
Repository
A data repository of a version control system from which it is possible to create additional repositories, i.e., its content can be cloned. This distinguishes it from a working copy, which does not allow duplication.
Rest API
REST is an API architecture that allows us to access data and perform CRUD operations on it. REST is stateless, which significantly simplifies communication with the API and enables parallel processing.
Data analytics management
It is a part of data governance: the management of the process of creating, maintaining, and using data analytics outputs.
S
SaaS
Software as a Service - allows users to connect to and use cloud-based applications over the Internet.
Data Collection
A systematic approach to collecting and measuring information from various sources to gain an accurate picture of a particular area of interest.
Scraping
Automated data extraction directly from websites and their subsequent storage into a structure of our choice. The output can then be, for example, CSV (JSON, etc.), which is prepared for further use.
RFM Segmentation
A method of customer segmentation based on previous purchasing behavior. This requires three customer metrics: Recency, Frequency, and Monetary.
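A minimal sketch computing the three raw metrics per customer; the purchase history is invented, and a real RFM segmentation would then bin these values into scores:

```python
from datetime import date

# Hypothetical purchase history: customer -> list of (purchase date, amount).
today = date(2024, 1, 31)
purchases = {
    "ann": [(date(2024, 1, 20), 120.0), (date(2024, 1, 5), 80.0)],
    "bob": [(date(2023, 11, 1), 50.0)],
}

rfm = {}
for customer, orders in purchases.items():
    recency = (today - max(d for d, _ in orders)).days  # days since last purchase
    frequency = len(orders)                             # number of purchases
    monetary = sum(a for _, a in orders)                # total spend
    rfm[customer] = (recency, frequency, monetary)

print(rfm["ann"])  # (11, 2, 200.0)
```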
Secondary key
Unlike a primary key, it is not unique. For example, a person's name is a secondary key, whereas a primary key (e.g., a social security number) uniquely identifies each record.
Self-service BI
An effective BI tool system, whose main idea is the maximum self-sufficiency of the end user and their independence from data analysis specialists, etc.
Scheme
A simple plan, outline, or sketch outlined in its main features.
SLA
Service level agreement - an agreement on the level of services provided agreed upon between the user and the service provider.
Slice and dice
Splitting a large amount of data into smaller parts and subsequently analyzing it from various perspectives.
Snapshotting
Capturing the current state as a snapshot.
Snapshot
A frozen, precisely defined state of a given entity (such as a server or data file) to which it is possible to quickly revert. It is used, for example, for tracking history.
Spark
Apache Spark is an open-source platform for parallel data processing that supports in-memory processing to enhance the performance of applications analyzing large volumes of data. Spark processes large volumes of data in memory, which is much faster than disk-based alternatives.
Data Management
Collecting, storing, and using data securely, efficiently, and economically.
SQL
Structured Query Language - a query language for manipulating, managing, and organizing data stored in a database.
Strategy
A long-term plan created to achieve a certain goal.
Scale
A scale, sometimes called a range, is an agreed-upon way of assigning numbers to indicate a certain quantity. The scale is divided into individual degrees.
Scaling
In the context of business or IT processes, this term refers to a dynamic, well-managed (often automated) process of adding/removing resources (assets) to meet requirements.
Scalability
It refers to the ability of a business process or IT process to scale.
T
Tableau
Tool for analysis and visualization, report creation, and dashboarding.
Table
The basic logical or physical structure where data is stored. Typically, the columns of a table represent individual attributes of a given data entity, and the rows represent individual occurrences (instances, records).
Time to market
The period from the initial idea through design, implementation, to market launch and accessibility to end users.
(Data) Transformation
The key part of the data processing process, during which data cleaning, enrichment, and preparation for subsequent processing or outputs (e.g., visualization) take place.
Types of visualization
A synonym for "types of charts" or other kinds of visual data representation (table, pie chart, bar chart, "traffic light", etc.); also called data visualization methods or visual representation techniques.
U
Use case
Functionality or a use of data - always tied to a specific role and to what that role needs in order to achieve its KPIs or grow the business. The goal is to name the current (and strategic) requirements of the business.
V
Visualization
Allows for clear and understandable interpretation of data outputs. Information is presented in a graphical form, such as a graph, pie chart, etc.
Visualization tool
Allows you to "transform" vast amounts of data into understandable summaries, including key performance indicators, metrics, and other critical points. In our projects, we choose from three tools - Tableau, Power BI, and GoodData - depending on the specific project.
W
Whitelisting
Defining only those applications that can be launched, while all others are automatically disabled.
Web scraping
See 'Scraping'.
Worker
The hardware (or compute instance) on which scripts run.
Z
Data lifecycle
The process through which data goes from its creation to the resulting graph. Acquisition, cleansing, enrichment, storage, visualization and analysis, modeling.