Traditionally, applications and databases were organized around functional domains, such as accounting, human resources, logistics, CRM, and so on. Every department or business unit worked with its own data silo with no cross-department integration. Whereas the silos mainly aimed at operational support, a next phase saw the emergence of business intelligence (BI) and analytics applications, fueled by the need for data-driven tactical and strategic decision making, with a company-wide impact. To sustain this company-wide view, data from the silos was transformed, integrated, and consolidated into a company-wide data warehouse. ETL (extract, transform, load) processes supported the asynchronous data extraction and transfer from the source systems (the operational data silos) to the target data warehouse. However, the ETL process was typically time-consuming, so there was a certain latency between the up-to-date operational data stores and the slightly outdated data warehouse. This latency was tolerable: real-time business intelligence was not the goal in traditional data warehousing.
Because of this evolution, we were confronted for nearly two decades with a dual data storage and processing landscape, supported by two very distinct scenes of tool vendors and products. Nowadays, we see a convergence of operational and tactical/strategic data needs, and of the corresponding data storage and data integration tooling.
This trend was initiated by new marketing practices centered on proactive (instead of reactive) actions requiring a complete understanding of the customer, and quickly spread toward other functional domains. It culminates in the term “operational BI,” with a twofold meaning. First, analytics techniques are increasingly used at the operational level, as well as by front-line employees. Second, analytics for tactical/strategic decision making increasingly uses real-time operational data combined with the aggregated and historical data found in more traditional data warehouses. This evolution poses interesting challenges to the landscape of data storage and data integration solutions.
Data integration aims at providing a unified view and/or unified access over different, and possibly distributed, data sources. The data itself may be heterogeneous and reside in different sources (e.g., XML files, legacy systems, relational databases). The desired extent of data integration depends heavily on the required quality-of-service characteristics. Data will never be of perfect quality, so a certain level of inaccurate, incomplete, or inconsistent data may have to be tolerated for operational BI to succeed. Different data integration patterns exist to provide this unified view:
• Data consolidation aims at capturing the data from multiple, heterogeneous source systems and integrating it into a single persistent store (e.g., a data warehouse, data mart, or data lake). This is typically accomplished using ETL routines.
• Data federation typically follows a pull approach, where data is pulled from the underlying source systems on an on-demand basis. Enterprise information integration is an example of a data federation technology and can be implemented by realizing a virtual business view on the dispersed underlying data sources.
• Data propagation corresponds to the synchronous or asynchronous propagation of updates or, more generally, events in a source system to a target system. It can be applied in the interaction between two applications (enterprise application integration) or in the synchronization between two data stores (enterprise data replication).
• Change data capture (CDC) detects update events in the source data store and triggers the ETL process based on these updates. In this way, a “push” model to ETL is supported.
• Data virtualization builds upon the basic data integration patterns discussed previously but isolates applications and users from the actual integration patterns used.
• Data as a service offers data services as part of an overall service-oriented architecture (SOA), where business processes are supported by a set of loosely coupled software services.
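As a minimal sketch of the consolidation pattern, the following snippet extracts records from two hypothetical source silos, transforms them into a common format, and loads them into a single SQLite store acting as the warehouse. The record layouts, the `COUNTRY_CODES` lookup, and the warehouse schema are all illustrative assumptions, not a prescribed design:

```python
import sqlite3

# Hypothetical source extracts: two operational silos with
# heterogeneous field names and value formats.
crm_rows = [{"cust_name": "Ann Smith", "country": "BE"}]
erp_rows = [{"CUSTOMER": "ann smith", "CTRY": "Belgium"}]

COUNTRY_CODES = {"Belgium": "BE"}  # minimal cleansing lookup (assumed)

def transform(row, name_key, country_key):
    """Normalize a source record into the common warehouse schema."""
    country = row[country_key]
    return (row[name_key].title(), COUNTRY_CODES.get(country, country))

# Load: consolidate both sources into a single persistent store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer (name TEXT, country TEXT)")
for r in crm_rows:
    warehouse.execute("INSERT INTO customer VALUES (?, ?)",
                      transform(r, "cust_name", "country"))
for r in erp_rows:
    warehouse.execute("INSERT INTO customer VALUES (?, ?)",
                      transform(r, "CUSTOMER", "CTRY"))

print(warehouse.execute(
    "SELECT DISTINCT name, country FROM customer").fetchall())
# → [('Ann Smith', 'BE')]
```

Note how the transform step both reconciles the differing schemas and deduplicates the two silos’ views of the same customer.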
The aspect of data integration is also heavily related to that of data quality. Data quality can be defined as “fitness for use,” meaning that the required level of quality of data depends on the context or task at hand. Data quality is a multidimensional concept involving various aspects or criteria by which to assess the quality of a data set or individual data record. The following data quality dimensions are typically highlighted as being important:
• Data accuracy — referring to whether the data values stored are correct (e.g., the name of the customer should be spelled correctly)
• Data completeness — referring to whether both metadata and values are represented to the degree required and are not missing (e.g., a birth date should be filled out for each customer)
• Data consistency — relating to consistency between redundant or duplicate values, and consistency among different data elements referring to the same or a related concept (e.g., the name of a city and postal code should be consistent)
• Data accessibility — which reflects the ease of retrieving the data
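The completeness and consistency dimensions in particular lend themselves to simple record-level checks. A sketch, in which the customer records and the `CITY_POSTAL` reference table are hypothetical:

```python
from datetime import date

# Hypothetical customer records to assess against the quality dimensions.
customers = [
    {"name": "Ann Smith", "birth_date": date(1980, 5, 1),
     "city": "Leuven", "postal_code": "3000"},
    {"name": "Bob Jones", "birth_date": None,
     "city": "Leuven", "postal_code": "9999"},
]

# Assumed reference data for the consistency check.
CITY_POSTAL = {"Leuven": "3000"}

def quality_issues(record):
    """Return a list of data quality issues found in one record."""
    issues = []
    if record["birth_date"] is None:                               # completeness
        issues.append("missing birth_date")
    if CITY_POSTAL.get(record["city"]) != record["postal_code"]:   # consistency
        issues.append("city/postal_code mismatch")
    return issues

for c in customers:
    print(c["name"], quality_issues(c))
```

Accuracy checks (is the name actually spelled correctly?) typically need external reference data or human review, which is why they are harder to automate than the checks above.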
Approached from the angle of data integration, it is worth noting that integration can aid in improving data quality, but might also hamper it. Data consolidation and ETL allow various transformation and cleansing operations to be performed, so the consolidated view of the data should be of higher quality, but one might appropriately wonder whether it would not be better to invest in data quality improvements at the source. The same is true for environments where, over time, different integration approaches have been combined, leading to a jungle of legacy and newer systems and databases that now all must be maintained and integrated with one another. This is a key challenge for many organizations and one that is indeed very difficult to solve. In these settings, master data management (MDM) is frequently mentioned as a management initiative to counteract these quality-related issues.
A data warehouse is a federated repository, physical or logical, for all the data collected by an enterprise's various operational systems. Data warehousing emphasizes the capture of data from diverse sources for access and analysis rather than for transaction processing.
A data warehouse stores data that is extracted from data stores and external sources. The data records within the warehouse must contain details to make it searchable and useful to business users. Taken together, there are three main components of data warehousing:
• Data sources from operational systems, such as Excel, ERP, CRM or financial applications;
• A data staging area where data is cleaned and ordered; and
• A presentation area where data is warehoused.
Data analysis tools, such as business intelligence software, access the data within the warehouse. Data warehouses can also feed data marts, which are decentralized systems in which data from the warehouse is organized and made available to specific business groups, such as sales or inventory teams.
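A data mart can be thought of as a subject-oriented subset derived from the warehouse for one business group. A minimal sketch, assuming hypothetical warehouse rows and a sales-team mart:

```python
# Hypothetical warehouse fact rows (the departments, regions, and
# amounts are illustrative, not a prescribed schema).
warehouse_rows = [
    {"dept": "sales",     "region": "EU", "amount": 120},
    {"dept": "sales",     "region": "US", "amount": 80},
    {"dept": "inventory", "region": "EU", "amount": 40},
]

# Sales data mart: only the rows and columns the sales team needs.
sales_mart = [
    {"region": r["region"], "amount": r["amount"]}
    for r in warehouse_rows
    if r["dept"] == "sales"
]
print(sales_mart)
# → [{'region': 'EU', 'amount': 120}, {'region': 'US', 'amount': 80}]
```

In practice the mart would be a separate table or database refreshed from the warehouse, but the principle is the same: restrict and reshape the warehouse data for one audience.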
In addition, Hadoop has become an important extension of data warehouses for many enterprises because the data processing platform can improve components of the data warehouse architecture, from data ingestion to analytics processing to data archiving.
Data warehouses can benefit organizations from both an IT and a business perspective. Separating the analytical processes from the operational processes can enhance the operational systems and enable business users to access and query relevant data faster from multiple sources. In addition, data warehouses can offer enhanced data quality and consistency, thereby improving business intelligence.
The term Business Intelligence (BI) refers to technologies, applications, and practices for the collection, integration, analysis, and presentation of business information. The purpose of Business Intelligence is to support better business decision making. Essentially, Business Intelligence systems are data-driven Decision Support Systems (DSS). Business Intelligence is sometimes used interchangeably with briefing books, report-and-query tools, and executive information systems.
Business Intelligence systems provide historical, current, and predictive views of business operations, most often using data that has been gathered into a data warehouse or a data mart and occasionally working from operational data. Software elements support reporting, interactive “slice-and-dice” pivot-table analyses, visualization, and statistical data mining. Applications tackle sales, production, financial, and many other sources of business data for purposes that include business performance management. Information is often gathered about other companies in the same industry, a practice known as benchmarking.
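A slice-and-dice pivot analysis boils down to aggregating one measure across two dimensions. A minimal pure-Python sketch, with hypothetical sales facts:

```python
from collections import defaultdict

# Hypothetical sales facts: (year, region, amount).
facts = [
    ("2023", "EU", 100),
    ("2023", "US", 150),
    ("2024", "EU", 120),
]

# Build a pivot table: roll up amount by year x region.
pivot = defaultdict(lambda: defaultdict(int))
for year, region, amount in facts:
    pivot[year][region] += amount

for year in sorted(pivot):
    print(year, dict(pivot[year]))
# → 2023 {'EU': 100, 'US': 150}
# → 2024 {'EU': 120}
```

Slicing fixes one dimension (e.g., only 2023), while dicing selects a sub-cube across several dimensions; BI tools expose these operations interactively rather than in code.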
Organizations are starting to see that data and content should not be considered separate aspects of information management, but instead should be managed in an integrated enterprise approach. Enterprise information management brings Business Intelligence and Enterprise Content Management together. Organizations are also moving toward operational Business Intelligence, a segment that is currently underserved and uncontested by vendors. Traditionally, Business Intelligence vendors have targeted only the top of the pyramid, but there is now a paradigm shift toward bringing Business Intelligence to the bottom of the pyramid, with a focus on self-service business intelligence.
Self-service business intelligence (SSBI) involves the business systems and data analytics that give business end-users access to an organization’s information without direct IT involvement. It gives end-users the ability to do more with their data without necessarily having technical skills. These solutions are usually created to be flexible and easy to use, so that end-users can analyze data, make decisions, plan, and forecast on their own.
Advanced analytics is a broad category of inquiry that can be used to help drive changes and improvements in business practices. While the traditional analytical tools that comprise basic business intelligence (BI) examine historical data, tools for advanced analytics focus on forecasting future events and behaviors, enabling businesses to conduct what-if analyses to predict the effects of potential changes in business strategies.
Predictive analytics, data mining, big data analytics and machine learning are just some of the analytical categories that fall under the heading of advanced analytics. These technologies are widely used in industries including marketing, healthcare, risk management and economics.
Advanced data analytics is being used across industries to predict future events. Marketing teams use it to predict the likelihood that certain web users will click on a link; healthcare providers use prescriptive analytics to identify patients who might benefit from a specific treatment; and cellular network providers use diagnostic analytics to predict potential network failures, enabling them to do preventative maintenance.
Advanced analytics practices are becoming more widespread as enterprises continue to create new data at a rapid rate. Now that many organizations have access to large stores of data, or big data, they can apply predictive analytics techniques to understand their operations at a deeper level.
The advanced analytics process involves mathematical approaches to interpreting data. Classical statistical methods, as well as newer, more machine-driven techniques such as deep learning, are used to identify patterns, correlations, and groupings in data sets. Based on these, users can make predictions about future behavior, whether that is which group of web users is most likely to engage with an online ad or how profits will develop over the next quarter.
In many cases, these complex predictive and prescriptive analyses require a highly skilled data scientist. These professionals have extensive training in mathematics and in programming languages such as Python and R, as well as experience in a particular line of business.
Advanced analytics has become more common during the era of big data. Predictive analytics models require large amounts of training data to identify patterns and correlations before they can make a prediction. The growing amount of data managed by enterprises today opens the door to these advanced analytics techniques.