Multidimensional data representation. General scheme of organizing a data warehouse. Characteristics, types and main differences between OLAP and OLTP technologies. Star and snowflake patterns. Aggregation. OLAP systems Multidimensional data and olap technologies

In the series of articles “Introduction to Databases,” published recently (see ComputerPress No. 3'2000 - 3'2001), we discussed various technologies and software used in creating information systems - desktop and server DBMSs, data design tools , application development tools, as well as Business Intelligence - enterprise-scale data analysis and processing tools, which are currently becoming increasingly popular in the world, including in our country. We note, however, that the issues of using Business Intelligence tools and the technologies used to create applications of this class are not yet sufficiently covered in the domestic literature. In a new series of articles we will try to fill this gap and talk about what the technologies underlying such applications are. As implementation examples, we will mainly use OLAP technologies from Microsoft (mainly Analysis Services in Microsoft SQL Server 2000), but we hope that the bulk of the material will be useful to users of other tools.

The first article in this series is devoted to the basics of OLAP (On-Line Analytical Processing) - a technology for multidimensional data analysis. In it, we will look at the concepts of data warehousing and OLAP, the requirements for data warehousing and OLAP tools, the logical organization of OLAP data, and the basic terms and concepts used when discussing multidimensional analysis.

What is a data warehouse

Enterprise-scale information systems, as a rule, contain applications designed for complex multidimensional analysis of data, its dynamics, trends, etc. Such analysis is ultimately intended to support decision making. These systems are often called decision support systems.

It is impossible to make any management decision without having the necessary information, usually quantitative. This requires the creation data warehouses(Data warehouses), that is, the process of collecting, screening and pre-processing data in order to provide the resulting information to users for statistical analysis (and often creating analytical reports).

Ralph Kimball, one of the originators of the data warehouse concept, described a data warehouse as “a place where people can access their data” (see, for example, Ralph Kimball, “The Data Warehouse Toolkit: Practical Techniques for Building Dimensions Data Warehouses", John Wiley & Sons, 1996 and "The Data Webhouse Toolkit: Building the Web-Enabled Data Warehouse", John Wiley & Sons, 2000). He also formulated the basic requirements for data warehouses:

  • high speed support receiving data from storage;
  • maintaining internal data consistency;
  • the ability to obtain and compare so-called data slices (slice and dice);
  • availability of convenient utilities for viewing data in the storage;
  • completeness and reliability of stored data;
  • support for a high-quality data replenishment process.

It is often not possible to satisfy all of the above requirements within the same product. Therefore, to implement data warehouses, several products are usually used, some of which are actual data storage tools, others are tools for retrieving and viewing them, others are tools for replenishing them, etc.

A typical data warehouse is typically different from a typical relational database. First, regular databases are designed to help users perform day-to-day work, while data warehouses are designed for decision making. For example, the sale of goods and the issuance of invoices are carried out using a database designed for transaction processing, and the analysis of sales dynamics over several years, which allows planning work with suppliers, is carried out using a data warehouse.

Secondly, conventional databases are subject to constant changes as users work, and the data warehouse is relatively stable: the data in it is usually updated according to a schedule (for example, weekly, daily or hourly - depending on needs). Ideally, the enrichment process is simply adding new data over a period of time without changing previous information already in the warehouse.

And thirdly, regular databases are most often the source of data that ends up in the warehouse. In addition, the repository can be replenished from external sources, such as statistical reports.

What is OLAP

Decision support systems usually have the means to provide the user with aggregate data for various samples from the original set in a form convenient for perception and analysis. Typically, such aggregate functions form a multidimensional (and therefore non-relational) data set (often called a hypercube or metacube), whose axes contain parameters, and whose cells contain aggregate data that depends on them. Along each axis, data can be organized into a hierarchy, representing different levels of detail. Thanks to this data model, users can formulate complex queries, generate reports, obtain data subsets.

The technology for complex multidimensional data analysis is called OLAP (On-Line Analytical Processing). OLAP is a key component of data warehousing. The concept of OLAP was described in 1993 by Edgar Codd, a renowned database researcher and author of the relational data model (see E.F. Codd, S.B. Codd, and C.T. Salley, Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate. Technical report, 1993). In 1995, based on the requirements set out by Codd, the so-called FASMI test (Fast Analysis of Shared Multidimensional Information) was formulated, including the following requirements for applications for multidimensional analysis:

  • providing the user with analysis results for acceptable time(usually no more than 5 s), even at the cost of less detailed analysis;
  • the ability to carry out any logical and statistical analysis characteristic of this application, and saving it in a form accessible to the end user;
  • multi-user access to data with support for appropriate locking mechanisms and authorized access means;
  • multidimensional conceptual representation of data, including full support for hierarchies and multiple hierarchies (this is a key requirement of OLAP);
  • the ability to access any necessary information, regardless of its volume and storage location.

It should be noted that OLAP functionality can be implemented in various ways, from the simplest data analysis tools in office applications to distributed analytical systems based on server products. But before we talk about the various implementations of this functionality, let's look at what they are. OLAP cubes from a logical point of view.

Multidimensional cubes

In this section, we will take a closer look at the concept of OLAP and multidimensional cubes. As an example of a relational database that we will use to illustrate the principles of OLAP, we will use the Northwind database included in the distribution kits Microsoft SQL Server or Microsoft Access and is a typical database storing information about the trading operations of a company engaged in the wholesale supply of food. Such data includes information about suppliers, clients, delivery companies, a list of supplied goods and their categories, data about orders and ordered goods, a list of company employees. Detailed description Northwind databases can be found in the Microsoft SQL Server or Microsoft Access help systems - we do not list it here for lack of space.

To explore the concept of OLAP, we'll use the Invoices view and the Products and Categories tables from the Northwind database to create a query that will result in detailed information about all the goods ordered and invoices issued:

SELECT dbo.Invoices.Country, dbo.Invoices.City, dbo.Invoices.CustomerName, dbo.Invoices.Salesperson, dbo.Invoices.OrderDate, dbo.Categories.CategoryName, dbo.Invoices.ProductName, dbo.Invoices.ShipperName, dbo .Invoices.ExtendedPrice FROM dbo.Products INNER JOIN dbo.Categories ON dbo.Products.CategoryID = dbo.Categories.CategoryID INNER JOIN dbo.Invoices ON dbo.Products.ProductID = dbo.Invoices.ProductID

In Access 2000, a similar query looks like this:

SELECT Invoices.Country, Invoices.City, Invoices.Customers.CompanyName AS CustomerName, Invoices.Salesperson, Invoices.OrderDate, Categories.CategoryName, Invoices.ProductName, Invoices.Shippers.CompanyName AS ShipperName, Invoices.ExtendedPrice FROM Categories INNER JOIN (Invoices INNER JOIN Products ON Invoices.ProductID = Products.ProductID) ON Categories.CategoryID = Products.CategoryID;

This query accesses the Invoices view, which contains information about all invoices issued, as well as the Categories and Products tables, which contain information about the categories of products that were ordered and the products themselves, respectively. The result of this request is a set of order data that includes the category and name of the item ordered, the date the order was placed, the name of the invoicing person, the city, country and company name of the ordering company, as well as the name of the shipping company.

For convenience, let's save this request as a view, calling it Invoices1. The result of accessing this representation is shown in Fig. 1 .

What aggregate data can we get from this view? Typically these are answers to questions like:

  • What is the total value of orders placed by customers from France?
  • What is the total value of orders placed by customers in France and delivered by Speedy Express?
  • What is the total value of orders placed by customers in France in 1997 and delivered by Speedy Express?

Let's translate these questions into queries in SQL (Table 1).

The result of any of the above queries is a number. If in the first query you replace the 'France' parameter with 'Austria' or the name of another country, you can run this query again and get a different number. By performing this procedure with all countries, we get the following data set (a fragment is shown below):

Country SUM (ExtendedPrice)
Argentina 7327.3
Austria 110788.4
Belgium 28491.65
Brazil 97407.74
Canada 46190.1
Denmark 28392.32
Finland 15296.35
France 69185.48
Germany 209373.6

The resulting set of aggregate values ​​(in this case, sums) can be interpreted as a one-dimensional data set. The same data set can also be obtained as a result of a query with a GROUP BY clause of the following form:

SELECT Country, SUM (ExtendedPrice) FROM invoices1 GROUP BY Country

Now let's look at the second query above, which contains two conditions in the WHERE clause. If we run this query, plugging in all possible values ​​for the Country and ShipperName parameters, we will get a two-dimensional data set that looks like this (a snippet is shown below):

ShipperName
Country Federal Shipping Speedy Express United Package
Argentina 1 210.30 1 816.20 5 092.60
Austria 40 870.77 41 004.13 46 128.93
Belgium 11 393.30 4 717.56 17 713.99
Brazil 16 514.56 35 398.14 55 013.08
Canada 19 598.78 5 440.42 25 157.08
Denmark 18 295.30 6 573.97 7 791.74
Finland 4 889.84 5 966.21 7 954.00
France 28 737.23 21 140.18 31 480.90
Germany 53 474.88 94 847.12 81 962.58

Such a data set is called a pivot table or cross table. Many spreadsheets and desktop DBMSs allow you to create such tables - from Paradox for DOS to Microsoft Excel 2000. For example, this is what a similar query looks like in Microsoft Access 2000:

TRANSFORM Sum(Invoices1.ExtendedPrice) AS SumOfExtendedPrice SELECT Invoices1.Country FROM Invoices1 GROUP BY Invoices1.Country PIVOT Invoices1.ShipperName;

Aggregate data for such a pivot table can also be obtained using a regular GROUP BY query:

SELECT Country,ShipperName, SUM (ExtendedPrice) FROM invoices1 GROUP BY COUNTRY,ShipperName Note, however, that the result of this query will not be the pivot table itself, but only a set of aggregate data for its construction (a fragment is shown below):

Country ShipperName SUM (ExtendedPrice)
Argentina Federal Shipping 845.5
Austria Federal Shipping 35696.78
Belgium Federal Shipping 8747.3
Brazil Federal Shipping 13998.26

The third of the queries discussed above already has three parameters in the WHERE condition. By varying them, we obtain a three-dimensional data set (Fig. 2).

Cells of the cube shown in Fig. 2 contain aggregate data corresponding to the values ​​of the query parameters in the WHERE clause located on the cube axes.

You can obtain a set of two-dimensional tables by cutting a cube with planes parallel to its faces (the terms cross-sections and slices are used to denote them).

Obviously, the data contained in the cube cells can also be obtained using an appropriate query with a GROUP BY clause. In addition, some spreadsheets (particularly Microsoft Excel 2000) also allow you to plot a three-dimensional data set and view different cross-sections of the cube parallel to its face as shown on the workbook sheet.

If the WHERE clause contains four or more parameters, the resulting set of values ​​(also called an OLAP cube) can be 4-dimensional, 5-dimensional, etc.

Having looked at what multidimensional OLAP cubes are, let's move on to some key terms and concepts used in multidimensional data analysis.

Some terms and concepts

Along with sums, the cells of an OLAP cube may contain the results of executing other aggregate functions SQL language, such as MIN, MAX, AVG, COUNT, and in some cases, others (variance, standard deviation, etc.). To describe the data values ​​in cells, the term summary is used (in general, there can be several of them in one cube), the term measure is used to denote the source data on the basis of which they are calculated, and the term dimension is used to denote query parameters (translated into Russian usually referred to as "dimension" when talking about OLAP cubes, and as "dimension" when talking about data warehouses). The values ​​plotted on the axes are called dimension members.

When talking about measurements, it is worth mentioning that the values ​​plotted on the axes can have different levels of detail. For example, we may be interested in the total value of orders made by customers in different countries, or the total value of orders made by out-of-town customers or even individual customers. Naturally, the resulting set of aggregate data in the second and third cases will be more detailed than in the first. Note that the ability to obtain aggregate data with varying degrees of detail meets one of the requirements for data warehouses - the requirement for the availability of various data slices for comparison and analysis.

Since in the example considered, in general, each country can have several cities, and a city can have several clients, we can talk about hierarchies of values ​​in dimensions. In this case, countries are located at the first level of the hierarchy, cities are at the second, and clients are at the third (Fig. 3).

Note that hierarchies can be balanced, such as the hierarchy shown in Fig. 3, as well as hierarchies based on date-time and unbalanced data. A typical example of an unbalanced hierarchy is a “superior-subordinate” hierarchy (it can be built, for example, using the values ​​of the Salesperson field of the original data set from the example discussed above), shown in Fig. 4 .

Sometimes the term Parent-child hierarchy is used for such hierarchies.

There are also hierarchies that occupy an intermediate position between balanced and unbalanced (they are designated by the term ragged). They typically contain members whose logical "parents" are not at the immediately superior level (for example, a geographic hierarchy has the levels Country, City, and State, but there are countries in the dataset that have no states or regions between the Country and City levels ; Fig. 5).

Note that unbalanced and “uneven” hierarchies are not supported by all OLAP tools. For example, Microsoft Analysis Services 2000 supports both types of hierarchy, but Microsoft OLAP Services 7.0 supports only balanced ones. The number of hierarchy levels may vary in different OLAP tools, and the maximum valid number members of the same level, and the maximum possible number of dimensions themselves.

Conclusion

In this article we learned the basics of OLAP. We learned the following:

  • The purpose of data warehouses is to provide users with information for statistical analysis and management decision-making.
  • Data warehouses must provide high speed obtaining data, the ability to obtain and compare so-called data slices, as well as the consistency, completeness and reliability of the data.
  • OLAP (On-Line Analytical Processing) is a key component of building and using data warehouses. This technology is based on the construction of multidimensional data sets - OLAP cubes, the axes of which contain parameters, and the cells contain aggregate data that depends on them.
  • Applications with OLAP functionality must provide the user with analysis results in an acceptable time, perform logical and statistical analysis, support multi-user access to data, provide a multi-dimensional conceptual representation of data, and be able to access any necessary information.

In addition, we reviewed the basic principles of the logical organization of OLAP cubes, and also learned the basic terms and concepts used in multidimensional analysis. And finally, we found out what they are Various types hierarchies in OLAP cube dimensions.

In the next article in this series, we will look at the typical structure of data warehouses, talk about what client and server OLAP is, and also focus on some technical aspects of multidimensional data storage.

ComputerPress 4"2001

data warehouses are formed on the basis of snapshots of operational databases recorded over a long period of time information system and possibly various external sources. Data warehouses use database technologies, OLAP, deep data analysis, and data visualization.

Main characteristics of data warehouses.

  • contains historical data;
  • stores detailed information, as well as partially and completely summarized data;
  • the data is mostly static;
  • an ad hoc, unstructured and heuristic way of processing data;
  • medium and low transaction processing intensity;
  • unpredictable way of using data;
  • intended for analysis;
  • focused on subject areas;
  • support for strategic decision making;
  • serves a relatively small number of management employees.

The term OLAP (On-Line Analytical Processing) is used to describe the model for presenting data and, accordingly, the technology for processing it in data warehouses. OLAP uses a multidimensional view of aggregated data to provide quick access to strategically important information for the purpose of in-depth analysis. OLAP applications must have the following basic properties:

  • multidimensional data presentation;
  • support for complex calculations;
  • correct consideration of the time factor.

Advantages of OLAP:

  • promotion productivity production staff, developers application programs. Timely access to strategic information.
  • providing sufficient opportunity for users to make their own changes to the schema.
  • OLAP applications rely on data warehouses and OLTP systems, receiving current data from them, which allows saving integrity control corporate data.
  • reducing the load on OLTP systems and data warehouses.

OLAP and OLTP. Characteristics and main differences

OLAP OLTP
Data store should include both internal corporate data and external data the main source of information entering the operational database is the activities of the corporation, and data analysis requires the involvement of external sources of information (for example, statistical reports)
The volume of analytical databases is at least an order of magnitude larger than the volume of operational ones. to conduct reliable analysis and forecasting in data store you need to have information about the corporation’s activities and market conditions over several years For prompt processing, data for the last few months is required
Data store must contain uniformly presented and consistent information that is as close as possible to the content of operational databases. A component is needed to extract and “clean” information from different sources. In many large corporations, several operational information systems with their own databases simultaneously exist (for historical reasons). Operational databases may contain semantically equivalent information presented in different formats, with different indications of the time of its receipt, sometimes even contradictory
The set of queries to an analytical database cannot be predicted. data warehouses exist to respond to ad hoc requests from analysts. You can only count on the fact that requests will not come too often and will involve large amounts of information. The size of the analytical database encourages the use of queries with aggregates (sum, minimum, maximum, average value etc.) Data processing systems are created to solve specific problems. Information from the database is selected frequently and in small portions. Typically, a set of queries to an operational database is known already during design
When the variability of analytical databases is low (only when loading data), the ordering of arrays, faster indexing methods for mass sampling, and storage of pre-aggregated data turn out to be reasonable Data processing systems by their nature are highly variable, which is taken into account in the DBMS used (normalized database structure, rows stored out of order, B-trees for indexing, transactional)
Analytical database information is so critical for a corporation that greater granularity of protection is required (individual access rights to certain rows and/or columns of the table) For data processing systems it is usually sufficient information protection at table level

Codd's rules for OLAP systems

In 1993, Codd published OLAP for User Analysts: What It Should Be. In it, he outlined the basic concepts of online analytics and defined 12 rules that must be met by products that provide online analytics capabilities.

  1. Conceptual multidimensional representation. An OLAP model must be multidimensional at its core. A multidimensional conceptual diagram or custom representation facilitates modeling and analysis as well as calculations.
  2. Transparency. The user is able to obtain all the necessary data from the OLAP engine, without even knowing where it comes from. Regardless of whether the OLAP product is part of the user's tools or not, this fact should be invisible to the user. If OLAP is provided by client-server computing, then this fact should also, if possible, be invisible to the user. OLAP must be provided in the context of a truly open architecture, allowing the user, wherever he is, to communicate through an analytical tool with the server. In addition to this, transparency should also be achieved when the analytical tool interacts with homogeneous and heterogeneous database environments.
  3. Availability. OLAP must provide its own logic circuit to access in a heterogeneous database environment and perform appropriate transformations to provide data to the user. Moreover, it is necessary to take care in advance about where and how, and what types of physical organization of data will actually be used. An OLAP system should access only the data that is actually required, and not apply general principle“kitchen funnel”, which entails unnecessary input.
  4. Constant performance when developing reports. Performance the ability to generate reports should not drop significantly as the number of dimensions and database size increases.
  5. Client-server architecture. It requires that the product not only be client-server, but also that the server component be intelligent enough to allow different clients to connect with a minimum of effort and programming.
  6. General multidimensionality. All dimensions must be equal, each dimension must be equivalent in both structure and operational capabilities. True, additional operational capabilities for individual dimensions (presumably time is implied), but such additional functionality must be provided to any dimension. It should not be so that basic data structures, computational or reporting formats were more specific to one dimension.
  7. Dynamic Control sparse matrices. OLAP systems must automatically adjust their physical schema depending on the model type, data volumes, and database sparsity.
  8. Multi-user support. An OLAP tool must provide capabilities sharing(query and completion), integrity and security.
  9. Unlimited cross operations. All types of operations must be allowed for any measurements.
  10. Intuitive data manipulation. Data manipulation was carried out through direct actions on cells in viewing mode without using menus and multiple operations.
  11. Flexible reporting options. Dimensions should be placed in the report the way the user needs it.
  12. Unlimited

In 1993, the founder of the relational approach to database construction, Edgar Codd and his partners (Edgar Codd, a mathematician and IBM Fellow), published an article initiated by Arbor Software (today the famous company Hyperion Solutions), entitled "Providing OLAP (online analytical processing) for analytical users", which formulates 12 features OLAP technologies, which were subsequently supplemented by six more. These provisions became the main content of a new and very promising technology.

Main features of OLAP (Basic) technology:

  • multidimensional conceptual representation of data;
  • intuitive data manipulation;
  • availability and detail of data;
  • batch data extraction vs. interpretation;
  • OLAP analysis models;
  • client-server architecture (OLAP accessible from the desktop);
  • transparency (transparent access to external data);
  • multi-user support.

Special Features:

  • processing of unformalized data;
  • saving OLAP results: storing them separately from the source data;
  • exclusion of missing values;
  • Handling missing values.

Report presentation features:

  • flexibility in reporting;
  • standard reporting performance;
  • automatic setting physical layer of data retrieval.

Dimension management:

  • universality of measurements;
  • unlimited number of dimensions and aggregation levels;
  • unlimited number of operations between dimensions.

Historically, today the term "OLAP" implies not only a multidimensional view of data from the end user, but also a multidimensional view of the data in the target database. This is precisely why the terms “Relational OLAP” (ROLAP) and “Multidimensional OLAP” (MOLAP) appeared as independent terms.

OLAP service is a tool for analyzing large volumes of data in real time. By interacting with the OLAP system, the user will be able to flexibly view information, obtain arbitrary data slices and perform analytical operations of drill-down, roll-up, end-to-end distribution, and comparison over time using many parameters simultaneously. All work with the OLAP system occurs in terms of the subject area and allows you to build statistically sound models of the business situation.

Software OLAP is a tool for the operational analysis of data contained in a warehouse. Main feature is that these tools are intended for use by non-specialists in the field information technologies, not an expert statistician, but a professional in the applied field of management - a manager of a department, department, management, and, finally, a director. The tools are designed to allow the analyst to communicate with the problem, not with the computer. In Fig. Figure 6.14 shows a basic OLAP cube that allows you to evaluate data along three dimensions.


A multidimensional OLAP cube and a system of corresponding mathematical algorithms for statistical processing allows you to analyze data of any complexity at any time interval.

Rice. 6.14. Elementary OLAP cube

Having flexible mechanisms for data manipulation and visual display at his disposal (Fig. 6.15, Fig. 6.16), the manager first examines from different angles the data that may (or may not) be related to the problem being solved.

Next, he compares various business indicators with each other, trying to identify hidden relationships; can look at the data more closely, in detail, for example, breaking it down into components by time, region or customer, or, conversely, further generalize the presentation of information to remove distracting details. After this, using the statistical evaluation and simulation module, several options for the development of events are constructed, and the most acceptable option is selected from them.

Rice. 6.15.

A company manager, for example, may have a hypothesis that the spread of asset growth in various branches of the company depends on the ratio of specialists with technical and economic education in them. To test this hypothesis, the manager can request from the warehouse and display on a graph the ratio of interest for those branches whose asset growth in the current quarter decreased by more than 10% compared to last year, and for those which increased by more than 25%. He should be able to use a simple selection from the menu provided. If the results obtained significantly fall into two corresponding groups, then this should become an incentive for further testing of the hypothesis put forward.

Currently, a direction called dynamic modeling (Dynamic Simulation), which fully implements the above-mentioned FASMI principle, has received rapid development.

Using dynamic modeling, the analyst builds a model of a business situation that develops over time, according to a certain scenario. Moreover, the result of such modeling can be several new business situations that generate a tree possible solutions with an assessment of the likelihood and prospects of each.

Rice. 6.16. Analytical IS for data extraction, processing and presentation of information

Table 6.3 shows the comparative characteristics of static and dynamic analysis.

The concept of OLAP technology was formulated by Edgar Codd in 1993.

This technology is based on the construction of multidimensional data sets - so-called OLAP cubes (not necessarily three-dimensional, as one might conclude from the definition). The purpose of using OLAP technologies is to analyze data and present this analysis in a form convenient for management personnel to understand and make decisions based on them.

Basic requirements for applications for multivariate analysis:

  • - providing the user with analysis results in an acceptable time (no more than 5 s);
  • - multi-user access to data;
  • - multidimensional data presentation;
  • - the ability to access any information regardless of its storage location and volume.

OLAP system tools provide the ability to sort and select data according to specified conditions. Various qualitative and quantitative conditions can be specified.

The main data model used in numerous tools ah creation and support of databases - DBMS is a relational model. The data in it is presented in the form of a set of two-dimensional relation tables connected by key fields. To eliminate duplication, inconsistency, and reduce labor costs for maintaining databases, a formal apparatus for normalizing entity tables is used. However, its use is associated with additional time spent on generating responses to database queries, although memory resources are saved.

A multidimensional data model represents the object under study in the form of a multidimensional cube; a three-dimensional model is more often used. Dimensions or attribute details are plotted along the axes or faces of the cube. The base attributes are the filling of the cube cells. A multidimensional cube can be represented by a combination of three-dimensional cubes in order to facilitate perception and presentation when generating reporting and analytical documents and multimedia presentations based on analytical work materials in a decision support system.

Within the framework of OLAP technologies, based on the fact that a multidimensional representation of data can be organized both by means of relational DBMSs and multidimensional specialized tools, three types of multidimensional OLAP systems are distinguished:

  • - multidimensional OLAP-MOLAP;
  • - relational OLAP-ROLAP;
  • - mixed or hybrid (Hibrid) OLAP-HOLAP.

In multidimensional DBMSs, data is organized not in the form of relational tables, but in the form of ordered multidimensional arrays in the form of hypercubes, when all stored data must have the same dimension, which means the need to form the most complete basis of dimensions. Data can be organized in the form of polycubes; in this option, the values ​​of each indicator are stored with its own set of dimensions, and data processing is carried out by the system’s own tool. The storage structure in this case is simplified, because there is no need for a data storage area in a multidimensional or object-oriented form. Huge labor costs for creating models and systems for converting data from a relational model to an object model are reduced.

The advantages of MOLAP are:

  • - faster receipt of responses to requests than with ROLAP - the time spent is one or two orders of magnitude less;
  • - because of SQL restrictions Implementation of many built-in functions becomes difficult.

Limitations of MOLAP include:

  • - relatively small database sizes;
  • - due to denormalization and pre-aggregation multidimensional arrays use 2.5-100 times more memory than the original data (memory consumption increases exponentially as the number of measurements increases);
  • - there are no standards for the interface and means of data manipulation;
  • - there are restrictions when loading data.

Labor costs for creating multidimensional data increase sharply, because... In this situation, there are practically no specialized means of objectifying the relational model of data contained in the information repository. Query response times often cannot meet the requirements for OLAP systems.

The advantages of ROLAP systems are:

  • - the ability to quickly analyze the data directly contained in the warehouse, because most source databases are relational;
  • - with a variable dimension of the problem, RO-LAP wins, because no physical reorganization of the database is required;
  • - ROLAP systems can use less powerful client stations and servers, and the servers bear the main burden of processing complex SQL queries;
  • - the level of information protection and differentiation of access rights in relational DBMSs is incomparably higher than in multidimensional ones.

The disadvantage of ROLAP systems is lower performance, the need for careful development of database schemas, special setting indexes, analysis of query statistics and taking into account the analysis findings when modifying database schemas, which leads to significant additional labor costs.

Fulfillment of these conditions allows, when using ROLAP systems, to achieve indicators similar to MOLAP systems in terms of access time, as well as surpass them in memory savings.

Hybrid OLAP systems are a combination of tools that implement a relational and multidimensional data model. This allows you to dramatically reduce the resource costs of creating and maintaining such a model and the response time to requests.

This approach uses the advantages of the first two approaches and compensates for their disadvantages. This principle is implemented in the most developed software products for this purpose.

The use of hybrid architecture in OLAP systems is the most appropriate way to solve problems associated with the use of software tools in multidimensional analysis.

The pattern detection mode is based on intelligent data processing. The main task here is to identify patterns in the processes under study, relationships and mutual influence of various factors, search for large “unusual” deviations, and predict the course of various significant processes. This area relates to data mining.

The main difference between facts and information is that we receive and take into account data, and we can use information to benefit. Roughly speaking, information is analyzed and systematized data. Thanks to timely information received, many companies manage to survive both the financial crisis and the fiercest competition. It is not enough to collect facts and have all the necessary data. You also need to be able to analyze them. To facilitate the work of people called upon to make important business decisions, various systems support. It is for this purpose that various complex systems have been developed that allow analyzing large amounts of heterogeneous data and turning them into information useful for the business user. New area business analytics aims to improve process management of business systems through the use of data warehouses and technologies.

The market for business information systems today offers a diverse selection of solutions that help an enterprise organize management accounting, ensure operational management of production and sales, and carry out effective interaction with customers and suppliers.

A separate niche in the business systems market is occupied by analytical software products designed to support decision-making at the strategic level of enterprise management. The main difference between such tools and operational management systems is that the latter provide enterprise management in the “functioning mode,” that is, the implementation of a well-defined production program, while strategic-level analytical systems help the enterprise management develop decisions in the “development mode.”

The scale of the changes carried out may vary from deep restructuring to partial updating of technologies at individual production sites, but, in any case, decision makers consider development alternatives on which the fate of the enterprise in the long term depends.

No matter how powerful and developed Information system enterprises, it cannot help in resolving these issues, firstly, because it is configured for stationary, established business processes, and secondly, it does not and cannot contain information for making decisions regarding new areas of business, new technologies , new organizational solutions.

Thanks to OLAP (On-Line Analytical Processing) data processing and analysis technology, any organization can almost instantly (within five seconds) obtain the data necessary for work. OLAP can be briefly defined by five keywords.

FAST (Fast) - this means that the time to search and provide the necessary information takes no more than five seconds. The simplest requests are processed in a second, and few complex requests have a processing time of more than twenty seconds. To achieve this result, use various methods, from special forms of data storage to extensive pre-computations. This way, you can get a report in a minute that previously took days to prepare.

ANALYSIS (Analytical) says that the system can perform any analysis, both statistical and logical, and then stores it in an accessible form.

SHARED means the system provides the required privacy down to the cell level

MULTIDIMENSIONAL (Multidimensional) is the main characteristic of OLAP. The system must fully support hierarchies and multiple hierarchies, since this is the most logical way to analyze both business and organizational activities.

INFORMATION. Necessary information must be delivered where it is needed.

When an organization operates, data related to its field of activity always accumulates, which is sometimes stored in completely different places, and bringing them together is both difficult and time-consuming. It is in order to speed up the acquisition of data to test emerging business hypotheses that the technology of interactive analytical data processing, or OLAP, was developed. The main purpose of such OLAP systems is to quickly respond to arbitrary user requests. This need often arises when developing some important business project, when the developer needs a working hypothesis that has arisen. More often needed by the user information should be presented in the form of some kind of dependence - for example, how sales volume depends on the product category, on the sales region, on the time of year, and so on. Thanks to OLAP, he is able to immediately obtain the necessary data in the desired layout for the selected period.

Interactive OLAP technology allows you to transform huge piles of reports and mountains of data into useful and accurate information that will help an employee make an informed business or financial decision at the right time.

In addition, thanks to OLAP, processing efficiency increases, and the user can receive large volumes of sorted (aggregated) information almost instantly. Thanks to OLAP, the user can clearly see how efficiently his organization operates and has the ability to quickly and flexibly respond to external changes, has the ability to minimize financial losses your organization. OLAP provides accurate information that improves the quality of decisions made.

The only drawback of business analysis systems is their high cost. Creating a personal information storage requires both time and a lot of money.

The use of OLAP technology in business allows you to quickly obtain necessary information, which, at the user’s request, can be presented in the usual form - reports, graphs or tables.

System integration procedures for business structures are based on the use of joint ERP, CRM and SCM solutions. In many cases systems are supplied by different manufacturers, and the imported data must undergo data harmonization and presentation as heterogeneous data. In a business environment, there is an unambiguous requirement - a complete analysis of data, which involves viewing consolidated reports from different points of view.

Different manufacturers have different data reporting mechanisms. The heterogeneous representation procedure involves extract, transform, and load (ETL). For example, in Microsoft SQL Server 2005 Analysis Services, the data consolidation problem is implemented with using Data Source Views – types of data sources that describe analytical presentation models.

Business applications based on OLAP technologies, examples of products. The most common applications of OLAP technologies are:

Data analysis.

The task for which the most popular OLAP tools were originally used and still remain. A multidimensional data model, the ability to analyze significant amounts of data and quick response to requests make such systems indispensable for analyzing sales, marketing activities, distribution and other tasks with a large volume of source data.

Examples of products: Microsoft Excel Pivot Tables, Microsoft Analysis Services, SAP BW, Oracle Essbase, Oracle OLAP, Cognos PowerPlay, MicroStrategy, Business Objects.

Financial planning-budgeting.

A multidimensional model allows you to simultaneously enter data and easily analyze it (for example, plan fact analysis). Therefore a number modern products The CPM (Corporate Performance Management) class uses OLAP% models. An important task is multidimensional reverse calculation (backsolve, breakback, writeback), which allows you to calculate the required changes in detailed cells when the aggregated value changes. It is a tool for what-if analysis, i.e. for playing various options for events during planning.

Product examples: Microsoft PerformancePint, Oracle EPB, Oracle OFA, Oracle Hyperion Planning, SAP SEM, Cognos Enterprise Planning, Geac.

Financial consolidation.

Data consolidation according to international standards accounting, taking into account ownership shares, different currencies and internal turnover - an urgent task in connection with the tightening requirements of inspection bodies (SOX, Basel II) and companies entering IPO. OLAP technologies allow you to speed up the calculation of consolidated reports and increase the transparency of the entire process.

Product examples: Oracle FCH, Oracle Hyperion FM, Cognos Controller.

Data Warehousing and On-Line Analytical Processing (OLAP) technologies
are important elements of business decision support, which are increasingly becoming an integral part of any industry. The use of OLAP technologies as a tool for business analytics gives more control and timely access to strategic
information that facilitates effective decision making.
This provides the opportunity to simulate real-life forecasts and use resources more efficiently. OLAP allows an organization to respond more quickly to market demands.

Bibliography:

1. Erik Thomsen. OLAP Solutions: Building Multidimensional Information Systems Second Edition. Wiley Computer Publishing John Wiley & Sons, Inc., 2002.

2. OLAP council white paper, http://www.olapcouncil.org/research/whtpaply.htm

3. Gerd Stumme and Bernhard Ganter. Formal Concept Analysis _ Mathematical Foundations.