
The purpose of this course work is to study OLAP technology, the concept of its implementation, and its structure.

In the modern world, computer networks and computing systems make it possible to analyze and process large amounts of data.

A large amount of information greatly complicates the search for solutions, but also makes much more accurate calculations and analysis possible. To solve this problem, there is a whole class of information systems that perform analysis. Such systems are called decision support systems (DSS, Decision Support System).

To perform analysis, a DSS must accumulate information and have means for its input and storage. In total, we can distinguish three main tasks solved by a DSS:

· data input;

· data storage;

· data analysis.

Data entry into the DSS is carried out automatically from sensors characterizing the state of the environment or process, or by a human operator.

If data input is carried out automatically from sensors, the data is accumulated on a readiness signal that occurs when information appears, or by cyclic polling. If the input is performed by a person, the system must provide users with convenient means of entering data, checking it for correctness, and performing the necessary calculations.

When data is entered by several operators simultaneously, it is necessary to solve the problems of concurrent access to and modification of the same data.

A DSS provides the analyst with data in the form of reports, tables, and graphs for study and analysis; this is why such systems are said to provide decision support functions.

Data entry subsystems, called OLTP (On-Line Transaction Processing) systems, implement operational data processing. Conventional database management systems (DBMS) are used to implement them.

The analysis subsystem can be built on the basis of:

· information retrieval subsystems based on relational DBMSs and static queries in the SQL language;

· operational analysis subsystems. Such subsystems use OLAP, the technology of on-line analytical data processing, which is based on the concept of multidimensional data representation;

· intelligent analysis subsystems, which implement Data Mining methods and algorithms.

From the user's point of view, OLAP systems provide tools for flexible viewing of information in various sections, automatic production of aggregated data, and analytical operations such as roll-up, drill-down, and comparison over time. Thanks to all this, OLAP systems are a solution with great advantages in preparing data for all types of business reporting that present data in various sections and at different levels of hierarchy, such as sales reports and various forms of budgets. OLAP systems also bring the advantages of such a representation to other forms of data analysis, including forecasting.
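As an illustration only, here is a minimal Python sketch of the kind of sectioned aggregation an OLAP tool automates; the fact table and field layout are invented for the example:

```python
from collections import defaultdict

# Toy fact table: (region, product, month, amount) -- hypothetical data
facts = [
    ("North", "Tea",    "Jan", 100.0),
    ("North", "Coffee", "Jan",  80.0),
    ("South", "Tea",    "Jan",  50.0),
    ("South", "Tea",    "Feb",  70.0),
]

def roll_up(facts, keep):
    # Sum the fact over all dimensions except those listed in `keep`
    # (indices into the dimension part of each tuple).
    totals = defaultdict(float)
    for *dims, amount in facts:
        totals[tuple(dims[i] for i in keep)] += amount
    return dict(totals)

by_region = roll_up(facts, keep=[0])           # roll-up over product and month
by_region_month = roll_up(facts, keep=[0, 2])  # drill down to months
```

Keeping more dimension indices corresponds to drilling down; keeping fewer corresponds to rolling up.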

1.2 Definition of OLAP systems

The technology for complex multidimensional data analysis is called OLAP. OLAP is a key component of a data warehouse organization.

OLAP functionality can be implemented in various ways, both simple ones, such as data analysis in office applications, and more complex ones - distributed analytical systems based on server products.

OLAP (On-Line Analytical Processing) is a technology for on-line analytical data processing that uses tools and methods for collecting, storing, and analyzing multidimensional data to support decision-making processes.

The main purpose of OLAP systems is to support analytical activities and the ad hoc queries of analyst users. The purpose of OLAP analysis is to test emerging hypotheses.


Course work

discipline: Databases

Subject: OLAP Technology

Completed:

Chizhikov Alexander Alexandrovich

Introduction

1. Classification of OLAP products

2. OLAP client - OLAP server: pros and cons

3. Core OLAP system

3.1 Design principles

Conclusion

List of sources used

Applications

Introduction

It is difficult to find a person in the computer world who does not understand, at least on an intuitive level, what databases are and why they are needed. Unlike traditional relational DBMSs, the concept of OLAP is not so widely known, although almost everyone has probably heard the mysterious term "OLAP cubes". So what is OnLine Analytical Processing?

OLAP is not a separate software product, not a programming language, and not even a specific technology. If we try to cover OLAP in all its manifestations, it is a set of concepts, principles, and requirements underlying software products that make it easier for analysts to access data. Although hardly anyone would disagree with such a definition, it is doubtful that it brings non-specialists one iota closer to understanding the subject. Therefore, in our quest to understand OLAP, it is better to take a different path. First, we need to find out why analysts need specially facilitated access to data.

The fact is that analysts are special consumers of corporate information. The analyst's task is to find patterns in large amounts of data. Therefore, the analyst will not pay attention to a single fact; he needs information about hundreds and thousands of events. By the way, one of the significant factors that led to the emergence of OLAP is productivity and efficiency. Let's imagine what happens when an analyst needs to obtain information but there are no OLAP tools in the enterprise. The analyst independently (which is unlikely) or with the help of a programmer writes the appropriate SQL query and receives the data of interest in the form of a report, or exports it to a spreadsheet. A great many problems arise here. First, the analyst is forced to do something other than his job (SQL programming) or to wait for programmers to complete the task for him, all of which negatively affects labor productivity (and raises the rate of heart attacks and strokes, and so on). Second, a single report or table, as a rule, does not satisfy these giants of thought and fathers of Russian analysis, so the whole procedure has to be repeated again and again. Third, as we have already found out, analysts do not ask about trifles; they need everything at once. This means (although technology is advancing by leaps and bounds) that the corporate relational DBMS server the analyst queries can think deeply and for a long time, blocking other transactions.

The concept of OLAP appeared precisely to solve such problems. OLAP cubes are essentially meta-reports. By cutting meta-reports (that is, cubes) along dimensions, the analyst actually receives the "regular" two-dimensional reports that interest him (these are not necessarily reports in the usual sense of the term; we are talking about data structures with the same functions). The advantages of cubes are obvious: data needs to be requested from the relational DBMS only once, when the cube is built. Since analysts, as a rule, do not work with information that is supplemented and changed on the fly, the generated cube remains relevant for quite a long time. Thanks to this, not only are interruptions in the operation of the relational DBMS server eliminated (there are no queries with thousands and millions of response rows), but the data access speed for the analyst himself also increases dramatically. In addition, as already noted, performance is improved by calculating subtotals of hierarchies and other aggregated values at the time the cube is constructed.

Of course, you have to pay for increasing productivity in this way. It is sometimes said that the data structure simply "explodes" - an OLAP cube can take up tens or even hundreds of times more space than the original data.

Now that we have figured out a little about how OLAP works and what it is for, it is nevertheless worth formalizing our knowledge somewhat and stating the OLAP criteria without simultaneous translation into ordinary human language. These criteria (12 in total) were formulated in 1993 by E.F. Codd, the creator of the relational data model and, concurrently, of the OLAP concept. We will not consider them directly, since they were later reworked into the so-called FASMI test, which defines the requirements for OLAP products. FASMI is an abbreviation of the names of the test items:

Fast. This property means that the system should respond to a user request in five seconds on average; most requests should be handled within one second, and the most complex within twenty seconds. Recent studies have shown that the user begins to doubt the success of a request if it takes more than thirty seconds.

Analysis. The system must be able to handle any logical and statistical analysis typical of business applications and ensure that the results are saved in a form accessible to the end user. Analysis tools may include procedures for time-series analysis, cost allocation, currency conversion, modeling changes in organizational structures, and some others.

Shared. The system should provide ample opportunities for restricting access to data and simultaneous operation of many users.

Multidimensional. The system must provide a conceptually multidimensional view of the data, including full support for multiple hierarchies.

Information. The power of various software products is characterized by the amount of input data processed. Different OLAP systems have different capacities: advanced OLAP solutions can handle at least a thousand times more data than the least powerful ones. When choosing an OLAP tool, there are a number of factors to consider, including data duplication, memory requirements, disk space usage, performance metrics, integration with information warehouses, and so on.

1. Classification of OLAP products

So, the essence of OLAP is that the initial information for analysis is presented in the form of a multidimensional cube, and it is possible to arbitrarily manipulate it and obtain the necessary information sections - reports. In this case, the end user sees the cube as a multidimensional dynamic table that automatically summarizes data (facts) in various sections (dimensions), and allows interactive management of calculations and report form. These operations are performed by the OLAP engine (or OLAP calculation engine).

Today, many products have been developed around the world that implement OLAP technologies. To make it easier to navigate among them, classifications of OLAP products are used: by the method of storing data for analysis and by the location of the OLAP machine. Let's take a closer look at each category of OLAP products.

I'll start with the classification based on the method of data storage. Let me remind you that multidimensional cubes are built on the basis of source and aggregate data. Both source and aggregate data for cubes can be stored in either relational or multidimensional databases. Therefore, three methods of data storage are currently used: MOLAP (Multidimensional OLAP), ROLAP (Relational OLAP), and HOLAP (Hybrid OLAP). Accordingly, by the method of data storage, OLAP products are divided into three corresponding categories:

1. In the case of MOLAP, source and aggregate data are stored in a multidimensional database or in a multidimensional local cube.

2. In ROLAP products, source data is stored in relational databases or in flat local tables on a file server. Aggregate data can be placed in service tables in the same database. Conversion of data from the relational database into multidimensional cubes occurs at the request of the OLAP tool.

3. When using the HOLAP architecture, the source data remains in the relational database, while the aggregates are placed in the multidimensional one. An OLAP cube is built at the request of the OLAP tool based on the relational and multidimensional data.

The next classification is based on the location of the OLAP machine. Based on this feature, OLAP products are divided into OLAP servers and OLAP clients:

In server OLAP tools, calculations and storage of aggregate data are performed by a separate process - the server. The client application receives only the results of queries against multidimensional cubes that are stored on the server. Some OLAP servers support data storage only in relational databases, some only in multidimensional ones. Many modern OLAP servers support all three data storage methods: MOLAP, ROLAP and HOLAP.

The OLAP client is designed differently. Multidimensional cube building and OLAP calculations are performed in the memory of the client computer. OLAP clients are also divided into ROLAP and MOLAP. And some may support both data access options.

Each of these approaches has its own pros and cons. Contrary to popular belief about the advantages of server tools over client tools, in a number of cases, the use of an OLAP client for users can be more efficient and more profitable than using an OLAP server.

2. OLAP client - OLAP server: pros and cons

When building an information system, OLAP functionality can be implemented by both server-side and client-side OLAP tools. In practice, the choice is a compromise between performance and the cost of the software.

The volume of data is determined by the combination of the following characteristics: number of records, number of dimensions, number of dimension elements, length of dimensions and number of facts. It is known that an OLAP server can process larger volumes of data than an OLAP client with equal computer power. This is because the OLAP server stores a multidimensional database containing precomputed cubes on hard drives.

When performing OLAP operations, client programs query it in an SQL-like language, receiving not the entire cube but only its displayed fragments. An OLAP client, by contrast, must hold the whole cube in random-access memory at the time of operation. In the case of a ROLAP architecture, it must first load into memory the entire data array used to calculate the cube. In addition, as the number of dimensions, facts, or dimension members grows, the number of aggregates grows exponentially. Thus, the amount of data an OLAP client can process is directly dependent on the amount of RAM on the user's PC.
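The growth in the number of aggregates can be sketched with a back-of-the-envelope calculation. Assuming, for illustration, that each dimension contributes one extra "All" (total) member, a fully pre-aggregated cube has a cell for every combination of members and totals:

```python
from math import prod

def aggregate_cells(member_counts):
    # Each dimension contributes its members plus one "All" total
    # member, so the fully aggregated cube has prod(n + 1) cells.
    return prod(n + 1 for n in member_counts)

three_dims = aggregate_cells([10, 10, 10])      # 11**3 = 1331
four_dims = aggregate_cells([10, 10, 10, 10])   # 11**4 = 14641
```

Adding one more dimension of the same size multiplies the cell count by roughly the dimension's member count, which is why memory demands climb so quickly.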

However, note that most OLAP clients support distributed computing. Therefore, the number of processed records that limits a client OLAP tool is understood not as the volume of primary data in the corporate database, but as the size of the aggregated sample drawn from it. The OLAP client generates a request to the DBMS describing the filtering conditions and the algorithm for preliminary grouping of the primary data. The server finds and groups records and returns a compact selection for further OLAP calculations. The size of this sample can be tens or hundreds of times smaller than the volume of primary, non-aggregated records. Consequently, such an OLAP client's need for PC resources is significantly reduced.

In addition, the number of dimensions is limited by human perception. It is known that the average person can simultaneously operate with 3-4, at most 8, dimensions. With a larger number of dimensions in a dynamic table, the perception of information becomes significantly more difficult. This factor should be taken into account when estimating in advance the RAM that an OLAP client may require.

The length of the dimensions also affects the size of the OLAP engine's address space when computing an OLAP cube: the longer the dimensions, the more resources are required to pre-sort the multidimensional array, and vice versa. Having only short dimensions in the source data is another argument in favor of the OLAP client.

The next characteristic, performance, is determined by the two factors discussed above: the volume of data being processed and the power of the computers. As the number of dimensions increases, for example, the performance of all OLAP tools decreases due to a significant increase in the number of aggregates, but the rate of decrease differs. Let's demonstrate this dependence on a graph.

Diagram 1. Dependence of the performance of client and server OLAP tools on data growth

The performance characteristics of an OLAP server are less sensitive to data growth. This is due to different technologies for processing user requests by the OLAP server and the OLAP client. For example, during a drill operation, the OLAP server accesses the stored data and "pulls" the data of this "branch". The OLAP client calculates the entire set of aggregates at the time of loading. However, up to a certain amount of data, the performance of server and client tools is comparable. For OLAP clients that support distributed computing, the scope of performance comparability can extend to data volumes that cover the OLAP analysis needs of a huge number of users. This is confirmed by the results of internal testing of MS OLAP Server and the OLAP client "Kontur Standard". The test was performed on an IBM PC Pentium Celeron 400 MHz, 256 Mb for a sample of 1 million unique (i.e., aggregated) records with 7 dimensions containing from 10 to 70 members. The cube loading time in both cases does not exceed 1 second, and various OLAP operations (drill up, drill down, move, filter, etc.) are completed in hundredths of a second.

When the sample size exceeds the amount of RAM, swapping with the disk begins and the performance of the OLAP client drops sharply. Only from this moment can we talk about the advantage of the OLAP server.

It should be remembered that the “breaking point” determines the limit of a sharp increase in the cost of an OLAP solution. For the tasks of each specific user, this point is easily determined by performance tests of the OLAP client. Such tests can be obtained from the development company.

In addition, the cost of a server OLAP solution increases as the number of users grows. The fact is that the OLAP server performs calculations for all users on one computer; accordingly, the more users there are, the more RAM and processing power are required. Thus, if the volumes of processed data lie in the area of comparable performance of server and client systems, then, other things being equal, using an OLAP client will be more profitable.

Using an OLAP server in the "classic" ideology involves uploading data from relational DBMSs to a multidimensional database. The upload is performed for a certain period, so the OLAP server data does not reflect the state at the current moment. Only those OLAP servers that support the ROLAP mode of operation are deprived of this shortcoming.

Similarly, a number of OLAP clients allow for ROLAP and Desktop architectures with direct database access. This ensures on-line analysis of source data.

An OLAP server makes minimal demands on the power of client terminals. Objectively, the requirements of an OLAP client are higher, because it performs calculations in the user PC's RAM. The state of a particular organization's hardware fleet is the most important indicator to take into account when choosing an OLAP tool. But here, too, there are pros and cons. An OLAP server does not use the enormous computing power of modern personal computers. If an organization already has a fleet of modern PCs, it is inefficient to use them only as display terminals while incurring additional costs for a central server.

If the power of the users' computers "leaves much to be desired," the OLAP client will work slowly or not be able to work at all. Buying one powerful server may be cheaper than upgrading all your PCs.

Here it is useful to take into account trends in hardware development. Since the volume of data for analysis is practically constant, a steady increase in PC power will lead to an expansion of the capabilities of OLAP clients and their displacement of OLAP servers into the segment of very large databases.

When using an OLAP server over the network, only the data to be displayed is transferred to the client's PC, while the OLAP client receives the entire volume of primary data.

Therefore, where an OLAP client is used, network traffic will be higher.

However, when using an OLAP server, user operations such as drill-down generate new queries to the multidimensional database and, therefore, new data transfers. The execution of OLAP operations by an OLAP client takes place in RAM and, accordingly, does not cause new data flows in the network.

It should also be noted that modern network hardware provides a high level of bandwidth.

Therefore, in the vast majority of cases, analyzing a “medium” sized database using an OLAP client will not slow down the user’s work.

The cost of an OLAP server is quite high. This should also include the cost of a dedicated computer and the ongoing costs of administering a multidimensional database. In addition, the implementation and maintenance of an OLAP server requires fairly highly qualified personnel.

The cost of an OLAP client is an order of magnitude lower than the cost of an OLAP server. No administration or additional technical equipment is required for the server. There are no high requirements for personnel qualifications when implementing an OLAP client. An OLAP client can be implemented much faster than an OLAP server.

Development of analytical applications using client OLAP tools is fast and requires no special training. A user who knows the physical implementation of the database can develop an analytical application independently, without involving an IT specialist. When using an OLAP server, one needs to learn two different systems, sometimes from different vendors: one for creating cubes on the server, and another for developing the client application. The OLAP client provides a single visual interface for describing cubes and setting up user interfaces for them.

Let's walk through the process of creating an OLAP application using the client tool.

Diagram 2. Creating an OLAP application using a ROLAP client tool

The operating principle of ROLAP clients is the preliminary description of a semantic layer behind which the physical structure of the source data is hidden. Data sources can be local tables or relational DBMSs; the list of supported sources is determined by the specific software product. After this, the user can independently manipulate objects that he understands in terms of the subject area to create cubes and analytical interfaces.

The operating principle of the OLAP server client is different. In an OLAP server, when creating cubes, the user manipulates the physical descriptions of the database.

At the same time, custom descriptions are created in the cube itself. The OLAP server client is configured only for the cube.

Let us explain the principle of operation of the ROLAP client using the example of creating a dynamic sales report (see Diagram 2). Let the initial data for analysis be stored in two tables: Sales and Deal.

When creating a semantic layer, the data sources - the Sales and Deal tables - are described in terms understandable to the end user and turn into "Products" and "Deals". The "ID" field from the "Products" table is renamed to "Code", and the "Name" to "Product", etc.

Then the Sales business object is created. A business object is a flat table on the basis of which a multidimensional cube is formed. When the business object is created, the "Products" and "Deals" tables are joined on the product "Code" field. Because the report does not need to display all the fields of these tables, the business object uses only the "Product", "Date", and "Amount" fields.
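A minimal sketch of what building such a business object amounts to, using Python dictionaries in place of real database tables; the table contents are hypothetical:

```python
# Hypothetical contents of the renamed tables from the example.
products = [
    {"Code": 1, "Product": "Tea"},
    {"Code": 2, "Product": "Coffee"},
]
deals = [
    {"Code": 1, "Date": "2024-01-10", "Amount": 100.0},
    {"Code": 2, "Date": "2024-01-11", "Amount": 80.0},
    {"Code": 1, "Date": "2024-02-05", "Amount": 70.0},
]

# Join on the product code, keeping only the fields the report needs.
name_by_code = {p["Code"]: p["Product"] for p in products}
sales_business_object = [
    {"Product": name_by_code[d["Code"]],
     "Date": d["Date"],
     "Amount": d["Amount"]}
    for d in deals
]
```

The result is exactly the flat table described above: one row per deal, restricted to the "Product", "Date", and "Amount" fields.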

Next, an OLAP report is created based on the business object. The user selects a business object and drags its attributes into the columns or rows area of ​​the report table. In our example, based on the "Sales" business object, a report was created on sales of goods by months.

When working with an interactive report, the user can set filtering and grouping conditions with the same simple mouse movements. At this point, the ROLAP client accesses the data in its cache, while the client of an OLAP server generates a new query to the multidimensional database. For example, by applying a product filter in a sales report, you can get a report on sales of the products that interest us.

All OLAP application settings can be stored in a dedicated metadata repository, in the application, or in a multidimensional database system repository. Implementation depends on the specific software product.

So, in what cases can using an OLAP client be more effective and profitable for users than using an OLAP server?

The economic feasibility of using an OLAP server arises when the volumes of data are very large and overwhelming for an OLAP client; otherwise, the use of the latter is more justified. In this case, the OLAP client combines high performance with low cost.

Powerful PCs for analysts are another argument in favor of OLAP clients. When using an OLAP server, these capacities are not used. Among the advantages of OLAP clients are the following:

The costs of implementing and maintaining an OLAP client are significantly lower than the costs of an OLAP server.

When using an OLAP client with a built-in OLAP engine, data is transferred over the network once. No new data streams are generated when OLAP operations are performed.

Setting up ROLAP clients is simplified by eliminating the intermediate step - creating a multidimensional database.

3. Core OLAP system

3.1 Design principles


From what has already been said, it is clear that the OLAP mechanism is one of the popular methods of data analysis today. There are two main approaches to implementing it. The first is called Multidimensional OLAP (MOLAP): the mechanism is implemented using a multidimensional database on the server side. The second, Relational OLAP (ROLAP), builds cubes "on the fly" on the basis of SQL queries to a relational DBMS. Each approach has its pros and cons; their comparative analysis is outside the scope of this work. Here, only the implementation of the core of a desktop ROLAP module will be described.

This task arose after using a ROLAP system built on the Decision Cube components included in Borland Delphi. Unfortunately, this set of components showed poor performance on large amounts of data. The problem can be mitigated by trying to cut off as much data as possible before feeding it into cube construction, but this is not always enough.

On the Internet and in the press you can find a lot of information about OLAP systems, but almost nowhere is it said how they work inside.

Scheme of work:

The general scheme of the desktop OLAP system can be represented as follows:

Scheme 3. Work of a desktop OLAP system

The operating algorithm is as follows:

1. Obtaining data in the form of a flat table or the result of an SQL query.

2. Caching data and converting them to a multidimensional cube.

3. Displaying the constructed cube using a crosstab, chart, etc. In general, an arbitrary number of displays can be connected to one cube.

Let's consider how such a system can be organized internally. We will start from the side that you can see and feel, that is, from the displays. The displays used in OLAP systems most often come in two types: cross tables and charts. Let's consider the crosstab, which is the primary and most common way to display a cube.

In the figure below, rows and columns containing aggregated results are shown in yellow, cells containing facts are marked in light gray, and cells containing dimension data are marked in dark gray.

So, the table can be divided into the following elements, with which we will work in the future:

When filling the matrix with facts, we must proceed as follows:

Based on the dimension values, determine the coordinates of the element being added in the matrix.

Determine the coordinates of the totals columns and rows affected by the added element.

Add the element to the matrix and to the corresponding totals columns and rows.
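The three steps above can be sketched as follows. This is a naive illustration, not the author's implementation: `TOTAL` is an invented marker for a totals row or column, and each new fact updates every totals cell it affects by replacing subsets of its coordinates with `TOTAL`:

```python
from collections import defaultdict

TOTAL = None  # invented marker for a totals row/column

matrix = defaultdict(float)

def add_fact(coords, value):
    # Update the fact cell itself plus every totals cell it affects:
    # all cells where some subset of coordinates is replaced by TOTAL.
    n = len(coords)
    for mask in range(2 ** n):
        key = tuple(TOTAL if mask & (1 << i) else c
                    for i, c in enumerate(coords))
        matrix[key] += value

add_fact(("North", "Tea"), 100.0)
add_fact(("South", "Tea"), 50.0)
grand_total = matrix[(TOTAL, TOTAL)]   # 150.0
```

Since the `defaultdict` stores only the cells that have actually been touched, this sketch also anticipates the sparse organization discussed next.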

It should be noted that the resulting matrix will be very sparse, which is why organizing it as a two-dimensional array (the option lying on the surface) is not only irrational but most likely impossible, due to the sheer dimension of the matrix: no reasonable amount of RAM would be enough to store it. For example, if our cube contains information about sales for one year and has only 3 dimensions - Customers (250), Products (500), and Date (365) - then we get a fact matrix with 250 x 500 x 365 = 45,625,000 elements. And this despite the fact that the matrix may contain only a few thousand filled elements. Moreover, the greater the number of dimensions, the sparser the matrix will be.

Therefore, to work with this matrix, special mechanisms for handling sparse matrices are needed. Various options for organizing a sparse matrix are possible; they are well described in the programming literature, for example, in the first volume of Donald Knuth's classic "The Art of Computer Programming".

Let us now consider how we can determine the coordinates of a fact, knowing the dimensions corresponding to it. To do this, let's take a closer look at the header structure:

In this case, it is easy to find a way to determine the numbers of the corresponding cell and of the totals into which it falls. Several approaches can be proposed here. One is to use a tree to look up the matching cells; this tree can be built by passing through the sample. Alternatively, an analytical recurrence formula can easily be defined to calculate the required coordinate.
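One candidate for such a recurrence (an assumption on our part, not necessarily the formula the author had in mind) is the usual mixed-radix, row-major linearization, which maps a tuple of integer member numbers to a single cell number:

```python
def linear_index(coords, sizes):
    # Mixed-radix (row-major) recurrence:
    # idx = ((c0 * n1 + c1) * n2 + c2) ...
    idx = 0
    for c, n in zip(coords, sizes):
        idx = idx * n + c
    return idx

sizes = (250, 500, 365)                      # Customers, Products, Date
first = linear_index((0, 0, 0), sizes)       # 0
step = linear_index((1, 0, 0), sizes)        # 500 * 365 = 182500
last = linear_index((249, 499, 364), sizes)  # 45,624,999
```

Incrementing the first coordinate moves the index by the product of the remaining dimension sizes, which matches the 45,625,000-cell matrix from the earlier example.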

The data stored in the table needs to be converted before it can be used. Thus, to improve performance when building a hypercube, it is desirable to find the unique elements stored in the columns that are dimensions of the cube. In addition, facts can be pre-aggregated for records that have the same dimension values. As mentioned above, it is the unique values available in the dimension fields that are important to us. The following structure can then be proposed for storing them:

Diagram 4. Storage structure for unique values

Using this structure significantly reduces the need for memory, which is quite relevant because, to increase the speed of work, it is desirable to keep the data in RAM. In addition, it is possible to store only the array of elements and offload their values to disk, since we will need them only when displaying the crosstab.

The ideas described above were the basis for creating the CubeBase component library.

Scheme 5. The structure of the CubeBase component library

TCubeSource performs caching and conversion of data into the internal format, as well as preliminary data aggregation. The TCubeEngine component calculates the hypercube and performs operations on it; in fact, it is an OLAP machine that converts a flat table into a multidimensional data set. The TCubeGrid component displays the crosstab and controls the display of the hypercube. TCubeChart allows you to see the hypercube in the form of charts, and the TCubePivote component controls the operation of the cube core.

So, we have considered the architecture and interaction of components that can be used to build an OLAP machine. Now let's take a closer look at the internal organization of the components.

The first stage of the system's operation is loading the data and converting it into an internal format. A logical question arises: why is this necessary, when one could simply use the data from a flat table, scanning it while building a cube slice? To answer this question, let's consider the table structure from the point of view of an OLAP machine. For an OLAP system, table columns are either facts or dimensions, and the logic of working with these columns differs. In a hypercube, the dimensions are effectively the axes, and the dimension values are coordinates on those axes. The cube is filled very unevenly: there are combinations of coordinates that correspond to no records at all, and combinations that correspond to several records in the source table, and the first situation is more common. That is, the cube resembles the universe: empty space with clusters of points (facts) in separate places. Thus, if we pre-aggregate the data at load time, that is, combine records that have the same dimension values while computing preliminary aggregated fact values, then later we will work with fewer records, which increases the speed of work and reduces RAM requirements.
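Pre-aggregation at load time can be sketched like this; the records are hypothetical, and facts are summed here, though other aggregate functions are possible:

```python
from collections import defaultdict

def pre_aggregate(rows):
    # Merge records sharing the same dimension values, summing the fact.
    acc = defaultdict(float)
    for *dims, fact in rows:
        acc[tuple(dims)] += fact
    return [(*key, total) for key, total in sorted(acc.items())]

rows = [
    ("North", "Tea", 100.0),
    ("North", "Tea", 50.0),   # duplicate dimension values
    ("South", "Tea", 70.0),
]
compact = pre_aggregate(rows)
# [('North', 'Tea', 150.0), ('South', 'Tea', 70.0)]
```

The two "North"/"Tea" records collapse into one, so every subsequent pass over the data touches fewer rows.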

To build slices of the hypercube we need two capabilities: determining the coordinates (that is, the dimension values) of a table record, and determining the records that have given coordinates (dimension values). Let us consider how these capabilities can be realized. The simplest way to store the hypercube is to use a database in its own internal format.

Schematically, the transformations can be represented as follows:

Figure 6: Converting an Internal Format Database to a Normalized Database

That is, instead of one table we obtained a normalized database. Database specialists may say that normalization reduces the speed of the system, and they would certainly be right in the case when we need to obtain the values of dictionary elements (in our case, dimension values). But the point is that at the slice-construction stage we do not need these values at all. As noted above, we are interested only in the coordinates in our hypercube, so we will define coordinates for the dimension values. The simplest approach is to number the element values. To make the numbering unambiguous within one dimension, we first sort the lists of dimension values (the dictionaries, in database terms) alphabetically. In addition we number the facts, which are pre-aggregated. We obtain the following diagram:

Scheme 7. Renumbering the normalized database to determine the coordinates of measurement values
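The renumbering step can be sketched as follows (a Python illustration under the same assumptions as before, not the Delphi implementation itself): the unique values of a dimension are sorted alphabetically and numbered, and those numbers then serve as coordinates, covering both directions of the lookup.

```python
def build_dictionary(records, column):
    """Sort the unique values of a dimension column alphabetically and
    number them; the number is the coordinate on that cube axis."""
    values = sorted({rec[column] for rec in records})
    coord_of = {v: i for i, v in enumerate(values)}   # value -> coordinate
    return values, coord_of                           # values[i] is the inverse mapping

rows = [{"city": "Perm"}, {"city": "Moscow"}, {"city": "Perm"}]
values, coord_of = build_dictionary(rows, "city")
encoded = [coord_of[r["city"]] for r in rows]          # facts now carry coordinates
```

Because the array of values is ordered, an element's index is its coordinate, so the inverse lookup (coordinate to value) is direct addressing with no extra structure, which is the memory saving mentioned later in the text.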

Now it only remains to link the elements of the different tables to each other. In relational database theory this is done with special intermediate tables. For our purposes it is enough to associate with each entry of a dimension table a list whose elements are the numbers of the facts in whose formation that dimension value took part (that is, to determine all facts that share the same value of the coordinate described by this dimension), and to associate with each fact record the values of the coordinates at which it is located in the hypercube. From here on, the coordinates of a record in the hypercube are understood as the numbers of the corresponding records in the tables of dimension values. For our hypothetical example we then obtain the following set defining the internal representation of the hypercube:

Scheme 8. Internal representation of a hypercube

This will be our internal representation of the hypercube. Since we are not building it on top of a relational database, we simply use variable-length fields to link the dimension values (this would be impossible in an RDB, where the number of table columns is fixed in advance).
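The linking just described can be sketched in Python as follows (an illustration of the idea, not the variable-length-field implementation of the text): each dimension maps every coordinate to the list of fact numbers located at that coordinate, which is what the FactLink lists hold.

```python
def link_dimensions(fact_coords, n_dims):
    """For each dimension, map every coordinate to the list of fact numbers
    located at that coordinate (the 'fact link' lists of the text).

    fact_coords -- fact_coords[fact_no] is the coordinate tuple of a fact
    n_dims      -- number of dimensions in the cube
    """
    links = [dict() for _ in range(n_dims)]
    for fact_no, coords in enumerate(fact_coords):
        for dim, coord in enumerate(coords):
            links[dim].setdefault(coord, []).append(fact_no)
    return links

# three facts in a 2-dimensional cube: (city_coord, year_coord)
coords = [(0, 0), (1, 0), (0, 1)]
links = link_dimensions(coords, 2)
```

Given a coordinate on any axis, the list of facts at that coordinate is now available in one lookup, which is the capability slice construction needs.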

We could try to use a set of temporary tables to implement the hypercube, but this method will provide too low performance (for example, a set of Decision Cube components), so we will use our own data storage structures.

To implement the hypercube we need data structures that ensure maximum performance and minimal RAM consumption. Obviously, the main structures will be those storing the dictionaries and the fact table. Let us look at the tasks a dictionary must perform at maximum speed:

checking the presence of an element in the dictionary;

adding an element to the dictionary;

search for record numbers that have a specific coordinate value;

search for coordinates by measurement value;

searching for a measurement value by its coordinate.

Various types of data structures can be used to implement these requirements, for example arrays of records. In a real system such arrays require additional indexing mechanisms that increase the speed of loading data and retrieving information.

To optimize the operation of the hypercube we must determine which tasks need to be solved first and by what criteria the quality of operation should be improved. The main goal is to increase the speed of the program while keeping the required amount of RAM reasonably small. Performance can be increased by introducing additional data-access mechanisms, for example indexing; unfortunately this increases the RAM overhead. We therefore determine which operations must be performed at the highest speed. To do this, consider the individual components implementing the hypercube. These components come in two main kinds: the dimension and the fact table. For a dimension the typical tasks are:

adding a new value;

determining the coordinate based on the measurement value;

determination of value by coordinate.

When adding a new element value we must check whether such a value already exists; if it does, we do not add a new one but use the existing coordinate, otherwise we add a new element and determine its coordinate. This requires a fast way to check for the presence of the required element (the same problem arises when determining the coordinate from an element value). Hashing is optimal for this purpose; a suitable structure is a hash tree in which we store references to the elements, the elements being the entries of the dimension dictionary. The structure of a dimension entry can then be represented as follows:

type
  PFactLink = ^TFactLink;

  TFactLink = record
    FactNo: integer;      // fact index in the fact table
    Next: PFactLink;      // next element of the list (completing the list structure implied by the text)
  end;

  TDimensionRecord = record
    Value: string;        // dimension value
    Index: integer;       // coordinate value
    FactLink: PFactLink;  // pointer to the head of the list of fact-table entries
  end;

In the hash tree we store links to the unique elements. We also need to solve the inverse problem: determining the dimension value from a coordinate. For maximum performance direct addressing should be used, so one could keep another array whose index is the dimension coordinate and whose value is a link to the corresponding dictionary entry. It is simpler still (and saves memory) to order the array of elements so that an element's index is its coordinate.

Organizing the array that implements the list of facts presents no particular difficulty owing to its simple structure. The only remark is that it is advisable to precompute all aggregation methods that may be needed and that can be calculated incrementally (for example, the sum).
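The incrementally computable aggregates mentioned above can be sketched like this (a Python illustration; the chosen set of aggregates is an assumption, the text names only the sum explicitly):

```python
class IncrementalAggregate:
    """Aggregates maintainable incrementally as facts are merged:
    sum, count, min and max (the average is derived as sum / count)."""
    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def add(self, value):
        # each new fact value updates every aggregate in O(1)
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

agg = IncrementalAggregate()
for v in [3.0, 1.0, 2.0]:
    agg.add(v)
```

Because each update is constant-time and needs no access to previously merged facts, these aggregates can be computed during pre-aggregation and during pivot-table construction without rescanning the data.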

So, we have described a way to store data in the form of a hypercube. It allows a set of points in multidimensional space to be generated from the information held in the data warehouse. For a person to work with these data, they must be presented in a form convenient for processing; the main forms of presentation are the pivot table and graphs, both of which are in fact projections of the hypercube. To build these representations as efficiently as possible, let us start from what the projections actually are, beginning with the pivot table as the most important one for data analysis.

Let us find ways to implement such a structure. A pivot table consists of three parts: row headers, column headers, and the table of aggregated fact values itself. The simplest way to represent the fact table is a two-dimensional array whose dimensions can be determined while building the headers. Unfortunately, the simplest method is also the most inefficient: the table will be very sparse and memory will be used extremely wastefully, so only very small cubes could be built before memory runs out. We therefore need a data structure that provides maximum speed of searching for and adding a new element together with minimal RAM consumption. Such structures are the so-called sparse matrices, which are discussed in detail by Knuth. There are various ways to organize a sparse matrix; to choose the one that suits us, let us first consider the structure of the table headers.

Headers have a clear hierarchical structure, so it is natural to use a tree to store them. The structure of a tree node can be schematically depicted as follows:

Appendix C

In this case it is logical to store, as the dimension value, a link to the corresponding element of the dimension table of the multidimensional cube. This reduces the memory cost of storing the slice and speeds up the work. Links are likewise used for the parent and child nodes.

To add an element to the tree we must know its location in the hypercube; for this we use its coordinate, which is stored in the dictionary of dimension values. Let us consider the scheme for adding an element to the header tree of a pivot table, using the dimension coordinate values as the initial information. The order in which the dimensions are listed is determined by the desired aggregation method and matches the hierarchy levels of the header tree. As a result we obtain the list of columns or rows of the pivot table to which the element must be added.

Appendix D

We use the dimension coordinates as the initial data for determining this structure. For definiteness, assume we are determining the column of interest in the matrix (how to determine a row is considered a little later, since other data structures are more convenient there; the reason for this choice is also discussed below). As coordinates we take integers: the numbers of dimension values, determined as described above.

After performing this procedure we obtain an array of references to the columns of the sparse matrix. Next, the necessary actions must be performed on the rows: inside each column the required element must be found and the corresponding value added there. For each dimension in the collection we need to know the number of unique values and the actual set of those values.

Now let us look at how the values inside a column should be represented, that is, how to determine the required row. Several approaches are possible. The simplest is to represent each column as a vector, but since it would be very sparse, memory would be used extremely inefficiently. To avoid this we use data structures that represent sparse one-dimensional arrays (vectors) more efficiently. The simplest of these is an ordinary singly or doubly linked list, but it is uneconomical in terms of element access, so we use a tree, which provides faster access to elements.

For example, we could use exactly the same tree as for the columns, but then a separate tree would have to be created for every column, leading to significant memory and processing overhead. Let us be a little more cunning: we create a single tree to store all combinations of dimensions used in rows. It is identical to the one described above, but its elements are not pointers to rows (which do not exist as such) but row indices; the index values themselves are of no interest and are used only as unique keys. These keys are then used to find the required element inside a column. The columns themselves are most easily represented as ordinary binary trees. Graphically the resulting structure can be depicted as follows:

Diagram 9. Image of a pivot table as a binary tree

To determine the appropriate row numbers we can use the same procedure as described above for determining the pivot-table columns. Row numbers are unique within one pivot table and identify elements in the vectors that form the columns of the pivot table. The simplest way to generate these numbers is to maintain a counter and increment it by one whenever a new element is added to the row-header tree. The column vectors themselves are most easily stored as binary trees with the row number as the key; hash tables can also be used. Since the procedures for working with these trees are covered in detail in other sources, we will not dwell on them and will consider the general scheme for adding an element to a column.
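The counter-based numbering just described can be sketched as follows (a Python illustration in which a dictionary stands in for the row-header tree of the text):

```python
class RowKeyIndex:
    """Assigns a unique row number to each combination of row-dimension
    coordinates, incrementing a counter when a new combination appears."""
    def __init__(self):
        self._keys = {}

    def key_for(self, coords):
        if coords not in self._keys:
            self._keys[coords] = len(self._keys)   # the counter from the text
        return self._keys[coords]

idx = RowKeyIndex()
k1 = idx.key_for(("Moscow", 2005))
k2 = idx.key_for(("Perm", 2005))
k3 = idx.key_for(("Moscow", 2005))   # existing combination: same key returned
```

A repeated combination of coordinates yields the same row key, so all facts belonging to one pivot-table row land in the same element of each column vector.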

In general, the sequence of actions for adding an element to the matrix can be described as follows:

1. Determine the row numbers to which elements are added.

2. Determine the set of columns to which elements are added.

3. For all columns, find the elements with the required row numbers and add the current element to them (adding includes attaching the required fact values and computing the aggregated values, which can be determined incrementally).

After executing this algorithm we obtain a matrix that is the pivot table we needed to build.
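The three steps above can be sketched as follows (a Python illustration in which dictionaries stand in for the binary trees of the text; the incremental sum is the assumed aggregate):

```python
def add_fact(columns, col_key, row_key, value):
    """Find the column, find the element with the required row number
    inside it, and merge the fact value incrementally."""
    col = columns.setdefault(col_key, {})          # step 2: the column
    col[row_key] = col.get(row_key, 0.0) + value   # steps 1 and 3: row element, aggregation

matrix = {}
add_fact(matrix, col_key=0, row_key=0, value=10.0)
add_fact(matrix, col_key=0, row_key=0, value=5.0)   # same cell: aggregated incrementally
add_fact(matrix, col_key=1, row_key=0, value=7.0)
```

Only the occupied cells exist in memory, which is the point of the sparse-matrix representation: an empty cube cell costs nothing.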

Now a few words about filtering during slice construction. It is easiest to perform at the stage of building the matrix, since at that stage we have access to all the required fields and, in addition, aggregation of values is carried out. When a record is retrieved from the cache, it is checked against the filtering conditions, and if they are not satisfied the record is discarded.
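Filtering at matrix-construction time can be sketched like this (a Python illustration; the predicate and field names are assumptions for the example):

```python
def build_slice(records, predicate, dim_col, fact_col):
    """Apply the filter while the slice is being built: records that fail
    the predicate are discarded before aggregation."""
    totals = {}
    for rec in records:
        if not predicate(rec):
            continue                      # the record is discarded
        key = rec[dim_col]
        totals[key] = totals.get(key, 0.0) + rec[fact_col]
    return totals

rows = [
    {"city": "Moscow", "year": 2005, "sales": 10.0},
    {"city": "Moscow", "year": 2004, "sales": 4.0},
    {"city": "Perm",   "year": 2005, "sales": 7.0},
]
slice_2005 = build_slice(rows, lambda r: r["year"] == 2005, "city", "sales")
```

Because the check runs before aggregation, filtered-out records never reach the matrix, so no post-processing of the finished slice is needed.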

Since the structure described above completely describes the pivot table, the task of visualizing it becomes trivial: standard table components available in almost all programming tools for Windows can be used.

The first product to perform OLAP queries was Express (IRI). The term OLAP itself, however, was coined by Edgar Codd, the "father of relational databases". Codd's work was funded by Arbor, the company that had launched its own OLAP product, Essbase (later bought by Hyperion, which was in turn acquired by Oracle in 2007), a year earlier. Other well-known OLAP products include Microsoft Analysis Services (formerly OLAP Services, part of SQL Server), Oracle OLAP Option, IBM's DB2 OLAP Server (essentially Essbase with IBM add-ons), SAP BW, and products from Brio, BusinessObjects, Cognos, MicroStrategy, and others.

From a technical point of view, the products on the market divide into "physical OLAP" and "virtual OLAP". In the first case a program performs a preliminary calculation of aggregates, which are then stored in a special multidimensional database providing fast retrieval; examples are Microsoft Analysis Services, Oracle OLAP Option, Oracle/Hyperion Essbase, and Cognos PowerPlay. In the second case the data are stored in relational DBMSs, while aggregates may not exist at all or may be created on first request in the DBMS or in the analytical software's cache; examples are SAP BW, BusinessObjects, and MicroStrategy. Systems based on "physical OLAP" provide consistently better query response times than "virtual OLAP" systems; virtual OLAP vendors counter that their products scale better in terms of supporting very large data volumes.

In this work, I would like to take a closer look at the product of BaseGroup Labs - Deductor.

Deductor is an analytics platform, i.e. basis for creating complete application solutions. The technologies implemented in Deductor allow you to go through all the stages of building an analytical system on the basis of a single architecture: from creating a data warehouse to automatically selecting models and visualizing the results obtained.

System composition:

Deductor Studio is the analytical core of the Deductor platform. Deductor Studio includes a full set of mechanisms that allows you to obtain information from an arbitrary data source, carry out the entire processing cycle (cleaning, transforming data, building models), display the results in the most convenient way (OLAP, tables, charts, decision trees...) and export results.

Deductor Viewer is the end user's workplace. The program minimizes the requirements on personnel, since all required operations are performed automatically by previously prepared processing scripts; there is no need to think about how the data are obtained or by what mechanisms they are processed. The Deductor Viewer user only needs to select the report of interest.

Deductor Warehouse is a multidimensional cross-platform data warehouse that accumulates all the information needed to analyze the subject area. The use of a single repository provides convenient access, high-speed processing, consistency of information, centralized storage, and automatic support of the entire data-analysis process.

4. Client-Server

Deductor Server is designed for remote analytical processing. It provides the ability to both automatically “run” data through existing scripts on the server and retrain existing models. Using Deductor Server allows you to implement a full-fledged three-tier architecture in which it serves as an application server. Access to the server is provided by the Deductor Client.

Work principles:

1. Import data

Analysis of any information in Deductor starts with data import. As a result of import, the data are brought into a form suitable for subsequent analysis by all the mechanisms available in the program. The nature of the data, their format, the DBMS, etc. do not matter, because the mechanisms for working with all of them are unified.

2. Data export

The presence of export mechanisms allows you to send the results obtained to third-party applications, for example, transfer a sales forecast to the system to generate a purchase order or post the prepared report on a corporate website.

3. Data processing

Processing in Deductor means any action associated with some kind of data transformation, for example, filtering, model building, cleaning, etc. Actually, in this block the most important actions from the point of view of analysis are performed. The most significant feature of the processing mechanisms implemented in Deductor is that the data obtained as a result of processing can be processed again by any of the methods available to the system. Thus, you can build arbitrarily complex processing scenarios.

4. Visualization

You can visualize data in Deductor Studio (Viewer) at any stage of processing. The system independently determines how it can do this, for example, if a neural network is trained, then in addition to tables and diagrams, you can view the neural network graph. The user needs to select the desired option from the list and configure several parameters.

5. Integration mechanisms

Deductor does not provide data entry tools - the platform is focused solely on analytical processing. To use information stored in heterogeneous systems, flexible import-export mechanisms are provided. Interaction can be organized using batch execution, working in OLE server mode and accessing the Deductor Server.

6. Replication of knowledge

Deductor allows you to implement one of the most important functions of any analytical system - support for the process of knowledge replication, i.e. providing the opportunity for employees who do not understand analysis methods and methods of obtaining a particular result to receive an answer based on models prepared by an expert.

Conclusion

In this work we have examined data analysis systems, an important area of modern information technology. The main tool of analytical information processing, OLAP technology, has been analyzed: the essence of the OLAP concept and the importance of OLAP systems in a modern business process have been described in detail, as have the structure and operation of a ROLAP server. The Deductor analytical platform was presented as an example of an implementation of OLAP technology. The accompanying documentation has been developed and meets the requirements.

OLAP technologies are a powerful tool for real-time data processing. An OLAP server makes it possible to organize and present data across various analytical dimensions and turns data into valuable information that helps companies make better-informed decisions.

The use of OLAP systems provides consistently high performance and scalability, supporting multi-gigabyte data volumes accessible to thousands of users. With OLAP technologies, information is accessed in real time: query processing no longer slows down the analysis, ensuring its speed and efficiency. Visual administration tools make it possible to develop and deploy even the most complex analytical applications simply and quickly.


In 1993 Edgar Codd, the founder of the relational approach to database construction, a mathematician and IBM Fellow, together with his partners published an article commissioned by Arbor Software (now the well-known company Hyperion Solutions) entitled "Providing OLAP to User-Analysts: An IT Mandate", which formulated 12 features of OLAP technology, later supplemented by six more. These provisions became the main content of a new and very promising technology.

Main features of OLAP technology (Basic):

  • multidimensional conceptual representation of data;
  • intuitive data manipulation;
  • availability and detail of data;
  • batch extraction vs. interpretation;
  • OLAP analysis models;
  • client-server architecture (OLAP is accessible from the desktop);
  • transparency (transparent access to external data);
  • multi-user support.

Special features (Special):

  • handling of non-normalized data;
  • storing OLAP results separately from the source data;
  • extraction of missing values;
  • treatment of missing values.

Reporting features (Report):

  • flexibility in reporting;
  • standard reporting performance;
  • automatic configuration of the physical data extraction layer.

Dimension management (Dimension):

  • universality of measurements;
  • unlimited number of dimensions and aggregation levels;
  • unlimited number of operations between dimensions.

Historically, the term "OLAP" today implies not only a multidimensional view of the data by the end user but also a multidimensional representation of the data in the target database. It is with this that the emergence of "relational OLAP" (ROLAP) and "multidimensional OLAP" (MOLAP) as independent terms is connected.

An OLAP service is a tool for analyzing large volumes of data in real time. Interacting with an OLAP system, the user can flexibly view information, obtain arbitrary data slices, and perform the analytical operations of drill-down, roll-up, drill-across, and comparison over time using many parameters simultaneously. All work with the OLAP system takes place in terms of the subject area and allows statistically sound models of the business situation to be built.

OLAP software is a tool for operational analysis of the data held in a repository. Its main feature is that it is intended not for an information-technology specialist or an expert statistician, but for a professional in the applied field of management: the head of a department, division or directorate and, finally, a director. The tools are designed for the analyst to communicate with the problem, not with the computer. Figure 6.14 shows an elementary OLAP cube that allows data to be evaluated along three dimensions.

A multidimensional OLAP cube and a system of corresponding mathematical algorithms for statistical processing allows you to analyze data of any complexity at any time interval.


Fig. 6.14.

Having at his disposal flexible mechanisms for data manipulation and visual display (Fig. 6.15, Fig. 6.16), the manager first examines the data from different angles, which may or may not be related to the problem being solved.

Next, he compares various business indicators with each other, trying to reveal hidden relationships; he can look at the data more closely and in detail, for example breaking them down by time, region or customer, or, conversely, generalize the presentation further to remove distracting details. After that, using the statistical-evaluation and simulation-modeling module, several scenarios for the development of events are constructed and the most acceptable one is selected.


Fig. 6.15.

A company manager may, for example, have a hypothesis that the spread of asset growth across the company's branches depends on the ratio of specialists with technical and economic education working in them. To test this hypothesis, the manager can request from the warehouse and display on a graph this ratio for the branches whose asset growth fell by more than 10% in the current quarter compared with last year, and for those where it rose by more than 25%; he should be able to do so with a simple selection from the menu provided. If the results obtained split noticeably into the two corresponding groups, this should become a stimulus for further testing of the hypothesis.

A direction called dynamic simulation (Dynamic Simulation), which fully implements the FASMI principle described above, is currently developing actively.

Using dynamic modeling, the analyst builds a model of a business situation that develops over time, according to a certain scenario. Moreover, the result of such modeling can be several new business situations, generating a tree of possible solutions with an assessment of the probability and prospects of each.


Fig. 6.16.

Table 6.3 shows the comparative characteristics of static and dynamic analysis.

Table 6.3. Comparative characteristics of static and dynamic analysis

Characteristic                  | Static analysis                          | Dynamic analysis
Types of questions              | Who? What? How many? How? When? Where?   | Why is that? What would happen if...? What happens if...?
Response time                   | Not regulated                            | Seconds
Typical data operations         | Regulated report, chart, table, drawing  | Sequence of interactive reports, diagrams, screen forms; dynamically changing aggregation levels and data slices
Level of analytical requirements| Average                                  | High
Type of screen forms            | Mostly predetermined, regulated          | User-defined, customizable
Data aggregation level          | Detailed and summary                     | User-defined
"Age" of data                   | Historical and current                   | Historical, current and forecast
Types of requests               | Mostly predictable                       | Unpredictable, case by case
Purpose                         | Regulated analytical processing          | Multi-pass analysis, modeling and forecasting

The task of building an analytical system for multidimensional data analysis is almost always the task of building a unified, coherently functioning information system based on heterogeneous software tools and solutions, and the very choice of the means for implementing such a system becomes extremely difficult. Many factors must be taken into account here, including the mutual compatibility of the various software components, the ease of their development, use and integration, the efficiency of their operation, their stability, and even the form, level and potential prospects of the relationships among the various manufacturers.

OLAP is applicable wherever there is a task of analyzing multidimensional data. In general, given a table with at least one descriptive column and one numeric column, an OLAP tool will be an effective means of analysis and report generation. As an example of the use of OLAP technology, consider a study of the results of the sales process.

The key questions "How much was sold?" and "For what amount?" expand as the business grows more complex and historical data accumulate, acquiring a number of factors, or sections: "...in St. Petersburg, in Moscow, in the Urals, in Siberia...", "...in the last quarter compared with the current one", "...from supplier A compared with supplier B...", and so on.

Answers to such questions are necessary for making management decisions: on changing the assortment, prices, closing and opening stores, branches, terminating and signing agreements with dealers, conducting or terminating advertising campaigns, etc.

If you try to highlight the main figures (facts) and sections (measurement arguments) that the analyst manipulates, trying to expand or optimize the company’s business, you will get a table suitable for sales analysis as a kind of template that requires appropriate adjustments for each specific enterprise.

Time. As a rule, several periods are used: Year, Quarter, Month, Decade (ten-day period), Week, Day. Many OLAP tools automatically compute the higher periods from a date and calculate totals for them.

Product category. There can be several categories, they differ for each type of business: Variety, Model, Type of packaging, etc. If only one product is sold or the assortment is very small, then a category is not needed.

Product. Sometimes the name of the product (or service), its code or article number are used. In cases where the assortment is very large (and some enterprises have tens of thousands of items in their price list), the initial analysis for all types of goods may not be carried out, but generalized to some agreed categories.

Region. Depending on the globality of the business, you can mean Continent, Group of countries, Country, Territory, City, District, Street, Part of the street. Of course, if there is only one outlet, then this dimension is missing.

Salesman. This measurement also depends on the structure and scale of the business. This could be: Branch, Store, Dealer, Sales Manager. In some cases, there is no measurement, for example, when the seller does not influence sales volumes, there is only one store, and so on.

Buyer. In some cases, such as retail, the buyer is impersonal and there is no measurement; in other cases, information about the buyer is available and is important for sales. This dimension may contain the name of the purchasing company or many groups and characteristics of clients: Industry, Group of Enterprises, Owner, and so on. Analysis of the sales structure to identify the most important components in the context of interest. For this, it is convenient to use, for example, a “Pie” type diagram in complex cases when 3 dimensions are examined at once - “Columns”. For example, in the Computer Equipment store for the quarter, sales of computers amounted to $100,000, photographic equipment - $10,000, and consumables - $4,500. Conclusion: the store’s turnover depends to a large extent on the sale of computers (in fact, perhaps consumables are necessary to sell computers, but this is an analysis of internal dependencies).

Dynamics analysis (regression analysis, identifying trends). Identification of trends and seasonal fluctuations. Dynamics are clearly displayed by a line graph. For example, sales of Intel products fell during the year while Microsoft's grew. Perhaps the welfare of the average customer has improved, or the image of the store has changed and with it the composition of customers; the assortment needs adjusting. Another example: for three years running, sales of video cameras have dropped in winter.

Dependency analysis (correlation analysis). Comparison of sales volumes of different goods over time to identify the required assortment, the "basket". A line chart is also convenient here. For example, after printers were removed from the assortment, a drop in sales of toner cartridges was observed during the first two months.
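Correlation analysis of this kind is easy to prototype outside an OLAP tool as well. A minimal sketch in Python with invented monthly figures (the series and the use of a plain Pearson coefficient are illustrative, not taken from any specific product):

```python
# Pearson correlation of two monthly sales series (figures are invented for illustration)
printers   = [120, 115, 0, 0, 0, 0]     # printers removed from the assortment after month 2
cartridges = [80, 78, 60, 41, 30, 22]   # cartridge sales decline afterwards

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(printers, cartridges)
print(f"correlation = {r:.2f}")  # a value near +1 suggests the goods sell as a "basket"
```

A strongly positive coefficient here would support the "basket" hypothesis; a real analysis would of course control for seasonality and other factors.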

4. Classification of OLAP products.

5. Operating principles of OLAP clients.

7. Spheres of application of OLAP-technologies.

8. An example of using OLAP technologies for analysis in sales.

1. The place of OLAP in the information structure of the enterprise.

The term "OLAP" is inextricably linked with the term "data warehouse" (Data Warehouse).

The data in the warehouse comes from operational (OLTP) systems, which are designed to automate business processes. The warehouse can also be replenished from external sources, such as statistical reports.

The purpose of the warehouse is to provide the "raw material" for analysis in one place and in a simple, understandable structure.

There is one more reason justifying a separate warehouse: complex analytical queries against operational information slow down the company's current work, locking tables for long periods and seizing server resources.

A warehouse does not necessarily mean a gigantic accumulation of data; the main thing is that it is convenient for analysis.

Centralization and convenient structuring are not all that an analyst needs. He also needs a tool for viewing and visualizing the information. Traditional reports, even those built on a single warehouse, lack one thing: flexibility. They cannot be "twisted", "expanded" or "collapsed" to get the desired view of the data. The analyst needs a tool that lets him expand and collapse the data simply and conveniently, and OLAP is such a tool.

Although OLAP is not a necessary attribute of a data warehouse, it is increasingly being used to analyze the information accumulated in the warehouse.

The place of OLAP in the information structure of an enterprise is shown in Fig. 1.

Figure 1. The place of OLAP in the information structure of the enterprise

Operational data is collected from various sources, cleansed, integrated, and placed in relational storage, where it is already available for analysis with various reporting tools. The data (in whole or in part) is then prepared for OLAP analysis: it can be loaded into a special OLAP database or remain in relational storage. The most important element of the warehouse is metadata, that is, information about the structure, placement and transformation of the data; thanks to it, effective interaction of the various storage components is ensured.

Summing up, we can define OLAP as a set of tools for multidimensional analysis of data accumulated in a warehouse.

2. Operational analytical data processing.

The concept of OLAP is based on the principle of multidimensional data representation. In 1993, E. F. Codd examined the shortcomings of the relational model, pointing out above all the inability to "combine, view and analyze data in terms of multiple dimensions, that is, in the way most understandable for corporate analysts", and defined general requirements for OLAP systems that extend the functionality of relational DBMSs and include multidimensional analysis as one of their characteristics.

According to Codd, a multidimensional conceptual view is a multiple perspective consisting of several independent dimensions along which specific sets of data can be analyzed.

Simultaneous analysis along several dimensions is defined as multidimensional analysis. Each dimension includes directions of data consolidation consisting of a series of successive levels of generalization, where each higher level corresponds to a greater degree of data aggregation for the corresponding dimension.

Thus, the Performer dimension can be defined by the consolidation direction "enterprise - division - department - employee". The Time dimension can even include two consolidation directions, "year - quarter - month - day" and "week - day", since counting time by months and by weeks is incompatible. It then becomes possible to choose an arbitrary level of detail for each dimension.
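The way tools derive the higher levels of the Time dimension from a single date can be sketched in a few lines of Python; the field names here are illustrative:

```python
from datetime import date

def time_dimension(d: date) -> dict:
    """Derive the higher consolidation levels of the Time dimension from one date."""
    return {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
        "month": d.month,
        "week": d.isocalendar()[1],         # ISO week number (the "week - day" direction)
        "day": d.day,
    }

dims = time_dimension(date(2024, 11, 5))
print(dims)  # {'year': 2024, 'quarter': 4, 'month': 11, 'week': 45, 'day': 5}
```

Note that the week level is derived independently of the month level, matching the two incompatible consolidation directions described above.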

The drill-down operation corresponds to movement from higher levels of consolidation to lower ones; conversely, the roll-up operation means movement from lower levels to higher ones (Fig. 2).


Figure 2. Dimensions and directions of data consolidation
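Rolling up along one consolidation direction is simply aggregation to a higher level. A small illustration in Python (the daily figures are invented):

```python
from collections import defaultdict

# Daily sales facts: (year, month, day) -> amount (invented figures)
daily = {
    (2024, 1, 10): 500, (2024, 1, 25): 300,
    (2024, 2, 3): 200,  (2024, 2, 17): 400,
}

def roll_up(facts):
    """Roll up the Time dimension one level: day -> month."""
    monthly = defaultdict(int)
    for (year, month, _day), amount in facts.items():
        monthly[(year, month)] += amount
    return dict(monthly)

print(roll_up(daily))  # {(2024, 1): 800, (2024, 2): 600}
```

Drill-down is the inverse navigation: from the monthly totals back to the underlying daily facts.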

3. Requirements for online analytical processing tools.

The multidimensional approach arose almost simultaneously and in parallel with the relational one. However, only from the mid-nineties, or more precisely from 1993, did interest in multidimensional DBMSs become widespread. In that year a new programmatic article appeared by E. F. Codd, one of the founders of the relational approach, in which he formulated 12 basic requirements for OLAP implementations (Table 1).

Table 1. Codd's 12 requirements for OLAP tools

1. Multidimensional data representation. Tools must support a conceptually multidimensional view of the data.

2. Transparency. The user does not need to know which specific tools are used to store and process the data, how the data is organized, or where it comes from.

3. Availability. The tools themselves must select and connect to the best data source for answering a given query, and must be able to automatically map their own logical schema onto heterogeneous data sources.

4. Consistent performance. Performance should be practically independent of the number of dimensions in the query.

5. Client-server architecture support. The tools must work in a client-server architecture.

6. Equality of all dimensions. No dimension should be basic; all dimensions must be equal (symmetrical).

7. Dynamic processing of sparse matrices. Undefined values must be stored and handled in the most efficient way possible.

8. Multi-user support. The tools must allow several users to work with the data simultaneously.

9. Support for operations across dimensions. All multidimensional operations (such as aggregation) must be applied uniformly and consistently to any number of dimensions.

10. Ease of data manipulation. The tools must have the most convenient, natural and comfortable user interface.

11. Advanced data presentation tools. The tools must support various ways of visualizing (presenting) the data.

12. Unlimited number of dimensions and aggregation levels. There should be no limit on the number of supported dimensions.

Rules for evaluating OLAP-class software products

This set of requirements, which served as a de facto definition of OLAP, should be regarded as a guideline, and specific products should be assessed by how closely they approach ideal compliance with all the requirements.

Codd's definition was later reworked into the so-called FASMI test, which requires that an OLAP application provide the ability to perform fast analysis of shared multidimensional information.

Remembering Codd's 12 rules is too burdensome for most people, and it turned out that the definition of OLAP can be summarized in just five key words: Fast Analysis of Shared Multidimensional Information, abbreviated FASMI.

This definition was first formulated in early 1995 and has not needed revision since.

FAST means that the system should deliver most responses to users within about five seconds. The simplest queries are processed within one second, and very few take more than 20 seconds. Research has shown that end users perceive a process as a failure if results are not obtained within 30 seconds.

At first glance it may seem surprising that a user who receives in a minute a report that until recently took days quickly becomes bored waiting, and the project turns out far less successful than one giving an instant response, even at the cost of less detailed analysis.

ANALYSIS means that the system can cope with any logical and statistical analysis characteristic of the application and can save the results in a form accessible to the end user.

It does not matter much whether the analysis is performed in the vendor's own tools or in a related external software product such as a spreadsheet; all the required analysis functionality simply has to be provided in a way that is intuitive for end users. Analysis tools could include procedures such as time-series analysis, cost allocation, currency conversion, goal seeking, modification of multidimensional structures, non-procedural modeling, exception detection, data extraction and other application-dependent operations. Such capabilities vary widely among products depending on their target orientation.

SHARED means that the system implements all data-protection requirements (possibly down to the cell level) and, when concurrent write access is needed, locks modifications at the appropriate level. Not all applications require data write-back, but the number of such applications is growing, and the system must be able to handle multiple modifications in a timely, secure manner.

MULTIDIMENSIONAL is the key requirement. If OLAP had to be defined in a single word, this would be it. The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, since this is clearly the most logical way to analyze businesses and organizations. There is no minimum number of dimensions that must be handled, since this too depends on the application, and most OLAP products have enough dimensions for the markets they target.

INFORMATION means all the data the application needs: the necessary information must be obtainable wherever it is needed, though how much is needed depends on the application. The power of a product is measured by how much input data it can process, not by how many gigabytes it can store, and this power varies widely: the largest OLAP products can handle at least a thousand times more data than the smallest. Many factors matter here, including data duplication, RAM requirements, disk space usage, performance metrics, integration with information warehouses, and so on.

The FASMI test is a reasonable and understandable definition of the goals that OLAP is aimed at achieving.

4. Classification of OLAP products.

So, the essence of OLAP is that the initial information for analysis is presented as a multidimensional cube that can be manipulated arbitrarily to obtain the required information sections, that is, reports. The end user sees the cube as a multidimensional dynamic table that automatically sums the data (facts) across various sections (dimensions) and allows interactive control of the calculations and of the report layout. These operations are performed by the OLAP engine (the engine of OLAP calculations).
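The dynamic-table behavior just described can be modeled as summing a measure over an arbitrary choice of dimensions. A toy sketch in Python (the dimension values and amounts are invented):

```python
from collections import defaultdict

# Fact rows of a small sales cube: three dimensions and one measure (invented data)
facts = [
    {"product": "PC",     "region": "Moscow", "month": "Jan", "amount": 1000},
    {"product": "PC",     "region": "Urals",  "month": "Jan", "amount": 400},
    {"product": "Camera", "region": "Moscow", "month": "Feb", "amount": 250},
    {"product": "PC",     "region": "Moscow", "month": "Feb", "amount": 700},
]

def section(facts, *dims):
    """Summarize the measure over an arbitrary choice of dimensions (a section of the cube)."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row["amount"]
    return dict(totals)

print(section(facts, "product"))          # totals by product
print(section(facts, "region", "month"))  # totals by region and month
```

Choosing different dimension subsets is exactly what "twisting" the cube means for the end user; a real OLAP engine does the same over millions of rows with pre-computed aggregates.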

To date, many products implementing OLAP technology have been developed. To make it easier to navigate among them, OLAP products are classified by the method of storing the data for analysis and by the location of the OLAP engine. Let us take a closer look at each category of OLAP products.

Classification by data storage method

Multidimensional cubes are built from source and aggregate data. Both source and aggregate data for cubes can be stored in relational as well as multidimensional databases. Three methods of data storage are therefore used today: MOLAP (Multidimensional OLAP), ROLAP (Relational OLAP) and HOLAP (Hybrid OLAP). Accordingly, by storage method OLAP products fall into three similar categories:

1. In MOLAP, source and aggregate data are stored in a multidimensional database or in a multidimensional local cube.

2. In ROLAP products, source data is stored in relational databases or in flat local tables on a file server. Aggregate data can be placed in service tables in the same database. Conversion of the data from the relational database into multidimensional cubes happens on request from the OLAP tool.

3. In the HOLAP architecture, the source data remains in the relational database while the aggregates are placed in a multidimensional one. The OLAP cube is built on request by the OLAP tool from both relational and multidimensional data.

Classification by location of the OLAP engine.

On this basis OLAP products are divided into OLAP servers and OLAP clients:

· In server OLAP tools, the calculations and the storage of aggregate data are performed by a separate process, the server. The client application receives only the results of queries against the multidimensional cubes stored on the server. Some OLAP servers support data storage only in relational databases, some only in multidimensional ones. Many modern OLAP servers support all three storage methods: MOLAP, ROLAP and HOLAP.

MOLAP.

MOLAP is Multidimensional On-Line Analytical Processing, that is, multidimensional OLAP. It means that the server uses a multidimensional database (MDB) to store the data. The point of using an MDB is obvious: it can efficiently store data that is multidimensional by nature and can service database queries quickly. Data is transferred from the source into the multidimensional database and then aggregated. Pre-calculation is what speeds up OLAP queries, because the summary data has already been computed; query time becomes a function solely of the time needed to access an individual piece of data and perform a calculation. This supports the principle that the work is done once and the results are then used again and again. Multidimensional databases are a relatively new technology, and using them carries the same drawbacks as most new technologies: they are not as stable as relational databases (RDBs) and are not optimized to the same extent. Another weakness is that most multidimensional databases cannot be used while data aggregation is in progress, so it takes time before new information becomes available for analysis.

ROLAP.

ROLAP is Relational On-Line Analytical Processing, that is, relational OLAP. The term means that the OLAP server is based on a relational database. Source data is entered into the relational database, typically in a star or snowflake schema, which helps reduce retrieval time. The server provides the multidimensional data model by means of optimized SQL queries.

There are a number of reasons for choosing a relational rather than a multidimensional database. RDBs are a well-established technology with many opportunities for optimization, and widespread real-world use has produced a more refined product. In addition, RDBs support larger volumes of data than MDBs; they are designed for exactly such volumes. The main argument against RDBs is the complexity of the SQL queries required to obtain information from a large database: an inexperienced SQL programmer can easily tie up valuable system resources trying to execute a query that would be much easier to execute in an MDB.

Aggregated/Pre-aggregated data.

Fast query execution is an imperative for OLAP: one of its basic principles, the ability to manipulate data intuitively, requires rapid retrieval of information. In general, the more calculations must be made to obtain a piece of information, the slower the response. Therefore, to keep query time short, the pieces of information that are accessed most often but also require calculation are pre-aggregated: they are computed once and then stored in the database as new data. A typical example is summary data, such as sales figures for months, quarters or years, where the actual entered data is daily figures.

Vendors differ in how they select the parameters that require pre-aggregation and in how many values they pre-calculate. The aggregation approach affects both the size of the database and query execution time. The more values are pre-calculated, the more likely it is that the user will request a value that is already computed, and response time shortens because the original values do not have to be summed on the fly. However, calculating all possible values is not the best solution: the database grows dramatically in size and becomes unmanageable, and the aggregation time becomes too long. Moreover, when numeric values are added to the database or changed, this must be reflected in the pre-calculated values that depend on the new data, so updating the database can also take a long time when there are many pre-calculated values. Since the database typically runs offline during aggregation, it is desirable that the aggregation time not be too long.
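The trade-off can be seen in a tiny sketch: the aggregation pass runs once (for example, overnight while the database is offline), after which queries read stored totals instead of summing daily rows (the figures are invented):

```python
# Invented daily figures: (month, day) -> amount
daily_sales = {("2024-01", d): 100 + d for d in range(1, 31)}

# Aggregation pass, done once; its cost grows with the number of pre-calculated values
monthly_totals = {}
for (month, _day), amount in daily_sales.items():
    monthly_totals[month] = monthly_totals.get(month, 0) + amount

def query_month(month):
    """A query served from the pre-calculated aggregate: no per-day summation at query time."""
    return monthly_totals[month]

print(query_month("2024-01"))
```

Any change to a daily figure would have to be propagated into `monthly_totals`, which is exactly the update cost described above.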

The OLAP client is structured differently: the multidimensional cube is built, and the OLAP calculations are performed, in the memory of the client computer. OLAP clients are likewise divided into ROLAP and MOLAP, and some support both ways of accessing data.

Each of these approaches has its pros and cons. Contrary to the popular belief in the superiority of server tools over client tools, in a number of cases an OLAP client can be more efficient and cost-effective for users than an OLAP server.

Developing analytical applications with client OLAP tools is a fast process and requires no special training. A user who knows the physical implementation of the database can develop an analytical application independently, without involving an IT specialist.

When using an OLAP server, one has to learn two different systems, sometimes from different vendors: one for creating cubes on the server and one for developing the client application.

The OLAP client provides a single visual interface for describing cubes and setting up user interfaces for them.

So, in what cases can using an OLAP client be more effective and profitable for users than using an OLAP server?

· Using an OLAP server is economically justified when the data volumes are very large and overwhelming for an OLAP client; otherwise using the client is more justified. In that case the OLAP client combines high performance with low cost.

· Powerful analyst PCs are another argument in favor of OLAP clients: when an OLAP server is used, this capacity goes unused.

Among the advantages of OLAP clients are the following:

· The costs of implementing and maintaining an OLAP client are significantly lower than those for an OLAP server.

· With an OLAP client that has a built-in engine, data is transmitted over the network once; performing OLAP operations generates no new data streams.

5. Operating principles of OLAP clients.

Let's look at the process of creating an OLAP application using a client tool (Fig. 1).

Figure 1. Creating an OLAP application using a ROLAP client tool

The operating principle of ROLAP clients is the preliminary description of a semantic layer behind which the physical structure of the source data is hidden. The data sources can be local tables or relational DBMSs; the list of supported sources is determined by the specific software product. After that, the user independently manipulates objects that he understands in terms of the subject area in order to create cubes and analytical interfaces.

The operating principle of an OLAP server client is different. When creating cubes on an OLAP server, the user manipulates physical descriptions of the database, while the custom descriptions are created in the cube itself; the OLAP server client is configured for the cube only.

When the semantic layer is created, the data sources, the tables of products and deals, are described in terms the end user understands and become "Products" and "Deals". The "ID" field of the "Products" table is renamed "Code", "Name" is renamed "Product", and so on.

Then the "Sales" business object is created. A business object is a flat table on whose basis a multidimensional cube is formed. When the business object is created, the "Products" and "Deals" tables are joined on the product "Code" field. Since not all the table fields are needed in the report, the business object uses only the "Product", "Date" and "Amount" fields.
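The construction of such a flat business object is essentially a join plus a projection. A minimal Python sketch (the table contents are invented; in a real tool the mapping would come from the semantic layer):

```python
# Dimension table "Products": Code -> Product (invented rows)
products = {1: "Printer", 2: "Scanner"}

# Fact table "Deals" (invented rows)
deals = [
    {"code": 1, "date": "2024-01-15", "amount": 250.0},
    {"code": 2, "date": "2024-01-20", "amount": 410.0},
    {"code": 1, "date": "2024-02-02", "amount": 230.0},
]

# The "Sales" business object: join on Code, keep only Product, Date and Amount
sales = [
    {"product": products[d["code"]], "date": d["date"], "amount": d["amount"]}
    for d in deals
]
print(sales[0])  # {'product': 'Printer', 'date': '2024-01-15', 'amount': 250.0}
```

The resulting flat rows are what the client tool then turns into a multidimensional cube.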

In our example, based on the “Sales” business object, a report on product sales by month was created.

When working with an interactive report, the user can set filtering and grouping conditions with the same simple mouse movements. At this point the ROLAP client accesses the data in its cache, whereas the OLAP server client generates a new query to the multidimensional database. For example, by applying a product filter to the sales report, we obtain a report on sales of the products of interest.
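Filtering and grouping over cached rows, as a ROLAP client does, can be sketched like this (the rows are invented):

```python
# Rows cached on the client after the initial query (invented data)
cached_rows = [
    {"product": "Printer", "month": "Jan", "amount": 250.0},
    {"product": "Scanner", "month": "Jan", "amount": 410.0},
    {"product": "Printer", "month": "Feb", "amount": 230.0},
]

def report(rows, product=None):
    """Group amounts by month, optionally filtering by a single product first."""
    totals = {}
    for r in rows:
        if product is not None and r["product"] != product:
            continue
        totals[r["month"]] = totals.get(r["month"], 0.0) + r["amount"]
    return totals

print(report(cached_rows))                     # {'Jan': 660.0, 'Feb': 230.0}
print(report(cached_rows, product="Printer"))  # {'Jan': 250.0, 'Feb': 230.0}
```

The key point is that no round trip to the server is needed; an OLAP server client would instead re-issue the query against the multidimensional database.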

All settings of an OLAP application can be stored in a dedicated metadata repository, in the application itself, or in the system repository of the multidimensional database; the implementation depends on the specific software product.

What such pre-built applications share is a standard view of the interface, predefined functions and structure, and ready solutions for more or less standard situations. Financial packages, for example, are popular: a pre-built financial application lets professionals use familiar financial instruments without having to design a database structure or custom forms and reports.

The Internet is a new form of client. It also bears the stamp of new technologies: a host of Internet solutions differ significantly in their capabilities in general and as OLAP solutions in particular. Generating OLAP reports over the Internet has many advantages, the most significant being that no specialized software is needed to access the information, which saves the company a great deal of time and money.

6. Choice of OLAP application architecture.

When implementing an information-analytical system, it is important not to err in choosing the architecture of the OLAP application. The literal translation of the term On-Line Analytical Processing, "operational analytical processing", is often taken to mean that data entering the system is analyzed quickly. This is a misconception: the speed of analysis has nothing to do with real-time updating of data in the system. This characteristic refers to the response time of the OLAP system to user queries, while the analyzed data often represents a snapshot of information "as of yesterday" if, for example, the data in the warehouse is updated once a day.

In this context, the translation of OLAP as “interactive analytical processing” is more accurate. It is the ability to analyze data in an interactive mode that distinguishes OLAP systems from systems for preparing regulated reports.

Another feature of interactive processing, in the formulation of the founder of OLAP, E. F. Codd, is the ability to "combine, view and analyze data from the point of view of multiple dimensions, i.e., in the way most understandable for corporate analysts". Codd himself uses the term OLAP to refer exclusively to a specific way of presenting data at the conceptual level: multidimensional. At the physical level the data may be stored in relational databases, but in practice OLAP tools tend to work with multidimensional databases in which the data is organized as a hypercube (Fig. 1).

Figure 1. The OLAP cube (hypercube, metacube)

At the same time, the relevance of these data is determined by the moment the hypercube is filled with new data.

The time needed to build a multidimensional database obviously depends significantly on the volume of data loaded into it, so it is reasonable to limit that volume. But how can one do so without narrowing the possibilities of analysis and depriving the user of access to all the information of interest? There are two alternative paths: Analyze then query ("analyze first, then request additional information") and Query then analyze ("first query the data, then analyze").

Followers of the first path load generalized information into the multidimensional database, for example monthly, quarterly and annual results by department. When the data needs to be refined, the user is prompted to generate a report from the relational database containing the required selection, for example by day for a given department or by month and employee for a selected department.

Proponents of the second way, on the contrary, suggest that the user, first of all, decide on the data that he is going to analyze and load it into a microcube - a small multidimensional database. Both approaches differ at the conceptual level and have their advantages and disadvantages.

The advantages of the second approach include the "freshness" of the information the user receives in the form of a multidimensional report, the "microcube". The microcube is formed from information just requested from the current relational database. Work with the microcube is interactive: obtaining slices of information and detailing them within the microcube happens instantly. Another positive point is that the structure and filling of the microcube are designed by the user on the fly, without the participation of a database administrator. The approach suffers from serious shortcomings, however. The user does not see the big picture and must decide in advance on the direction of his research; otherwise the requested microcube may prove too small and not contain all the data of interest, and the user will have to request a new microcube, then another, and another. The Query then analyze approach is implemented by the BusinessObjects tool of the company of the same name and by the tools of Intersoft Lab's Contour platform.

With the Analyze then query approach, the volume of data loaded into the multidimensional database can be quite large, and loading must follow a schedule and can take considerable time. These disadvantages pay off later, however, when the user has access to almost all the necessary data in any combination. The source data in the relational database is accessed only as a last resort, when detailed information is needed, for example on a specific invoice.

The operation of a single multidimensional database is practically unaffected by the number of users accessing it: they only read the data there, unlike in the Query then analyze approach, where the number of microcubes can in the extreme case grow at the same rate as the number of users.

This approach increases the load on IT services, which have to maintain multidimensional databases in addition to relational ones and are responsible for the timely automatic updating of the data in them.

The most prominent representatives of the “Analyze then query” approach are the PowerPlay and Impromptu tools from Cognos.

The choice of approach, and of the tool implementing it, depends primarily on the goal pursued: one always has to balance budget savings against the quality of service for end users. It should be kept in mind that, strategically, information-analytical systems are created to achieve competitive advantage, not to avoid automation costs. For example, a corporate information-analytical system can provide necessary, timely and reliable information about a company whose publication for potential investors ensures the company's transparency and predictability, which inevitably becomes a condition of its investment attractiveness.

7. Spheres of application of OLAP-technologies.

OLAP is applicable wherever there is a task of analyzing multidimensional data. In general, given a data table with at least one descriptive column (dimensions) and one numeric column (measures, or facts), an OLAP tool will usually be an effective analysis and reporting tool.

Let us look at some real-life areas of application of OLAP technologies.

1. Sales.

Analysis of the sales structure answers the questions needed for management decisions: on changing the range of goods or prices, closing and opening stores and branches, terminating and signing contracts with dealers, running or stopping advertising campaigns, and so on.

2. Procurement.

The task is the reverse of sales analysis. Many enterprises purchase components and materials from suppliers, and trading enterprises purchase goods for resale. There are many possible tasks in procurement analysis, from planning cash outlays on the basis of past experience to control over managers and the choice of suppliers.

3. Prices.

The analysis of market prices is closely related to the analysis of purchases. The purpose of this analysis is to optimize costs and select the most profitable offers.

4. Marketing.

By marketing analysis we mean here only the analysis of buyers or of clients consuming services. The task of this analysis is the correct positioning of goods, the identification of buyer groups for targeted advertising, and the optimization of the assortment. The task of OLAP in this case is to give the user a tool for getting answers, quickly and at the speed of thought, to the questions that intuitively arise in the course of data analysis.

5. Warehouse.

If the organization keeps warehouse records, it can analyze the structure of warehouse balances by type of goods and by warehouse, the shelf life of goods, shipments by recipient, and many other things important to the enterprise.

6. Cash flow.

This is a whole area of analysis with many schools and methods. OLAP technology can serve as a tool for implementing or improving these techniques, but not as a replacement for them. The turnover of cash and non-cash funds is analyzed in terms of business operations, counterparties, currencies and time in order to optimize flows, ensure liquidity, and so on. The composition of the dimensions depends strongly on the specifics of the business, industry, and methodology.

7. Budget.

One of the most fertile areas of application of OLAP technologies. It is not for nothing that no modern budgeting system is considered complete without the presence of OLAP tools for budget analysis. Most budget reports are easily built on the basis of OLAP systems. At the same time, the reports answer a very wide range of questions: analysis of the structure of expenses and income, comparison of expenses for certain items in different divisions, analysis of the dynamics and trends of expenses for certain items, analysis of costs and profits.

8. Accounts.

A classic balance sheet, organized by account number and containing opening balances, turnover and closing balances, can be analyzed perfectly well in an OLAP system. In addition, an OLAP system can automatically and very quickly calculate consolidated balances for a multi-branch organization, balances for the month, quarter and year, balances aggregated over the hierarchy of accounts, and analytical balances based on analytical attributes.

9. Financial reporting.

A technologically constructed reporting system is nothing more than a set of named indicators with date values that need to be grouped and summarized in various sections to obtain specific reports. When this is the case, displaying and printing reports is implemented most easily and cheaply in OLAP systems. In any case, an enterprise's internal reporting system is not so conservative that it cannot be restructured, both to save money on the engineering work of creating reports and to gain multidimensional operational analysis capabilities.

10. Site traffic.

The Internet server log file is multidimensional in nature, and therefore suitable for OLAP analysis. The facts are: the number of visits, the number of hits, the time spent on the page and other information available in the log.

11. Production volumes.

This is another example of statistical analysis. Thus, it is possible to analyze the volumes of grown potatoes, smelted steel, manufactured goods.

12. Consumption of consumables.

Imagine a plant consisting of dozens of workshops that consume coolants, flushing fluids, oils, rags, sandpaper - hundreds of items of consumables. Accurate planning and cost optimization require a thorough analysis of the actual consumption of consumables.

13. Use of premises.

Another type of statistical analysis. Examples: analysis of the workload of classrooms, rented buildings and premises, the use of conference rooms, etc.

14. Personnel turnover at the enterprise.

Analysis of personnel turnover at the enterprise by branches, departments, professions, level of education, gender, age, time.

15. Passenger transportation.

Analysis of the number of tickets sold and amounts by season, direction, type of carriage (class), type of train (airplane).

The areas of application of OLAP technologies are not limited to this list. As an example, let us consider OLAP analysis in the field of sales.

8. An example of using OLAP technologies for sales analysis.

Designing a multidimensional data representation for OLAP analysis begins with forming a map of dimensions. For example, when analyzing sales it may be advisable to identify individual parts of the market (developing and stable segments, large and small consumers, the likelihood of new consumers, etc.) and to estimate sales volumes by product, territory, customer, market segment, sales channel and order size. These directions form the coordinate grid of the multidimensional representation of sales: the structure of its dimensions.

Since the activities of any enterprise take place over time, the first question that arises in analysis is the question of the dynamics of business development. Correct organization of the time axis makes it possible to answer this question well. Typically the time axis is divided into years, quarters and months; even finer division into weeks and days is possible. The structure of the time dimension is formed taking into account the frequency with which the data arrives; it may also be determined by how often the information is in demand.

The Product Group dimension should reflect the structure of the products sold as closely as possible. At the same time it is important to maintain a certain balance: on the one hand, to avoid excessive detail (the number of groups should remain manageable), and on the other, not to miss a significant market segment.

The “Customers” dimension reflects the structure of sales on a territorial and geographical basis. Each dimension can have its own hierarchies; in this dimension, for example, the structure might be Country – Region – City – Client.

To analyze the performance of departments, a separate dimension should be created. For example, we can distinguish two levels of hierarchy, departments and the divisions included in them, which should be reflected in the “Divisions” dimension.

In fact, the dimensions “Time”, “Products”, “Customers” quite fully define the space of the subject area.

Additionally, it is useful to divide this space into conditional areas, based on calculated characteristics, for example, ranges of transaction volume in value terms. Then the entire business can be divided into a number of cost ranges in which it is carried out. In this example, we can limit ourselves to the following indicators: the amount of sales of goods, the number of goods sold, the amount of income, the number of transactions, the number of customers, the volume of purchases from manufacturers.

The OLAP cube for this analysis will look as follows (Fig. 2):


Figure 2. OLAP cube for analyzing sales volume

It is precisely this three-dimensional array that is called a cube in OLAP terms. In fact, from the point of view of strict mathematics, such an array will not always be a cube: for a real cube, the number of elements in all dimensions must be the same, while OLAP cubes do not have such a limitation. An OLAP cube does not have to be three-dimensional. It can be both two- and multidimensional, depending on the problem being solved. Serious OLAP products are designed for about 20 dimensions. Simpler desktop applications support about 6 dimensions.

Not all cells of the cube need to be filled: if there is no information about sales of Product 2 to Customer 3 in the third quarter, the value in the corresponding cell is simply undefined.
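A cube like this can be sketched as a sparse mapping from tuples of dimension values to measure values (a minimal illustration; all dimension values and amounts are invented):

```python
# Sales facts with the three dimensions from the text:
# (Time, Product, Customer) -> sale amount. Values are illustrative.
facts = [
    ("Q1", "Product 1", "Customer 1", 100),
    ("Q1", "Product 2", "Customer 2", 200),
    ("Q3", "Product 1", "Customer 3", 150),
    ("Q3", "Product 2", "Customer 1", 300),
]

cube = {}
for time, product, customer, amount in facts:
    key = (time, product, customer)
    cube[key] = cube.get(key, 0) + amount

# Sparsity comes for free: a cell that was never filled is simply absent.
print(("Q3", "Product 2", "Customer 3") in cube)  # False
```

Real OLAP servers use far more elaborate storage, but the logical model is the same: a measure indexed by dimension values, with empty cells left unstored.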

However, the cube itself is not suitable for analysis. While a three-dimensional cube can still be adequately represented or depicted, with six or nineteen dimensions the situation is much worse. Therefore, before use, ordinary two-dimensional tables are extracted from the multidimensional cube. This operation is called “slicing” the cube: the analyst, as it were, cuts the cube along the dimension values of interest. In this way the analyst obtains a two-dimensional slice of the cube (a report) and works with it. The structure of such a report is presented in Figure 3.

Figure 3. Analytical report structure

Let us slice our OLAP cube and obtain a sales report for the third quarter; it will look like this (Fig. 4).

Figure 4. Third quarter sales report

You can cut the cube along the other axis and get a report on the sales of product group 2 during the year (Fig. 5).

Figure 5. Quarterly sales report for product 2

Similarly, we can analyze the relationship with customer 4 by slicing the cube along the Customers dimension (Fig. 6).

Figure 6. Report on deliveries of goods to customer 4

You can detail the report by month or talk about the supply of goods to a specific branch of the client.
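The slicing operation described above can be sketched in a few lines (a minimal illustration; the dimension values and amounts are invented): fixing one dimension collapses the cube into an ordinary two-dimensional report.

```python
# A sparse sales cube: (Time, Product, Customer) -> amount. Values illustrative.
cube = {
    ("Q1", "Product 1", "Customer 1"): 100,
    ("Q3", "Product 1", "Customer 3"): 150,
    ("Q3", "Product 2", "Customer 1"): 300,
}

# "Slicing": fix Time = "Q3"; the two remaining dimensions form the report.
q3_report = {(product, customer): amount
             for (time, product, customer), amount in cube.items()
             if time == "Q3"}

print(q3_report)
# {('Product 1', 'Customer 3'): 150, ('Product 2', 'Customer 1'): 300}
```

Slicing along another dimension (Product or Customer) works the same way, which is exactly how the reports in Figures 4 through 6 are obtained.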

Purpose of the report

This report focuses on one category of intelligent technologies that serves as a convenient analytical tool: OLAP technologies.

The purpose of the report is to cover two questions: 1) the concept of OLAP and its applied importance in financial management; 2) the implementation of OLAP functionality in software solutions: differences, opportunities, advantages and disadvantages.

I would like to note right away that OLAP is a universal tool applicable in any subject area that requires data analysis by various methods, and not only in finance (as the title of the report might suggest).

Financial management

Financial management is an area in which analysis matters more than in any other. Any financial or management decision arises as the result of certain analytical procedures. Today financial management is becoming critical to the successful functioning of an enterprise. Although financial management is an auxiliary process in an enterprise, it requires special attention, since erroneous financial and management decisions can lead to large losses.

Financial management is aimed at providing the enterprise with financial resources in the required volumes, at the right time and in the right place in order to obtain the maximum effect from their use through optimal distribution.

It is perhaps difficult to define the level of “maximum resource efficiency”, but in any case the CFO should always know:

  • how many financial resources are available;
  • where funds will come from, and in what amounts;
  • where they can be invested most effectively, and why;
  • at what points in time all this needs to be done;
  • how much is needed to ensure normal operation of the enterprise.

To get reasonable answers to these questions, one must have a sufficiently large number of performance indicators and know how to analyze them. In addition, financial management covers a huge number of areas: cash flow analysis, analysis of assets and liabilities, profitability analysis, margin analysis, and assortment analysis.

Knowledge

Therefore, a key factor in the effectiveness of the financial management process is the availability of knowledge:

  • Personal knowledge in the subject area (one might say theoretical and methodological), including experience, intuition of a financier/finance director
  • General (corporate) knowledge or systematized information about the facts of financial transactions in the enterprise (i.e. information about the past, present and future state of the enterprise, presented in various indicators and measurements)

If the first kind of knowledge is the responsibility of the financier himself (or of the HR director who hired him), then the second must be purposefully created at the enterprise by the joint efforts of the financial and information services.

What is there now

And yet a paradoxical situation is now typical in enterprises: there is information, a great deal of it, too much. But it is in a chaotic state: unstructured, inconsistent, fragmented, not always reliable and often simply erroneous; it is almost impossible to find and obtain. Mountains of financial statements are produced at great length and often to no purpose; they are inconvenient for financial analysis and hard to digest, since they are created not for internal management but for submission to external regulatory authorities.

According to the results of a study conducted by the company Reuters among 1,300 international managers, 38% of those surveyed say they spend a lot of time trying to find the information they need. It turns out that a highly qualified specialist spends highly paid time not on data analysis, but on collecting, searching and systematizing the information necessary for this analysis. At the same time, managers experience a heavy load of data, often irrelevant to the case, which again reduces their efficiency. The reason for this situation: excess information and lack of knowledge.

What to do

Information must be turned into knowledge. For modern business, valuable information, its systematic acquisition, synthesis, exchange, use is a kind of currency, but in order to receive it, it is necessary to manage information, like any business process.

The key to information management is delivering the right information in the right form to stakeholders within the organization at the right time. The goal of such management is to help people work better together using increasing amounts of information.

Information technology in this case acts as a means by which it would be possible to systematize information in an enterprise, provide certain users with access to it and give them the tools to transform this information into knowledge.

Basic concepts of OLAP technologies

OLAP (from the English On-Line Analytical Processing) is the name not of a specific product but of an entire technology for the operational analysis of multidimensional data accumulated in a warehouse. To understand the essence of OLAP, consider the traditional process of obtaining information for decision-making.

Traditional decision support system

Here, of course, there can be many variants: from complete informational chaos to the most typical situation, in which the enterprise runs operational systems that record the facts of certain transactions and store them in databases, and a system of queries against those databases is built to extract data for analytical purposes.

But this method of decision support lacks flexibility and has many disadvantages:

  • only a negligible amount of the data potentially useful for decision-making is actually used
  • sometimes complex multi-page reports are created of which only 1-2 lines are actually used (the rest is kept just in case): information overload
  • the process responds slowly to change: if a new data view is needed, the request must be formally described, coded by a programmer, and only then executed, with waiting times of hours or days, while the decision may be needed now, immediately; and each new piece of information raises a new, clarifying question

Query reports come in a one-dimensional format, while business problems are multidimensional and multifaceted. To get a clear picture of a company's business, you need to analyze the data from various perspectives.

Many companies create excellent relational databases, ideally organizing mountains of unused information, which in itself provides neither a quick nor a sufficiently competent reaction to market events. Yes, relational databases were, are and will remain the most suitable technology for storing corporate data. What is needed is not a new database technology but analysis tools that complement the functions of existing DBMSs and are flexible enough to provide and automate the various kinds of intellectual analysis inherent in OLAP.

Understanding OLAP

What does OLAP provide?

  • Advanced tools for accessing data in the warehouse
  • Dynamic interactive data manipulation (rotation, consolidation or drill-down)
  • Clear visual display of data
  • Fast – analysis is carried out in real time
  • Multidimensional data presentation - simultaneous analysis of a number of indicators along several dimensions

To get the effect of using OLAP technologies, you must: 1) understand the essence of the technologies themselves and their capabilities; 2) clearly define what processes need to be analyzed, what indicators they will be characterized by and in what dimensions it is advisable to see them, i.e. create an analysis model.

The basic concepts that OLAP technologies operate on are as follows:

Multidimensionality

To understand the multidimensionality of data, first imagine a table showing, for example, enterprise costs broken down by economic element and business unit.

This data is presented in two dimensions:

  • cost item
  • business unit

Such a table is of limited use, as it shows the data for one specific period of time. For different periods, analysts would have to compare several tables (one per time period):

The figure adds a third dimension, Time, to the first two (cost item, business unit).

Another way to show multidimensional data is to represent it in the form of a cube:

OLAP cubes allow analysts to obtain data at various slices to obtain answers to questions posed by the business:

  • Which costs are critical in which business units?
  • How do business unit costs change over time?
  • How do cost items change over time?

Answers to such questions are needed for management decisions: reducing certain cost items, influencing their structure, identifying the reasons for changes in costs over time and for deviations from plan, eliminating those deviations, and optimizing the cost structure.

In this example, only 3 dimensions are considered. It's difficult to depict more than 3 dimensions, but it works in the same way as with 3 dimensions.

Typically, OLAP applications allow you to work with three or more dimensions; for example, you can add further dimensions such as Plan/Actual, Cost category (direct or indirect), Order, or Month. Additional dimensions yield more analytical slices and provide answers to questions with multiple conditions.
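A query with multiple conditions over such a four-dimensional cost cube can be sketched as follows (all dimension values and amounts are invented for illustration):

```python
# Hypothetical cost facts with four dimensions:
# (Month, Business unit, Cost item, Plan/Actual) -> amount.
cube = {
    ("Jan", "Plant A", "Wages", "actual"): 300,
    ("Jan", "Plant A", "Wages", "plan"):   280,
    ("Feb", "Plant A", "Wages", "actual"): 310,
    ("Jan", "Plant B", "Materials", "actual"): 500,
}

# A multi-condition query: actual wages at Plant A, broken down by month.
wages_a = {month: amount
           for (month, unit, item, kind), amount in cube.items()
           if unit == "Plant A" and item == "Wages" and kind == "actual"}

print(wages_a)  # {'Jan': 300, 'Feb': 310}
```

Each extra dimension simply adds one more condition the analyst can fix or vary, which is what "more analytical slices" means in practice.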

Hierarchy

OLAP also allows analysts to organize each dimension into a hierarchy of groups, subgroups, and totals that reflect the measure across the entire organization: the most logical way to analyze a business.

For example, it is advisable to group costs hierarchically:

OLAP allows analysts to obtain the value of an overall summary measure (at the highest level) and then drill down to lower and lower levels, thereby discovering the exact reason for a change in the measure.

By allowing analysts to use multiple dimensions in a data cube, with dimensions that can be built hierarchically, OLAP provides a picture of the business that is not constrained by the structure of the information warehouse.
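The drill-down described above can be sketched with a two-level cost hierarchy (group names, items and amounts are invented for illustration):

```python
# A two-level cost hierarchy: group -> list of cost items. Values illustrative.
hierarchy = {
    "Materials": ["Raw materials", "Consumables"],
    "Labour": ["Wages", "Payroll taxes"],
}
costs = {"Raw materials": 500, "Consumables": 120,
         "Wages": 300, "Payroll taxes": 90}

# Top level: the summary measure per group.
group_totals = {g: sum(costs[i] for i in items)
                for g, items in hierarchy.items()}
print(group_totals)  # {'Materials': 620, 'Labour': 390}

# Drill down into the largest group to find the reason for its size.
largest = max(group_totals, key=group_totals.get)
detail = {i: costs[i] for i in hierarchy[largest]}
print(largest, detail)  # Materials {'Raw materials': 500, 'Consumables': 120}
```

Real hierarchies may have many more levels (e.g. division → department → employee), but each drill-down step is the same operation: expanding one node of the hierarchy into its children.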

Changing directions of analysis in a cube (rotating data)

As a rule, one operates with the following concepts: some dimensions are laid out in columns, some in rows (there may be several of each), the remaining dimensions form slices, and the cells of the table contain the measures (sales, costs, cash).

Typically, OLAP allows you to change the orientation of cube dimensions, thereby presenting the data in different views.

The display of cube data depends on:

  • the orientation of the dimensions: which dimensions are placed in rows, in columns, and in slices;
  • the groups of indicators highlighted in rows, columns and sections.

Changing these orientations is entirely within the user's control.
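Rotation itself is a simple operation: the roles of two dimensions are swapped, so rows become columns and vice versa. A minimal sketch (cost items, units and amounts are invented):

```python
# A 2-D view of a cube: (cost item, business unit) -> amount. Values illustrative.
view = {
    ("Wages", "Plant A"): 300,
    ("Wages", "Plant B"): 250,
    ("Materials", "Plant A"): 500,
}

# "Rotating" the view swaps which dimension labels the rows and which
# labels the columns; the measures themselves are unchanged.
rotated = {(unit, item): amount for (item, unit), amount in view.items()}

print(rotated[("Plant A", "Wages")])  # 300
```

The data is identical before and after rotation; only the presentation changes, which is why OLAP tools can offer this interactively.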

Thus, OLAP makes it possible to carry out various kinds of analysis and to understand their interrelations and results.

  • Deviation analysis is an analysis of plan implementation, which is supplemented by factor analysis of the causes of deviations by detailing the indicators.
  • Dependency analysis: OLAP allows you to identify dependencies between different indicators; for example, when beer was removed from the assortment, sales of dried roach were found to fall over the first two months.
  • Comparison (comparative analysis). Comparison of the results of changing the indicator over time, for a given group of goods, in different regions, etc.
  • Analysis of dynamics allows us to identify certain trends in changes in indicators over time.

Responsiveness: one might say that OLAP rests on a law of psychology: the ability to process information requests in “real time”, at the pace of the user's analytical comprehension of the data.

If a relational database can read about 200 records per second and write 20, then a good OLAP server, using calculated rows and columns, can consolidate 20,000-30,000 cells (equivalent to one record in a relational database) per second.

Visibility: it should be emphasized that OLAP provides advanced means of presenting data graphically to the end user. The human brain can perceive and analyze information presented as geometric images in a volume several orders of magnitude greater than information presented in alphanumeric form. An example: suppose you need to find a familiar face in one of a hundred photographs; this will probably take you no more than a minute. Now imagine that instead of photographs you are offered a hundred verbal descriptions of the same faces; you would probably not be able to solve the problem at all.

Simplicity: the main feature of these technologies is that they are intended for use not by an information technology specialist or an expert statistician, but by a professional in the applied field: a credit department manager, a budget department manager or, finally, a director. They are designed so that the analyst communicates with the problem, not with the computer.

Despite the great capabilities of OLAP (and the idea itself is relatively old, dating back to the 1960s), it is hardly ever actually used in our enterprises. Why?

  • there is no information or the possibilities are not clear
  • habit of thinking two-dimensionally
  • price barrier
  • excessive technological content of articles devoted to OLAP: unusual terms are frightening - OLAP, “data mining and slicing”, “ad hoc queries”, “identification of significant correlations”

Our approach to the use of OLAP versus the Western one

In addition, we also have a specific understanding of the application utility of OLAP even while understanding its technological capabilities.

Many domestic authors of materials on OLAP express the following opinion about its usefulness: most perceive OLAP as a tool that allows data to be expanded and collapsed simply and conveniently, carrying out whatever manipulations come to the analyst's mind in the course of analysis. The more “slices” and “sections” of data the analyst sees, the more ideas he has, which in turn require more and more slices for verification. This is not quite right.

The Western understanding of the usefulness of OLAP rests on a methodological analysis model that must be built in when an OLAP solution is designed. The analyst should not play with the OLAP cube, aimlessly changing its dimensions, levels of detail, data orientation and graphical display (and this really takes time!), but should clearly understand which views he needs, in what sequence, and why (of course, there may be an element of discovery here, but it is not fundamental to the usefulness of OLAP).

Applications of OLAP

  • Budget
  • Flow of funds

Budgeting is one of the most fertile areas for applying OLAP: most budget reports are easily built on OLAP systems and answer a very wide range of questions, from the structure of expenses and income and comparisons of spending across divisions to the dynamics and trends of individual cost items.

OLAP will allow you to analyze cash inflows and outflows in the context of business operations, counterparties, currencies and time in order to optimize their flows.

  • Financial and management reporting (with analytics that management needs)
  • Marketing
  • Balanced Scorecard
  • Profitability Analysis

If you have the appropriate data, you can find various applications of OLAP technology.

OLAP products

This section will discuss OLAP as a software solution.

General requirements for OLAP products

There are many ways to implement OLAP applications, so no particular technology should be required or even recommended. Under different conditions and circumstances, one approach may be preferable to another. The implementation techniques include many proprietary ideas that vendors are proud of: variations of client-server architecture, time-series analysis, object orientation, data storage optimization, parallel processing, etc. But these technologies cannot be part of the definition of OLAP.

There are characteristics that must hold in any OLAP product (if it is indeed an OLAP product); they constitute the ideal of the technology. These are the five key definitions that characterize OLAP (the so-called FASMI test): Fast Analysis of Shared Multidimensional Information.

  • Fast(FAST) means that the system should be able to provide most responses to users within approximately five seconds. Even if the system warns that the process will take significantly longer, users may become distracted and lose their thoughts, and the quality of the analysis will suffer. This speed is not easy to achieve with large amounts of data, especially if special on-the-fly calculations are required. Vendors resort to a wide variety of methods to achieve this goal, including specialized forms of data storage, extensive pre-computing, or increasingly stringent hardware requirements. However, there are currently no fully optimized solutions. At first glance, it may seem surprising that when receiving a report in a minute that not so long ago took days, the user very quickly becomes bored while waiting, and the project turns out to be much less successful than in the case of an instant response, even at the cost of less detailed analysis.
  • Shared means that the system makes it possible to fulfill all data protection requirements and implement distributed and simultaneous access to data for different levels of users. The system must be able to handle multiple data changes in a timely, secure manner. This is a major weakness of many OLAP products, which tend to assume that all OLAP applications are read-only and provide simplified security controls.
  • Multidimensional is the key requirement. If OLAP had to be defined in a single word, this would be it. The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, as this is the most logical way to analyze a business. There is no minimum number of dimensions that must be supported, since this too depends on the application, and most OLAP products offer enough dimensions for the markets they target. Again, we do not specify what underlying database technology should be used, so long as the user obtains a truly multidimensional conceptual view of the information. This feature is the heart of OLAP.
  • Information. The necessary information must be obtained where it is needed, regardless of its volume and storage location. However, a lot depends on the application. The power of various products is measured in terms of how much input data they can process, but not how many gigabytes they can store. The power of the products varies widely - the largest OLAP products can handle at least a thousand times more data than the smallest. There are many factors to consider in this regard, including data duplication, RAM requirements, disk space usage, performance metrics, integration with information warehouses, etc.
  • Analysis means that the system can handle any logical and statistical analysis specific to a given application and ensures that it is stored in a form accessible to the end user. The user should be able to define new custom calculations as part of the analysis without the need for programming. That is, all required analysis functionality must be provided in an intuitive way for end users. Analysis tools could include certain procedures, such as time series analysis, cost allocation, currency transfers, target searches, etc. Such capabilities vary widely among products, depending on the target orientation.

In other words, these 5 key definitions are the goals that OLAP products are designed to achieve.

Technological aspects of OLAP

An OLAP system includes certain components. There are various schemes for their operation that this or that product can implement.

Components of OLAP systems (what does an OLAP system consist of?)

Typically, an OLAP system includes the following components:

  • Data source
    The source from which data for analysis is taken (data warehouse, database of operational accounting systems, set of tables, combinations of the above).
  • OLAP server
    Data from the source is transferred or copied to the OLAP server, where it is systematized and prepared for faster generation of responses to queries.
  • OLAP client
    User interface to the OLAP server in which the user operates

It should be noted that not all components are required. There are desktop OLAP systems that allow you to analyze data stored directly on the user's computer and do not require an OLAP server.

One element that is always required, however, is the data source: data availability is a serious issue. If the data exists in any form, such as an Excel table, an accounting system database, or structured reports from branches, an IT specialist can integrate it with the OLAP system directly or via an intermediate conversion; OLAP systems have special tools for this. If the data does not exist, or is of insufficient completeness and quality, OLAP will not help. In other words, OLAP is only a superstructure over data; without data it is useless.

Most data for OLAP applications originates in other systems. However, in some applications (for example, planning or budgeting), data can be created directly in OLAP applications. When data comes from other applications, it is usually necessary for the data to be stored in a separate, duplicate form for the OLAP application. Therefore, it is advisable to create data warehouses.

It should be noted that the term “OLAP” is inextricably linked with the term “data warehouse”. A data warehouse is a subject-oriented, time-variant, non-volatile collection of data used to support management decision-making. Data in the warehouse comes from operational (OLTP) systems, which automate business processes; the warehouse may also be replenished from external sources, such as statistical reports.

Although data warehouses contain obviously redundant information that already exists in operational databases or files, they are necessary because:

  • the data is fragmented, stored in various DBMSs and formats;
  • data retrieval performance improves;
  • even if all enterprise data were stored on a central database server (which is extremely rare), the analyst would almost certainly not understand its complex, sometimes confusing structures;
  • complex analytical queries against operational information slow down the company's current work, locking tables for long periods and seizing server resources;
  • the data can be cleaned and harmonized;
  • direct analysis of data in operational systems is impossible or very difficult.

The purpose of the warehouse is to provide the “raw material” for analysis in one place and in a simple, understandable structure. That is, Data Warehousing is not a concept of data analysis; rather, it is a concept of preparing data for analysis. It presupposes the implementation of a single integrated data source.

OLAP products: architectures

When using OLAP products, two questions matter: how and where to store the data, and how and where to process it. Depending on how these two processes are implemented, different OLAP architectures are distinguished. There are three ways of storing data for OLAP and three ways of processing it. Many vendors offer several options; some try to prove that their approach is the only prudent one, which is of course absurd. However, very few products can operate efficiently in more than one mode.

OLAP data storage options

Storage in this context means keeping data in a constantly updated state.

  • Relational databases: the typical choice if the enterprise keeps its accounting data in a relational DBMS. In most cases the data should be stored in a denormalized structure, the most suitable being a star schema. A normalized database is unsuitable because query performance when generating OLAP aggregates is very low (the resulting aggregates are therefore often stored in dedicated aggregate tables).
  • Database files on the client computer (kiosks or data marts): the data can be distributed in advance or created on demand on the client machines.

  • Multidimensional databases: the data is stored in a multidimensional database on a server. It can include data extracted and summarized from other systems and relational databases, end-user files, and so on. In most cases a multidimensional database resides on disk, although some products keep the most frequently used data in RAM and compute it on the fly. Very few multidimensional-database products allow concurrent editing of the data; many allow a single writer with multiple readers, and others are limited to reading only.
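Conceptually, a multidimensional database addresses every cell by a tuple of dimension members. A toy sketch of that idea follows; the dimension names and the `rollup` helper are invented for illustration and are not any product's API:

```python
from collections import defaultdict

# Toy multidimensional cube: each cell is addressed by a tuple of
# dimension members (product, region, month) and holds a measure value.
cube = {
    ("Laptop",  "North", "Jan"): 10.0,
    ("Laptop",  "South", "Jan"): 4.0,
    ("Printer", "North", "Feb"): 7.0,
}

def rollup(cube, axis):
    """Aggregate the measure over one dimension (axis = position in the key)."""
    out = defaultdict(float)
    for key, value in cube.items():
        reduced = key[:axis] + key[axis + 1:]   # drop the rolled-up dimension
        out[reduced] += value
    return dict(out)

# Roll up over the region dimension (axis 1): product x month totals.
print(rollup(cube, 1))  # {('Laptop', 'Jan'): 14.0, ('Printer', 'Feb'): 7.0}
```

Real multidimensional engines add hierarchies, sparsity handling, and pre-computed aggregates on top of exactly this kind of cell addressing.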

These three storage locations offer different capacities, listed above in descending order, and they also differ in query performance: relational databases are much slower than the other two options.
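The star schema recommended above for relational storage can be sketched with Python's standard sqlite3 module. The table and column names are illustrative assumptions: small descriptive dimension tables surround one fact table that references them:

```python
import sqlite3

# In-memory database standing in for a relational warehouse.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables: small and descriptive.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- Fact table: one row per event, with a foreign key into every dimension.
    CREATE TABLE fact_sales  (product_id INTEGER, time_id INTEGER, amount REAL);
""")
con.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Laptop", "Hardware"), (2, "Office suite", "Software")])
con.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                [(1, 2023, 1), (2, 2023, 2)])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 1, 1200.0), (1, 2, 800.0), (2, 1, 300.0)])

# A typical OLAP-style aggregate: revenue by category and month.
rows = con.execute("""
    SELECT p.category, t.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_time    t ON t.time_id    = f.time_id
    GROUP BY p.category, t.month
    ORDER BY p.category, t.month
""").fetchall()
print(rows)  # [('Hardware', 1, 1200.0), ('Hardware', 2, 800.0), ('Software', 1, 300.0)]
```

Every analytical query follows the same shape: join the fact table to the relevant dimensions, then group and aggregate, which is why this denormalized layout suits OLAP better than a fully normalized one.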

Options for processing OLAP data

There are likewise three data processing options:

  • Using SQL: this option is naturally used when the data is stored in a relational database. However, SQL cannot express multidimensional calculations in a single query, so achieving even ordinary multidimensional functionality requires writing complex SQL. This does not stop developers from trying: in most cases they perform a limited set of the relevant calculations in SQL, leaving the rest to a multidimensional engine or to the client machine. It is also possible to cache data in RAM across queries, which dramatically improves response times.
  • Multidimensional processing on the client: the client OLAP product performs the calculations itself, but this is feasible only if users have relatively powerful PCs.

  • Server-side multidimensional processing: a popular place to perform multidimensional calculations in client-server OLAP applications, used in many products. Performance is usually high because most of the calculations have already been done, but this approach requires a lot of disk space.
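The point made above about SQL, that one query cannot deliver all aggregation levels at once, can be seen in a small sketch. SQLite, like many SQL dialects, has no CUBE operator, so each aggregation level must be spelled out as its own GROUP BY and glued together with UNION ALL; the table and data here are invented for illustration:

```python
import sqlite3

# Illustrative fact table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("North", "A", 5.0), ("North", "B", 3.0), ("South", "A", 2.0)])

# Emulating a two-dimensional CUBE: four explicit GROUP BY levels.
rows = con.execute("""
    SELECT region, product, SUM(amount) FROM sales GROUP BY region, product
    UNION ALL
    SELECT region, NULL,   SUM(amount) FROM sales GROUP BY region
    UNION ALL
    SELECT NULL,  product, SUM(amount) FROM sales GROUP BY product
    UNION ALL
    SELECT NULL,  NULL,    SUM(amount) FROM sales
""").fetchall()
for row in rows:          # NULL marks an "all members" total for that dimension
    print(row)
```

With two dimensions this already takes four SELECTs; with n dimensions a full cube needs 2^n of them, which is exactly the “complex SQL for ordinary multidimensional functionality” the text describes.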

Matrix of OLAP architectures

By combining the storage and processing options, we obtain a matrix of OLAP system architectures. In theory there can be nine combinations, but three of them make little practical sense, leaving six realistic options for storing and processing OLAP data.

The columns of the matrix are the data storage options (relational database; server-side multidimensional database; client computer) and the rows are the data processing options (SQL; server-side multidimensional processing; client-side multidimensional processing). The six viable cells, numbered as sectors, are populated as follows:

  • Sector 1 (SQL processing, relational storage): Cartesis Magnitude
  • Sector 2 (server-side multidimensional processing, relational storage): Crystal Holos (ROLAP mode), IBM DB2 OLAP Server, CA EUREKA:Strategy, Informix MetaCube, Speedware Media/MR, Microsoft Analysis Services, Oracle Express (ROLAP mode), Pilot Analysis Server, Applix iTM1
  • Sector 3 (client-side multidimensional processing, relational storage): Oracle Discoverer, Informix MetaCube
  • Sector 4 (server-side multidimensional processing, server-side multidimensional storage): Crystal Holos, Comshare Decision, Hyperion Essbase, Oracle Express, Speedware Media/M, Microsoft Analysis Services, PowerPlay Enterprise Server, Pilot Analysis Server, Applix iTM1
  • Sector 5 (client-side multidimensional processing, server-side multidimensional storage): Dimensional Insight, Hyperion Enterprise
  • Sector 6 (client-side multidimensional processing, client-side storage): Cognos PowerPlay, Personal Express, iTM1 Perspectives

Since it is storage that largely determines processing, products are customarily grouped by storage option:

  • ROLAP products: sectors 1, 2, and 3
  • MOLAP products: sectors 4 and 5
  • Desktop OLAP products: sector 6
  • HOLAP products, which allow both multidimensional and relational data storage: sectors 2 and 4

Categories of OLAP products

There are more than 40 OLAP vendors, although they cannot all be considered competitors: their capabilities differ greatly and they in fact operate in different market segments. They can be grouped into four fundamental categories, distinguished along two axes: complex versus simple functionality, and performance versus disk space. It is convenient to picture the categories as the four sides of a square: each side represents a category's distinctive feature, adjacent sides share similarities, and opposite sides are fundamentally different.

Applied OLAP

Features: complete applications with rich functionality. Almost all require a multidimensional database, although some work with a relational one. Many applications in this category are specialized, for example for sales, manufacturing, banking, budgeting, financial consolidation, or sales analysis.

Advantages:
  • integration with a variety of applications
  • high level of functionality
  • high level of flexibility and scalability

Disadvantages:
  • complexity of the applications (user training is required)
  • high price

Representatives: Hyperion Solutions, Crystal Decisions, Information Builders.

MOLAP

Features: the product is based on a non-relational data structure that provides multidimensional storage, processing, and presentation of data; during analysis, data is selected exclusively from the multidimensional structure. Despite a declared high level of openness, vendors steer buyers toward purchasing their own tools.

Advantages:
  • high performance: fast calculation of summary indicators and various multidimensional transformations along any dimension; the average response time to an ad hoc analytical query is usually one to two orders of magnitude shorter than with a relational database
  • high level of openness: a large number of products with which integration is possible
  • easily cope with including various built-in functions in the information model and with specialized analysis performed by the user

Disadvantages:
  • large disk space is needed because the stored data is redundant; memory is used extremely inefficiently: owing to denormalization and pre-computed aggregation, the volume of data in a multidimensional database is typically 2.5 to 100 times the volume of the original detailed data. In any case, MOLAP cannot handle truly large databases; the practical limit is around 10-25 gigabytes
  • the potential for a database “explosion”: an unexpected, sharp, disproportionate growth in its volume
  • lack of flexibility when data structures must be modified: almost any change to the structure of a dimension requires a complete rebuild of the hypercube
  • there are currently no uniform standards for multidimensional databases, neither for interfaces nor for data definition and manipulation languages

Representatives: Hyperion (Essbase).

DOLAP (Desktop OLAP)

Features: client OLAP products that are fairly easy to deploy and have a low cost per seat. They target analytical processing in which hypercubes are small, their dimensionality is low, and needs are modest, so a desktop PC is sufficient. Vendors in this market aim to automate hundreds or thousands of seats performing fairly simple analysis; buyers are often persuaded to purchase more seats than they need.

Advantages:
  • good integration with both multidimensional and relational databases
  • volume purchases are possible, which reduces the cost of implementation projects
  • ease of use of the applications

Disadvantages:
  • very limited functionality (not comparable in this respect with specialized products)
  • very limited capacity (small data volumes, few dimensions)

Representatives: Cognos (PowerPlay), Business Objects, Crystal Decisions.

ROLAP

Features: the smallest sector of the market. Detailed data remains where it originated, in the relational database; some aggregates are stored in the same database in specially created service tables.

Advantages:
  • capable of handling very large volumes of data (cost-effective storage)
  • multi-user operation, including editing rather than read-only access
  • a higher level of data protection and good facilities for differentiating access rights
  • frequent changes to the dimension structure are possible without a physical reorganization of the database

Disadvantages:
  • low performance, markedly inferior in response time to multidimensional systems (complex queries are answered in minutes or even hours rather than seconds); these products are better report builders than interactive analysis tools
  • complexity of the products and significant maintenance effort from IT specialists: to achieve performance comparable to MOLAP, relational systems require careful database schema design and index tuning, that is, a great deal of work by database administrators
  • expensive to implement
  • the limitations of SQL remain, preventing implementation in an RDBMS of many built-in functions that systems based on a multidimensional representation of data provide easily

Representatives: Information Advantage, Informix (MetaCube).

It should be noted that customers of hybrid products offering a choice between ROLAP and MOLAP modes, such as Microsoft Analysis Services, Oracle Express, Crystal Holos, and IBM DB2 OLAP Server, almost always select MOLAP mode.

Each of the presented categories has its own strengths and weaknesses; there is no single optimal choice. The choice involves three important trade-offs: (1) performance; (2) disk space required for data storage; (3) capabilities, functionality, and especially scalability of the OLAP solution. One must take into account the volume of data to be processed, the power of the hardware, and the needs of the users, and seek a compromise between speed and the redundant disk space occupied by the database, and between simplicity and versatility.
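The MOLAP “database explosion” and redundancy figures mentioned earlier can be estimated with simple arithmetic. Pre-computing every combination of hierarchy levels multiplies the number of cells written; the hierarchy depths below are invented for illustration:

```python
from math import prod

# Invented hierarchy depths: each fact row belongs to one member at every
# level of every dimension (e.g. product -> group -> category -> total = 4 levels).
levels = {"product": 4, "region": 3, "month": 3}

fact_rows = 100_000
# Each detail row contributes to one aggregate cell per combination of
# hierarchy levels; this gives an upper bound on cells written, since cells
# shared between rows are stored once but updated many times.
cells_touched = fact_rows * prod(levels.values())
print(cells_touched // fact_rows)  # 36: up to a 36x blow-up before sharing
print(cells_touched)               # 3600000
```

Even these modest three-level hierarchies multiply storage by more than an order of magnitude, which is why fully pre-aggregated MOLAP cubes can grow disproportionately to their input data.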

Classification of Data Warehouses in accordance with the volume of the target database

Disadvantages of OLAP

Like any technology, OLAP has its drawbacks: high hardware requirements, high demands on the training and knowledge of administrators and end users, and high implementation costs in money, time, and intellectual effort.

Selecting an OLAP product

Choosing the right OLAP product is difficult, but very important if the project is not to fail.

As you can see, products differ in many areas: functional, architectural, technical. Some are very limited in configurability. Some are built for specialized subject areas such as marketing, sales, or finance; others are general-purpose tools with no application-specific features, which must therefore be quite flexible. General-purpose products are, as a rule, cheaper than specialized ones, but their implementation costs are higher. The range of OLAP products is very wide, from simple pivot-table and charting tools bundled with office suites to data-analysis and pattern-discovery tools costing tens of thousands of dollars.

As in any other field, there can be no clear-cut guidelines for choosing OLAP tools. You can only focus on a number of key points and compare the software's capabilities with the organization's needs. One thing is important: without properly thinking through how you are going to use OLAP tools, you risk creating a major headache for yourself.

The selection process involves two tasks:

  • assess the needs and capabilities of the enterprise
  • evaluate the existing market offering; development trends are also important

Then compare the two and make the actual choice.

Needs assessment

You can't make a rational product choice without understanding what it will be used for. Many companies want the “best possible product” without a clear understanding of how it should be used.

For the project to succeed, the financial director must, at a minimum, competently formulate his wishes and requirements to management and to the automation specialists. Many problems arise from insufficient preparation and awareness when choosing OLAP: IT specialists and end users have communication difficulties simply because they use different concepts and terms and put forward conflicting preferences. Goals within the company must be consistent.

Some factors have already become obvious after reading the overview of OLAP product categories, namely:

Technical aspects

  • Data sources: corporate data warehouse, OLTP system, spreadsheet files, relational databases. Can the OLAP tool be linked to all the DBMSs used in the organization? As practice shows, integrating heterogeneous products into a stably working system is one of the most important issues, and solving it can sometimes involve big problems. It is necessary to understand how simply and reliably the OLAP tool can be integrated with the organization's existing DBMSs. It is also important to evaluate integration not only with data sources but also with other applications to which data may need to be exported: email, office applications
  • Variability of data taken into account
  • Server platform: NT, Unix, AS/400, Linux; do not, however, insist that candidate OLAP products run on a questionable or dying platform just because you still use it
  • Client-side and browser standards
  • Deployable architecture: local network and PC modem connection, high-speed client/server, intranet, extranet, Internet
  • International Features: Multi-currency support, multi-lingual operations, data sharing, localization, licensing, Windows update

Volumes of input information that are available now and that will appear in the future

Users

  • Area of ​​application: sales/marketing analysis, budgeting/planning, performance analysis, accounting report analysis, qualitative analysis, financial condition, generation of analytical materials (reports)
  • Number of users and their location, requirements for the division of access rights to data and functions, secrecy (confidentiality) of information
  • User type: senior management, finance, marketing, HR, sales, production, etc.
  • User experience. User skill level. Consider providing training. It is very important that the OLAP client application is designed so that users feel confident and can use it effectively.

Key features: data write-back needs, distributed computing, complex currency conversions, report printing needs, spreadsheet interface, complexity of the application logic, required dimensions, and types of analysis: statistical, goal seeking, what-if analysis

Implementation

  • Who will be involved in implementation and operation: external consultants, internal IT function or end users
  • Budget: software, hardware, services, data transmission. Remember that OLAP license fees are only a small part of the total project cost; implementation and hardware can cost more than the licenses, and long-term support, operation, and administration almost certainly cost significantly more. Choosing a product merely because it is cheaper can therefore raise the overall project cost through higher maintenance, administration, or hardware expenses, while probably delivering a lower level of business benefit. When estimating total cost, be sure to ask: how broad are the available sources of implementation, training, and support? Is the potential talent pool (employees, contractors, consultants) likely to grow or shrink? How widely can you draw on existing industry experience?

Although the cost of analytical systems remains quite high even today, and the methodologies and technologies for implementing them are still maturing, the economic effect they provide already significantly exceeds the effect of traditional operational systems.

The effect of properly organized strategic and operational planning of business development is hard to quantify in advance, but it can clearly exceed the cost of implementing such systems by a factor of tens or even hundreds. One should not be misled, however: the effect is produced not by the system itself but by the people working with it, so declarations such as “a Data Warehouse with OLAP technologies will help the manager make the right decisions” are not entirely correct. Modern analytical systems are not artificial-intelligence systems; they can neither help nor hinder decision-making. Their goal is to deliver to the manager, promptly and in a convenient form, all the information needed to make a decision. What information is requested, and what decision is made on its basis, depends entirely on the person using the system.

It remains to say that these systems can help solve many business problems and can have far-reaching positive effects. It remains to be seen who will be the first to realize the advantages of this approach and get ahead of the rest.