
Basic concepts of OLAP technology

High competition and the growing dynamics of the external environment impose increased requirements on enterprise management systems. The development of management theory and practice has been accompanied by the emergence of new methods, technologies and models aimed at improving the efficiency of operations, and these in turn have fostered the emergence of analytical systems. Demand for analytical systems in Russia is high, and they are of most interest in the financial sector: banks, the insurance business, investment companies. The results produced by analytical systems are needed above all by the people on whose decisions the development of the company depends: managers, experts, analysts. Analytical systems make it possible to solve the tasks of consolidation, reporting, optimization and forecasting. To date there is neither a definitive classification of analytical systems nor a common system of definitions for the terms used in this field. The information structure of an enterprise can be represented as a sequence of levels, each of which is characterized by its own method of processing and managing information and has its own function in the management process. Analytical systems are thus arranged hierarchically across the levels of this infrastructure:

The level of transactional systems

The level of data warehouses

The level of data marts

The level of OLAP systems

The level of analytical applications

OLAP systems (OnLine Analytical Processing, real-time analytical processing) are a technology for comprehensive multidimensional data analysis. OLAP systems are applicable wherever there is a task of analyzing multifactor data, and they are an effective means of analysis and report generation. The data warehouses, data marts and OLAP systems mentioned above all belong to the class of business intelligence (BI) systems.

Very often, information and analytical systems created with direct use by decision-makers in mind turn out to be extremely simple to use but severely limited in functionality. Such static systems are called in the literature Executive Information Systems (EIS). They contain predefined sets of queries and, while sufficient for everyday review, are unable to answer all the questions about the available data that may arise when making decisions. The result of such a system's work is, as a rule, multi-page reports, after a thorough study of which the analyst comes up with a new series of questions. However, every new query unforeseen at design time must first be formally described, coded by a programmer, and only then executed. The waiting time in this case can amount to hours and days, which is not always acceptable. Thus, the external simplicity of static DSS (decision support systems), for which most customers of information and analytical systems actively fight, turns into a catastrophic loss of flexibility.



Dynamic DSS, on the contrary, are oriented toward processing ad hoc analysts' queries to the data. The requirements for such systems were examined most deeply by E. F. Codd in the article that laid the foundation of the OLAP concept. Analysts' work with these systems consists of an interactive sequence of forming queries and studying their results.

But dynamic DSS can operate not only in the field of online analytical processing (OLAP); support for making management decisions based on accumulated data can be provided in three basic areas.

The sphere of detailed data. This is the domain of most systems aimed at finding information. In most cases, relational DBMS cope perfectly with the tasks arising here. SQL is the generally accepted standard language for manipulating relational data. Information retrieval systems, which provide the end-user interface for searching detailed information, can be used as add-ons both over individual databases of transactional systems and over a common data warehouse.

The sphere of aggregated indicators. A comprehensive view of the information collected in the data warehouse, its generalization and aggregation, hypercube representation and multidimensional analysis are the tasks of online analytical processing (OLAP) systems. Here one can either turn to special multidimensional DBMS or stay within relational technologies. In the second case, pre-aggregated data can be collected in a star-schema database, or the information can be aggregated on the fly while scanning the detailed tables of the relational database.

The sphere of patterns. Intelligent processing is performed by methods of data mining, whose main tasks are the search for functional and logical patterns in the accumulated information and the construction of models and rules that explain the anomalies found and/or predict the development of certain processes.

Online analytical data processing

The concept of OLAP is based on the principle of multidimensional data representation. In a 1993 article, E. F. Codd examined the shortcomings of the relational model, pointing out first of all the impossibility of "combining, viewing and analyzing data from the point of view of multiple dimensions, that is, in the way most understandable for corporate analysts", and defined general requirements for OLAP systems that extend the functionality of relational DBMS and include multidimensional analysis as one of their characteristics.

Classification of OLAP products according to the data representation method.

Currently, a large number of products that provide OLAP functionality to varying degrees are present on the market. About 30 of the best known are listed on the review web server http://www.olapreport.com/. Providing the user with a multidimensional conceptual view of the source database, all OLAP products are divided into three classes according to the type of that source database.

The very first online analytical processing systems (for example, Essbase from Arbor Software, Oracle Express Server from Oracle) belonged to the MOLAP class, that is, they could work only with their own multidimensional databases. They are based on proprietary multidimensional DBMS technologies and are the most expensive. These systems provide a complete cycle of OLAP processing. They either include, in addition to the server component, their own integrated client interface, or rely on external spreadsheet programs for user interaction. Maintaining such systems requires a special staff for installing and maintaining the system and for building data views for end users.

Relational online analytical processing systems (ROLAP) present the data stored in a relational database in multidimensional form, ensuring the transformation of the information into a multidimensional model through an intermediate layer of metadata. ROLAP systems are well suited for working with large warehouses. Like MOLAP systems, they require considerable maintenance by information technology specialists, and they support multiuser operation.

Finally, hybrid systems (Hybrid OLAP, HOLAP) are designed to combine the advantages and minimize the shortcomings inherent in the previous classes. Media/MR from Speedware belongs to this class. According to the developers, it combines the analytical flexibility and response speed of MOLAP with the constant access to real data characteristic of ROLAP.

Multidimensional OLAP (MOLAP)

In specialized DBMS based on multidimensional data representation, data is organized not in the form of relational tables but in the form of ordered multidimensional arrays:

1) hypercubes (all cells stored in the database must have the same dimensionality, that is, be in the maximally full dimension basis), or

2) polycubes (each variable is stored with its own set of dimensions, and all the associated processing complexity is shifted onto the internal mechanisms of the system).

The use of multidimensional databases in online analytical processing systems has the following advantages.

When a multidimensional DBMS is used, data search and retrieval are much faster than with a multidimensional conceptual view of a relational database, since the multidimensional database is denormalized, contains pre-aggregated indicators, and provides optimized access to the requested cells.

Multidimensional DBMS easily cope with the task of including diverse built-in functions in the information model, whereas the objectively existing limitations of the SQL language make these tasks based on relational DBMS rather difficult, and sometimes impossible.

On the other hand, there are significant limitations.

Multidimensional DBMS do not allow working with large databases. Moreover, because of denormalization and pre-computed aggregation, the amount of data in a multidimensional database typically corresponds (by Codd's estimate) to a 2.5-100 times smaller volume of source detailed data.

Multidimensional DBMS, compared with relational ones, use external memory very inefficiently. In the overwhelming majority of cases the information hypercube is highly sparse, and since the data is stored in ordered form, undefined values can be removed only by choosing a sort order that organizes the data into the largest possible contiguous groups. But even in this case the problem is solved only partially. In addition, a sort order that is optimal from the storage point of view will most likely not coincide with the order most often used in queries. Therefore, in real systems one has to seek a compromise between performance and the redundancy of the disk space occupied by the database.

Consequently, the use of multidimensional DBMS is justified only under the following conditions.

The volume of source data for analysis is not too large (no more than a few gigabytes), that is, the level of data aggregation is quite high.

The set of information dimensions is stable (since any change in their structure almost always requires a complete restructuring of the hypercube).

The system's response time to ad hoc queries is the most critical parameter.

Extensive use of complex built-in functions is required to perform cross-dimensional calculations over the cells of the hypercube, including the ability to write custom functions.

Relational OLAP (ROLAP)

Direct use of relational databases in online analytical processing systems has the following advantages.

In most cases, corporate data warehouses are implemented by means of relational DBMS, and ROLAP tools make it possible to perform analysis directly on top of them. The size of the warehouse is not as critical a parameter as it is for MOLAP.

In the case of a variable dimensionality of the task, when changes to the dimension structure have to be made quite often, ROLAP systems with a dynamic representation of dimensions are the optimal solution, since such modifications do not require physical reorganization of the database.

Relational DBMS provide a significantly higher level of data protection and good capabilities for delimiting access rights.

The main drawback of ROLAP compared to multidimensional DBMS is lower performance. To ensure performance comparable to MOLAP, relational systems require careful design of the database schema and tuning of indexes, that is, a great deal of effort from database administrators. Only when star schemas are used can the performance of well-tuned relational systems approach the performance of systems based on multidimensional databases.

The concept of OLAP technology was formulated by Edgar Codd in 1993.

This technology is based on the construction of multidimensional data sets, the so-called OLAP cubes (not necessarily three-dimensional, as one might conclude from the name). The purpose of OLAP technologies is to analyze data and present this analysis in a form convenient for management personnel to perceive and to base decisions on.

Basic requirements for multidimensional analysis applications:

  • - providing the user with the results of the analysis within an acceptable time (no more than 5 s);
  • - multiuser data access;
  • - multidimensional data representation;
  • - the ability to access any information regardless of its storage location and volume.

OLAP system tools provide the ability to sort and select data under specified conditions. Various qualitative and quantitative conditions can be set.

The main data model used in the numerous tools for creating and maintaining databases (DBMS) is the relational model. The data in it is presented as a set of two-dimensional relation tables linked by key fields. To eliminate duplication and inconsistency and to reduce the labor of maintaining databases, a formal apparatus for normalizing table entities is applied. However, its application involves additional time spent forming responses to database queries, although memory resources are saved.

The multidimensional data model represents the object under study in the form of a multidimensional cube; a three-dimensional model is used most often. Dimensions, or attribute-dimensions, are laid out along the axes or faces of the cube, while base attributes (measures) fill the cells of the cube. The multidimensional cube can be presented as a combination of three-dimensional cubes in order to ease perception and presentation when forming reporting and analytical documents and multimedia presentations based on analytical work in a decision-making system.
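To make the model concrete, below is a minimal sketch of such a cube in Delphi-style Pascal (the data, names and key scheme are hypothetical; Generics.Collections from Delphi 2009+ or Free Pascal 3.2+ is assumed). Dimension values serve as coordinates along the axes, the measure attribute fills the cells, and fixing one dimension yields a two-dimensional slice:

program MiniCube;
{$APPTYPE CONSOLE}
uses
  SysUtils, Generics.Collections;

// pack the three coordinates into a single dictionary key
function Key(Product, Region, Month: Integer): string;
begin
  Result := Format('%d|%d|%d', [Product, Region, Month]);
end;

var
  Cube: TDictionary<string, Double>; // sparse: only non-empty cells are stored
  P, R: Integer;
  V: Double;
begin
  Cube := TDictionary<string, Double>.Create;
  try
    // facts: (product, region, month) -> sales amount
    Cube.Add(Key(0, 0, 1), 120.0);
    Cube.Add(Key(0, 1, 1), 80.0);
    Cube.Add(Key(1, 0, 2), 200.0);

    // a two-dimensional slice: fix Month = 1 and scan product x region
    for P := 0 to 1 do
      for R := 0 to 1 do
        if Cube.TryGetValue(Key(P, R, 1), V) then
          WriteLn(Format('product=%d region=%d amount=%.1f', [P, R, V]));
  finally
    Cube.Free;
  end;
end.

Keying the cell store by the coordinate tuple means that only non-empty cells are materialized, which anticipates the sparseness of real cubes discussed later.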

Within OLAP technologies, proceeding from the fact that a multidimensional representation of data can be organized both by means of relational DBMS and by specialized multidimensional tools, three types of multidimensional OLAP systems are distinguished:

  • - multidimensional (Multidimensional) OLAP - MOLAP;
  • - relational (Relational) OLAP - ROLAP;
  • - mixed or hybrid (Hybrid) OLAP - HOLAP.

In multidimensional DBMS, data is organized not in the form of relational tables but in the form of ordered multidimensional arrays - hypercubes, where all stored data must have the same dimensionality, which means the need to form the fullest possible dimension basis. Data can also be organized as polycubes; in this variant the values of each indicator are stored with its own set of dimensions, and processing is performed by the system's own tools. The structure of the repository is simplified in this case, since there is no need for a data storage area in multidimensional or object-oriented form, and the enormous labor of creating models and systems for transforming data from the relational model to the object one is reduced.

The advantages of MOLAP are:

  • - responses to queries are obtained faster than with ROLAP - the time spent is one to two orders of magnitude less;
  • - diverse built-in functions are easy to include in the model, whereas SQL limitations make many of them difficult to implement on relational DBMS.

MOLAP restrictions include:

  • - relatively small database sizes;
  • - due to denormalization and preliminary aggregation, multidimensional arrays use 2.5-100 times more memory than the initial data (memory consumption grows exponentially as the number of dimensions increases);
  • - there are no standards for the interface and data manipulation tools;
  • - there are restrictions on data loading.

With ROLAP, on the other hand, the labor costs of creating multidimensional data rise sharply, since specialized means of objectivizing the relational model of the data contained in the information warehouse are practically absent in this situation, and the response time to queries often cannot meet the requirements for OLAP systems.

The advantages of ROLAP systems are:

  • - the possibility of operational analysis of the data contained directly in the warehouse, since most source databases are of the relational type;
  • - with a variable dimensionality of the problem, ROLAP wins, since no physical reorganization of the database is required;
  • - ROLAP systems can use less powerful client stations and servers, with the main burden of processing complex SQL queries falling on the servers;
  • - the level of information protection and delimitation of access rights in relational DBMS is incomparably higher than in multidimensional ones.

The disadvantages of ROLAP systems are lower performance and the need for careful design of the database schemas, special tuning of indexes, analysis of query statistics, and incorporation of the conclusions of that analysis into refinements of the database schemas, all of which leads to significant additional labor costs.

Meeting these conditions allows ROLAP systems to achieve indicators similar to MOLAP systems with respect to access time, and even to surpass them in memory savings.

Hybrid OLAP systems are a combination of tools implementing the relational and the multidimensional data model. This makes it possible to drastically reduce the resource costs of creating and maintaining such a model, as well as the response time to queries.

This approach uses the advantages of the first two approaches and compensates for their disadvantages. The most developed software products of this kind implement precisely this principle.

The use of a hybrid architecture in OLAP systems is the most acceptable way of solving the problems associated with the use of software tools for multidimensional analysis.

The pattern-detection mode is based on intelligent data processing. The main task here is to identify patterns in the processes under study, the relationships and mutual influence of various factors, to search for large "unusual" deviations, and to forecast the course of various substantive processes. This area belongs to data mining.


Course work

in the discipline: Databases

Subject: OLAP Technology

Performed by:

Chizhikov Alexander Alexandrovich

Introduction

1. Classification of OLAP products

2. OLAP client - OLAP server: "For" and "against"

3. The core of an OLAP system

3.1 Principles of construction

Conclusion

List of sources used

Appendices

Introduction

It is difficult to find a person in the computer world who, at least at an intuitive level, does not understand what databases are and why they are needed. Unlike traditional relational DBMS, the concept of OLAP is not so widely known, although almost everyone has probably heard the mysterious term "OLAP cubes". What is OnLine Analytical Processing?

OLAP is not a single software product, not a programming language, and not even a specific technology. If one tries to embrace OLAP in all its manifestations, it is a set of concepts, principles and requirements underlying software products that make it easier for analysts to access data. Although hardly anyone would disagree with such a definition, it is doubtful that it brings non-specialists even an iota closer to understanding the subject. Therefore, in striving to understand OLAP it is better to take a different path. First we need to find out why analysts have to have their access to data specially facilitated.

The fact is that analysts are special consumers of corporate information. The analyst's task is to find patterns in large data arrays. Therefore an analyst will not pay attention to an isolated fact: he needs information about hundreds and thousands of events. By the way, one of the essential factors that led to the appearance of OLAP is performance and efficiency. Imagine what happens when an analyst needs to obtain information and there are no OLAP tools in the enterprise. The analyst either independently (which is unlikely) or with the help of a programmer composes the appropriate SQL query and receives the data of interest in the form of a report or exports it to a spreadsheet. A great many problems arise here. First, the analyst is forced to do something other than his job (SQL programming) or to wait for programmers to perform the task for him - all of which has a noticeable effect on labor productivity, while heart attack and stroke rates rise, and so on. Second, a single report or table, as a rule, does not satisfy the giants of thought and fathers of Russian analysis - and the entire procedure has to be repeated over and over again. Third, as we have already found out, analysts do not ask about trifles - they need everything at once. This means (although technology is advancing with seven-league strides) that the corporate relational DBMS server the analyst addresses can think deeply and for a long time, blocking other transactions.

The concept of OLAP appeared precisely to solve such problems. OLAP cubes are essentially meta-reports. Cutting meta-reports (cubes, that is) along dimensions, the analyst in fact obtains the "ordinary" two-dimensional reports that interest him (these are not necessarily reports in the usual sense of the term - we are talking about data structures with the same functions). The advantages of cubes are obvious: the data needs to be requested from the relational DBMS only once - when the cube is built. Since analysts, as a rule, do not work with information that is being supplemented and modified "on the fly", the constructed cube remains relevant for quite a long time. Thanks to this, not only are interruptions in the operation of the relational DBMS server eliminated (there are no queries with thousands and millions of answer rows), but the speed of data access for the analyst himself also increases sharply. In addition, as already noted, performance is improved by computing the intermediate sums of hierarchies and other aggregated values at the time the cube is built.

Of course, one has to pay for such a performance increase. It is sometimes said that the data structure simply "explodes": an OLAP cube can occupy tens or even hundreds of times more space than the source data.

Now that we have figured out a little how OLAP works, it is nevertheless worth formalizing our knowledge somewhat and giving the OLAP criteria without simultaneous translation into ordinary human language. These criteria (12 in all) were formulated in 1993 by E. F. Codd, the creator of the concept of relational DBMS and, concurrently, of OLAP. We will not consider them directly, since they were later reworked into the so-called FASMI test, which defines the requirements for OLAP products. FASMI is an abbreviation of the names of the test items:

Fast. This property means that the system should provide an answer to a user's query in about five seconds on average; most queries should be processed within one second, and the most complex queries within twenty seconds. Recent studies have shown that the user begins to doubt the success of a query if it takes more than thirty seconds.

Analysis. The system should cope with any logical and statistical analysis characteristic of business applications and ensure that the results are saved in a form accessible to the end user. Analysis tools may include procedures for time-series analysis, cost allocation, currency conversion, modeling changes in organizational structures, and some others.

Shared. The system should provide ample opportunities for delimiting access to data and for the simultaneous work of many users.

Multidimensional. The system should provide a conceptually multidimensional representation of data, including full support for multiple hierarchies.

Information. The power of various software products is characterized by the volume of input data they can process. Different OLAP systems have different power: advanced OLAP solutions can handle at least a thousand times more data than the least powerful ones. When choosing an OLAP tool, a number of factors should be taken into account, including data duplication, the RAM required, disk space usage, performance indicators, integration with information warehouses, and so on.

1. Classification of OLAP products

So, the essence of OLAP is that the source information for analysis is presented in the form of a multidimensional cube, which can be arbitrarily manipulated to obtain the necessary information cuts - reports. In this case, the end user sees the cube as a multidimensional dynamic table that automatically summarizes data (facts) in various cuts (dimensions) and allows interactive control of the calculations and the report form. These operations are performed by the OLAP engine (OLAP calculation engine).

To date, many products implementing OLAP technologies have been developed in the world. To make it easier to navigate among them, classifications of OLAP products are used: by the method of storing the data for analysis and by the location of the OLAP engine. Let us consider each category of OLAP products.

I will start with the classification by data storage method. Let me remind you that multidimensional cubes are built on the basis of source and aggregate data. Both the source and the aggregate data for cubes can be stored in relational as well as multidimensional databases. Therefore, three data storage methods are currently applied: MOLAP (Multidimensional OLAP), ROLAP (Relational OLAP) and HOLAP (Hybrid OLAP). Accordingly, OLAP products are divided by storage method into three similar categories:

1. In the case of MOLAP, the source and aggregate data are stored in a multidimensional database or in a local multidimensional cube.

2. In ROLAP products, the source data is stored in relational databases or in flat local tables on a file server. Aggregate data can be placed in service tables in the same database. Conversion of the data from the relational database into multidimensional cubes is performed at the request of the OLAP tool.

3. When the HOLAP architecture is used, the source data remains in the relational database, while the aggregates are placed in a multidimensional one. The OLAP cube is built at the request of the OLAP tool on the basis of the relational and the multidimensional data.

The next classification is by the location of the OLAP engine. By this feature, OLAP products are divided into OLAP servers and OLAP clients:

In server OLAP tools, the calculation and storage of aggregate data are performed by a separate process - the server. The client application receives only the results of queries against the multidimensional cubes stored on the server. Some OLAP servers support data storage only in relational databases, some only in multidimensional ones. Many modern OLAP servers support all three data storage methods: MOLAP, ROLAP and HOLAP.

The OLAP client is designed differently. The multidimensional cube is built and the OLAP calculations are performed in the memory of the client computer. OLAP clients are also divided into ROLAP and MOLAP, and some can support both data access options.

Each of these approaches has its pros and cons. Contrary to the widespread opinion about the advantages of server tools over client ones, in a number of cases the use of an OLAP client may turn out to be more efficient and more profitable for users than the use of an OLAP server.

2. OLAP client - OLAP server: "For" and "against"

When building an information system, OLAP functionality can be implemented with both server and client OLAP tools. In practice, the choice is the result of a trade-off between performance indicators and software cost.

The volume of data is determined by the combination of the following characteristics: the number of records, the number of dimensions, the number of dimension elements, the length of the dimensions, and the number of facts. It is known that an OLAP server can process larger volumes of data than an OLAP client given equal computer power. This is explained by the fact that the OLAP server stores on hard disks a multidimensional database containing pre-calculated cubes.

Client programs of an OLAP server, at the time OLAP operations are executed, issue queries to it in an SQL-like language, receiving not the entire cube but its displayed fragments. The OLAP client, on the other hand, must hold the entire cube in RAM while it works; in the case of a ROLAP architecture, it must first load into memory the whole data array used to calculate the cube. In addition, as the number of dimensions, facts or dimension members grows, the number of aggregates grows exponentially. Thus, the volume of data processed by an OLAP client directly depends on the amount of RAM on the user's PC.

However, we note that most OLAP clients provide distributed computing. Therefore, the number of processed records that limits the work of a client OLAP tool means not the volume of primary data in the corporate database, but the size of the aggregated sample drawn from it. The OLAP client generates a query to the DBMS that describes the filtering conditions and the algorithm for pre-grouping the primary data. The server finds the records and returns a compact sample for further OLAP calculations. The size of this sample can be tens or hundreds of times smaller than the volume of the primary, non-aggregated records. Consequently, the need of such an OLAP client for PC resources is significantly reduced.
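As an illustration, the pre-grouping request described above could look like the following sketch (the table and field names are hypothetical, and the SQL text is purely illustrative of the approach, not taken from any specific product):

program PreAggQuery;
{$APPTYPE CONSOLE}
const
  // The client asks the server to filter and pre-group the primary data,
  // so only a compact aggregated sample travels over the network.
  PreAggSQL =
    'SELECT product, region, SUM(amount) AS amount, COUNT(*) AS cnt ' +
    'FROM sales ' +
    'WHERE sale_date BETWEEN :date_from AND :date_to ' +
    'GROUP BY product, region';
begin
  WriteLn(PreAggSQL); // in a real client this text would be handed to the DBMS driver
end.

A million primary rows may collapse into a few thousand (product, region) groups, which is exactly the compact sample the client then turns into a cube.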

In addition, the number of dimensions is limited by the capabilities of human perception. It is known that the average person can simultaneously operate with no more than 3-4 dimensions. With a larger number of dimensions in the dynamic table, the perception of information becomes significantly more difficult. This factor should be taken into account when estimating in advance the RAM that an OLAP client may require.

The length of the dimensions also affects the size of the OLAP tool's address space occupied while the OLAP cube is being calculated. The longer the dimensions, the more resources are required for a preliminary sort of the multidimensional array, and vice versa. Short dimensions in the source data are yet another argument in favor of the OLAP client.

This characteristic is determined by the two factors discussed above: the volume of the processed data and the power of the computers. As the number of, for example, dimensions grows, the performance of all OLAP tools decreases because of a significant increase in the number of aggregates, but the rate of decline differs. Let us demonstrate this dependence on a chart.

Scheme 1. Dependence of the performance of client and server OLAP tools on the growth of the data volume

The speed characteristics of an OLAP server are less sensitive to data growth. This is explained by the different technologies the OLAP server and the OLAP client use to process user queries. For example, during a drill-down operation the OLAP server accesses the stored data and "pulls up" the data of the corresponding "branch", whereas the OLAP client calculates the entire set of aggregates at load time. Up to a certain volume of data, however, server and client performance are comparable. For OLAP clients that support distributed computing, the range of performance comparability can extend to data volumes that cover the OLAP analysis needs of a huge number of users. This is confirmed by the results of internal testing of MS OLAP Server and the OLAP client "Contour Standard". The test was performed on an IBM PC Pentium Celeron 400 MHz, 256 MB, on a sample of 1 million unique (i.e., aggregated) records with 7 dimensions containing from 10 to 70 members. The cube load time in both cases did not exceed 1 second, and various OLAP operations (drill up, drill down, move, filter, etc.) were performed in hundredths of a second.

When the sample size exceeds the amount of RAM, swapping to disk begins and the performance of the OLAP client drops sharply. Only from this moment on can one speak of an advantage of the OLAP server.

It should be remembered that this "break" point defines the boundary where the cost of OLAP solutions rises sharply. For the tasks of each specific user, this point is easily determined by performance tests of the OLAP client. Such tests can be obtained from the developer company.

In addition, the cost of server OLAP solutions grows with the number of users. The fact is that the OLAP server performs the calculations for all users on one computer. Accordingly, the greater the number of users, the more RAM and processing power are needed. Thus, if the volumes of the processed data lie in the area of comparable performance of server and client systems, then, other things being equal, the use of an OLAP client will be more profitable.

Using an OLAP server in the "classic" ideology involves unloading data from relational DBMS into a multidimensional database. The unloading is performed over a certain period, so the OLAP server's data does not reflect the state at the current moment. Only those OLAP servers that support the ROLAP mode are free of this shortcoming.

Similarly, a number of OLAP clients make it possible to implement ROLAP and desktop architectures with direct access to the database. This provides analysis of the source data in on-line mode.

The OLAP server places minimal requirements on the power of client terminals. Objectively, the requirements of an OLAP client are higher, because it performs the calculations in the RAM of the user's PC. The state of a specific organization's hardware park is the most important indicator that must be considered when choosing an OLAP tool. But here, too, there are pros and cons. An OLAP server does not use the enormous computing power of modern personal computers. If an organization already has a park of modern PCs, it is inefficient to use them only as display terminals while at the same time incurring additional costs for a central server.

If the power of the users' computers "leaves much to be desired", the OLAP client will work slowly or will not be able to work at all. Buying one powerful server may be cheaper than upgrading all the PCs.

It is useful to take into account the trends in hardware development. Since the volume of data for analysis is practically constant, the steady growth of PC power will lead to an expansion of the capabilities of OLAP clients and to the displacement of OLAP servers into the segment of very large databases.

When an OLAP server is used, only the data to be displayed is transmitted over the network to the client PC, while the OLAP client receives the entire volume of the primary sample data.

Therefore, where the OLAP client is used, network traffic will be higher.

But when an OLAP server is used, user operations, such as drill-down, generate new queries to the multidimensional database and hence new data transfers. OLAP operations performed by an OLAP client take place in RAM and, accordingly, do not cause new data flows on the network.

It should also be noted that modern network hardware provides a high level of throughput.

Therefore, in the overwhelming majority of cases, analyzing a database of "medium" size with an OLAP client will not slow down the user's work.

The cost of an OLAP server is quite high. To it one should also add the cost of a dedicated computer and the ongoing costs of administering the multidimensional database. In addition, the implementation and maintenance of an OLAP server require rather highly qualified personnel.

The cost of an OLAP client is an order of magnitude lower than the cost of an OLAP server. No administration or additional hardware for a server is required. No high demands are placed on personnel qualifications when implementing an OLAP client. An OLAP client can be deployed much faster than an OLAP server.

Developing analytical applications with client OLAP tools is a fast process that does not require special training of the developer. A user who knows the physical implementation of the database can develop an analytical application independently, without involving an IT specialist. When using an OLAP server, one must learn two different systems, sometimes from different vendors: one for creating the cubes on the server and another for developing the client application. The OLAP client provides a single visual interface for describing cubes and setting up user interfaces to them.

Let us consider the process of creating an OLAP application using a client tool.

Scheme 2. Creating an OLAP application using a client ROLAP tool

The operating principle of ROLAP clients is a preliminary description of a semantic layer that hides the physical structure of the source data. The data sources can be local tables or relational DBMS. The list of supported data sources is determined by the specific software product. After that, the user can independently manipulate objects he understands in terms of the subject area in order to create cubes and analytical interfaces.

The operating principle of an OLAP server client is different. In the OLAP server, when creating cubes, the user manipulates physical descriptions of the database.

Custom descriptions are then created in the cube itself, and the OLAP server client is configured only for the cube.

Let us explain the operating principle of a ROLAP client using the example of creating a dynamic sales report (see Scheme 2). Let the source data for analysis be stored in two tables: Sales and Deal.

When the semantic layer is created, the data sources - the Sales and Deal tables - are described in terms understandable to the end user and turn into "Products" and "Deals". The "ID" field of the "Products" table is renamed to "Code", "NAME" to "Product", and so on.

Then a "Sales" business object is created. A business object is a flat table on the basis of which a multidimensional cube is formed. When the business object is created, the "Products" and "Deals" tables are joined by the "Code" field. Since not all fields of the tables are required for display in the report, the business object uses only the "Product", "Date" and "Amount" fields.

Next, an OLAP report is created on the basis of the business object. The user selects a business object and drags its attributes into the column or row area of the report table. In our example, a report on sales of goods by month was created on the basis of the "Sales" business object.

When working with an interactive report, the user can set filtering and grouping conditions with the same simple mouse movements. At this point the ROLAP client accesses the data in the cache, whereas the client of an OLAP server generates a new query to the multidimensional database. For example, by applying a product filter in the sales report, we can obtain a report on the sales of the goods that interest us.
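A minimal sketch of the semantic layer from this example (Delphi-style Pascal; the physical column names are assumptions made for illustration): it is essentially a mapping of physical columns to business terms, which the client consults when building queries:

program SemanticLayer;
{$APPTYPE CONSOLE}
uses
  SysUtils, Generics.Collections;
var
  Terms: TDictionary<string, string>; // physical column -> business term
begin
  Terms := TDictionary<string, string>.Create;
  try
    Terms.Add('Sales.ID', 'Code');
    Terms.Add('Sales.NAME', 'Product');
    Terms.Add('Deal.DATE', 'Date');
    Terms.Add('Deal.AMOUNT', 'Amount');
    // the user sees only business terms; SQL is built from the physical names
    WriteLn(Terms['Sales.NAME']); // Product
  finally
    Terms.Free;
  end;
end.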

All the settings of the OLAP application can be stored in a dedicated metadata repository, in the application itself, or in the repository of the multidimensional database system. The implementation depends on the specific software product.

So, in what cases can the use of an OLAP client be more efficient and profitable for users than the use of an OLAP server?

The economic expediency of using an OLAP server arises when the volumes of data are very large and overwhelming for an OLAP client; otherwise the use of the latter is more justified. In this case the OLAP client combines high performance characteristics with low cost.

Powerful analysts' PCs are another argument in favor of OLAP clients. When an OLAP server is used, this power remains unused. Among the advantages of OLAP clients, the following can also be named:

The costs of implementing and maintaining an OLAP client are significantly lower than the costs of an OLAP server.

When an OLAP client with a built-in engine is used, data is transmitted over the network once. No new data flows are generated while OLAP operations are performed.

The setup of ROLAP clients is simplified, since the intermediate link - the creation of a multidimensional database - is eliminated.

3. The core of an OLAP system

3.1 Principles of construction


From what has already been said, it is clear that the OLAP mechanism is today one of the popular methods of data analysis. There are two main approaches to solving this problem. The first of them is called Multidimensional OLAP (MOLAP) - implementation of the mechanism using a multidimensional database on the server side; the second is Relational OLAP (ROLAP) - building cubes "on the fly" on the basis of SQL queries to a relational DBMS. Each of these approaches has its pros and cons; their comparative analysis is beyond the scope of this work. Only the implementation of the core of a desktop ROLAP module will be described here.

Such a task arose after using a ROLAP system built on the Decision Cube components included in Borland Delphi. Unfortunately, the use of this component set showed low performance on large volumes of data. The severity of this problem can be reduced by trying to cut off as much data as possible before feeding it into cube construction, but this is not always enough.

On the Internet and in the press one can find a lot of information about OLAP systems, but almost nowhere is it said how they are arranged inside.

Scheme of operation:

The general scheme of operation of a desktop OLAP system can be represented as follows:

Scheme 3. Operation of a desktop OLAP system

The work algorithm is as follows:

1. Obtain the data in the form of a flat table or as the result of executing an SQL query.

2. Cache the data and convert it into a multidimensional cube.

3. Display the constructed cube by means of a cross-table or a chart, etc. In the general case, an arbitrary number of displays can be connected to one cube.

Let us consider how such a system can be arranged internally. We will start from the side that can be seen and touched, that is, from the displays. The displays used in OLAP systems most often come in two types: cross-tables and charts. Let us consider the cross-table, which is the basic and most common way of displaying a cube.

In the figure below, the rows and columns containing aggregated results are shown, the cells containing facts are marked light gray, and the cells containing dimension data are marked dark gray.

Thus, the table can be divided into the following elements, with which we will work in the future:

When filling the matrix with facts, we must act as follows:

Based on the dimension values, determine the coordinates of the element being added in the matrix.

Determine the coordinates of the totals columns and rows that the added element affects.

Add the element to the matrix and to the corresponding totals columns and rows.

It should be noted that the resulting matrix will be highly sparse, which is why organizing it as a two-dimensional array (the option lying on the surface) is not only irrational but, most likely, impossible because of the large dimensionality of this matrix, which no amount of RAM will be able to accommodate. For example, if our cube contains sales information for one year and has only 3 dimensions - customers (250), products (500) and date (365) - then we get a fact matrix of the following size: number of elements = 250 x 500 x 365 = 45,625,000. And this despite the fact that there may be only a few thousand filled elements in the matrix. Moreover, the greater the number of dimensions, the sparser the matrix will be.

Therefore, to work with this matrix, special mechanisms for handling sparse matrices must be applied. Various options for organizing a sparse matrix are possible. They are quite well described in the programming literature, for example, in the first volume of Donald Knuth's classic book "The Art of Computer Programming".
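A minimal sketch of one such organization, assuming a ready-made hash dictionary in place of the linked structures described by Knuth (Delphi-style Pascal, illustrative names): the matrix is keyed by the (row, column) pair, so empty cells cost nothing at all:

program SparseMatrix;
{$APPTYPE CONSOLE}
uses
  SysUtils, Generics.Collections;

// pack the two coordinates into a single 64-bit key
function RC(Row, Col: Integer): Int64;
begin
  Result := (Int64(Row) shl 32) or Cardinal(Col);
end;

var
  Cells: TDictionary<Int64, Double>; // only the filled cells consume memory
  V: Double;
begin
  Cells := TDictionary<Int64, Double>.Create;
  try
    Cells.AddOrSetValue(RC(3, 7), 42.0);    // one fact - one stored cell
    WriteLn('stored cells: ', Cells.Count); // 1, not 45 million
    if not Cells.TryGetValue(RC(0, 0), V) then
      WriteLn('cell (0,0) is empty');
  finally
    Cells.Free;
  end;
end.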

Let us now consider how to determine the coordinates of a fact from the dimensions corresponding to it. To do this, let us examine the header structure in more detail:

In this case, one can easily find a way to determine the numbers of the corresponding cell and of the totals it falls into. Several approaches can be suggested here. One of them is to use a tree for searching for the corresponding cells. This tree can be built while traversing the sample. In addition, an analytical recurrence formula for calculating the required coordinate can easily be defined.

The data stored in the table must be transformed before it can be used. Thus, to improve performance when building a hypercube, it is desirable to find the unique elements stored in the columns that are the dimensions of the cube. In addition, facts can be pre-aggregated for records that have the same dimension values. As mentioned above, it is the unique values in the dimension fields that are important. The following structure can then be proposed for storing them:

Scheme 4. Structure for storing unique values

When using such a structure, we significantly reduce the need for memory, which is quite relevant, because to increase the speed of operation it is advisable to keep the data in RAM. In addition, one can store only the array of elements and unload their values to disk, since they will be required only when the cross-table is displayed.
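A sketch of this structure (Delphi-style Pascal; the names and sample values are illustrative): each unique dimension value is stored once, together with the list of the numbers of the facts that reference it:

program DimensionDict;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes, Generics.Collections;

var
  Values: TStringList;                                 // unique dimension values
  FactsOf: TObjectDictionary<string, TList<Integer>>;  // value -> fact numbers

procedure AddFact(const DimValue: string; FactNo: Integer);
var
  L: TList<Integer>;
begin
  if not FactsOf.TryGetValue(DimValue, L) then
  begin
    L := TList<Integer>.Create;
    FactsOf.Add(DimValue, L);
    Values.Add(DimValue); // first occurrence: remember the unique value
  end;
  L.Add(FactNo);          // duplicates only extend the fact list
end;

begin
  Values := TStringList.Create;
  FactsOf := TObjectDictionary<string, TList<Integer>>.Create([doOwnsValues]);
  try
    AddFact('Moscow', 0);
    AddFact('Moscow', 1); // the value itself is stored only once
    AddFact('Tver', 2);
    WriteLn(Values.Count, ' unique values'); // 2
  finally
    FactsOf.Free;
    Values.Free;
  end;
end.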

The ideas described above formed the basis of the CubeBase component library.

Scheme 5. Structure of the CubeBase component library

TCubeSource performs caching and conversion of the data into the internal format, as well as preliminary aggregation of the data. The TCubeEngine component performs the calculation of the hypercube and operations with it. In fact, it is the OLAP engine that transforms a flat table into a multidimensional data set. The TCubeGrid component displays the cross-table and controls the presentation of the hypercube. TCubeChart allows the hypercube to be seen in the form of charts, and the TCubePivot component controls the work of the cube core.

So, I have examined the architecture and interaction of components that can be used to build an OLAP engine. Now let us consider the internal structure of the components in more detail.

The first step in the system's operation is loading the data and converting it into the internal format. A natural question is why this is necessary, since one could simply use the data from the flat table, scanning it when building a cube slice. To answer this question, let us consider the table structure from the point of view of the OLAP engine. For an OLAP system, table columns can be either facts or dimensions, and the logic of working with these columns differs. In a hypercube, the dimensions are actually the axes, and the dimension values are coordinates on those axes. At the same time, the cube will be filled very unevenly: there will be coordinate combinations to which no records correspond, and there will be combinations to which several records in the source table correspond, and the first situation occurs more often; that is, the cube will resemble the universe - empty space with points (facts) occurring in separate places. Thus, if during the initial data load we pre-aggregate the data, that is, combine the records that have the same dimension values while calculating preliminary aggregated values of the facts, then afterwards we will have to work with a smaller number of records, which will increase the speed of operation and reduce the RAM requirements.

To build hypercube slices, we need the following capabilities: determining the coordinates (that is, the dimension values) for table records, and determining the records that have specific coordinates (dimension values). Let us consider how these capabilities can be implemented. The easiest way to store the hypercube is to use a database of our own internal format.

Schematically, the conversion can be represented as follows:

Scheme 6. Conversion of a flat table into a normalized internal-format database

That is, instead of one table we have obtained a normalized database. "But normalization reduces the speed of the system," database specialists may say, and in this they will certainly be right - in the case when we need to obtain the values of dictionary elements (in our case, the dimension values). But the point is that at the stage of building a slice these values are not needed at all. As mentioned above, we are interested only in the coordinates in our hypercube, so we will define coordinates for the dimension values. The simplest approach would be to renumber the values of the elements. In order for the numbering to be unambiguous within one dimension, we first sort the lists of dimension values (the dictionaries, in database terms) alphabetically. In addition, we renumber the facts, and the facts are pre-aggregated. We obtain the following scheme:

Scheme 7. Renumbering the normalized database to determine the coordinates of dimension values

Now it remains only to link the elements of the different tables to one another. In relational database theory this is done using special intermediate tables. For us it is enough to associate with each record in the dimension tables a list of the numbers of the facts in whose formation these dimension values were used (that is, to determine all the facts that have the same value of the coordinate described by this dimension); for the facts, correspondingly, each record is associated with the values of the coordinates at which it is located in the hypercube. Hereinafter, the coordinates of a record in the hypercube will be understood as the numbers of the corresponding records in the tables of dimension values. Then, for our hypothetical example, we obtain the following set defining the internal representation of the hypercube:

Scheme 8. Internal representation of the hypercube

This will be our internal representation of the hypercube. Since we are not building it for a relational database, fields of variable length are simply used as the linking fields (we could not do this in an RDB, since the number of table columns there is fixed in advance).

One could try to use a set of temporary tables to implement the hypercube, but this method provides too low a speed (an example is the Decision Cube component set), so we will use our own storage structures.

To implement the hypercube, we need to use data structures that ensure maximum speed and minimum RAM consumption. Obviously, the main structures will be those for storing the dictionaries and the fact table. Let us consider the tasks that the dictionary must perform at maximum speed:

checking the presence of an element in the dictionary;

adding an item to the dictionary;

searching for the numbers of records that have a specific coordinate value;

searching for the coordinate of a dimension value;

searching for a dimension value by its coordinate.

To implement these requirements, different types of data structures can be used. For example, arrays of structures can be used. In a real case, these arrays need additional indexing mechanisms that will increase the speed of loading data and retrieving information.

To optimize the operation of the hypercube, it is necessary to determine which tasks must be solved first and by which criteria the quality of work should be improved. The main thing for us is to increase the speed of the program, while it is desirable that not too much RAM be required. An increase in performance is possible through the introduction of additional data access mechanisms, for example, through indexing. Unfortunately, this increases the RAM overhead. Therefore, let us determine which operations we need to perform at the highest speed. To do this, consider the individual components that implement the hypercube. These components have two main types - the dimension and the fact table. For a dimension, a typical task will be:

adding a new value;

determining the coordinate from the dimension value;

determining the value from the coordinate.

When adding a new element value, we need to check whether such a value already exists and, if it does, not add a new one but use the existing coordinate; otherwise we must add a new element and determine its coordinate. This requires a fast way of searching for the presence of the required element (in addition, such a task also arises when determining the coordinate from the element value). Hashing is optimal for this purpose. In this case, the optimal structure will be hash trees in which we store references to the elements, the elements being the rows of the dimension dictionary. Then the structure of a dimension value can be represented as follows:

PFactLink = ^TFactLink;

TFactLink = record
  FactNo: Integer;     // fact index in the fact table
  Next: PFactLink;     // next element of the list of facts with this value
end;

TDimensionRecord = record
  Value: string;       // dimension value
  Index: Integer;      // coordinate value
  FactLink: PFactLink; // pointer to the beginning of the list of fact-table elements
end;

In the hash tree we will store references to the unique elements. In addition, we need to solve the inverse problem - determining the dimension value from a coordinate. To ensure maximum performance, direct addressing should be used. Therefore, another array can be used, in which the index is the dimension coordinate and the value is a reference to the corresponding dictionary entry. However, it is possible to do this more simply (and save memory at the same time) by ordering the array of elements appropriately, so that the index of an element is its coordinate.
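A sketch of these two lookups, with a ready-made hash dictionary standing in for the hash tree (Delphi-style Pascal, illustrative names): the value-to-coordinate search is hashed, while the coordinate-to-value search is direct addressing, because the element's index in the array is its coordinate:

program DimensionLookup;
{$APPTYPE CONSOLE}
uses
  SysUtils, Generics.Collections;

var
  CoordOf: TDictionary<string, Integer>; // hash: dimension value -> coordinate
  ValueOf: TList<string>;                // direct addressing: coordinate -> value

function GetOrAddCoord(const Value: string): Integer;
begin
  if not CoordOf.TryGetValue(Value, Result) then
  begin
    Result := ValueOf.Count;   // the element's index IS its coordinate
    ValueOf.Add(Value);
    CoordOf.Add(Value, Result);
  end;
end;

begin
  CoordOf := TDictionary<string, Integer>.Create;
  ValueOf := TList<string>.Create;
  try
    WriteLn(GetOrAddCoord('apples')); // 0 - a new value gets a new coordinate
    WriteLn(GetOrAddCoord('pears'));  // 1
    WriteLn(GetOrAddCoord('apples')); // 0 - the existing coordinate is reused
    WriteLn(ValueOf[1]);              // pears - reverse lookup by coordinate
  finally
    ValueOf.Free;
    CoordOf.Free;
  end;
end.

Note that this sketch sidesteps the alphabetical-ordering requirement described earlier by assigning coordinates in order of arrival; to follow the text exactly, the coordinates would be assigned after sorting the unique values.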

The organization of the array that implements the list of facts does not present any particular problems because of its simple structure. The only remark is that it is desirable to compute all the aggregation methods that may be needed and that can be calculated incrementally (for example, the sum).

So, we have described a way of storing data in the form of a hypercube. It allows us to form a set of points in multidimensional space on the basis of the information in the data warehouse. In order for a person to be able to work with this data, it must be presented in a form convenient for processing. The main types of data presentation are the pivot table and charts, and both of them are in fact projections of the hypercube. In order to ensure maximum efficiency in building the representations, we will start from what these projections are. Let us begin with the pivot table, as the most important one for data analysis.

Let us find ways of implementing such a structure. A pivot table can be divided into three parts: the row headers, the column headers, and the table of aggregated fact values itself. The simplest way of representing the fact table is to use a two-dimensional array whose dimensionality can be determined by building the headers. Unfortunately, the simplest way is also the most inefficient, because the table will be highly sparse and memory will be used extremely wastefully; as a result, only very small cubes could be built, since otherwise there would not be enough memory. Thus, we need to choose for storing the information a data structure that ensures the maximum speed of searching for/adding a new element and, at the same time, the minimum memory consumption. This structure is the so-called sparse matrix, about which more detail can be found in Knuth. Various ways of organizing the matrix are possible. In order to choose the option that suits us, let us first consider the structure of the table headers.

The headers have a clear hierarchical structure, so it is natural to use a tree to store them. The structure of a tree node can be schematically depicted as follows:

Appendix C.

As the dimension value, it is logical to store a reference to the corresponding element of the dimension table of the multidimensional cube. This will reduce the memory costs of storing the slice and speed up the work. References are also used as the parent and child nodes.

To add an element to the tree, we must have information about its location in the hypercube. As such information we must use its coordinates, which are stored in the dimension value tables. Let us consider the scheme of adding an element to the header tree of a pivot table. As the source information we use the dimension coordinate values. The order in which these dimensions are listed is determined by the desired aggregation method and coincides with the levels of the header hierarchy. As a result, we must obtain the list of columns or rows of the pivot table to which the element must be added.

Appendix D.

As the source data for determining this structure, we use the coordinates of the measurements. In addition, for definiteness, we assume that we are determining the column of interest in the matrix (how we determine the row will be considered a little later, since other data structures are more convenient there; the reason for this choice is also discussed below). As coordinates we take integers: the numbers of the measurement values, which can be determined as described above.
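A sketch of this procedure, reusing the HeaderNode class above: we descend the tree along the coordinates in hierarchy order, creating missing nodes on the way, and the resulting leaf identifies the target column (or row) of the summary table.

```python
def header_path(root, coords):
    """Walk the header tree along the measurement coordinates,
    creating missing nodes; the returned leaf identifies one
    column (or row) of the summary table."""
    node = root
    for coord in coords:
        child = node.children.get(coord)
        if child is None:
            child = HeaderNode(coord, parent=node)
            node.children[coord] = child
        node = child
    return node
```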

After performing this procedure, we obtain an array of references to the columns of the sparse matrix. Now we need to perform the necessary actions on the rows. To do this, inside each column we must find the desired element and add the corresponding value there. For each of the measurements in the collection, we must know the number of unique values and the actual set of these values.

Now consider in what form the values inside the columns should be represented, that is, how to determine the desired row. Several approaches are possible here. The simplest would be to represent each column as a vector, but since it will be strongly sparse, memory would be consumed extremely inefficiently. To avoid this, we apply data structures that represent sparse one-dimensional arrays (vectors) more efficiently. The simplest of them is an ordinary list, singly or doubly linked, but it is inefficient in terms of element access. Therefore, we will use a tree, which provides faster access to the elements.

For example, we could use exactly the same tree as for the columns, but then we would have to maintain a separate tree for each column, which would lead to significant memory and processing-time overhead. Let us be a little cunning: we will maintain one tree to store all the measurement combinations used in the rows. This tree will be identical to the one described above, but its elements will point not to rows (which do not exist as such) but to their indexes, and the index values themselves do not interest us and are used only as unique keys. These keys are then used to find the desired element inside a column. The columns themselves are easiest to represent as ordinary binary trees. Graphically, the resulting structure can be represented as follows:

Scheme 9. Representation of the summary table in the form of binary trees
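A minimal sketch of such a column (a Python dict plays the role of the binary tree or hash table keyed by row number; FactCell is the cell sketched earlier):

```python
class SparseColumn:
    """One column of the sparse matrix: a sparse vector of fact cells."""
    def __init__(self):
        self.cells = {}   # row number -> FactCell (only occupied rows exist)

    def cell(self, row_no):
        """Find the cell with the given row number, creating it on demand."""
        return self.cells.setdefault(row_no, FactCell())
```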

To determine the corresponding row numbers, we can use the same procedure as described above for determining the columns of the summary table. The row numbers are unique within one summary table and identify elements in the vectors that form the columns of the summary table. The easiest way to generate these numbers is to maintain a counter and increment it by one whenever a new element is added to the row header tree. The column vectors themselves are easiest to store as binary trees in which the row number is used as the key. Hash tables can also be used. Since the procedures for working with such trees are considered in detail in other sources, we will not dwell on them and will consider the general scheme of adding an element to a column.
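A sketch of the row-number generation (assumed names): a flat dict keyed by the tuple of row coordinates stands in for the row header tree, and the counter is simply the current number of registered combinations.

```python
class RowKeys:
    """Assigns a unique row number to each combination of row coordinates."""
    def __init__(self):
        self._numbers = {}   # tuple of coordinates -> row number

    def row_number(self, coords):
        key = tuple(coords)
        if key not in self._numbers:
            self._numbers[key] = len(self._numbers)  # counter + 1 per new element
        return self._numbers[key]
```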

In general, the sequence of actions to add an item to the matrix can be described as follows:

1. Determine the row numbers to which the elements are added.

2. Determine the set of columns to which the elements are added.

3. For all columns, find the elements with the required row numbers and add the current element to them (adding includes attaching the required number of facts and calculating the aggregated values that can be determined incrementally).

After performing this algorithm, we obtain the matrix that represents the summary table we need to build.
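Putting the sketches together, the three steps above could look like this (cube is an assumed container holding the row-key registry and the root of the column header tree; all other names come from the sketches earlier in the text):

```python
def add_fact(cube, row_coords, col_coords, value):
    row_no = cube.row_keys.row_number(row_coords)   # step 1: row number
    leaf = header_path(cube.col_root, col_coords)   # step 2: target column
    if not hasattr(leaf, "column"):                 # column is created on demand
        leaf.column = SparseColumn()
    leaf.column.cell(row_no).add(value)             # step 3: incremental aggregation
```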

Now a few words about filtering when constructing a slice. It is easiest to implement it at the stage of building the matrix, since at this stage there is access to all the required fields and, in addition, value aggregation is carried out there. When a record is read from the cache, its compliance with the filtering conditions is checked, and in case of non-compliance the record is discarded.
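A sketch of such filtering (assumed names): each record read from the cache is checked against a list of predicates and discarded on mismatch, before any aggregation takes place.

```python
def load_cut(records, filters, cube):
    """Build the matrix, applying the slice filters while reading the cache."""
    for row_coords, col_coords, value in records:
        record = (row_coords, col_coords, value)
        if all(pred(record) for pred in filters):
            add_fact(cube, row_coords, col_coords, value)
        # records that fail any filter are simply discarded
```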

Since the structure described above fully describes the summary table, the task of its visualization is trivial. We can use the standard table components available in almost all programming tools for Windows.

The first product performing OLAP queries was Express (IRI). However, the term OLAP itself was proposed by Edgar Codd, the "father of relational databases". Codd's work was financed by Arbor, the company that had released its own OLAP product, Essbase (later purchased by Hyperion, which in 2007 was absorbed by Oracle), a year earlier. Other well-known OLAP products include Microsoft Analysis Services (previously called OLAP Services, part of SQL Server), Oracle OLAP Option, DB2 OLAP Server from IBM (in fact, Essbase with additions from IBM), SAP BW, and products from Brio, BusinessObjects, Cognos, MicroStrategy and other manufacturers.

From a technical point of view, the products on the market are divided into "physical OLAP" and "virtual OLAP". In the first case, a program performs the preliminary calculation of aggregates, which are then stored in a special multidimensional database that provides fast extraction. Examples of such products are Microsoft Analysis Services, Oracle OLAP Option, Oracle/Hyperion Essbase, and Cognos PowerPlay. In the second case, the data is stored in a relational DBMS, and the aggregates may not exist at all or may be created on first request in the DBMS or in the cache of the analytical software. Examples of such products are SAP BW, BusinessObjects, and MicroStrategy. Systems based on "physical OLAP" provide consistently better query response times than "virtual OLAP" systems. Suppliers of virtual OLAP systems claim greater scalability of their products in terms of supporting very large volumes of data.

In this paper, I would like to consider a product of the company Basegroup Labs: Deductor.

Deductor is an analytical platform, i.e. a basis for creating finished applied solutions. The technologies implemented in Deductor make it possible, on the basis of a single architecture, to pass through all the stages of constructing an analytical system: from creating a data warehouse to automatically selecting models and visualizing the obtained results.

Composition of the system:

1. Deductor Studio is the analytical core of the Deductor platform. Deductor Studio includes a complete set of mechanisms that make it possible to obtain information from an arbitrary data source, carry out the entire processing cycle (cleaning, data transformation, constructing models), display the obtained results in the most convenient way (OLAP, tables, charts, decision trees...) and export the results.

2. Deductor Viewer is the end user's workplace. The program minimizes the requirements for staff, because all required operations are performed automatically using previously prepared processing scenarios; there is no need to think about how the data is obtained or processed. The Deductor Viewer user only needs to select the desired report.

3. Deductor Warehouse is a multidimensional cross-platform data warehouse that accumulates all the information needed for analyzing the subject area. The use of a single repository provides convenient access, high processing speed, consistency of information, centralized storage and automatic support of the entire data analysis process.

4. Client-server

Deductor Server is designed for remote analytical processing. It provides the ability both to automatically "run" data through existing scenarios on the server and to retrain the existing models. Using Deductor Server makes it possible to implement a full-fledged three-tier architecture in which it performs the function of an application server. Access to the server is provided via Deductor Client.

Principles of operation:

1. Data import

Analysis of any information in Deductor begins with data import. As a result of import, the data is brought to a form suitable for subsequent analysis using all the mechanisms available in the program. The nature of the data, its format, the DBMS and so on do not matter, because the mechanisms for working with all of them are unified.

2. Data export

The presence of export mechanisms makes it possible to send results to third-party applications, for example, to transfer a sales forecast to a system that forms purchase orders, or to place a prepared report on a corporate Web site.

3. Data processing

Processing in Deductor means any action associated with some transformation of data, for example filtering, building a model, cleaning and so on. It is in this block that the actions most important from the point of view of analysis are performed. The most significant feature of the processing mechanisms implemented in Deductor is that data obtained as a result of processing can again be processed by any of the available methods. Thus, it is possible to build arbitrarily complex processing scenarios.

4. Visualization

Data can be visualized in Deductor Studio (Viewer) at any stage of processing. The system independently determines how it can do this; for example, if a neural network has been trained, then in addition to tables and diagrams one can view the graph of the neural network. The user only needs to choose the desired option from the list and configure several parameters.

5. Integration mechanisms

Deductor does not provide data entry tools; the platform is oriented exclusively toward analytical processing. Flexible import-export mechanisms are provided for using information stored in heterogeneous systems. Interaction can be organized using batch execution, operation as an OLE server, and access via Deductor Server.

6. Replication of knowledge

Deductor makes it possible to implement one of the most important functions of any analytical system: support for the process of knowledge replication, i.e. enabling employees who do not understand the methods of analysis or how a particular result was obtained to get an answer on the basis of models prepared by an expert.

Conclusion

In this paper, such an area of modern information technologies as data analysis systems was considered. The main tool for analytical information processing, OLAP technology, was analyzed. The essence of the OLAP concept and the importance of OLAP systems in the modern business process were disclosed in detail. The structure and working process of a ROLAP server were described in detail. As an example of an implementation of OLAP, the Deductor analytical platform was presented. The documentation presented has been developed in accordance with the requirements.

OLAP technology is a powerful tool for real-time data processing. An OLAP server makes it possible to organize and present data across various analytical directions and turns data into valuable information that helps companies make more informed decisions.

The use of OLAP systems provides a consistently high level of performance and scalability, supporting data volumes of many gigabytes that can be accessed by thousands of users. With the help of OLAP technologies, access to information is carried out in real time, i.e. query processing no longer slows down the analysis process, ensuring its speed and efficiency. Visual administration tools make it possible to develop and deploy even the most complex analytical applications, making this process simple and fast.


The purpose of this term paper is the study of OLAP technology, the concept of its implementation and its structure.

In the modern world, computer networks and computing systems make it possible to analyze and process large data arrays.

A large amount of information greatly complicates the search for solutions, but makes it possible to obtain much more accurate calculations and analysis. To solve this problem, there is a whole class of information systems that perform analysis. Such systems are called decision support systems (DSS, Decision Support System).

To perform analysis, a SPPR must accumulate information, having means for its input and storage. Three main tasks solved in a SPPR can be distinguished:

· data input;

· data storage;

· data analysis.

Data entry into a SPPR is carried out either automatically from sensors characterizing the state of the environment or process, or by a human operator.

If data entry is carried out automatically from sensors, the data is accumulated by a readiness signal that occurs when information appears, or by cyclic polling. If the input is carried out by a person, the system must provide users with convenient tools for entering data, check the data for correctness of input, and perform the necessary calculations.

When data is entered simultaneously by several operators, it is necessary to solve the problems of modification of, and parallel access to, the same data.

A SPPR provides the analyst with data in the form of reports, tables and graphs for study and analysis; this is why such systems perform decision support functions.

Operational data entry is implemented in data entry subsystems called OLTP (On-Line Transaction Processing). Conventional database management systems (DBMS) are used to implement them.

The analysis subsystem can be built on the basis of:

· subsystems of information retrieval analysis based on relational DBMS and static queries using SQL;

· subsystems of operational analysis. To implement such subsystems, the technology of operational analytical processing of data, OLAP, which uses the concept of multidimensional data representation, is applied;

· subsystems of intellectual analysis. This subsystem implements Data Mining methods and algorithms.

From the user's point of view, an OLAP system provides means for flexible viewing of information in various sections, automatic obtaining of aggregated data, and performing analytical operations of drill-down, comparison over time. Thanks to all this, OLAP systems are a solution with great advantages in preparing data for all types of business reporting that involve presenting data in various cuts and at different levels of hierarchy: sales reports, various forms of budgets and others. OLAP systems also have great advantages in other forms of data analysis, including forecasting.

1.2 Definition of OLAP systems

The technology of integrated multidimensional data analysis is called OLAP. OLAP is a key component of the organization of data warehouses.

OLAP functionality can be implemented in various ways, from the simplest, such as analyzing data in office applications, to more complex distributed analytical systems based on server products.

OLAP (On-Line Analytical Processing) is a technology of operational analytical data processing that uses tools and methods for collecting, storing and analyzing multidimensional data in order to support decision-making processes.

The main purpose of OLAP systems is to support analytical activity and ad hoc queries of analyst users. The purpose of OLAP analysis is to test emerging hypotheses.

In 1993, the founder of the relational approach to database construction, Edgar Codd (a mathematician and IBM researcher), together with partners published an article initiated by Arbor Software (today better known as Hyperion Solutions) entitled "Providing OLAP (On-line Analytical Processing) to User-Analysts", in which 12 features of OLAP technology were formulated and subsequently supplemented with six more. These provisions became the main content of a new and very promising technology.

Main features of technology OLAP (Basic):

  • multidimensional conceptual representation of data;
  • intuitive data manipulation;
  • accessibility and detailing of data;
  • batch extraction of data versus interpretation;
  • OLAP analysis models;
  • client-server architecture (OLAP is accessible from the desktop);
  • transparency (transparent access to external data);
  • multi-user support.

Special features (Special):

  • treatment of non-normalized data;
  • saving OLAP results: storing them separately from the source data;
  • exclusion of missing values;
  • treatment of missing values.

Features of submission of reports (Report):

  • flexibility of reporting;
  • standard reporting performance;
  • automatic adjustment of the physical data extraction layer.

Measurement management (Dimension):

  • generic dimensionality;
  • unlimited number of dimensions and aggregation levels;
  • unrestricted cross-dimensional operations.

Historically, it so happened that today the term "OLAP" implies not only a multidimensional view of data by the end user, but also a multidimensional presentation of data in the target database. It is with this that the appearance of the independent terms "Relational OLAP" (ROLAP) and "Multidimensional OLAP" (MOLAP) is associated.

An OLAP service is a tool for analyzing large volumes of data in real time. Interacting with an OLAP system, the user can carry out flexible viewing of information, obtain arbitrary data slices, and perform analytical operations of drill-down, roll-up, end-to-end distribution, and comparison over time across many parameters simultaneously. All work with the OLAP system takes place in terms of the subject area and makes it possible to build statistically substantiated models of business situations.

OLAP software tools are tools for operational analysis of the data contained in a warehouse. Their main feature is that they are intended for use not by a specialist in information technology or an expert statistician, but by a professional in the applied area: the head of a department or division and, finally, the director. They are intended for communication of the analyst with the problem, not with the computer. Fig. 6.14 shows an elementary OLAP cube that makes it possible to evaluate data along three dimensions.

A multidimensional OLAP cube, together with the corresponding mathematical and statistical processing algorithms, makes it possible to analyze data of any complexity at any time intervals.


Fig. 6.14. An elementary OLAP cube

Having at his disposal flexible mechanisms for data manipulation and visual display (Fig. 6.15, Fig. 6.16), the manager first considers the data from different sides, which may be (or may not be) related to the problem being solved.

Next, he compares various business indicators with each other, trying to identify hidden relationships; he may look at the data more closely, detailing it, for example, breaking it down by time, by region or by customer, or, on the contrary, further generalize the presentation of information to remove distracting details. After that, using the statistical estimation and simulation modeling module, several scenarios of the development of events are built, and the most acceptable option is selected.


Fig. 6.15.

For example, a manager of a management company may form a hypothesis that the spread of asset growth across the company's branches depends on the ratio of specialists with technical and economic education in them. To test this hypothesis, the manager can request from the warehouse and display on a graph the ratio of interest for those branches whose asset growth for the current quarter decreased by more than 10% compared with last year, and for those where it increased by more than 25%. He should be able to do this with a simple selection from the proposed menu. If the results obtained noticeably split into two corresponding groups, this should become an incentive for further testing of the hypothesis put forward.
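A hypothetical illustration of such a check in Python with pandas (the data and column names are invented for the example):

```python
import pandas as pd

branches = pd.DataFrame({
    "branch":       ["A", "B", "C", "D"],
    "asset_growth": [-0.12, 0.30, -0.15, 0.27],  # growth vs. last year
    "tech_to_econ": [0.8, 2.1, 0.7, 1.9],        # technical/economic staff ratio
})

declined = branches[branches["asset_growth"] < -0.10]
grown = branches[branches["asset_growth"] > 0.25]

# If the two groups differ noticeably, the hypothesis deserves further testing.
print(declined["tech_to_econ"].mean(), grown["tech_to_econ"].mean())
```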

Currently, a direction called dynamic modeling (Dynamic Simulation), which fully implements the FASMI principle mentioned above, is developing rapidly.

Using dynamic modeling, the analyst builds a model of a business situation developing over time according to some scenario. The result of such modeling may be several new business situations generating a tree of possible solutions with an assessment of the probability and prospects of each.


Fig. 6.16.

Table 6.3 shows the comparative characteristics of static and dynamic analysis.

Table 6.3. Comparative characteristics of static and dynamic analysis

Characteristic | Static analysis | Dynamic analysis
Types of questions | Who? What? How many? How? When? Where? | Why is it so? What would happen if...? What if...?
Response time | Not regulated | Seconds
Typical data operations | Regulated report, diagram, table, figure | Sequence of interactive reports, diagrams, screen forms; dynamic change of aggregation levels and data slices
Level of analytical requirements | Medium | High
Type of screen forms | Mainly predetermined, regulated | User-defined, with customization options
Data aggregation level | Detailed and summary | Determined by the user
"Age" of data | Historical and current | Historical, current and projected
Types of queries | Mostly predictable | Unpredictable, ad hoc
Purpose | Regulated analytical processing | Multi-pass analysis, modeling and forecasting

Almost always, the task of building an analytical system for multidimensional data analysis is the task of building a unified, consistently functioning information system based on heterogeneous software tools and solutions. And the very choice of tools for implementing such a system becomes an extremely difficult task. Many factors must be taken into account, including the mutual compatibility of various software components, the ease of their development, use and integration, the efficiency of operation, stability, and even the forms, level and prospects of relationships between the various vendor firms.

OLAP is applicable wherever there is a task of analyzing multifactor data. In general, given some table of data containing at least one descriptive column and one column with numbers, an OLAP tool will be an effective means of analysis and report generation. As an example of the application of OLAP technology, consider the study of the results of a sales process.

Key questions "How many sold?", "What amount is sold?" Expands as business complications and accumulation of historical data to a certain set of factors, or cuts: ".. in St. Petersburg, in Moscow, in the Urals, in Siberia ...", ".. in the past quarter, compared to the current", " ..The Supplier and compared to the supplier b ... "and so on.

Answers to such questions are necessary for making management decisions: about changing the assortment or prices, closing and opening stores and branches, terminating or signing contracts with dealers, starting or stopping advertising campaigns, etc.

If we try to single out the main figures (facts) and slices (measurement arguments) that the analyst manipulates when trying to expand or optimize the company's business, we obtain a table suitable for sales analysis: a kind of template requiring appropriate adjustment for each particular enterprise.

Time. As a rule, these are several periods: year, quarter, month, decade, week, day. Many OLAP tools automatically calculate the senior periods from a date and compute totals for them.

Category of goods. There may be several categories, and they differ for each type of business: variety, model, type of packaging, etc. If only one product is sold or the assortment is very small, the category is not needed.

Product. Sometimes the name of the product (or service) is used, its code or article number. In cases where the assortment is very large (and some enterprises have tens of thousands of positions in their price list), the initial analysis may be carried out not for all types of goods but aggregated into some agreed categories.

Region. Depending on the global scale of the business, this can mean a continent, a group of countries, a country, a territory, a city, a district, a street, or part of a street. Of course, if there is only one outlet, this dimension is absent.

Seller. This dimension also depends on the structure and scale of the business. It can be: a branch, a store, a dealer, a sales manager. In some cases the dimension is absent, for example, when the seller does not affect sales volumes, there is only one store, and so on.

Buyer. In some cases, for example in retail, the buyer is impersonal and the dimension is absent; in other cases information about the buyer exists and is important for sales. This dimension may contain the name of the buyer's company or many groupings and characteristics of customers: industry, group of enterprises, owner, and so on.

Analysis of the sales structure is used to identify the most important components in a given context. For this it is convenient to use, for example, a "pie" diagram, or, in complex cases when three dimensions are studied at once, "columns". For example, in a computer equipment store, sales of computers amounted to $100,000, photographic equipment to $10,000, and consumables to $4,500. Conclusion: the store's turnover depends to a large extent on the sale of computers (in fact, consumables may be necessary for the sale of computers, but this is already an analysis of internal dependencies).
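For illustration, here is a minimal sketch of such a sales slice built with pandas (the data is invented; pivot_table plays the role of the OLAP aggregation over the Time and Product dimensions):

```python
import pandas as pd

sales = pd.DataFrame({
    "Time":    ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "Product": ["Computers", "Photo", "Consumables"] * 2,
    "Amount":  [100_000, 10_000, 4_500, 120_000, 9_000, 5_000],
})

# Slice: total sales by product and period, with grand totals.
pivot = pd.pivot_table(sales, values="Amount", index="Product",
                       columns="Time", aggfunc="sum", margins=True)
print(pivot)
```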

Analysis of dynamics (regression analysis: detection of trends). Detection of trends and seasonal fluctuations. Dynamics is visually displayed by a line graph. For example, sales volumes of Intel products fell during the year, while Microsoft sales grew. Perhaps the well-being of the average buyer improved, or the image of the store changed, and with it the composition of its buyers. The assortment needs to be adjusted. Another example: for three years, sales of video cameras have decreased in winter.

Analysis of dependencies (correlation analysis). Comparison of sales volumes of different goods over time to identify the required assortment, the "basket". For this, it is also convenient to use a line graph. For example, after printers were removed from the assortment, a drop in sales of powder cartridges was discovered during the first two months.