Apr 21, 2012

Data Mart

1. Introduction to Data Marts

A data mart is a collection of subject areas organized for decision support based on the needs of a functional unit. Generally marts are designed to satisfy the decision support needs of a particular department or a functional unit. For example, Sales Department has their own data mart, Marketing has theirs, and Accounts department has theirs and so on. Some data marts are dependent on other data marts to get their information.

A Data Mart is a specific, subject oriented, repository of data designed to answer specific questions for a specific set of users. So an organization could have multiple data marts serving the needs of marketing, sales, operations, collections, etc. A data mart usually is organized as one-dimensional model as a star-schema (OLAP cube) made of a fact table and multiple dimension tables.

Data marts have appeared as important concepts along with the rapid growth of data warehouses. Data marts are databases that share many of the features of data warehouses but the scope is much smaller. A data mart is defined as a subset of a data warehouse for a single department or function. A data mart may have tens of gigabytes of data rather than hundreds of gigabytes for the entire enterprise

• Data Marts are dedicated to particular subject areas or the needs of a particular department or a business function.

• An enterprise could have a data warehouse containing sales, inventory, distribution, finance, and human resources subject areas and it may also have a sales data mart, an accounts receivable data mart and an employee turnover analysis data mart etc.

The data mart can be a standalone entity, which can be designed directly by extracting data from the OLTP systems or it can be linked centrally to the corporate data warehouse, i.e. the data mart gets the data from a data warehouse, the warehouse in turn get the data from OLTP Systems. As a data warehouse grows larger, the ability to serve the various needs of the organization may be compromised. The popularity of data marts stems from the fact that corporate-wide data warehouses are proving difficult to build and use.

2. Building Data Marts

There are several approaches to building data marts. One approach is to build several data marts with a view to the eventual integration into a warehouse; another approach is to build the infrastructure for a corporate data warehouse while at the same time building one or more data marts to satisfy immediate business needs.

2.1 Types of Approaches to build a Data Mart:

Data warehouse is unique in its nature because it must adapt to the needs of business users in different functional areas, whose companies face different business conditions and competitive pressures. There are two different approaches to build a data mart. They are Bottom-up approach and Top-down approach.
There is considerable conflict between top-down and bottom-up approaches. But in reality, the differences are not as bare as they may appear. Both approaches advocate building strong enterprise architecture that adapts easily to changing business needs and delivers a single version of the truth. For example, both approaches collect data from source systems into a single data store, from which data marts are populated. But while “top-down” subscribers call this a data warehouse, “bottom-up” adherents often call this a “staging area.”

Bottom-up Approach:

In this approach data mart is designed first by extracting data from OLTP Systems, then the data marts are integrated together to design a centralized data warehouse. To integrate data marts logically, organizations use conformed dimensions and facts when building a new data mart. The new data mart is integrated with others within a logical enterprise dimensional model. Generally, this approach is used to deliver business value by deploying dimensional data marts as quickly as possible. This approach consists of both atomic and summary data.

Advantages of Bottom-Up Approach:

• It focuses on creating user friendly, flexible data structure using dimensional, star schema models.

• Users do not have to drill through from a data mart to another structure to obtain detailed or transaction data.

Disadvantages of Bottom-Up Approach:

• It requires organizations to enforce the use of standard dimensions and facts to ensure integration and deliver a single version of the truth.

• In a distributed, decentralized organization, the integration is difficult.

Top-Down Approach:

In this approach, a centralized data warehouse is designed first by extracting data from OLTP Systems, then all the data marts gets data from this data warehouse. In this approach, the data warehouse holds atomic or transaction data that is extracted from one or more source systems and integrated within a normalized, enterprise data model. From there, the data is summarized, dimensionalized, and distributed to one or more data marts which is dependent on the data warehouse because they derived all their data from the data warehouse.

Advantages of Top-Down Approach:

• It provides an integrated, flexible architecture to support downstream analytic data structures.

• Data warehouse provides a departure point for all data marts, enforcing consistency and standardization so that organizations can achieve a single version of all the information

• The atomic data in the warehouse allows the organizations re-purpose that data in any number of ways to meet new and unexpected business needs.

• The data warehouse allows the users to query it, if they need cross functional or enterprise views of the data.

Disadvantages of Top-Down Approach:

• It takes longer time and more cost to deploy than bottom up approach in the initial time.

• The end users must do drill through from a data mart to a data warehouse to find the details behind the summary data in their reports, which they may not aware.

2.2 Data Mart Architecture:

Data mart architectures can be built as two-tier or three-tier database application. The data warehouse is the first tier, the data mart is the second tier, and the end-user workstation is the third tier and the data is distributed among all the three tiers.

The typical characteristics that differentiate data marts and data warehouses are as follows:

• A data mart focuses only on the requirements of a particular department or a business function.

• Data marts do not normally contain detailed operational data unlike data warehouses.

• As data marts contain less data compared with data warehouses, data marts are more easily understood and navigated.

3. Purpose of Data Marts

• End users will get much better performance while querying data from a data mart than from a data warehouse

• End users will have a much easier time navigating through data marts

• Data marts are structured to access data in a rapid (quick or faster) way

• Data marts are frequently referred to as high performance query structures

• Data marts help power users to slice and dice the data as they like

• The data mart can be built much more quickly and cheaply than a warehouse.

• Data marts doesn’t require for a great amount of organizational hassle or discipline and no concern for the long-term architecture

• The data mart granularity is much higher than that found in the data warehouse, and couldn’t easily decompose the data mart granularity into data warehouse granularity.

4. Can a data mart replace a data warehouse?

The data mart cannot replace the corporate data warehouse because it is local to one particular department or functional unit or for a particular unit of time. As a changing management, data mart generated directly by taking the data from the OLTP systems instead of taking data from a data warehouse is likely to fail.

The reasons for sourcing data from a warehouse:

• Avoids repetition of extraction:

Assume that more than one data mart needs data from a common set of source systems. In such a case, every mart accesses the sources independently and each mart should cleanse and integrate the required data, which results in a lot of duplication of effort.

In the presence of a data warehouse this does not happen since the cleaning and integration activities are performed only once.

• Standard interpretation of enterprise data is ensured:

Routing all the data through a common integration layer ensures all downstream systems have a common understanding about the meaning of each data item and its usage.

4.1 Data Warehouse Vs Data Mart:

Data Warehouse Data Mart
Enterprise data warehouse is application-neutral. The task of the data warehouse is first to store and then supply information to different users Data Mart is application specific.
Data warehouse is designed to solve a variety of business problems Data mart is designed to solve a single business problem
Provides solutions to the entire organization Provides solutions to a part of an organization
Incorporates information about many subject areas i.e. the entire enterprise Incorporates information about one or more subject areas. It represents only a portion of an enterprise’s data. It may be data related to a business unit or work group
It is targeted to a huge volume of end users. There is a big scope of the warehouse It is targeted to a smaller audience of end users or used to present information on a smaller scope
Difficult to build Easy to build
The structure of the data in the data warehouse is normalized structure The structure of the data in the data mart is commonly a star join structure
It contains robust amounts of history. It contain only modes amounts of history

5. Reasons for Creating a Data Mart

• To give users access to the data they need to analyze most often.

• To provide data in a form that matches the collective view of the data for a group of users in a department or business function area.

• To improve end-user response time. Because the data mart is small in size, it provides the information in a faster manner.

• To provide appropriately structured data as dictated by the requirements of the end-user access tools.

• Building a data mart is simpler compared to establishing a corporate data warehouse

• The cost of implementing data marts is normally less than that required for creating a corporate data warehouse

• Potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.

• Creating a data mart can be a learning process for a larger data-warehousing project.

• Creating a data mart take only a few weeks and can be earning its keep with a comparably lower investment of time and money.

6. Metadata for a data mart

Metadata (or data about data) describes the details about the data in a data warehouse or in a data mart. Such description may be in terms of the contents and source of data that flows into the data warehouse or data mart. The following are the components of metadata for a given data warehouse or data mart:

• Description of sources of the data

• Description of customization that may have taken place as the data passes from data warehouse into data mart

• Descriptive information about data mart, its tables, attributes and relationship, etc

• Definitions of all types

7. Data model for a data mart

A formal data model is required to be built for a large data mart, which may require certain amount of data processing. This processing could be repetitive or predictable. For example, in case of multidimensional DBMS, no separate format model can be built, as the multidimensional data model itself is necessity for the DBMS. Data marts that do not have to be compliant to a particular DBMS can be modeled in terms of a formal data model which will take care of both summary and detailed levels of data in the particular context of the application for a given department. There are various modeling of data marts are given below:

• Star Schema

• Snowflake Schema

• Star flake Schema

The most popular data model for a data warehouse is the multi-dimensional model. It may be star schema, snowflake schema or star flake schema.

8. Data Mart Security

In a competitive business environment, it is important to have information security in a data mart. Typically, sensitive information includes financial information, medical records and human resources information, etc. The data mart administrator should make necessary security arrangements such as: firewalls; log on/off security; application based security; DBMS security (VIEW based security); encryption and decryption.

9. Data Marts Issues

The various issues associated with the development and management of data marts are listed below (Brooks, 1997).

• Data mart functionality

• Data mart size

• Data mart load performance

• Users access to data in multiple data marts

• Data mart Internet / Intranet access

• Data mart administration

• Data mart installation

The capabilities of data marts have increased with the growth in their popularity. Rather than being simply small, easy-to-access databases, some data marts must now be scalable to hundreds of gigabytes, and provide sophisticated analysis using online analytical processing and/ or data mining tools. The complexity and size of some data marts are matching the characteristics of small-scale corporate data warehouses.

The performance of data marts deteriorates as data marts grow in size. There is a need to find out ways to reduce the size of data marts. Dynamic dimensions allow aggregations to be calculated on demand rather than pre-calculated and stored in the multi-dimensional database (MDDB) cube.

A data mart has to balance two critical components: end-user response time and data loading performance. A data mart designed for fast user response will have a large number of summary tables and aggregate values. Unfortunately, the creation of such tables and values greatly increases the time of the load procedure.

One approach is to replicate data between different data marts or, alternatively, build virtual data marts. Virtual data marts are views of several physical data marts or the corporate data warehouse tailored to meet the requirements of specific groups of users. Products that manage virtual data marts are already commercially available.

Internet/ intranet technology offers users low-cost access to data marts and the data warehouse using web browsers. Data mart Internet/intranet products normally sit between a web server and the data analysis product.

Once the number of data marts in an organization increases there is a need to manage and coordinate all the activities of the data marts centrally. Once data is copied to data marts, data can become inconsistent as users alter their own data marts to allow them to analyze data in different ways. Organizations cannot easily perform administration of multiple data marts, giving rise to issues such as data mart versioning, data and meta-data consistency and integrity, enterprise-wide security, and performance tuning. Data marts are becoming increasingly complex to build. There are several vendors that offer products usually referred to as ‘data marts in a box’ that provide a low-cost source of data marts tools.

No comments:

Post a Comment


Thanks for your visit to this blog.
We would be happy with your Queries/Suggestions.