Data Mining interview questions
Explain the storage models of OLAP.
MOLAP (Multidimensional Online Analytical Processing)
In MOLAP, data is stored in the form of multidimensional cubes rather than in relational databases.
Advantage
Excellent query performance, because all calculations are pre-generated when the cube is created.
Disadvantages
It can handle only a limited amount of data. Because all calculations are pre-generated, a cube cannot be built from a very large amount of data.
It requires a large investment, as cube technology is often proprietary and the necessary expertise may not exist in the organization.
ROLAP (Relational Online Analytical Processing)
In ROLAP, the data is stored in relational databases.
Advantages
It can handle a large amount of data.
It provides all the functionality of the underlying relational database.
Disadvantages
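Queries are slower than in MOLAP, because results are computed against the relational database at query time rather than read from a pre-built cube.
The limitations of SQL apply to ROLAP as well.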
HOLAP (Hybrid Online Analytical Processing)
HOLAP is a combination of the above two models. It combines their advantages in the following manner:
For summarized information, it makes use of the cube.
For drill-down operations, it uses ROLAP.
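The difference between the three storage models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fact rows, the "*" marker for "all members" and the function names are hypothetical, and a real OLAP server stores and indexes its data very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, month, sales)
FACTS = [
    ("East", "Jan", 100),
    ("East", "Feb", 150),
    ("West", "Jan", 200),
    ("West", "Feb", 50),
]


def build_cube(facts):
    """MOLAP-style: pre-compute every total once, when the cube is built.

    "*" is an assumed marker meaning "all members of this dimension".
    """
    cube = defaultdict(int)
    for region, month, sales in facts:
        for r in (region, "*"):
            for m in (month, "*"):
                cube[(r, m)] += sales
    return dict(cube)


def rolap_total(facts, region=None, month=None):
    """ROLAP-style: scan the detail rows every time a question is asked."""
    return sum(
        sales
        for r, m, sales in facts
        if (region is None or r == region) and (month is None or m == month)
    )


def holap_drill_down(facts, region):
    """HOLAP-style drill-down: summaries come from the cube, details from the rows."""
    return [row for row in facts if row[0] == region]


cube = build_cube(FACTS)
east_total_from_cube = cube[("East", "*")]               # a single lookup
east_total_from_rows = rolap_total(FACTS, region="East")  # a full scan
east_details = holap_drill_down(FACTS, "East")
```

Both totals agree, but the cube answers with one dictionary lookup while the ROLAP-style function rescans the rows. The cube also grows with every combination of dimension members, which is the size limitation described above.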
Define rollup and cube.
Custom rollup operators provide a simple way of controlling how a member's value is rolled up into its parent's value. The rollup uses the contents of a column as the custom rollup operator for each member, and that operator is used to evaluate the value of the member's parent.
If a cube has multiple custom rollup formulas and custom rollup members, the formulas are resolved in the order in which the dimensions were added to the cube.
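As a rough illustration of the idea, the sketch below rolls child members up into a parent using one operator per member. The operator set ("+" to add, "-" to subtract, "~" to ignore) and the account names are assumptions chosen for the example, not the definition used by any particular OLAP product.

```python
def roll_up(children):
    """Combine child values into a parent value using each child's operator.

    children: list of (member_name, value, operator) tuples.
    Assumed operators: "+" adds, "-" subtracts, "~" leaves the member out.
    """
    total = 0.0
    for name, value, operator in children:
        if operator == "+":
            total += value
        elif operator == "-":
            total -= value
        elif operator == "~":
            continue
        else:
            raise ValueError(f"unknown rollup operator {operator!r} for {name}")
    return total


# Hypothetical members of an "Accounts" dimension under the parent "Net Income".
accounts = [
    ("Revenue", 1000.0, "+"),
    ("Expenses", 400.0, "-"),
    ("Headcount", 25.0, "~"),  # tracked, but not part of the parent's value
]

net_income = roll_up(accounts)
```

Storing the operator next to each member, rather than hard-coding the arithmetic, is what lets the parent's value be controlled by the data in the column.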
Differentiate between data mining and data warehousing.
Data warehousing is the process of extracting data from different sources, cleaning it and storing it in the warehouse, whereas data mining aims to examine or explore that data using queries. These queries can be run against the data warehouse. Exploring the data through data mining helps in reporting, planning strategies, finding meaningful patterns, etc.
E.g. a company's data warehouse stores all the relevant information about projects and employees. Using data mining, one can use this data to generate different reports, such as profits generated.
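The project example can be sketched with Python's built-in sqlite3 module standing in for the warehouse. The table name, columns and figures are made up for illustration. Loading the rows corresponds to the warehousing step, and the query to the reporting step.

```python
import sqlite3

# An in-memory database plays the role of the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, revenue REAL, cost REAL)")

# Warehousing step: cleaned data from the source systems is loaded.
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("Alpha", 500.0, 300.0), ("Beta", 200.0, 250.0)],
)

# Exploration step: a query is run against the warehouse to build a report.
profit_report = conn.execute(
    "SELECT name, revenue - cost AS profit FROM projects ORDER BY profit DESC"
).fetchall()
conn.close()
```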
What is data purging?
Data purging is the process of permanently removing junk data from a database, for example rows that hold only unnecessary NULL values or records that are no longer needed. It is usually done when the database grows too large.
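A minimal sketch of a purge, again using sqlite3 with a hypothetical readings table. Here "junk" is assumed to mean rows in which every data column is NULL. A real purge would follow whatever retention rules the organization has defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.5), (None, None), ("b", 2.0), (None, None)],
)

# Purge: permanently delete rows that carry no information at all.
cursor = conn.execute(
    "DELETE FROM readings WHERE sensor IS NULL AND value IS NULL"
)
purged_rows = cursor.rowcount
conn.commit()

remaining_rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
conn.close()
```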
What are cubes?
A data cube stores data in a summarized form, which allows faster analysis. The data is organized in a way that makes reporting easy.
E.g. using a data cube, a user may want to analyze the weekly or monthly performance of an employee. Here, week and month could be considered dimensions of the cube.
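The employee example can be sketched as follows. The daily records, the employee name and the "tasks completed" measure are hypothetical. The point is only that the summaries are keyed by week and by month, so a report reads a total instead of re-adding the daily rows.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily records: (employee, day, tasks_completed)
records = [
    ("asha", date(2024, 1, 1), 5),
    ("asha", date(2024, 1, 3), 4),
    ("asha", date(2024, 1, 9), 6),
    ("asha", date(2024, 2, 5), 7),
]

by_week = defaultdict(int)   # key: (employee, ISO year, ISO week)
by_month = defaultdict(int)  # key: (employee, year, month)

for employee, day, tasks in records:
    iso_year, iso_week, _ = day.isocalendar()
    by_week[(employee, iso_year, iso_week)] += tasks
    by_month[(employee, day.year, day.month)] += tasks

january_total = by_month[("asha", 2024, 1)]
first_week_total = by_week[("asha", 2024, 1)]
```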