Pre-processing – Before data mining algorithms can be applied, a target data set must be assembled. The set must be large enough to contain the patterns of interest, yet concise enough to be mined within an acceptable time limit; it is commonly sourced from data warehouses. Pre-processing is essential for analysing multivariate data sets before mining. The target set is then cleaned, a process called data cleaning, which removes noisy observations and entries with missing data.
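A minimal cleaning sketch, assuming a hypothetical record layout with an `age` field: incomplete entries are dropped first, then noise is removed with a simple domain-range check (the field names, values, and the 0–120 range are illustrative assumptions, not part of the original text).

```python
# Hypothetical raw records; None marks a missing value,
# and the out-of-range age is treated as noise.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # incomplete entry
    {"id": 3, "age": 29},
    {"id": 4, "age": 31},
    {"id": 5, "age": 420},    # noisy outlier
]

# Step 1: remove records with missing values.
complete = [r for r in raw if all(v is not None for v in r.values())]

# Step 2: remove noise, here defined as values outside a plausible range.
cleaned = [r for r in complete if 0 <= r["age"] <= 120]

print([r["id"] for r in cleaned])
```

Range checks are only one cleaning rule; real pipelines combine them with statistical outlier tests and imputation of missing values.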
Data mining – This stage involves the following classes of tasks:
Anomaly detection (outlier/change/deviation detection) – the identification of unusual data records that might be interesting, or data errors that require further investigation.
Association rule learning (dependency modeling) – searches for relationships between variables.
Clustering – the task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
Classification – the task of generalizing known structure to apply to new data.
Regression – attempts to find a function that models the data with the least error.
Summarization – providing a more compact representation of the data set, including visualization and report generation.
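As an illustration of the regression task above, the sketch below fits a straight line to made-up data by ordinary least squares, which minimizes the squared error mentioned in the definition (the data points are invented for the example):

```python
# Illustrative data roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope that minimizes total squared error: cov(x, y) / var(x).
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # close to slope 2, intercept 0
```

The same closed-form idea generalizes to many variables, where matrix methods (or a library such as NumPy or scikit-learn) take over.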
Results validation – the patterns produced by the data mining algorithms are verified, typically by testing them against a held-out set of data on which the algorithms were not trained. If the patterns do not meet the desired standards, the pre-processing and data mining steps must be revisited and adjusted. If they do meet the standards, the final step is to interpret the patterns and turn them into knowledge for future decision making.
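The validation step can be sketched as follows: a trivial classifier is learned on a training set, then checked against a held-out test set, and its accuracy is compared with a desired standard (the data, the midpoint-threshold rule, and the 0.9 standard are all illustrative assumptions):

```python
# Hypothetical (value, label) pairs split into training and test sets.
train = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]
test = [(1.2, 0), (6.8, 1), (2.1, 0), (5.9, 1)]

def class_mean(data, label):
    vals = [x for x, y in data if y == label]
    return sum(vals) / len(vals)

# "Pattern" learned from training data: classify by comparing each
# value against the midpoint between the two class means.
threshold = (class_mean(train, 0) + class_mean(train, 1)) / 2

def predict(x):
    return int(x > threshold)

# Verify the pattern on data it was not trained on.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
desired_standard = 0.9  # assumed acceptance criterion
print(accuracy, accuracy >= desired_standard)
```

If the accuracy fell below the standard, the earlier stages (cleaning choices, algorithm, parameters) would be revisited rather than the result being accepted.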