English Dissertation: Research on Decision-Making with Data Mining Technology in the Insurance Industry
www.ukthesis.com
06-06, 2014
1 Introduction
With the rapid development of database technology and the widespread use of database management systems, more and more data is accumulating in every industry. Behind this growing surge of data lies a great deal of important information, and people want to analyze it at a higher level in order to make better use of it. Current database systems can efficiently perform data entry, querying, statistics, and similar functions. However, they cannot discover the relationships and rules that exist in the data, and they cannot predict future trends from existing data. The lack of tools for mining the knowledge hidden behind the data has led to the phenomenon of "data explosion but knowledge poverty."
With the development of computer and network technology, it has become feasible to collect the information relevant to a particular industry. For data that is large in volume and broad in scope, however, the analysis cannot be accomplished with traditional statistical methods that simply summarize data according to a predefined model. An intelligent information analysis technology, data mining, therefore came into being.
Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random data. It discovers new and meaningful associations, patterns, and trends by mining the large volumes of data stored in data warehouses. As a new business information processing technology, data mining extracts, transforms, analyzes, and models the large amounts of business data held in commercial databases in order to obtain the key data that supports business decisions. This helps enterprises seize opportunities in fierce market competition. The insurance industry in particular currently has broad market demand for it.
2 Project Description
This project developed "the insurance industry decision system V1.0". The system's main interface is programmed in ASP and provides data preprocessing, customer insurance purchase analysis, customer buying habits analysis, and result output. The back-end database is implemented with SQL Server 2005, and the mining tool is SPSS Clementine 11.0. In the experimental research stage, the project addressed two major drawbacks of the Apriori algorithm, "storage complexity" and "a large number of redundant rules". It uses a pattern tree structure to reduce Apriori's storage complexity while also reducing the number of redundant rules.
The system consists of the following main functional modules: data preprocessing, customer insurance purchase analysis, customer buying habits analysis, and result output.
(1) The "data preprocessing" module includes the following functions: upload, data platform, data processing, statistics, and data set generation.
● Upload: uploads the data from all branches of the insurance company.
● Data platform: lets the user choose a data platform before the data is uploaded.
● Data processing: cleans the data, converts formats, and performs similar operations.
● Statistics: analyzes the preprocessed data and extracts the valid data.
● Data set generation: collects the valid data extracted by the statistics step into a data set, providing a higher-quality data source for data mining.
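The cleaning and data-set-generation steps can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each uploaded policy record is a (customer_id, product) pair and that cleaning means dropping incomplete records and normalizing product names. The function build_transactions and these rules are illustrative, not the system's actual ASP/SQL Server 2005 code. The result is one transaction (the set of products a customer holds) per customer, a shape suited to association-rule mining.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical raw policy record: (customer_id, product); None marks a missing field.
RawRecord = Tuple[Optional[str], Optional[str]]


def build_transactions(records: List[RawRecord]) -> Dict[str, FrozenSet[str]]:
    """Clean raw policy records and group them into one transaction per customer.

    Assumed cleaning rules (illustrative only): drop records with a missing or
    empty field, strip surrounding whitespace, and lower-case product names.
    """
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for customer_id, product in records:
        if not customer_id or not product:
            continue  # incomplete record: excluded from the data set
        baskets[customer_id.strip()].add(product.strip().lower())
    return {cid: frozenset(items) for cid, items in baskets.items()}


if __name__ == "__main__":
    raw = [
        ("C1", "Auto"), ("C1", " accident "), ("C2", "auto"),
        ("C2", None), (None, "life"), ("C3", "Life"), ("C3", "accident"),
    ]
    for cid, items in sorted(build_transactions(raw).items()):
        print(cid, sorted(items))
```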
(2) The "customer insurance purchase analysis" module includes data import, parameter setting, and result analysis.
● Data import: in this interface, the user selects a data platform and imports the data sets generated by "data preprocessing".
● Parameter setting: in this interface, the user sets "support", "confidence", and other parameters. These filter the data records of the data set by value range so that the analysis is effective.
● Result analysis: in this interface, the final results of the "customer insurance purchase analysis" are displayed in "report" or "chart" form. The analysis shows which of the company's various insurance products (and sub-products) the same customer buys, providing a decision-making basis for "winning customers".
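The "support" and "confidence" parameters correspond to the standard association-rule measures. For a rule X => Y, support is the fraction of transactions that contain every item of X and Y, and confidence is support(X and Y) / support(X). The minimal sketch below uses hypothetical product names and scores a rule such as "customers who buy auto insurance also buy accident insurance" against both thresholds. The function names are illustrative.

```python
from typing import FrozenSet, List

Transactions = List[FrozenSet[str]]


def support(itemset: FrozenSet[str], transactions: Transactions) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: Transactions) -> float:
    """confidence(X => Y) = support(X | Y) / support(X); 0.0 if X never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def rule_passes(antecedent: FrozenSet[str], consequent: FrozenSet[str],
                transactions: Transactions, min_sup: float, min_conf: float) -> bool:
    """Keep a rule only if it meets both user-set thresholds."""
    return (support(antecedent | consequent, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)


if __name__ == "__main__":
    # Hypothetical customer transactions.
    db = [frozenset(t) for t in (
        {"auto", "accident"}, {"auto", "accident", "life"}, {"auto"}, {"life"},
    )]
    x, y = frozenset({"auto"}), frozenset({"accident"})
    print(support(x | y, db))                                 # 0.5
    print(confidence(x, y, db))                               # about 0.667
    print(rule_passes(x, y, db, min_sup=0.4, min_conf=0.6))   # True
```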
(3) The "customer buying habits analysis" module includes data import, parameter setting, and result analysis.
● Data import: works the same way as "Data import" in module (2), "customer insurance purchase analysis".
● Parameter setting: here the user sets the "input parameters" (basic customer information such as age, gender, and occupation) and the "output parameters" (the insurance the customer buys).
● Result analysis: this interface presents the customer buying habits analysis, providing the company with a decision-making basis for "retaining customers".
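Splitting the parameters into "input" and "output" parameters amounts to constraining which items may appear on each side of a rule. The sketch below makes a hypothetical assumption: customer attributes are encoded as "field=value" items (such as "age=30-39") and products as plain names. It keeps only rules whose antecedent consists of input attributes and whose consequent is a single product. The encoding, the function habit_rules, and its max_inputs limit are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

# (antecedent attributes, product, support, confidence)
HabitRule = Tuple[FrozenSet[str], str, float, float]


def habit_rules(transactions: List[Set[str]], input_items: Set[str], output_items: Set[str],
                min_sup: float, min_conf: float, max_inputs: int = 2) -> List[HabitRule]:
    """Rules of the form (customer attributes) => (one product).

    Antecedents use only `input_items` (assumed "field=value" attributes);
    consequents use only `output_items` (products).
    """
    n = len(transactions)

    def count(items: FrozenSet[str]) -> int:
        return sum(1 for t in transactions if items <= t)

    rules: List[HabitRule] = []
    for k in range(1, max_inputs + 1):
        for ante in combinations(sorted(input_items), k):
            ante_set = frozenset(ante)
            ante_count = count(ante_set)
            if ante_count == 0:
                continue  # this attribute combination never occurs
            for product in sorted(output_items):
                both = count(ante_set | {product})
                sup, conf = both / n, both / ante_count
                if sup >= min_sup and conf >= min_conf:
                    rules.append((ante_set, product, sup, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical customers: attributes plus the products they bought.
    db = [
        {"age=30-39", "gender=F", "life"},
        {"age=30-39", "gender=M", "life", "auto"},
        {"age=50-59", "gender=M", "auto"},
        {"age=30-39", "gender=F", "life"},
    ]
    inputs = {"age=30-39", "age=50-59", "gender=F", "gender=M"}
    for ante, product, sup, conf in habit_rules(db, inputs, {"life", "auto"}, 0.5, 0.9):
        print(sorted(ante), "=>", product, sup, conf)
```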
(4) The "result output" module prints the results of the "customer insurance purchase analysis" and the "customer buying habits analysis".
3 The Project's Improved Algorithm
The Apriori algorithm has two major defects: high time and space complexity, and a large number of redundant rules. This project therefore uses a pattern tree structure to reduce Apriori's storage complexity while also reducing the number of redundant rules.
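For reference, the classic Apriori loop below shows where the costs attributed to Apriori come from. At every level k it generates candidate itemsets from the frequent (k-1)-itemsets and rescans the whole database to count them. This is a plain textbook sketch, not the project's implementation. The scan and candidate counters are added only to make the cost visible.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple


def apriori(transactions: Iterable[Iterable[str]], min_count: int
            ) -> Tuple[Dict[FrozenSet[str], int], int, int]:
    """Textbook Apriori.

    Returns (frequent itemsets with counts, full database scans, candidates
    generated for k >= 2). The two counters only make Apriori's costs visible.
    """
    db = [frozenset(t) for t in transactions]
    counts: Dict[FrozenSet[str], int] = {}
    for t in db:                              # scan 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    scans, generated = 1, 0
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                    frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        generated += len(candidates)
        if not candidates:
            break
        # Count step: every level needs another full pass over the database.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        scans += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result, scans, generated
```

Both counters grow with the number of levels and with the number of frequent itemsets at each level. These two costs are what a tree-based approach tries to reduce.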
3.1 Pattern Tree Structure
The pattern tree consists of a root node labeled "null", a set of item prefix subtrees that form the root's children, and an item header table. Each node in the tree contains four fields: user_id, count, node_link, and node_next. Here user_id is the user label (it uniquely identifies a user). count is the number of paths that reach the node from its parent. node_link points to the next node in the tree with the same user_id, or is null if no such node exists. node_next points to the node's child nodes in the tree. Each entry in the item header table contains three fields: user_id, count, and head of node. user_id has the same meaning as in the tree. count is the sum of the counts of all nodes in the tree with that user_id. head of node points to the first node in the tree with that user_id value.
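The node and header-table layout described above can be sketched as Python data classes. Field names follow the text. The parent back-pointer, the dict representation of node_next (children keyed by user_id), and the helper add_child are assumptions added for illustration. add_child shows how head of node and node_link together chain every node that carries the same user_id. Counts are left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    """One pattern-tree node (fields as described in 3.1)."""
    user_id: Optional[str]                  # None plays the role of the "null" root label
    count: int = 0                          # number of paths reaching this node from its parent
    parent: Optional["Node"] = field(default=None, repr=False)     # assumed back-pointer
    node_link: Optional["Node"] = field(default=None, repr=False)  # next node with same user_id
    node_next: Dict[str, "Node"] = field(default_factory=dict, repr=False)  # children by user_id


@dataclass
class HeaderEntry:
    """One item header table entry."""
    user_id: str
    count: int = 0                          # sum of counts of all nodes with this user_id
    head_of_node: Optional[Node] = None     # first node in the tree with this user_id


def add_child(header: Dict[str, HeaderEntry], parent: Node, user_id: str) -> Node:
    """Create a child under `parent` and append it to its user_id's node_link chain.

    Hypothetical helper; it does not touch any count field.
    """
    node = Node(user_id, parent=parent)
    parent.node_next[user_id] = node
    entry = header.setdefault(user_id, HeaderEntry(user_id))
    if entry.head_of_node is None:
        entry.head_of_node = node
    else:
        last = entry.head_of_node
        while last.node_link is not None:   # walk to the end of the chain
            last = last.node_link
        last.node_link = node
    return node


if __name__ == "__main__":
    root, header = Node(None), {}
    first_u2 = add_child(header, add_child(header, root, "u1"), "u2")
    second_u2 = add_child(header, add_child(header, root, "u3"), "u2")
    print(header["u2"].head_of_node is first_u2, first_u2.node_link is second_u2)
```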
3.2 Creating the Pattern Tree
The algorithm is as follows.
Let the transaction database be A, and let Ai be one of its itemsets.
Algorithm: Patterntree(T, p), constructs the pattern tree
Input: the user transaction database A
Output: the user pattern tree T
Procedure Patterntree(T, p)
{
    Create_tree(T);                  // create the Pattern-Tree root node, labeled "null"
    while A != null do
    {
        read the next itemset Ai from the transaction database A;
        t = T;                       // t is the current node; each itemset starts at the root
        p = the first item of Ai;
        while p != null do
        {
            if p.user_id == n.user_id for an ancestor n of t
            then
            {
                n.count = n.count + 1;
                t = n;
            }
            elseif p.user_id == c.user_id for a child c of t
            then
            {
                c.count = c.count + 1;
                t = c;
            }
            else
            {
                insert_Patterntree(T, p);    // insert p as a new child node of t and link it into the item header table
                t = the newly inserted node;
            }
            p = p.next;
        }
    }
}
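The following runnable sketch implements the child-matching and insertion branches of Patterntree. It resets the current node to the root for each itemset and keeps the item header table's head of node, node_link chain, and counts up to date. It rests on two assumptions. First, the ancestor-matching branch is omitted, so the sketch behaves like an ordinary prefix tree of the kind used by FP-growth. Second, the items of each itemset are processed in a fixed (here alphabetical) order so that shared prefixes merge. Names such as build_pattern_tree and chain are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    user_id: Optional[str]                  # None for the "null" root
    count: int = 0
    parent: Optional["Node"] = field(default=None, repr=False)
    node_link: Optional["Node"] = field(default=None, repr=False)
    node_next: Dict[str, "Node"] = field(default_factory=dict, repr=False)


@dataclass
class HeaderEntry:
    user_id: str
    count: int = 0
    head_of_node: Optional[Node] = None


def build_pattern_tree(transactions: Iterable[Iterable[str]]
                       ) -> Tuple[Node, Dict[str, HeaderEntry]]:
    """Build the tree and its item header table in a single pass over the data."""
    root = Node(None)                       # Create_tree: root labeled "null"
    header: Dict[str, HeaderEntry] = {}
    tails: Dict[str, Node] = {}             # last node of each node_link chain
    for transaction in transactions:
        t = root                            # every itemset starts from the root
        for item in sorted(set(transaction)):   # assumed fixed item order
            child = t.node_next.get(item)
            if child is None:               # no matching child: insert a new node under t
                child = Node(item, parent=t)
                t.node_next[item] = child
                if item in tails:
                    tails[item].node_link = child
                else:
                    header[item] = HeaderEntry(item, 0, child)
                tails[item] = child
            child.count += 1                # count this path through the (new or matched) child
            header[item].count += 1
            t = child
    return root, header


def chain(header: Dict[str, HeaderEntry], item: str) -> List[Node]:
    """Follow head_of_node and node_link to list every node carrying `item`."""
    nodes: List[Node] = []
    node = header[item].head_of_node
    while node is not None:
        nodes.append(node)
        node = node.node_link
    return nodes
```

The whole tree is built in one loop over the transactions, so the database is read only once. This is the property claimed for the pattern tree in this section.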
3.3 Pruning the Pattern Tree
Once the pattern tree is built, it may contain a large number of redundant branches. The tree must be pruned to remove the noise these redundant branches introduce, so that the data mining results are not affected by it.
Algorithm: SPT(Tree, a), prunes the pattern tree
// SPT is the support pattern tree (Supported Access Pattern Tree); a is the item header table
Input: the pattern tree PatternTree; Min_Sup, the minimum support of the pattern tree
Output: the pruned support pattern tree SPT and the pattern set B = {bi | i = 1, 2, 3, ..., n}
SPT(Tree, a)
{
    i = 1;
    while (ai != null)                       // for each entry ai in the item header table
    {
        if (ai.count >= Min_Sup)
        then
        {
            pattern bi = ai.head of node;
            p = ai.head of node;             // p points to the position of ai in the pattern tree
            while (p != null and ai.count >= Min_Sup)
            {
                find the prefix base of p, and join p with its prefix base to form pattern bi;
                if (bi.count >= Min_Sup)
                then
                {
                    // bi.count is the minimum count over p and its prefix base in pattern bi
                    keep p and its prefix base in pattern bi;
                }
                else
                {
                    delete the PatternTree nodes corresponding to p and its prefix base,
                    reconnect their child nodes to the parent node, and update header entry ai;
                }
                p = p.node_link;             // move to the next node in the tree with the same user_id
            }
            if (ai.count >= Min_Sup) then i = i + 1;   // otherwise ai is revisited and handled by the else branch
        }
        else
        {
            update the count of header entry ai;
            delete the corresponding nodes and their prefix bases from the tree,
            and rebuild the parent-child links;
            i = i + 1;
        }
    }
}
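The SPT procedure above is described only loosely, so the sketch below implements a simplified reading of it under stated assumptions. Items whose header count is below Min_Sup are removed from the tree. Each removed node's children are reattached to its parent, and children that end up sharing a user_id are merged. For each remaining item, the prefix paths above its nodes are then collected as its prefix base. The pruning rule, the merge step, and all names are assumptions. For brevity, nodes are located by walking the tree rather than by following node_link. This is an FP-growth-style approximation, not the author's exact algorithm.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    user_id: Optional[str]                  # None for the "null" root
    count: int = 0
    parent: Optional["Node"] = field(default=None, repr=False)
    node_next: Dict[str, "Node"] = field(default_factory=dict, repr=False)  # children


def insert(root: Node, items: List[str]) -> None:
    """Add one ordered transaction to the tree, incrementing counts along its path."""
    t = root
    for item in items:
        child = t.node_next.get(item)
        if child is None:
            child = t.node_next[item] = Node(item, parent=t)
        child.count += 1
        t = child


def item_counts(root: Node) -> Counter:
    """Header-table counts: the total count of all nodes for each user_id."""
    counts: Counter = Counter()
    stack = list(root.node_next.values())
    while stack:
        node = stack.pop()
        counts[node.user_id] += node.count
        stack.extend(node.node_next.values())
    return counts


def prune(root: Node, min_sup: int) -> Node:
    """Drop items whose header count < min_sup; their children move up to the parent."""
    infrequent = {u for u, c in item_counts(root).items() if c < min_sup}

    def rebuild(old: Node, new_parent: Node) -> None:
        for child in old.node_next.values():
            if child.user_id in infrequent:
                rebuild(child, new_parent)  # skip this node, reattach its subtree
                continue
            target = new_parent.node_next.get(child.user_id)
            if target is None:              # first occurrence under this parent
                target = new_parent.node_next[child.user_id] = Node(child.user_id, parent=new_parent)
            target.count += child.count     # merge siblings that now share a user_id
            rebuild(child, target)

    new_root = Node(None)
    rebuild(root, new_root)
    return new_root


def prefix_bases(root: Node, item: str) -> List[Tuple[Tuple[str, ...], int]]:
    """Prefix base of `item`: (path from the root down to the node's parent, node count)."""
    bases = []
    stack = list(root.node_next.values())
    while stack:
        node = stack.pop()
        if node.user_id == item:
            path, p = [], node.parent
            while p is not None and p.user_id is not None:
                path.append(p.user_id)
                p = p.parent
            if path:                        # nodes directly under the root have an empty base
                bases.append((tuple(reversed(path)), node.count))
        stack.extend(node.node_next.values())
    return sorted(bases)
```

Rebuilding into a fresh tree keeps the original tree intact. It also makes merging reattached children straightforward.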
Building the pattern tree avoids scanning the transaction database multiple times. At the same time, the count field effectively retains the counts of itemsets. This avoids generating a large number of frequent itemsets and helps reduce time and space complexity. The tree structure also avoids generating a large number of redundant rules.
Pruning the pattern tree removes the many redundant branches produced while the tree is generated, which reduces space complexity. Rules can then be generated from the output pattern set B, which avoids producing a large number of frequent itemsets and reduces time complexity.
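Generating rules from the output pattern set B can be sketched as follows. The sketch assumes that B is available as a mapping from each frequent pattern to its support count, and that the mapping is downward closed (every subset of a pattern is also present), as a complete set of frequent itemsets is. Each non-empty proper subset of every pattern with at least two items is tried as an antecedent. The rule is kept when its confidence reaches a user-chosen threshold. The function name and the min_conf parameter are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]   # (antecedent, consequent, confidence)


def generate_rules(patterns: Dict[FrozenSet[str], int], min_conf: float) -> List[Rule]:
    """Return every rule antecedent => consequent whose confidence reaches min_conf.

    Assumes `patterns` is downward closed: every subset of a key is also a key.
    """
    rules: List[Rule] = []
    for itemset, count in patterns.items():
        if len(itemset) < 2:
            continue                        # a single item cannot be split into a rule
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                antecedent = frozenset(ante)
                conf = count / patterns[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical pattern set B with support counts.
    b = {frozenset(k): v for k, v in (
        ("a", 4), ("b", 4), ("c", 4), ("ab", 3), ("ac", 3), ("bc", 3),
    )}
    for ante, cons, conf in generate_rules(b, 0.7):
        print(sorted(ante), "=>", sorted(cons), conf)
```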
4 Conclusion
This project improves the Apriori algorithm with a pattern tree structure, making up for its defects. The method not only improves Apriori's time and space complexity but also avoids generating intermediate rules. The study shows that using a pattern tree structure to reduce Apriori's storage complexity, while reducing the number of redundant rules, is an effective way to improve the algorithm.