
A Novel Model for Mining Association Rules from Semantic Web data

Published: 2018-05-17


Abstract

Nowadays, ontologies and semantic annotations are being produced continuously for large amounts of data in a wide range of applications. Such heterogeneous and complex semantic data create new challenges for data mining research. Association rule mining is one of the most common data mining techniques; it extracts interesting relationships among items in large sets of transactions. Semantic Web technologies, in turn, offer ways to use domain knowledge efficiently. This paper therefore proposes a novel method that addresses these issues and can process large volumes of semantic data. The method performs association rule discovery, exploits the semantic richness present in the ontology to represent and store the new semantic rules, and applies semantic technologies during all phases of the mining process. In particular, it efficiently extracts items and transactions suited to traditional association rule mining algorithms.

Index Keywords – Semantic Annotated Data, Association Rule Mining, Ontology

I. Introduction

Recently there has been growing interest in combining two research areas: the Semantic Web and data mining. With the standardization of ontology languages such as OWL and RDFS, the Semantic Web has expanded, and the number of semantic annotations used in different applications is increasing significantly.

Mining semantic web data can benefit many domain-specific research communities, whose data are usually complex and heterogeneous and encode a large amount of knowledge in the form of semantic annotations. By processing this kind of data and exploiting its semantic richness, rules at a higher semantic level can be expected. Consequently, challenges such as heterogeneity and complexity, which stem from the special features of semantically annotated data, have to be addressed.

Semantically annotated data do not have a rigid structure, which leads to structural heterogeneity problems. Moreover, traditional data mining algorithms work with homogeneous datasets of transactions, each of which is a subset of items. A repository of semantic data expressed in OWL and RDF, by contrast, contains ontology axioms and data instances in the form of triples (subject, predicate, object). In addition, reasoning capabilities must be applied to make use of implicit semantic knowledge. In this paper, a novel method is presented to find semantic association rules from semantic web data using semantic web technologies. These types of rules can be applied in decision support systems to help them make more intelligent decisions.
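The need for reasoning can be illustrated with a small example. The sketch below is a minimal illustration, not an OWL reasoner: it assumes triples are Python tuples of prefixed names (the `ex:` terms are hypothetical) and handles only one kind of implicit knowledge, namely rdf:type statements implied by chains of rdfs:subClassOf.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Return the input triples plus the rdf:type triples implied by
    following rdfs:subClassOf transitively. Nothing else is inferred."""
    # Direct superclass edges: class -> set of immediate superclasses.
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)

    def ancestors(cls):
        # Iterative walk up the hierarchy; `seen` also guards against cycles.
        seen, stack = set(), [cls]
        while stack:
            for sup in parents[stack.pop()]:
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    inferred = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE:
            # Every superclass of an asserted type is also a type of s.
            for sup in ancestors(o):
                inferred.add((s, RDF_TYPE, sup))
    return inferred
```

For example, if `ex:alice` is asserted only to be an `ex:PhDStudent`, and `ex:PhDStudent` and `ex:Student` are declared subclasses of `ex:Student` and `ex:Person` respectively, the output also types her as `ex:Student` and `ex:Person`. Transactions built from the enriched triples can then support rules about persons that the asserted triples alone would miss.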

The rest of the paper is organized as follows. Section II describes association rule mining, Section III reviews semantic data mining, Section IV presents the Apriori algorithm, and Section V describes the proposed methodology. Experimental results are presented in Section VI. Section VII concludes the paper, and Section VIII outlines future work.

II. Association Rule Mining Algorithm

Association rule mining algorithms usually have two main steps. The first step is finding frequent itemsets: all itemsets that meet the minimum support threshold are discovered. The second step is the derivation of association rules: based on the frequent itemsets discovered in the first step, the rules that satisfy the minimum confidence threshold are derived. Since the second step is comparatively straightforward once the frequent itemsets and their supports are known, research mainly focuses on the first step, namely how to efficiently discover all frequent itemsets [1].
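The second step can be sketched as follows, assuming the first step has already produced a dictionary that maps each frequent itemset (a `frozenset`) to its support count; the function name is illustrative. The sketch shows why this step is comparatively cheap: every confidence value is computed from counts gathered in the first step, with no further pass over the data.

```python
from itertools import combinations


def derive_rules(frequent, min_conf):
    """Derive rules X => Y, with confidence >= min_conf, from frequent itemsets.

    `frequent` maps each frequent itemset (frozenset) to its support count.
    Every subset of a frequent itemset is also frequent (downward closure),
    so the count of each antecedent X is available in the same dictionary.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty X and a non-empty Y
        # Try every non-empty proper subset of the itemset as the antecedent X.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                conf = count / frequent[x]  # supp(X ∪ Y) / supp(X)
                if conf >= min_conf:
                    rules.append((x, itemset - x, conf))
    return rules
```

With counts {a}: 4, {b}: 3 and {a, b}: 3, a minimum confidence of 0.8 keeps only {b} ⇒ {a} (confidence 1.0), because {a} ⇒ {b} has confidence 3/4 = 0.75.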

Let I = {I₁, I₂, ..., Iₘ} be a set of literals, called items. Let D be a set of transactions, where each transaction T is a set of items such that T ⊆ I. Each transaction is associated with a unique identifier, called its TID. A transaction T contains X, a set of items in I, if X ⊆ T. An association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅. The rule X ⇒ Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. The rule X ⇒ Y has support s in the transaction set D if s% of the transactions in D contain X ∪ Y [22]. Given a set of transactions D, the problem of mining association rules is to discover all association rules whose support and confidence are greater than the user-specified minimum support and minimum confidence, respectively. In 1994, R. Agrawal et al. presented the well-known Apriori algorithm for mining association rules [2], and association rule mining has been studied extensively since then. Approaches to improving mining efficiency fall into two categories: Apriori-based and non-Apriori-based algorithms [3].
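The definitions of support and confidence translate directly into code. A minimal sketch, assuming transactions are Python sets and reporting the percentages as fractions in [0, 1]: support(X) is the fraction of transactions in D that contain X, and the confidence of X ⇒ Y is support(X ∪ Y) / support(X).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Confidence of X => Y, i.e. support(X ∪ Y) / support(X)."""
    sx = support(x, transactions)
    if sx == 0:
        return 0.0  # X never occurs, so the rule never applies
    return support(set(x) | set(y), transactions) / sx
```

For D = [{bread, milk}, {bread, butter}, {bread, milk, butter}, {milk}], the support of {bread, milk} is 2/4 = 0.5, and the confidence of {bread} ⇒ {milk} is 0.5 / 0.75 ≈ 0.67.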

III. Semantic Data Mining

Some works on semantic data mining are based on inductive logic programming [4], which uses the logic encoded in the semantic annotations to learn new concepts. Other works apply statistical machine learning methods to ontologies and their annotated semantic data [7], [8], [9], [10], but their representations are not suited to association mining, which requires defining the set of items and the transactions from the semantic data.

Another related field is the mining of tree- and graph-structured data. Methods in this area, such as frequent subtree mining [11] and graph mining [12], aim to identify frequent substructures in complex and heterogeneous data sets. However, these approaches do not pay enough attention to frequently occurring, semantically related content.

Another issue is how to define transactions so as to eliminate the heterogeneity that exists in semantic web datasets. In [13], [14], it is argued that XQuery is not the most suitable way to extract data from XML data sources, because the structure of the underlying documents must be known in advance. A better method is to use lowest common ancestor (LCA) semantics to extract meaningful data, although undesired combinations of data items may sometimes be generated [15], [16]. To solve this problem, the smallest possible context data strategy was proposed, and a similar approach was taken in [17].
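The LCA idea can be made concrete: two data items are placed in the same context under their deepest shared ancestor element. The sketch below uses Python's standard `xml.etree.ElementTree` and only computes the LCA of two elements; it does not implement the specific extraction strategies of [15], [16], [17], and the example elements (`lib`, `book`, `title`, `author`) are illustrative.

```python
import xml.etree.ElementTree as ET


def lowest_common_ancestor(root, a, b):
    """Return the deepest element in `root`'s tree that contains both a and b."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}

    def path_to_root(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    ancestors_of_a = set(path_to_root(a))  # includes a itself
    for node in path_to_root(b):           # walk upward from b
        if node in ancestors_of_a:
            return node
    return None  # a and b are not in the same tree


doc = ET.fromstring(
    "<lib><book><title>T1</title><author>A1</author></book>"
    "<book><title>T2</title></book></lib>"
)
```

Pairing the title and author of one book yields that `book` element as the LCA, which is a meaningful context. Pairing the titles of two different books yields the root `lib`, the kind of undesired combination of data items mentioned above.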

Finally, several works focus on integrating knowledge discovery capabilities into SPARQL by extending its grammar. Examples include [18], which can be used with some data mining algorithms, and [19], which extracts complex path relations between resources. [20] has also extended the SPARQL grammar to define association rule patterns over ontological data in a less restrictive way than the one imposed by SPARQL.

IV. Apriori Algorithm

Apriori is a basic algorithm for frequent item set mining and association rule mining over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to generate association rules which highlight general trends in the database.

Apriori(J, ε)
    L₁ ← {large 1-itemsets}
    k ← 2
    while Lₖ₋₁ ≠ ∅
        Cₖ ← {c = a ∪ {b} | a ∈ Lₖ₋₁ ∧ b ∉ a, {s ⊆ c | |s| = k − 1} ⊆ Lₖ₋₁}
        for transactions t ∈ J
            Dₜ ← {c ∈ Cₖ | c ⊆ t}
            for candidates c ∈ Dₜ
                count[c] ← count[c] + 1
        Lₖ ← {c ∈ Cₖ | count[c] ≥ ε}
        k ← k + 1
    return ⋃ₖ Lₖ

In the algorithm, J denotes the set of transactions and ε the minimum support threshold, expressed as a transaction count; Cₖ is the set of candidate k-itemsets and Lₖ the set of frequent (large) k-itemsets. A candidate is kept in Cₖ only if all of its (k − 1)-subsets are frequent, because any superset of an infrequent itemset is itself infrequent. In the proposed method, this algorithm is rewritten to operate on semantic transactions, so that the resulting semantic rules follow the format predefined in the ontology.
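A runnable counterpart of the level-wise algorithm is sketched below in Python as an illustration, not as the authors' implementation. Here `transactions` corresponds to the transaction set J and `min_count` to the minimum support threshold given as a count; candidates are generated by joining frequent (k − 1)-itemsets and pruning any candidate that has an infrequent (k − 1)-subset.

```python
from collections import defaultdict
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise Apriori: return {frequent itemset (frozenset): support count}."""
    transactions = [frozenset(t) for t in transactions]

    # L1: frequent individual items.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        # Prune step: drop candidates with any infrequent (k-1)-subset.
        candidates = set()
        for a in prev:
            for b in prev:
                c = a | b
                if len(c) == k and all(
                    frozenset(s) in prev for s in combinations(c, k - 1)
                ):
                    candidates.add(c)
        # One pass over the data: count candidates contained in each transaction.
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent
```

For the transactions {a, b, c}, {a, b}, {a, c}, {b, c}, {a, b, c} with `min_count = 3`, every single item (count 4) and every pair (count 3) is frequent, while {a, b, c} (count 2) is not. Candidates are counted with a plain subset test per transaction, which is simple but slower than the hash-tree counting used in the original Apriori paper [2].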

V. Proposed Methodology

The first step in enabling semantic data mining and discovering new association rules efficiently with our semantic method is to design and implement an ontology for association rule mining concepts. This ontology is required in addition to the application ontologies; it defines the specifications of concepts such as item, transaction, and association rule, together with their properties, for use by the semantic data mining algorithm. Our method differs from those proposed so far in that this ontology and the semantically annotated data are used in all parts of the mining system.
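The paper does not list the vocabulary of this ontology, so the namespace and the `arm:` class and property names below (`AssociationRule`, `hasAntecedentItem`, `hasConsequentItem`, `support`, `confidence`) are hypothetical. The sketch only shows how a discovered rule, once described with such an ontology, becomes ordinary RDF (serialized here as N-Triples) that can be linked with other semantic data.

```python
ARM = "http://example.org/arm#"  # hypothetical namespace for the ARM ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def rule_to_ntriples(rule_id, antecedent, consequent, support, confidence):
    """Describe one association rule as N-Triples lines.

    All arm: class and property names are hypothetical; `rule_id` is assumed
    to be IRI-safe and the items are assumed to be absolute IRIs.
    """
    rule = f"<{ARM}{rule_id}>"
    lines = [f"{rule} <{RDF_TYPE}> <{ARM}AssociationRule> ."]
    # One triple per item on each side of X => Y.
    for item in sorted(antecedent):
        lines.append(f"{rule} <{ARM}hasAntecedentItem> <{item}> .")
    for item in sorted(consequent):
        lines.append(f"{rule} <{ARM}hasConsequentItem> <{item}> .")
    # The repr of a finite Python float is a valid xsd:double lexical form.
    lines.append(f'{rule} <{ARM}support> "{support!r}"^^<{XSD_DOUBLE}> .')
    lines.append(f'{rule} <{ARM}confidence> "{confidence!r}"^^<{XSD_DOUBLE}> .')
    return "\n".join(lines)
```

Because the rule is itself an RDF resource, it can be stored next to the application data and queried or reasoned over with the same tools.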

Association rule mining algorithms usually work with a dataset of transactions, each of which contains a subset of items. In our method, the semantically annotated data are transformed into semantic transactions without losing the semantic richness of the semantic web data. A semantic query language such as SPARQL can be used to implement this step. Semantic transactions and their items are formatted according to the association rule mining ontology defined above. Therefore, they can be linked with other semantic data, and semantic reasoning can be performed on them to generate new, more meaningful transactions.
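The transformation can be sketched as follows. The paper does not specify how transactions are delimited, so this sketch assumes one transaction per subject resource (a common choice, not the only one), with each item written as `predicate=object` so that the predicate giving an object its meaning is preserved. The rows are assumed to come from a SPARQL SELECT query; the query and the `ex:` vocabulary are illustrative.

```python
from collections import defaultdict

# Rows as they might be returned by a SPARQL query such as (illustrative):
#   PREFIX ex: <http://example.org/>
#   SELECT ?s ?p ?o WHERE { ?s a ex:Paper ; ?p ?o . }


def triples_to_transactions(rows, skip_predicates=()):
    """Group (subject, predicate, object) rows into one transaction per subject.

    Each item is the string "predicate=object", so the role of the object is
    kept inside the item instead of being discarded.
    """
    transactions = defaultdict(set)
    for s, p, o in rows:
        if p not in skip_predicates:  # e.g. labels that carry no pattern
            transactions[s].add(f"{p}={o}")
    return dict(transactions)
```

The resulting dictionary values are plain item sets, so they can be passed directly to a conventional algorithm such as Apriori; enriching the rows with inferred triples before grouping yields the more meaningful transactions described above.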

Therefore, there are two general phases in our semantic association rule mining system: producing semantic transactions, and running the semantic association rule mining algorithm on them. The second phase is implemented based on Apriori, rewritten to deal with semantic transactions so that the resulting semantic rules follow the format predefined in the ontology. In addition, parameters such as minimum support and minimum confidence must be determined for each particular data set. The resulting rules are expected to be useful in improving intelligent decision support systems.

Fig.1 shows the flow diagram of the proposed methodology.

VI. Experimental Results and Discussion

In this experiment, the cross-transactions were used with Apriori as a ground-truth rule set. The rationale for this evaluation is that cross-transactions represent obvious associations, because the method connects concepts with similar transactions. These associations can therefore be selected across the various tested transactions.

Several experiments were carried out to show that our method can generate association-rule-ready transactions from semantic data, which in turn yield high-quality association rules. Note that the focus of this work is not on developing new mining algorithms but on transforming complex semantic data so that existing algorithms can be applied to it. In the experiments on real-world datasets, the method achieved very good results on the hierarchical semantic data, which are based on associativity. Apriori, on the other hand, produced some of the best results, partly because the semantic ontology-based model is not used. However, its performance could be improved by multiplying by φ, as proposed in [18].

Combinations of φ with other measures also produced good results. Some of the best results were obtained with the FOAF Vocabulary Specification 0.99. FOAF is a machine-readable ontology describing persons, their activities, and their relations to other people and objects. Anyone can use FOAF to describe him- or herself, and FOAF allows groups of people to describe social networks without the need for a centralized database.

Fig.2 shows the results for the tested transactions.

Table.1

S.No | Title           | ARM                               | SWM
1    | Algorithm       | Association Rule Mining Algorithm | Apriori Algorithm
2    | Complexity      | High                              | Low
3    | Semantic Mining | Heterogeneous Structured Data     | Homogeneous Structured Data
4    | Efficiency      | Low                               | High
5    | Rule Patterns   | Association Rules                 | Semantic Rules
6    | Transaction     | Dataset of Transactions           | Semantic Transaction and Production

The table compares association rule mining (ARM) and semantic web mining (SWM).

VII. Conclusion

Semantic Web technologies offer solutions to capture and efficiently use domain knowledge. Given the challenges listed earlier, it is crucial to apply the knowledge contained in ontology-based semantically annotated data to produce semantic transactions efficiently. By defining semantic transactions and their properties in the ontology, the heterogeneity of semantic web data is overcome.

The mining process presented in this paper can be performed automatically for any kind of semantic data once semantic transactions have been extracted using the application ontology. Also, since all parts of the system are based on a uniform ontology, semantic integrity is maintained across the entire system. Therefore, the rules and their related data can be linked to other data sets, and semantic technologies such as semantic reasoning can be applied to them. In summary, this paper has presented a novel method for finding semantic association rules in semantic web data using semantic web technologies. Such rules can be applied in decision support systems to help them make more intelligent decisions.

VIII. Future Work

As future work, generalized query patterns will be applied using the ontology axioms, and interesting contexts and their association rules will be discovered automatically. Moreover, our method could be applied in a variety of scenarios where the mining tasks are transaction-oriented. An interesting issue for future work is to use the knowledge encoded in the ontology to filter and prune discovered rules, and also to express user goals. Another important direction worth exploring is the combination of clustering and association mining algorithms to summarize document collections.

This technique was previously introduced in Frequent Itemset-based Hierarchical Clustering (FIHC). Essentially, the FIHC algorithm generates clusters from frequent itemsets, which in turn serve as the cluster descriptors. Several enhancements of this algorithm have been proposed since then. Recently, a novel approach, also based on frequent item pairs, was proposed that provides more homogeneous clusters and better descriptions than those obtained with FIHC. Alternative research lines, which are outside the scope of the present work, consist of applying more sophisticated data mining algorithms to the generated transactions and studying their performance. Equally interesting is to devise new data mining algorithms that take advantage of the semantically enriched items of the generated transactions.
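The idea of building clusters from frequent itemsets can be sketched in two steps. The first function follows FIHC's construction of initial clusters (one per global frequent itemset, containing every document that includes that itemset). The second is a deliberately simplified, hypothetical stand-in for FIHC's score-based step that makes clusters disjoint: it assigns each document to its largest matching itemset instead of computing FIHC's actual score. Documents are assumed to be given as sets of frequent terms.

```python
def fihc_initial_clusters(docs, frequent_itemsets):
    """One initial cluster per frequent itemset: every document whose
    term set contains all items of that itemset."""
    return {fs: {d for d, terms in docs.items() if fs <= terms}
            for fs in frequent_itemsets}


def make_disjoint(clusters):
    """Simplified, hypothetical assignment: each document keeps only its most
    specific (largest) matching itemset; ties are broken by the sorted item
    list so the result is deterministic. This is not FIHC's scoring function."""
    best = {}
    for fs, members in clusters.items():
        key = (len(fs), sorted(fs))
        for d in members:
            if d not in best or key > best[d][0]:
                best[d] = (key, fs)
    result = {fs: set() for fs in clusters}
    for d, (_, fs) in best.items():
        result[fs].add(d)
    return result
```

The itemset that labels each cluster doubles as its descriptor, which is what makes this family of methods attractive for summarizing document collections.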
