Wednesday, October 9, 2019

Data mining Essay Example | Topics and Well Written Essays - 3000 words

Automated prospective analysis provided by data mining techniques, as discussed below, goes beyond the simple analysis of past records offered by the retrospective tools used in decision support systems (DSS). Data mining techniques are largely the result of a long process of research and product development, driven first by the pressing need to support the collection, storage and retrieval of business data. Across the many aspects of data mining, the most commonly used techniques are artificial neural networks, biclustering, PageRank, genetic algorithms, nearest-neighbor methods and rule induction.

A) Data Mining Classification over Large Databases

1. kNN: k-Nearest Neighbor Classification

Unlike a rote classifier, which memorizes the entire training data and classifies a test object only when its attributes exactly match one of the training samples, kNN seeks the group of k objects in the training set that are closest to the test object. It then assigns a label based on the predominance of a particular class in this neighborhood. The key elements of the algorithm are a distance or similarity metric for computing the distance between objects, a set of labeled objects, and the number of nearest neighbors (the value of k).

Advantages
- It is simple and easy to understand.
- Its classification technique is easy to implement.
- It performs well in a wide range of situations, which makes it broadly usable.
- It is well suited to multi-modal classes and to applications in which an object can have several class labels.

Disadvantages
- The choice of k is critical. If k is too small, the result is very sensitive to noise points. If k is too large, the neighborhood may include many points from other classes.
- Classification based on exact matches limits the number of test records that can be classified, since in most instances a test record will not exactly match any of the training records.
- Choosing how to combine the neighbors' class labels (for example, a simple majority vote versus a distance-weighted vote) is also considered complicated.

2. PageRank

PageRank is a search-ranking algorithm that uses the hyperlinks of the World Wide Web. It produces a static ranking of Web pages: a PageRank value is computed offline for every page, independently of any search query. Instead of relying on queries, it relies on the "democratic" nature of the Web, treating its vast link structure as an indicator of each individual page's quality. These properties contributed to the success of the Google search engine.

Advantages
- It is considered dependable because it produces consistent rankings.
- It is simple and efficient to use once its underlying principle is understood.

Disadvantages
- Search results are based on literal items (keywords, metadata and tags) rather than on their actual meaning.
- Web pages can be ranked poorly under certain Web link topologies, for example in Google's ranking algorithm.
- New pages start with low ranks and take a long time to be listed and to gain high ranks.
- Inaccurate information that is repeatedly quoted across different Web pages may still be indexed, which spreads fiction as if it were fact.

3. Naive Bayes

Advantages
- It is
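The kNN classification procedure described in section 1 above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation. It assumes Euclidean distance (via `math.dist`) as the distance metric and a simple majority vote for combining labels. The function name `knn_classify` and the toy training set, which includes one deliberately mislabeled "noise" point, are hypothetical illustrations.

```python
import math
from collections import Counter


def knn_classify(training_set, query, k=3):
    """Assign `query` the majority label among its k nearest training objects.

    training_set: list of (feature_vector, label) pairs, i.e. the labeled objects.
    Assumption: Euclidean distance (math.dist) is the distance metric; any other
    distance or similarity measure could be substituted.
    """
    if not 1 <= k <= len(training_set):
        raise ValueError("k must be between 1 and the size of the training set")
    # Rank every training object by its distance to the query; keep the k closest.
    neighbours = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the neighbours' labels. On a tie, Counter.most_common
    # returns the label encountered first, i.e. the one whose member is nearest.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two clusters plus one mislabeled "noise" point
# sitting inside cluster A.
training_set = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
    ((1.1, 1.05), "B"),  # noise point
]
query = (1.1, 1.0)
for k in (1, 3, 7):
    print(k, knn_classify(training_set, query, k))
```

Running the loop illustrates the trade-off in the choice of k. With k = 1 the noise point alone decides the label (B). With k = 3 the majority of the neighborhood gives A. With k = 7 the neighborhood covers the whole training set, so points from the other cluster outvote cluster A and the answer is B again.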

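The PageRank idea described in section 2, a query-independent score computed offline from the link structure alone, can be illustrated with a simplified power-iteration sketch. Several assumptions apply. The damping factor of 0.85 is the value commonly cited from the original PageRank description. Pages with no outgoing links are assumed to spread their score evenly over all pages. Google's production ranking combines many more signals than this. The function name `pagerank` and the four-page example graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Assumption: pages with no outgoing links ("dangling" pages) spread
    their score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page receives the "random jump" share plus its part of the dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank along its outgoing links in equal shares.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # stop once the scores no longer change meaningfully
            break
    return rank


# Hypothetical four-page Web: C is linked to by A, B and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this example C ranks first because three pages link to it. A comes next because it receives all of C's outgoing rank, followed by B. D has no incoming links, so it keeps only the baseline share (1 - d)/n. The scores always sum to 1, and none of them depends on a search query.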