WEKA

 


INTRODUCTION TO WEKA

  • WEKA (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand.
  • WEKA is open source software that is freely available under the GNU General Public License. The original version was not written in Java (it combined a Tcl/Tk front end with utilities written in C); WEKA has since been completely rewritten in Java and is compatible with almost every computing platform.
  • It is user friendly, with a graphical interface that allows quick setup and operation. WEKA operates on the assumption that the user's data is available as a flat file or relation. This means that each data object is described by a fixed number of attributes, each usually of a specific type, normally alphanumeric or numeric values. WEKA gives novice users a tool for identifying hidden information in databases and file systems, with simple-to-use options and visual interfaces.
  • WEKA contains a collection of visualization tools and algorithms for data mining, data analysis, and predictive modeling.
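
To make the flat-file idea above concrete, here is a minimal Python sketch that writes a small table in ARFF, the text format WEKA uses for its own data files. The helper name to_arff and the toy weather rows are made up for illustration, and the sketch only handles numeric and nominal attributes without quoting or missing values.

```python
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes: list of (name, type) pairs, where type is the string
    "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attributes list their allowed values in braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("every row needs one value per attribute")
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical toy data, not taken from any real data set.
attributes = [
    ("outlook", ["sunny", "overcast", "rainy"]),
    ("temperature", "numeric"),
    ("play", ["yes", "no"]),
]
rows = [("sunny", 85, "no"), ("overcast", 83, "yes"), ("rainy", 70, "yes")]
arff_text = to_arff("weather", attributes, rows)
print(arff_text)
```

Every row carries exactly one value per declared attribute, which is the "fixed number of attributes" property described above.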
Advantages of WEKA

  • Free availability under the GNU General Public License.
  • Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
  • A comprehensive collection of data preprocessing and modeling techniques.

Installation of WEKA 

       To install WEKA on your computer, visit

       https://waikato.github.io/weka-wiki/downloading_weka/

       and download the installation file for your operating system.

       WEKA supports installation on Windows, Mac OS X and Linux.


Launching WEKA

The WEKA GUI Chooser window is used to launch WEKA’s graphical environments. At the bottom of the window are four buttons:


1. Explorer. An environment for exploring data with WEKA.

2. Experimenter. An environment for performing experiments and conducting statistical tests between learning schemes.

3. Knowledge Flow. This environment supports essentially the same functions as the Explorer, but with a drag-and-drop interface. One advantage is that it supports incremental learning.

4. Simple CLI. Provides a simple command-line interface that allows direct execution of WEKA commands on operating systems that do not provide their own command-line interface.

The WEKA Explorer

Section Tabs

At the very top of the window, just below the title bar, is a row of tabs. When the Explorer is first started only the first tab is active; the others are grayed out. This is because it is necessary to open (and potentially pre-process) a data set before starting to explore the data.

The tabs are as follows:

1. Preprocess. Choose and modify the data being acted on.

2. Classify. Train and test learning schemes that classify or perform regression.

3. Cluster. Learn clusters for the data.

4. Associate. Learn association rules for the data.

5. Select attributes. Select the most relevant attributes in the data.

6. Visualize. View an interactive 2D plot of the data.

Once the tabs are active, clicking on them flicks between different screens, on which the respective actions can be performed. The bottom area of the window (including the status box, the log button, and the WEKA bird) stays visible regardless of which section you are in.

I. Preprocess Tab

The first step in machine learning is to preprocess the data. In the Preprocess tab, you can select the data file, process it, and make it fit for the various machine learning algorithms. Data preprocessing is a data science technique that transforms raw data into an understandable, high-quality format. Real-world data is often incomplete, inconsistent, noisy, or redundant; it may contain missing or dirty values, may lack certain behaviors or trends, and is likely to contain many errors. The preprocessing steps consist of:

1.     Data Cleaning

               a) Missing Values

               b) Noisy Data

               c) Data Cleaning as a Process

2.     Data Integration

               a) Correlation Analysis (Numerical)

               b) Correlation Analysis (Categorical)

               c) Chi-Square Test

3.     Data Transformation

               a) Smoothing

               b) Aggregation

               c) Normalization

               d) Generalization

               e) Attribute Construction

4.     Data Reduction

               a) Data Cube Aggregation

               b) Attribute Subset Selection

               c) Dimensionality Reduction

               d) Numerosity Reduction

5.     Data Discretization and Concept Hierarchy Generation

               a) Discretization and Concept Hierarchy Generation for Numerical Data

               b) Concept Hierarchy Generation for Categorical Data
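
Three of the steps listed above (handling missing values, normalization, and discretization) are easy to sketch in a few lines of plain Python. This is only an illustration of the ideas, not how WEKA's own filters are implemented; the function names and the ages column are hypothetical.

```python
def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, bins):
    """Discretization: map each value to one of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


ages = [20, None, 40, 60]  # hypothetical column with one missing value
cleaned = fill_missing_with_mean(ages)
normalized = min_max_normalize(cleaned)
binned = equal_width_bins(cleaned, 2)
print(cleaned, normalized, binned)
```

Mean imputation is just one simple choice for missing values; depending on the data, the median or the most frequent value may be a better fit.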

 

II. Classify Tab

The Classify tab provides several machine learning algorithms for classifying data. To list a few, you may apply algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Naïve Bayes, and rule-based classification.
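
To give a feel for what one of these classifiers does internally, here is a small Python sketch of Naïve Bayes for nominal attributes with add-one (Laplace) smoothing. It is a simplified illustration under that assumption, not WEKA's implementation, and the weather-style rows are invented.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for Laplace smoothing.
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest log-posterior score for `row`."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(domains[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, windy) -> play
rows = [
    ("sunny", "no"), ("sunny", "yes"), ("overcast", "no"),
    ("rainy", "no"), ("rainy", "yes"), ("overcast", "yes"),
]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = train_naive_bayes(rows, labels)
prediction = predict(model, ("overcast", "no"))
print(prediction)
```

Training is nothing more than counting, which is why Naïve Bayes is often used as a quick baseline before trying heavier models.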

III. Cluster Tab

Under the Cluster tab, we have different clustering algorithms, such as partitioning methods, hierarchical methods, density-based methods, and grid-based methods.
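
As an example of a partitioning method, the following Python sketch runs k-means (Lloyd's algorithm) on 2D points. The starting centroids are passed in explicitly so the run is repeatable; the function name and the sample points are hypothetical, and real tools usually pick the starting centroids for you.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 2D points, starting from the given centroids."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignment


# Hypothetical points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, assignment = k_means(points, [(1, 1), (8, 8)])
print(centroids, assignment)
```
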

IV. Associate Tab

The Associate tab is used to learn association rules. Many efficient and scalable algorithms have been developed for frequent itemset mining, from which association and correlation rules can be derived. There are three major approaches to scalable mining:

1.       Apriori algorithm (Agrawal & Srikant)

2.       Frequent pattern growth, FP-growth (Han, Pei & Yin)

3.       Vertical data format approach, CHARM (Zaki & Hsiao)
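
The first of these approaches, Apriori, can be sketched in plain Python as a level-wise search: find frequent single items, join them into larger candidates, and prune any candidate that has an infrequent subset. This is a teaching sketch with a made-up function name and made-up shopping baskets; it stops at frequent itemsets and does not go on to generate rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps Apriori tractable: here the three-item candidate is discarded without counting because one of its pairs is not frequent.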

V. Select Attributes Tab

The Select attributes tab lets you perform feature selection with several evaluators, such as ClassifierSubsetEval and PrincipalComponents.
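
The two evaluators named above are too involved for a short example, so the sketch below swaps in a simpler and different technique, ranking nominal attributes by information gain, just to show what "scoring attributes by relevance" means. The function names and the tiny data set are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy after splitting on a nominal attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [l for v, l in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Return attribute names sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical data: "outlook" predicts the class perfectly, "windy" not at all.
labels = ["yes", "yes", "no", "no"]
columns = {
    "windy": ["no", "yes", "no", "yes"],
    "outlook": ["sunny", "sunny", "rainy", "rainy"],
}
ranking = rank_attributes(columns, labels)
print(ranking)
```
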

VI. Visualize Tab

Lastly, the Visualize tab lets you visualize your processed data for analysis.





       References

       Witten, Ian H.; Frank, Eibe; Hall, Mark A.; Pal, Christopher J. (2011). "Data Mining: Practical machine learning tools and techniques, 3rd Edition". Morgan Kaufmann, San Francisco (CA). Retrieved 2011-01-19.

       https://en.wikipedia.org/wiki/Weka_(machine_learning)

       https://www.tutorialspoint.com/weka/weka_loading_data.htm












