WEKA
INTRODUCTION TO WEKA
- WEKA (Waikato
Environment for Knowledge Analysis) is a popular suite of machine learning
software written in Java, developed at the University of Waikato, New
Zealand.
- WEKA is open-source software, freely available under the GNU General Public License. Originally written in C, the WEKA application has been completely rewritten in Java and runs on almost every computing platform.
- It is user friendly, with a graphical interface that allows quick setup and operation. WEKA operates on the premise that the user's data is available as a flat file or relation: each data object is described by a fixed number of attributes, usually of a specific type, normally alphanumeric or numeric values. WEKA gives novice users a tool to identify hidden information in databases and file systems through simple options and visual interfaces.
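Such a flat file is typically supplied in WEKA's native ARFF (Attribute-Relation File Format). A minimal example, abbreviated from the style of the weather data bundled with WEKA:

```
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute play {yes, no}
@data
sunny,85,85,no
overcast,83,86,yes
rainy,70,96,yes
```

Each `@attribute` line declares one of the fixed attributes (nominal values in braces, or `numeric`), and each line after `@data` is one data object.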
- Weka contains a collection of visualization tools and algorithms for data analysis and predictive modeling. Its main advantages are:
• Free availability under the GNU General Public License.
• Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
• A comprehensive collection of data preprocessing and modeling techniques.
Installation of WEKA
• To install WEKA on your computer, visit https://waikato.github.io/weka-wiki/downloading_weka/ and download the installation file.
• WEKA supports installation on Windows, Mac OS X and Linux.
Launching WEKA
The WEKA GUI Chooser window is used to
launch WEKA’s graphical environments. At the bottom of the window are four
buttons:
1. Explorer. An environment for exploring
data with WEKA.
2. Experimenter. An environment for
performing experiments and conducting statistical tests between learning schemes.
3. Knowledge Flow. This environment
supports essentially the same functions as the Explorer, but with a drag-and-drop
interface. One advantage is that it supports incremental learning.
4. Simple CLI. Provides a simple
command-line interface that allows direct execution of WEKA commands for
operating systems that do not provide their own command-line interface.
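For example, assuming the WEKA classes are on the classpath and an ARFF file is at hand (both the filename and path here are placeholders), a classifier can be run directly from the Simple CLI with a command of this form:

```
java weka.classifiers.trees.J48 -t weather.arff
```

Here `-t` names the training file; by default WEKA trains the J48 decision tree learner and reports cross-validation results.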
The WEKA Explorer
Section Tabs
At the very top of the window, just
below the title bar, is a row of tabs. When the Explorer is first started only
the first tab is active; the others are grayed out. This is because it is
necessary to open (and potentially pre-process) a data set before starting to
explore the data.
The tabs are as follows:
1. Preprocess. Choose and modify the data
being acted on.
2. Classify. Train and test learning
schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for
the data.
5. Select attributes. Select the most
relevant attributes in the data.
6. Visualize. View an interactive 2D plot
of the data.
Once the tabs are active, clicking on
them flicks between different screens, on which the respective actions can be
performed. The bottom area of the window (including the status box, the log
button, and the WEKA bird) stays visible regardless of which section you are
in.
I. Preprocess Tab
The first step in machine learning is to
preprocess the data. In the Preprocess tab, you can select the
data file, process it, and make it fit for applying the various machine learning
algorithms. Data preprocessing is a data science approach that
involves transforming raw data into an understandable, high-quality format. Real-world data is often incomplete,
inconsistent, and noisy, with redundant, missing, or dirty values; it may lack
certain behaviors or trends and is likely to contain many
errors. The preprocessing steps consist of:
1. Data Cleaning
a) Missing Values
b) Noisy Data
c) Data Cleaning as a Process
2. Data Integration
a) Correlation Analysis (Numerical)
b) Correlation Analysis (Categorical)
c) Chi-Square Test
3. Data Transformation
a) Smoothing
b) Aggregation
c) Normalization
d) Generalization
e) Attribute Construction
4. Data Reduction
a) Data Cube Aggregation
b) Attribute Subset Selection
c) Dimensionality Reduction
d) Numerosity Reduction
5. Data Discretization and Concept Hierarchy Generation
a) Discretization and Concept Hierarchy Generation for Numerical Data
b) Concept Hierarchy Generation for Categorical Data
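WEKA applies these operations through filters in the Preprocess tab. As a language-agnostic illustration (not WEKA code), two of the steps above — normalization (3c) and discretization (5) — can be sketched in Python:

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max] (step 3c, Normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: map everything to new_min
        return [new_min] * len(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def equal_width_discretize(values, bins):
    """Assign each value a bin index 0..bins-1 (step 5, Discretization)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    out = []
    for v in values:
        idx = int((v - lo) / width)
        out.append(min(idx, bins - 1))  # clamp the maximum value into the last bin
    return out

temps = [64, 68, 70, 71, 85]
print(min_max_normalize(temps))        # 64 maps to 0.0, 85 maps to 1.0
print(equal_width_discretize(temps, 3))
```

WEKA's own filters offer many more options (per-attribute ranges, supervised discretization, etc.); this sketch only shows the core arithmetic.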
II. Classify Tab
The Classify tab
provides several machine learning algorithms for the classification of data.
To list a few, you may apply algorithms such as Linear Regression, Logistic
Regression, Support Vector Machines, Decision Trees, Random Forest,
Naïve Bayes classification, rule-based classification, and so on.
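WEKA ships each of these as a ready-made classifier. To illustrate what "training a scheme" means, here is a minimal sketch (not WEKA's implementation) of a OneR-style rule learner, which picks the single attribute whose value-to-majority-class rule makes the fewest training errors; the toy data below is hypothetical:

```python
from collections import Counter

def train_one_r(rows, n_attrs):
    """rows: list of (attribute_values_tuple, class_label).
    Returns (attr_index, rule_dict) for the best single-attribute rule."""
    best = None
    for a in range(n_attrs):
        # for each value of attribute a, count the class labels it co-occurs with
        by_value = {}
        for attrs, label in rows:
            by_value.setdefault(attrs[a], Counter())[label] += 1
        # the rule maps each attribute value to its majority class
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(1 for attrs, label in rows if rule[attrs[a]] != label)
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    _, attr, rule = best
    return attr, rule

def predict(model, attrs, default="yes"):
    attr, rule = model
    return rule.get(attrs[attr], default)

# toy weather-style data: (outlook, windy) -> play
data = [(("sunny", "false"), "no"), (("sunny", "true"), "no"),
        (("overcast", "false"), "yes"), (("rainy", "false"), "yes"),
        (("rainy", "true"), "no")]
model = train_one_r(data, 2)
print(model)
```

WEKA's OneR additionally handles numeric attributes by discretizing them first; the sketch covers only the nominal case.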
III. Cluster Tab
Under the Cluster tab, we have different clustering algorithms, such as partitioning methods, hierarchical methods, density-based methods, and grid-based methods.
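WEKA's SimpleKMeans is a classic partitioning method. Its core loop — assign each point to its nearest centroid, then recompute each centroid as the mean of its points — can be sketched (in one dimension, not WEKA code) as:

```python
def kmeans_1d(points, centroids, iterations=10):
    """Minimal 1-D k-means: returns final centroids and cluster assignments."""
    assign = []
    for _ in range(iterations):
        # assignment step: index of the nearest centroid for each point
        assign = [min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
                  for p in points]
        # update step: move each centroid to the mean of its assigned points
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave empty clusters where they are
                centroids[c] = sum(members) / len(members)
    return centroids, assign

pts = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
cents, assign = kmeans_1d(pts, [0.0, 10.0])
print(cents, assign)  # two clusters: the points near 1 and the points near 8
```

The real algorithm works in any number of dimensions with Euclidean (or other) distances, and WEKA also lets you choose how the initial centroids are seeded.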
IV Associate Tab
Under this tab, Many efficient and scalable algorithms have been developed for frequent itemset mining from which association and correlation rules can be derived. Scalable mining methods: Three major approaches
1. Apriori Algorithm - Agrawal & Srikant.
2. Frequent Pattern growth – FPgrowth - Han, Pei & Yin
3. Vertical data format approach – Charm Zaki & Hsiao
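The Apriori idea — every subset of a frequent itemset must itself be frequent, so k-itemset candidates are built only from frequent (k-1)-itemsets — can be sketched (simplified, without the full candidate-pruning step, and not WEKA's implementation) as:

```python
def apriori(transactions, min_support):
    """Return all itemsets (as frozensets) appearing in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: frequent single items
    current = [frozenset([i]) for i in sorted(items)
               if sum(1 for t in transactions if i in t) >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # candidate generation: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates
                   if sum(1 for t in transactions if c <= t) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

tx = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
result = apriori(tx, 2)
print(sorted(sorted(s) for s in result))
```

Association rules such as bread → milk are then derived from these frequent itemsets by checking confidence, which WEKA's Apriori does for you with configurable support and confidence thresholds.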
V. Select Attributes Tab
Select Attributes lets you perform feature selection based on
several algorithms such as ClassifierSubsetEval, PrincipalComponents, etc.
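As a toy illustration of the filter style of feature selection (not any specific WEKA evaluator), one can score each numeric attribute by the absolute value of its Pearson correlation with the target and keep the best-ranked ones; the attribute names and data below are hypothetical:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rank_attributes(columns, target):
    """columns: {name: values}. Returns names sorted by |correlation| with target, best first."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

data = {"temperature": [64, 68, 70, 75, 85],
        "noise":       [3, 1, 4, 1, 5]}
target = [60, 65, 72, 74, 88]  # strongly tracks temperature, weakly tracks noise
print(rank_attributes(data, target))
```

WEKA separates this into an attribute evaluator plus a search method, which is more general than a simple ranking but follows the same idea.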
VI. Visualize Tab
Lastly, the Visualize tab lets you view an interactive plot of the processed data for analysis.
References
• Witten, Ian H.; Frank, Eibe; Hall, Mark A.; Pal, Christopher J. (2011). "Data Mining: Practical Machine Learning Tools and Techniques", 3rd Edition. Morgan Kaufmann, San Francisco (CA).
• https://en.wikipedia.org/wiki/Weka_(machine_learning)
• https://www.tutorialspoint.com/weka/weka_loading_data.htm