readme.md

--------------------------------

Project: An Introduction to WEKA: The All-in-One Machine Learning Software in Java

Author: Pavic, Jakov

--------------------------------

Required Programs:

- WEKA 3.8.6
- Java 8 (or newer)

--------------------------------

The following Java programs are included:

1) data_preparation.java

	- The whole data preparation process is in a single Java program because the individual steps partly build on each other. The code is structured and commented so that it is clear which part is responsible for which task.
	
	- Imports the data sets: housing.csv, housing_newColumns.arff, housing_newRows.csv
	- Combines all three data sets into one
	- Gives a summary of the new combined data set
	- Filters the data by removing one column and selecting only rows where the house prices are over 170,000
	- Cleans the data by removing duplicates and missing values, and renames one column
	- Creates a new column price_per_sqrm
	- Saves the new data set into the current directory as output.arff (output.csv is possible as well if you uncomment the code)
	- Splits the data into a training and testing data set (for supervised machine learning algorithms)

	- Input: housing.csv, housing_newColumns.arff, housing_newRows.csv
	- Output: output.arff
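
The filtering, cleaning and price_per_sqrm steps listed above can be pictured with the following minimal sketch. It is plain Java and deliberately does not use the WEKA API, so it is not the code in data_preparation.java. The row layout (index 0 = price, index 1 = area in square metres), the sample values, the 50/50 split and all names such as DataPrepSketch are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Plain-Java illustration only; the real program works on WEKA data sets.
class DataPrepSketch {

    // Assumed row layout: index 0 = price, index 1 = area in square metres.
    // Drops rows with missing (null) values, rows whose price is not above
    // minPrice, and exact duplicates, then appends price / area as a new column.
    static List<List<Double>> prepare(List<List<Double>> rows, double minPrice) {
        LinkedHashSet<List<Double>> unique = new LinkedHashSet<>();
        for (List<Double> row : rows) {
            if (row.contains(null)) continue;      // missing value
            if (row.get(0) <= minPrice) continue;  // price filter
            unique.add(row);                       // a set ignores duplicates
        }
        List<List<Double>> result = new ArrayList<>();
        for (List<Double> row : unique) {
            List<Double> extended = new ArrayList<>(row);
            extended.add(row.get(0) / row.get(1)); // hypothetical price_per_sqrm
            result.add(extended);
        }
        return result;
    }

    // Index at which the prepared rows are cut into a training and a testing part.
    static int cutIndex(int size, double trainFraction) {
        return (int) Math.round(size * trainFraction);
    }

    public static void main(String[] args) {
        List<List<Double>> rows = Arrays.asList(
            Arrays.asList(250000.0, 100.0),
            Arrays.asList(250000.0, 100.0),  // duplicate
            Arrays.asList(150000.0, 80.0),   // price not over 170,000
            Arrays.asList(300000.0, null),   // missing area
            Arrays.asList(180000.0, 90.0));

        List<List<Double>> prepared = prepare(rows, 170_000);
        int cut = cutIndex(prepared.size(), 0.5);

        System.out.println("prepared: " + prepared);
        System.out.println("train:    " + prepared.subList(0, cut));
        System.out.println("test:     " + prepared.subList(cut, prepared.size()));
    }
}
```

The sketch checks for missing values before the price filter only so that it never reads a missing price; the order of steps in the real program may differ.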

2) linear_regression.java

	- Computes a linear regression on house_prices.csv

	- Input: house_prices.csv
	- Output: linear regression model + evaluation summary

3) logistic_regression.java

	- Computes a logistic regression on weather.nominal.arff

	- Input: weather.nominal.arff
	- Output: logistic regression model + evaluation confusion matrix

4) decision_tree.java

	- Creates a decision tree model on weather.nominal.arff

	- Input: weather.nominal.arff
	- Output: decision tree model + evaluation confusion matrix	

5) random_forest.java

	- Creates a random forest model on weather.nominal.arff

	- Input: weather.nominal.arff
	- Output: random forest model + evaluation confusion matrix	

6) support_vector_machine.java

	- Creates a support vector machine model on weather.nominal.arff

	- Input: weather.nominal.arff
	- Output: support vector machine model + evaluation confusion matrix	


7) naive_bayes.java

	- Creates a naive Bayes model on weather.nominal.arff

	- Input: weather.nominal.arff
	- Output: naive Bayes model + evaluation confusion matrix

8) k_nearest_neighbor.java

	- Creates a k-nearest neighbor model on weather.nominal.arff

	- Input: weather.nominal.arff
	- Output: k-nearest neighbor model + evaluation confusion matrix


9) principal_components_analysis.java

	- Applies principal component analysis (PCA) to reduce the dimensionality of house_prices.csv

	- Input: house_prices.csv
	- Output: new data set with fewer dimensions

10) k_means_clustering.java

	- Computes the k-means clustering algorithm on weather.nominal.arff 
	
	- Input: weather.nominal.arff
	- Output: new data set divided into clusters

11) data_visualization.java

	- Creates a JFrame with a plot interface for house_prices.csv

	- Input: house_prices.csv
	- Output: JFrame with plot interface
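
Several of the classification programs above report a confusion matrix as part of their evaluation. As a rough illustration of what such a table contains, the following sketch counts actual versus predicted classes in plain Java. It does not call WEKA, and the class name ConfusionMatrixSketch and the predictions are invented for the example; only the yes/no class labels mirror the play attribute of weather.nominal.arff.

```java
// Plain-Java illustration only; WEKA produces its own confusion matrix.
class ConfusionMatrixSketch {

    // counts[i][j] = number of instances whose actual class is labels[i]
    // and whose predicted class is labels[j] (rows = actual, columns = predicted).
    static int[][] confusionMatrix(String[] labels, String[] actual, String[] predicted) {
        int[][] counts = new int[labels.length][labels.length];
        for (int k = 0; k < actual.length; k++) {
            counts[indexOf(labels, actual[k])][indexOf(labels, predicted[k])]++;
        }
        return counts;
    }

    // Position of a label in the label array.
    static int indexOf(String[] labels, String label) {
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(label)) return i;
        }
        throw new IllegalArgumentException("Unknown label: " + label);
    }

    public static void main(String[] args) {
        String[] labels    = {"yes", "no"};
        // Invented example values, not results from any of the programs.
        String[] actual    = {"yes", "yes", "no", "no",  "yes"};
        String[] predicted = {"yes", "no",  "no", "yes", "yes"};

        int[][] m = confusionMatrix(labels, actual, predicted);
        for (int i = 0; i < labels.length; i++) {
            System.out.println(m[i][0] + " " + m[i][1] + " | actual = " + labels[i]);
        }
    }
}
```

Entries on the diagonal are correct predictions; every other entry is a misclassification.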

--------------------------------

How to run a program in Windows:

	Step 1: Press the Windows key + R

	Step 2: Type "cmd" and press OK

	Step 3: Change to the directory that contains the program: "cd the_directory_where_program_is"
		
		- e.g. if the nutshell_examples folder is on your desktop, the command is: "cd desktop/nutshell_examples/x", where x is one of: data_preparation, supervised_learning, unsupervised_learning or data_visualization

	Step 4: Compile the Java program with the following command: "javac -cp weka.jar program_name.java"

		- "-cp" sets the class path
		- ATTENTION: This command requires the weka.jar file to be in the same directory as the program!
		- If no errors occurred, a file called program_name.class is created in the current directory

	Step 5: Run the program with the following command: "java -cp .;weka.jar program_name"
		
		- ";" separates class path entries and "." denotes the current directory, where the file program_name.class should be

	Step 6: Enjoy trying out your first machine learning examples with WEKA


If you are using Linux, open a terminal and follow Steps 3 to 6. You only have to replace the class path separator ";" with ":".
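
The separator difference can also be looked up from Java itself. The following minimal sketch builds the Step 5 command with the separator of the platform it runs on, using the standard File.pathSeparator constant. The class name ClasspathSketch and the helper runCommand are hypothetical and not part of the included programs.

```java
import java.io.File;

// Illustration only; not one of the included programs.
class ClasspathSketch {

    // Builds the run command from Step 5 with the given class path separator.
    static String runCommand(String programName, String separator) {
        return "java -cp ." + separator + "weka.jar " + programName;
    }

    public static void main(String[] args) {
        // File.pathSeparator is ";" on Windows and ":" on Linux and macOS.
        System.out.println(runCommand("program_name", File.pathSeparator));
    }
}
```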

--------------------------------

Data sets:

	- housing.csv: A data set about house prices generated by an LLM (ChatGPT)
	- housing_newColumns.arff: A data set adding new columns to housing.csv, generated by an LLM (ChatGPT)
	- housing_newRows.csv: A data set adding new rows to housing.csv + housing_newColumns.arff, generated by an LLM (ChatGPT)
	- house_prices.csv: A bigger data set about house prices generated by an LLM (ChatGPT)
	- weather.nominal.arff: A data set containing information about the weather + a variable denoting whether they played or not (based on the weather). This data set comes with WEKA and is therefore distributed under the GNU General Public License

--------------------------------

Copyright 2023 Pavic, Jakov

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
	