An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software.
Overview
ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information.
The Header of the ARFF file contains the name of the relation, a list of the attributes (the columns in the data), and their types. An example header on the standard IRIS dataset looks like this:
The Data of the ARFF file looks like the following:
Lines that begin with a % are comments. The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive.
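As a rough illustration of the layout described above, the following Python sketch splits ARFF text into its header and data parts. The sample text is a shortened stand-in modeled on the Iris dataset, and `split_arff` is a hypothetical helper name rather than anything Weka provides. It ignores quoting and sparse rows, so it should be read as a sketch and not as a full ARFF reader.

```python
SAMPLE = """\
% Shortened stand-in for the Iris dataset (illustrative only)
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""


def split_arff(text):
    """Return (relation, attributes, rows) from simple, unquoted ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank lines and % comments carry no data
        if in_data:
            rows.append(line.split(","))
            continue
        # Declarations are case insensitive, so compare in lower case.
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, datatype = line.split(None, 2)
            attributes.append((name, datatype))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = split_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

The header yields the relation name and one (name, type) pair per attribute, in column order; everything after the @DATA line is treated as instance rows.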
Examples
Several well-known machine learning datasets are distributed with Weka in the $WEKAHOME/data directory as ARFF files.
The ARFF Header Section
The ARFF Header section of the file contains the relation declaration and attribute declarations.
The @relation Declaration
The relation name is defined as the first line in the ARFF file. The format is:
@relation <relation-name>
where <relation-name> is a string. The string must be quoted if the name includes spaces.
The @attribute Declarations
Attribute declarations take the form of an ordered sequence of @attribute statements. Each attribute in the data set has its own @attribute statement which uniquely defines the name of that attribute and its data type. The order in which the attributes are declared indicates the column position in the data section of the file. For example, if an attribute is the third one declared, then Weka expects that all of that attribute's values will be found in the third comma-delimited column.
The format for the @attribute statement is:
@attribute <attribute-name> <datatype>
where the <attribute-name> must start with an alphabetic character. If spaces are to be included in the name then the entire name must be quoted.
The <datatype> can be any of the four types currently (version 3.2.1) supported by Weka:
- numeric
- <nominal-specification>
- string
- date [<date-format>]
where <nominal-specification> and <date-format> are defined below. The keywords numeric, string and date are case insensitive.
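The naming rules above can be sketched as a small Python helper that writes @attribute statements. The function name `attribute_line` and the attribute names are hypothetical, and the use of single quotes for names with spaces is an assumption of this sketch; names that themselves contain quote characters are not handled.

```python
def attribute_line(name, datatype):
    """Build one @attribute declaration; datatype is passed through as text."""
    if not name[:1].isalpha():
        raise ValueError("attribute names must start with an alphabetic character")
    if " " in name:
        name = f"'{name}'"  # assumption: single quotes for names with spaces
    return f"@attribute {name} {datatype}"


# Declaration order fixes the column order in the data section.
declarations = [
    attribute_line("temperature", "numeric"),
    attribute_line("station name", "string"),
    attribute_line("recorded", "date"),
]
print("\n".join(declarations))
```

Because the position of each statement determines its column, a list is used here so that the order of the declarations is explicit.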
Numeric attributes
Numeric attributes can be real or integer numbers.
Nominal attributes
Nominal values are defined by providing a <nominal-specification> listing the possible values: {<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}
For example, the class value of the Iris dataset can be defined as follows:
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
Values that contain spaces must be quoted.
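A minimal Python sketch of how a nominal specification might be generated from a list of values, quoting only those that contain spaces. `nominal_spec` is a hypothetical helper, the "class A" / "class B" values are made up for illustration, and single quotes are an assumption of this sketch.

```python
def nominal_spec(values):
    """Render a nominal specification, quoting values that contain spaces."""
    quoted = [f"'{v}'" if " " in v else v for v in values]
    return "{" + ",".join(quoted) + "}"


iris_classes = nominal_spec(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])
spaced = nominal_spec(["class A", "class B"])
print("@attribute class " + iris_classes)
print("@attribute label " + spaced)
```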
String attributes
String attributes allow us to create attributes containing arbitrary textual values. This is very useful in text-mining applications, as we can create datasets with string attributes, then write Weka Filters to manipulate strings (like StringToWordVectorFilter). String attributes are declared as follows:
@attribute <name> string
Date attributes
Date attribute declarations take the form:
@attribute <name> date [<date-format>]
where <name> is the name for the attribute and <date-format> is an optional string specifying how date values should be parsed and printed (this is the same format used by SimpleDateFormat). The default format string accepts the ISO-8601 combined date and time format: "yyyy-MM-dd'T'HH:mm:ss".
Dates must be specified in the data section as the corresponding string representations of the date/time (see example below).
ARFF Data Section
The ARFF Data section of the file contains the data declaration line and the actual instance lines.
The @data Declaration
The @data declaration is a single line denoting the start of the data segment in the file. The format is:
@data
The instance data
Each instance is represented on a single line, with carriage returns denoting the end of the instance.
Attribute values for each instance are delimited by commas. They must appear in the order that they were declared in the header section (i.e. the data corresponding to the nth @attribute declaration is always the nth field of the instance).
Missing values are represented by a single question mark, as in:
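One way a reader might treat the question mark, sketched in Python under the assumption that every column on the line is numeric. The helper name `parse_numeric_fields` and the sample line are illustrative only; mapping a missing value to `None` is a choice made for this sketch.

```python
def parse_numeric_fields(line):
    """Split one data line and map '?' to None, other fields to float."""
    values = []
    for field in line.split(","):
        field = field.strip()
        # A lone question mark marks a missing value, not a zero.
        values.append(None if field == "?" else float(field))
    return values


row = parse_numeric_fields("4.4,?,1.5,?")
print(row)
```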
Values of string and nominal attributes are case sensitive, and any that contain spaces must be quoted, as follows:
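To illustrate both points, here is a small Python sketch that splits a data line while keeping quoted values intact, using the standard `csv` module. The sample line and the `allowed` set are made up, and the sketch assumes single-quoted values with no escape sequences inside them.

```python
import csv


def split_instance(line):
    """Split one data line, honoring single-quoted values that contain spaces."""
    reader = csv.reader([line], quotechar="'", skipinitialspace=True)
    return next(reader)


fields = split_instance("5.1, 'hello world', 'class A'")
print(fields)

# Nominal values are compared exactly, so case matters.
allowed = {"class A", "class B"}
print(fields[2] in allowed, "Class A" in allowed)
```

A plain `line.split(",")` would also work for the sample above, but it would break as soon as a quoted value contained a comma, which is why a quote-aware splitter is used.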
Dates must be specified in the data section using the string representation specified in the attribute declaration. For example:
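When reading such values outside of Weka, the declared format has to be translated for the parser in use. The Python sketch below converts a small subset of SimpleDateFormat patterns into `strptime` format strings. The helper `to_strptime`, the list of supported pattern letters, and the sample timestamp are all assumptions of this sketch; other pattern letters and the doubled-quote escape are not handled.

```python
import re
from datetime import datetime

# Assumption: only these SimpleDateFormat letters are supported here.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}


def to_strptime(pattern):
    """Translate a simple SimpleDateFormat pattern into a strptime format."""
    out = []
    # Match a quoted literal, a known token, or any other single character.
    for m in re.finditer(r"'([^']*)'|(yyyy|MM|dd|HH|mm|ss)|(.)", pattern):
        literal, token, other = m.groups()
        if literal is not None:
            out.append(literal.replace("%", "%%"))
        elif token:
            out.append(_TOKENS[token])
        else:
            out.append(other.replace("%", "%%"))
    return "".join(out)


default_format = to_strptime("yyyy-MM-dd'T'HH:mm:ss")
custom_format = to_strptime("yyyy-MM-dd HH:mm:ss")
parsed = datetime.strptime("2001-04-03 12:12:12", custom_format)
print(default_format, parsed.isoformat())
```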
Sparse ARFF files
Sparse ARFF files are very similar to ARFF files, but data with value 0 are not explicitly represented.
Sparse ARFF files have the same header (i.e. @relation and @attribute tags) but the data section is different. Instead of representing each value in order, like this:
the non-zero attributes are explicitly identified by attribute number and their value stated, like this:
Each instance is surrounded by curly braces, and the format for each entry is: <index> <space> <value> where index is the attribute index (starting from 0).
Note that the omitted values in a sparse instance are 0; they are not 'missing' values! If a value is unknown, you must explicitly represent it with a question mark (?).
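The conversion between the two layouts can be sketched in Python as follows. `to_sparse` and `from_sparse` are hypothetical helpers, the sample row is made up, and the sketch assumes numeric values given as numbers (with "?" for unknowns) and no quoted values containing commas.

```python
def to_sparse(values):
    """Render a dense row as a sparse instance, dropping zero entries."""
    # "?" is not equal to 0, so unknown values stay explicit.
    entries = [f"{i} {v}" for i, v in enumerate(values) if v != 0]
    return "{" + ", ".join(entries) + "}"


def from_sparse(text, width):
    """Expand a sparse instance back to a dense list of strings."""
    dense = ["0"] * width  # omitted entries mean 0, not missing
    body = text.strip()[1:-1].strip()
    if body:
        for entry in body.split(","):
            index, value = entry.strip().split(None, 1)
            dense[int(index)] = value
    return dense


sparse = to_sparse([0, 5, 0, "?", 7])
print(sparse)
print(from_sparse(sparse, 5))
```

Note that `from_sparse` needs the number of attributes from the header, since a sparse instance alone does not say how many trailing zeros were omitted.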
Warning: There is a known problem saving SparseInstance objects from datasets that have string attributes. In Weka, string and nominal data values are stored as numbers; these numbers act as indexes into an array of possible attribute values (this is very efficient). However, the first string value is assigned index 0: this means that, internally, this value is stored as a 0. When a SparseInstance is written, string instances with internal value 0 are not output, so their string value is lost (and when the ARFF file is read again, the default value 0 is the index of a different string value, so the attribute value appears to change). To get around this problem, add a dummy string value at index 0 that is never used whenever you declare string attributes that are likely to be used in SparseInstance objects and saved as Sparse ARFF files.