Autonomio v.0.1.2 User Manual
This document covers in detail every function of Autonomio. If you’re looking for a high-level overview of the capabilities, you might find [Autonomio_Overview] more useful.
Autonomio is very easy to use, and it’s highly recommended to memorize the namespace, which consists of just three commands and fewer than 20 arguments combined. Yet these give you a practically unlimited number of network configurations. To have full control over Autonomio’s features, you only need to know these three commands.
To train (and save) a model: train()
To test (and load) a model: test()
To load a dataset: data()
At the moment, the simplest way to install Autonomio is with pip, directly from the repository.
To install the latest development version:
pip install git+https://github.com/autonomio/core-module.git
The expected input data format is a pandas DataFrame. Deep learning is particularly useful for solving classification problems, and for these Autonomio provides two modes: ‘binary’ and ‘categorical’.
In binary mode:
- X can be text or integer
- Y can be an integer
The default settings are optimized for making a 1 or 0 prediction. For example, when predicting sentiment from tweets, Autonomio gives 85% accuracy out of the box for classifying the tweets that rank in the most negative 20% according to NLTK VADER sentiment analysis.
In categorical mode:
- X can be text or integer
- Y can be an integer or text
- the number of output layer neurons (neuron_last) must match the number of categories
- activation_out must be changed accordingly
It’s not a good idea to have too many categories; in most cases, 10 is already pushing it.
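In categorical mode, the labels in Y are typically one-hot encoded. Each label becomes a vector with one slot per category, and the length of that vector is what neuron_last has to match. The standard-library sketch below illustrates the encoding. The function name one_hot is hypothetical, and this is not necessarily how Autonomio encodes Y internally.

```python
# Hypothetical sketch: one-hot encode categorical labels.
# Not necessarily how Autonomio encodes Y internally.

def one_hot(labels):
    """Return (categories, rows) where each row has one slot per category."""
    # Sorting gives a stable category order; labels must be mutually comparable.
    categories = sorted(set(labels))
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[position[label]] = 1  # mark this label's slot
        rows.append(row)
    return categories, rows


if __name__ == "__main__":
    categories, rows = one_hot(["news", "sports", "news", "tech"])
    print(categories)  # ['news', 'sports', 'tech']
    print(rows[1])     # [0, 1, 0]
    # The output layer would need len(categories) neurons, i.e. neuron_last=3.
    print(len(categories))
```

This also shows why a large number of categories is costly: every extra category widens both the target vectors and the output layer.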
The absolute minimum example using an Autonomio dataset is:
from autonomio.commands import *
# %matplotlib inline is a Jupyter magic for inline plots; omit it in plain Python
%matplotlib inline
train('text','neg',data('random_tweets'))
Using this example, with NLTK’s sentiment analyzer providing the ground truth, Autonomio yields 85% prediction accuracy out of the box with nothing but:
There are multiple ways to specify ‘x’ as a single input:
train('text', 'neg', data)  # a single column containing strings
train(5, 'neg', data)  # a single column by index
train(['quality_score'], 'neg', data)  # a single column by label
And a few more ways where you pass a list as ‘x’:
train([1,5], 'neg', data)  # a range of column indices
train(['quality_score', 'reach_score'], 'neg', data)  # a set of column labels
train([1,2,4,6,18], 'neg', data)  # a list of column indices
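The rules above can be summarized in a small resolver. This is a hypothetical sketch, not Autonomio’s implementation. The function name resolve_x and the sample column names are made up for illustration. Treating a two-integer list such as [1,5] as a half-open range, as Python slicing does, is an assumption, because the exact boundaries Autonomio uses are not stated here.

```python
# Hypothetical resolver for the 'x' argument rules listed above.
# Assumption: a two-integer list is a half-open range, like Python slicing.

def resolve_x(x, columns):
    """Map an 'x' specification to a list of column labels."""
    if isinstance(x, str):
        return [x]                          # 'text': one column by label
    if isinstance(x, int):
        return [columns[x]]                 # 5: one column by index
    if isinstance(x, list) and x:
        if all(isinstance(c, str) for c in x):
            return list(x)                  # ['a', 'b']: columns by label
        if all(isinstance(c, int) for c in x):
            if len(x) == 2:
                start, stop = x             # [1, 5]: a range of columns
                return columns[start:stop]
            return [columns[i] for i in x]  # [1, 2, 4]: listed indices
    raise TypeError(f"unsupported x: {x!r}")


if __name__ == "__main__":
    # Made-up column names for illustration.
    cols = ["text", "neg", "quality_score", "reach_score", "handle"]
    print(resolve_x("text", cols))     # ['text']
    print(resolve_x(2, cols))          # ['quality_score']
    print(resolve_x([1, 4], cols))     # ['neg', 'quality_score', 'reach_score']
    print(resolve_x([0, 2, 4], cols))  # ['text', 'quality_score', 'handle']
```

One consequence of this reading is that a list of exactly two integers is always a range. Selecting two non-adjacent columns would then have to be done by label.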
A slightly more involved example may change the number of epochs:
For flattening, the options are ‘mean’, ‘median’, ‘none’, and IQR. IQR is invoked by passing a float:
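What flattening does is not spelled out here. A plausible reading, assumed below and not confirmed by this manual, is that it converts a continuous y into 0/1 labels by thresholding:

- ‘mean’ thresholds at the mean.
- ‘median’ thresholds at the median.
- ‘none’ leaves y untouched.
- A float thresholds at that quantile.

The helper flatten_y is hypothetical and uses only the standard library.

```python
import statistics

def flatten_y(values, mode="mean"):
    """Hypothetical sketch: turn a continuous target into 0/1 labels."""
    if mode == "none":
        return list(values)                # leave y untouched
    if mode == "mean":
        cutoff = statistics.mean(values)
    elif mode == "median":
        cutoff = statistics.median(values)
    elif isinstance(mode, float) and 0.0 < mode < 1.0:
        # Assumed meaning of a float: a quantile cut-off, taken here as the
        # lower sorted value at that position (no interpolation).
        ordered = sorted(values)
        cutoff = ordered[int(mode * (len(ordered) - 1))]
    else:
        raise ValueError(f"unknown flatten mode: {mode!r}")
    # Assumption: values strictly above the cut-off become 1.
    return [1 if v > cutoff else 0 for v in values]


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 10]
    print(flatten_y(scores, "mean"))    # [0, 0, 0, 0, 1]  (cut-off 4)
    print(flatten_y(scores, "median"))  # [0, 0, 0, 1, 1]  (cut-off 3)
    print(flatten_y(scores, 0.25))      # [0, 0, 1, 1, 1]  (cut-off 2)
```

The example shows why the choice matters. A single outlier (10) pulls the mean up, so ‘mean’ labels far fewer samples as 1 than ‘median’ does.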
Dropout is one of the most important aspects of a neural network:
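Dropout randomly zeroes a fraction of a layer’s activations during training, so the network cannot rely on any single neuron. Keras, which Autonomio builds on, implements the "inverted" variant: it scales the surviving activations by 1/(1 - rate) at training time. The plain-Python sketch below shows that idea using train()’s default rate of .2. It is an illustration only, not the code Keras or Autonomio runs.

```python
import random

def dropout(activations, rate=0.2, rng=None):
    """Illustrative inverted dropout, applied only at training time."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Each unit is dropped with probability `rate`; survivors are scaled by
    # 1/keep so the expected value of every activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    out = dropout([1.0] * 10, rate=0.2, rng=random.Random(0))
    print(out)  # each value is either 0.0 or 1.25
```

Because of the rescaling, nothing needs to change at prediction time. The layer simply passes activations through unchanged.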
You might want to change the number of layers in the network:
Or change the loss of the model:
For a complete list of supported losses see [Keras_Losses]
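The default loss, binary_crossentropy, fits the 0/1 output of binary mode. For categorical mode you would presumably switch to categorical_crossentropy, which is also a Keras loss name. The plain-Python versions below are a sketch, not Keras’ code. Like Keras, they clip predictions by a small epsilon to avoid log(0). The example shows that for two classes the two losses agree.

```python
import math

EPS = 1e-7  # clip predictions away from 0 and 1 so log() stays finite

def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)

def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy for 0/1 labels and probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = _clip(p)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def categorical_crossentropy(y_true, y_pred):
    """Mean categorical cross-entropy for one-hot rows and probability rows."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total += -sum(t * math.log(_clip(p)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


if __name__ == "__main__":
    # The same two predictions, written once as P(class 1) and once as rows.
    print(binary_crossentropy([1, 0], [0.8, 0.2]))  # ~0.2231
    print(categorical_crossentropy([[0, 1], [1, 0]], [[0.2, 0.8], [0.8, 0.2]]))  # ~0.2231
```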
If you want to save the model with save_model, be mindful of using a filename with the .json extension:
Control the network size by setting the number of neurons in the input layer (neuron_first):
Sometimes changing the batch size can improve the model significantly:
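Batch size determines how many samples are used for each weight update. With mini-batch training, as in Keras’ fit(), one epoch takes ceil(n_samples / batch_size) updates. Smaller batches therefore mean more, and noisier, updates per epoch. The helper below is a plain arithmetic sketch using train()’s defaults of batch_size=10 and epoch=5. The 1,000-sample figure is just an example.

```python
import math

def training_steps(n_samples, batch_size=10, epochs=5):
    """Return (updates per epoch, total updates) for mini-batch training."""
    if n_samples <= 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("all arguments must be positive")
    # The last batch of an epoch may be smaller, hence the ceiling.
    per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch, per_epoch * epochs


if __name__ == "__main__":
    print(training_steps(1000))                 # (100, 500) with the defaults
    print(training_steps(1000, batch_size=32))  # (32, 160)
```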
By default, Keras verbosity is at its minimum (verbose=0), and you may want live output during training:
Even though it’s possible to use Autonomio with only a few arguments, train() takes 12 optional arguments in addition to X, Y, and data, and most of them can be used to improve model accuracy:
def train(X,Y,data, dims=300, epoch=5, flatten='mean', dropout=.2, layers=3, model='train', loss='binary_crossentropy', save_model=False, neuron_first='auto', neuron_last=1, batch_size=10, verbose=0):
|Argument|Accepted values|Default|
|---|---|---|
|X|string, int, float|NA|
|layers|int (2 through 5)|3|
Note that the network shape is roughly an upside-down pyramid. To change this, you would need to modify the code in train_new.py.
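To make the pyramid shape concrete, the sketch below computes neuron counts per layer under a hypothetical rule. Each hidden layer has half the neurons of the previous one, and layers counts every layer from the first to the output layer. Neither detail is taken from train_new.py. They only illustrate a funnel that narrows from neuron_first down to neuron_last. The 2-through-5 bound on layers comes from the argument table, and the value 300 is just an example.

```python
def pyramid_layers(neuron_first, layers=3, neuron_last=1):
    """Hypothetical neuron counts for a funnel-shaped network."""
    if not 2 <= layers <= 5:
        raise ValueError("layers must be between 2 and 5")
    sizes = [neuron_first]
    # Assumed rule: each hidden layer halves the previous width, but never
    # gets narrower than the output layer.
    for _ in range(layers - 2):
        sizes.append(max(sizes[-1] // 2, neuron_last))
    sizes.append(neuron_last)
    return sizes


if __name__ == "__main__":
    print(pyramid_layers(300))            # [300, 150, 1]
    print(pyramid_layers(300, layers=5))  # [300, 150, 75, 37, 1]
```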
Once you’ve trained a model with train(), you can easily use it on any dataset with the same input format:
Or, if you want to see an interactive scatter plot with a new y variable:
Whatever variable y_scatter is set to becomes the y-axis of the scatter plot.
To render the scatter plot, you have to call the result explicitly:
test_result = test('text',data,'handle','model.json',y_scatter='influence_score')
test_result
The only difference between the two modes of test() is whether a scatter plot is requested:
|Argument|Accepted values|Default|
|---|---|---|
|X|variable/s in dataframe|NA|
|labels|variable/s in dataframe|NA|
|y_scatter|variable in dataframe|‘mean’|
A dataset consisting of 10-minute samples of 80 million tweets:
4,000 ad-funded websites with word vectors and 5 categories:
Data from both the buy and sell side and over 10 other sources:
9 years of monthly poll and unemployment numbers:
120,000 tweets with sentiment classification from NLTK:
20,000 random tweets:
The data() command is provided both for convenience and to give the user access to unique deep learning datasets. In addition to Autonomio datasets, it also supports importing from CSV, JSON, and Excel files. The data importing function covers most cases we face, but it is not intended as a replacement for the pandas read functions:
|Argument|Accepted values|Default|
|---|---|---|
|name|dataset or filename|NA|
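Since name can be either an Autonomio dataset or a filename, data() has to decide how to read it. The sketch below shows one way such routing could work, keyed on the file extension. The function choose_reader and the extension table are hypothetical, while read_csv, read_json, and read_excel are real pandas reader names. It returns the reader’s name rather than calling pandas, so it runs with the standard library alone.

```python
from pathlib import Path

# Hypothetical routing table; the values are real pandas reader names.
READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".xls": "read_excel",
    ".xlsx": "read_excel",
}

def choose_reader(name):
    """Return the pandas reader for a filename, or 'dataset' for a bare name."""
    suffix = Path(name).suffix.lower()
    if suffix in READERS:
        return READERS[suffix]
    if suffix == "":
        return "dataset"  # e.g. 'random_tweets', a built-in Autonomio dataset
    raise ValueError(f"unsupported file type: {suffix}")


if __name__ == "__main__":
    print(choose_reader("random_tweets"))  # dataset
    print(choose_reader("polls.CSV"))      # read_csv
    print(choose_reader("sites.xlsx"))     # read_excel
```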
One of the most common errors you get when working with Keras is related to your output layer:
ValueError: Error when checking model target: expected dense_22 to have shape (None, 2) but got array with shape (1000,
This means that your neuron_last does not match the number of categories in ‘y’. You would usually only see this when your output is something other than 1 or 0, or when it is 1 or 0 but you changed neuron_last in train() to something other than 1.
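The rule behind this error can be checked before training. The sketch below follows the binary and categorical mode descriptions: 0/1 targets need a single output neuron, and any other target needs one neuron per category. The helper names are hypothetical, and this check is not part of Autonomio.

```python
def expected_neuron_last(y):
    """Output-layer size implied by the labels in y (assumed rule)."""
    labels = set(y)
    # 0/1 targets use a single output neuron; otherwise one per category.
    return 1 if labels <= {0, 1} else len(labels)

def check_output_layer(y, neuron_last):
    """Raise before training if neuron_last cannot match y."""
    expected = expected_neuron_last(y)
    if neuron_last != expected:
        raise ValueError(
            f"neuron_last={neuron_last}, but y implies {expected} output neuron(s)"
        )


if __name__ == "__main__":
    check_output_layer([0, 1, 1, 0], 1)           # passes silently
    print(expected_neuron_last(["a", "b", "c"]))  # 3
```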
You can get a very similar error message from Keras if dims is not the same as the number of features:
ValueError: Error when checking model input: expected dense_1_input to have shape (None, 300) but got array with shape (1000, 1)
NOTE: Your dims must be exactly the same as the number of features in your model (‘x’), except when the input is a series of text, in which case the default setting of 300 is correct.
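A similar pre-flight check can catch a dims mismatch like the (None, 300) versus (1000, 1) error shown above. In this hypothetical sketch, rows stands for the selected ‘x’ columns as a list of feature rows. Following the note above, text input is assumed to always use the default dims of 300.

```python
def check_input_dims(rows, dims, text_input=False):
    """Hypothetical check that dims equals the number of input features."""
    if text_input:
        expected = 300  # per the note: text input uses the default dims
    else:
        widths = {len(row) for row in rows}
        if len(widths) != 1:
            raise ValueError(f"rows must share one width, got {sorted(widths)}")
        expected = widths.pop()
    if dims != expected:
        raise ValueError(f"dims={dims}, but the input has {expected} feature(s)")
    return expected


if __name__ == "__main__":
    print(check_input_dims([[1, 2], [3, 4]], dims=2))  # 2
    try:
        # 1,000 samples with a single feature against the default dims=300.
        check_input_dims([[0.5]] * 1000, dims=300)
    except ValueError as err:
        print(err)  # dims=300, but the input has 1 feature(s)
```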
If your dims (input layer) is smaller than the output layer (neuron_last):
ValueError: Input arrays should have the same number of samples as target arrays. Found 100 input samples and 1 target samples.