Classification

In this example we are building a model that classifies iris flowers.


Uncomment the lines below to install the required modules.

[11]:
# !pip3 install Cython
# !pip3 install hana_automl
[12]:
import sys  # needed for sys.exit below

try:
    from hana_automl.automl import AutoML
    import pandas as pd
    from hana_ml.dataframe import ConnectionContext
    from hana_automl.storage import Storage
except ImportError:
    sys.exit("You need to install hana_automl and pandas. Uncomment the cell above.")
[13]:
test_df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/test_iris.csv', index_col='Unnamed: 0')
df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/iris.csv', index_col='Unnamed: 0')
df.head()
[13]:
   ID  sepal_length  sepal_width  petal_length  petal_width species
0  30           4.8          3.1           1.6          0.2  setosa
1  31           5.4          3.4           1.5          0.4  setosa
2  32           5.2          4.1           1.5          0.1  setosa
3  33           5.5          4.2           1.4          0.2  setosa
4  34           4.9          3.1           1.5          0.1  setosa
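
Before the frame is uploaded, it can be worth checking that the target and ID columns exist and that the IDs are unique. The following is a minimal sketch and not part of hana_automl: the helper name check_training_frame is hypothetical, and the sample frame simply repeats the five rows printed above.

```python
import pandas as pd


def check_training_frame(df: pd.DataFrame, target: str, id_column: str) -> dict:
    """Run basic sanity checks and return a short summary (hypothetical helper)."""
    missing = [c for c in (target, id_column) if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    if df[id_column].duplicated().any():
        raise ValueError(f"Column {id_column!r} contains duplicate values")
    return {
        "rows": len(df),
        "classes": df[target].value_counts().to_dict(),
        "null_cells": int(df.isna().sum().sum()),
    }


# The five rows printed by df.head() above
sample = pd.DataFrame({
    "ID": [30, 31, 32, 33, 34],
    "sepal_length": [4.8, 5.4, 5.2, 5.5, 4.9],
    "sepal_width": [3.1, 3.4, 4.1, 4.2, 3.1],
    "petal_length": [1.6, 1.5, 1.5, 1.4, 1.5],
    "petal_width": [0.2, 0.4, 0.1, 0.2, 0.1],
    "species": ["setosa"] * 5,
})
summary = check_training_frame(sample, target="species", id_column="ID")
print(summary)
```

On the full dataset the same call would be check_training_frame(df, target='species', id_column='ID').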

Pass your database credentials to ConnectionContext.

[14]:
# Replace with your credentials.
# Port 39015 is the default for most databases. Details here:
# https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.03/en-US/b250e7fef8614ea0a0973d58eb73bda8.html
cc = ConnectionContext(address='address',
                       port=39015,
                       user='user',
                       password='password')
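
To keep credentials out of the notebook, one option is to read them from environment variables. This is only a sketch: the helper connection_kwargs_from_env and the variable names HANA_ADDRESS, HANA_PORT, HANA_USER and HANA_PASSWORD are assumptions made for illustration, not a hana_ml convention.

```python
import os


def connection_kwargs_from_env(prefix: str = "HANA_") -> dict:
    """Build ConnectionContext keyword arguments from environment variables.

    The variable names are an assumption of this sketch.
    """
    required = ("ADDRESS", "USER", "PASSWORD")
    missing = [prefix + name for name in required
               if not os.environ.get(prefix + name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "address": os.environ[prefix + "ADDRESS"],
        # Fall back to the port used in the cell above
        "port": int(os.environ.get(prefix + "PORT", "39015")),
        "user": os.environ[prefix + "USER"],
        "password": os.environ[prefix + "PASSWORD"],
    }
```

The result can then be unpacked into the constructor: cc = ConnectionContext(**connection_kwargs_from_env()).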
[15]:
automl = AutoML(connection_context=cc)
[ ]:
automl.fit(
    df=df,
    task='cls', # if task is None, the task type is determined automatically
    steps=10,
    target='species',
    table_name='CLASSIFICATION', # optional
    categorical_features=['species'],
    id_column='ID', # optional
    verbose=False
)

Save model

[17]:
storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "iris" # don't forget to specify the name
storage.save_model(automl=automl)
storage.list_models()
[17]:
NAME VERSION LIBRARY CLASS JSON TIMESTAMP MODEL_STORAGE_VER
0 iris 1 PAL hana_ml.algorithms.pal.neural_network.MLPClass... {"model_attributes": {"activation": "sin_asymm... 2021-05-29 17:33:15 1

Load model and predict

[18]:
new_model = storage.load_model('iris')
new_model.predict(df=test_df, id_column='ID')
Creating table with name: AUTOML2af7880c-467f-437c-b3be-b1c519a7678e
100%|██████████| 1/1 [00:00<00:00,  6.17it/s]
Preprocessor settings: mean
Prediction results (first 20 rows):
     ID  TARGET     VALUE
0    0  setosa  0.577740
1    1  setosa  0.580586
2    2  setosa  0.580305
3    3  setosa  0.580340
4    4  setosa  0.576891
5    5  setosa  0.578495
6    6  setosa  0.580451
7    7  setosa  0.579999
8    8  setosa  0.579468
9    9  setosa  0.580387
10  10  setosa  0.574936
11  11  setosa  0.580524
12  12  setosa  0.580249
13  13  setosa  0.579009
14  14  setosa  0.542972
15  15  setosa  0.562199
16  16  setosa  0.574057
17  17  setosa  0.579634
18  18  setosa  0.577878
19  19  setosa  0.577438
[18]:
    ID  TARGET     VALUE
0    0  setosa  0.577740
1    1  setosa  0.580586
2    2  setosa  0.580305
3    3  setosa  0.580340
4    4  setosa  0.576891
5    5  setosa  0.578495
6    6  setosa  0.580451
7    7  setosa  0.579999
8    8  setosa  0.579468
9    9  setosa  0.580387
10  10  setosa  0.574936
11  11  setosa  0.580524
12  12  setosa  0.580249
13  13  setosa  0.579009
14  14  setosa  0.542972
15  15  setosa  0.562199
16  16  setosa  0.574057
17  17  setosa  0.579634
18  18  setosa  0.577878
19  19  setosa  0.577438
20  20  setosa  0.580425
21  21  setosa  0.579509
22  22  setosa  0.569415
23  23  setosa  0.570043
24  24  setosa  0.578701
25  25  setosa  0.579957
26  26  setosa  0.578694
27  27  setosa  0.578688
28  28  setosa  0.578416
29  29  setosa  0.580325
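
If true labels are available for the test rows, the predictions can be scored by matching them on ID. The sketch below uses plain dictionaries. The accuracy helper is hypothetical, the predicted labels are the first five rows of the output above, and the true labels are invented purely for illustration.

```python
def accuracy(predicted: dict, actual: dict) -> float:
    """Share of IDs present in both mappings whose labels agree."""
    shared = predicted.keys() & actual.keys()
    if not shared:
        raise ValueError("No IDs in common between predictions and labels")
    correct = sum(predicted[i] == actual[i] for i in shared)
    return correct / len(shared)


# First five predictions from the output above (ID -> TARGET)
predicted = {0: "setosa", 1: "setosa", 2: "setosa", 3: "setosa", 4: "setosa"}
# Hypothetical true labels, invented for this sketch only
actual = {0: "setosa", 1: "setosa", 2: "versicolor", 3: "setosa", 4: "virginica"}

score = accuracy(predicted, actual)
print(f"accuracy: {score:.2f}")
```

Assuming predict returns a pandas DataFrame with the columns shown above, the first mapping could be built with dict(zip(result['ID'], result['TARGET'])), where result is a hypothetical name for the returned frame.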

Clean up storage

[19]:
storage.clean_up()

For more information, see the AutoML class and Storage class pages in the documentation.