Classification

In this example we are building a model that classifies iris flowers.

Uncomment the lines below to install the required modules.

[11]:

# !pip3 install Cython
# !pip3 install hana_automl

[12]:

import sys  # needed for sys.exit in the ImportError handler below

try:
    from hana_automl.automl import AutoML
    import pandas as pd
    from hana_ml.dataframe import ConnectionContext
    from hana_automl.storage import Storage
except ImportError:
    sys.exit("You need to install hana_automl and pandas. Uncomment the cell above.")

[13]:

test_df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/test_iris.csv', index_col='Unnamed: 0')
df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/iris.csv', index_col='Unnamed: 0')
df.head()

[13]:

	ID	sepal_length	sepal_width	petal_length	petal_width	species
0	30	4.8	3.1	1.6	0.2	setosa
1	31	5.4	3.4	1.5	0.4	setosa
2	32	5.2	4.1	1.5	0.1	setosa
3	33	5.5	4.2	1.4	0.2	setosa
4	34	4.9	3.1	1.5	0.1	setosa

Pass your database credentials to create a connection.

[14]:

# Replace with your credentials
cc = ConnectionContext(address='address',
                       port=39015, # default for most databases. Details here: https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.03/en-US/b250e7fef8614ea0a0973d58eb73bda8.html
                       user='user',
                       password='password')
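
Hardcoding credentials in a notebook makes them easy to leak. A minimal sketch of one alternative, assuming the connection details are exported as environment variables. The variable names `HANA_ADDRESS`, `HANA_PORT`, `HANA_USER`, and `HANA_PASSWORD` and the helper `load_hana_credentials` are hypothetical and not part of hana_automl or hana_ml:

```python
import os


def load_hana_credentials(env=None):
    """Collect ConnectionContext keyword arguments from environment variables.

    The HANA_* variable names are hypothetical; use whatever your setup defines.
    """
    env = os.environ if env is None else env
    required = ("HANA_ADDRESS", "HANA_USER", "HANA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env.get("HANA_PORT", "39015")),  # same port as in the cell above
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Usage (requires hana_ml and a reachable database):
# cc = ConnectionContext(**load_hana_credentials())
```

The helper only builds the keyword arguments, so it can be checked without a database connection.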

[15]:

automl = AutoML(connection_context=cc)

[ ]:

automl.fit(
    df=df,
    task='cls', # if task = None, we'll determine it for you
    steps=10,
    target='species',
    table_name='CLASSIFICATION', # optional
    categorical_features=['species'],
    id_column='ID', # optional
    verbose=False
)
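
Because `target`, `id_column`, and `categorical_features` are passed as column names, a typo may only surface once `fit` is already running. A small sketch of a check that can be run first. `check_fit_columns` is a hypothetical helper, not part of hana_automl:

```python
def check_fit_columns(columns, target, id_column=None, categorical_features=None):
    """Raise ValueError if any column name meant for fit is absent from `columns`."""
    requested = [target]
    if id_column is not None:
        requested.append(id_column)
    requested.extend(categorical_features or [])
    available = set(columns)
    missing = sorted({name for name in requested if name not in available})
    if missing:
        raise ValueError(f"Columns not found in the dataframe: {missing}")


# Column names as shown by df.head() above; with pandas, pass df.columns instead.
iris_columns = ["ID", "sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
check_fit_columns(
    iris_columns,
    target="species",
    id_column="ID",
    categorical_features=["species"],
)
```

The call returns silently when every name is present and raises `ValueError` listing the missing names otherwise.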

Save model

[17]:

storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "iris" # don't forget to specify the name
storage.save_model(automl=automl)
storage.list_models()

[17]:

	NAME	VERSION	LIBRARY	CLASS	JSON	TIMESTAMP	MODEL_STORAGE_VER
0	iris	1	PAL	hana_ml.algorithms.pal.neural_network.MLPClass...	{"model_attributes": {"activation": "sin_asymm...	2021-05-29 17:33:15	1

Load model and predict

[18]:

new_model = storage.load_model('iris')
new_model.predict(df=test_df, id_column='ID')

Creating table with name: AUTOML2af7880c-467f-437c-b3be-b1c519a7678e

100%|██████████| 1/1 [00:00<00:00,  6.17it/s]

Preprocessor settings: mean
Prediction results (first 20 rows):
     ID  TARGET     VALUE
0    0  setosa  0.577740
1    1  setosa  0.580586
2    2  setosa  0.580305
3    3  setosa  0.580340
4    4  setosa  0.576891
5    5  setosa  0.578495
6    6  setosa  0.580451
7    7  setosa  0.579999
8    8  setosa  0.579468
9    9  setosa  0.580387
10  10  setosa  0.574936
11  11  setosa  0.580524
12  12  setosa  0.580249
13  13  setosa  0.579009
14  14  setosa  0.542972
15  15  setosa  0.562199
16  16  setosa  0.574057
17  17  setosa  0.579634
18  18  setosa  0.577878
19  19  setosa  0.577438

[18]:

	ID	TARGET	VALUE
0	0	setosa	0.577740
1	1	setosa	0.580586
2	2	setosa	0.580305
3	3	setosa	0.580340
4	4	setosa	0.576891
5	5	setosa	0.578495
6	6	setosa	0.580451
7	7	setosa	0.579999
8	8	setosa	0.579468
9	9	setosa	0.580387
10	10	setosa	0.574936
11	11	setosa	0.580524
12	12	setosa	0.580249
13	13	setosa	0.579009
14	14	setosa	0.542972
15	15	setosa	0.562199
16	16	setosa	0.574057
17	17	setosa	0.579634
18	18	setosa	0.577878
19	19	setosa	0.577438
20	20	setosa	0.580425
21	21	setosa	0.579509
22	22	setosa	0.569415
23	23	setosa	0.570043
24	24	setosa	0.578701
25	25	setosa	0.579957
26	26	setosa	0.578694
27	27	setosa	0.578688
28	28	setosa	0.578416
29	29	setosa	0.580325
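
Every row in the output above is labelled `setosa` with a `VALUE` close to 0.58, so a quick tally of the predicted classes can help when inspecting the result. A minimal sketch using only the standard library. `summarize_predictions` is a hypothetical helper, and the sample rows are copied from the first three rows of the output above. If the result is a pandas DataFrame (an assumption based on how it is displayed), `to_dict("records")` would produce rows in this shape.

```python
from collections import Counter


def summarize_predictions(rows):
    """Return (count per predicted class, lowest VALUE) for predict-style rows."""
    counts = Counter(row["TARGET"] for row in rows)
    lowest = min(row["VALUE"] for row in rows)
    return counts, lowest


# First three rows of the prediction output shown above.
sample_rows = [
    {"ID": 0, "TARGET": "setosa", "VALUE": 0.577740},
    {"ID": 1, "TARGET": "setosa", "VALUE": 0.580586},
    {"ID": 2, "TARGET": "setosa", "VALUE": 0.580305},
]
counts, lowest = summarize_predictions(sample_rows)
print(counts, lowest)
```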

Clean up storage

[19]:

storage.clean_up()

For more information, see the AutoML class and Storage class pages in the documentation.