Classification
In this example, we build a model that classifies iris flowers by species.
Uncomment the lines below to install the required modules.
[11]:
# !pip3 install Cython
# !pip3 install hana_automl
[12]:
import sys  # needed for sys.exit below

try:
    from hana_automl.automl import AutoML
    import pandas as pd
    from hana_ml.dataframe import ConnectionContext
    from hana_automl.storage import Storage
except ImportError:
    sys.exit("You need to install hana_automl and pandas. Uncomment the cell above.")
[13]:
test_df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/test_iris.csv', index_col='Unnamed: 0')
df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/iris.csv', index_col='Unnamed: 0')
df.head()
[13]:
 | ID | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|---|
0 | 30 | 4.8 | 3.1 | 1.6 | 0.2 | setosa |
1 | 31 | 5.4 | 3.4 | 1.5 | 0.4 | setosa |
2 | 32 | 5.2 | 4.1 | 1.5 | 0.1 | setosa |
3 | 33 | 5.5 | 4.2 | 1.4 | 0.2 | setosa |
4 | 34 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
Pass credentials to the database.
[14]:
# Replace with your credentials
cc = ConnectionContext(address='address',
                       port=39015,  # default for most databases. Details here: https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.03/en-US/b250e7fef8614ea0a0973d58eb73bda8.html
                       user='user',
                       password='password')
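To avoid hard-coding credentials in a notebook, the connection arguments could be read from environment variables instead. The following is a minimal sketch, not part of hana_automl or hana_ml: the helper `hana_connection_kwargs` and the variable names `HANA_ADDRESS`, `HANA_PORT`, `HANA_USER`, and `HANA_PASSWORD` are hypothetical and chosen only for illustration.

```python
import os


def hana_connection_kwargs(env=os.environ):
    """Collect ConnectionContext keyword arguments from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    return {
        "address": env["HANA_ADDRESS"],
        # Fall back to the port used in the cell above when HANA_PORT is unset.
        "port": int(env.get("HANA_PORT", "39015")),
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Usage in the notebook, once the variables are set:
# cc = ConnectionContext(**hana_connection_kwargs())
```

A missing required variable raises `KeyError`, which surfaces a configuration problem before any connection attempt is made.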
[15]:
automl = AutoML(connection_context=cc)
[ ]:
automl.fit(
    df=df,
    task='cls',  # if task = None, we'll determine it for you
    steps=10,
    target='species',
    table_name='CLASSIFICATION',  # optional
    categorical_features=['species'],
    id_column='ID',  # optional
    verbose=False
)
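The comment on `task` says the library determines the task when it is left as `None`. The source does not describe how that is done, so the sketch below is only one plausible heuristic and should not be read as hana_automl's actual logic. The function `guess_task`, the `max_classes` threshold, and the `'reg'` label for regression are assumptions made for illustration; only `'cls'` appears in the call above.

```python
def guess_task(target_values, max_classes=20):
    """Guess 'cls' or 'reg' from the values of a target column.

    Hypothetical heuristic, not the library's implementation.
    """
    values = list(target_values)
    # Non-numeric targets (for example species names) suggest classification.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        return "cls"
    # A small set of whole-number labels also looks like class ids.
    if len(set(values)) <= max_classes and all(float(v).is_integer() for v in values):
        return "cls"
    return "reg"


print(guess_task(["setosa", "versicolor", "setosa"]))
print(guess_task([4.8, 5.4, 5.2]))
```

Passing `task` explicitly, as the cell above does, avoids relying on any such guess.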
Save model
[17]:
storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "iris" # don't forget to specify the name
storage.save_model(automl=automl)
storage.list_models()
[17]:
 | NAME | VERSION | LIBRARY | CLASS | JSON | TIMESTAMP | MODEL_STORAGE_VER |
---|---|---|---|---|---|---|---|
0 | iris | 1 | PAL | hana_ml.algorithms.pal.neural_network.MLPClass... | {"model_attributes": {"activation": "sin_asymm... | 2021-05-29 17:33:15 | 1 |
Load model and predict
[18]:
new_model = storage.load_model('iris')
new_model.predict(df=test_df, id_column='ID')
Creating table with name: AUTOML2af7880c-467f-437c-b3be-b1c519a7678e
100%|██████████| 1/1 [00:00<00:00, 6.17it/s]
Preprocessor settings: mean
Prediction results (first 20 rows):
ID TARGET VALUE
0 0 setosa 0.577740
1 1 setosa 0.580586
2 2 setosa 0.580305
3 3 setosa 0.580340
4 4 setosa 0.576891
5 5 setosa 0.578495
6 6 setosa 0.580451
7 7 setosa 0.579999
8 8 setosa 0.579468
9 9 setosa 0.580387
10 10 setosa 0.574936
11 11 setosa 0.580524
12 12 setosa 0.580249
13 13 setosa 0.579009
14 14 setosa 0.542972
15 15 setosa 0.562199
16 16 setosa 0.574057
17 17 setosa 0.579634
18 18 setosa 0.577878
19 19 setosa 0.577438
[18]:
 | ID | TARGET | VALUE |
---|---|---|---|
0 | 0 | setosa | 0.577740 |
1 | 1 | setosa | 0.580586 |
2 | 2 | setosa | 0.580305 |
3 | 3 | setosa | 0.580340 |
4 | 4 | setosa | 0.576891 |
5 | 5 | setosa | 0.578495 |
6 | 6 | setosa | 0.580451 |
7 | 7 | setosa | 0.579999 |
8 | 8 | setosa | 0.579468 |
9 | 9 | setosa | 0.580387 |
10 | 10 | setosa | 0.574936 |
11 | 11 | setosa | 0.580524 |
12 | 12 | setosa | 0.580249 |
13 | 13 | setosa | 0.579009 |
14 | 14 | setosa | 0.542972 |
15 | 15 | setosa | 0.562199 |
16 | 16 | setosa | 0.574057 |
17 | 17 | setosa | 0.579634 |
18 | 18 | setosa | 0.577878 |
19 | 19 | setosa | 0.577438 |
20 | 20 | setosa | 0.580425 |
21 | 21 | setosa | 0.579509 |
22 | 22 | setosa | 0.569415 |
23 | 23 | setosa | 0.570043 |
24 | 24 | setosa | 0.578701 |
25 | 25 | setosa | 0.579957 |
26 | 26 | setosa | 0.578694 |
27 | 27 | setosa | 0.578688 |
28 | 28 | setosa | 0.578416 |
29 | 29 | setosa | 0.580325 |
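As a quick sanity check on output like the table above, one might count the predicted classes and flag rows with a comparatively low VALUE. This is a minimal sketch under two assumptions: that VALUE is the model's confidence score for the predicted class (the source does not say so), and that the result can be handled as a pandas DataFrame. The four rows are copied by hand from the output above, and the 0.57 cutoff is an arbitrary illustration.

```python
import pandas as pd

# Four rows copied from the prediction output above (IDs 0, 1, 14, 15).
predictions = pd.DataFrame(
    {
        "ID": [0, 1, 14, 15],
        "TARGET": ["setosa", "setosa", "setosa", "setosa"],
        "VALUE": [0.577740, 0.580586, 0.542972, 0.562199],
    }
)

# How many rows were assigned to each class.
class_counts = predictions["TARGET"].value_counts().to_dict()

# IDs whose VALUE falls below an arbitrary, illustrative cutoff.
cutoff = 0.57
low_value_ids = predictions.loc[predictions["VALUE"] < cutoff, "ID"].tolist()

print(class_counts)
print(low_value_ids)
```

If every row lands in a single class, as it does in this sample, it is worth comparing against the true labels before trusting the model.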
Clean up storage
[19]:
storage.clean_up()
For more information, see the AutoML class and Storage class pages in the documentation.