Regression¶
In this example we build a model that predicts house prices on the Boston housing dataset.
Install modules
[1]:
# !pip3 install Cython
# !pip3 install hana_automl
[2]:
import sys

try:
    from hana_automl.automl import AutoML
    import pandas as pd
    from hana_ml.dataframe import ConnectionContext
    from hana_automl.storage import Storage
except ImportError:
    sys.exit("You need to install hana_automl and pandas. Uncomment the cell above.")
Let’s get acquainted with the dataset
[3]:
test_df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_test_data.csv')
df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_data.csv')
df.head()
[3]:
| | ID | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.15876 | 0.0 | 10.81 | 0.0 | 0.413 | 5.961 | 17.5 | 5.2873 | 4.0 | 305.0 | 19.2 | 376.94 | 9.88 | 21.7 |
| 1 | 1 | 0.10328 | 25.0 | 5.13 | 0.0 | 0.453 | 5.927 | 47.2 | 6.9320 | 8.0 | 284.0 | 19.7 | 396.90 | 9.22 | 19.6 |
| 2 | 2 | 0.34940 | 0.0 | 9.90 | 0.0 | 0.544 | 5.972 | 76.7 | 3.1025 | 4.0 | 304.0 | 18.4 | 396.24 | 9.97 | 20.3 |
| 3 | 3 | 2.73397 | 0.0 | 19.58 | 0.0 | 0.871 | 5.597 | 94.9 | 1.5257 | 5.0 | 403.0 | 14.7 | 351.85 | 21.45 | 15.4 |
| 4 | 4 | 0.04337 | 21.0 | 5.64 | 0.0 | 0.439 | 6.115 | 63.0 | 6.8147 | 4.0 | 243.0 | 16.8 | 393.97 | 9.43 | 20.5 |
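Before training, it can help to sanity-check the data with standard pandas calls. A minimal sketch on a tiny hand-made sample shaped like the preview above (values copied from the table, not a new download); in practice you would run the same checks on the full `df`:

```python
import pandas as pd

# Tiny illustrative sample with a few columns from the Boston CSV preview above
sample = pd.DataFrame({
    "ID": [0, 1],
    "crim": [0.15876, 0.10328],
    "medv": [21.7, 19.6],  # target column: median home value
})

# Basic sanity checks: shape, missing values, target summary
print(sample.shape)                       # (2, 3)
print(sample.isna().sum().sum())          # 0 missing values in this sample
print(round(sample["medv"].mean(), 2))    # 20.65
```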
Pass credentials to the database.
[4]:
# Replace with your credentials
cc = ConnectionContext(
    address='address',
    port=39015,  # default for most databases. Details here: https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.03/en-US/b250e7fef8614ea0a0973d58eb73bda8.html
    user='user',
    password='password'
)
[5]:
automl = AutoML(connection_context=cc)
[ ]:
automl.fit(
    df=df,
    task=None,  # the library will try to determine the task automatically
    steps=10,
    target='medv',
    table_name='REGRESSION',  # optional
    id_column='ID',  # pass None if the dataset has no ID column
    verbose=1
)
Save model
[ ]:
storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "boston" # don't forget to specify the name
storage.save_model(automl=automl)
[11]:
storage.list_models()
[11]:
| | NAME | VERSION | LIBRARY | CLASS | JSON | TIMESTAMP | MODEL_STORAGE_VER |
|---|---|---|---|---|---|---|---|
| 0 | boston | 1 | PAL | hana_ml.algorithms.pal.trees.HybridGradientBoo... | {"model_attributes": {"n_estimators": 541, "ra... | 2021-05-29 17:19:09 | 1 |
Load model and predict
[12]:
new_model = storage.load_model('boston', version=1)
new_model.predict(df=test_df)
Creating table with name: AUTOML6b526c36-6c5e-459b-b5ec-92cf44f78b15
100%|██████████| 1/1 [00:00<00:00, 6.94it/s]
Preprocessor settings: median
Prediction results (first 20 rows):
ID SCORE CONFIDENCE
0 1 35.29534448792848 None
1 2 22.489733948983936 None
2 3 13.713093051897628 None
3 4 24.172307331613972 None
4 5 20.248966896747383 None
5 6 23.39085017433158 None
6 7 20.549891594531058 None
7 8 16.66822914049735 None
8 9 18.383664287056316 None
9 10 46.003539073294455 None
10 11 40.11289274119828 None
11 12 11.69618986227771 None
12 13 12.88755355977079 None
13 14 35.486779348478116 None
14 15 20.358130558843374 None
15 16 9.478611548228256 None
16 17 20.84342879503455 None
17 18 20.269377197301793 None
18 19 20.410181244909623 None
19 20 10.344656992114738 None
[12]:
| | ID | SCORE | CONFIDENCE |
|---|---|---|---|
| 0 | 1 | 35.29534448792848 | None |
| 1 | 2 | 22.489733948983936 | None |
| 2 | 3 | 13.713093051897628 | None |
| 3 | 4 | 24.172307331613972 | None |
| 4 | 5 | 20.248966896747383 | None |
| ... | ... | ... | ... |
| 97 | 98 | 23.80490001240718 | None |
| 98 | 99 | 22.24186449851454 | None |
| 99 | 100 | 18.38892297314412 | None |
| 100 | 101 | 38.51662147490824 | None |
| 101 | 102 | 46.306857840422325 | None |
102 rows × 3 columns
Clean up storage
[ ]:
storage.clean_up()
For more information, see the AutoML and Storage classes in the documentation.