Regression

In this example, we build a model that predicts house prices in Boston.

Install modules

[1]:
# !pip3 install Cython
# !pip3 install hana_automl
[2]:
import sys  # needed for sys.exit below

try:
    from hana_automl.automl import AutoML
    import pandas as pd
    from hana_ml.dataframe import ConnectionContext
    from hana_automl.storage import Storage
except ImportError:
    sys.exit("You need to install hana_automl and pandas. Uncomment the cell above.")

Let’s take a look at the dataset

[3]:
test_df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_test_data.csv')
df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_data.csv')
df.head()
[3]:
   ID     crim    zn  indus  chas    nox     rm   age     dis  rad    tax  ptratio   black  lstat  medv
0   0  0.15876   0.0  10.81   0.0  0.413  5.961  17.5  5.2873  4.0  305.0     19.2  376.94   9.88  21.7
1   1  0.10328  25.0   5.13   0.0  0.453  5.927  47.2  6.9320  8.0  284.0     19.7  396.90   9.22  19.6
2   2  0.34940   0.0   9.90   0.0  0.544  5.972  76.7  3.1025  4.0  304.0     18.4  396.24   9.97  20.3
3   3  2.73397   0.0  19.58   0.0  0.871  5.597  94.9  1.5257  5.0  403.0     14.7  351.85  21.45  15.4
4   4  0.04337  21.0   5.64   0.0  0.439  6.115  63.0  6.8147  4.0  243.0     16.8  393.97   9.43  20.5
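Before handing the data to AutoML, it can help to confirm that the target and ID columns are present and clean. The sketch below is only an illustration: `check_training_frame` is a hypothetical helper, not part of hana_automl, and the sample frame copies a few columns from the five rows printed above.

```python
import pandas as pd

# A few columns from the five rows printed by df.head() above.
sample = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "crim": [0.15876, 0.10328, 0.34940, 2.73397, 0.04337],
        "rm": [5.961, 5.927, 5.972, 5.597, 6.115],
        "lstat": [9.88, 9.22, 9.97, 21.45, 9.43],
        "medv": [21.7, 19.6, 20.3, 15.4, 20.5],
    }
)


def check_training_frame(frame, target, id_column):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    for column in (target, id_column):
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
    if id_column in frame.columns and not frame[id_column].is_unique:
        problems.append(f"duplicate values in {id_column}")
    if target in frame.columns and frame[target].isna().any():
        problems.append(f"missing values in {target}")
    return problems


problems = check_training_frame(sample, target="medv", id_column="ID")
print(problems)
```

With the full dataset, the same call would be `check_training_frame(df, target="medv", id_column="ID")`.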

Connect to the database with your credentials.

[4]:
# Replace with your credentials
cc = ConnectionContext(address='address',
                       port=39015, # default for most databases. Details here: https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.03/en-US/b250e7fef8614ea0a0973d58eb73bda8.html
                       user='user',
                       password='password')
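To avoid hardcoding credentials in a notebook, one option is to read them from environment variables. This is a minimal sketch: the helper `hana_connection_kwargs` and the variable names `HANA_ADDRESS`, `HANA_PORT`, `HANA_USER` and `HANA_PASSWORD` are assumptions chosen for illustration, not something hana_ml or hana_automl defines.

```python
import os


def hana_connection_kwargs(env=os.environ):
    """Build keyword arguments for ConnectionContext from environment variables.

    The variable names used here are hypothetical; rename them to suit your setup.
    """
    required = ("HANA_ADDRESS", "HANA_USER", "HANA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "address": env["HANA_ADDRESS"],
        # Fall back to the port used in the example above.
        "port": int(env.get("HANA_PORT", "39015")),
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Usage (requires hana_ml and a reachable database):
# from hana_ml.dataframe import ConnectionContext
# cc = ConnectionContext(**hana_connection_kwargs())
```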
[5]:
automl = AutoML(connection_context=cc)
[ ]:
automl.fit(
    df=df,
    task=None, # if None, the library will try to determine the task
    steps=10,
    target='medv',
    table_name='REGRESSION', # optional
    id_column='ID', # pass None if the dataset has no ID column
    verbose=1
)
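With `task=None`, the library decides between regression and classification on its own. How it does so is not described here, so the sketch below is not hana_automl's logic. It only illustrates one common heuristic: treat a target with fractional values or many distinct values as regression. The function `guess_task`, its `max_classes` threshold and the returned labels are all hypothetical.

```python
import pandas as pd


def guess_task(target_values, max_classes=10):
    """Hypothetical heuristic, not the library's implementation."""
    series = pd.Series(target_values).dropna()
    # Floats with a fractional part cannot be class labels.
    if pd.api.types.is_float_dtype(series) and not (series % 1 == 0).all():
        return "regression"
    # Otherwise, a small number of distinct values suggests class labels.
    return "classification" if series.nunique() <= max_classes else "regression"


# The medv values from the first five rows of the dataset.
print(guess_task([21.7, 19.6, 20.3, 15.4, 20.5]))
```

If the automatic choice is ever wrong for your data, passing the task explicitly to `fit` avoids the guesswork.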

Save model

[ ]:
storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "boston" # don't forget to specify the name
storage.save_model(automl=automl)
[11]:
storage.list_models()
[11]:
NAME VERSION LIBRARY CLASS JSON TIMESTAMP MODEL_STORAGE_VER
0 boston 1 PAL hana_ml.algorithms.pal.trees.HybridGradientBoo... {"model_attributes": {"n_estimators": 541, "ra... 2021-05-29 17:19:09 1

Load model and predict

[12]:
new_model = storage.load_model('boston', version=1)
new_model.predict(df=test_df)
Creating table with name: AUTOML6b526c36-6c5e-459b-b5ec-92cf44f78b15
100%|██████████| 1/1 [00:00<00:00,  6.94it/s]
Preprocessor settings: median
Prediction results (first 20 rows):
     ID               SCORE CONFIDENCE
0    1   35.29534448792848       None
1    2  22.489733948983936       None
2    3  13.713093051897628       None
3    4  24.172307331613972       None
4    5  20.248966896747383       None
5    6   23.39085017433158       None
6    7  20.549891594531058       None
7    8   16.66822914049735       None
8    9  18.383664287056316       None
9   10  46.003539073294455       None
10  11   40.11289274119828       None
11  12   11.69618986227771       None
12  13   12.88755355977079       None
13  14  35.486779348478116       None
14  15  20.358130558843374       None
15  16   9.478611548228256       None
16  17   20.84342879503455       None
17  18  20.269377197301793       None
18  19  20.410181244909623       None
19  20  10.344656992114738       None
[12]:
      ID               SCORE  CONFIDENCE
0      1   35.29534448792848        None
1      2  22.489733948983936        None
2      3  13.713093051897628        None
3      4  24.172307331613972        None
4      5  20.248966896747383        None
..   ...                 ...         ...
97    98   23.80490001240718        None
98    99   22.24186449851454        None
99   100   18.38892297314412        None
100  101   38.51662147490824        None
101  102  46.306857840422325        None

102 rows × 3 columns
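The returned frame can be tidied for reporting. The sketch below rebuilds the first five predictions shown above and assumes that `SCORE` may arrive as strings, which is why it goes through `pd.to_numeric`. The column name `medv_predicted` is made up for this example.

```python
import pandas as pd

# First five predictions copied from the output above.
predictions = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "SCORE": [
            "35.29534448792848",
            "22.489733948983936",
            "13.713093051897628",
            "24.172307331613972",
            "20.248966896747383",
        ],
        "CONFIDENCE": [None] * 5,
    }
)

tidy = (
    predictions
    .drop(columns=["CONFIDENCE"])  # empty for this regression output
    .assign(medv_predicted=lambda f: pd.to_numeric(f["SCORE"]).round(2))
    .drop(columns=["SCORE"])
)
print(tidy)
```

In the Boston dataset, `medv` is the median home value in thousands of dollars, so the predictions are in the same unit.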

Clean up storage

[ ]:
storage.clean_up()

For more information, see the AutoML class and the Storage class in the documentation.