Regression

In this example we build a model that predicts house prices in Boston.

Install modules

[1]:

# !pip3 install Cython
# !pip3 install hana_automl

[2]:

import sys  # needed for sys.exit below

try:
    from hana_automl.automl import AutoML
    import pandas as pd
    from hana_ml.dataframe import ConnectionContext
    from hana_automl.storage import Storage
except ImportError:
    sys.exit("You need to install hana_automl and pandas. Uncomment the cell above.")
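As an optional pre-check, the sketch below reports which packages cannot be found before the imports are attempted. The helper name `find_missing` is hypothetical and not part of hana_automl; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def find_missing(packages):
    """Return the top-level package names that cannot be located."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Top-level package names used in this example.
missing = find_missing(["hana_automl", "hana_ml", "pandas"])
if missing:
    print("Missing packages:", ", ".join(missing))
```

An empty list means every package was found, so the import cell should succeed.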

Let’s take a look at the dataset

[3]:

test_df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_test_data.csv')
df = pd.read_csv('https://raw.githubusercontent.com/dan0nchik/SAP-HANA-AutoML/dev/docs/source/datasets/boston_data.csv')
df.head()

[3]:

	ID	crim	zn	indus	nox	rm	age	dis	rad	tax	ptratio	black	lstat	medv
0	0	0.15876	0.0	10.81	0.413	5.961	17.5	5.2873	4.0	305.0	19.2	376.94	9.88	21.7
1	1	0.10328	25.0	5.13	0.453	5.927	47.2	6.9320	8.0	284.0	19.7	396.90	9.22	19.6
2	2	0.34940	0.0	9.90	0.544	5.972	76.7	3.1025	4.0	304.0	18.4	396.24	9.97	20.3
3	3	2.73397	0.0	19.58	0.871	5.597	94.9	1.5257	5.0	403.0	14.7	351.85	21.45	15.4
4	4	0.04337	21.0	5.64	0.439	6.115	63.0	6.8147	4.0	243.0	16.8	393.97	9.43	20.5
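To make the column roles concrete, the sketch below splits the five rows shown above into the row identifier (`ID`), the target (`medv`), and the remaining feature columns. It uses only the standard library so it runs without a download; with the real DataFrame, `df.drop(columns=['ID', 'medv'])` would be one way to get the same feature set.

```python
import csv
import io

# The five rows printed by df.head() above, re-typed as CSV text.
SAMPLE = """ID,crim,zn,indus,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv
0,0.15876,0.0,10.81,0.413,5.961,17.5,5.2873,4.0,305.0,19.2,376.94,9.88,21.7
1,0.10328,25.0,5.13,0.453,5.927,47.2,6.9320,8.0,284.0,19.7,396.90,9.22,19.6
2,0.34940,0.0,9.90,0.544,5.972,76.7,3.1025,4.0,304.0,18.4,396.24,9.97,20.3
3,2.73397,0.0,19.58,0.871,5.597,94.9,1.5257,5.0,403.0,14.7,351.85,21.45,15.4
4,0.04337,21.0,5.64,0.439,6.115,63.0,6.8147,4.0,243.0,16.8,393.97,9.43,20.5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Everything except the row ID and the target is a feature.
feature_names = [c for c in rows[0] if c not in ("ID", "medv")]
targets = [float(r["medv"]) for r in rows]
mean_target = sum(targets) / len(targets)

print(len(feature_names), "features:", feature_names)
print("mean medv of the sample rows:", round(mean_target, 2))
```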

Pass credentials to the database.

[4]:

# Replace with your credentials
cc = ConnectionContext(address='address',
                       port=39015, # default for most databases. Details here: https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.03/en-US/b250e7fef8614ea0a0973d58eb73bda8.html
                       user='user',
                       password='password')
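Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the connection arguments could be read from environment variables instead. The helper `hana_credentials` and the variable names `HANA_ADDRESS`, `HANA_PORT`, `HANA_USER`, and `HANA_PASSWORD` are assumptions made for illustration, not something hana_ml or hana_automl defines.

```python
import os


def hana_credentials(env=os.environ):
    """Collect ConnectionContext keyword arguments from environment variables.

    The variable names used here are hypothetical; pick whatever fits
    your setup.
    """
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env.get("HANA_PORT", "39015")),  # same default as above
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Usage (requires the variables to be set and hana_ml to be installed):
# cc = ConnectionContext(**hana_credentials())
```

A missing variable raises `KeyError` immediately, which is usually easier to diagnose than a failed database login.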

[5]:

automl = AutoML(connection_context=cc)

[ ]:

automl.fit(
    df=df,
    task=None, # library will try to determine task
    steps=10,
    target='medv',
    table_name='REGRESSION', # optional
    id_column='ID', # pass None if no ID column in dataset
    verbose=1
)
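With `task=None` the library decides between regression and classification on its own. How it does so is not described here; the sketch below is only one plausible heuristic, written to illustrate the idea, and the function `guess_task` and its return labels are hypothetical rather than hana_automl's actual logic.

```python
def guess_task(target_values, max_classes=10):
    """Guess the task type from a target column (illustrative heuristic only).

    Non-numeric targets are treated as class labels. Numeric targets count
    as regression when they contain fractional values or more than
    `max_classes` distinct values.
    """
    try:
        numeric = [float(v) for v in target_values]
    except (TypeError, ValueError):
        return "classification"  # e.g. string labels
    has_fraction = any(not v.is_integer() for v in numeric)
    if has_fraction or len(set(numeric)) > max_classes:
        return "regression"
    return "classification"


# medv values from the first five rows of the dataset
print(guess_task([21.7, 19.6, 20.3, 15.4, 20.5]))
```

If the automatic choice looks wrong for your data, passing the task explicitly instead of `None` avoids the guesswork.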

Save model

[ ]:

storage = Storage(connection_context=cc, schema='DEVELOPER')
automl.model.name = "boston" # don't forget to specify the name
storage.save_model(automl=automl)

[11]:

storage.list_models()

[11]:

	NAME	VERSION	LIBRARY	CLASS	JSON	TIMESTAMP	MODEL_STORAGE_VER
0	boston	1	PAL	hana_ml.algorithms.pal.trees.HybridGradientBoo...	{"model_attributes": {"n_estimators": 541, "ra...	2021-05-29 17:19:09	1

Load model and predict

[12]:

new_model = storage.load_model('boston', version=1)
new_model.predict(df=test_df)

Creating table with name: AUTOML6b526c36-6c5e-459b-b5ec-92cf44f78b15

100%|██████████| 1/1 [00:00<00:00,  6.94it/s]

Preprocessor settings: median
Prediction results (first 20 rows):
     ID               SCORE CONFIDENCE
0    1   35.29534448792848       None
1    2  22.489733948983936       None
2    3  13.713093051897628       None
3    4  24.172307331613972       None
4    5  20.248966896747383       None
5    6   23.39085017433158       None
6    7  20.549891594531058       None
7    8   16.66822914049735       None
8    9  18.383664287056316       None
9   10  46.003539073294455       None
10  11   40.11289274119828       None
11  12   11.69618986227771       None
12  13   12.88755355977079       None
13  14  35.486779348478116       None
14  15  20.358130558843374       None
15  16   9.478611548228256       None
16  17   20.84342879503455       None
17  18  20.269377197301793       None
18  19  20.410181244909623       None
19  20  10.344656992114738       None

[12]:

	ID	SCORE	CONFIDENCE
0	1	35.29534448792848	None
1	2	22.489733948983936	None
2	3	13.713093051897628	None
3	4	24.172307331613972	None
4	5	20.248966896747383	None
...	...	...	...
97	98	23.80490001240718	None
98	99	22.24186449851454	None
99	100	18.38892297314412	None
100	101	38.51662147490824	None
101	102	46.306857840422325	None

102 rows × 3 columns
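A quick sanity check on the predictions is to convert the scores to floats and look at their range. The sketch below does this for the first five `SCORE` values from the output above; treating them as strings is an assumption, made because the column is printed with varying precision and may not have a float dtype.

```python
from statistics import mean

# First five SCORE values from the output above, kept as strings on the
# assumption that the column may not be a float dtype.
scores = [
    "35.29534448792848",
    "22.489733948983936",
    "13.713093051897628",
    "24.172307331613972",
    "20.248966896747383",
]

predictions = [float(s) for s in scores]
summary = {
    "min": round(min(predictions), 2),
    "max": round(max(predictions), 2),
    "mean": round(mean(predictions), 2),
}
print(summary)
```

The predicted values fall in the same range as the `medv` column of the training data, which is what one would expect from a reasonable regression model.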

Cleanup storage

[ ]:

storage.clean_up()

For more information, see the AutoML class and Storage class pages in the documentation.