Dataentity
Module superwise.controller.dataentity
This module implements data entity functionality.
Classes
DataEntityController(client, sw)
Controller for data entities.
Args:
client
: client object
sw
: superwise object
Ancestors (in MRO)
- superwise.controller.base.BaseController
- abc.ABC
Methods
create(self, name=None, type=None, dimension_start_ts=None, role=None, feature_importance=None)
Description:
Create a data entity.
Args:
name
: name for dataentity
type
:
dimension_start_ts
:
role
: role of dataentity (DataEntityRole)
feature_importance
: feature importance value
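A minimal sketch of calling create with keyword arguments only, based solely on the signature listed above. RecordingController and create_entities are hypothetical names introduced for illustration and are not part of the SDK; the string "feature" stands in for a DataEntityRole value, whose members are not listed here. The type and dimension_start_ts arguments are left at their defaults because this page does not describe them.

```python
class RecordingController:
    """Hypothetical stand-in that records create() calls; not part of the SDK."""

    def __init__(self):
        self.calls = []

    def create(self, name=None, type=None, dimension_start_ts=None,
               role=None, feature_importance=None):
        # Mirrors the documented signature and simply echoes the arguments back.
        call = {
            "name": name,
            "type": type,
            "dimension_start_ts": dimension_start_ts,
            "role": role,
            "feature_importance": feature_importance,
        }
        self.calls.append(call)
        return call


def create_entities(controller, names, role, importance=None):
    """Call controller.create once per name, passing keyword arguments only."""
    importance = importance or {}
    return [
        controller.create(
            name=name,
            role=role,
            feature_importance=importance.get(name),
        )
        for name in names
    ]


controller = RecordingController()
created = create_entities(
    controller,
    ["feature_0", "feature_1"],
    role="feature",  # assumption: placeholder for a DataEntityRole value
    importance={"feature_0": 0.4},
)
```

Passing every argument by keyword keeps the call readable when most parameters default to None; with a real DataEntityController the same helper should apply unchanged, assuming create accepts these keywords as listed.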
generate_summary(self, data_entities, model, data, base_version=None, **kwargs)
Description:
No longer supported; calling this function raises an Exception.
summarise(self, data, specific_roles, default_role='feature', entities_dtypes=None, importance_mapping=None, importance_target_label=None, importance_sample=None, base_version=None, **kwargs)
Description:
Summarise the baseline data
Args:
data
: a dataframe of baseline data
entities_dtypes
: dictionary of dtypes, or None to infer them automatically
How to use the options:
- Manually create a dictionary such as: {"feature_0" : "Boolean", "feature_1" : "Categorical"}
- Run infer to get a dictionary, then pass it (edited as needed) to override the inferred results:
  from superwise.controller.infer import infer_dtype
  dtypes = infer_dtype(baseline_df)
- Pass None and the summarise function will generate the dtypes automatically.
Note: categorical entities with over 200 categories (e.g. first name, street name, IP) are counted as "Sparse" features, and no metric other than "Missing values" is calculated for them.
specific_roles
: dictionary of roles, for example:
{"feature_1" : "feature", "record_id" : "id"}
default_role
: the default role the SDK uses (defaults to 'feature')
importance_mapping
: mapping dictionary for feature importance, example:
{"feature_0" : 0.4}
importance_target_label
: optional target label to use for feature importance
importance_sample
: allows sampling of the data used for feature importance.
This option should be used when the baseline data is large.
base_version
: base version; useful for inheriting information from an already existing version
Return:
list of DataEntity objects.
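The argument shapes described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: likely_sparse and build_summarise_kwargs are hypothetical helpers, the columns dictionary is a plain-Python stand-in for a baseline dataframe, and the 200-category threshold is taken from the note on entities_dtypes.

```python
SPARSE_THRESHOLD = 200  # from the note above: over 200 categories counts as "Sparse"


def likely_sparse(columns, entities_dtypes, threshold=SPARSE_THRESHOLD):
    """Return names of "Categorical" columns with more than `threshold` distinct values.

    `columns` maps a column name to a list of values (None marks a missing value).
    """
    flagged = []
    for name, dtype in entities_dtypes.items():
        if dtype != "Categorical":
            continue
        distinct = {value for value in columns[name] if value is not None}
        if len(distinct) > threshold:
            flagged.append(name)
    return sorted(flagged)


def build_summarise_kwargs(entities_dtypes=None, id_column=None, importance=None):
    """Assemble keyword arguments in the shapes the Args section describes."""
    kwargs = {
        "specific_roles": {id_column: "id"} if id_column else {},
        "default_role": "feature",
        "entities_dtypes": entities_dtypes,  # None leaves dtype inference to summarise
    }
    if importance:
        kwargs["importance_mapping"] = dict(importance)
    return kwargs


# Toy baseline: 300 rows, one id column, one boolean and one high-cardinality column.
columns = {
    "record_id": list(range(300)),
    "feature_0": [i % 2 == 0 for i in range(300)],
    "street_name": [f"street {i}" for i in range(300)],
}
dtypes = {"feature_0": "Boolean", "street_name": "Categorical"}

sparse_candidates = likely_sparse(columns, dtypes)
kwargs = build_summarise_kwargs(
    entities_dtypes=dtypes,
    id_column="record_id",
    importance={"feature_0": 0.4},
)

# With a real controller and dataframe (both names assumed), the call would be:
# entities = controller.summarise(data=baseline_df, **kwargs)
```

Checking cardinality before calling summarise gives an early hint about which entities are likely to end up with only the "Missing values" metric; the SDK's own check may differ in detail.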
update_summary(self, data_entity_id, summary)
Description:
Update the summary of the data entity identified by data_entity_id.