
Dataentity

Module superwise.controller.dataentity

This module implements data entity functionality.

Classes

DataEntityController(client, sw): controller for data entities.

Args:

client: client object

sw: superwise object

Ancestors (in MRO)

  • superwise.controller.base.BaseController
  • abc.ABC

Methods

create(self, name=None, type=None, dimension_start_ts=None, role=None, feature_importance=None) Description:

Create a data entity.

Args:

name: name of the data entity

type:

dimension_start_ts:

role: role of the data entity (DataEntityRole)

feature_importance: feature importance value

generate_summary(self, data_entities, model, data, base_version=None, **kwargs) Description: No longer supported; calling this function raises an Exception.

summarise(self, data, specific_roles, default_role='feature', entities_dtypes=None, importance_mapping=None, importance_target_label=None, importance_sample=None, base_version=None, **kwargs) Description: Summarise the baseline data

Args:

data: a dataframe of baseline data

entities_dtypes: a dictionary of dtypes, or None to have them inferred automatically

Options for entities_dtypes:

  • Manually create dictionary as: {"feature_0" : "Boolean", "feature_1" : "Categorical" }

  • Run inference to get a dictionary, override any inferred types you want to change, and pass the result in:

        from superwise.controller.infer import infer_dtype
        dtypes = infer_dtype(baseline_df)

  • Pass None, and the summarise function will infer the dtypes automatically.

  • Note: categorical entities with more than 200 categories (e.g., first name, street name, IP address) are treated as "Sparse" features, and no metric other than "Missing values" is calculated for them.
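The dtype options and the sparse-feature rule above can be illustrated in plain Python. This is a minimal sketch, not SDK code: `override_dtypes` and `mark_sparse` are hypothetical helpers, the baseline is modelled as a dict of column lists instead of a DataFrame, and the 200-category threshold is taken from the note above rather than from the SDK's source.

```python
from typing import Dict, List

# Threshold taken from the note above; the SDK's exact rule may differ.
SPARSE_THRESHOLD = 200


def override_dtypes(inferred: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the inferred dtypes with manual overrides applied (hypothetical helper)."""
    merged = dict(inferred)
    merged.update(overrides)
    return merged


def mark_sparse(dtypes: Dict[str, str], columns: Dict[str, List[object]]) -> Dict[str, str]:
    """Relabel categorical columns with more than SPARSE_THRESHOLD distinct values as 'Sparse'."""
    result = dict(dtypes)
    for name, dtype in dtypes.items():
        if dtype == "Categorical" and len(set(columns[name])) > SPARSE_THRESHOLD:
            result[name] = "Sparse"
    return result


if __name__ == "__main__":
    # Toy baseline: 300 rows per column.
    baseline = {
        "feature_0": [True, False] * 150,
        "feature_1": ["a", "b", "c"] * 100,
        "first_name": [f"name_{i}" for i in range(300)],
    }
    inferred = {"feature_0": "Categorical", "feature_1": "Categorical", "first_name": "Categorical"}
    dtypes = override_dtypes(inferred, {"feature_0": "Boolean"})
    print(mark_sparse(dtypes, baseline))
    # {'feature_0': 'Boolean', 'feature_1': 'Categorical', 'first_name': 'Sparse'}
```

Here the manual override turns feature_0 into "Boolean", while first_name, with 300 distinct values, crosses the threshold and is relabelled "Sparse".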

specific_roles: dictionary of roles, for example:

{"feature_1" : "feature", "record_id" : "id"}

default_role: the default role used by the SDK ('feature' unless overridden)
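Assuming, as the parameter names suggest, that any entity not listed in specific_roles receives default_role, the combined role assignment can be sketched as follows. `resolve_roles` is a hypothetical helper, and raising an error for unknown column names is an illustrative choice, not documented SDK behaviour.

```python
from typing import Dict, Iterable


def resolve_roles(columns: Iterable[str], specific_roles: Dict[str, str],
                  default_role: str = "feature") -> Dict[str, str]:
    """Map every column to its role, falling back to default_role (hypothetical helper)."""
    cols = list(columns)  # materialise once so any iterable can be passed
    unknown = set(specific_roles) - set(cols)
    if unknown:
        # Illustrative validation: catch typos in specific_roles early.
        raise KeyError(f"specific_roles names unknown columns: {sorted(unknown)}")
    return {col: specific_roles.get(col, default_role) for col in cols}


if __name__ == "__main__":
    roles = resolve_roles(
        ["record_id", "feature_0", "feature_1"],
        {"feature_1": "feature", "record_id": "id"},
    )
    print(roles)
    # {'record_id': 'id', 'feature_0': 'feature', 'feature_1': 'feature'}
```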

importance_mapping: dictionary mapping entity names to feature importance values, for example:

{"feature_0" : 0.4}

importance_target_label: optional target label to use when computing feature importance

importance_sample: enables sampling of the data used for feature importance; use this option for large baseline datasets

base_version: base version, useful for inheriting information from an existing version
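This page does not say whether importance_sample is a row count or a fraction. The sketch below assumes an integer row count and uses a hypothetical `sample_rows` helper to show the general idea: draw a reproducible subset of a large baseline, without replacement, before running an expensive feature-importance computation.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], importance_sample: Optional[int], seed: int = 0) -> List[T]:
    """Return at most importance_sample rows, drawn without replacement (hypothetical helper).

    None, or a sample size >= len(rows), keeps every row.
    """
    if importance_sample is None or importance_sample >= len(rows):
        return list(rows)
    rng = random.Random(seed)  # fixed seed so repeated runs select the same rows
    return rng.sample(list(rows), importance_sample)


if __name__ == "__main__":
    baseline_rows = list(range(100_000))
    subset = sample_rows(baseline_rows, 5_000)
    print(len(subset))  # 5000
```

A fixed seed keeps the sampled subset, and therefore the resulting importance values, stable across runs; passing None skips sampling entirely.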

Return:

list of DataEntity objects.

update_summary(self, data_entity_id, summary) Description:

Update the summary of the data entity identified by data_entity_id.