Dataset
Module superwise.controller.dataset
This module implements dataset functionality.
Functions
create_file_from_dataframe(dataset: superwise.models.dataset.Dataset, dataframe: pandas.core.frame.DataFrame)
Classes
DatasetController(client, sw, internal_bucket)
Dataset controller class; implements the functionality of the dataset API.
Args:
client
: superwise client object
sw
: superwise object
Ancestors (in MRO)
- superwise.controller.base.BaseController
- abc.ABC
Methods
create(self, model: superwise.models.dataset.Dataset, return_model=True, gcs_service_account: dict = None, aws_access_key_id: str = None, aws_secret_access_key: str = None, aws_role_arn: str = None, azure_connection_string: str = None, wait_until_complete: bool = True, timeout_seconds: int = 300, on_failure='raise', **kwargs)
Description:
Create a new dataset.
Args:
model
: Dataset model.
return_model
: If True, return the model; if False, return response.body. Default True.
gcs_service_account
: GCP service account object used to authenticate and pull dataset files from a customer
GCS bucket. If not provided, will be inferred from the environment.
(See Google Cloud auth)
aws_access_key_id
: AWS access key ID used to authenticate and pull dataset files from a customer S3 bucket.
If not provided, will be inferred from the environment.
(Used together with the aws_secret_access_key parameter.)
aws_secret_access_key
: AWS secret access key used to authenticate and pull dataset files from a customer S3
bucket. If not provided, will be inferred from the environment.
(Used together with the aws_access_key_id parameter.)
aws_role_arn
: AWS role ARN used to authenticate and pull dataset files from a customer S3 bucket.
If not provided, authentication uses the aws_access_key_id and aws_secret_access_key parameters.
azure_connection_string
: Azure blob storage connection string used to authenticate and pull dataset files from
a customer blob storage container.
Must be provided in order to pull files from Azure.
wait_until_complete
: If True, wait until the dataset is fully processed in the system, and return the final
object. If False, return immediately after the dataset is created and the given dataset
files are validated, without waiting for processing. In that case a partially populated
Dataset object is returned, without the processed fields. The status can then be checked
with the 'get_by_id' method. Default True.
timeout_seconds
: Maximum time, in seconds, to wait for dataset processing. Only relevant if
'wait_until_complete' is True. Default 300 (5 minutes).
on_failure
: Action to take if dataset processing fails. Only relevant if 'wait_until_complete'
is True. Possible values are:
- 'ignore': Don't raise an exception, and return the object.
- 'raise': Raise a 'SuperwiseDatasetFailureError' exception.
Default 'raise'.
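The arguments above can be combined as in the following minimal sketch. The helpers build_create_kwargs and create_dataset are hypothetical and not part of superwise; only the keyword names and defaults are taken from the create signature. The check that both AWS keys are passed together is an assumption based on the "used together" notes, and may be stricter than the library itself. The controller and dataset objects are passed in as-is, since this section does not show how they are obtained.

```python
from typing import Any, Optional


def build_create_kwargs(
    gcs_service_account: Optional[dict] = None,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_role_arn: Optional[str] = None,
    azure_connection_string: Optional[str] = None,
    wait_until_complete: bool = True,
    timeout_seconds: int = 300,
    on_failure: str = "raise",
) -> dict:
    """Hypothetical helper: collect keyword arguments for DatasetController.create."""
    # Assumption: the two AWS keys are documented as "used together",
    # so this sketch rejects passing only one of them.
    if (aws_access_key_id is None) != (aws_secret_access_key is None):
        raise ValueError(
            "aws_access_key_id and aws_secret_access_key must be given together"
        )
    # The documentation lists exactly these two values for on_failure.
    if on_failure not in ("raise", "ignore"):
        raise ValueError("on_failure must be 'raise' or 'ignore'")

    kwargs = {
        "wait_until_complete": wait_until_complete,
        "timeout_seconds": timeout_seconds,
        "on_failure": on_failure,
    }
    credentials = {
        "gcs_service_account": gcs_service_account,
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_role_arn": aws_role_arn,
        "azure_connection_string": azure_connection_string,
    }
    # Omit unset credentials so create() falls back to its documented defaults
    # (for example, inferring GCS or AWS credentials from the environment).
    kwargs.update({k: v for k, v in credentials.items() if v is not None})
    return kwargs


def create_dataset(controller: Any, dataset: Any, **options: Any) -> Any:
    """Hypothetical wrapper: call controller.create(dataset, ...) with checked options.

    `controller` is expected to be a DatasetController instance and `dataset`
    a superwise.models.dataset.Dataset; neither is constructed here.
    """
    return controller.create(dataset, **build_create_kwargs(**options))
```

For example, create_dataset(controller, dataset, wait_until_complete=False) would return as soon as the dataset files are validated, and create_dataset(controller, dataset, timeout_seconds=600, on_failure="ignore") would wait up to ten minutes and return the object even if processing failed.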