Datasets

Create Dataset

create_dataset(file_path, name?, test_file_path?, key?)

Creates a new dataset object by uploading a dataset file.

Method Parameters

file_path | required string

Path to the file containing the dataset. Valid file extensions include .csv and .txt.


test_file_path | optional string

Path to the file containing the test split of the dataset. Required for downstream membership inference tests. Valid file extensions include .csv, .txt.


key | optional string

Unique dataset identifier key. Will be autogenerated if not provided.


name | optional string

Dataset name.


Returns

Dataset object.

Example

dataset = dfl.create_dataset(
    file_path="test_datasets/train.csv",
    name="Fine-tuning dataset",
)

# with test file path
dataset = dfl.create_dataset(
    file_path="data/train.csv",
    test_file_path="data/test.csv",
    name="Fine-tuning dataset",
)
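
Because only certain file extensions are accepted, it can help to check paths locally before uploading. The following is a minimal sketch, not part of the SDK: the helper names check_dataset_path and build_create_dataset_kwargs are hypothetical, and the extension set simply mirrors the .csv and .txt values named in the parameter descriptions above.

```python
from pathlib import Path

# Extensions named as valid in the parameter descriptions (assumed to be the full set).
VALID_EXTENSIONS = {".csv", ".txt"}


def check_dataset_path(path: str) -> str:
    """Hypothetical helper: return the path unchanged if its extension is allowed."""
    suffix = Path(path).suffix  # compared case-sensitively; relax if needed
    if suffix not in VALID_EXTENSIONS:
        raise ValueError(
            f"unsupported extension {suffix!r} for {path!r}; "
            f"expected one of {sorted(VALID_EXTENSIONS)}"
        )
    return path


def build_create_dataset_kwargs(file_path, name=None, test_file_path=None, key=None):
    """Hypothetical helper: assemble create_dataset arguments, omitting unset optionals."""
    kwargs = {"file_path": check_dataset_path(file_path)}
    if test_file_path is not None:
        kwargs["test_file_path"] = check_dataset_path(test_file_path)
    if name is not None:
        kwargs["name"] = name
    if key is not None:
        kwargs["key"] = key
    return kwargs
```

The resulting dictionary can then be unpacked into the call, for example dfl.create_dataset(**build_create_dataset_kwargs("data/train.csv", test_file_path="data/test.csv")), so that a mistyped path fails before any upload starts.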

Create HuggingFace Dataset

create_hf_dataset(name, hf_id, hf_token?, key?)

Creates a new dataset object that points to a dataset hosted on the HuggingFace Hub.

Method Parameters

name | optional string

Dataset name.


hf_id | required string

HuggingFace Hub id for the dataset. 'train' and 'test' splits are required for downstream membership inference tests.


hf_token | optional string

HuggingFace token for the provided dataset id. Required if the dataset is private or gated on the hub.


key | optional string

Unique dataset identifier key. Will be autogenerated if not provided.


Returns

HFDataset object.

Example

dataset = dfl.create_hf_dataset(
    name="HF dataset",
    hf_id="fka/awesome-chatgpt-prompts",
    hf_token="hf_***",
)
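
Since hf_token is only needed for private or gated datasets, one option is to read it from the environment instead of hard-coding it in source. This is a sketch under stated assumptions: build_hf_dataset_kwargs is a hypothetical helper, not an SDK function, and the default environment variable name HF_TOKEN is a choice made for this example.

```python
import os


def build_hf_dataset_kwargs(name, hf_id, token_env_var="HF_TOKEN", key=None):
    """Hypothetical helper: assemble create_hf_dataset arguments.

    The token is taken from the environment variable named by token_env_var
    (an assumption of this sketch) and is omitted when the variable is unset
    or empty, which suits public datasets.
    """
    kwargs = {"name": name, "hf_id": hf_id}
    token = os.environ.get(token_env_var)
    if token:
        kwargs["hf_token"] = token
    if key is not None:
        kwargs["key"] = key
    return kwargs
```

The result can be unpacked into the call, for example dfl.create_hf_dataset(**build_hf_dataset_kwargs("HF dataset", "fka/awesome-chatgpt-prompts")), which keeps the token out of version control.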