Datasets
Create Dataset
create_dataset(file_path, name?, test_file_path?, key?)
Creates a new dataset object by uploading a dataset file.
Method Parameters
file_path | required string
Path to the file containing the dataset. Valid file extensions include .csv, .txt.
test_file_path | optional string
Path to the file containing the test split of the dataset. Required for downstream membership inference tests. Valid file extensions include .csv, .txt.
key | optional string
Unique dataset identifier key. Will be autogenerated if not provided.
name | optional string
Dataset name.
Returns
Dataset object.
Example
dataset = dfl.create_dataset(
    file_path="test_datasets/train.csv",
    name="Fine-tuning dataset",
)
# with test file path
dataset = dfl.create_dataset(
    file_path="data/train.csv",
    test_file_path="data/test.csv",
    name="Fine-tuning dataset",
)
Create HuggingFace Dataset
create_hf_dataset(name, hf_id, hf_token?, key?)
Creates a new dataset object that points to a hosted dataset on the HuggingFace Hub.
Method Parameters
name | required string
Dataset name.
hf_id | required string
HuggingFace Hub ID for the dataset. 'train' and 'test' splits are required for downstream membership inference tests.
hf_token | optional string
HuggingFace token for the provided dataset id. Required if the dataset is private or gated on the hub.
key | optional string
Unique dataset identifier key. Will be autogenerated if not provided.
Returns
HFDataset object.
Example
dataset = dfl.create_hf_dataset(
    name="HF dataset",
    hf_id="fka/awesome-chatgpt-prompts",
    hf_token="hf_***",
)