Dataset

arkindex_worker.worker.dataset

BaseWorker methods for datasets.

Classes

DatasetState

Bases: Enum

State of a dataset.

Attributes
Open class-attribute instance-attribute
Open = 'open'

The dataset is open.

Building class-attribute instance-attribute
Building = 'building'

The dataset is being built.

Complete class-attribute instance-attribute
Complete = 'complete'

The dataset is complete.

Error class-attribute instance-attribute
Error = 'error'

The dataset is in an error state.
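Because DatasetState is a plain Enum with string values, a member can be looked up from the raw string carried by an API payload and converted back with `.value`. The sketch below redefines the enum locally so that it runs without arkindex_worker installed; in a real worker it would presumably be imported from arkindex_worker.worker.dataset instead. The `parse_state` helper is hypothetical and not part of the library.

```python
from enum import Enum


class DatasetState(Enum):
    """Local mirror of the documented enum, for illustration only."""

    Open = "open"
    Building = "building"
    Complete = "complete"
    Error = "error"


def parse_state(raw: str) -> DatasetState:
    """Hypothetical helper: turn a raw state string into a DatasetState member."""
    # Lookup by value raises ValueError when the string matches no member.
    return DatasetState(raw)


state = parse_state("building")
print(state is DatasetState.Building)  # prints True
print(state.value)                     # prints building
```

Comparing members with `is` rather than comparing raw strings keeps typos from silently passing.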

DatasetMixin

Functions
list_process_datasets
list_process_datasets() -> Iterator[Dataset]

List the datasets associated with the worker’s process. This helper is not available in developer mode: the implementation asserts that the worker is not read-only.

Returns:

| Type | Description |
| --- | --- |
| Iterator[Dataset] | An iterator of Dataset objects built from the ListProcessDatasets API endpoint. |

Source code in arkindex_worker/worker/dataset.py
def list_process_datasets(self) -> Iterator[Dataset]:
    """
    List datasets associated to the worker's process. This helper is not available in developer mode.

    :returns: An iterator of ``Dataset`` objects built from the ``ListProcessDatasets`` API endpoint.
    """
    assert not self.is_read_only, "This helper is not available in read-only mode."

    results = self.api_client.paginate(
        "ListProcessDatasets", id=self.process_information["id"]
    )

    return map(Dataset, list(results))
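A minimal sketch of how this helper might be consumed, runnable without an Arkindex instance. Everything here other than the body of `list_process_datasets` is an assumption made for illustration: `FakeApiClient`, `DemoWorker`, the dict-based `Dataset` stand-in, and the `id`, `name` and `state` payload fields are hypothetical and are not taken from the library or the API schema.

```python
from collections.abc import Iterator


class Dataset(dict):
    """Hypothetical stand-in for the library's Dataset: a dict with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


class FakeApiClient:
    """Hypothetical stub that mimics `paginate` by yielding canned results."""

    def __init__(self, pages):
        self.pages = pages

    def paginate(self, operation_id, **kwargs):
        yield from self.pages.get((operation_id, kwargs.get("id")), [])


class DemoWorker:
    """Minimal object exposing only the attributes the helper reads."""

    is_read_only = False

    def __init__(self, api_client, process_id):
        self.api_client = api_client
        self.process_information = {"id": process_id}

    def list_process_datasets(self) -> Iterator[Dataset]:
        # Same logic as the listing above, copied so the sketch is self-contained.
        assert not self.is_read_only, "This helper is not available in read-only mode."
        results = self.api_client.paginate(
            "ListProcessDatasets", id=self.process_information["id"]
        )
        return map(Dataset, list(results))


client = FakeApiClient(
    {
        ("ListProcessDatasets", "process-1"): [
            {"id": "d1", "name": "train-set", "state": "open"},
            {"id": "d2", "name": "old-set", "state": "complete"},
        ]
    }
)
worker = DemoWorker(client, "process-1")

# Keep only the datasets that are still open.
open_names = [d.name for d in worker.list_process_datasets() if d.state == "open"]
print(open_names)  # prints ['train-set']
```

Note that the helper calls `list(results)` before `map`, so all pages are fetched as soon as the method is called; only the conversion to `Dataset` is lazy.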
list_dataset_elements
list_dataset_elements(
    dataset: Dataset,
) -> Iterator[tuple[str, Element]]

List elements in a dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset | Dataset | Dataset to find elements in. | required |

Returns:

| Type | Description |
| --- | --- |
| Iterator[tuple[str, Element]] | An iterator of (set name, Element) tuples built from the ListDatasetElements API endpoint. |

Source code in arkindex_worker/worker/dataset.py
def list_dataset_elements(self, dataset: Dataset) -> Iterator[tuple[str, Element]]:
    """
    List elements in a dataset.

    :param dataset: Dataset to find elements in.
    :returns: An iterator of tuples built from the ``ListDatasetElements`` API endpoint.
    """
    assert dataset and isinstance(
        dataset, Dataset
    ), "dataset shouldn't be null and should be a Dataset"

    results = self.api_client.paginate("ListDatasetElements", id=dataset.id)

    def format_result(result):
        return (result["set"], Element(**result["element"]))

    return map(format_result, list(results))
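Since each yielded tuple pairs a set name with an element, a common follow-up is to bucket the elements per set. The sketch below shows one way to do that; `group_by_set` is a hypothetical helper, the set names `train` and `validation` are illustrative, and plain strings stand in for Element objects so that the example runs on its own.

```python
from collections import defaultdict
from collections.abc import Iterable
from typing import Any


def group_by_set(pairs: Iterable[tuple[str, Any]]) -> dict[str, list[Any]]:
    """Hypothetical helper: bucket (set name, element) tuples by set name."""
    grouped = defaultdict(list)
    for set_name, element in pairs:
        grouped[set_name].append(element)
    # Return a plain dict so that missing keys raise KeyError as usual.
    return dict(grouped)


# Stand-in for the iterator returned by list_dataset_elements(dataset).
pairs = iter([("train", "page-1"), ("validation", "page-2"), ("train", "page-3")])

splits = group_by_set(pairs)
print(splits)
```

In a worker, the `pairs` argument would presumably be `self.list_dataset_elements(dataset)`; elements keep the order in which the endpoint returned them.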
update_dataset_state
update_dataset_state(
    dataset: Dataset, state: DatasetState
) -> Dataset

Partially updates a dataset state through the API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset | Dataset | The dataset to update. | required |
| state | DatasetState | State of the dataset. | required |

Returns:

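| Type | Description |
| --- | --- |
| Dataset | The updated Dataset object from the PartialUpdateDataset API endpoint. In read-only mode no request is sent: a warning is logged and None is returned. |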

Source code in arkindex_worker/worker/dataset.py
def update_dataset_state(self, dataset: Dataset, state: DatasetState) -> Dataset:
    """
    Partially updates a dataset state through the API.

    :param dataset: The dataset to update.
    :param state: State of the dataset.
    :returns: The updated ``Dataset`` object from the ``PartialUpdateDataset`` API endpoint.
    """
    assert dataset and isinstance(
        dataset, Dataset
    ), "dataset shouldn't be null and should be a Dataset"
    assert state and isinstance(
        state, DatasetState
    ), "state shouldn't be null and should be a str from DatasetState"

    if self.is_read_only:
        logger.warning("Cannot update dataset as this worker is in read-only mode")
        return

    updated_dataset = self.request(
        "PartialUpdateDataset",
        id=dataset.id,
        body={"state": state.value},
    )
    dataset.update(updated_dataset)

    return dataset
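The state names suggest a lifecycle in which a worker marks a dataset as Building, then as Complete on success or Error on failure. The source does not state this workflow, so the sketch below is an assumption. `RecordingWorker` and `build_dataset` are hypothetical: the stub records requested states instead of calling the API, and a plain dict stands in for the Dataset object.

```python
from enum import Enum


class DatasetState(Enum):
    """Local mirror of the documented enum, for illustration only."""

    Open = "open"
    Building = "building"
    Complete = "complete"
    Error = "error"


class RecordingWorker:
    """Hypothetical stub: records requested states instead of calling the API."""

    def __init__(self):
        self.history = []

    def update_dataset_state(self, dataset, state):
        self.history.append(state)
        dataset["state"] = state.value
        return dataset


def build_dataset(worker, dataset, build):
    """Hypothetical wrapper: mark Building, run `build`, then Complete or Error."""
    worker.update_dataset_state(dataset, DatasetState.Building)
    try:
        build(dataset)
    except Exception:
        # Record the failure, then let the caller see the original exception.
        worker.update_dataset_state(dataset, DatasetState.Error)
        raise
    worker.update_dataset_state(dataset, DatasetState.Complete)
    return dataset


worker = RecordingWorker()
dataset = {"id": "d1", "state": "open"}
build_dataset(worker, dataset, build=lambda d: None)
print([s.name for s in worker.history])  # prints ['Building', 'Complete']
```

With the real method, a caller running in read-only mode would also need to handle the None return value shown in the listing above.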