
Dataset

arkindex_worker.worker.dataset

BaseWorker methods for datasets.

Classes

DatasetState

Bases: Enum

State of a dataset.

Attributes
Open class-attribute instance-attribute
Open = 'open'

The dataset is open.

Building class-attribute instance-attribute
Building = 'building'

The dataset is being built.

Complete class-attribute instance-attribute
Complete = 'complete'

The dataset is complete.

Error class-attribute instance-attribute
Error = 'error'

The dataset is in error.
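As a minimal sketch of how these values behave, the block below re-declares the enum locally so it runs without `arkindex_worker` installed. The `parse_state` helper is hypothetical and not part of the library; in real code the enum would be imported from `arkindex_worker.worker.dataset`.

```python
from enum import Enum


class DatasetState(Enum):
    """Local stand-in mirroring the attribute values documented above."""

    Open = "open"
    Building = "building"
    Complete = "complete"
    Error = "error"


def parse_state(raw: str) -> DatasetState:
    """Hypothetical helper: turn a raw state string into a DatasetState member.

    Enum lookup by value raises ValueError for unknown strings.
    """
    return DatasetState(raw)


state = parse_state("building")
print(state.name, state.value)
```

Looking a member up by value this way rejects unexpected strings early, rather than letting them flow through as plain `str`.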

DatasetMixin

Functions
list_process_sets
list_process_sets() -> Iterator[Set]

List the dataset sets associated with the worker’s process. This helper is not available in read-only mode (such as developer mode).

Returns:

Type           Description
Iterator[Set]  An iterator of Set objects built from the ListProcessSets API endpoint.

Source code in arkindex_worker/worker/dataset.py
def list_process_sets(self) -> Iterator[Set]:
    """
    List dataset sets associated to the worker's process. This helper is not available in developer mode.

    :returns: An iterator of ``Set`` objects built from the ``ListProcessSets`` API endpoint.
    """
    assert not self.is_read_only, "This helper is not available in read-only mode."

    results = self.api_client.paginate(
        "ListProcessSets", id=self.process_information["id"]
    )

    return map(
        lambda result: Set(
            name=result["set_name"], dataset=Dataset(**result["dataset"])
        ),
        results,
    )
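The following sketch shows one way a caller might consume `list_process_sets`, grouping set names by dataset. `FakeDataset`, `FakeSet`, `FakeWorker`, and `sets_by_dataset` are hypothetical stand-ins so the example runs without an Arkindex instance. Only the `set_name` and `dataset` keys come from the source above; the dataset fields (`id`, `name`) are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FakeDataset:
    """Hypothetical stand-in for Dataset; the field names are assumptions."""

    id: str
    name: str


@dataclass
class FakeSet:
    """Hypothetical stand-in for Set, with the two fields used in the source."""

    name: str
    dataset: FakeDataset


class FakeWorker:
    """Hypothetical worker that replays canned results instead of calling the API."""

    def __init__(self, raw_results: list) -> None:
        self._raw_results = raw_results

    def list_process_sets(self) -> Iterator[FakeSet]:
        # Same mapping shape as the source: one Set per paginated result.
        return map(
            lambda result: FakeSet(
                name=result["set_name"], dataset=FakeDataset(**result["dataset"])
            ),
            self._raw_results,
        )


def sets_by_dataset(worker: FakeWorker) -> dict:
    """Group set names by the ID of the dataset they belong to."""
    grouped = defaultdict(list)
    for dataset_set in worker.list_process_sets():
        grouped[dataset_set.dataset.id].append(dataset_set.name)
    return dict(grouped)


worker = FakeWorker(
    [
        {"set_name": "train", "dataset": {"id": "d1", "name": "Demo"}},
        {"set_name": "val", "dataset": {"id": "d1", "name": "Demo"}},
        {"set_name": "test", "dataset": {"id": "d2", "name": "Other"}},
    ]
)
print(sets_by_dataset(worker))
```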
list_set_elements
list_set_elements(dataset_set: Set) -> Iterator[Element]

List elements in a dataset set.

Parameters:

Name         Type  Description               Default
dataset_set  Set   Set to find elements in.  required

Returns:

Type               Description
Iterator[Element]  An iterator of Element objects built from the ListDatasetElements API endpoint.

Source code in arkindex_worker/worker/dataset.py
def list_set_elements(self, dataset_set: Set) -> Iterator[Element]:
    """
    List elements in a dataset set.

    :param dataset_set: Set to find elements in.
    :returns: An iterator of Element built from the ``ListDatasetElements`` API endpoint.
    """
    assert dataset_set and isinstance(
        dataset_set, Set
    ), "dataset_set shouldn't be null and should be a Set"

    results = self.api_client.paginate(
        "ListDatasetElements", id=dataset_set.dataset.id, set=dataset_set.name
    )

    return map(lambda result: Element(**result["element"]), results)
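Because the method returns a `map` object, the result is lazy and can only be traversed once. The sketch below illustrates this with a hypothetical `FakeElement` class and a standalone `list_set_elements_sketch` function that copies the mapping step from the source. The element fields (`id`, `type`) are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeElement:
    """Hypothetical stand-in for Element; the field names are assumptions."""

    id: str
    type: str


def list_set_elements_sketch(results: Iterable[dict]) -> Iterator[FakeElement]:
    """Mirror the mapping step of list_set_elements on canned results."""
    return map(lambda result: FakeElement(**result["element"]), results)


raw_results = [
    {"element": {"id": "e1", "type": "page"}},
    {"element": {"id": "e2", "type": "page"}},
]

elements = list_set_elements_sketch(raw_results)
first_pass = [element.id for element in elements]
# The map iterator is exhausted after the first traversal.
second_pass = [element.id for element in elements]

# Materialize into a list when the elements are needed more than once.
reusable = list(list_set_elements_sketch(raw_results))
print(first_pass, second_pass, len(reusable))
```

Calling `list(...)` trades memory for the ability to iterate again, which may matter for large sets.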
update_dataset_state
update_dataset_state(
    dataset: Dataset, state: DatasetState
) -> Dataset

Partially updates a dataset state through the API.

Parameters:

Name     Type          Description             Default
dataset  Dataset       The dataset to update.  required
state    DatasetState  State of the dataset.   required

Returns:

Type     Description
Dataset  The updated Dataset object from the PartialUpdateDataset API endpoint. In read-only mode, the source logs a warning and returns None instead.

Source code in arkindex_worker/worker/dataset.py
@unsupported_cache
def update_dataset_state(self, dataset: Dataset, state: DatasetState) -> Dataset:
    """
    Partially updates a dataset state through the API.

    :param dataset: The dataset to update.
    :param state: State of the dataset.
    :returns: The updated ``Dataset`` object from the ``PartialUpdateDataset`` API endpoint.
    """
    assert dataset and isinstance(
        dataset, Dataset
    ), "dataset shouldn't be null and should be a Dataset"
    assert state and isinstance(
        state, DatasetState
    ), "state shouldn't be null and should be a str from DatasetState"

    if self.is_read_only:
        logger.warning("Cannot update dataset as this worker is in read-only mode")
        return

    updated_dataset = self.request(
        "PartialUpdateDataset",
        id=dataset.id,
        body={"state": state.value},
    )
    dataset.update(updated_dataset)

    return dataset
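One plausible use of `update_dataset_state` is to bracket a build step with state changes. The sketch below is an assumption about usage, not the library's documented workflow. `FakeDataset`, `FakeWorker`, and `build_with_state_tracking` are hypothetical, and the enum is re-declared locally so the example runs on its own.

```python
from enum import Enum
from typing import Callable


class DatasetState(Enum):
    """Local stand-in mirroring the documented DatasetState values."""

    Open = "open"
    Building = "building"
    Complete = "complete"
    Error = "error"


class FakeDataset:
    """Hypothetical dataset holding only an ID and a state string."""

    def __init__(self, id: str, state: str = "open") -> None:
        self.id = id
        self.state = state


class FakeWorker:
    """Hypothetical worker that updates the state locally instead of via the API."""

    def update_dataset_state(
        self, dataset: FakeDataset, state: DatasetState
    ) -> FakeDataset:
        assert isinstance(state, DatasetState), "state should be a DatasetState"
        dataset.state = state.value
        return dataset


def build_with_state_tracking(
    worker: FakeWorker, dataset: FakeDataset, build: Callable[[], None]
) -> FakeDataset:
    """Mark the dataset Building, run the build, then mark Complete or Error."""
    worker.update_dataset_state(dataset, DatasetState.Building)
    try:
        build()
    except Exception:
        # Record the failure before propagating the original exception.
        worker.update_dataset_state(dataset, DatasetState.Error)
        raise
    worker.update_dataset_state(dataset, DatasetState.Complete)
    return dataset


dataset = build_with_state_tracking(FakeWorker(), FakeDataset("d1"), lambda: None)
print(dataset.state)
```

The helper ignores the return value of `update_dataset_state` on purpose: per the source above, the real method returns `None` in read-only mode.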