Models

arkindex_worker.models

Wrappers around API results to provide more convenient attribute access and IIIF helpers.

Classes

MagicDict

Bases: dict

A dict whose items can be accessed like attributes.
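
To illustrate the idea, here is a minimal sketch of a dict with attribute access. The class name `AttrDict` is hypothetical and this is not the library's implementation; it assumes nested dicts should be wrapped too, so that chained access works.

```python
class AttrDict(dict):
    """Hypothetical stand-in illustrating attribute access on a dict."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access such as a.b.c works.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            return AttrDict(value)
        return value

    def __setattr__(self, name, value):
        # Attribute assignment stores the value as a regular dict item.
        self[name] = value


element = AttrDict({"id": "abc", "zone": {"polygon": [[0, 0], [10, 0], [10, 10]]}})
zone_polygon = element.zone.polygon  # same data as element["zone"]["polygon"]
```

Regular dict methods such as `get` keep working, which is why the source listings on this page can mix `self.get("zone")` with `self.zone`.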

Element

Bases: MagicDict

Describes an Arkindex element.

Attributes
polygon property
polygon: list[float]

Shortcut to an Element's polygon, which is normally accessed through its zone field as zone.polygon. It mainly exists to make this important field easier to reach, and to match the CachedElement.polygon field.

requires_tiles property
requires_tiles: bool

Whether or not downloading and combining IIIF tiles will be necessary to retrieve this element’s image. Will be False if the element has no image.

Functions
resize_zone_url
resize_zone_url(size: str = 'full') -> str

Compute the URL of the element's zone image at the requested size.

Parameters:

size (str, default 'full'): Requested size.

Returns:

str: The URL corresponding to the size.

Source code in arkindex_worker/models.py
def resize_zone_url(self, size: str = "full") -> str:
    """
    Compute the URL of the image corresponding to the size
    :param size: Requested size
    :return: The URL corresponding to the size
    """
    if size == "full":
        return self.zone.url
    else:
        parts = self.zone.url.split("/")
        parts[-3] = size
        return "/".join(parts)
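
To make the URL rewriting concrete, the sketch below applies the same logic as a standalone function. The function name and the example URL are hypothetical; it assumes the zone URL follows the IIIF Image API layout `{base}/{region}/{size}/{rotation}/{quality}.{format}`, so the size is the third segment from the end.

```python
def resize_iiif_url(url: str, size: str = "full") -> str:
    """Replace the size segment of an IIIF image URL (standalone sketch)."""
    if size == "full":
        # Mirrors the listing above: the URL is returned untouched.
        return url
    parts = url.split("/")
    # Layout: .../{region}/{size}/{rotation}/{quality}.{format}
    parts[-3] = size
    return "/".join(parts)


# Hypothetical zone URL, for illustration only.
zone_url = "https://iiif.example.com/image-id/0,0,2000,1000/full/0/default.jpg"
resized = resize_iiif_url(zone_url, "500,")
```

Here `resized` keeps the region `0,0,2000,1000` and only swaps `full` for `500,`, so the crop is unchanged and the server scales it to a width of 500 pixels.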
image_url
image_url(size: str = 'full') -> str | None

Build a URL to access the image. When possible, will return the S3 URL for images, so an ML worker can bypass IIIF servers.

Parameters:

size (str, default 'full'): Subresolution of the image, following the syntax of the IIIF resize parameter.

Returns:

str | None: A URL to the image, or None if the element does not have an image.

Source code in arkindex_worker/models.py
def image_url(self, size: str = "full") -> str | None:
    """
    Build a URL to access the image.
    When possible, will return the S3 URL for images, so an ML worker can bypass IIIF servers.
    :param size: Subresolution of the image, following the syntax of the IIIF resize parameter.
    :returns: A URL to the image, or None if the element does not have an image.
    """
    if not self.get("zone"):
        return
    url = self.zone.image.get("s3_url")
    if url:
        return url
    url = self.zone.image.url
    if not url.endswith("/"):
        url += "/"
    return f"{url}full/{size}/0/default.jpg"
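
The branch order in the listing above matters: an S3 URL, when present, wins and the `size` argument is ignored. The sketch below reproduces that behaviour on a plain dict; the function name, the dict layout and the URLs are assumptions made for illustration.

```python
def build_image_url(image: dict, size: str = "full") -> str:
    """Standalone sketch of the S3-first, IIIF-fallback URL choice."""
    s3_url = image.get("s3_url")
    if s3_url:
        # `size` is not applied in this branch.
        return s3_url
    url = image["url"]
    if not url.endswith("/"):
        url += "/"
    # IIIF layout: {region}/{size}/{rotation}/{quality}.{format}
    return f"{url}full/{size}/0/default.jpg"


# Hypothetical image payloads, for illustration only.
iiif_only = {"url": "https://iiif.example.com/image-id"}
with_s3 = {"url": "https://iiif.example.com/image-id", "s3_url": "https://s3.example.com/image.jpg"}

iiif_result = build_image_url(iiif_only, "500,")
s3_result = build_image_url(with_s3, "500,")
```

Callers that need a specific size should therefore not assume the returned image has been resized when an S3 URL is available.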
open_image
open_image(
    *args,
    max_width: int | None = None,
    max_height: int | None = None,
    use_full_image: bool | None = False,
    **kwargs
) -> Image.Image

Open this element’s image using Pillow, rotating and mirroring it according to the rotation_angle and mirrored attributes.

When tiling is not required to download the image, and no S3 URL is available to bypass IIIF servers, the image will be cropped to the rectangular bounding box of the zone.polygon attribute.

Warns:

This method implicitly applies the element’s orientation to the image.

If your process uses the returned image to find more polygons and send them back to Arkindex, use the arkindex_worker.image.revert_orientation helper to undo the orientation on all polygons before sending them, as the Arkindex API expects unoriented polygons.

Although not recommended, you can bypass this behavior by passing rotation_angle=0, mirrored=False as keyword arguments.

Warns:

If both max_width and max_height are set, the image's aspect ratio is not preserved.

Parameters:

max_width (int | None, default None): The maximum width of the image.

max_height (int | None, default None): The maximum height of the image.

use_full_image (bool | None, default False): Ignore the zone.polygon and always retrieve the image without cropping.

*args (default ()): Positional arguments passed to arkindex_worker.image.open_image.

**kwargs (default {}): Keyword arguments passed to arkindex_worker.image.open_image.

Returns:

Image: A Pillow image.

Raises:

ValueError: When the element does not have an image.

NotImplementedError: When max_width or max_height is set, but the IIIF server's configuration requires downloading and combining tiles to retrieve the image.

NotImplementedError: When an S3 URL has been used to download the image, but the URL has expired. Re-fetching the URL automatically is not supported.

Source code in arkindex_worker/models.py
def open_image(
    self,
    *args,
    max_width: int | None = None,
    max_height: int | None = None,
    use_full_image: bool | None = False,
    **kwargs,
) -> Image.Image:
    """
    Open this element's image using Pillow, rotating and mirroring it according
    to the ``rotation_angle`` and ``mirrored`` attributes.

    When tiling is not required to download the image, and no S3 URL is available
    to bypass IIIF servers, the image will be cropped to the rectangle bounding box
    of the ``zone.polygon`` attribute.

    Warns:
    ----
       This method implicitly applies the element's orientation to the image.

       If your process uses the returned image to find more polygons and send them
       back to Arkindex, use the [arkindex_worker.image.revert_orientation][]
       helper to undo the orientation on all polygons before sending them, as the
       Arkindex API expects unoriented polygons.

       Although not recommended, you can bypass this behavior by passing
       ``rotation_angle=0, mirrored=False`` as keyword arguments.


    Warns:
    ----
       If both, ``max_width`` and ``max_height`` are set, the image ratio is not preserved.


    :param max_width: The maximum width of the image.
    :param max_height: The maximum height of the image.
    :param use_full_image: Ignore the ``zone.polygon`` and always
       retrieve the image without cropping.
    :param *args: Positional arguments passed to [arkindex_worker.image.open_image][].
    :param **kwargs: Keyword arguments passed to [arkindex_worker.image.open_image][].
    :raises ValueError: When the element does not have an image.
    :raises NotImplementedError: When the ``max_size`` parameter is set,
       but the IIIF server's configuration requires downloading and combining tiles
       to retrieve the image.
    :raises NotImplementedError: When an S3 URL has been used to download the image,
       but the URL has expired. Re-fetching the URL automatically is not supported.
    :return: A Pillow image.
    """
    from arkindex_worker.image import (
        download_tiles,
        open_image,
    )

    if not self.get("zone"):
        raise ValueError(f"Element {self.id} has no zone")

    if self.requires_tiles:
        if max_width is None and max_height is None:
            return download_tiles(self.zone.image.url)
        else:
            raise NotImplementedError

    if max_width is None and max_height is None:
        resize = "full"
    else:
        original_size = {"w": self.zone.image.width, "h": self.zone.image.height}
        # No resizing if the image is smaller than the wanted size.
        if (max_width is None or original_size["w"] <= max_width) and (
            max_height is None or original_size["h"] <= max_height
        ):
            resize = "full"
        # Resizing if the image is bigger than the wanted size.
        else:
            resize = f"{max_width or ''},{max_height or ''}"

    url = self.image_url(resize) if use_full_image else self.resize_zone_url(resize)

    try:
        return open_image(
            url,
            *args,
            rotation_angle=self.rotation_angle,
            mirrored=self.mirrored,
            **kwargs,
        )
    except HTTPError as e:
        if (
            self.zone.image.get("s3_url") is not None
            and e.response.status_code == 403
        ):
            # This element uses an S3 URL: the URL may have expired.
            # Call the API to get a fresh URL and try again
            # TODO: this should be done by the worker
            raise NotImplementedError from e
            return open_image(self.image_url(resize), *args, **kwargs)
        raise
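
The size selection in the listing above can be isolated as a small pure function, which makes the aspect-ratio warning easier to see. This is a sketch: the function name `compute_resize` is hypothetical, and the width and height are passed in directly instead of being read from `zone.image`.

```python
def compute_resize(width: int, height: int, max_width=None, max_height=None) -> str:
    """Return the IIIF size parameter chosen for the given limits (sketch)."""
    if max_width is None and max_height is None:
        return "full"
    fits_width = max_width is None or width <= max_width
    fits_height = max_height is None or height <= max_height
    if fits_width and fits_height:
        # The image is already within the limits: no resizing.
        return "full"
    # "w," and ",h" keep the aspect ratio; "w,h" forces both dimensions.
    return f"{max_width or ''},{max_height or ''}"


only_width = compute_resize(2000, 1000, max_width=500)
only_height = compute_resize(2000, 1000, max_height=500)
both = compute_resize(2000, 1000, max_width=500, max_height=500)
small = compute_resize(400, 300, max_width=500, max_height=500)
```

With both limits set, the result has the IIIF `w,h` form, which requests exactly that width and height. That is why the image ratio is not preserved in that case.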
open_image_tempfile
open_image_tempfile(
    format: str | None = "jpeg", *args, **kwargs
) -> Generator[tempfile.NamedTemporaryFile, None, None]

Get the element’s image as a temporary file stored on the disk. To be used as a context manager.

Example
with element.open_image_tempfile() as f:
    ...

Parameters:

format (str | None, default 'jpeg'): File format to use to store the image on disk. Must be a format supported by Pillow.

*args (default ()): Positional arguments passed to arkindex_worker.image.open_image.

**kwargs (default {}): Keyword arguments passed to arkindex_worker.image.open_image.
Source code in arkindex_worker/models.py
@contextmanager
def open_image_tempfile(
    self, format: str | None = "jpeg", *args, **kwargs
) -> Generator[tempfile.NamedTemporaryFile, None, None]:
    """
    Get the element's image as a temporary file stored on the disk.
    To be used as a context manager.

    Example
    ----
    ```
    with element.open_image_tempfile() as f:
        ...
    ```

    :param format: File format to use the store the image on the disk.
       Must be a format supported by Pillow.
    :param *args: Positional arguments passed to [arkindex_worker.image.open_image][].
    :param **kwargs: Keyword arguments passed to [arkindex_worker.image.open_image][].

    """
    with tempfile.NamedTemporaryFile() as f:
        self.open_image(*args, **kwargs).save(f, format=format)
        yield f
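
The context-manager pattern used here can be tried without Arkindex or Pillow. The sketch below uses a hypothetical `bytes_tempfile` helper that writes raw bytes instead of an image; it only shows the lifetime of the temporary file and is not the library's code.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def bytes_tempfile(payload: bytes):
    """Hypothetical stand-in: write `payload` to a temporary file and yield it."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(payload)
        # Flush so that code opening the file by name sees the whole payload.
        f.flush()
        yield f
    # Leaving the inner `with` closes and deletes the file.


with bytes_tempfile(b"fake image bytes") as f:
    path = Path(f.name)
    existed_inside = path.exists()
    # The write left the position at the end; rewind before reading.
    f.seek(0)
    content = f.read()

exists_after = path.exists()
```

The file only exists inside the `with` block, so anything that needs the image on disk (for example an external tool given `f.name`) has to run before the block exits. Rewinding with `f.seek(0)` before reading from the handle is a safe habit in any case.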

Transcription

Bases: ArkindexModel

Describes an Arkindex element’s transcription.

Dataset

Bases: ArkindexModel

Describes an Arkindex dataset.

Attributes
filepath property
filepath: str

Generic filepath to the Dataset compressed archive.

Set

Bases: MagicDict

Describes an Arkindex dataset set.

Artifact

Bases: ArkindexModel

Describes an Arkindex artifact.