Releases

0.3.2

Released on 8 March 2023 • View on Gitlab

  • A helper to use the new API endpoint to create transcription entities more efficiently was implemented.
  • Training workers may now publish a model configuration when creating a new model version on Arkindex. This will make the execution of a generic worker much smoother.
  • The model version API endpoints were updated in the latest Arkindex release and a new helper was introduced subsequently. However, there are no breaking changes and the main helper, publish_model_version, still has the same signature and behaviour.
  • The latest Arkindex release changed the way NER entities are stored and published.
    • The EntityType enum was removed, as type slugs are no longer restricted to a small set of options,
    • create_entity now expects a type slug as a String,
    • a new helper list_corpus_entity_types was added to load the Entity types in the corpus,
    • a new helper check_required_entity_types to make sure that needed entity types are available in the corpus was added. Missing ones are created by default (this can be disabled).
  • The create_classifications helper now expects the UUID of each MLClass instead of their name.
  • In developer mode, the only way to set the corpus_id attribute is to use the ARKINDEX_CORPUS_ID environment variable. When it is not set, all API requests using the corpus_id as a path parameter will fail with a 500 status code. A warning log was added to help developers troubleshoot this error by advising them to set this variable.
  • The create_transcriptions helper no longer makes the API call in developer mode. This behaviour aligns with all other publication helpers.
  • Fixes hash computation when publishing a model using publish_model_version.
  • If a process is linked to a model version, its id will be available to the worker through its model_version_id attribute.
  • The URLs of the API endpoint related to Ponos were changed in the latest Arkindex release. Some changes were needed in the test suite.
  • The classes attribute now directly contains the classes of the corpus of the processed element.
    # Old usage
    self.classes = {
        "corpus_id": {
            "ml_class_1": "class_uuid",
            ...
        }
    }
    
    # New usage
    self.classes = {
        "ml_class_1": "class_uuid",
        ...
    }
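
As a rough illustration of the two class-related changes in this release (the flat classes attribute and create_classifications expecting MLClass UUIDs), the sketch below converts an old nested mapping and builds classification payloads from class names. The flatten_classes and build_classifications functions are hypothetical, and the ml_class_id, confidence and high_confidence keys are assumptions about the payload shape; check them against the helper's documentation before relying on them.

```python
from typing import Dict, List, Tuple


def flatten_classes(old_classes: Dict[str, Dict[str, str]], corpus_id: str) -> Dict[str, str]:
    """Return the flat {class name: class UUID} mapping for one corpus.

    `old_classes` follows the pre-0.3.2 layout: {corpus_id: {name: uuid}}.
    """
    return dict(old_classes.get(corpus_id, {}))


def build_classifications(
    classes: Dict[str, str], predictions: List[Tuple[str, float]]
) -> List[dict]:
    """Turn (class name, confidence) pairs into payloads keyed by class UUID.

    The dictionary keys used here are assumed, not taken from the release notes.
    """
    payload = []
    for name, confidence in predictions:
        if name not in classes:
            raise KeyError(f"Unknown ML class: {name}")
        payload.append(
            {
                "ml_class_id": classes[name],  # UUID instead of the class name
                "confidence": confidence,
                "high_confidence": False,
            }
        )
    return payload
```

A worker written against the old layout would only need the first function once, at the point where it used to index classes by corpus ID.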
    

0.3.1

Released on 8 November 2022 • View on Gitlab

  • A breaking change, affecting mostly the API, was introduced in Arkindex’s 1.3.4 release:
    • Workers were mostly unaffected but the REST schema was updated.
  • Workers will progressively lose the ability to publish results on Arkindex with a worker_version_id. They will have to use a related but more general field, worker_run_id:
    • Most publication API endpoint helpers have been updated accordingly,
    • A new version of the cache was released with the updated Django models.
  • Improvements to our Machine Learning training API to allow workers to use models published on Arkindex.
  • Support workers that have no configuration.
  • Allow publishing metadatas with falsy but non-null values.
  • Add .polygon attribute shortcut on Element.
  • Add a major test speedup on our worker template.
  • Support cache usage on our metadata API endpoint helpers.
  • Drop support for Python 3.6 and add support for Python 3.11.
  • Update arkindex-client to version 1.0.11.
  • Update shapely to version 1.8.5-post1
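
The item above about falsy but non-null metadata values comes down to checking for None rather than for truthiness. A minimal sketch, assuming metadata items are dictionaries with a value key (the publishable_metadata function and the item layout are hypothetical, not the library's own code):

```python
from typing import Any, Dict, List


def publishable_metadata(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep every item whose value is not None.

    A truthiness test (`if item["value"]`) would wrongly drop 0, 0.0 or False,
    so the comparison is made against None explicitly.
    """
    return [item for item in items if item.get("value") is not None]
```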

0.3.0

Released on 12 September 2022 • View on Gitlab

  • A large refactoring effort was made on the worker initialization, to streamline most of the workflow:
    • developer setup is now set in a dedicated method configure_for_developers
    • cache setup is now set in a dedicated method configure_cache
    • deprecate the useless features attribute
    • add a simpler debug mode for developers
    • depend only on Arkindex RetrieveWorkerRun API to get all the information needed, instead of relying on multiple API calls.
    • remove ARKINDEX_CORPUS_ID environment variable usage, replaced by corpus information from API, except for developers
    • do not erase defaults when reading configuration
  • Support new Machine Learning training APIs on Arkindex to allow workers to create model versions and publish them as zstandard archives on a remote S3-compatible bucket.
  • Add API helpers
    • list_corpus_entities
    • create_metadatas
    • list_metadata
    • list_transcription_entities
    • create_required_types
    • publish_model_version
    • create_model_version
    • upload_to_s3
  • Create missing element types when checking if they are available on the Arkindex instance (disabled by default).
  • Update arkindex-client to version 1.0.9.
  • Update automated rotation code (revert_orientation) to support reverse application
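
The "do not erase defaults when reading configuration" item above can be pictured as a merge in which loaded values override defaults, while keys missing from the loaded configuration keep their default. This is only a sketch: merge_configuration is a hypothetical name, and a shallow merge is an assumption about how the worker combines the two.

```python
from typing import Any, Dict


def merge_configuration(defaults: Dict[str, Any], loaded: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `loaded` overrides `defaults` key by key.

    Neither input is modified, and defaults absent from `loaded` survive.
    """
    merged = dict(defaults)
    merged.update(loaded)
    return merged
```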

0.2.4

Released on 6 July 2022 • View on Gitlab

  • Document source code using Sphinx and docstrings with parameters. Documentation is available here.
  • Update workers inner config with default values from user_configuration
  • Support confidence in API helpers create_sub_element and create_elements as they are now available in Arkindex
  • Port rotation code from tesseract worker
  • Add helper to trim polygons so that they fit inside their image
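
To give an idea of what trimming a polygon to its image involves, the sketch below clamps each point to the image bounds and drops consecutive duplicates created by the clamping. It is not the library's helper: clamp_polygon is a hypothetical name, and treating the width and height as inclusive upper bounds is an assumption.

```python
from typing import List


def clamp_polygon(polygon: List[List[int]], image_width: int, image_height: int) -> List[List[int]]:
    """Clamp every [x, y] point into [0, image_width] x [0, image_height]."""
    trimmed: List[List[int]] = []
    for x, y in polygon:
        point = [min(max(x, 0), image_width), min(max(y, 0), image_height)]
        # Clamping can collapse neighbouring points onto each other; keep one.
        if not trimmed or trimmed[-1] != point:
            trimmed.append(point)
    return trimmed
```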

0.2.3

Released on 28 March 2022 • View on Gitlab

  • Update arkindex-client to version 1.0.8.
  • Replace all transcription scores with confidences (also renamed on Arkindex)
  • Support cache versioning and detect compatibility in workers
  • Support confidence in create_transcription_entity API helper
  • Support Text orientation for transcriptions
  • Return the response payload in all creation helpers so that workers can use them
  • Support new metadata type URL

0.2.2

Released on 17 September 2021 • View on Gitlab

  • Update arkindex-client to version 1.0.7.
  • Detect already processed elements using worker activity, and skip them
  • Support rotation, mirroring and fix image crop in open_image method used by a lot of workers
  • Change default value for user_configuration from None to {} which simplifies usage code in workers
  • Support new metadata type Numeric
  • Add API helper create_classifications
  • Set worker version in transcription entities API helpers

0.2.1

Released on 30 June 2021 • View on Gitlab

  • Add API helper check_required_types
  • Add a developer mode via --dev argument to simplify boot process for local development
  • Send process_id when updating worker activities
  • Remove nb_best from ML classes list as it’s not supported anymore by Arkindex

0.2.0

Released on 6 May 2021 • View on Gitlab

This is a larger release which brings a new caching system to share data across workers (avoiding a lot of API calls in some workflows), and splits the codebase into multiple files for helpers & unit tests (one file per topic).

  • Add cache system using a local SQLite database, shared from worker to worker. Currently supports Arkindex models:
    • elements and their hierarchy,
    • transcriptions,
    • images,
    • classifications,
    • entities.
  • Add API helpers:
    • create_elements
    • create_transcriptions
    • create_transcription_entity
  • Split ElementsWorker API helpers and unit tests in sub files
  • Drop TranscriptionType & DataSource as they are not used anymore in Arkindex
  • Retry all managed API calls that result in a 50x
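
The retry behaviour on 50x responses mentioned above might look roughly like the following. This is a sketch under stated assumptions, not the library's implementation: ApiError and call_with_retry are hypothetical names, and the number of attempts and the exponential delay are illustrative choices.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code of a failed call."""

    def __init__(self, status_code: int):
        super().__init__(f"API call failed with status {status_code}")
        self.status_code = status_code


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying only when it fails with a 5xx status code."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ApiError as error:
            is_server_error = 500 <= error.status_code < 600
            # Client errors (4xx) and the final failed attempt are re-raised.
            if not is_server_error or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Passing the sleep function as a parameter keeps the sketch easy to test without waiting for real delays.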

0.1.14

Released on 8 April 2021 • View on Gitlab

  • Support weak SSL DH key when downloading images (needed for some outdated IIIF servers with old SSL certs).

0.1.13

Released on 2 March 2021 • View on Gitlab

  • Support new Arkindex feature Worker Activity, to track process progress.
  • Add new API helpers:
    • list_element_children
    • list_transcriptions
    • create_metadata
  • Extend git support with merge & rebase operations
  • Allow any worker type in cookiecutter template

0.1.12

Released on 8 December 2020 • View on Gitlab

  • Bugfix to avoid loading remote images from local file system
  • Deprecate TranscriptionType.

0.1.11

Released on 26 November 2020 • View on Gitlab

0.1.10

Released on 23 November 2020 • View on Gitlab

  • Support git base operations to allow workers to clone and checkout repositories
  • Setup automated CI task to update Python dependencies
  • Update arkindex-client to version 1.0.5.

0.1.9

Released on 19 October 2020 • View on Gitlab

  • Update arkindex-client to version 1.0.4.
  • Add API helpers:
    • get_worker_version
    • get_worker_version_slug
    • get_ml_result_slug

0.1.8

Released on 30 September 2020 • View on Gitlab

0.1.7

Released on 30 September 2020 • View on Gitlab

  • Support Arkindex secrets for workers, using the API but also local storage for developers. More information is available in the Arkindex documentation.
  • Do not crash when a worker tries to create a classification that already exists.

0.1.6

Released on 23 September 2020 • View on Gitlab

  • Automatically create missing Arkindex ML classes when using get_ml_class_id and creating classifications through API helpers.
  • Update arkindex-client to version 1.0.2.

0.1.5

Released on 22 September 2020 • View on Gitlab

  • Update arkindex-client to version 1.0.1.
  • Bugfix on score & confidence type checks in API helpers

0.1.4

Released on 2 September 2020 • View on Gitlab

  • Load worker configuration from Arkindex API, or local file (for developers)
  • Add API helpers:
    • load_corpus_classes
    • get_ml_class_id

0.1.3

Released on 25 August 2020 • View on Gitlab

  • Add API helper create_element_transcriptions
  • Return created instance ID in API helpers
  • Add cookiecutter variables to be able to easily rebuild

0.1.2

Released on 19 August 2020 • View on Gitlab

  • Use the WORKER_VERSION_ID environment variable in helper methods to identify the worker automatically
  • Add API helpers:
    • create_transcription
    • create_classification
    • create_entity
  • Extend cookiecutter template to generate clean Python packages
  • Add the Timer helper class in tools submodule
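
For readers unfamiliar with timing helpers, a class like the Timer mentioned above is typically a small context manager. The sketch below shows the general idea only; SimpleTimer and its elapsed attribute are hypothetical and may not match the interface of the class shipped in the tools submodule.

```python
import time


class SimpleTimer:
    """Measure the wall-clock duration of a `with` block, in seconds."""

    def __enter__(self):
        self.elapsed = None
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        # Returning False lets any exception raised in the block propagate.
        return False
```

Typical usage would wrap a processing step in `with SimpleTimer() as timer:` and log `timer.elapsed` afterwards.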

0.1.1

Released on 7 August 2020 • View on Gitlab

  • Add API helper create_sub_element
  • Add unit tests in cookiecutter template & base project.
  • Change cookiecutter base to use ElementsWorker

0.1.0

Released on 21 July 2020 • View on Gitlab

Initial version of the base worker, with cookiecutter support to easily create workers using this project.