.. _configuration:

Configuration
=============

This chapter details how a running VS stack can be configured and what
steps are necessary for the configuration to be deployed.

In order for configuration changes to be picked up by a running VS stack
and to take effect, some steps need to be performed. These steps are
either a "re-deploy" of the running stack or a complete re-creation of it.

Stack Re-deploy
---------------

As will be further described, for some configurations it is sufficient to
"re-deploy" the stack, which automatically re-starts any service with a
changed configuration. This is done by re-using the stack deployment
command:

.. code-block:: bash

    docker stack deploy -c docker-compose.<stack-name>.yml -c docker-compose.<stack-name>.dev.yml <stack-name>

.. warning::

    When calling the ``docker stack deploy`` command, it is vital to use
    the same configuration files and stack name with which the stack was
    originally created.

Stack Re-creation
-----------------

In some cases a stack re-deploy is not enough, as the configuration was
used for a materialized instance which needs to be reverted. The easiest
way to do this is to delete the volume in question. If, for example, the
renderer/registrar configuration was updated, the ``instance-data`` volume
needs to be re-created.

First, the stack needs to be shut down. This is done using the following
command:

.. code-block:: bash

    docker stack rm <stack-name>

When that command has completed (it is advisable to wait for some time
until all containers have actually stopped), the next step is to delete
the ``instance-data`` volume:

.. code-block:: bash

    docker volume rm <stack-name>_instance-data

.. note::

    It is possible that this command fails with the error message that the
    volume is still in use. In this case, it is advisable to wait for a
    minute and to try the deletion again.

Now that the volume is deleted, the stack can be re-deployed as described
above, which will trigger the automatic re-creation and initialization of
the volume. For ``instance-data``, this means that the instance will be
re-created and all database models with it.

Docker Compose Settings
-----------------------

These configurations alter the behavior of the stack itself and its
contained services. A complete reference of the configuration file
structure can be found in the `Docker Compose documentation
<https://docs.docker.com/compose/compose-file/>`_.

Environment Variables
---------------------

These variables are passed to their respective containers' environment and
change the behavior of certain functionality. They can be declared in the
Docker Compose configuration file directly, but typically they are bundled
by field of interest, placed into ``.env`` files, and then passed to the
containers. For example, there will be a ``<stack-name>_obs.env`` file to
store the access parameters for the object storage. All those files are
placed in the ``env/`` directory of the instance directory.

Environment variables and ``.env`` files are passed to the services via
the ``docker-compose.yml`` directives. The following example shows how to
pass ``.env`` files and direct environment variables:

.. code-block:: yaml

    services:
      # ....
      registrar:
        env_file:
          - env/stack.env
          - env/stack_db.env
          - env/stack_obs.env
        environment:
          INSTANCE_ID: "prism-view-server_registrar"
          INSTALL_DIR: "/var/www/pvs/dev/"
          INIT_SCRIPTS: "/configure.sh /init-db.sh /initialized.sh"
          STARTUP_SCRIPTS: "/wait-initialized.sh"
          WAIT_SERVICES: "redis:6379 database:5432"
          OS_PASSWORD_FILE: "/run/secrets/OS_PASSWORD"
        # ...
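To verify that these variables actually arrive in a service's container,
the environment of a running task can be inspected with standard Docker
commands. This is a small sketch, assuming a stack called ``<stack-name>``
with a ``registrar`` service; adapt the names to your deployment:

.. code-block:: bash

    # look up the ID of the running registrar container
    CONTAINER=$(docker ps -q -f name=<stack-name>_registrar)

    # print the object storage related variables from its environment
    docker exec "$CONTAINER" env | grep "^OS_"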
``.env`` Files
~~~~~~~~~~~~~~

The following ``.env`` files are typically used:

* ``<stack-name>.env``: The general ``.env`` file used for all services.
* ``<stack-name>_db.env``: The database access credentials, for all
  services interacting with the database.
* ``<stack-name>_django.env``: This env file defines the credentials for
  the Django admin user to be used with the admin GUI.
* ``<stack-name>_obs.env``: This contains access parameters for the
  object storage(s).

Groups of Environment Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GDAL Environment Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^

This group of environment variables controls the intricacies of GDAL. They
control how GDAL interacts with its supported files. As GDAL supports a
variety of formats and access backends, most of the full `list of
environment variables <https://gdal.org/user/configoptions.html>`_ are not
applicable, and only a handful are actually relevant for the VS.

* ``GDAL_DISABLE_READDIR_ON_OPEN`` - Especially when using an object
  storage backend with a very large number of files, it is vital to
  activate this setting (``=TRUE``) in order to suppress reading the whole
  directory contents, which is very slow for some OBS backends.
* ``CPL_VSIL_CURL_ALLOWED_EXTENSIONS`` - This limits the file extensions
  in order to disable the lookup of so-called sidecar files, which are not
  used for the VS. By default this value is used: ``=.TIF,.tif,.xml``.

OpenStack Swift Environment Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These variables define the access coordinates and credentials for the
OpenStack Swift object storage backend.

This set of variables defines the credentials for the object storage on
which to place the preprocessed results:

* ``ST_AUTH_VERSION``
* ``OS_AUTH_URL_SHORT``
* ``OS_AUTH_URL``
* ``OS_USERNAME``
* ``OS_PASSWORD``
* ``OS_TENANT_NAME``
* ``OS_TENANT_ID``
* ``OS_REGION_NAME``
* ``OS_USER_DOMAIN_NAME``

This set of variables defines the credentials for the object storage from
which to retrieve the original product files:

* ``OS_USERNAME_DOWNLOAD``
* ``OS_PASSWORD_DOWNLOAD``
* ``OS_TENANT_NAME_DOWNLOAD``
* ``OS_TENANT_ID_DOWNLOAD``
* ``OS_REGION_NAME_DOWNLOAD``
* ``OS_AUTH_URL_DOWNLOAD``
* ``ST_AUTH_VERSION_DOWNLOAD``
* ``OS_USER_DOMAIN_NAME_DOWNLOAD``

VS Environment Variables
^^^^^^^^^^^^^^^^^^^^^^^^

These environment variables are used by the VS itself to configure various
parts.

.. note::

    These variables are used during the initial stack setup. When these
    variables are changed, the changes will not be reflected unless the
    instance volume is re-created.

* ``COLLECTION`` - This defines the name of the main collection. It is
  used in various parts of the VS and serves as the layer base name.
* ``UPLOAD_CONTAINER`` - This controls the name of the bucket to which the
  preprocessed images are uploaded.
* ``DJANGO_USER``, ``DJANGO_MAIL``, ``DJANGO_PASSWORD`` - The Django admin
  user account credentials to use the admin GUI.
* ``REPORTING_DIR`` - This sets the directory to which the reports of the
  registered products are written.

.. note::

    These variables are used during the initial stack setup. When these
    variables are changed, the changes will not be reflected unless the
    database volume is re-created.

These are the internal access credentials for the database:

* ``POSTGRES_USER``
* ``POSTGRES_PASSWORD``
* ``POSTGRES_DB``
* ``DB``
* ``DB_USER``
* ``DB_PW``
* ``DB_HOST``
* ``DB_PORT``
* ``DB_NAME``
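For illustration, a ``<stack-name>_db.env`` file bundling this group of
variables could look like the following sketch. All values are
placeholders, not defaults shipped with the VS; the host and port
correspond to the ``database`` service seen in the ``WAIT_SERVICES``
example above:

.. code-block:: bash

    POSTGRES_USER=vs_user
    POSTGRES_PASSWORD=change-me
    POSTGRES_DB=vs_db
    DB_USER=vs_user
    DB_PW=change-me
    DB_HOST=database
    DB_PORT=5432
    DB_NAME=vs_db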
Configuration Files
-------------------

Such files are passed to the containers in a similar way as environment
variables, but they usually contain more settings at once and are placed
at a specific path in the container at runtime.

Configuration files are passed into the containers using the ``configs``
section of the ``docker-compose.yaml`` file. The following example shows
how such a configuration file is defined and then used in a service:

.. code-block:: yaml

    # ...
    configs:
      my-config:
        file: ./config/example.cfg
    # ...
    services:
      myservice:
        # ...
        configs:
          - source: my-config
            target: /example.cfg
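After deployment, Docker's built-in commands can be used to check that a
config was created and what it contains. Note that ``docker stack deploy``
prefixes config names with the stack name, so the config from the example
above would appear roughly as follows (names are illustrative):

.. code-block:: bash

    # list all configs known to the swarm
    docker config ls

    # show metadata and the (base64-encoded) content of a single config
    docker config inspect <stack-name>_my-config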
The following configuration files are used throughout the VS:

``<stack-name>_init-db.sh``
~~~~~~~~~~~~~~~~~~~~~~~~~~~

This shell script's purpose is to set up the EOxServer instance used by
both the renderer and registrar.

Some browsetype functions that can be used for elevation rasters are:

``hillshade(band)``
    * range 0 - 255
    * nodata 0

``aspect(band)``
    * range 0 - 360
    * nodata -9999

``slopeshade(band)``
    * range 0 - 255
    * nodata -9999

``contours(band, 0, 30)``
    * range 0 - 500
    * nodata -9999

Example:

.. code-block:: bash

    python3 manage.py browsetype create "DEM" "elevation" \
        --grey "gray" \
        --grey-range -100 4000 \
        --grey-nodata 0 \
        --traceback
    python3 manage.py browsetype create "DEM" "hillshade" \
        --grey "hillshade(gray)" \
        --grey-range 0 255 \
        --grey-nodata 0 \
        --traceback
    python3 manage.py browsetype create "DEM" "aspect" \
        --grey "aspect(gray)" \
        --grey-range 0 360 \
        --grey-nodata -9999 \
        --traceback
    python3 manage.py browsetype create "DEM" "slope" \
        --grey "slopeshade(gray)" \
        --grey-range 0 255 \
        --grey-nodata -9999 \
        --traceback
    python3 manage.py browsetype create "DEM" "contours" \
        --grey "contours(gray, 0, 30)" \
        --grey-range 0 500 \
        --grey-nodata -9999 \
        --traceback

``<stack-name>_index-dev.html``/``<stack-name>_index-ops.html``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The client's main HTML page, containing various client settings. The
``dev`` one is used for development only, whereas the ``ops`` one is used
for operational deployment.

``<stack-name>_mapcache-dev.xml``/``<stack-name>_mapcache-ops.xml``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration file for MapCache, the software powering the cache
service. Similarly to the client configuration files, the ``dev`` and
``ops`` files are used for development and operational usage respectively.
Further documentation can be found at `the official site
<https://mapserver.org/mapcache/>`_.

``<stack-name>_preprocessor-config.yaml``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The configuration for the preprocessing service, defining how the files to
be ingested are processed. The file uses YAML as its format and is
structured in the following fashion:

source/target
    Here, the source file storage and the target file storage are
    configured. This can either be a local directory or an OpenStack Swift
    object storage. If Swift is used as the source, the download container
    can be left unset. In that case, the container is inferred from the
    given path in the format ``<container>/<path>``.

workdir
    The workdir can be configured to determine where the intermediate
    files are placed. This can be convenient for debugging and
    development.

keep_temp
    This boolean decides whether the temporary directory for the
    preprocessing will be cleaned up after it is finished. Also convenient
    for development.

metadata_glob
    This file glob is used to determine the main metadata file from which
    the product type is extracted. This file will be searched in the
    downloaded package.

glob_case
    Whether all globs will be used in a case-sensitive way.

type_extractor
    This setting configures how the product type is extracted from the
    previously extracted metadata. In the ``xpath`` setting, one or more
    XPath expressions can be supplied to fetch the product type. Each
    XPath will be tried until one is found that produces a result. These
    results can then be mapped using the ``map`` dictionary.

level_extractor
    This section works very similarly to the ``type_extractor``, but for
    the product level. The product level is currently not used.

preprocessing
    This is the actual preprocessing configuration setting. It is split
    into defaults and product type specific settings. The defaults are
    applied where no setting is supplied for the specific type. The
    product type is the one extracted earlier.

    defaults
        This section allows configuring any one of the available steps.
        Each step configuration can be overridden in a specific product
        type configuration. The available steps are as follows:

        custom_preprocessor
            A custom Python function to be called.

            path
                The Python module path to the function to call.
            args
                A list of arguments to pass to the function.
            kwargs
                A dictionary of keyword arguments to pass to the function.

        subdatasets
            What subdatasets to extract and how to name them.

            subdataset_types
                Mapping of subdataset identifier to output filename
                postfix for subdatasets to be extracted for each data
                file.

        georeference
            How the extracted files shall be georeferenced.

            geotransforms
                A list of georeference methods with options to try.

                type
                    The type of georeferencing to apply. One of ``gcp``,
                    ``rpc``, ``corner``, ``world``.
                options
                    Additional options for the georeferencing. Depends on
                    the type of georeferencing.

                    order
                        The polynomial order to use for GCP related
                        georeferencing.
                    projection
                        The projection to use for ungeoreferenced images.
                    rpc_file_template
                        The file glob template to use to find the RPC
                        file. Template parameters are ``{filename}``,
                        ``{fileroot}``, and ``{extension}``.
                    warp_options
                        Warp options. See
                        https://gdal.org/python/osgeo.gdal-module.html#WarpOptions
                        for details.
                    corner_names
                        The metadata field names containing the corner
                        names. Tuple of four: bottom-left, bottom-right,
                        top-left and top-right.
                    orbit_direction_name
                        The metadata field name containing the orbit
                        direction.
                    force_north_up
                        Circumvents the use of the corner names and
                        assumes a north-up orientation of the image.
                    tps
                        Whether to use TPS transformation instead of GCP
                        polynomials.

        calc
            Calculate derived data using formulas.

            formulas
                A list of formulas to use to calculate derived data. Each
                has the following fields:

                inputs
                    A map of characters in the range of A-Z to the
                    respective inputs. Each has the following properties:

                    glob
                        The input file glob.
                    band
                        The input file band index (1-based).
                data_type
                    The GDAL data type name for the output.
                formula
                    The formula to apply. See
                    https://gdal.org/programs/gdal_calc.html#cmdoption-calc
                    for details.
                output_postfix
                    The postfix to apply to the filename of the created
                    file.
                nodata_value
                    The nodata value to be used.

        stack_bands
            Concatenate bands and arrange them in a single file.

            group_by
                A regex to group the input datasets, if they consist of
                multiple files. The first regex group is used for the
                grouping.
            sort_by
                A regex to select a portion of the filename to be used for
                sorting. The first regex group is used.
            order
                The order of the items extracted via ``sort_by``. When the
                value extracted by ``sort_by`` is missing, that file will
                be dropped.

        output
            Final adjustments to generate an output file. Adds overviews,
            reprojects to a common projection, etc.

            options
                Options to be passed to ``gdal.Warp``. See
                https://gdal.org/python/osgeo.gdal-module.html#WarpOptions
                for details.

        custom_postprocessor
            A custom Python function to be called after the other steps.

            path
                The Python module path to the function to call.
            args
                A list of arguments to pass to the function.
            kwargs
                A dictionary of keyword arguments to pass to the function.

    types
        This mapping of product type identifier to step configuration
        allows defining specific step settings, even overriding the values
        from the defaults.
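Since this configuration is plain YAML, a quick syntax check before
re-deploying can catch errors early. A minimal sketch, assuming Python
with PyYAML is available on the host:

.. code-block:: bash

    # parse the file; any syntax error aborts with a traceback
    python3 -c "import yaml, sys; yaml.safe_load(open(sys.argv[1]))" \
        <stack-name>_preprocessor-config.yaml && echo "YAML syntax OK"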
Sensitive variables
-------------------

Since environment variables include credentials that are considered
sensitive, their exposure inside ``.env`` files should be avoided. In
order to securely transmit sensitive data into the respective containers,
Docker secrets with the values of these variables should be created.

Currently, four variables have to be saved as Docker secrets before
deploying the swarm: ``OS_PASSWORD``, ``OS_PASSWORD_DOWNLOAD``,
``DJANGO_PASSWORD`` and ``DJANGO_SECRET_KEY``.

The following Docker secret for traefik basic authentication needs to be
created as well: ``BASIC_AUTH_USERS_APIAUTH`` - used for admin access to
kibana and traefik. Access to the services for alternative clients not
supporting the main Shibboleth authentication entrypoints is configured by
creating a local file ``BASIC_AUTH_USERS`` inside the cloned repository
folder. The secret and the password file should both be text files
containing a list of ``username:hashedpassword`` (MD5, SHA1, BCrypt)
pairs.

Additionally, the configuration of the ``sftp`` image contains sensitive
information and is therefore created using Docker configs. An example of
creating a configuration for the ``sftp`` image using the following
command:

.. code-block:: bash

    printf "<user>:<password>:<UID>:<GID>" | docker config create sftp-users-<name> -

An example of creating the ``OS_PASSWORD`` secret using the following
command:

.. code-block:: bash

    printf "<password>" | docker secret create OS_PASSWORD -

An example of creating the ``BASIC_AUTH_USERS_APIAUTH`` secret:

.. code-block:: bash

    htpasswd -nb user1 3vYxfRqUx4H2ar3fsEOR95M30eNJne >> auth_list.txt
    htpasswd -nb user2 YyuN9bYRvBUUU6COx7itWw5qyyARus >> auth_list.txt
    docker secret create BASIC_AUTH_USERS_APIAUTH auth_list.txt

For the configuration of the ``shibauth`` service, please consult the
separate chapter :ref:`access`.

The next section :ref:`management` describes how an operator interacts
with a deployed VS stack.