How to run on the AWS DP HPC cluster using SLURM
================================================

This page describes how to run the rapthor prefect prototype on one or more nodes
of the AWS DP HPC cluster using SLURM, with the Prefect server running on the
headnode.

- If you want to run inside a container on your local machine instead, 
  see the `quickstart guide <../quickstart.html>`_.
- If you want to run on an interactive compute node instead, using a local
  Prefect server on that compute node (i.e. without having to set up the
  Prefect server on the headnode), see the
  `instructions for running on an interactive node <./aws_interactive.html>`_.


.. image:: ../_static/images/pipeline-overview.png
   :alt: Pipeline flow diagram
   :align: center
   :width: 80%


There are three main steps to running the rapthor prefect prototype on 
the AWS DP HPC cluster using SLURM (details for each step are provided below):

#. **Set up a Prefect server on the headnode** using the 
   `aws-prefect-start.sh <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/aws-prefect-start.sh>`_ 
   script.
#. **Submit a SLURM job** to run the code:

   - For a single-node run, you can modify the example
     `single-node SLURM script <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/aws-run-poc.sbatch>`_.
   - For a multi-node run, you can modify the example
     `multi-node SLURM script <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/aws-run-poc-multi-node.sbatch>`_.

#. (Optional) **Monitor the progress** of the flow in the Prefect UI and Dask dashboard on 
   your local machine (requires setting up an SSH tunnel).

The SLURM script will run the code in `poc.py <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/src/ska-sdp-rapthor-prefect-prototype/poc.py>`_.
Logs will be written to the file paths specified in the SLURM script. These include:

- ``slurm-<job_name>-<job_id>.out``: standard output and error from the job
  (including output from DP3 and WSClean, which are run by the ``poc.py`` script)
- ``versions-<job_name>-<job_id>.txt``: a list of versions of key software
  used in the job, including Spack modules and Python packages
- ``tunnel-<job_name>-<job_id>.txt``: instructions for setting up SSH tunneling
  to access the Prefect UI and Dask dashboard on your local machine
- ``monitor-<job_name>-<job_id>.log``: benchmarking traces and plots for 
  each node when running on multiple nodes.
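
As an illustration of the naming patterns above, the following sketch prints
the four file names a job would produce. The job name ``rapthor-poc`` and job
ID ``123456`` are hypothetical placeholders, and the helper name
``log_files_for_job`` is not part of the repository. The actual paths are
whatever the SLURM script you submit specifies.

.. code-block:: bash

   #!/usr/bin/env bash
   # Print the log file names for a given job, following the patterns
   # listed above. The job name and ID used below are hypothetical.
   log_files_for_job() {
       local job_name="$1" job_id="$2"
       printf '%s\n' \
           "slurm-${job_name}-${job_id}.out" \
           "versions-${job_name}-${job_id}.txt" \
           "tunnel-${job_name}-${job_id}.txt" \
           "monitor-${job_name}-${job_id}.log"
   }

   log_files_for_job rapthor-poc 123456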

The scripts have been tested on the AWS DP HPC cluster.


Prerequisites
-------------
- An account on the AWS DP HPC cluster
- This repository cloned to a directory on the AWS DP HPC cluster
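
A quick way to confirm the second prerequisite is to check that the scripts
referenced on this page are present in your checkout. The sketch below is only
an illustration: the helper name ``check_repo`` is hypothetical, and it assumes
the three scripts live in the repository root, as the links on this page
suggest. It uses ``REPO_DIR`` if set, otherwise the current directory.

.. code-block:: bash

   #!/usr/bin/env bash
   # Report which of the scripts referenced on this page are missing from a
   # checkout. Returns 0 if all are present, 1 otherwise.
   check_repo() {
       local repo_dir="${1:-${REPO_DIR:-$PWD}}"
       local missing=0 f
       for f in aws-prefect-start.sh aws-run-poc.sbatch aws-run-poc-multi-node.sbatch; do
           if [ ! -f "${repo_dir}/${f}" ]; then
               echo "missing: ${repo_dir}/${f}" >&2
               missing=1
           fi
       done
       return "$missing"
   }

   if check_repo; then
       echo "repository checkout looks complete"
   else
       echo "run this from the repository root, or set REPO_DIR"
   fi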

Steps
-----

1. Set up a Prefect server on the headnode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If there is already a Prefect server running on the headnode, you can skip 
this step.  

   #. Log into the DP HPC headnode.
   #. Start a tmux session (using e.g. ``tmux new -s prefect``).
   #. Change to the root of the project directory.
   #. Run the `aws-prefect-start.sh <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/aws-prefect-start.sh>`_ script:
   
      .. code-block:: bash

         ./aws-prefect-start.sh

      This will start a Prefect server on the headnode which can be accessed from
      your local machine using an SSH tunnel (see instructions below). 
      
      If something else is using the default port (46200), you can specify a different port 
      by exporting the ``PREFECT_PORT`` environment variable before running the script, e.g.: 
      
      .. code-block:: bash
  
          export PREFECT_PORT=12345
          ./aws-prefect-start.sh

      This will also run `aws-setup-env.sh <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/aws-set-up-env.sh>`_ to set up the
      Python virtual environment needed to run Prefect, if it does not already
      exist.

      A log file named ``prefect-server-%Y%m%d-%H%M%S.log`` will be created. It
      captures the server output and includes instructions for setting up SSH tunneling.

   #. Once the Prefect server is running, you can detach from the tmux session
      (using ``CTRL-B D``) and leave the server running in the background. If you 
      kill the tmux session, the Prefect server will stop (repeat steps 1-4 to 
      restart it).
   #. You can now access the Prefect UI on your local machine by setting up an
      SSH tunnel to the headnode (see instructions in the log file mentioned above).
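
Because the log file name embeds the start time, several server logs can
accumulate over time. A minimal sketch for locating the newest one is shown
below. The helper name ``latest_prefect_log`` is hypothetical, and the sketch
assumes the logs are written to the directory you pass in (by default the
current directory), which may not match where ``aws-prefect-start.sh``
actually writes them.

.. code-block:: bash

   #!/usr/bin/env bash
   # Print the newest prefect-server-*.log in a directory (default: current
   # directory). The %Y%m%d-%H%M%S timestamp sorts chronologically, and the
   # shell expands globs in sorted order, so the last match is the newest.
   # Returns 1 if no log file is found.
   latest_prefect_log() {
       local dir="${1:-.}" newest="" f
       for f in "$dir"/prefect-server-*.log; do
           [ -e "$f" ] && newest="$f"   # skip the literal pattern if nothing matched
       done
       [ -n "$newest" ] && printf '%s\n' "$newest"
   }

   if log="$(latest_prefect_log)"; then
       echo "Latest Prefect server log: $log"
   else
       echo "No prefect-server-*.log found in $PWD"
   fi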


2. Submit a SLURM job to run the proof-of-concept pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Make sure the Prefect server is running on the headnode before submitting the SLURM job 
(step 1 above).

#. Log into the DP HPC headnode.
#. Change directory to the repository root folder, OR set the ``REPO_DIR``
   environment variable to point to the project root, e.g.:

   .. code-block:: bash

      export REPO_DIR=~/path/to/repo/ska-sdp-rapthor-prefect-prototype

#. If you changed the default port used by the Prefect server on the headnode, 
   export the ``PREFECT_PORT`` environment variable to the same value, e.g.:

   .. code-block:: bash

      export PREFECT_PORT=12345

#. Decide how many nodes you want to use for the run. We have provided an example 
   script for a `single-node <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/aws-run-poc.sbatch>`_ run 
   and a `multi-node <http://gitlab.com/ska-telescope/sdp/ska-sdp-rapthor-prefect-prototype/-/blob/main/aws-run-poc-multi-node.sbatch>`_ run.
#. Edit the SLURM sbatch script you want to use. You can change the paths
   or names of the log files and the SLURM job parameters if needed.
   Check that ``SPACK_TAG`` is set to the version you want to use.
   For the multi-node run you may also want to change the number of nodes.

   .. note::

      The multi-node SLURM script will start a Dask scheduler on the first node
      using one CPU core and Dask workers on all nodes with ``(cpus-per-task - 1)``
      cores each. If you use the single-node SLURM script, a local cluster will
      be started by the Python script using all available cores on the node.

   .. note::

      The multi-node SLURM script will run WSClean with MPI support by default.
      This will allow WSClean to perform parallel gridding using multiple 
      MPI processes. Since we have not tuned the WSClean parameters for
      running with MPI, the performance may not be optimal. You can experiment
      with different WSClean parameters in the ``poc.py`` script if you want to
      try to improve the performance. See the 
      `WSClean MPI documentation <https://wsclean.readthedocs.io/en/latest/parallelization.html#multi-node-processing-mpi>`_ 
      for more information. If you want to run WSClean on a single node,
      you can edit the line that runs the ``poc.py`` script by removing the
      ``--use-mpi`` option. DP3 and Dask will still use all nodes.

#. Submit the SLURM job using ``sbatch``, e.g.:

   .. code-block:: bash

      sbatch aws-run-poc.sbatch

   or 

   .. code-block:: bash

      sbatch aws-run-poc-multi-node.sbatch

#. You can check the status of the job using ``squeue`` or ``sacct``.
#. Optional: monitor the job using the Prefect UI and Dask dashboard from your 
   local machine. This will also show previous runs.

    #. Once the job has started, copy the SSH tunneling command from
       the ``tunnel-<job_name>-<job_id>.txt`` file created by the SLURM
       script.
    #. Open a terminal on your local machine.
    #. Start an AWS session.
    #. Paste the command you copied from the tunnel file, changing the username
       and hostname to those you normally use to access the headnode.
    #. Open a web browser and go to ``http://127.0.0.1:14200`` to access the 
       Prefect UI and ``http://127.0.0.1:18787`` to access the Dask 
       dashboard.
       
       .. note::

          The Dask dashboard will only be available once the Dask 
          cluster has been started by the flow (after the SLURM job has started 
          running) and may have a different IP between runs, depending on which 
          compute node is running the job.
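
The submit-and-check steps above can be sketched as a short script. This is
only an illustration, not part of the repository: it relies on the standard
``sbatch --parsable`` option to capture the job ID, and the helper name
``job_id_from_sbatch`` is hypothetical. Swap in
``aws-run-poc-multi-node.sbatch`` for a multi-node run.

.. code-block:: bash

   #!/usr/bin/env bash
   # With --parsable, sbatch prints "<job_id>" or "<job_id>;<cluster>".
   # Keep only the job ID.
   job_id_from_sbatch() {
       printf '%s\n' "${1%%;*}"
   }

   if command -v sbatch >/dev/null 2>&1; then
       job_id="$(job_id_from_sbatch "$(sbatch --parsable aws-run-poc.sbatch)")"
       echo "Submitted job ${job_id}"
       squeue -j "$job_id"                                       # current queue status
       sacct -j "$job_id" --format=JobID,JobName,State,Elapsed   # accounting record
   else
       echo "sbatch not found: run this on the DP HPC headnode"
   fi

Capturing the job ID at submission time also makes it easier to find the
matching ``tunnel-<job_name>-<job_id>.txt`` file afterwards.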