This article gives a basic introduction to running Jupyter Notebook with conda on the LANTA HPC system. Connecting to the notebook requires ssh tunneling to LANTA HPC, which is covered in a later step.

Prepare environment on LANTA HPC with conda

Module Load

  1. Use the ml av Miniconda command to see which Miniconda versions are available on the HPC system.

  2. Use ml Miniconda3/4.x.x to load the version you want to use. If no version is specified, the module system loads the default version marked (D), which in this case is Miniconda3/4.12.0.

...

  1. Use source Miniconda3/4.x.x/bin/activate to activate conda.

  2. Use the conda create -n myenv command to create a conda environment named myenv.

  3. Use conda activate myenv to activate the environment so that subsequent commands run inside it.

Code Block
[username@lanta-frontend-1 ~]$ mkdir prep
[username@lanta-frontend-1 ~]$ cd prep/
[username@lanta-frontend-1 prep]$ ml av Miniconda

-------------------------------------------------- /lantafs/data/home/ywongnon/.local/easybuild/modules/all ---------------------------------------------------
   Miniconda3/4.8.3    Miniconda3/4.9.2    Miniconda3/4.12.0 (D)

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

[username@lanta-frontend-1 prep]$ ml Miniconda3/4.8.3
[username@lanta-frontend-1 prep]$ ml

Currently Loaded Modules:
  1) Miniconda3/4.8.3

[username@lanta-frontend-1 prep]$ source Miniconda3/4.x.x/bin/activate
[username@lanta-frontend-1 prep]$ conda create -n myenv
[username@lanta-frontend-1 prep]$ conda activate myenv
(myenv) [username@lanta-frontend-1 prep]$

pip install jupyterlab etc.

...

Info

You can skip this step if you don't want to use pythainlp.

Code Block
(myenv) [username@lanta-frontend-1 prep]$ pip install --upgrade pythainlp[attacut,ml,wordnet,benchmarks,thai2fit]
Collecting pythainlp[attacut,benchmarks,ml,thai2fit,wordnet]
  Using cached pythainlp-2.3.2-py3-none-any.whl (11.0 MB)
...
Successfully installed attacut-1.0.6 docopt-0.6.2 emoji-1.5.0 fire-0.4.0 gensim-4.1.2 nptyping-1.4.4 pythainlp-2.3.2 ssg-0.0.8 typish-1.9.3
(myenv) [username@lanta-frontend-1 prep]$

Most importantly, install JupyterLab in the conda environment that we prepared.

Code Block
(myenv) [username@lanta-frontend-1 prep]$ pip install jupyterlab
...
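
Because the packages must be installed into the prepared environment, it can help to confirm that myenv is active before running pip install. The following is a minimal sketch, assuming conda's usual behaviour of setting the CONDA_DEFAULT_ENV variable on activation. The require_env helper name is hypothetical and is not part of conda or LANTA.

```shell
# Hypothetical helper: succeed only if the named conda environment is active.
# Relies on conda exporting CONDA_DEFAULT_ENV when an environment is activated.
require_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
        echo "conda environment '$1' is not active" >&2
        return 1
    fi
}

# Example use: install only when myenv is active.
# require_env myenv && pip install jupyterlab
```

If the check fails, run conda activate myenv first and then repeat the installation.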

Reserve HPC resources for interactive use.

In normal batch operation, we prepare a submission script in advance and run it with the sbatch submission-script.sh command. In addition to that, booking HPC resources through Slurm also has an interactive form called sinteract.

...

It reserves resources from the devel partition, which has a default duration of 120 minutes. Because the devel partition is configured to use compute nodes 001 and 002, this option usually allocates lanta-c-001 or lanta-c-002.

Info

You can learn more about the characteristics of each partition by running the scontrol show partition command on LANTA.
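
As a rough illustration of reading that output, the sketch below pulls a single field out of scontrol show partition text. It assumes the usual KEY=value layout that scontrol prints (for example DefaultTime= and Nodes=). The partition_field helper name is hypothetical, and the actual values on LANTA may differ from what the article describes.

```shell
# Hypothetical helper: print the value of one KEY=value field read from stdin.
# $1 is the field name, for example DefaultTime or Nodes.
partition_field() {
    # Put every KEY=value pair on its own line, then keep the first match.
    tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# On LANTA (needs Slurm, so it is left commented out here):
# scontrol show partition devel | partition_field DefaultTime
# scontrol show partition devel | partition_field Nodes
```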

Continuing from the above, when we run sinteract, the prompt changes from the lanta-frontend-1 node to the allocated resource, which here is the lanta-c-001 machine.

Code Block
(myenv) [username@lanta-frontend-1 prep]$ sinteract
...
[username@lanta-c-001 prep]$ 

sinteract - more options

...

If we want to run interactive tasks that take longer than 120 minutes or that need other partitions such as memory or gpu, we can select the partition and add other options in the same way as when preparing an sbatch script (Learn about options for booking sbatch resources here and more about sinteract here).

Code Block
[username@lanta-frontend-1 ~]$ sinteract -p compute -N 1
...
[username@lanta-c-059 ~]$ 

In the example above, the command selects the compute partition and requests 1 full machine without specifying a duration. As a result, lanta-c-059 is allocated, unlike with the default option shown earlier.

Running Jupyter Notebook via ssh tunnelling

Once a compute node has been allocated, start Jupyter Notebook on that node with jupyter notebook --no-browser, as shown in the example below. We need to work in three windows, as follows.

  • Terminal 1 - jupyter notebook --no-browser

  • Terminal 2 - ssh tunneling from local to HPC

  • Browser 1 - connect through the port that we tunneled to reach the notebook that we opened on LANTA HPC.

  • (Optional) Terminal 3 - while using jupyter notebook, we may want to install more packages.

Terminal 1 - jupyter notebook --no-browser

Code Block
[username@lanta-c-001 prep]$ ml Miniconda3/4.8.3
[username@lanta-c-001 prep]$ source Miniconda3/4.x.x/bin/activate
[username@lanta-c-001 prep]$ conda activate myenv
(myenv) [username@lanta-c-001 prep]$ jupyter notebook --no-browser
[I 2021-10-02 13:05:31.440 LabApp] JupyterLab extension loaded from /lantafs/data/home/username/inprogress/prep/venv/lib/python3.7/site-packages/jupyterlab
[I 2021-10-02 13:05:31.440 LabApp] JupyterLab application directory is /lantafs/data/home/username/inprogress/prep/venv/share/jupyter/lab
[I 13:05:31.449 NotebookApp] Serving notebooks from local directory: /lantafs/data/home/username/inprogress/prep
[I 13:05:31.449 NotebookApp] Jupyter Notebook 6.4.4 is running at:
[I 13:05:31.449 NotebookApp] http://localhost:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
[I 13:05:31.449 NotebookApp]  or http://127.0.0.1:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
[I 13:05:31.449 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 13:05:31.467 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///lantafs/data/home/username/.local/share/jupyter/runtime/nbserver-24757-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
     or http://127.0.0.1:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
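
The port in these URLs is the one that has to be tunneled. Jupyter normally moves to the next free port when 8888 is already in use, so the number is worth checking rather than assuming. As a small sketch, the port can be read from the URL as below. The jupyter_port helper name is hypothetical, and the token in the example is a placeholder.

```shell
# Hypothetical helper: print the port number from a Jupyter server URL
# such as http://localhost:8888/?token=...
jupyter_port() {
    printf '%s\n' "$1" | sed -n 's#^https\{0,1\}://[^:/]*:\([0-9][0-9]*\)/.*#\1#p'
}

# Example with a placeholder token:
jupyter_port 'http://localhost:8888/?token=<token>'
```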

...

@mylocalmachine:~ $

@mylocalmachine:~ $ ssh -J <username>@lanta.nstda.or.th -L 8888:localhost:8888 -N <username>@<the machine number allocated by sinteract>

In this example, the machine number allocated by sinteract is lanta-c-001.

Code Block
$ ssh -J apiyatum@lanta.nstda.or.th -L 8888:localhost:8888 -N apiyatum@lanta-c-001
(apiyatum@lanta.nstda.or.th) Password: 
(apiyatum@lanta-c-001) Password: 

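We have to enter the password to connect to LANTA and then enter it again to connect to the allocated compute node (lanta-c-001). After that the terminal appears to freeze. This is expected, because the -N option keeps the tunnel open without running a remote command.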
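
The tunnel command has the same shape for any user, node, and port, so it can be assembled from those three values. The sketch below only prints the command and does not run it. The login host comes from the example above, while the tunnel_cmd helper name and its argument order are illustrative assumptions.

```shell
# Login host taken from the ssh example in this article.
LOGIN_HOST="lanta.nstda.or.th"

# Hypothetical helper: print the ssh tunnel command.
# $1 = username, $2 = compute node, $3 = port Jupyter listens on,
# $4 = local port to open (optional, defaults to the same as $3).
tunnel_cmd() {
    printf 'ssh -J %s@%s -L %s:localhost:%s -N %s@%s\n' \
        "$1" "$LOGIN_HOST" "${4:-$3}" "$3" "$1" "$2"
}

# Same port on both sides, as in the article:
tunnel_cmd username lanta-c-001 8888
# Jupyter on 8889 on the node, reached through local port 9000:
tunnel_cmd username lanta-c-001 8889 9000
```

With the second form, the notebook is opened in the local browser on port 9000 instead of 8888, using the same token that Jupyter printed.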

...

The example below shows opening a third terminal to install the additional pythainlp[ner] extra and to download three additional corpora, so that Jupyter Notebook can use what was just installed.

Code Block
[username@lanta-frontend-1 prep]$ ml Miniconda3/4.8.3
[username@lanta-frontend-1 prep]$ source Miniconda3/4.x.x/bin/activate
[username@lanta-frontend-1 prep]$ conda activate myenv
(myenv) [username@lanta-frontend-1 prep]$ pip install pythainlp[ner]
...
(myenv) [username@lanta-frontend-1 prep]$ thainlp data get lst20-cls
Corpus: lst20-cls
- Downloading: lst20-cls 0.2
100%|█████████████████████████████████████████████████████████████████████| 3738912/3738912 [00:00<00:00, 14208949.66it/s]
Downloaded successfully.
(myenv) [username@lanta-frontend-1 prep]$ thainlp data get thainer
Corpus: thainer
- Downloading: thainer 1.5
100%|██████████████████████████████████████████████████████████████████████| 1637304/1637304 [00:00<00:00, 6083390.29it/s]
Downloaded successfully.
(myenv) [username@lanta-frontend-1 prep]$ thainlp data get thainer-1.4
Corpus: thainer-1.4
- Downloading: thainer-1.4 1.4
100%|██████████████████████████████████████████████████████████████████████| 1872468/1872468 [00:00<00:00, 6637009.99it/s]
Downloaded successfully.
(myenv) [username@lanta-frontend-1 prep]$ 

...
