This article introduces running Jupyter Notebook with conda on the LANTA HPC system, which requires SSH tunneling to LANTA HPC. The steps are presented below.
Prepare environment on LANTA HPC with conda
Module Load
Use the ml av Miniconda command to see which Miniconda versions are available on the HPC system.
Use ml Miniconda3/4.x.x to load the software version that you want to use. If we don't specify a version, the module system loads the default version, marked (D), which in this case is Miniconda3/4.12.0.
Conda environment
Use source Miniconda3/4.x.x/bin/activate to activate conda.
Use the conda create -n myenv command to create a conda environment named myenv.
Use conda activate myenv to activate the environment so that we can work in it and manage it.
[username@tara-frontend-1 ~]$ mkdir prep
[username@tara-frontend-1 ~]$ cd prep/
[username@tara-frontend-1 prep]$ ml av Miniconda

-------------------------------------------------- /tarafs/data/home/ywongnon/.local/easybuild/modules/all ---------------------------------------------------
   Miniconda3/4.8.3    Miniconda3/4.9.2    Miniconda3/4.12.0 (D)

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

[username@tara-frontend-1 prep]$ ml Miniconda3/4.8.3
[username@tara-frontend-1 prep]$ ml

Currently Loaded Modules:
  1) Miniconda3/4.8.3

[username@tara-frontend-1 prep]$ source Miniconda3/4.x.x/bin/activate
[username@tara-frontend-1 prep]$ conda create -n myenv
[username@tara-frontend-1 prep]$ conda activate myenv
(myenv) [username@tara-frontend-1 prep]$
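The setup above has to be repeated in every new terminal, and running conda create -n myenv again for an environment that already exists makes conda ask whether to remove it. A minimal sketch of a guard that only creates the environment when it is missing is shown below. The helper name ensure_env is hypothetical; it relies only on conda env list and conda create -y -n, and assumes conda has already been activated as shown above.

```shell
# Create the conda environment only when it is missing, so that repeating the
# setup in a new terminal never offers to overwrite an existing environment.
# ensure_env is a hypothetical helper name, not part of conda.
ensure_env() (
  name="$1"
  # `conda env list` prints one environment per line, name in the first column.
  if conda env list | awk '{print $1}' | grep -qxF -- "$name"; then
    echo "conda environment '$name' already exists; skipping create"
  else
    conda create -y -n "$name"
  fi
)
```

After loading the module and activating conda, ensure_env myenv followed by conda activate myenv can then be used in any terminal.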
pip install jupyterlab etc.
We can now install the required packages in the conda environment that we have prepared. These vary depending on the needs of each project. For example, if you want to use pythainlp, you may want to run pip install --upgrade pythainlp[attacut,ml,wordnet,benchmarks,thai2fit], as shown in the example below.
You can skip this step if you don't want to use pythainlp.
(myenv) [username@tara-frontend-1 prep]$ pip install --upgrade pythainlp[attacut,ml,wordnet,benchmarks,thai2fit]
Collecting pythainlp[attacut,benchmarks,ml,thai2fit,wordnet]
  Using cached pythainlp-2.3.2-py3-none-any.whl (11.0 MB)
...
Successfully installed attacut-1.0.6 docopt-0.6.2 emoji-1.5.0 fire-0.4.0 gensim-4.1.2 nptyping-1.4.4 pythainlp-2.3.2 ssg-0.0.8 typish-1.9.3
(myenv) [username@tara-frontend-1 prep]$
Most importantly, install JupyterLab in the conda environment that we prepared.
(myenv) [username@tara-frontend-1 prep]$ pip install jupyterlab
...
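It can be worth confirming that jupyter really landed inside myenv. conda create -n myenv without any package names creates an empty environment, so pip may resolve to the Miniconda module's own pip rather than one inside myenv. The sketch below checks where a command resolves, using the CONDA_PREFIX variable that conda activate sets. The helper name in_active_env is hypothetical.

```shell
# Print the path of a command if it lives inside the active conda environment,
# otherwise explain where it resolved and fail.
# in_active_env is a hypothetical helper name.
in_active_env() (
  cmd_path=$(command -v "$1") || { echo "$1: not found on PATH" >&2; exit 1; }
  case "$cmd_path" in
    "${CONDA_PREFIX:?no conda environment is active}"/*)
      echo "$cmd_path" ;;
    *)
      echo "$1 resolves to $cmd_path, outside $CONDA_PREFIX" >&2
      exit 1 ;;
  esac
)
```

For example, in_active_env jupyter and in_active_env pip should both print a path under the myenv directory. If they do not, one option is to install Python and pip into the environment first (for instance with conda install python) and repeat the pip install steps.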
Reserve HPC resources for interactive use.
Besides normal batch operation, where we prepare a submission script in advance and run it with the sbatch submission-script.sh command, resources can also be reserved through Slurm interactively with a command called sinteract.
sinteract - default
$ sinteract
It reserves resources from the devel partition, which has a default duration of 120 minutes. Since the devel partition is configured to use compute nodes 001 and 002, this option usually gives us tara-c-001 or tara-c-002.
You can learn more about the characteristics of each partition with the scontrol show partition command on TARA.
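The output of scontrol show partition is long, so a short summary can help when choosing a partition. The sketch below reduces it to one line per partition with the name, default time, maximum time and node list. It assumes the usual Key=Value layout of scontrol output and uses only the PartitionName, DefaultTime, MaxTime and Nodes fields. The function name summarize_partitions is hypothetical.

```shell
# Summarise `scontrol show partition` output read from stdin as:
#   name default-time max-time nodes
# summarize_partitions is a hypothetical helper name.
summarize_partitions() {
  awk '
    function emit() {
      if (name != "") print name, dflt, max, nodes
      name = dflt = max = nodes = ""
    }
    {
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")
        if (eq == 0) continue
        key = substr($i, 1, eq - 1)
        val = substr($i, eq + 1)
        # A new PartitionName starts a new record, so print the previous one.
        if (key == "PartitionName") { emit(); name = val }
        else if (key == "DefaultTime") dflt = val
        else if (key == "MaxTime") max = val
        else if (key == "Nodes") nodes = val
      }
    }
    END { emit() }
  '
}
```

On the frontend node it would be used as scontrol show partition | summarize_partitions.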
Continuing from the above, when we run sinteract the prompt changes from the tara-frontend-1 node to the allocated resource, here the tara-c-001 machine.
(myenv) [username@tara-frontend-1 prep]$ sinteract
...
[username@tara-c-001 prep]$
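The node name shown in the new prompt is needed later for the SSH tunnel. The sketch below prints it from inside the interactive session. It assumes Slurm exports SLURMD_NODENAME there, as it does for tasks launched through srun; whether sinteract does so on TARA is an assumption, so the function falls back to uname -n. The helper name allocated_node is hypothetical.

```shell
# Print the short name of the node the current shell is running on.
# allocated_node is a hypothetical helper name.
allocated_node() (
  # Prefer the name Slurm reports; otherwise ask the operating system.
  node="${SLURMD_NODENAME:-$(uname -n)}"
  # Drop any domain suffix so only the short host name remains.
  echo "${node%%.*}"
)
```

Running allocated_node on the compute node should print a name such as tara-c-001, which is the value to use in the ssh command on the local machine.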
sinteract - more options
$ sinteract -p compute -N 1
If we want to run interactive tasks that take more than 120 minutes, or need to work with other partitions such as memory or gpu, we can select the partition and add other options in the same way as when preparing an sbatch script (see the documentation on sbatch resource options and on sinteract for details).
[username@tara-frontend-1 ~]$ sinteract -p compute -N 1
...
[username@tara-c-059 ~]$
In the example above, the command selects the compute partition and requests one full node without specifying a duration. As a result we are given tara-c-059, unlike the default option shown earlier.
Running Jupyter Notebook via ssh tunnelling
Once a node has been allocated, Jupyter Notebook can be started on it with jupyter notebook --no-browser, as shown in the example below. We need the following three windows, plus an optional third terminal.
Terminal 1 - jupyter notebook --no-browser
Terminal 2 - ssh tunneling from local to HPC
Browser 1 - connect through the tunneled port to reach the notebook that we opened on TARA HPC.
(Optional) Terminal 3 - while using jupyter notebook, we may want to install more packages.
Terminal 1 - jupyter notebook --no-browser
[username@tara-c-001 prep]$ ml Miniconda3/4.8.3
[username@tara-c-001 prep]$ source Miniconda3/4.x.x/bin/activate
[username@tara-c-001 prep]$ conda activate myenv
(myenv) [username@tara-c-001 prep]$ jupyter notebook --no-browser
[I 2021-10-02 13:05:31.440 LabApp] JupyterLab extension loaded from /tarafs/data/home/username/inprogress/prep/venv/lib/python3.7/site-packages/jupyterlab
[I 2021-10-02 13:05:31.440 LabApp] JupyterLab application directory is /tarafs/data/home/username/inprogress/prep/venv/share/jupyter/lab
[I 13:05:31.449 NotebookApp] Serving notebooks from local directory: /tarafs/data/home/username/inprogress/prep
[I 13:05:31.449 NotebookApp] Jupyter Notebook 6.4.4 is running at:
[I 13:05:31.449 NotebookApp] http://localhost:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
[I 13:05:31.449 NotebookApp]  or http://127.0.0.1:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
[I 13:05:31.449 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 13:05:31.467 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///tarafs/data/home/username/.local/share/jupyter/runtime/nbserver-24757-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
     or http://127.0.0.1:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
We can see that jupyter uses port 8888 and lets us connect to jupyter notebook via URLs: http://localhost:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
If we use this link now, it still won't open because we haven't done ssh tunneling yet (next step).
Keep this terminal page open to run the jupyter notebook process.
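The port is not always 8888: if that port is already in use on the node, Jupyter picks another one and prints it in the URL. The sketch below pulls the port and the token out of a URL copied from Terminal 1, so that the right port can be used for the tunnel. The helper names jupyter_port and jupyter_token are hypothetical, and the URL is assumed to contain an explicit port, as in the output above.

```shell
# Extract the port from a Jupyter URL such as http://localhost:8888/?token=...
# jupyter_port and jupyter_token are hypothetical helper names.
jupyter_port() (
  rest="${1#*://}"     # drop the scheme, leaving host:port/path
  rest="${rest#*:}"    # drop the host, leaving port/path
  echo "${rest%%/*}"   # keep only the port
)

# Extract the value of the token query parameter from the same URL.
jupyter_token() (
  case "$1" in
    *token=*) token="${1#*token=}"; echo "${token%%&*}" ;;
    *) echo "no token in URL" >&2; exit 1 ;;
  esac
)
```

For example, jupyter_port "http://localhost:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982" prints 8888.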
Terminal 2 - ssh tunneling from local to HPC
In a terminal on the local machine, open an SSH tunnel to the HPC using the command below. Replace the username with your own and the compute node with the node name that sinteract allocated.
@mylocalmachine:~ $ ssh -J <username>@tara.nstda.or.th -L 8888:localhost:8888 -N <username>@<the machine number allocated by sinteract>
In this example, the machine number allocated by sinteract is tara-c-001.
$ ssh -J apiyatum@tara.nstda.or.th -L 8888:localhost:8888 -N apiyatum@tara-c-001
(apiyatum@tara.nstda.or.th) Password:
(apiyatum@tara-c-001) Password:
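We have to enter the password to connect to TARA and enter it again to connect to the allocated compute node (tara-c-001). After that the terminal appears to freeze. This is expected, because -N tells ssh to keep the tunnel open without running a remote command.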
Keep this terminal page open.
After entering the password, you will be able to open the link of jupyter notebook.
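Since the username, node name and port change from session to session, it can help to generate the tunnel command rather than edit it by hand. The sketch below prints the same ssh command as above from its parts. The helper name tunnel_cmd and its optional local-port argument are hypothetical; the jump host tara.nstda.or.th is taken from the example above.

```shell
# Print the ssh tunnel command for a given user, compute node and port.
# Usage: tunnel_cmd USER NODE [REMOTE_PORT] [LOCAL_PORT]
# tunnel_cmd is a hypothetical helper name.
tunnel_cmd() (
  user="$1"
  node="$2"
  remote_port="${3:-8888}"          # port Jupyter listens on at the compute node
  local_port="${4:-$remote_port}"   # port to open on the local machine
  printf 'ssh -J %s@tara.nstda.or.th -L %s:localhost:%s -N %s@%s\n' \
    "$user" "$local_port" "$remote_port" "$user" "$node"
)
```

tunnel_cmd apiyatum tara-c-001 prints the command used in this example. If a different local port is chosen, for instance because 8888 is busy on the local machine, the browser URL must use that local port instead of 8888.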
Browser 1 - to go through tunneling port for jupyter notebook
Open the URL obtained when starting the application in Terminal 1:
http://localhost:8888/?token=58bfd7de821a8722c4e07c0eafad519c868f375e61285982
(Optional) Terminal 3 - install additional packages or corpus
Open another terminal screen, connect to LANTA's Frontend Node and enter the environment we are currently using for our jupyter notebook (myenv).
Don't forget to load the Miniconda module and activate conda before you activate myenv.
The example below shows opening a third terminal to install the additional pythainlp[ner] extra and three additional corpora, so that Jupyter Notebook can see what was just installed.
[username@tara-frontend-1 prep]$ ml Miniconda3/4.8.3
[username@tara-frontend-1 prep]$ source Miniconda3/4.x.x/bin/activate
[username@tara-frontend-1 prep]$ conda activate myenv
(myenv) [username@tara-frontend-1 prep]$ pip install pythainlp[ner]
...
(myenv) [username@tara-frontend-1 prep]$ thainlp data get lst20-cls
Corpus: lst20-cls
- Downloading: lst20-cls 0.2
100%|█████████████████████████████████████████████████████████████████████| 3738912/3738912 [00:00<00:00, 14208949.66it/s]
Downloaded successfully.
(myenv) [username@tara-frontend-1 prep]$ thainlp data get thainer
Corpus: thainer
- Downloading: thainer 1.5
100%|██████████████████████████████████████████████████████████████████████| 1637304/1637304 [00:00<00:00, 6083390.29it/s]
Downloaded successfully.
(myenv) [username@tara-frontend-1 prep]$ thainlp data get thainer-1.4
Corpus: thainer-1.4
- Downloading: thainer-1.4 1.4
100%|██████████████████████████████████████████████████████████████████████| 1872468/1872468 [00:00<00:00, 6637009.99it/s]
Downloaded successfully.
(myenv) [username@tara-frontend-1 prep]$