# The _packems_ Package

## Introduction

For details, see:

* https://code.mpimet.mpg.de/projects/esmenv/wiki/Packems
* https://gitlab.dkrz.de/esmenv/packems

## Setup

* Before you start, Kerberos authentication has to be activated for your account by the DKRZ user support (Beratung); see https://www.dkrz.de/up/systems/hpss/pftp-with-kerberos.
* Use an additional module path (until packems is installed officially):

  ```
  module use ~m221078/etc/Modules
  ```

* Load the _packems_ module:

  ```
  module add packems tapeinit
  ```

**NOTE:** Instead of appending `-l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt` to each call of `listems` and `unpackems`, you can copy `INDEX_LIST.txt` into `~/.packems/` and omit the `-l` option.

## listems

```
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt

# the INDEX files are retrieved on every call unless a folder to save them in is provided (`-w`)
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index

# filter: only tar archives
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a '*.tar'

# filter: exclude nc-files
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -x '*.nc'

# filter: only files starting with `data_b`
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_b*'

# in addition: filter packed files by year via bash glob
# (note: as a glob, `201?` matches any year from 2010 to 2019)
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_b*' '*_emep_201?.nc'

# in addition: filter packed files; only years 2010 to 2014; via bash regex
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_b*' 'r:201[01234].nc'

# change output format
listems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_b*' '*_emep_201?.nc' -t json
```

## unpackems

```
# take a listems command from above and add a target directory (`-d`); change the directory to
# a location where you have write permission
unpackems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_b*' '*_emep_201?.nc' -d /work/bm0146/k204221/disk2tape/training/target_dir_01

# same files but flatten directory structure:
unpackems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_b*' '*_emep_201?.nc' -d /work/bm0146/k204221/disk2tape/training/target_dir_02 --flatten

# ... nothing happens ...
#
# > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ STARTED UNPACKEMS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# > make: Nothing to be done for `all'.
# > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FINISHED UNPACKEMS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# have a look into ~/tmp_index and into ~/tmp_index/bm0146/k204221/packems_training/archived_case_b
ls ~/tmp_index/bm0146/k204221/packems_training/archived_case_b
# > data_b_003.tar
# > data_b_003.tar.~unpacked~
# > data_b_004.tar
# > data_b_004.tar.~unpacked~

# The *.~unpacked~ files are dummy files that block the extraction
# of the tar balls a second time. This is meant to simplify the restart
# of an unpackems call that was interrupted.
rm ~/tmp_index/bm0146/k204221/packems_training/archived_case_b/*.~unpacked~

# again: same files but flatten directory structure
unpackems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_b*' '*_emep_201?.nc' -d /work/bm0146/k204221/disk2tape/training/target_dir_02 --flatten

# Notes:
#  * The tar balls do not need to be retrieved a second time. They are cached
#    in ~/tmp_index.
#  * If you want to clean up automatically after each call of unpackems, then
#    run it with `-p`/`--purge`.
#  * If you want to clean up manually, you can use the makefile:
#    `make -f ~/tmp_index/unpackems.mk clean`.

# give the makefile of another job another name via `-o`
unpackems -l /work/bm0146/k204221/disk2tape/training/index/INDEX_LIST.txt -w ~/tmp_index -a 'data_a*' '*_2010.nc' -d /work/bm0146/k204221/disk2tape/training/target_dir_03 --flatten -o retrieve_case_a_2010

# retrieve a file that is not listed in an INDEX file
unpackems -A t:/hpss/arch/bm0146/k204221/packems_training/archived_case_b/ocean_night3d_cmaq_2012.nc -d /work/bm0146/k204221/disk2tape/training/target_dir_04

# if we retrieve a tar ball then it is automatically extracted
# (if we don't want that to happen then we need to append `--retrieve-only`)
unpackems -A t:/hpss/arch/bm0146/k204221/packems_training/archived_case_c/data_c_001.tar -d /work/bm0146/k204221/disk2tape/training/target_dir_05

# we can do a lot more ... for details see:
unpackems -h
```

## packems

All calls of packems below carry `-n` or `-N`, so only dry runs are performed. `-N` just creates the makefile. `-n` creates the makefile and makes a dry run of it.

```
# first get an interactive session on one of the mistral nodes
# ... for testing
salloc -N 1 -n 2 -p prepost -A bm0146
ssh $SLURM_NODELIST

# ... for productive usage
salloc --exclusive -p prepost,compute,compute2 -A bm0146
ssh $SLURM_NODELIST
# and start packems and unpackems with `-j 12`

# go into a directory and archive everything in this directory
# to the HPSS
cd /work/bm0146/k204221/disk2tape/training/example_files
packems -j 2 \
    -d ../training/tmp -n

# go into a directory and archive everything in this directory
cd /work/bm0146/k204221/disk2tape/training/example_files
packems -j 2 \
    -S bm0146/k204221/tgif/test_interactive_a \
    -d ../training/tmp -n

# rename output tar balls
cd /work/bm0146/k204221/disk2tape/training/example_files
packems \
    -S bm0146/k204221/tgif/test_interactive_a \
    -d ~/training/tmp -n \
    -o my_model_run_xyz

# provide folder that should be archived
packems \
    -S bm0146/k204221/tgif/test_interactive_a \
    -d ~/training/tmp -n \
    -o my_model_run_abc \
    /work/bm0146/k204221/disk2tape/training/example_files

# reduce the target size of tar balls (size given in GB)
cd /work/bm0146/k204221/disk2tape/training/example_files
packems \
    -S bm0146/k204221/tgif/test_interactive_a \
    -d ~/training/tmp -N \
    -o my_model_run_ijk \
    -t 5 -m 5

# more in the help:
packems -h
```
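
## Appendix: glob vs. regex file filters

The `listems` and `unpackems` examples filter packed files either with a glob (`'*_emep_201?.nc'`) or with a regex (`'r:201[01234].nc'`), and the two do not select the same years. The following is a minimal sketch in plain bash, not part of packems, that shows the difference. The file names are hypothetical and not taken from the training archive, and the sketch assumes bash's own matching rules (`==` for globs, unanchored `=~` for regexes); how packems itself anchors an `r:` pattern is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical file names, for illustration only (not from the training archive).
files=(
  ocean_emep_2009.nc
  ocean_emep_2010.nc
  ocean_emep_2014.nc
  ocean_emep_2015.nc
  ocean_cmaq_2012.nc
)

glob='*_emep_201?.nc'   # glob: `?` matches any single character, so 2010 to 2019
regex='201[01234].nc'   # regex: only the digits 0 to 4 may follow `201`

# Print every file name that matches the glob as a whole.
match_glob() {
  local f
  for f in "${files[@]}"; do
    # an unquoted right-hand side of == is interpreted as a glob pattern
    if [[ $f == $glob ]]; then
      echo "$f"
    fi
  done
}

# Print every file name that contains a match for the (unanchored) regex.
match_regex() {
  local f
  for f in "${files[@]}"; do
    if [[ $f =~ $regex ]]; then
      echo "$f"
    fi
  done
}

echo "glob  $glob:"
match_glob
echo "regex $regex:"
match_regex
```

With these names, the glob selects the `emep` files for 2010, 2014 and 2015, while the regex selects the files for 2010, 2014 and 2012 regardless of the `emep` part. To restrict a glob to 2010 to 2014, a bracket expression such as `201[0-4]` can be used in the glob as well.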