technical:whitepaper:automated_devshm_cleanup [2018-12-12 15:53] – [Implementation] frey | technical:whitepaper:automated_devshm_cleanup [2018-12-13 13:02] (current) – [Implementation] frey | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Automated /dev/shm cleanup ====== | ||
+ | Historically, | ||
+ | |||
+ | - master process sets up shared segments and exports an environment variable identifying the key(s) | ||
+ | - master process forks slave processes | ||
+ | - each slave process consults the appropriate environment variable for shared memory key(s) | ||
+ | - each slave maps the necessary shared segments into its memory space | ||
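The four steps above can be sketched in Python. This is only an illustration of the pattern, not code from any particular program: it uses `multiprocessing.shared_memory` (POSIX shared memory on Linux), and the environment variable name `DEMO_SHM_NAME` and the function `run_demo` are hypothetical.

```python
import os
from multiprocessing import shared_memory

ENV_KEY = "DEMO_SHM_NAME"  # hypothetical variable name, for illustration only


def run_demo(num_workers=2):
    """Run the four-step pattern once and return the bytes the workers wrote."""
    # 1. the master sets up a shared segment and exports its name
    segment = shared_memory.SharedMemory(create=True, size=num_workers)
    os.environ[ENV_KEY] = segment.name
    try:
        pids = []
        for rank in range(num_workers):
            # 2. the master forks the worker processes
            pid = os.fork()
            if pid == 0:
                status = 1
                try:
                    # 3. the worker consults the environment for the name
                    name = os.environ[ENV_KEY]
                    # 4. the worker maps the segment into its address space
                    attached = shared_memory.SharedMemory(name=name)
                    attached.buf[rank] = rank + 1
                    attached.close()
                    status = 0
                finally:
                    # leave the child without running the parent's cleanup
                    os._exit(status)
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)
        return bytes(segment.buf[:num_workers])
    finally:
        # an orderly exit removes the segment; a crash before this point
        # is what leaves it orphaned
        segment.close()
        segment.unlink()
```

If the master dies before the final `unlink()`, nothing removes the segment, which is the failure mode the rest of this page deals with.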
+ | |||
+ | When a program like this crashes, it often leaves its shared segments orphaned: | ||
+ | |||
+ | <code bash> | ||
+ | $ ipcs -m | ||
+ | |||
+ | ------ Shared Memory Segments -------- | ||
+ | key shmid owner perms bytes nattch | ||
+ | 0x00000000 45350912 | ||
+ | 0x00000000 45383681 | ||
+ | 0x00000000 45416450 | ||
+ | |||
+ | </code> | ||
+ | |||
+ | Behavior of such programs varies: | ||
+ | |||
+ | <code bash> | ||
+ | $ ipcrm --shmem-id 45350912 --shmem-id 45383681 | ||
+ | </code> | ||
+ | |||
+ | With the advent of the memory-backed Linux '' | ||
+ | |||
+ | * a POSIX shared memory segment is backed by a file in ''/ | ||
+ | * the backing file has standard Unix filesystem permissions applied to it | ||
+ | * the backing file can be mmap' | ||
+ | * the segment can be examined or removed using standard filesystem tools | ||
+ | |||
+ | Unlike IPC segments, POSIX segments cannot be marked for destruction when no longer attached to a process. | ||
+ | |||
+ | ===== Cleaning-up ===== | ||
+ | |||
+ | Our clusters run many Open MPI jobs. When one of them crashes, the vader BTL leaves behind orphaned POSIX shared memory segments. | ||
+ | * create/ | ||
+ | * OR the file must be actively in-use by at least one process on the system | ||
+ | For arbitrary POSIX segment files, the same criteria with a longer timespan (perhaps 1 day) would target segments that can be purged. | ||
+ | |||
+ | ==== Finding all shared memory segments ==== | ||
+ | |||
+ | This is the stage where time-based criteria to disqualify segments for removal should be applied. | ||
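A minimal sketch of this stage, assuming the segments are regular files directly under `/dev/shm` and that a file's age is measured from the newer of its inode-change and modification times. The function name `find_old_segments` and its parameters are hypothetical and are not taken from the actual tool.

```python
import os
import time


def find_old_segments(root="/dev/shm", min_age_seconds=86400.0, now=None):
    """Return the paths of regular files in root older than min_age_seconds."""
    now = time.time() if now is None else now
    candidates = set()
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                if not entry.is_file(follow_symlinks=False):
                    continue
                info = entry.stat(follow_symlinks=False)
            except OSError:
                # the file vanished between listing and stat; skip it
                continue
            # Assumption: age counts from the newer of ctime and mtime.
            newest = max(info.st_ctime, info.st_mtime)
            if now - newest >= min_age_seconds:
                candidates.add(entry.path)
    return candidates
```

Applying the age test here means recently touched files never reach the later stages, whatever their usage status turns out to be.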
+ | |||
+ | ==== Finding active shared memory segments ==== | ||
+ | |||
+ | The '' | ||
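One possible way to collect the files currently in use, sketched under the assumption of a Linux `/proc` filesystem: read each process's `/proc/<pid>/maps` and keep the mapped paths under `/dev/shm/`. This is a swapped-in technique for illustration and may differ from the tool this page refers to; it only sees files that are memory-mapped, not files that are merely open. The function names are hypothetical.

```python
import glob


def shm_paths_in_maps(maps_text, prefix="/dev/shm/"):
    """Extract mapped file paths under prefix from the text of a maps file."""
    paths = set()
    for line in maps_text.splitlines():
        # fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.endswith(" (deleted)"):
            continue  # already unlinked, nothing left in the directory
        if path.startswith(prefix):
            paths.add(path)
    return paths


def find_active_segments(prefix="/dev/shm/"):
    """Union of mapped paths under prefix over all readable processes."""
    active = set()
    for maps_file in glob.glob("/proc/[0-9]*/maps"):
        try:
            with open(maps_file, "r", errors="replace") as handle:
                active |= shm_paths_in_maps(handle.read(), prefix)
        except OSError:
            continue  # the process exited or its maps file is not readable
    return active
```

Run as root, `find_active_segments()` can read every process's maps file; as an ordinary user it would miss mappings owned by other users.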
+ | |||
+ | ==== Segments for removal ==== | ||
+ | |||
+ | The set difference, **A** \ **B**, is the set of all elements of **A** that are not in **B**. | ||
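In Python the set difference is the `-` operator on `set` objects. The sketch below assumes **A** holds the candidate paths from the "find all" stage and **B** the paths found to be in active use; the function name is hypothetical.

```python
def segments_to_remove(all_segments, active_segments):
    """Return the elements of A (all_segments) that are not in B (active_segments)."""
    return set(all_segments) - set(active_segments)
```

Paths that appear only in **B** have no effect on the result, so a file that is in use but too young to be a candidate is simply ignored.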
+ | |||
+ | ==== Removing segments ==== | ||
+ | |||
+ | Running as root, removal is accomplished using '' | ||
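As an illustration of the removal step, the sketch below uses `os.unlink` as a stand-in; that choice, the function name `remove_segments`, and the `dry_run` flag (modelled on the `-n` / `--dry-run` option) are assumptions, not a description of the actual tool.

```python
import os


def remove_segments(paths, dry_run=True):
    """Unlink each path and return the list of paths actually removed."""
    removed = []
    for path in sorted(paths):
        if dry_run:
            print(f"would remove {path}")
            continue
        try:
            os.unlink(path)
        except FileNotFoundError:
            continue  # already gone, e.g. the owning job cleaned up itself
        except OSError as err:
            print(f"could not remove {path}: {err}")
            continue
        removed.append(path)
    return removed
```

Defaulting to `dry_run=True` makes an accidental call harmless; removal has to be requested explicitly.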
+ | |||
+ | ===== Implementation ===== | ||
+ | |||
+ | The '' | ||
+ | |||
+ | The program has various command line options available: | ||
+ | |||
+ | <code bash> | ||
+ | $ shm-cleanup.py --help | ||
+ | usage: shm-cleanup.py [-h] [-v] [-q] [-n] [--show-log-timestamps] | ||
+ | [--age < | ||
+ | [--log-file < | ||
+ | [--daemon-period < | ||
+ | |||
+ | Cleanup /dev/shm | ||
+ | |||
+ | optional arguments: | ||
+ | -h, --help | ||
+ | -v, --verbose | ||
+ | -q, --quiet | ||
+ | -n, --dry-run | ||
+ | done; this option sets the base verbosity level to | ||
+ | INFO (as in -vv) | ||
+ | --show-log-timestamps, | ||
+ | display timestamps on all messages logged by this | ||
+ | program | ||
+ | --age < | ||
+ | only items older than this will be removed; integer or | ||
+ | floating-point values are acceptable with optional | ||
+ | unit of s/m/h/d (default: d) | ||
+ | --no-special-treatment | ||
+ | do not treat PSM2 and vader segment files any | ||
+ | differently than other files | ||
+ | --log-file < | ||
+ | send all logging to this file instead of to stderr; | ||
+ | timestamps are always enabled when logging to a file | ||
+ | --daemon | ||
+ | --daemon-period < | ||
+ | wake to re-check on the given period; integer or | ||
+ | floating-point values are acceptable with optional | ||
+ | unit of s/m/h/d (default: s) | ||
+ | --pid-file < | ||
+ | in daemon mode, write our pid to this file (default: | ||
+ | / | ||
+ | </code> | ||
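The help text says that `--age` and `--daemon-period` accept an integer or floating-point value with an optional unit of s/m/h/d. A value in that format could be converted to seconds roughly as follows; this is a sketch under that reading of the help text, and `parse_duration` is a hypothetical name, not necessarily how `shm-cleanup.py` does it.

```python
import re

_UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}
_DURATION = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*([smhd])?\s*$", re.IGNORECASE)


def parse_duration(text, default_unit="d"):
    """Convert a string such as '1.5h' to seconds; a bare number uses default_unit."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[(unit or default_unit).lower()]
```

The `default_unit` argument reflects the different defaults in the help text: days for `--age` and seconds for `--daemon-period`.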
+ | |||
+ | On systems that lack cron (or a similar timed-execution mechanism), the '' |