^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2) Block io priorities
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3) ===================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 4)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 5)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 6) Intro
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 7) -----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 8)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 9) With the introduction of cfq v3 (aka cfq-ts or time-sliced cfq), basic io
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 10) priorities are supported for reads on files. This lets users "io nice"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 11) processes or process groups, much as cpu scheduling has long allowed with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 12) nice. This document mainly details what is currently possible with cfq;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 13) other io schedulers do not support io priorities thus far.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 14)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 15) Scheduling classes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 16) ------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 17)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 18) CFQ implements three generic scheduling classes that determine how io is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 19) served for a process.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 20)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 21) IOPRIO_CLASS_RT: This is the realtime io class. This scheduling class is given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 22) higher priority than any other in the system; processes from this class are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 23) given first access to the disk every time. It therefore needs to be used with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 24) some care: a single RT io process can starve the entire system. Within the RT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 25) class, there are 8 levels of class data that determine exactly how much
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 26) disk time the process gets on each service. In the future this might change
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 27) to be more directly mappable to performance, by passing in a wanted data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 28) rate instead.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 29)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 30) IOPRIO_CLASS_BE: This is the best-effort scheduling class, which is the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 31) for any process that hasn't set a specific io priority. The class data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 32) determines how much io bandwidth the process will get. It maps directly to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 33) the cpu nice levels, only more coarsely: 0 is the highest BE prio level and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 34) 7 is the lowest. The mapping between cpu nice level and io nice level is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 35) determined as: io_nice = (cpu_nice + 20) / 5.
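
As a rough illustration of the mapping above, here is a minimal C sketch. The
helper name cpu_nice_to_io_nice is hypothetical, and the sketch assumes that
the division truncates the way C integer division does and that the input is
a regular cpu nice value in the range -20..19::

	/* Illustrative only: cpu_nice_to_io_nice() is a made-up name. */
	static int cpu_nice_to_io_nice(int cpu_nice)
	{
		/* Integer division: each io level covers five cpu nice levels. */
		return (cpu_nice + 20) / 5;
	}

Under these assumptions a cpu nice of -20 gives io nice 0, a cpu nice of 0
gives io nice 4, and a cpu nice of 19 gives io nice 7.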
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 36)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 37) IOPRIO_CLASS_IDLE: This is the idle scheduling class. Processes running at this
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 38) level only get io time when no one else needs the disk. The idle class has no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 39) class data, since it doesn't really apply here.
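
The class rules above could be checked before a priority is applied. The
following is only a sketch: ioprio_pair_valid is a hypothetical helper, the
enum mirrors the one in the sample tool in this document, and the choice to
reject any class other than RT, BE and IDLE is an assumption of the sketch::

	enum {
		IOPRIO_CLASS_NONE,
		IOPRIO_CLASS_RT,
		IOPRIO_CLASS_BE,
		IOPRIO_CLASS_IDLE,
	};

	/* Hypothetical helper: returns 1 if (class, level) is a usable pair. */
	static int ioprio_pair_valid(int ioprio_class, int level)
	{
		switch (ioprio_class) {
		case IOPRIO_CLASS_RT:
		case IOPRIO_CLASS_BE:
			/* Both classes take 8 levels of class data, 0..7. */
			return level >= 0 && level <= 7;
		case IOPRIO_CLASS_IDLE:
			/* Idle has no class data, so the level is ignored. */
			return 1;
		default:
			return 0;
		}
	}

A caller might run such a check on the -c and -n arguments and refuse to
continue when it returns 0.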
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 40)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 41) Tools
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 42) -----
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 43)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 44) See below for a sample ionice tool. Usage::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 45)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 46) # ionice -c<class> -n<level> -p<pid>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 47)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 48) If pid isn't given, the current process is assumed. IO priority settings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 49) are inherited on fork, so you can use ionice to start the process at a given
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 50) level::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 51)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 52) # ionice -c2 -n0 /bin/ls
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 53)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 54) will run ls in the best-effort scheduling class at the highest priority.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 55) For a running process, you can give the pid instead::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 56)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 57) # ionice -c1 -n2 -p100
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 58)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 59) will change pid 100 to run in the realtime scheduling class, at priority 2.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 60)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 61) ionice.c tool::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 62)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 63) #include <stdio.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 64) #include <stdlib.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 65) #include <errno.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 66) #include <getopt.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 67) #include <unistd.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 68) #include <sys/ptrace.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 69) #include <asm/unistd.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 70)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 71) extern int sys_ioprio_set(int, int, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 72) extern int sys_ioprio_get(int, int);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 73)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 74) #if defined(__i386__)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 75) #define __NR_ioprio_set 289
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 76) #define __NR_ioprio_get 290
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 77) #elif defined(__ppc__)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 78) #define __NR_ioprio_set 273
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 79) #define __NR_ioprio_get 274
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 80) #elif defined(__x86_64__)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 81) #define __NR_ioprio_set 251
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 82) #define __NR_ioprio_get 252
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 83) #elif defined(__ia64__)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 84) #define __NR_ioprio_set 1274
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 85) #define __NR_ioprio_get 1275
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 86) #else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 87) #error "Unsupported arch"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 88) #endif
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 89)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 90) static inline int ioprio_set(int which, int who, int ioprio)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 91) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 92) return syscall(__NR_ioprio_set, which, who, ioprio);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 93) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 94)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 95) static inline int ioprio_get(int which, int who)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 96) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 97) return syscall(__NR_ioprio_get, which, who);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 98) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 99)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 100) enum {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 101) IOPRIO_CLASS_NONE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 102) IOPRIO_CLASS_RT,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 103) IOPRIO_CLASS_BE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 104) IOPRIO_CLASS_IDLE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 105) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 106)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 107) enum {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 108) IOPRIO_WHO_PROCESS = 1,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 109) IOPRIO_WHO_PGRP,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 110) IOPRIO_WHO_USER,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 111) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 113) #define IOPRIO_CLASS_SHIFT 13
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 114)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 115) const char *to_prio[] = { "none", "realtime", "best-effort", "idle", };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 116)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 117) int main(int argc, char *argv[])
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 118) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 119) int ioprio = 4, set = 0, ioprio_class = IOPRIO_CLASS_BE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 120) int c, pid = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 121)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 122) while ((c = getopt(argc, argv, "+n:c:p:")) != EOF) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 123) switch (c) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 124) case 'n':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 125) ioprio = strtol(optarg, NULL, 10);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 126) set = 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 127) break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 128) case 'c':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 129) ioprio_class = strtol(optarg, NULL, 10);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 130) set = 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 131) break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 132) case 'p':
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 133) pid = strtol(optarg, NULL, 10);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 134) break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 135) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 136) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 137)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 138) switch (ioprio_class) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 139) case IOPRIO_CLASS_NONE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 140) ioprio_class = IOPRIO_CLASS_BE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 141) break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 142) case IOPRIO_CLASS_RT:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 143) case IOPRIO_CLASS_BE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 144) break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 145) case IOPRIO_CLASS_IDLE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 146) ioprio = 7;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 147) break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 148) default:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 149) printf("bad prio class %d\n", ioprio_class);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 150) return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 151) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 153) if (!set) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 154) if (!pid && argv[optind])
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 155) pid = strtol(argv[optind], NULL, 10);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 157) ioprio = ioprio_get(IOPRIO_WHO_PROCESS, pid);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 158)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 159) printf("pid=%d, %d\n", pid, ioprio);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 160)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 161) if (ioprio == -1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 162) perror("ioprio_get");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 163) else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 164) ioprio_class = ioprio >> IOPRIO_CLASS_SHIFT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 165) ioprio = ioprio & 0xff;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 166) printf("%s: prio %d\n", to_prio[ioprio_class], ioprio);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 167) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 168) } else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 169) if (ioprio_set(IOPRIO_WHO_PROCESS, pid, ioprio | ioprio_class << IOPRIO_CLASS_SHIFT) == -1) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 170) perror("ioprio_set");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 171) return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 172) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 173)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 174) if (argv[optind])
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 175) execvp(argv[optind], &argv[optind]);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 176) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 177)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 178) return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 179) }
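
The tool above folds the class and the level into a single integer before
calling ioprio_set, and splits it again after ioprio_get. That packing can be
sketched on its own as follows. The helper names and IOPRIO_DATA_MASK are
hypothetical; the mask assumes that every bit below the class shift holds
class data, whereas the tool itself masks with 0xff::

	#define IOPRIO_CLASS_SHIFT 13
	/* Assumption: all bits below the class bits carry class data. */
	#define IOPRIO_DATA_MASK ((1 << IOPRIO_CLASS_SHIFT) - 1)

	/* Combine a scheduling class and a level into one ioprio value. */
	static int ioprio_pack(int ioprio_class, int level)
	{
		return (ioprio_class << IOPRIO_CLASS_SHIFT) | level;
	}

	/* Recover the scheduling class from a packed ioprio value. */
	static int ioprio_unpack_class(int ioprio)
	{
		return ioprio >> IOPRIO_CLASS_SHIFT;
	}

	/* Recover the level (class data) from a packed ioprio value. */
	static int ioprio_unpack_level(int ioprio)
	{
		return ioprio & IOPRIO_DATA_MASK;
	}

With these helpers, "ionice -c1 -n2" corresponds to ioprio_pack(1, 2), and
unpacking that value yields class 1 (realtime) and level 2 again.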
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 180)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 181)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 182) March 11 2005, Jens Axboe <jens.axboe@oracle.com>