==========================
BFQ (Budget Fair Queueing)
==========================

BFQ is a proportional-share I/O scheduler, with some extra
low-latency capabilities. In addition to cgroups support (blkio or io
controllers), BFQ's main features are:

- BFQ guarantees high system and application responsiveness, and
  low latency for time-sensitive applications, such as audio or video
  players;
- BFQ distributes bandwidth, and not just time, among processes or
  groups (switching back to time distribution when needed to keep
  throughput high).

In its default configuration, BFQ favors latency over
throughput. So, when needed to achieve lower latency, BFQ builds
schedules that may lead to lower throughput. If your main or only
goal, for a given device, is to achieve the maximum possible
throughput at all times, then switch off all low-latency heuristics
for that device by setting low_latency to 0. See Section 3 for
details on how to configure BFQ for the desired tradeoff between
latency and throughput, or on how to maximize throughput.

Like every I/O scheduler, BFQ adds some overhead to per-I/O-request
processing. To give an idea of this overhead, the total,
single-lock-protected, per-request processing time of BFQ (i.e., the
sum of the execution times of the request insertion, dispatch and
completion hooks) is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
(a dated notebook CPU; time measured with simple code
instrumentation, and using the throughput-sync.sh script of the S
suite [1], in performance-profiling mode). To put this result into
context, the total, single-lock-protected, per-request execution time
of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7
us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).
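
As a rough back-of-the-envelope check, if the per-request work is fully
serialized by the single lock, its duration alone caps the request rate
at 1/t. The Python sketch below applies that bound to the 1.9 us and
0.7 us figures above; the function name is illustrative, and the bound
deliberately ignores everything else the CPU has to do per request.

.. code-block:: python

    def scheduler_iops_ceiling(per_request_us: float) -> float:
        """Requests/s that fit in one second if each request holds the
        scheduler lock for per_request_us microseconds, ignoring the
        rest of the I/O stack and any other CPU work."""
        return 1_000_000 / per_request_us


    for name, cost_us in (("BFQ", 1.9), ("mq-deadline", 0.7)):
        ceiling = scheduler_iops_ceiling(cost_us)
        print(f"{name}: at most ~{ceiling / 1000:.0f} KIOPS through the lock")

For BFQ this gives roughly 526 KIOPS on that CPU. The measured limits
reported below are lower, as expected, because they also include the
cost of the rest of the I/O stack (note that they refer to different
CPUs).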

Scheduling overhead further limits the maximum IOPS that a CPU can
process (already limited by the execution of the rest of the I/O
stack). To give an idea of the limits with BFQ on slow or average
CPUs, here are, first, the limits of BFQ for three different CPUs,
found respectively in an average laptop, an old desktop, and a cheap
embedded system, in case full hierarchical support is enabled (i.e.,
CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
set (Section 4-2):

- Intel i7-4850HQ: 400 KIOPS
- AMD A8-3850: 250 KIOPS
- ARM Cortex-A53 Octa-core: 80 KIOPS

If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
support is enabled), then the sustainable throughput with BFQ
decreases, because all blkio.bfq* statistics are created and updated
(Section 4-2). For BFQ, this leads to the following maximum
sustainable throughputs, on the same systems as above:

- Intel i7-4850HQ: 310 KIOPS
- AMD A8-3850: 200 KIOPS
- ARM Cortex-A53 Octa-core: 56 KIOPS
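
Comparing the two lists gives the relative cost of the statistics
enabled by CONFIG_BFQ_CGROUP_DEBUG. The following Python lines simply
redo that arithmetic on the figures above; no new measurements are
involved.

.. code-block:: python

    # Maximum sustainable throughput (KIOPS), copied from the two lists above.
    without_debug = {"Intel i7-4850HQ": 400, "AMD A8-3850": 250,
                     "ARM Cortex-A53 Octa-core": 80}
    with_debug = {"Intel i7-4850HQ": 310, "AMD A8-3850": 200,
                  "ARM Cortex-A53 Octa-core": 56}


    def relative_drop(before: int, after: int) -> float:
        """Percentage of throughput lost when going from before to after."""
        return 100 * (before - after) / before


    drops = {cpu: relative_drop(without_debug[cpu], with_debug[cpu])
             for cpu in without_debug}
    for cpu, drop in drops.items():
        print(f"{cpu}: -{drop:.1f}%")

On these systems the debug statistics cost between 20% and 30% of the
sustainable throughput, with the largest relative loss on the slowest
CPU (the Cortex-A53).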

BFQ works for multi-queue devices too.

.. The table of contents follows. Impatient readers can just jump to Section 3.

.. CONTENTS

   1. When may BFQ be useful?
    1-1 Personal systems
    1-2 Server systems
   2. How does BFQ work?
   3. What are BFQ's tunables and how to properly configure BFQ?
   4. BFQ group scheduling
    4-1 Service guarantees provided
    4-2 Interface

1. When may BFQ be useful?
==========================

BFQ provides the following benefits on personal and server systems.

1-1 Personal systems
--------------------

Low latency for interactive applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Regardless of the actual background workload, BFQ guarantees that, for
interactive tasks, the storage device is virtually as responsive as if
it were idle. For example, even if one or more of the following
background workloads are being executed:

- one or more large files are being read, written or copied,
- a tree of source files is being compiled,
- one or more virtual machines are performing I/O,
- a software update is in progress,
- indexing daemons are scanning filesystems and updating their
  databases,

starting an application or loading a file from within an application
takes about the same time as if the storage device were idle. As a
comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
applications experience high latencies, or even become unresponsive
until the background workload terminates (even on SSDs).

Low latency for soft real-time applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Soft real-time applications, such as audio and video
players/streamers, also enjoy low latency and a low drop rate, regardless
of the background I/O workload. As a consequence, these applications
suffer almost no glitches due to the background workload.

Higher speed for code-development tasks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If some additional workload happens to be executed in parallel, then
BFQ executes the I/O-related components of typical code-development
tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
NOOP or DEADLINE.

High throughput
^^^^^^^^^^^^^^^

On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
up to 150% higher throughput than DEADLINE and NOOP, with all the
sequential workloads considered in our tests. With random workloads,
and with all the workloads on flash-based devices, BFQ achieves,
instead, about the same throughput as the other schedulers.

Strong fairness, bandwidth and delay guarantees
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

BFQ distributes the device throughput, and not just the device time,
among I/O-bound applications in proportion to their weights, with any
workload and regardless of the device parameters. From these bandwidth
guarantees, it is possible to compute tight per-I/O-request delay
guarantees by a simple formula. If not configured for strict service
guarantees, BFQ switches to time-based resource sharing (only) for
applications that would otherwise cause a throughput loss.

1-2 Server systems
------------------

Most benefits for server systems follow from the same service
properties as above. In particular, regardless of whether additional,
possibly heavy workloads are being served, BFQ guarantees:

* audio and video-streaming with zero or very low jitter and drop
  rate;

* fast retrieval of web pages and embedded objects;

* real-time recording of data in live-dumping applications (e.g.,
  packet logging);

* responsiveness in local and remote access to a server.


2. How does BFQ work?
=====================

BFQ is a proportional-share I/O scheduler, whose general structure,
plus a lot of code, is borrowed from CFQ.

- Each process doing I/O on a device is associated with a weight and a
  `(bfq_)queue`.

- BFQ grants exclusive access to the device, for a while, to one queue
  (process) at a time, and implements this service model by
  associating every queue with a budget, measured in number of
  sectors.

- After a queue is granted access to the device, the budget of the
  queue is decremented, on each request dispatch, by the size of the
  request.

- The in-service queue is expired, i.e., its service is suspended,
  only if one of the following events occurs: 1) the queue finishes
  its budget, 2) the queue empties, 3) a "budget timeout" fires.

  - The budget timeout prevents processes doing random I/O from
    holding the device for too long and dramatically reducing
    throughput.

  - Actually, as in CFQ, a queue associated with a process issuing
    sync requests may not be expired immediately when it empties.
    Instead, BFQ may idle the device for a short time interval,
    giving the process the chance to go on being served if it issues
    a new request in time. Device idling typically boosts the
    throughput on rotational devices and on non-queueing flash-based
    devices, if processes do synchronous and sequential I/O. In
    addition, under BFQ, device idling is also instrumental in
    guaranteeing the desired throughput fraction to processes
    issuing sync requests (see the description of the slice_idle
    tunable in this document, or [1, 2], for more details).

  - With respect to idling for service guarantees, if several
    processes are competing for the device at the same time, but
    all processes and groups have the same weight, then BFQ
    guarantees the expected throughput distribution without ever
    idling the device. Throughput is thus as high as possible in
    this common scenario.

  - On flash-based storage with internal queueing of commands
    (typically NCQ), device idling happens to be always detrimental
    for throughput. So, with these devices, BFQ performs idling
    only when strictly needed for service guarantees, i.e., for
    guaranteeing low latency or fairness. In these cases, overall
    throughput may be sub-optimal. No solution currently exists to
    provide both strong service guarantees and optimal throughput
    on devices with internal queueing.

- If low-latency mode is enabled (default configuration), BFQ
  executes some special heuristics to detect interactive and soft
  real-time applications (e.g., video or audio players/streamers),
  and to reduce their latency. The most important action taken to
  achieve this goal is to give the queues associated with these
  applications more than their fair share of the device
  throughput. For brevity, we call the whole set of actions taken by
  BFQ to privilege these queues just "weight-raising". In
  particular, BFQ provides a milder form of weight-raising for
  interactive applications, and a stronger form for soft real-time
  applications.

- BFQ automatically deactivates idling for queues born in a burst of
  queue creations. In fact, these queues are usually associated with
  the processes of applications and services that benefit mostly
  from a high throughput. Examples are systemd during boot, or git
  grep.

- Like CFQ, BFQ merges queues performing interleaved I/O, i.e.,
  performing random I/O that becomes mostly sequential if
  merged. Unlike CFQ, BFQ achieves this goal with a more
  reactive mechanism, called Early Queue Merge (EQM). EQM is so
  responsive in detecting interleaved I/O (cooperating processes)
  that it enables BFQ to achieve a high throughput, by queue
  merging, even for queues for which CFQ needs a different
  mechanism, preemption, to get a high throughput. As such, EQM is a
  unified mechanism to achieve a high throughput with interleaved
  I/O.

- Queues are scheduled according to a variant of WF2Q+, named
  B-WF2Q+, and implemented using an augmented rb-tree to preserve an
  O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
  also ready for hierarchical scheduling; see Section 4 for details.

- B-WF2Q+ guarantees a tight deviation with respect to an ideal,
  perfectly fair, and smooth service. In particular, B-WF2Q+
  guarantees that each queue receives a fraction of the device
  throughput proportional to its weight, even if the throughput
  fluctuates, and regardless of the device parameters, the current
  workload and the budgets assigned to the queue.

- The last, budget-independence, property (although probably
  counterintuitive at first) is definitely beneficial, for
  the following reasons:

  - First, with any proportional-share scheduler, the maximum
    deviation with respect to an ideal service is proportional to
    the maximum budget (slice) assigned to queues. As a consequence,
    BFQ can keep this deviation tight not only because of the
    accurate service of B-WF2Q+, but also because BFQ *does not*
    need to assign a larger budget to a queue to let the queue
    receive a higher fraction of the device throughput.

  - Second, BFQ is free to choose, for every process (queue), the
    budget that best fits the needs of the process, or best
    leverages the I/O pattern of the process. In particular, BFQ
    updates queue budgets with a simple feedback-loop algorithm that
    allows a high throughput to be achieved, while still providing
    tight latency guarantees to time-sensitive applications. When
    the in-service queue expires, this algorithm computes the next
    budget of the queue so as to:

    - Let large budgets be eventually assigned to the queues
      associated with I/O-bound applications performing sequential
      I/O: in fact, the longer these applications are served once
      they get access to the device, the higher the throughput is.

    - Let small budgets be eventually assigned to the queues
      associated with time-sensitive applications (which typically
      perform sporadic and short I/O), because, the smaller the
      budget assigned to a queue waiting for service is, the sooner
      B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).

- If several processes are competing for the device at the same time,
  but all processes and groups have the same weight, then BFQ
  guarantees the expected throughput distribution without ever idling
  the device. It uses preemption instead. Throughput is then much
  higher in this common scenario.

- ioprio classes are served in strict priority order, i.e.,
  lower-priority queues are not served as long as there are
  higher-priority queues. Among queues in the same class, the
  bandwidth is distributed in proportion to the weight of each
  queue. A very thin extra bandwidth is however guaranteed to
  the Idle class, to prevent it from starving.
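
The budget-based service model described in the list above (per-queue
budgets in sectors, decremented on each dispatch, and the three expiry
events), together with the budget feedback loop, can be illustrated
with the following Python sketch. It is a simplified model under stated
assumptions, not BFQ's implementation: request service times are given
as inputs, device idling is not modeled, and the doubling/shrinking
rule in ``next_budget`` is a hypothetical stand-in for BFQ's actual
feedback algorithm.

.. code-block:: python

    from enum import Enum


    class Expiry(Enum):
        BUDGET_EXHAUSTED = "budget exhausted"
        QUEUE_EMPTY = "queue emptied"
        BUDGET_TIMEOUT = "budget timeout"


    def serve(requests, budget, timeout_ms):
        """Dispatch (size_in_sectors, service_ms) requests from the
        in-service queue until one of the three expiry events occurs.
        Returns (reason, sectors served). Idling is not modeled: running
        out of requests means the queue has emptied."""
        remaining, elapsed, served = budget, 0.0, 0
        for size, service_ms in requests:
            if elapsed >= timeout_ms:
                return Expiry.BUDGET_TIMEOUT, served
            if size > remaining:
                return Expiry.BUDGET_EXHAUSTED, served
            remaining -= size          # budget decremented on each dispatch
            served += size
            elapsed += service_ms
        return Expiry.QUEUE_EMPTY, served


    def next_budget(budget, reason, served, min_budget, max_budget):
        """Hypothetical feedback rule (factors are illustrative, not
        BFQ's): grow the budget of queues that use all of it, shrink it
        to the actual usage for queues that empty early."""
        if reason is Expiry.BUDGET_EXHAUSTED:
            return min(2 * budget, max_budget)
        if reason is Expiry.QUEUE_EMPTY:
            return max(served, min_budget)
        return budget  # timeout: the timeout itself already bounds the slot

Iterating ``serve`` and ``next_budget`` on a queue that always exhausts
its budget makes the budget grow towards ``max_budget``, whereas a
queue issuing short, sporadic I/O keeps a small budget, which is the
behavior the two budget-related sub-items above aim for.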
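
The budget-independence property claimed above for B-WF2Q+ can be
checked on a strongly simplified, WF2Q+-style scheduler. The Python
sketch below is an illustration under assumptions, not B-WF2Q+ itself
(no augmented rb-tree, no hierarchy, no ioprio classes): every queue is
always backlogged and consumes its whole budget in each service slot,
each queue carries virtual start and finish timestamps
(finish = start + budget / weight), and among the eligible queues
(start not later than the system virtual time) the one with the
smallest finish timestamp is served. Exact fractions avoid rounding
noise.

.. code-block:: python

    from dataclasses import dataclass
    from fractions import Fraction


    @dataclass
    class Queue:
        name: str
        weight: int
        budget: int                   # sectors consumed per service slot
        start: Fraction = Fraction(0)
        finish: Fraction = Fraction(0)
        served: int = 0               # total sectors received so far


    def run(queues, total_sectors):
        """Serve always-backlogged queues until total_sectors are dispatched."""
        vtime = Fraction(0)
        total_weight = sum(q.weight for q in queues)
        for q in queues:
            q.start = Fraction(0)
            q.finish = Fraction(q.budget, q.weight)
        dispatched = 0
        while dispatched < total_sectors:
            eligible = [q for q in queues if q.start <= vtime]
            if not eligible:
                # Jump the virtual time forward to the earliest start time.
                vtime = min(q.start for q in queues)
                continue
            q = min(eligible, key=lambda x: x.finish)
            q.served += q.budget
            dispatched += q.budget
            # System virtual time advances by service / sum of weights.
            vtime += Fraction(q.budget, total_weight)
            # The next slot of this queue starts where this one finished.
            q.start = q.finish
            q.finish = q.start + Fraction(q.budget, q.weight)
        return {q.name: q.served for q in queues}


    # Weight ratio 2:1, with very different budgets in the two runs.
    small_a = run([Queue("A", 2, 100), Queue("B", 1, 300)], 90_000)
    large_a = run([Queue("A", 2, 300), Queue("B", 1, 100)], 90_000)
    print(small_a, large_a)

In both runs A ends up with about twice as many sectors as B, even
though A's budget is one third of B's in the first run and three times
B's in the second: in this model the weights alone decide the long-run
share, while budgets only affect how service is interleaved.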


3. What are BFQ's tunables and how to properly configure BFQ?
=============================================================

Most BFQ tunables affect service guarantees (basically latency and
fairness) and throughput. For full details on how to choose the
desired tradeoff between service guarantees and throughput, see the
parameters slice_idle, strict_guarantees and low_latency. For details
on how to maximize throughput, see slice_idle, timeout_sync and
max_budget. The other performance-related parameters have been
inherited from CFQ, and have been preserved mostly for compatibility
with it. So far, no performance improvement has been reported after
changing the latter parameters in BFQ.

In particular, the tunables back_seek_max, back_seek_penalty,
fifo_expire_async and fifo_expire_sync below are the same as in
CFQ. Their description is just copied from that for CFQ. Some
considerations in the description of slice_idle are copied from CFQ
too.

per-process ioprio and weight
-----------------------------

Unless the cgroups interface is used (see "4. BFQ group scheduling"),
weights can be assigned to processes only indirectly, through I/O
priorities, and according to the relation::

  weight = (IOPRIO_BE_NR - ioprio) * 10
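
For illustration, assuming IOPRIO_BE_NR is 8 (the number of
best-effort priority levels, 0 to 7, in the Linux ioprio interface),
the relation maps priorities to weights as in the Python sketch below.
A higher ioprio value means a lower priority, and thus a smaller
weight; the function name is illustrative only.

.. code-block:: python

    IOPRIO_BE_NR = 8  # assumed: number of best-effort levels (0 = highest)


    def ioprio_to_weight(ioprio: int) -> int:
        """Weight derived from an I/O priority level, per the relation above."""
        if not 0 <= ioprio < IOPRIO_BE_NR:
            raise ValueError(f"ioprio must be in [0, {IOPRIO_BE_NR - 1}]")
        return (IOPRIO_BE_NR - ioprio) * 10


    weights = {prio: ioprio_to_weight(prio) for prio in range(IOPRIO_BE_NR)}
    print(weights)

With these values, a process at ioprio 0 gets twice the weight of one
at the default best-effort level 4, and eight times the weight of one
at level 7.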

Beware that, if low_latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.

slice_idle
----------

This parameter specifies how long BFQ should idle for the next I/O
request, when certain sync BFQ queues become empty. By default,
slice_idle is a non-zero value. Idling has a double purpose: boosting
throughput and making sure that the desired throughput distribution is
respected (see the description of how BFQ works, and, if needed, the
papers referenced there).

As for throughput, idling can be very helpful on highly seeky media
like single-spindle SATA/SAS disks, where it can cut down on the
overall number of seeks and improve throughput.

Setting slice_idle to 0 will remove all the idling on queues, and one
should see an overall improved throughput on faster storage devices
like multiple SATA/SAS disks in hardware RAID configuration, as well
as flash-based storage with internal command queueing (and
parallelism).

So, depending on storage and workload, it might be useful to set
slice_idle=0. In general, for SATA/SAS disks and software RAID of
SATA/SAS disks, keeping slice_idle enabled should be useful. For any
configurations where there are multiple spindles behind a single LUN
(host-based hardware RAID controller or storage arrays), or with
flash-based fast storage, setting slice_idle=0 might end up in better
throughput and acceptable latencies.
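
Once BFQ is the active scheduler of a device, its tunables appear in
sysfs under /sys/block/<device>/queue/iosched/. The Python helper below
is a minimal sketch, assuming root privileges; "sda" is used purely as
an example device name, the function names are illustrative, and the
``sysfs_root`` parameter exists only so the helper can be exercised
against a scratch directory.

.. code-block:: python

    from pathlib import Path


    def select_bfq(dev: str, sysfs_root: str = "/sys/block") -> None:
        """Make BFQ the active I/O scheduler of block device dev."""
        (Path(sysfs_root) / dev / "queue" / "scheduler").write_text("bfq\n")


    def set_bfq_tunable(dev: str, name: str, value: int,
                        sysfs_root: str = "/sys/block") -> Path:
        """Write value to BFQ tunable name of dev; return the file path."""
        path = Path(sysfs_root) / dev / "queue" / "iosched" / name
        if not path.parent.is_dir():
            # The iosched directory only exists while a scheduler is active.
            raise FileNotFoundError(f"{path.parent} not found: is BFQ active on {dev}?")
        path.write_text(f"{value}\n")
        return path

For instance, after ``select_bfq("sda")``, the call
``set_bfq_tunable("sda", "slice_idle", 0)`` disables idling, and
``set_bfq_tunable("sda", "low_latency", 0)`` switches off the
low-latency heuristics. Writing the value into the same file from a
shell has the same effect.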

However, idling is necessary to enforce service guarantees in the case
of differentiated weights or differentiated I/O-request lengths. To see
why, suppose that a given BFQ queue A must get several I/O requests
served for each request served for another queue B. Idling ensures
that, if A issues a new I/O request shortly after becoming empty, no
request of B is dispatched in between, and thus A does not lose the
chance to get more than one request dispatched before the next request
of B is dispatched. Note that idling guarantees the desired
differentiated treatment of queues only in terms of I/O-request
dispatches. To guarantee that the actual service order then matches
the dispatch order, the strict_guarantees tunable must be set too.
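The effect of idling on differentiated service can be seen in a deliberately simplified Python model. Queue A should get a_per_b requests dispatched for each request of queue B. A is synchronous, so it becomes empty after every request, and B is always backlogged. The function name and the model are illustrative assumptions. Real BFQ reasons in terms of budgets and timestamps, not request counts.

```python
from typing import List


def dispatch_sequence(rounds: int, a_per_b: int, idling: bool) -> List[str]:
    """Dispatch order of queues A and B in a toy model.

    A should get a_per_b requests per request of B. A is synchronous: it
    has one request at a time and issues the next one shortly after the
    previous completes, so its queue is empty after every dispatch.
    B is always backlogged.
    """
    sequence: List[str] = []
    for _ in range(rounds):
        served_a = 0
        while served_a < a_per_b:
            sequence.append("A")
            served_a += 1
            if served_a < a_per_b and not idling:
                # A is momentarily empty; without idling the device is
                # handed to B before A's next request shows up.
                break
        # B gets one request per round, after A has either used its full
        # allotment (idling) or run empty (no idling).
        sequence.append("B")
    return sequence
```

Over 10 rounds with a_per_b=3, A gets 30 requests against B's 10 with idling, but only 10 against 10 without it. The intended 3:1 ratio collapses to 1:1.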

There is an important flip side to idling: apart from the above cases,
where it also benefits throughput, idling can severely reduce
throughput. One important case is a random workload. Because of this
issue, BFQ tends to avoid idling as much as possible when idling does
not also benefit throughput (as detailed in Section 2). As a
consequence of this behavior, and of further issues described for the
strict_guarantees tunable, short-term service guarantees may
occasionally be violated. In some cases, these guarantees may be more
important than maximum throughput. For example, in video
playback/streaming, a very low drop rate may be more important than
maximum throughput. In these cases, consider setting the
strict_guarantees parameter.

slice_idle_us
-------------

Controls the same tuning parameter as slice_idle, but in microseconds.
Either tunable can be used to set idling behavior. Afterwards, the
other tunable will reflect the newly set value in sysfs.
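The following is a small sketch of how the two views relate. It assumes that the value is stored at sub-millisecond resolution and that slice_idle shows it truncated to whole milliseconds, so writing 2500 to slice_idle_us should read back as 2 in slice_idle. The helper names are hypothetical.

```python
def slice_idle_ms_view(slice_idle_us: int) -> int:
    """Value slice_idle would show after writing slice_idle_us.

    Assumes truncation to whole milliseconds when the value is displayed.
    """
    return slice_idle_us // 1000


def slice_idle_us_view(slice_idle_ms: int) -> int:
    """Value slice_idle_us would show after writing slice_idle (in ms)."""
    return slice_idle_ms * 1000


def views_consistent(slice_idle_ms: int, slice_idle_us: int) -> bool:
    """True if the two sysfs readings can describe the same setting."""
    return slice_idle_ms_view(slice_idle_us) == slice_idle_ms
```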

strict_guarantees
-----------------

If this parameter is set (default: unset), then BFQ

- always performs idling when the in-service queue becomes empty;

- forces the device to serve one I/O request at a time, by dispatching a
  new request only if there is no outstanding request.

In the presence of differentiated weights or I/O-request sizes, both
of the above conditions are needed to guarantee that every BFQ queue
receives its allotted share of the bandwidth. The first condition is
needed for the reasons explained in the description of the slice_idle
tunable. The second condition is needed because all modern storage
devices reorder internally queued requests, which may trivially break
the service guarantees enforced by the I/O scheduler.

Setting strict_guarantees may obviously reduce throughput.
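A toy device model in Python shows why the second condition matters. The scheduler dispatches requests in a fixed order. The device can hold up to depth requests internally and always serves the outstanding request with the lowest sector. Lowest-sector-first is only an assumed stand-in for whatever reordering real device firmware performs. The names are hypothetical.

```python
from typing import List, Tuple

Request = Tuple[str, int]  # (label, starting sector)


def service_order(dispatch: List[Request], depth: int) -> List[str]:
    """Order in which a reordering device completes dispatched requests.

    The scheduler hands requests to the device in dispatch order whenever
    fewer than `depth` are outstanding; the device then serves the
    outstanding request with the lowest sector first.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    pending = list(dispatch)
    outstanding: List[Request] = []
    served: List[str] = []
    while pending or outstanding:
        while pending and len(outstanding) < depth:
            outstanding.append(pending.pop(0))
        nearest = min(outstanding, key=lambda req: req[1])
        outstanding.remove(nearest)
        served.append(nearest[0])
    return served
```

With depth=1, the one-request-at-a-time behavior that strict_guarantees enforces, the service order equals the dispatch order. With a deeper internal queue, a request of B can overtake requests of A that were dispatched before it.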

back_seek_max
-------------

This specifies, in Kbytes, the maximum "distance" for backward seeking.
The distance is the amount of space from the current head location back
to the sectors that lie behind it.

This parameter allows the scheduler to anticipate requests in the
"backward" direction and consider them as being the "next" if they are
within this distance from the current head location.
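The following is a minimal sketch of the back_seek_max check. It assumes 512-byte sectors, so 1 Kbyte equals 2 sectors, and positions expressed in sectors. The function names are hypothetical.

```python
SECTOR_BYTES = 512  # assumed sector size for the Kbyte-to-sector conversion


def back_seek_window_sectors(back_seek_max_kb: int) -> int:
    """Convert back_seek_max (Kbytes) into a distance in sectors."""
    return back_seek_max_kb * 1024 // SECTOR_BYTES


def classify(head: int, sector: int, back_seek_max_kb: int) -> str:
    """Where a request at `sector` lies relative to the head position."""
    if sector >= head:
        return "forward"
    if head - sector <= back_seek_window_sectors(back_seek_max_kb):
        return "backward"      # close enough to be treated as a "next" request
    return "out_of_range"      # too far behind the head
```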

back_seek_penalty
-----------------

This parameter is used to compute the cost of backward seeking. If the
backward distance of a request is just 1/back_seek_penalty of the
distance of a "front" request, then the seeking costs of the two
requests are considered equivalent.

In that case, the scheduler does not bias toward either request
(otherwise, it favors the request with the lower cost). The default
value of back_seek_penalty is 2.
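The cost comparison can be sketched as follows. This is a simplified model of the rule described above, not the kernel's code. Positions are in 512-byte sectors. A request further behind the head than back_seek_max is simply treated as never preferred over the front one. The function name is hypothetical.

```python
def preferred(head: int, front: int, back: int,
              back_seek_max_kb: int, back_seek_penalty: int = 2) -> str:
    """Compare a request ahead of the head with one behind it.

    Returns "front", "back" or "equal" (no bias either way).
    """
    if not back < head <= front:
        raise ValueError("expected back < head <= front")
    front_cost = front - head
    back_dist = head - back
    # Requests further behind than back_seek_max are not considered at all.
    if back_dist > back_seek_max_kb * 2:  # Kbytes -> 512-byte sectors
        return "front"
    back_cost = back_dist * back_seek_penalty  # backward seeks cost more
    if back_cost < front_cost:
        return "back"
    if back_cost > front_cost:
        return "front"
    return "equal"
```

With the default penalty of 2, a request 50 sectors behind the head ties with a front request 100 sectors ahead. Moving it 10 sectors closer makes it the preferred one.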

fifo_expire_async
-----------------

This parameter is used to set the timeout of asynchronous requests. The
default value is 248ms.

fifo_expire_sync
----------------

This parameter is used to set the timeout of synchronous requests. The
default value is 124ms. To favor synchronous requests over asynchronous
ones, decrease this value relative to fifo_expire_async.
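The two timeouts can be read as per-request deadlines: a request becomes overdue fifo_expire_* milliseconds after it arrives. The sketch below uses the default values quoted above. The Req type and the helper names are hypothetical. The sketch ignores when and how often BFQ actually checks for expired requests.

```python
from dataclasses import dataclass
from typing import List

# Defaults as quoted in this document.
FIFO_EXPIRE_SYNC_MS = 124
FIFO_EXPIRE_ASYNC_MS = 248


@dataclass
class Req:
    name: str
    arrival_ms: int
    sync: bool


def deadline_ms(req: Req, sync_ms: int = FIFO_EXPIRE_SYNC_MS,
                async_ms: int = FIFO_EXPIRE_ASYNC_MS) -> int:
    """Time at which the request is considered expired."""
    return req.arrival_ms + (sync_ms if req.sync else async_ms)


def expired(reqs: List[Req], now_ms: int, sync_ms: int = FIFO_EXPIRE_SYNC_MS,
            async_ms: int = FIFO_EXPIRE_ASYNC_MS) -> List[str]:
    """Names of the requests whose deadline has been reached at now_ms."""
    return [r.name for r in reqs if deadline_ms(r, sync_ms, async_ms) <= now_ms]
```

Consider an asynchronous write arriving at 0 ms and a synchronous read arriving at 100 ms. Even though the write arrived first, the read becomes overdue first (at 224 ms versus 248 ms), because its timeout is half as long.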

low_latency
-----------

This parameter is used to enable/disable BFQ's low latency mode. By
default, low latency mode is enabled. If enabled, interactive and soft
real-time applications are privileged and experience a lower latency,
as explained in more detail in the description of how BFQ works.

DISABLE this mode if you need full control over bandwidth
distribution. In fact, if it is enabled, then BFQ automatically
increases the bandwidth share of privileged applications, as the main
means to guarantee a lower latency to them.

In addition, as already highlighted at the beginning of this document,
DISABLE this mode if your only goal is to achieve a high throughput.
In fact, privileging the I/O of some applications over the rest may
entail a lower throughput. To achieve the highest-possible throughput
on a non-rotational device, setting slice_idle to 0 may be needed too
(at the cost of giving up any strong guarantee on fairness and low
latency).
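Putting the two recommendations together, a throughput-oriented configuration for one device might look like the following sketch. The helper is hypothetical. It assumes the standard sysfs layout (/sys/block/<dev>/queue/rotational, plus BFQ's tunables under queue/iosched/ while BFQ is active) and root privileges.

```python
from pathlib import Path
from typing import Dict


def configure_for_throughput(dev: str,
                             sysfs_root: Path = Path("/sys/block")) -> Dict[str, int]:
    """Hypothetical helper: favor throughput over latency on one device.

    Writes low_latency=0 and, on non-rotational devices only, slice_idle=0.
    Returns the values it wrote.
    """
    queue = sysfs_root / dev / "queue"
    iosched = queue / "iosched"
    written: Dict[str, int] = {"low_latency": 0}
    if (queue / "rotational").read_text().strip() == "0":
        # Dropping idling also drops strong fairness/latency guarantees.
        written["slice_idle"] = 0
    for name, value in written.items():
        (iosched / name).write_text(f"{value}\n")
    return written
```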

timeout_sync
------------

Maximum amount of device time that can be given to a task (queue) once
it has been selected for service. On devices with costly seeks,
increasing this time usually increases maximum throughput. On the
other hand, increasing this time coarsens the granularity of the
short-term bandwidth and latency guarantees, especially if the
following parameter (max_budget) is set to zero.
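A toy model makes this tradeoff concrete. The model is an assumption of this sketch, not BFQ's internal accounting: every queue uses its whole slot of timeout_ms, and each switch between queues costs a fixed switch_cost_ms (for example, a seek). A longer slot amortizes the switch cost better, but lengthens the time other queues may have to wait. The function names are hypothetical.

```python
def useful_fraction(timeout_ms: float, switch_cost_ms: float) -> float:
    """Share of device time spent serving I/O rather than switching queues.

    Toy model: each queue uses its full slot, then a fixed switch (seek)
    cost is paid before the next queue starts.
    """
    return timeout_ms / (timeout_ms + switch_cost_ms)


def worst_case_wait_ms(n_queues: int, timeout_ms: float,
                       switch_cost_ms: float) -> float:
    """Longest a queue may wait while the other n_queues-1 use full slots."""
    return (n_queues - 1) * (timeout_ms + switch_cost_ms)
```

For instance, with a 10 ms switch cost, going from a 125 ms slot to a 250 ms slot raises the useful fraction of device time from about 0.926 to about 0.962. Meanwhile, the worst-case wait with four busy queues roughly doubles.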

max_budget
----------

Maximum amount of service, measured in sectors, that can be provided
to a BFQ queue once it is set in service (within the limits of the
above timeout, of course). As explained in the description of the
algorithm, larger values increase the throughput in proportion to the
percentage of sequential I/O requests issued. The price of larger
values is that they coarsen the granularity of short-term bandwidth
and latency guarantees.

The default value is 0, which enables auto-tuning: BFQ sets max_budget
to the maximum number of sectors that can be served during
timeout_sync, according to the estimated peak rate.

For specific devices, some users have occasionally reported reaching
a higher throughput by setting max_budget explicitly, i.e., to a value
higher than 0. In particular, they have set max_budget to higher
values than those BFQ would have chosen with auto-tuning. An
alternative way to achieve this goal is to just increase the value of
timeout_sync, leaving max_budget equal to 0.
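The auto-tuning rule is simple arithmetic: the number of sectors served per unit of time at the estimated peak rate, multiplied by timeout_sync. The sketch below assumes 512-byte sectors and a peak rate given in bytes per second. The kernel's own peak-rate estimation and fixed-point arithmetic are more involved. The sketch also shows why raising timeout_sync is an alternative to setting max_budget by hand: doubling timeout_sync doubles the auto-tuned budget. The function name is hypothetical.

```python
SECTOR_BYTES = 512  # assumed sector size


def auto_max_budget(peak_rate_bytes_per_s: int, timeout_sync_ms: int) -> int:
    """Sectors the device can serve within timeout_sync at its peak rate."""
    # (bytes/s * ms) / (bytes/sector * ms/s) = sectors
    return peak_rate_bytes_per_s * timeout_sync_ms // (SECTOR_BYTES * 1000)
```

For a device peaking at 200 MB/s (200,000,000 bytes/s), a 125 ms timeout_sync yields an auto-tuned budget of 48828 sectors, and a 250 ms one yields 97656 sectors.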

4. Group scheduling with BFQ
============================

BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
blkio and io. In particular, BFQ supports weight-based proportional
share. To activate cgroups support, enable the BFQ_GROUP_IOSCHED
kernel configuration option (CONFIG_BFQ_GROUP_IOSCHED).

4-1 Service guarantees provided
-------------------------------

With BFQ, proportional share means true proportional share of the
device bandwidth, according to group weights. For example, a group
with weight 200 gets twice the bandwidth, and not just twice the time,
of a group with weight 100.

BFQ supports hierarchies (group trees) of any depth. Bandwidth is
distributed among groups and processes in the expected way: for each
group, the children of the group share the whole bandwidth of the
group in proportion to their weights. In particular, this implies
that, for each leaf group, every process of the group receives the
same share of the whole group bandwidth, unless the ioprio of the
process is modified.
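The hierarchical distribution can be computed recursively, as in the sketch below. It assumes that every group and process in the tree is continuously backlogged; idle entities would not take part in the split. The tree encoding and the function name are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Each child maps to (weight, its own children); processes have no children.
Tree = Dict[str, Tuple[int, "Tree"]]


def shares(children: Tree, parent_share: Fraction = Fraction(1),
           prefix: str = "") -> Dict[str, Fraction]:
    """Bandwidth fraction of every node, assuming all nodes are backlogged."""
    total = sum(weight for weight, _ in children.values())
    result: Dict[str, Fraction] = {}
    for name, (weight, grandchildren) in children.items():
        path = prefix + name
        share = parent_share * Fraction(weight, total)
        result[path] = share
        if grandchildren:
            # Children split their parent's share in proportion to weights.
            result.update(shares(grandchildren, share, path + "/"))
    return result
```

Consider a group A of weight 200 and a group B of weight 100. A receives 2/3 of the bandwidth, and its two processes with equal weights split that share, getting 1/3 each. That is the same share the single process in B receives.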

The resource-sharing guarantee for a group may partially or totally
switch from bandwidth to time, if providing bandwidth guarantees to
the group lowers the throughput too much. This switch occurs on a
per-process basis: if serving a process of a leaf group so that it
receives its share of the bandwidth would cause a throughput loss,
then BFQ switches back to just time-based proportional share for that
process.
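An idealized example illustrates the arithmetic behind this switch. The sketch assumes that each process has a fixed solo rate (the rate it would get with the device to itself) and ignores switching costs. The example pairs one sequential reader at 100 MB/s with one random reader at 1 MB/s. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def total_bandwidth_fair(solo_rates: Sequence[int]) -> Fraction:
    """Aggregate throughput when every process gets the same bandwidth.

    Each process gets x, occupying x / r_i of the device time, so
    sum(x / r_i) = 1 gives x = 1 / sum(1 / r_i).
    """
    x = 1 / sum(Fraction(1, r) for r in solo_rates)
    return x * len(solo_rates)


def total_time_fair(solo_rates: Sequence[int]) -> Fraction:
    """Aggregate throughput when every process gets the same device time."""
    return Fraction(sum(solo_rates), len(solo_rates))
```

Equal bandwidth forces the device to spend most of its time on the random reader, so aggregate throughput drops to about 1.98 MB/s (200/101). Equal time keeps it at 50.5 MB/s. This is the kind of loss that makes time-based sharing preferable for such a process.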

4-2 Interface
-------------

To get proportional sharing of bandwidth with BFQ for a given device,
BFQ must of course be the active scheduler for that device.
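The active scheduler is the bracketed entry in /sys/block/<dev>/queue/scheduler, and writing a scheduler name to that file selects it. The Python sketch below wraps both steps. The helper names are hypothetical. Running it requires root privileges and a kernel with BFQ built in or loaded.

```python
from pathlib import Path


def active_scheduler(scheduler_file_text: str) -> str:
    """Return the bracketed (active) entry, e.g. "bfq" in "none [bfq]"."""
    for token in scheduler_file_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in: " + scheduler_file_text)


def select_bfq(dev: str, sysfs_root: Path = Path("/sys/block")) -> bool:
    """Make BFQ the active scheduler for dev; return True if a write was needed."""
    path = sysfs_root / dev / "queue" / "scheduler"
    text = path.read_text()
    available = [token.strip("[]") for token in text.split()]
    if "bfq" not in available:
        raise RuntimeError(f"bfq not available for {dev}: {text.strip()}")
    if active_scheduler(text) == "bfq":
        return False
    path.write_text("bfq\n")
    return True
```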

Within each group directory, the names of the files associated with
BFQ-specific cgroup parameters and stats begin with the "bfq."
prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
BFQ-specific files is "blkio.bfq." or "io.bfq.", respectively. For
example, the group parameter to set the weight of a group with BFQ is
blkio.bfq.weight or io.bfq.weight.
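The following sketch sets a group's weight through the file names above. The helper names and the group directory passed in are hypothetical. The 1..1000 range is the one given for the weight parameter. With cgroups-v2, the group directory would typically live under /sys/fs/cgroup.

```python
from pathlib import Path

BFQ_PREFIX = {1: "blkio.bfq.", 2: "io.bfq."}  # cgroups-v1 / cgroups-v2


def bfq_cgroup_file(param: str, cgroup_version: int) -> str:
    """Name of a BFQ-specific cgroup file, e.g. 'io.bfq.weight'."""
    return BFQ_PREFIX[cgroup_version] + param


def set_group_weight(group_dir: Path, weight: int, cgroup_version: int = 2) -> Path:
    """Hypothetical helper: write a group's BFQ weight (range 1..1000)."""
    if not 1 <= weight <= 1000:
        raise ValueError("BFQ group weights range from 1 to 1000")
    target = group_dir / bfq_cgroup_file("weight", cgroup_version)
    target.write_text(f"{weight}\n")
    return target
```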

As for cgroups-v1 (blkio controller), the exact set of stat files
created, and kept up-to-date by bfq, depends on whether
CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
the stat files documented in
Documentation/admin-guide/cgroup-v1/blkio-controller.rst. If, instead,
CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files::

  blkio.bfq.io_service_bytes
  blkio.bfq.io_service_bytes_recursive
  blkio.bfq.io_serviced
  blkio.bfq.io_serviced_recursive
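These stat files are assumed here to use the usual blkio format: one "MAJ:MIN Operation value" line per device and operation, followed by a final "Total value" line. The following is a minimal parser sketch with a hypothetical function name.

```python
from typing import Dict, Optional, Tuple


def parse_blkio_stat(text: str) -> Tuple[Dict[Tuple[str, str], int], Optional[int]]:
    """Parse e.g. the contents of blkio.bfq.io_service_bytes.

    Returns {(device, operation): value} and the trailing grand total
    (None if the file has no "Total" line).
    """
    per_device: Dict[Tuple[str, str], int] = {}
    grand_total: Optional[int] = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:            # "MAJ:MIN Operation value"
            device, operation, value = fields
            per_device[(device, operation)] = int(value)
        elif len(fields) == 2 and fields[0] == "Total":
            grand_total = int(fields[1])
    return per_device, grand_total
```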

The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
throughput sustainable with bfq, because updating the blkio.bfq.*
stats is rather costly, especially for some of the stats enabled by
CONFIG_BFQ_CGROUP_DEBUG.

Parameters to set
-----------------

For each group, there is only the following parameter to set.

weight (namely blkio.bfq.weight or io.bfq.weight): the weight of the
group inside its parent. Available values: 1..1000 (default 100). The
linear mapping between ioprio and weights, described at the beginning
of the tunable section, is still valid, but all weights higher than
IOPRIO_BE_NR*10 are mapped to ioprio 0.
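Both directions of the mapping can be sketched as below. The sketch assumes the linear mapping weight = (IOPRIO_BE_NR - ioprio) * 10, with IOPRIO_BE_NR = 8 best-effort levels. The coefficient of 10 is taken from the IOPRIO_BE_NR*10 threshold above. Clamping very small weights to ioprio 7 is this sketch's own choice, and the function names are hypothetical.

```python
IOPRIO_BE_NR = 8   # number of best-effort ioprio levels (0 = highest)
COEFF = 10         # assumed weight units per ioprio step


def ioprio_to_weight(ioprio: int) -> int:
    """Weight corresponding to a best-effort ioprio under the assumed mapping."""
    if not 0 <= ioprio < IOPRIO_BE_NR:
        raise ValueError("ioprio must be in 0..7")
    return (IOPRIO_BE_NR - ioprio) * COEFF


def weight_to_ioprio(weight: int) -> int:
    """Inverse mapping; every weight of 80 or more yields ioprio 0."""
    # Clamping to 7 for weights below 10 is this sketch's own choice.
    return min(IOPRIO_BE_NR - 1, max(0, IOPRIO_BE_NR - weight // COEFF))
```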

Recall that, if low_latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.

[1]
P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
Scheduler", Proceedings of the First Workshop on Mobile System
Technologies (MST-2015), May 2015.

http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf

[2]
P. Valente and M. Andreolini, "Improving Application
Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
the 5th Annual International Systems and Storage Conference
(SYSTOR '12), June 2012.

Slightly extended version:

http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf

[3]
https://github.com/Algodev-github/S