Orange Pi 5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

block/blk-iocost.c
/* SPDX-License-Identifier: GPL-2.0
 *
 * IO cost model based controller.
 *
 * Copyright (C) 2019 Tejun Heo <tj@kernel.org>
 * Copyright (C) 2019 Andy Newell <newella@fb.com>
 * Copyright (C) 2019 Facebook
 *
 * One challenge of controlling IO resources is the lack of a trivially
 * observable cost metric.  This is distinguished from CPU and memory where
 * wallclock time and the number of bytes can serve as accurate enough
 * approximations.
 *
 * Bandwidth and iops are the most commonly used metrics for IO devices but
 * depending on the type and specifics of the device, different IO patterns
 * easily lead to multiple orders of magnitude variations rendering them
 * useless for the purpose of IO capacity distribution.  While on-device
 * time, with a lot of crutches, could serve as a useful approximation for
 * non-queued rotational devices, this is no longer viable with modern
 * devices, even the rotational ones.
 *
 * While there is no cost metric we can trivially observe, it isn't a
 * complete mystery.  For example, on a rotational device, seek cost
 * dominates while a contiguous transfer contributes a smaller amount
 * proportional to the size.  If we can characterize at least the relative
 * costs of these different types of IOs, it should be possible to
 * implement a reasonable work-conserving proportional IO resource
 * distribution.
 *
 * 1. IO Cost Model
 *
 * The IO cost model estimates the cost of an IO given its basic parameters
 * and history (e.g. the end sector of the last IO).  The cost is measured
 * in device time.  If a given IO is estimated to cost 10ms, the device
 * should be able to process ~100 of those IOs in a second.
 *
 * Currently, there's only one builtin cost model - linear.  Each IO is
 * classified as sequential or random and given a base cost accordingly.
 * On top of that, a size cost proportional to the length of the IO is
 * added.  While simple, this model captures the operational
 * characteristics of a wide variety of devices well enough.  Default
 * parameters for several different classes of devices are provided and the
 * parameters can be configured from userspace via
 * /sys/fs/cgroup/io.cost.model.
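 *
 * As a hedged illustration (the 8:0 device number is hypothetical; the
 * key names follow Documentation/admin-guide/cgroup-v2.rst), a linear
 * model using the HDD defaults from the autop table below could be set
 * up with something like:
 *
 *	echo "8:0 ctrl=user model=linear rbps=174019176 rseqiops=41708 \
 *	      rrandiops=370 wbps=178075866 wseqiops=42705 wrandiops=378" \
 *	      > /sys/fs/cgroup/io.cost.model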
 *
 * If needed, tools/cgroup/iocost_coef_gen.py can be used to generate
 * device-specific coefficients.
 *
 * 2. Control Strategy
 *
 * The device virtual time (vtime) is used as the primary control metric.
 * The control strategy is composed of the following three parts.
 *
 * 2-1. Vtime Distribution
 *
 * When a cgroup becomes active in terms of IOs, its hierarchical share is
 * calculated.  Please consider the following hierarchy where the numbers
 * inside parentheses denote the configured weights.
 *
 *           root
 *         /       \
 *      A (w:100)  B (w:300)
 *      /       \
 *  A0 (w:100)  A1 (w:100)
 *
 * If B is idle and only A0 and A1 are actively issuing IOs, as the two are
 * of equal weight, each gets 50% share.  If then B starts issuing IOs, B
 * gets 300/(100+300) or 75% share, and A0 and A1 equally split the rest,
 * 12.5% each.  The distribution mechanism only cares about these flattened
 * shares.  They're called hweights (hierarchical weights) and always add
 * up to 1 (WEIGHT_ONE).
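 *
 * An hweight is the product of the weight ratios along the path from the
 * root.  For A0 above, with all three leaf groups active:
 *
 *	hweight(A0) = (100 / (100 + 300)) * (100 / (100 + 100))
 *		    = 0.25 * 0.5 = 12.5%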
 *
 * A given cgroup's vtime runs slower in inverse proportion to its hweight.
 * For example, with 12.5% weight, A0's time runs 8 times slower (100/12.5)
 * against the device vtime - an IO which takes 10ms on the underlying
 * device is considered to take 80ms on A0.
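 *
 * Equivalently, the vtime charged for an IO is its absolute cost scaled
 * by the inverse of the issuing cgroup's hweight (see abs_cost_to_cost()
 * below):
 *
 *	cost = abs_cost * WEIGHT_ONE / hweight_inuse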
 *
 * This constitutes the basis of IO capacity distribution.  Each cgroup's
 * vtime is running at a rate determined by its hweight.  A cgroup tracks
 * the vtime consumed by past IOs and can issue a new IO iff doing so
 * wouldn't outrun the current device vtime.  Otherwise, the IO is
 * suspended until the vtime has progressed enough to cover it.
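 *
 * As a rough sketch (the names here are illustrative, not the exact
 * helpers used below), the issue-side check boils down to:
 *
 *	s64 budget = vnow - atomic64_read(&iocg->vtime);
 *
 *	if (cost <= budget)
 *		issue(bio);			/* charge vtime and proceed */
 *	else
 *		wait_for_vtime(iocg, cost);	/* park on iocg->waitq */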
 *
 * 2-2. Vrate Adjustment
 *
 * It's unrealistic to expect the cost model to be perfect.  There are too
 * many devices and even on the same device the overall performance
 * fluctuates depending on numerous factors such as IO mixture and device
 * internal garbage collection.  The controller needs to adapt dynamically.
 *
 * This is achieved by adjusting the overall IO rate according to how busy
 * the device is.  If the device becomes overloaded, we're sending down too
 * many IOs and should generally slow down.  If there are waiting issuers
 * but the device isn't saturated, we're issuing too few and should
 * generally speed up.
 *
 * To slow down, we lower the vrate - the rate at which the device vtime
 * passes compared to the wall clock.  For example, if the vtime is running
 * at a vrate of 75%, all cgroups added up would only be able to issue
 * 750ms worth of IOs per second, and vice-versa for speeding up.
 *
 * Device busyness is determined using two criteria - rq wait and
 * completion latencies.
 *
 * When a device gets saturated, the on-device and then the request queues
 * fill up and a bio which is ready to be issued has to wait for a request
 * to become available.  When this delay becomes noticeable, it's a clear
 * indication that the device is saturated and we lower the vrate.  This
 * saturation signal is fairly conservative as it only triggers when both
 * hardware and software queues are filled up, and is used as the default
 * busy signal.
 *
 * As devices can have deep queues and be unfair in how the queued commands
 * are executed, solely depending on rq wait may not result in satisfactory
 * control quality.  For better control quality, completion latency QoS
 * parameters can be configured so that the device is considered saturated
 * if the N'th percentile completion latency rises above the set point.
 *
 * The completion latency requirements are a function of both the
 * underlying device characteristics and the desired IO latency quality of
 * service.  There is an inherent trade-off - the tighter the latency QoS,
 * the higher the bandwidth loss.  Latency QoS is disabled by default
 * and can be set through /sys/fs/cgroup/io.cost.qos.
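 *
 * As a hedged illustration (the 8:0 device number is hypothetical; key
 * names per Documentation/admin-guide/cgroup-v2.rst), enabling the
 * controller with p95 read/write latency targets of 10ms/20ms could
 * look like:
 *
 *	echo "8:0 enable=1 ctrl=user rpct=95.00 rlat=10000 wpct=95.00 \
 *	      wlat=20000" > /sys/fs/cgroup/io.cost.qos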
 *
 * 2-3. Work Conservation
 *
 * Imagine two cgroups A and B with equal weights.  A is issuing a small IO
 * periodically while B is sending out enough parallel IOs to saturate the
 * device on its own.  Let's say A's usage amounts to 100ms worth of IO
 * cost per second, i.e., 10% of the device capacity.  The naive
 * distribution of half and half would lead to 60% utilization of the
 * device, a significant reduction in the total amount of work done
 * compared to free-for-all competition.  This is too high a cost to pay
 * for IO control.
 *
 * To conserve the total amount of work done, we keep track of how much
 * each active cgroup is actually using and yield part of its weight if
 * there are other cgroups which can make use of it.  In the above case,
 * A's weight will be lowered so that it hovers above the actual usage and
 * B would be able to use the rest.
 *
 * As we don't want to penalize a cgroup for donating its weight, the
 * surplus weight adjustment factors in a margin and has an immediate
 * snapback mechanism in case the cgroup needs more IO vtime for itself.
 *
 * Note that adjusting down surplus weights has the same effect as
 * accelerating vtime for other cgroups and work conservation can also be
 * implemented by adjusting vrate dynamically.  However, squaring away who
 * can donate and who should take back how much requires hweight
 * propagation anyway, making it easier to implement and understand as a
 * separate mechanism.
 *
 * 3. Monitoring
 *
 * Instead of debugfs or other clumsy monitoring mechanisms, this
 * controller uses a drgn based monitoring script -
 * tools/cgroup/iocost_monitor.py.  For details on drgn, please see
 * https://github.com/osandov/drgn.  The output looks like the following.
 *
 *  sdb RUN   per=300ms cur_per=234.218:v203.695 busy= +1 vrate= 62.12%
 *                 active      weight      hweight% inflt% dbt  delay usages%
 *  test/a              *    50/   50  33.33/ 33.33  27.65   2  0*041 033:033:033
 *  test/b              *   100/  100  66.67/ 66.67  17.56   0  0*000 066:079:077
 *
 * - per	: Timer period
 * - cur_per	: Internal wall and device vtime clock
 * - vrate	: Device virtual time rate against wall clock
 * - weight	: Surplus-adjusted and configured weights
 * - hweight	: Surplus-adjusted and configured hierarchical weights
 * - inflt	: The percentage of in-flight IO cost at the end of last period
 * - dbt	: Outstanding debt
 * - delay	: Deferred issuer delay induction level and duration
 * - usages	: Usage history
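 *
 * (The script is typically pointed at a device, e.g.
 * "tools/cgroup/iocost_monitor.py sdb", and needs drgn installed; the
 * exact invocation may vary by version.)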
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/timer.h>
#include <linux/time64.h>
#include <linux/parser.h>
#include <linux/sched/signal.h>
#include <linux/blk-cgroup.h>
#include <asm/local.h>
#include <asm/local64.h>
#include "blk-rq-qos.h"
#include "blk-stat.h"
#include "blk-wbt.h"

#ifdef CONFIG_TRACEPOINTS

/* copied from TRACE_CGROUP_PATH, see cgroup-internal.h */
#define TRACE_IOCG_PATH_LEN 1024
static DEFINE_SPINLOCK(trace_iocg_path_lock);
static char trace_iocg_path[TRACE_IOCG_PATH_LEN];

#define TRACE_IOCG_PATH(type, iocg, ...)					\
	do {									\
		unsigned long flags;						\
		if (trace_iocost_##type##_enabled()) {				\
			spin_lock_irqsave(&trace_iocg_path_lock, flags);	\
			cgroup_path(iocg_to_blkg(iocg)->blkcg->css.cgroup,	\
				    trace_iocg_path, TRACE_IOCG_PATH_LEN);	\
			trace_iocost_##type(iocg, trace_iocg_path,		\
					      ##__VA_ARGS__);			\
			spin_unlock_irqrestore(&trace_iocg_path_lock, flags);	\
		}								\
	} while (0)

#else	/* CONFIG_TRACEPOINTS */
#define TRACE_IOCG_PATH(type, iocg, ...)	do { } while (0)
#endif	/* CONFIG_TRACEPOINTS */

enum {
	MILLION			= 1000000,

	/* timer period is calculated from latency requirements, bound it */
	MIN_PERIOD		= USEC_PER_MSEC,
	MAX_PERIOD		= USEC_PER_SEC,

	/*
	 * iocg->vtime is targeted at 50% behind the device vtime, which
	 * serves as its IO credit buffer.  Surplus weight adjustment is
	 * immediately canceled if the vtime margin runs below 10%.
	 */
	MARGIN_MIN_PCT		= 10,
	MARGIN_LOW_PCT		= 20,
	MARGIN_TARGET_PCT	= 50,

	INUSE_ADJ_STEP_PCT	= 25,

	/* Have some play in timer operations */
	TIMER_SLACK_PCT		= 1,

	/* 1/64k is granular enough and can easily be handled w/ u32 */
	WEIGHT_ONE		= 1 << 16,

	/*
	 * As vtime is used to calculate the cost of each IO, it needs to
	 * be fairly high precision.  For example, it should be able to
	 * represent the cost of a single page worth of discard with
	 * sufficient accuracy.  At the same time, it should be able to
	 * represent reasonably long enough durations to be useful and
	 * convenient during operation.
	 *
	 * 1s worth of vtime is 2^37.  This gives us both sub-nanosecond
	 * granularity and days of wrap-around time even at extreme vrates.
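	 *
	 * Concretely, VTIME_PER_NSEC = 2^37 / 10^9 ~= 137 vtime units per
	 * nanosecond, and a u64 vtime wraps after 2^64 / 2^37 = 2^27
	 * seconds (~4.3 years) at 100% vrate - still ~15.5 days even at
	 * the 10000% VRATE_MAX_PPM ceiling.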
	 */
	VTIME_PER_SEC_SHIFT	= 37,
	VTIME_PER_SEC		= 1LLU << VTIME_PER_SEC_SHIFT,
	VTIME_PER_USEC		= VTIME_PER_SEC / USEC_PER_SEC,
	VTIME_PER_NSEC		= VTIME_PER_SEC / NSEC_PER_SEC,

	/* bound vrate adjustments within two orders of magnitude */
	VRATE_MIN_PPM		= 10000,	/* 1% */
	VRATE_MAX_PPM		= 100000000,	/* 10000% */

	VRATE_MIN		= VTIME_PER_USEC * VRATE_MIN_PPM / MILLION,
	VRATE_CLAMP_ADJ_PCT	= 4,

	/* if IOs end up waiting for requests, issue less */
	RQ_WAIT_BUSY_PCT	= 5,

	/* unbusy hysteresis */
	UNBUSY_THR_PCT		= 75,

	/*
	 * The effect of delay is indirect and non-linear and a huge amount of
	 * future debt can accumulate abruptly while unthrottled. Linearly scale
	 * up delay as debt is going up and then let it decay exponentially.
	 * This gives us quick ramp ups while delay is accumulating and long
	 * tails which can help reduce the frequency of debt explosions on
	 * unthrottle. The parameters are experimentally determined.
	 *
	 * The delay mechanism provides adequate protection and behavior in many
	 * cases. However, this is far from ideal and falls short on both
	 * fronts. The debtors are often throttled too harshly, costing a
	 * significant level of fairness and possibly total work, while the
	 * protection against their impacts on the system can be choppy and
	 * unreliable.
	 *
	 * The shortcoming primarily stems from the fact that, unlike for page
	 * cache, the kernel doesn't have a well-defined back-pressure
	 * propagation mechanism and policies for anonymous memory. Fully
	 * addressing this issue will likely require substantial improvements
	 * in the area.
	 */
	MIN_DELAY_THR_PCT	= 500,
	MAX_DELAY_THR_PCT	= 25000,
	MIN_DELAY		= 250,
	MAX_DELAY		= 250 * USEC_PER_MSEC,

	/* halve debts if avg usage over 100ms is under 50% */
	DFGV_USAGE_PCT		= 50,
	DFGV_PERIOD		= 100 * USEC_PER_MSEC,

	/* don't let cmds which take a very long time pin lagging for too long */
	MAX_LAGGING_PERIODS	= 10,

	/* switch iff the conditions are met for longer than this */
	AUTOP_CYCLE_NSEC	= 10LLU * NSEC_PER_SEC,

	/*
	 * Count IO size in 4k pages.  The 12-bit shift helps keep
	 * size-proportional components of cost calculation in a closer
	 * number of digits to per-IO cost components.
	 */
	IOC_PAGE_SHIFT		= 12,
	IOC_PAGE_SIZE		= 1 << IOC_PAGE_SHIFT,
	IOC_SECT_TO_PAGE_SHIFT	= IOC_PAGE_SHIFT - SECTOR_SHIFT,

	/* if apart further than 16M, consider randio for linear model */
	LCOEF_RANDIO_PAGES	= 4096,
};

enum ioc_running {
	IOC_IDLE,
	IOC_RUNNING,
	IOC_STOP,
};

/* io.cost.qos controls including per-dev enable of the whole controller */
enum {
	QOS_ENABLE,
	QOS_CTRL,
	NR_QOS_CTRL_PARAMS,
};

/* io.cost.qos params */
enum {
	QOS_RPPM,
	QOS_RLAT,
	QOS_WPPM,
	QOS_WLAT,
	QOS_MIN,
	QOS_MAX,
	NR_QOS_PARAMS,
};

/* io.cost.model controls */
enum {
	COST_CTRL,
	COST_MODEL,
	NR_COST_CTRL_PARAMS,
};

/* builtin linear cost model coefficients */
enum {
	I_LCOEF_RBPS,
	I_LCOEF_RSEQIOPS,
	I_LCOEF_RRANDIOPS,
	I_LCOEF_WBPS,
	I_LCOEF_WSEQIOPS,
	I_LCOEF_WRANDIOPS,
	NR_I_LCOEFS,
};

enum {
	LCOEF_RPAGE,
	LCOEF_RSEQIO,
	LCOEF_RRANDIO,
	LCOEF_WPAGE,
	LCOEF_WSEQIO,
	LCOEF_WRANDIO,
	NR_LCOEFS,
};

enum {
	AUTOP_INVALID,
	AUTOP_HDD,
	AUTOP_SSD_QD1,
	AUTOP_SSD_DFL,
	AUTOP_SSD_FAST,
};

struct ioc_gq;

struct ioc_params {
	u32				qos[NR_QOS_PARAMS];
	u64				i_lcoefs[NR_I_LCOEFS];
	u64				lcoefs[NR_LCOEFS];
	u32				too_fast_vrate_pct;
	u32				too_slow_vrate_pct;
};

struct ioc_margins {
	s64				min;
	s64				low;
	s64				target;
};

struct ioc_missed {
	local_t				nr_met;
	local_t				nr_missed;
	u32				last_met;
	u32				last_missed;
};

struct ioc_pcpu_stat {
	struct ioc_missed		missed[2];

	local64_t			rq_wait_ns;
	u64				last_rq_wait_ns;
};

/* per device */
struct ioc {
	struct rq_qos			rqos;

	bool				enabled;

	struct ioc_params		params;
	struct ioc_margins		margins;
	u32				period_us;
	u32				timer_slack_ns;
	u64				vrate_min;
	u64				vrate_max;

	spinlock_t			lock;
	struct timer_list		timer;
	struct list_head		active_iocgs;	/* active cgroups */
	struct ioc_pcpu_stat __percpu	*pcpu_stat;

	enum ioc_running		running;
	atomic64_t			vtime_rate;
	u64				vtime_base_rate;
	s64				vtime_err;

	seqcount_spinlock_t		period_seqcount;
	u64				period_at;	/* wallclock starttime */
	u64				period_at_vtime; /* vtime starttime */

	atomic64_t			cur_period;	/* inc'd each period */
	int				busy_level;	/* saturation history */

	bool				weights_updated;
	atomic_t			hweight_gen;	/* for lazy hweights */

	/* debt forgiveness */
	u64				dfgv_period_at;
	u64				dfgv_period_rem;
	u64				dfgv_usage_us_sum;

	u64				autop_too_fast_at;
	u64				autop_too_slow_at;
	int				autop_idx;
	bool				user_qos_params:1;
	bool				user_cost_model:1;
};

struct iocg_pcpu_stat {
	local64_t			abs_vusage;
};

struct iocg_stat {
	u64				usage_us;
	u64				wait_us;
	u64				indebt_us;
	u64				indelay_us;
};

/* per device-cgroup pair */
struct ioc_gq {
	struct blkg_policy_data		pd;
	struct ioc			*ioc;

	/*
	 * An iocg can get its weight from two sources - an explicit
	 * per-device-cgroup configuration or the default weight of the
	 * cgroup.  `cfg_weight` is the explicit per-device-cgroup
	 * configuration.  `weight` is the effective weight considering both
	 * sources.
	 *
	 * When an idle cgroup becomes active its `active` goes from 0 to
	 * `weight`.  `inuse` is the surplus adjusted active weight.
	 * `active` and `inuse` are used to calculate `hweight_active` and
	 * `hweight_inuse`.
	 *
	 * `last_inuse` remembers `inuse` while an iocg is idle to persist
	 * surplus adjustments.
	 *
	 * `inuse` may be adjusted dynamically during a period. `saved_*` are
	 * used to determine and track adjustments.
	 */
	u32				cfg_weight;
	u32				weight;
	u32				active;
	u32				inuse;

	u32				last_inuse;
	s64				saved_margin;

	sector_t			cursor;		/* to detect randio */

	/*
	 * `vtime` is this iocg's vtime cursor which progresses as IOs are
	 * issued.  If lagging behind device vtime, the delta represents
	 * the currently available IO budget.  If running ahead, the
	 * overage.
	 *
	 * `done_vtime` is the same but progressed on completion rather
	 * than issue.  The delta behind `vtime` represents the cost of
	 * currently in-flight IOs.
	 */
	atomic64_t			vtime;
	atomic64_t			done_vtime;
	u64				abs_vdebt;

	/* current delay in effect and when it started */
	u64				delay;
	u64				delay_at;

	/*
	 * The period this iocg was last active in.  Used for deactivation
	 * and invalidating `vtime`.
	 */
	atomic64_t			active_period;
	struct list_head		active_list;

	/* see __propagate_weights() and current_hweight() for details */
	u64				child_active_sum;
	u64				child_inuse_sum;
	u64				child_adjusted_sum;
	int				hweight_gen;
	u32				hweight_active;
	u32				hweight_inuse;
	u32				hweight_donating;
	u32				hweight_after_donation;

	struct list_head		walk_list;
	struct list_head		surplus_list;

	struct wait_queue_head		waitq;
	struct hrtimer			waitq_timer;

	/* timestamp at the latest activation */
	u64				activated_at;

	/* statistics */
	struct iocg_pcpu_stat __percpu	*pcpu_stat;
	struct iocg_stat		local_stat;
	struct iocg_stat		desc_stat;
	struct iocg_stat		last_stat;
	u64				last_stat_abs_vusage;
	u64				usage_delta_us;
	u64				wait_since;
	u64				indebt_since;
	u64				indelay_since;

	/* this iocg's depth in the hierarchy and ancestors including self */
	int				level;
	struct ioc_gq			*ancestors[];
};

/* per cgroup */
struct ioc_cgrp {
	struct blkcg_policy_data	cpd;
	unsigned int			dfl_weight;
};

struct ioc_now {
	u64				now_ns;
	u64				now;
	u64				vnow;
	u64				vrate;
};

struct iocg_wait {
	struct wait_queue_entry		wait;
	struct bio			*bio;
	u64				abs_cost;
	bool				committed;
};

struct iocg_wake_ctx {
	struct ioc_gq			*iocg;
	u32				hw_inuse;
	s64				vbudget;
};

static const struct ioc_params autop[] = {
	[AUTOP_HDD] = {
		.qos				= {
			[QOS_RLAT]		=        250000, /* 250ms */
			[QOS_WLAT]		=        250000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=     174019176,
			[I_LCOEF_RSEQIOPS]	=         41708,
			[I_LCOEF_RRANDIOPS]	=           370,
			[I_LCOEF_WBPS]		=     178075866,
			[I_LCOEF_WSEQIOPS]	=         42705,
			[I_LCOEF_WRANDIOPS]	=           378,
		},
	},
	[AUTOP_SSD_QD1] = {
		.qos				= {
			[QOS_RLAT]		=         25000, /* 25ms */
			[QOS_WLAT]		=         25000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=     245855193,
			[I_LCOEF_RSEQIOPS]	=         61575,
			[I_LCOEF_RRANDIOPS]	=          6946,
			[I_LCOEF_WBPS]		=     141365009,
			[I_LCOEF_WSEQIOPS]	=         33716,
			[I_LCOEF_WRANDIOPS]	=         26796,
		},
	},
	[AUTOP_SSD_DFL] = {
		.qos				= {
			[QOS_RLAT]		=         25000, /* 25ms */
			[QOS_WLAT]		=         25000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=     488636629,
			[I_LCOEF_RSEQIOPS]	=          8932,
			[I_LCOEF_RRANDIOPS]	=          8518,
			[I_LCOEF_WBPS]		=     427891549,
			[I_LCOEF_WSEQIOPS]	=         28755,
			[I_LCOEF_WRANDIOPS]	=         21940,
		},
		.too_fast_vrate_pct		=           500,
	},
	[AUTOP_SSD_FAST] = {
		.qos				= {
			[QOS_RLAT]		=          5000, /* 5ms */
			[QOS_WLAT]		=          5000,
			[QOS_MIN]		= VRATE_MIN_PPM,
			[QOS_MAX]		= VRATE_MAX_PPM,
		},
		.i_lcoefs			= {
			[I_LCOEF_RBPS]		=    3102524156LLU,
			[I_LCOEF_RSEQIOPS]	=        724816,
			[I_LCOEF_RRANDIOPS]	=        778122,
			[I_LCOEF_WBPS]		=    1742780862LLU,
			[I_LCOEF_WSEQIOPS]	=        425702,
			[I_LCOEF_WRANDIOPS]	=        443193,
		},
		.too_slow_vrate_pct		=            10,
	},
};

/*
 * vrate adjust percentages indexed by ioc->busy_level.  We adjust up on
 * vtime credit shortage and down on device saturation.
 */
static u32 vrate_adj_pct[] =
	{ 0, 0, 0, 0,
	  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
	  4, 4, 4, 4, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8, 8, 8, 16 };

static struct blkcg_policy blkcg_policy_iocost;

/* accessors and helpers */
static struct ioc *rqos_to_ioc(struct rq_qos *rqos)
{
	return container_of(rqos, struct ioc, rqos);
}

static struct ioc *q_to_ioc(struct request_queue *q)
{
	return rqos_to_ioc(rq_qos_id(q, RQ_QOS_COST));
}

static const char *q_name(struct request_queue *q)
{
	if (blk_queue_registered(q))
		return kobject_name(q->kobj.parent);
	else
		return "<unknown>";
}

static const char __maybe_unused *ioc_name(struct ioc *ioc)
{
	return q_name(ioc->rqos.q);
}

static struct ioc_gq *pd_to_iocg(struct blkg_policy_data *pd)
{
	return pd ? container_of(pd, struct ioc_gq, pd) : NULL;
}

static struct ioc_gq *blkg_to_iocg(struct blkcg_gq *blkg)
{
	return pd_to_iocg(blkg_to_pd(blkg, &blkcg_policy_iocost));
}

static struct blkcg_gq *iocg_to_blkg(struct ioc_gq *iocg)
{
	return pd_to_blkg(&iocg->pd);
}

static struct ioc_cgrp *blkcg_to_iocc(struct blkcg *blkcg)
{
	return container_of(blkcg_to_cpd(blkcg, &blkcg_policy_iocost),
			    struct ioc_cgrp, cpd);
}

/*
 * Scale @abs_cost to the inverse of @hw_inuse.  The lower the hierarchical
 * weight, the more expensive each IO.  Must round up.
 */
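 * For example, at a hw_inuse of WEIGHT_ONE / 8 (the 12.5% hweight from
 * the example at the top of this file), a 10ms IO is charged 80ms of
 * vtime.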
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  707)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  708) static u64 abs_cost_to_cost(u64 abs_cost, u32 hw_inuse)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  709) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  710) 	return DIV64_U64_ROUND_UP(abs_cost * WEIGHT_ONE, hw_inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  711) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  712) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  713) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  714)  * The inverse of abs_cost_to_cost().  Must round up.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  715)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  716) static u64 cost_to_abs_cost(u64 cost, u32 hw_inuse)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  717) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  718) 	return DIV64_U64_ROUND_UP(cost * hw_inuse, WEIGHT_ONE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  719) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  720) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  721) static void iocg_commit_bio(struct ioc_gq *iocg, struct bio *bio,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  722) 			    u64 abs_cost, u64 cost)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  723) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  724) 	struct iocg_pcpu_stat *gcs;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  725) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  726) 	bio->bi_iocost_cost = cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  727) 	atomic64_add(cost, &iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  728) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  729) 	gcs = get_cpu_ptr(iocg->pcpu_stat);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  730) 	local64_add(abs_cost, &gcs->abs_vusage);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  731) 	put_cpu_ptr(gcs);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  732) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  733) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  734) static void iocg_lock(struct ioc_gq *iocg, bool lock_ioc, unsigned long *flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  735) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  736) 	if (lock_ioc) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  737) 		spin_lock_irqsave(&iocg->ioc->lock, *flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  738) 		spin_lock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  739) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  740) 		spin_lock_irqsave(&iocg->waitq.lock, *flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  741) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  742) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  743) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  744) static void iocg_unlock(struct ioc_gq *iocg, bool unlock_ioc, unsigned long *flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  745) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  746) 	if (unlock_ioc) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  747) 		spin_unlock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  748) 		spin_unlock_irqrestore(&iocg->ioc->lock, *flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  749) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  750) 		spin_unlock_irqrestore(&iocg->waitq.lock, *flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  751) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  752) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  753) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  754) #define CREATE_TRACE_POINTS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  755) #include <trace/events/iocost.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  756) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  757) static void ioc_refresh_margins(struct ioc *ioc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  758) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  759) 	struct ioc_margins *margins = &ioc->margins;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  760) 	u32 period_us = ioc->period_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  761) 	u64 vrate = ioc->vtime_base_rate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  762) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  763) 	margins->min = (period_us * MARGIN_MIN_PCT / 100) * vrate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  764) 	margins->low = (period_us * MARGIN_LOW_PCT / 100) * vrate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  765) 	margins->target = (period_us * MARGIN_TARGET_PCT / 100) * vrate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  766) }
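
/*
 * For illustration, assuming MARGIN_MIN_PCT, MARGIN_LOW_PCT and
 * MARGIN_TARGET_PCT of 10, 20 and 50 as defined earlier in this file, a
 * 10ms period (period_us == 10000) at vtime_base_rate @vrate yields:
 *
 *   margins->min    = 1000 * vrate	(10% of one period, in vtime)
 *   margins->low    = 2000 * vrate
 *   margins->target = 5000 * vrate
 *
 * i.e. each margin is a fixed fraction of one period's worth of vtime.
 */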
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  767) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  768) /* latency QoS params changed, update period_us and all the dependent params */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  769) static void ioc_refresh_period_us(struct ioc *ioc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  770) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  771) 	u32 ppm, lat, multi, period_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  772) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  773) 	lockdep_assert_held(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  774) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  775) 	/* pick the higher latency target */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  776) 	if (ioc->params.qos[QOS_RLAT] >= ioc->params.qos[QOS_WLAT]) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  777) 		ppm = ioc->params.qos[QOS_RPPM];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  778) 		lat = ioc->params.qos[QOS_RLAT];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  779) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  780) 		ppm = ioc->params.qos[QOS_WPPM];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  781) 		lat = ioc->params.qos[QOS_WLAT];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  782) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  783) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  784) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  785) 	 * We want the period to be long enough to contain a healthy number
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  786) 	 * of IOs while short enough for granular control.  Define it as a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  787) 	 * multiple of the latency target.  Ideally, the multiplier should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  788) 	 * be scaled according to the percentile so that it would nominally
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  789) 	 * contain a certain number of requests.  Let's be simpler and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  790) 	 * scale it linearly so that it's 2x >= pct(90) and 10x at pct(50).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  791) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  792) 	if (ppm)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  793) 		multi = max_t(u32, (MILLION - ppm) / 50000, 2);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  794) 	else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  795) 		multi = 2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  796) 	period_us = multi * lat;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  797) 	period_us = clamp_t(u32, period_us, MIN_PERIOD, MAX_PERIOD);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  798) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  799) 	/* calculate dependent params */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  800) 	ioc->period_us = period_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  801) 	ioc->timer_slack_ns = div64_u64(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  802) 		(u64)period_us * NSEC_PER_USEC * TIMER_SLACK_PCT,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  803) 		100);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  804) 	ioc_refresh_margins(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  805) }
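
/*
 * A quick sanity check of the scaling above: with a 5ms latency target
 * at the 95th percentile (ppm == 950000), multi == max((1000000 -
 * 950000) / 50000, 2) == 2 and period_us == 10000; at the 50th
 * percentile (ppm == 500000), multi == 10 and period_us == 50000.  The
 * result is then clamped to [MIN_PERIOD, MAX_PERIOD] - 1ms and 1s
 * respectively, assuming the defaults earlier in this file.
 */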
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  806) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  807) static int ioc_autop_idx(struct ioc *ioc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  808) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  809) 	int idx = ioc->autop_idx;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  810) 	const struct ioc_params *p = &autop[idx];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  811) 	u32 vrate_pct;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  812) 	u64 now_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  813) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  814) 	/* rotational? */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  815) 	if (!blk_queue_nonrot(ioc->rqos.q))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  816) 		return AUTOP_HDD;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  817) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  818) 	/* handle SATA SSDs w/ broken NCQ */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  819) 	if (blk_queue_depth(ioc->rqos.q) == 1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  820) 		return AUTOP_SSD_QD1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  821) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  822) 	/* use one of the normal ssd sets */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  823) 	if (idx < AUTOP_SSD_DFL)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  824) 		return AUTOP_SSD_DFL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  825) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  826) 	/* if user is overriding anything, maintain what was there */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  827) 	if (ioc->user_qos_params || ioc->user_cost_model)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  828) 		return idx;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  829) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  830) 	/* step up/down based on the vrate */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  831) 	vrate_pct = div64_u64(ioc->vtime_base_rate * 100, VTIME_PER_USEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  832) 	now_ns = ktime_get_ns();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  833) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  834) 	if (p->too_fast_vrate_pct && p->too_fast_vrate_pct <= vrate_pct) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  835) 		if (!ioc->autop_too_fast_at)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  836) 			ioc->autop_too_fast_at = now_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  837) 		if (now_ns - ioc->autop_too_fast_at >= AUTOP_CYCLE_NSEC)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  838) 			return idx + 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  839) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  840) 		ioc->autop_too_fast_at = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  841) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  842) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  843) 	if (p->too_slow_vrate_pct && p->too_slow_vrate_pct >= vrate_pct) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  844) 		if (!ioc->autop_too_slow_at)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  845) 			ioc->autop_too_slow_at = now_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  846) 		if (now_ns - ioc->autop_too_slow_at >= AUTOP_CYCLE_NSEC)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  847) 			return idx - 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  848) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  849) 		ioc->autop_too_slow_at = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  850) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  851) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  852) 	return idx;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  853) }
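
/*
 * In other words, once past the AUTOP_HDD / AUTOP_SSD_QD1 special
 * cases, a device whose vtime_base_rate has stayed at or above the
 * current set's too_fast_vrate_pct for a full AUTOP_CYCLE_NSEC (10s,
 * assuming the default defined earlier in this file) steps up one
 * parameter set, and one at or below too_slow_vrate_pct for as long
 * steps down.  E.g. with vtime_base_rate == 3 * VTIME_PER_USEC,
 * vrate_pct == 300; if the current set hypothetically declared
 * too_fast_vrate_pct == 250, the 10s countdown would start.
 */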
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  854) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  855) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  856)  * Take the following as input
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  857)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  858)  *  @bps	maximum sequential throughput
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  859)  *  @seqiops	maximum sequential 4k iops
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  860)  *  @randiops	maximum random 4k iops
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  861)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  862)  * and calculate the linear model cost coefficients.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  863)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  864)  *  *@page	per-page cost		1s / (@bps / 4096)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  865)  *  *@seqio	base cost of a seq IO	max((1s / @seqiops) - *@page, 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  866)  *  *@randio	base cost of a rand IO	max((1s / @randiops) - *@page, 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  867)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  868) static void calc_lcoefs(u64 bps, u64 seqiops, u64 randiops,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  869) 			u64 *page, u64 *seqio, u64 *randio)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  870) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  871) 	u64 v;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  872) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  873) 	*page = *seqio = *randio = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  874) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  875) 	if (bps)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  876) 		*page = DIV64_U64_ROUND_UP(VTIME_PER_SEC,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  877) 					   DIV_ROUND_UP_ULL(bps, IOC_PAGE_SIZE));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  878) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  879) 	if (seqiops) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  880) 		v = DIV64_U64_ROUND_UP(VTIME_PER_SEC, seqiops);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  881) 		if (v > *page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  882) 			*seqio = v - *page;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  883) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  884) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  885) 	if (randiops) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  886) 		v = DIV64_U64_ROUND_UP(VTIME_PER_SEC, randiops);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  887) 		if (v > *page)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  888) 			*randio = v - *page;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  889) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  890) }
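
/*
 * A worked example, assuming VTIME_PER_SEC as defined earlier in this
 * file and IOC_PAGE_SIZE == 4096: for @bps == 409600000 (~400MB/s), the
 * device moves 100000 pages/sec, so *page == VTIME_PER_SEC / 100000.
 * With @seqiops == 50000, a sequential IO costs VTIME_PER_SEC / 50000
 * in total, half of which the page component already covers, so *seqio
 * == VTIME_PER_SEC / 100000.  With @randiops == 10000, *randio ==
 * VTIME_PER_SEC / 10000 - VTIME_PER_SEC / 100000, nine times the page
 * cost.  The base costs are what make small random IOs expensive
 * relative to their byte count.
 */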
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  891) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  892) static void ioc_refresh_lcoefs(struct ioc *ioc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  893) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  894) 	u64 *u = ioc->params.i_lcoefs;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  895) 	u64 *c = ioc->params.lcoefs;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  896) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  897) 	calc_lcoefs(u[I_LCOEF_RBPS], u[I_LCOEF_RSEQIOPS], u[I_LCOEF_RRANDIOPS],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  898) 		    &c[LCOEF_RPAGE], &c[LCOEF_RSEQIO], &c[LCOEF_RRANDIO]);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  899) 	calc_lcoefs(u[I_LCOEF_WBPS], u[I_LCOEF_WSEQIOPS], u[I_LCOEF_WRANDIOPS],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  900) 		    &c[LCOEF_WPAGE], &c[LCOEF_WSEQIO], &c[LCOEF_WRANDIO]);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  901) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  902) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  903) static bool ioc_refresh_params(struct ioc *ioc, bool force)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  904) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  905) 	const struct ioc_params *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  906) 	int idx;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  907) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  908) 	lockdep_assert_held(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  909) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  910) 	idx = ioc_autop_idx(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  911) 	p = &autop[idx];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  912) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  913) 	if (idx == ioc->autop_idx && !force)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  914) 		return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  915) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  916) 	if (idx != ioc->autop_idx)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  917) 		atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  918) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  919) 	ioc->autop_idx = idx;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  920) 	ioc->autop_too_fast_at = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  921) 	ioc->autop_too_slow_at = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  922) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  923) 	if (!ioc->user_qos_params)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  924) 		memcpy(ioc->params.qos, p->qos, sizeof(p->qos));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  925) 	if (!ioc->user_cost_model)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  926) 		memcpy(ioc->params.i_lcoefs, p->i_lcoefs, sizeof(p->i_lcoefs));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  927) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  928) 	ioc_refresh_period_us(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  929) 	ioc_refresh_lcoefs(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  930) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  931) 	ioc->vrate_min = DIV64_U64_ROUND_UP((u64)ioc->params.qos[QOS_MIN] *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  932) 					    VTIME_PER_USEC, MILLION);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  933) 	ioc->vrate_max = div64_u64((u64)ioc->params.qos[QOS_MAX] *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  934) 				   VTIME_PER_USEC, MILLION);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  935) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  936) 	return true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  937) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  938) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  939) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  940)  * When an iocg accumulates too much vtime or gets deactivated, we throw away
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  941)  * some vtime, which lowers the overall device utilization. As the exact amount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  942)  * which is being thrown away is known, we can compensate by accelerating the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  943)  * vrate accordingly so that the extra vtime generated in the current period
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  944)  * matches what got lost.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  945)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  946) static void ioc_refresh_vrate(struct ioc *ioc, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  947) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  948) 	s64 pleft = ioc->period_at + ioc->period_us - now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  949) 	s64 vperiod = ioc->period_us * ioc->vtime_base_rate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  950) 	s64 vcomp, vcomp_min, vcomp_max;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  951) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  952) 	lockdep_assert_held(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  953) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  954) 	/* we need some time left in this period */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  955) 	if (pleft <= 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  956) 		goto done;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  957) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  958) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  959) 	 * Calculate how much vrate should be adjusted to offset the error.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  960) 	 * Limit the amount of adjustment and deduct the adjusted amount from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  961) 	 * the error.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  962) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  963) 	vcomp = -div64_s64(ioc->vtime_err, pleft);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  964) 	vcomp_min = -(ioc->vtime_base_rate >> 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  965) 	vcomp_max = ioc->vtime_base_rate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  966) 	vcomp = clamp(vcomp, vcomp_min, vcomp_max);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  967) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  968) 	ioc->vtime_err += vcomp * pleft;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  969) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  970) 	atomic64_set(&ioc->vtime_rate, ioc->vtime_base_rate + vcomp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  971) done:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  972) 	/* bound how much error can accumulate */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  973) 	ioc->vtime_err = clamp(ioc->vtime_err, -vperiod, vperiod);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  974) }
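
/*
 * For example, if vtime_err == -10000 * vtime_base_rate (one 10ms
 * period's worth of vtime was thrown away) and 10ms is left in the
 * period (pleft == 10000), vcomp == vtime_base_rate: the published
 * vtime_rate doubles for the remainder of the period and vtime_err
 * returns to zero.  The clamp keeps the compensated rate within
 * [0.5x, 2x] of vtime_base_rate, so larger errors are worked off
 * across multiple periods.
 */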
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  975) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  976) /* take a snapshot of the current [v]time and vrate */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  977) static void ioc_now(struct ioc *ioc, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  978) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  979) 	unsigned seq;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  980) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  981) 	now->now_ns = ktime_get();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  982) 	now->now = ktime_to_us(now->now_ns);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  983) 	now->vrate = atomic64_read(&ioc->vtime_rate);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  984) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  985) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  986) 	 * The current vtime is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  987) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  988) 	 *   vtime at period start + (wallclock time since the start) * vrate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  989) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  990) 	 * As a consistent snapshot of `period_at_vtime` and `period_at` is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  991) 	 * needed, they're seqcount protected.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  992) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  993) 	do {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  994) 		seq = read_seqcount_begin(&ioc->period_seqcount);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  995) 		now->vnow = ioc->period_at_vtime +
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  996) 			(now->now - ioc->period_at) * now->vrate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  997) 	} while (read_seqcount_retry(&ioc->period_seqcount, seq));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  998) }
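
/*
 * Plugging numbers into the formula above: 2500us into a period that
 * started at period_at_vtime V, with vtime advancing at twice the model
 * rate (vrate == 2 * VTIME_PER_USEC), vnow == V + 5000 *
 * VTIME_PER_USEC.  The seqcount retry loop simply re-reads both stamps
 * if ioc_start_period() updated them concurrently.
 */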
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  999) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002) 	WARN_ON_ONCE(ioc->running != IOC_RUNNING);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) 	write_seqcount_begin(&ioc->period_seqcount);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) 	ioc->period_at = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) 	ioc->period_at_vtime = now->vnow;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007) 	write_seqcount_end(&ioc->period_seqcount);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) 	ioc->timer.expires = jiffies + usecs_to_jiffies(ioc->period_us);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010) 	add_timer(&ioc->timer);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014)  * Update @iocg's `active` and `inuse` to @active and @inuse, update level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015)  * weight sums and propagate upwards accordingly. If @save, the current margin
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016)  * is saved to be used as reference for later inuse in-period adjustments.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) static void __propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019) 				bool save, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) 	int lvl;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024) 	lockdep_assert_held(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) 	 * For an active leaf node, its inuse shouldn't be zero or exceed
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028) 	 * @active. An active internal node's inuse is solely determined by the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) 	 * inuse to active ratio of its children regardless of @inuse.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) 	if (list_empty(&iocg->active_list) && iocg->child_active_sum) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) 		inuse = DIV64_U64_ROUND_UP(active * iocg->child_inuse_sum,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) 					   iocg->child_active_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035) 		inuse = clamp_t(u32, inuse, 1, active);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) 	iocg->last_inuse = iocg->inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039) 	if (save)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040) 		iocg->saved_margin = now->vnow - atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) 	if (active == iocg->active && inuse == iocg->inuse)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) 	for (lvl = iocg->level - 1; lvl >= 0; lvl--) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) 		struct ioc_gq *parent = iocg->ancestors[lvl];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047) 		struct ioc_gq *child = iocg->ancestors[lvl + 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048) 		u32 parent_active = 0, parent_inuse = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050) 		/* update the level sums */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051) 		parent->child_active_sum += (s32)(active - child->active);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) 		parent->child_inuse_sum += (s32)(inuse - child->inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053) 		/* apply the updates */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) 		child->active = active;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) 		child->inuse = inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) 		/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) 		 * The delta between the inuse and active sums indicates how
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059) 		 * much weight is being given away.  Parent's inuse
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060) 		 * and active should reflect the ratio.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061) 		 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) 		if (parent->child_active_sum) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) 			parent_active = parent->weight;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) 			parent_inuse = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) 				parent_active * parent->child_inuse_sum,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) 				parent->child_active_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) 		/* do we need to keep walking up? */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070) 		if (parent_active == parent->active &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071) 		    parent_inuse == parent->inuse)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) 			break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074) 		active = parent_active;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075) 		inuse = parent_inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) 	ioc->weights_updated = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079) }
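
/*
 * To illustrate one step of the walk above: suppose the updated child
 * ends up with active == 100 and inuse == 50, bringing the parent's
 * child_active_sum to 200 and child_inuse_sum to 150.  If the parent's
 * own weight is 300, parent_active == 300 and parent_inuse ==
 * DIV64_U64_ROUND_UP(300 * 150, 200) == 225 - the parent expresses at
 * its own level the same 3/4 inuse ratio that its children
 * collectively show.
 */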
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081) static void commit_weights(struct ioc *ioc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1082) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) 	lockdep_assert_held(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085) 	if (ioc->weights_updated) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) 		/* paired with rmb in current_hweight(), see there */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) 		smp_wmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088) 		atomic_inc(&ioc->hweight_gen);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089) 		ioc->weights_updated = false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093) static void propagate_weights(struct ioc_gq *iocg, u32 active, u32 inuse,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) 			      bool save, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) 	__propagate_weights(iocg, active, inuse, save, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097) 	commit_weights(iocg->ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) static void current_hweight(struct ioc_gq *iocg, u32 *hw_activep, u32 *hw_inusep)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) 	int lvl;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104) 	u32 hwa, hwi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) 	int ioc_gen;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107) 	/* hot path - if uptodate, use cached */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108) 	ioc_gen = atomic_read(&ioc->hweight_gen);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) 	if (ioc_gen == iocg->hweight_gen)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) 		goto out;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) 	 * Paired with wmb in commit_weights(). If we saw the updated
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) 	 * hweight_gen, all the weight updates from __propagate_weights() are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) 	 * visible too.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117) 	 * We can race with weight updates during calculation and get it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118) 	 * wrong.  However, hweight_gen would have changed and a future
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) 	 * reader will recalculate and we're guaranteed to discard the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) 	 * wrong result soon.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122) 	smp_rmb();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) 	hwa = hwi = WEIGHT_ONE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125) 	for (lvl = 0; lvl <= iocg->level - 1; lvl++) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) 		struct ioc_gq *parent = iocg->ancestors[lvl];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127) 		struct ioc_gq *child = iocg->ancestors[lvl + 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) 		u64 active_sum = READ_ONCE(parent->child_active_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129) 		u64 inuse_sum = READ_ONCE(parent->child_inuse_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) 		u32 active = READ_ONCE(child->active);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) 		u32 inuse = READ_ONCE(child->inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) 		/* we can race with deactivations and either may read as zero */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134) 		if (!active_sum || !inuse_sum)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) 		active_sum = max_t(u64, active, active_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138) 		hwa = div64_u64((u64)hwa * active, active_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) 		inuse_sum = max_t(u64, inuse, inuse_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) 		hwi = div64_u64((u64)hwi * inuse, inuse_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) 	iocg->hweight_active = max_t(u32, hwa, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145) 	iocg->hweight_inuse = max_t(u32, hwi, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) 	iocg->hweight_gen = ioc_gen;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) out:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) 	if (hw_activep)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149) 		*hw_activep = iocg->hweight_active;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150) 	if (hw_inusep)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) 		*hw_inusep = iocg->hweight_inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152) }
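
/*
 * Example of the top-down walk above: an iocg at level 2 whose level-1
 * ancestor holds active == 100 out of a child_active_sum of 200, and
 * which itself holds active == 50 out of its parent's child_active_sum
 * of 100, ends up with hweight_active == WEIGHT_ONE * (100 / 200) *
 * (50 / 100) == WEIGHT_ONE / 4 - a quarter of the device.
 * hweight_inuse is the same product computed over the inuse values.
 */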
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1155)  * Calculate the hweight_inuse @iocg would get with max @inuse assuming all the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1156)  * other weights stay unchanged.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1157)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1158) static u32 current_hweight_max(struct ioc_gq *iocg)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1159) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1160) 	u32 hwm = WEIGHT_ONE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1161) 	u32 inuse = iocg->active;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1162) 	u64 child_inuse_sum;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1163) 	int lvl;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1164) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1165) 	lockdep_assert_held(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1166) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1167) 	for (lvl = iocg->level - 1; lvl >= 0; lvl--) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1168) 		struct ioc_gq *parent = iocg->ancestors[lvl];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1169) 		struct ioc_gq *child = iocg->ancestors[lvl + 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1170) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1171) 		child_inuse_sum = parent->child_inuse_sum + inuse - child->inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1172) 		hwm = div64_u64((u64)hwm * inuse, child_inuse_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1173) 		inuse = DIV64_U64_ROUND_UP(parent->active * child_inuse_sum,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1174) 					   parent->child_active_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1175) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1176) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1177) 	return max_t(u32, hwm, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1178) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1179) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1180) static void weight_updated(struct ioc_gq *iocg, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1181) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1182) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1183) 	struct blkcg_gq *blkg = iocg_to_blkg(iocg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1184) 	struct ioc_cgrp *iocc = blkcg_to_iocc(blkg->blkcg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1185) 	u32 weight;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1186) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1187) 	lockdep_assert_held(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1188) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1189) 	weight = iocg->cfg_weight ?: iocc->dfl_weight;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1190) 	if (weight != iocg->weight && iocg->active)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1191) 		propagate_weights(iocg, weight, iocg->inuse, true, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1192) 	iocg->weight = weight;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1193) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1194) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1195) static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1196) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1197) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1198) 	u64 last_period, cur_period;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1199) 	u64 vtime, vtarget;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1200) 	int i;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1201) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1202) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1203) 	 * If we seem to be already active, just update the stamp to tell
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1204) 	 * the timer that we're still active.  We don't mind occasional races.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1205) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1206) 	if (!list_empty(&iocg->active_list)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1207) 		ioc_now(ioc, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1208) 		cur_period = atomic64_read(&ioc->cur_period);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1209) 		if (atomic64_read(&iocg->active_period) != cur_period)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1210) 			atomic64_set(&iocg->active_period, cur_period);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1211) 		return true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1212) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1213) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1214) 	/* racy check on internal node IOs, treat as root level IOs */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1215) 	if (iocg->child_active_sum)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1216) 		return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1217) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1218) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1219) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1220) 	ioc_now(ioc, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1221) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1222) 	/* update period */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1223) 	cur_period = atomic64_read(&ioc->cur_period);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1224) 	last_period = atomic64_read(&iocg->active_period);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1225) 	atomic64_set(&iocg->active_period, cur_period);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1226) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1227) 	/* already activated or breaking leaf-only constraint? */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1228) 	if (!list_empty(&iocg->active_list))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1229) 		goto succeed_unlock;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1230) 	for (i = iocg->level - 1; i > 0; i--)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1231) 		if (!list_empty(&iocg->ancestors[i]->active_list))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1232) 			goto fail_unlock;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1234) 	if (iocg->child_active_sum)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1235) 		goto fail_unlock;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1237) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1238) 	 * Always start with the target budget. On deactivation, we throw away
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1239) 	 * anything above it.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1240) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1241) 	vtarget = now->vnow - ioc->margins.target;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1242) 	vtime = atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1243) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1244) 	atomic64_add(vtarget - vtime, &iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1245) 	atomic64_add(vtarget - vtime, &iocg->done_vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1246) 	vtime = vtarget;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1248) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1249) 	 * Activate, propagate weight and start period timer if not
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1250) 	 * running.  Reset hweight_gen to avoid accidental match from
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1251) 	 * wrapping.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1252) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1253) 	iocg->hweight_gen = atomic_read(&ioc->hweight_gen) - 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1254) 	list_add(&iocg->active_list, &ioc->active_iocgs);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1255) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1256) 	propagate_weights(iocg, iocg->weight,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1257) 			  iocg->last_inuse ?: iocg->weight, true, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1258) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1259) 	TRACE_IOCG_PATH(iocg_activate, iocg, now,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1260) 			last_period, cur_period, vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1261) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1262) 	iocg->activated_at = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1263) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1264) 	if (ioc->running == IOC_IDLE) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1265) 		ioc->running = IOC_RUNNING;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1266) 		ioc->dfgv_period_at = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1267) 		ioc->dfgv_period_rem = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1268) 		ioc_start_period(ioc, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1269) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1270) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1271) succeed_unlock:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1272) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1273) 	return true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1274) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1275) fail_unlock:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1276) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1277) 	return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1278) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1279) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1280) static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1281) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1282) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1283) 	struct blkcg_gq *blkg = iocg_to_blkg(iocg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1284) 	u64 tdelta, delay, new_delay;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1285) 	s64 vover, vover_pct;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1286) 	u32 hwa;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1287) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1288) 	lockdep_assert_held(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1289) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1290) 	/* calculate the current delay in effect - 1/2 every second */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1291) 	tdelta = now->now - iocg->delay_at;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1292) 	if (iocg->delay)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1293) 		delay = iocg->delay >> div64_u64(tdelta, USEC_PER_SEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1294) 	else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1295) 		delay = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1296) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1297) 	/* calculate the new delay from the debt amount */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1298) 	current_hweight(iocg, &hwa, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1299) 	vover = atomic64_read(&iocg->vtime) +
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1300) 		abs_cost_to_cost(iocg->abs_vdebt, hwa) - now->vnow;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1301) 	vover_pct = div64_s64(100 * vover,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1302) 			      ioc->period_us * ioc->vtime_base_rate);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1303) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1304) 	if (vover_pct <= MIN_DELAY_THR_PCT)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1305) 		new_delay = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1306) 	else if (vover_pct >= MAX_DELAY_THR_PCT)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1307) 		new_delay = MAX_DELAY;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1308) 	else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1309) 		new_delay = MIN_DELAY +
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1310) 			div_u64((MAX_DELAY - MIN_DELAY) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1311) 				(vover_pct - MIN_DELAY_THR_PCT),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1312) 				MAX_DELAY_THR_PCT - MIN_DELAY_THR_PCT);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1313) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1314) 	/* pick the higher one and apply */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1315) 	if (new_delay > delay) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1316) 		iocg->delay = new_delay;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1317) 		iocg->delay_at = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1318) 		delay = new_delay;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1319) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1320) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1321) 	if (delay >= MIN_DELAY) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1322) 		if (!iocg->indelay_since)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1323) 			iocg->indelay_since = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1324) 		blkcg_set_delay(blkg, delay * NSEC_PER_USEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1325) 		return true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1326) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1327) 		if (iocg->indelay_since) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1328) 			iocg->local_stat.indelay_us += now->now - iocg->indelay_since;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1329) 			iocg->indelay_since = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1330) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1331) 		iocg->delay = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1332) 		blkcg_clear_delay(blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1333) 		return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1334) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1335) }
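
/*
 * Two worked numbers for the above, assuming MIN_DELAY == 250,
 * MAX_DELAY == 250 * USEC_PER_MSEC, MIN_DELAY_THR_PCT == 500 and
 * MAX_DELAY_THR_PCT == 25000 as defined earlier in this file.  Decay:
 * the shift by tdelta / USEC_PER_SEC halves the in-effect delay for
 * each full second since delay_at.  Scaling: a debt overage of 12750%
 * of a period sits halfway between the thresholds, so new_delay ==
 * 250 + (250000 - 250) * 12250 / 24500 == 125125us, i.e. ~125ms of
 * induced delay.
 */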
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1336) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1337) static void iocg_incur_debt(struct ioc_gq *iocg, u64 abs_cost,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1338) 			    struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1339) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1340) 	struct iocg_pcpu_stat *gcs;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1341) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1342) 	lockdep_assert_held(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1343) 	lockdep_assert_held(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1344) 	WARN_ON_ONCE(list_empty(&iocg->active_list));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1345) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1346) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1347) 	 * Once in debt, debt handling owns inuse. @iocg stays at the minimum
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1348) 	 * inuse, donating all of its share to others until its debt is paid off.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1349) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1350) 	if (!iocg->abs_vdebt && abs_cost) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1351) 		iocg->indebt_since = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1352) 		propagate_weights(iocg, iocg->active, 0, false, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1353) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1354) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1355) 	iocg->abs_vdebt += abs_cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1356) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1357) 	gcs = get_cpu_ptr(iocg->pcpu_stat);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1358) 	local64_add(abs_cost, &gcs->abs_vusage);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1359) 	put_cpu_ptr(gcs);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1360) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1361) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1362) static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1363) 			  struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1364) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1365) 	lockdep_assert_held(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1366) 	lockdep_assert_held(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1367) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1368) 	/* make sure that nobody messed with @iocg */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1369) 	WARN_ON_ONCE(list_empty(&iocg->active_list));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1370) 	WARN_ON_ONCE(iocg->inuse > 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1371) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1372) 	iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1373) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1374) 	/* if debt is paid in full, restore inuse */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1375) 	if (!iocg->abs_vdebt) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1376) 		iocg->local_stat.indebt_us += now->now - iocg->indebt_since;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1377) 		iocg->indebt_since = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1378) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1379) 		propagate_weights(iocg, iocg->active, iocg->last_inuse,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1380) 				  false, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1381) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1382) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1383) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1384) static int iocg_wake_fn(struct wait_queue_entry *wq_entry, unsigned mode,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1385) 			int flags, void *key)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1386) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1387) 	struct iocg_wait *wait = container_of(wq_entry, struct iocg_wait, wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1388) 	struct iocg_wake_ctx *ctx = (struct iocg_wake_ctx *)key;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1389) 	u64 cost = abs_cost_to_cost(wait->abs_cost, ctx->hw_inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1390) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1391) 	ctx->vbudget -= cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1392) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1393) 	if (ctx->vbudget < 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1394) 		return -1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1395) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1396) 	iocg_commit_bio(ctx->iocg, wait->bio, wait->abs_cost, cost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1397) 	wait->committed = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1398) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1399) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1400) 	 * autoremove_wake_function() removes the wait entry only when it
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1401) 	 * actually changed the task state. We want the wait always removed.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1402) 	 * Remove explicitly and use default_wake_function(). Note that the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1403) 	 * order of operations is important as finish_wait() tests whether
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1404) 	 * @wq_entry is removed without grabbing the lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1405) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1406) 	default_wake_function(wq_entry, mode, flags, key);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1407) 	list_del_init_careful(&wq_entry->entry);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1408) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1409) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1410) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1411) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1412)  * Calculate the accumulated budget, pay debt if @pay_debt and wake up waiters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1413)  * accordingly. When @pay_debt is %true, the caller must be holding ioc->lock in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1414)  * addition to iocg->waitq.lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1415)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1416) static void iocg_kick_waitq(struct ioc_gq *iocg, bool pay_debt,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1417) 			    struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1418) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1419) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1420) 	struct iocg_wake_ctx ctx = { .iocg = iocg };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1421) 	u64 vshortage, expires, oexpires;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1422) 	s64 vbudget;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1423) 	u32 hwa;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1424) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1425) 	lockdep_assert_held(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1426) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1427) 	current_hweight(iocg, &hwa, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1428) 	vbudget = now->vnow - atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1430) 	/* pay off debt */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1431) 	if (pay_debt && iocg->abs_vdebt && vbudget > 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1432) 		u64 abs_vbudget = cost_to_abs_cost(vbudget, hwa);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1433) 		u64 abs_vpay = min_t(u64, abs_vbudget, iocg->abs_vdebt);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1434) 		u64 vpay = abs_cost_to_cost(abs_vpay, hwa);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1435) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1436) 		lockdep_assert_held(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1437) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1438) 		atomic64_add(vpay, &iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1439) 		atomic64_add(vpay, &iocg->done_vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1440) 		iocg_pay_debt(iocg, abs_vpay, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1441) 		vbudget -= vpay;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1442) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1443) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1444) 	if (iocg->abs_vdebt || iocg->delay)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1445) 		iocg_kick_delay(iocg, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1446) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1447) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1448) 	 * Debt can still be outstanding if we haven't paid all yet or the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1449) 	 * caller raced and called without @pay_debt. Shouldn't wake up waiters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1450) 	 * under debt. Make sure @vbudget reflects the outstanding amount and is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1451) 	 * not positive.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1452) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1453) 	if (iocg->abs_vdebt) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1454) 		s64 vdebt = abs_cost_to_cost(iocg->abs_vdebt, hwa);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1455) 		vbudget = min_t(s64, 0, vbudget - vdebt);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1456) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1457) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1458) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1459) 	 * Wake up the ones which are due and see how much vtime we'll need for
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1460) 	 * the next one. As paying off debt restores hw_inuse, it must be read
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1461) 	 * after the above debt payment.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1462) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1463) 	ctx.vbudget = vbudget;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1464) 	current_hweight(iocg, NULL, &ctx.hw_inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1465) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1466) 	__wake_up_locked_key(&iocg->waitq, TASK_NORMAL, &ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1467) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1468) 	if (!waitqueue_active(&iocg->waitq)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1469) 		if (iocg->wait_since) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1470) 			iocg->local_stat.wait_us += now->now - iocg->wait_since;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1471) 			iocg->wait_since = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1472) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1473) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1474) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1475) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1476) 	if (!iocg->wait_since)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1477) 		iocg->wait_since = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1478) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1479) 	if (WARN_ON_ONCE(ctx.vbudget >= 0))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1480) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1481) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1482) 	/* determine next wakeup, add a timer margin to guarantee chunking */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1483) 	vshortage = -ctx.vbudget;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1484) 	expires = now->now_ns +
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1485) 		DIV64_U64_ROUND_UP(vshortage, ioc->vtime_base_rate) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1486) 		NSEC_PER_USEC;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1487) 	expires += ioc->timer_slack_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1488) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1489) 	/* if already active and close enough, don't bother */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1490) 	oexpires = ktime_to_ns(hrtimer_get_softexpires(&iocg->waitq_timer));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1491) 	if (hrtimer_is_queued(&iocg->waitq_timer) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1492) 	    abs(oexpires - expires) <= ioc->timer_slack_ns)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1493) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1494) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1495) 	hrtimer_start_range_ns(&iocg->waitq_timer, ns_to_ktime(expires),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1496) 			       ioc->timer_slack_ns, HRTIMER_MODE_ABS);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1497) }
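
/*
 * Illustrative sketch, not part of the kernel source: the wakeup-time
 * arithmetic used by iocg_kick_waitq() above, with made-up round numbers.
 * If the first still-blocked waiter is short vshortage = 3,000,000 vtime
 * units and the device earns 1,000 vtime units per microsecond, the timer
 * is armed ~3ms out, padded by timer_slack_ns so nearby wakeups can be
 * batched.
 */
static inline u64 example_waitq_expires(u64 now_ns, u64 vshortage,
					u64 vtime_base_rate, u64 slack_ns)
{
	/* usecs until the device vtime catches up with the shortage */
	u64 wait_us = DIV64_U64_ROUND_UP(vshortage, vtime_base_rate);

	/* absolute expiry with the slack margin added on top */
	return now_ns + wait_us * NSEC_PER_USEC + slack_ns;
}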
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1498) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1499) static enum hrtimer_restart iocg_waitq_timer_fn(struct hrtimer *timer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1500) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1501) 	struct ioc_gq *iocg = container_of(timer, struct ioc_gq, waitq_timer);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1502) 	bool pay_debt = READ_ONCE(iocg->abs_vdebt);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1503) 	struct ioc_now now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1504) 	unsigned long flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1505) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1506) 	ioc_now(iocg->ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1507) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1508) 	iocg_lock(iocg, pay_debt, &flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1509) 	iocg_kick_waitq(iocg, pay_debt, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1510) 	iocg_unlock(iocg, pay_debt, &flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1511) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1512) 	return HRTIMER_NORESTART;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1513) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1514) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1515) static void ioc_lat_stat(struct ioc *ioc, u32 *missed_ppm_ar, u32 *rq_wait_pct_p)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1516) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1517) 	u32 nr_met[2] = { };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1518) 	u32 nr_missed[2] = { };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1519) 	u64 rq_wait_ns = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1520) 	int cpu, rw;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1521) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1522) 	for_each_online_cpu(cpu) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1523) 		struct ioc_pcpu_stat *stat = per_cpu_ptr(ioc->pcpu_stat, cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1524) 		u64 this_rq_wait_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1525) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1526) 		for (rw = READ; rw <= WRITE; rw++) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1527) 			u32 this_met = local_read(&stat->missed[rw].nr_met);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1528) 			u32 this_missed = local_read(&stat->missed[rw].nr_missed);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1529) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1530) 			nr_met[rw] += this_met - stat->missed[rw].last_met;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1531) 			nr_missed[rw] += this_missed - stat->missed[rw].last_missed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1532) 			stat->missed[rw].last_met = this_met;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1533) 			stat->missed[rw].last_missed = this_missed;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1534) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1535) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1536) 		this_rq_wait_ns = local64_read(&stat->rq_wait_ns);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1537) 		rq_wait_ns += this_rq_wait_ns - stat->last_rq_wait_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1538) 		stat->last_rq_wait_ns = this_rq_wait_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1539) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1540) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1541) 	for (rw = READ; rw <= WRITE; rw++) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1542) 		if (nr_met[rw] + nr_missed[rw])
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1543) 			missed_ppm_ar[rw] =
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1544) 				DIV64_U64_ROUND_UP((u64)nr_missed[rw] * MILLION,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1545) 						   nr_met[rw] + nr_missed[rw]);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1546) 		else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1547) 			missed_ppm_ar[rw] = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1548) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1549) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1550) 	*rq_wait_pct_p = div64_u64(rq_wait_ns * 100,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1551) 				   ioc->period_us * NSEC_PER_USEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1552) }
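
/*
 * Sketch of the ratios computed above with illustrative numbers (not part
 * of the kernel source): 980 met and 20 missed read completions in a
 * period give missed_ppm[READ] = 20 * MILLION / 1000 = 20,000 ppm, i.e.
 * 2% missed; 15ms of total rq wait over a 50ms period reports
 * *rq_wait_pct_p = 30.
 */
static inline u32 example_missed_ppm(u32 nr_met, u32 nr_missed)
{
	if (!(nr_met + nr_missed))
		return 0;
	/* parts-per-million of requests that missed the latency target */
	return DIV64_U64_ROUND_UP((u64)nr_missed * MILLION,
				  nr_met + nr_missed);
}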
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1553) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1554) /* was iocg idle this period? */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1555) static bool iocg_is_idle(struct ioc_gq *iocg)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1556) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1557) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1558) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1559) 	/* did something get issued this period? */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1560) 	if (atomic64_read(&iocg->active_period) ==
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1561) 	    atomic64_read(&ioc->cur_period))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1562) 		return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1563) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1564) 	/* is something in flight? */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1565) 	if (atomic64_read(&iocg->done_vtime) != atomic64_read(&iocg->vtime))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1566) 		return false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1567) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1568) 	return true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1569) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1570) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1571) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1572)  * Call this function on the target leaf @iocgs to build a pre-order traversal
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1573)  * list of all the ancestors in @inner_walk. The inner nodes are linked through
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1574)  * ->walk_list and the caller is responsible for dissolving the list after use.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1575)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1576) static void iocg_build_inner_walk(struct ioc_gq *iocg,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1577) 				  struct list_head *inner_walk)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1578) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1579) 	int lvl;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1580) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1581) 	WARN_ON_ONCE(!list_empty(&iocg->walk_list));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1582) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1583) 	/* find the first ancestor which hasn't been visited yet */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1584) 	for (lvl = iocg->level - 1; lvl >= 0; lvl--) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1585) 		if (!list_empty(&iocg->ancestors[lvl]->walk_list))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1586) 			break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1587) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1588) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1589) 	/* walk down and visit the inner nodes to get pre-order traversal */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1590) 	while (++lvl <= iocg->level - 1) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1591) 		struct ioc_gq *inner = iocg->ancestors[lvl];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1592) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1593) 		/* record traversal order */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1594) 		list_add_tail(&inner->walk_list, inner_walk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1595) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1596) }
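
/*
 * Example (illustrative hierarchy, not kernel source): with leaves
 * L1 = root/A/B/L1 and L2 = root/A/C/L2, calling this on L1 and then L2
 * links the inner nodes in the order root, A, B, C - a pre-order walk of
 * the inner part of the tree. The second call stops its upward scan at A,
 * which is already on @inner_walk, and only appends C.
 */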
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1597) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1598) /* collect per-cpu counters and propagate the deltas to the parent */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1599) static void iocg_flush_stat_one(struct ioc_gq *iocg, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1600) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1601) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1602) 	struct iocg_stat new_stat;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1603) 	u64 abs_vusage = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1604) 	u64 vusage_delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1605) 	int cpu;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1606) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1607) 	lockdep_assert_held(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1608) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1609) 	/* collect per-cpu counters */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1610) 	for_each_possible_cpu(cpu) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1611) 		abs_vusage += local64_read(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1612) 				per_cpu_ptr(&iocg->pcpu_stat->abs_vusage, cpu));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1613) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1614) 	vusage_delta = abs_vusage - iocg->last_stat_abs_vusage;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1615) 	iocg->last_stat_abs_vusage = abs_vusage;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1616) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1617) 	iocg->usage_delta_us = div64_u64(vusage_delta, ioc->vtime_base_rate);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1618) 	iocg->local_stat.usage_us += iocg->usage_delta_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1619) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1620) 	/* propagate upwards */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1621) 	new_stat.usage_us =
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1622) 		iocg->local_stat.usage_us + iocg->desc_stat.usage_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1623) 	new_stat.wait_us =
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1624) 		iocg->local_stat.wait_us + iocg->desc_stat.wait_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1625) 	new_stat.indebt_us =
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1626) 		iocg->local_stat.indebt_us + iocg->desc_stat.indebt_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1627) 	new_stat.indelay_us =
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1628) 		iocg->local_stat.indelay_us + iocg->desc_stat.indelay_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1629) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1630) 	/* propagate the deltas to the parent */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1631) 	if (iocg->level > 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1632) 		struct iocg_stat *parent_stat =
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1633) 			&iocg->ancestors[iocg->level - 1]->desc_stat;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1634) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1635) 		parent_stat->usage_us +=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1636) 			new_stat.usage_us - iocg->last_stat.usage_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1637) 		parent_stat->wait_us +=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1638) 			new_stat.wait_us - iocg->last_stat.wait_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1639) 		parent_stat->indebt_us +=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1640) 			new_stat.indebt_us - iocg->last_stat.indebt_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1641) 		parent_stat->indelay_us +=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1642) 			new_stat.indelay_us - iocg->last_stat.indelay_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1643) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1644) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1645) 	iocg->last_stat = new_stat;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1646) }
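
/*
 * Minimal sketch of the delta propagation above (hypothetical names, not
 * kernel code): only the change since the previous flush is forwarded, so
 * flushing repeatedly never double-counts usage in the parent.
 */
struct example_stat {
	u64 usage_us;
};

static inline void example_propagate_usage(struct example_stat *parent_stat,
					   u64 new_usage_us,
					   u64 *last_usage_us)
{
	/* forward only the delta accumulated since the previous flush */
	parent_stat->usage_us += new_usage_us - *last_usage_us;
	*last_usage_us = new_usage_us;
}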
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1647) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1648) /* get stat counters ready for reading on all active iocgs */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1649) static void iocg_flush_stat(struct list_head *target_iocgs, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1650) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1651) 	LIST_HEAD(inner_walk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1652) 	struct ioc_gq *iocg, *tiocg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1653) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1654) 	/* flush leaves and build inner node walk list */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1655) 	list_for_each_entry(iocg, target_iocgs, active_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1656) 		iocg_flush_stat_one(iocg, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1657) 		iocg_build_inner_walk(iocg, &inner_walk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1658) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1659) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1660) 	/* keep flushing upwards by walking the inner list backwards */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1661) 	list_for_each_entry_safe_reverse(iocg, tiocg, &inner_walk, walk_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1662) 		iocg_flush_stat_one(iocg, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1663) 		list_del_init(&iocg->walk_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1664) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1665) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1666) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1667) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1668)  * Determine what @iocg's hweight_inuse should be after donating unused
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1669)  * capacity. @hwm is the upper bound and is used to signal no donation. This
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1670)  * function also throws away @iocg's excess budget.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1671)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1672) static u32 hweight_after_donation(struct ioc_gq *iocg, u32 old_hwi, u32 hwm,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1673) 				  u32 usage, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1674) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1675) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1676) 	u64 vtime = atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1677) 	s64 excess, delta, target, new_hwi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1678) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1679) 	/* debt handling owns inuse for debtors */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1680) 	if (iocg->abs_vdebt)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1681) 		return 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1682) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1683) 	/* see whether minimum margin requirement is met */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1684) 	if (waitqueue_active(&iocg->waitq) ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1685) 	    time_after64(vtime, now->vnow - ioc->margins.min))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1686) 		return hwm;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1687) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1688) 	/* throw away excess above target */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1689) 	excess = now->vnow - vtime - ioc->margins.target;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1690) 	if (excess > 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1691) 		atomic64_add(excess, &iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1692) 		atomic64_add(excess, &iocg->done_vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1693) 		vtime += excess;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1694) 		ioc->vtime_err -= div64_u64(excess * old_hwi, WEIGHT_ONE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1695) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1696) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1697) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1698) 	 * Let's say the distance between the iocg's and the device's vtimes,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1699) 	 * as a fraction of the period duration, is delta. Assuming the iocg will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1700) 	 * consume the usage determined above, we want to determine new_hwi so
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1701) 	 * that delta equals MARGIN_TARGET at the end of the next period.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1702) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1703) 	 * We need to execute usage worth of IOs while spending the sum of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1704) 	 * new budget (1 - MARGIN_TARGET) and the leftover from the last period
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1705) 	 * (delta):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1706) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1707) 	 *   usage = (1 - MARGIN_TARGET + delta) * new_hwi
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1708) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1709) 	 * Therefore, the new_hwi is:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1710) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1711) 	 *   new_hwi = usage / (1 - MARGIN_TARGET + delta)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1712) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1713) 	delta = div64_s64(WEIGHT_ONE * (now->vnow - vtime),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1714) 			  now->vnow - ioc->period_at_vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1715) 	target = WEIGHT_ONE * MARGIN_TARGET_PCT / 100;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1716) 	new_hwi = div64_s64(WEIGHT_ONE * usage, WEIGHT_ONE - target + delta);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1717) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1718) 	return clamp_t(s64, new_hwi, 1, hwm);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1719) }
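
/*
 * Worked example for the formula above, in fractions of WEIGHT_ONE with
 * illustrative values (not taken from the kernel defaults): usage = 0.25,
 * MARGIN_TARGET = 0.5 and the iocg trailing the device vtime by
 * delta = 0.75 periods give
 *
 *   new_hwi = 0.25 / (1 - 0.5 + 0.75) = 0.2
 *
 * i.e. at 20% hweight_inuse the iocg spends its 0.75 leftover plus the
 * 0.5 fresh budget to execute exactly 0.25 worth of IOs over the next
 * period, landing on the target margin.
 */
static inline u32 example_new_hwi(u32 usage, u32 target, u32 delta)
{
	/* new_hwi = usage / (1 - MARGIN_TARGET + delta), WEIGHT_ONE units */
	return div64_u64((u64)WEIGHT_ONE * usage,
			 WEIGHT_ONE - target + delta);
}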
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1720) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1721) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1722)  * For work-conservation, an iocg which isn't using all of its share should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1723)  * donate the leftover to other iocgs. There are two ways to achieve this - 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1724)  * bumping up vrate accordingly 2. lowering the donating iocg's inuse weight.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1725)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1726)  * #1 is mathematically simpler but has the drawback of requiring synchronous
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1727)  * global hweight_inuse updates when idle iocg's get activated or inuse weights
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1728)  * change due to donation snapbacks as it has the possibility of grossly
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1729)  * overshooting what's allowed by the model and vrate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1730)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1731)  * #2 is inherently safe with local operations. The donating iocg can easily
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1732)  * snap back to higher weights when needed without worrying about impacts on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1733)  * other nodes as the impacts will be inherently correct. This also makes idle
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1734)  * iocg activations safe. The only effect activations have is decreasing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1735)  * hweight_inuse of others, the right solution to which is for those iocgs to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1736)  * snap back to higher weights.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1737)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1738)  * So, we go with #2. The challenge is calculating how each donating iocg's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1739)  * inuse should be adjusted to achieve the target donation amounts. This is done
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1740)  * using Andy's method described in the following pdf.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1741)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1742)  *   https://drive.google.com/file/d/1PsJwxPFtjUnwOY1QJ5AeICCcsL7BM3bo
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1743)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1744)  * Given the weights and target after-donation hweight_inuse values, Andy's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1745)  * method determines what the proportional distribution should look like at each
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1746)  * sibling level to maintain the relative relationship between all non-donating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1747)  * pairs. To roughly summarize, it divides the tree into donating and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1748)  * non-donating parts, calculates global donation rate which is used to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1749)  * determine the target hweight_inuse for each node, and then derives per-level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1750)  * proportions.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1751)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1752)  * The following pdf shows that global distribution calculated this way can be
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1753)  * achieved by scaling inuse weights of donating leaves and propagating the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1754)  * adjustments upwards proportionally.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1755)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1756)  *   https://drive.google.com/file/d/1vONz1-fzVO7oY5DXXsLjSxEtYYQbOvsE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1757)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1758)  * Combining the above two, we can determine how each leaf iocg's inuse should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1759)  * be adjusted to achieve the target donation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1760)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1761)  *   https://drive.google.com/file/d/1WcrltBOSPN0qXVdBgnKm4mdp9FhuEFQN
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1762)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1763)  * The inline comments use symbols from the last pdf.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1764)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1765)  *   b is the sum of the absolute budgets in the subtree. 1 for the root node.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1766)  *   f is the sum of the absolute budgets of non-donating nodes in the subtree.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1767)  *   t is the sum of the absolute budgets of donating nodes in the subtree.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1768)  *   w is the weight of the node. w = w_f + w_t
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1769)  *   w_f is the non-donating portion of w. w_f = w * f / b
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1770)  *   w_t is the donating portion of w. w_t = w * t / b
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1771)  *   s is the sum of all sibling weights. s = Sum(w) for siblings
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1772)  *   s_f and s_t are the non-donating and donating portions of s.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1773)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1774)  * Subscript p denotes the parent's counterpart and ' the adjusted value - e.g.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1775)  * w_pt is the donating portion of the parent's weight and w'_pt the same value
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1776)  * after adjustments. Subscript r denotes the root node's values.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1777)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1778) static void transfer_surpluses(struct list_head *surpluses, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1779) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1780) 	LIST_HEAD(over_hwa);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1781) 	LIST_HEAD(inner_walk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1782) 	struct ioc_gq *iocg, *tiocg, *root_iocg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1783) 	u32 after_sum, over_sum, over_target, gamma;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1784) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1785) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1786) 	 * It's pretty unlikely but possible for the total sum of the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1787) 	 * hweight_after_donation values to be higher than WEIGHT_ONE, which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1788) 	 * will confuse the following calculations. If such a condition is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1789) 	 * detected, scale down everyone over its full share equally to keep
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1790) 	 * the sum below WEIGHT_ONE.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1791) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1792) 	after_sum = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1793) 	over_sum = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1794) 	list_for_each_entry(iocg, surpluses, surplus_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1795) 		u32 hwa;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1796) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1797) 		current_hweight(iocg, &hwa, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1798) 		after_sum += iocg->hweight_after_donation;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1799) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1800) 		if (iocg->hweight_after_donation > hwa) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1801) 			over_sum += iocg->hweight_after_donation;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1802) 			list_add(&iocg->walk_list, &over_hwa);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1803) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1804) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1805) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1806) 	if (after_sum >= WEIGHT_ONE) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1807) 		/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1808) 		 * The delta should be deducted from the over_sum; calculate
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1809) 		 * the target over_sum value.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1810) 		 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1811) 		u32 over_delta = after_sum - (WEIGHT_ONE - 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1812) 		WARN_ON_ONCE(over_sum <= over_delta);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1813) 		over_target = over_sum - over_delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1814) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1815) 		over_target = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1816) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1817) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1818) 	list_for_each_entry_safe(iocg, tiocg, &over_hwa, walk_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1819) 		if (over_target)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1820) 			iocg->hweight_after_donation =
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1821) 				div_u64((u64)iocg->hweight_after_donation *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1822) 					over_target, over_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1823) 		list_del_init(&iocg->walk_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1824) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1825) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1826) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1827) 	 * Build pre-order inner node walk list and prepare for donation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1828) 	 * adjustment calculations.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1829) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1830) 	list_for_each_entry(iocg, surpluses, surplus_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1831) 		iocg_build_inner_walk(iocg, &inner_walk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1832) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1833) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1834) 	root_iocg = list_first_entry(&inner_walk, struct ioc_gq, walk_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1835) 	WARN_ON_ONCE(root_iocg->level > 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1836) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1837) 	list_for_each_entry(iocg, &inner_walk, walk_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1838) 		iocg->child_adjusted_sum = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1839) 		iocg->hweight_donating = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1840) 		iocg->hweight_after_donation = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1841) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1842) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1843) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1844) 	 * Propagate the donating budget (b_t) and the after-donation budget (b'_t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1845) 	 * up the hierarchy.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1846) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1847) 	list_for_each_entry(iocg, surpluses, surplus_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1848) 		struct ioc_gq *parent = iocg->ancestors[iocg->level - 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1849) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1850) 		parent->hweight_donating += iocg->hweight_donating;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1851) 		parent->hweight_after_donation += iocg->hweight_after_donation;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1852) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1853) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1854) 	list_for_each_entry_reverse(iocg, &inner_walk, walk_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1855) 		if (iocg->level > 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1856) 			struct ioc_gq *parent = iocg->ancestors[iocg->level - 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1857) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1858) 			parent->hweight_donating += iocg->hweight_donating;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1859) 			parent->hweight_after_donation += iocg->hweight_after_donation;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1860) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1861) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1862) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1863) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1864) 	 * Calculate inner hwa's (b) and make sure the donation values are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1865) 	 * within the accepted ranges as we're doing low-resolution calculations
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1866) 	 * with round-ups.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1867) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1868) 	list_for_each_entry(iocg, &inner_walk, walk_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1869) 		if (iocg->level) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1870) 			struct ioc_gq *parent = iocg->ancestors[iocg->level - 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1871) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1872) 			iocg->hweight_active = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1873) 				(u64)parent->hweight_active * iocg->active,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1874) 				parent->child_active_sum);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1875) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1876) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1877) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1878) 		iocg->hweight_donating = min(iocg->hweight_donating,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1879) 					     iocg->hweight_active);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1880) 		iocg->hweight_after_donation = min(iocg->hweight_after_donation,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1881) 						   iocg->hweight_donating - 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1882) 		if (WARN_ON_ONCE(iocg->hweight_active <= 1 ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1883) 				 iocg->hweight_donating <= 1 ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1884) 				 iocg->hweight_after_donation == 0)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1885) 			pr_warn("iocg: invalid donation weights in ");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1886) 			pr_cont_cgroup_path(iocg_to_blkg(iocg)->blkcg->css.cgroup);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1887) 			pr_cont(": active=%u donating=%u after=%u\n",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1888) 				iocg->hweight_active, iocg->hweight_donating,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1889) 				iocg->hweight_after_donation);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1890) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1891) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1892) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1893) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1894) 	 * Calculate the global donation rate (gamma) - the rate to adjust
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1895) 	 * non-donating budgets by.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1896) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1897) 	 * No need to use 64bit multiplication here as the first operand is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1898) 	 * guaranteed to be smaller than WEIGHT_ONE (1<<16).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1899) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1900) 	 * We know that there are beneficiary nodes and the sum of the donating
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1901) 	 * hweights can't be whole; however, due to the round-ups during hweight
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1902) 	 * calculations, root_iocg->hweight_donating might still end up equal to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1903) 	 * or greater than whole. Limit the range when calculating the divider.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1904) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1905) 	 * gamma = (1 - t_r') / (1 - t_r)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1906) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1907) 	gamma = DIV_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1908) 		(WEIGHT_ONE - root_iocg->hweight_after_donation) * WEIGHT_ONE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1909) 		WEIGHT_ONE - min_t(u32, root_iocg->hweight_donating, WEIGHT_ONE - 1));
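
	/*
	 * Worked example (illustrative numbers): if donating nodes held
	 * t_r = 0.25 of the device before donation and target t_r' = 0.10
	 * after, gamma = (1 - 0.10) / (1 - 0.25) = 1.2 in WEIGHT_ONE units,
	 * i.e. every non-donating budget is scaled up by 20%. The scaled
	 * non-donating 0.75 * 1.2 = 0.9 plus the retained 0.10 adds back up
	 * to the whole device.
	 */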
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1910) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1911) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1912) 	 * Calculate adjusted hwi, child_adjusted_sum and inuse for the inner
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1913) 	 * nodes.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1914) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1915) 	list_for_each_entry(iocg, &inner_walk, walk_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1916) 		struct ioc_gq *parent;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1917) 		u32 inuse, wpt, wptp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1918) 		u64 st, sf;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1919) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1920) 		if (iocg->level == 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1921) 			/* adjusted weight sum for 1st level: s' = s * b_pf / b'_pf */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1922) 			iocg->child_adjusted_sum = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1923) 				iocg->child_active_sum * (WEIGHT_ONE - iocg->hweight_donating),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1924) 				WEIGHT_ONE - iocg->hweight_after_donation);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1925) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1926) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1927) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1928) 		parent = iocg->ancestors[iocg->level - 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1929) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1930) 		/* b' = gamma * b_f + b_t' */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1931) 		iocg->hweight_inuse = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1932) 			(u64)gamma * (iocg->hweight_active - iocg->hweight_donating),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1933) 			WEIGHT_ONE) + iocg->hweight_after_donation;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1934) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1935) 		/* w' = s' * b' / b'_p */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1936) 		inuse = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1937) 			(u64)parent->child_adjusted_sum * iocg->hweight_inuse,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1938) 			parent->hweight_inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1939) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1940) 		/* adjusted weight sum for children: s' = s_f + s_t * w'_pt / w_pt */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1941) 		st = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1942) 			iocg->child_active_sum * iocg->hweight_donating,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1943) 			iocg->hweight_active);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1944) 		sf = iocg->child_active_sum - st;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1945) 		wpt = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1946) 			(u64)iocg->active * iocg->hweight_donating,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1947) 			iocg->hweight_active);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1948) 		wptp = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1949) 			(u64)inuse * iocg->hweight_after_donation,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1950) 			iocg->hweight_inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1951) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1952) 		iocg->child_adjusted_sum = sf + DIV64_U64_ROUND_UP(st * wptp, wpt);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1953) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1954) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1955) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1956) 	 * All inner nodes now have ->hweight_inuse and ->child_adjusted_sum and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1957) 	 * we can finally determine leaf adjustments.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1958) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1959) 	list_for_each_entry(iocg, surpluses, surplus_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1960) 		struct ioc_gq *parent = iocg->ancestors[iocg->level - 1];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1961) 		u32 inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1962) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1963) 		/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1964) 		 * In-debt iocgs participated in the donation calculation with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1965) 		 * the minimum target hweight_inuse. Configuring inuse
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1966) 		 * accordingly would work fine but debt handling expects
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1967) 		 * @iocg->inuse to stay at the minimum and we don't want to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1968) 		 * interfere.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1969) 		 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1970) 		if (iocg->abs_vdebt) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1971) 			WARN_ON_ONCE(iocg->inuse > 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1972) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1973) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1974) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1975) 		/* w' = s' * b' / b'_p, note that b' == b'_t for donating leaves */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1976) 		inuse = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1977) 			parent->child_adjusted_sum * iocg->hweight_after_donation,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1978) 			parent->hweight_inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1979) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1980) 		TRACE_IOCG_PATH(inuse_transfer, iocg, now,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1981) 				iocg->inuse, inuse,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1982) 				iocg->hweight_inuse,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1983) 				iocg->hweight_after_donation);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1984) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1985) 		__propagate_weights(iocg, iocg->active, inuse, true, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1986) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1987) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1988) 	/* walk list should be dissolved after use */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1989) 	list_for_each_entry_safe(iocg, tiocg, &inner_walk, walk_list)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1990) 		list_del_init(&iocg->walk_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1991) }
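
/*
 * Illustrative sketch of the leaf adjustment above (hypothetical numbers,
 * not kernel code): a donating leaf whose parent ends up with
 * child_adjusted_sum s' = 300 and hweight_inuse b'_p = 0.6 of WEIGHT_ONE,
 * and which targets hweight_after_donation b' = 0.2, is assigned
 * inuse w' = 300 * 0.2 / 0.6 = 100.
 */
static inline u32 example_leaf_inuse(u32 child_adjusted_sum,
				     u32 hw_after_donation, u32 parent_hwi)
{
	/* w' = s' * b' / b'_p, rounded up like the code above */
	return DIV64_U64_ROUND_UP((u64)child_adjusted_sum * hw_after_donation,
				  parent_hwi);
}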
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1992) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1993) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1994)  * A low weight iocg can amass a large amount of debt, for example, when
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1995)  * anonymous memory gets reclaimed aggressively. If the system has a lot of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1996)  * memory paired with a slow IO device, the debt can span multiple seconds or
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1997)  * more. If there are no other subsequent IO issuers, the in-debt iocg may end
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1998)  * up blocked paying its debt while the IO device is idle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1999)  *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2000)  * The following protects against such cases. If the device has been
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2001)  * sufficiently idle for a while, the debts are halved and delays are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2002)  * recalculated.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2003)  */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2004) static void ioc_forgive_debts(struct ioc *ioc, u64 usage_us_sum, int nr_debtors,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2005) 			      struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2006) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2007) 	struct ioc_gq *iocg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2008) 	u64 dur, usage_pct, nr_cycles;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2009) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2010) 	/* if no debtor, reset the cycle */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2011) 	if (!nr_debtors) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2012) 		ioc->dfgv_period_at = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2013) 		ioc->dfgv_period_rem = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2014) 		ioc->dfgv_usage_us_sum = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2015) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2016) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2017) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2018) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2019) 	 * Debtors can pass through a lot of writes, choking the device, and we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2020) 	 * don't want to be forgiving debts while the device is struggling with
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2021) 	 * write bursts. If we're missing latency targets, consider the device
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2022) 	 * fully utilized.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2023) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2024) 	if (ioc->busy_level > 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2025) 		usage_us_sum = max_t(u64, usage_us_sum, ioc->period_us);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2026) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2027) 	ioc->dfgv_usage_us_sum += usage_us_sum;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2028) 	if (time_before64(now->now, ioc->dfgv_period_at + DFGV_PERIOD))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2029) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2030) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2031) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2032) 	 * At least DFGV_PERIOD has passed since the last period. Calculate the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2033) 	 * average usage and reset the period counters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2034) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2035) 	dur = now->now - ioc->dfgv_period_at;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2036) 	usage_pct = div64_u64(100 * ioc->dfgv_usage_us_sum, dur);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2037) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2038) 	ioc->dfgv_period_at = now->now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2039) 	ioc->dfgv_usage_us_sum = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2040) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2041) 	/* if the device was too busy, reset everything */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2042) 	if (usage_pct > DFGV_USAGE_PCT) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2043) 		ioc->dfgv_period_rem = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2044) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2045) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2046) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2047) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2048) 	 * Usage is lower than threshold. Let's forgive some debts. Debt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2049) 	 * forgiveness runs off the usual ioc timer but its period usually
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2050) 	 * doesn't match ioc's. Compensate the difference by performing the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2051) 	 * reduction as many times as would fit in the duration since the last
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2052) 	 * run and carrying over the left-over duration in @ioc->dfgv_period_rem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2053) 	 * - if ioc period is 75% of DFGV_PERIOD, one out of three consecutive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2054) 	 * reductions is doubled.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2055) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2056) 	nr_cycles = dur + ioc->dfgv_period_rem;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2057) 	ioc->dfgv_period_rem = do_div(nr_cycles, DFGV_PERIOD);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2058) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2059) 	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2060) 		u64 __maybe_unused old_debt, __maybe_unused old_delay;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2061) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2062) 		if (!iocg->abs_vdebt && !iocg->delay)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2063) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2064) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2065) 		spin_lock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2066) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2067) 		old_debt = iocg->abs_vdebt;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2068) 		old_delay = iocg->delay;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2069) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2070) 		if (iocg->abs_vdebt)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2071) 			iocg->abs_vdebt = iocg->abs_vdebt >> nr_cycles ?: 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2072) 		if (iocg->delay)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2073) 			iocg->delay = iocg->delay >> nr_cycles ?: 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2074) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2075) 		iocg_kick_waitq(iocg, true, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2076) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2077) 		TRACE_IOCG_PATH(iocg_forgive_debt, iocg, now, usage_pct,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2078) 				old_debt, iocg->abs_vdebt,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2079) 				old_delay, iocg->delay);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2080) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2081) 		spin_unlock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2082) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2083) }
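
/*
 * Illustrative sketch of the catch-up halving in ioc_forgive_debts() with
 * made-up values (@period stands in for DFGV_PERIOD): 250ms since the last
 * run plus a carried-over 30ms at a 100ms forgiveness period yields
 * nr_cycles = 2 with 80ms carried forward, so a debt of 8000 drops to
 * 8000 >> 2 = 2000. The "?: 1" floor keeps a small debt at 1 rather than
 * shifting it away outright.
 */
static inline u64 example_forgive_debt(u64 dur, u64 *period_rem, u32 period,
				       u64 abs_vdebt)
{
	u64 nr_cycles = dur + *period_rem;

	*period_rem = do_div(nr_cycles, period);

	/* halve once per elapsed forgiveness cycle, never down to zero */
	return abs_vdebt >> nr_cycles ?: 1;
}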
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2084) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2085) static void ioc_timer_fn(struct timer_list *timer)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2086) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2087) 	struct ioc *ioc = container_of(timer, struct ioc, timer);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2088) 	struct ioc_gq *iocg, *tiocg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2089) 	struct ioc_now now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2090) 	LIST_HEAD(surpluses);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2091) 	int nr_debtors = 0, nr_shortages = 0, nr_lagging = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2092) 	u64 usage_us_sum = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2093) 	u32 ppm_rthr = MILLION - ioc->params.qos[QOS_RPPM];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2094) 	u32 ppm_wthr = MILLION - ioc->params.qos[QOS_WPPM];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2095) 	u32 missed_ppm[2], rq_wait_pct;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2096) 	u64 period_vtime;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2097) 	int prev_busy_level;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2098) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2099) 	/* how were the latencies during the period? */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2100) 	ioc_lat_stat(ioc, missed_ppm, &rq_wait_pct);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2101) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2102) 	/* take care of active iocgs */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2103) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2104) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2105) 	ioc_now(ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2106) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2107) 	period_vtime = now.vnow - ioc->period_at_vtime;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2108) 	if (WARN_ON_ONCE(!period_vtime)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2109) 		spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2110) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2111) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2113) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2114) 	 * Waiters determine the sleep durations based on the vrate they
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2115) 	 * saw at the time of sleep.  If vrate has increased, some waiters
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2116) 	 * could be sleeping for too long.  Wake up tardy waiters which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2117) 	 * should have woken up in the last period and expire idle iocgs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2118) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2119) 	list_for_each_entry_safe(iocg, tiocg, &ioc->active_iocgs, active_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2120) 		if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2121) 		    !iocg->delay && !iocg_is_idle(iocg))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2122) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2123) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2124) 		spin_lock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2125) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2126) 		/* flush wait and indebt stat deltas */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2127) 		if (iocg->wait_since) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2128) 			iocg->local_stat.wait_us += now.now - iocg->wait_since;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2129) 			iocg->wait_since = now.now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2130) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2131) 		if (iocg->indebt_since) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2132) 			iocg->local_stat.indebt_us +=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2133) 				now.now - iocg->indebt_since;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2134) 			iocg->indebt_since = now.now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2135) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2136) 		if (iocg->indelay_since) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2137) 			iocg->local_stat.indelay_us +=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2138) 				now.now - iocg->indelay_since;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2139) 			iocg->indelay_since = now.now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2140) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2141) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2142) 		if (waitqueue_active(&iocg->waitq) || iocg->abs_vdebt ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2143) 		    iocg->delay) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2144) 			/* might be oversleeping due to vtime / hweight changes; kick */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2145) 			iocg_kick_waitq(iocg, true, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2146) 			if (iocg->abs_vdebt || iocg->delay)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2147) 				nr_debtors++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2148) 		} else if (iocg_is_idle(iocg)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2149) 			/* no waiter and idle, deactivate */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2150) 			u64 vtime = atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2151) 			s64 excess;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2152) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2153) 			/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2154) 			 * @iocg has been inactive for a full duration and will
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2155) 			 * have a high budget. Account anything above target as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2156) 			 * error and throw it away. On reactivation, it'll start
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2157) 			 * with the target budget.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2158) 			 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2159) 			excess = now.vnow - vtime - ioc->margins.target;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2160) 			if (excess > 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2161) 				u32 old_hwi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2162) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2163) 				current_hweight(iocg, NULL, &old_hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2164) 				ioc->vtime_err -= div64_u64(excess * old_hwi,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2165) 							    WEIGHT_ONE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2166) 			}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2167) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2168) 			__propagate_weights(iocg, 0, 0, false, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2169) 			list_del_init(&iocg->active_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2170) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2172) 		spin_unlock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2173) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2174) 	commit_weights(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2175) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2176) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2177) 	 * Wait and indebt stats are flushed above and the donation calculation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2178) 	 * below needs updated usage stats. Let's bring the stats up-to-date.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2179) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2180) 	iocg_flush_stat(&ioc->active_iocgs, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2181) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2182) 	/* calc usage and see whether some weights need to be moved around */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2183) 	list_for_each_entry(iocg, &ioc->active_iocgs, active_list) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2184) 		u64 vdone, vtime, usage_us, usage_dur;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2185) 		u32 usage, hw_active, hw_inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2186) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2187) 		/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2188) 		 * Collect unused budget and wind vtime closer to vnow to prevent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2189) 		 * iocgs from accumulating a large amount of budget.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2190) 		 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2191) 		vdone = atomic64_read(&iocg->done_vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2192) 		vtime = atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2193) 		current_hweight(iocg, &hw_active, &hw_inuse);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2194) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2195) 		/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2196) 		 * Latency QoS detection doesn't account for IOs which are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2197) 		 * in-flight for longer than a period.  Detect them by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2198) 		 * comparing vdone against period start.  If lagging behind
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2199) 		 * IOs from past periods, don't increase vrate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2200) 		 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2201) 		if ((ppm_rthr != MILLION || ppm_wthr != MILLION) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2202) 		    !atomic_read(&iocg_to_blkg(iocg)->use_delay) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2203) 		    time_after64(vtime, vdone) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2204) 		    time_after64(vtime, now.vnow -
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2205) 				 MAX_LAGGING_PERIODS * period_vtime) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2206) 		    time_before64(vdone, now.vnow - period_vtime))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2207) 			nr_lagging++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2208) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2209) 		/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2210) 		 * Determine absolute usage factoring in in-flight IOs to avoid
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2211) 		 * high-latency completions appearing as idle.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2212) 		 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2213) 		usage_us = iocg->usage_delta_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2214) 		usage_us_sum += usage_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2215) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2216) 		if (vdone != vtime) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2217) 			u64 inflight_us = DIV64_U64_ROUND_UP(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2218) 				cost_to_abs_cost(vtime - vdone, hw_inuse),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2219) 				ioc->vtime_base_rate);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2220) 			usage_us = max(usage_us, inflight_us);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2221) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2222) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2223) 		/* convert to hweight based usage ratio */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2224) 		if (time_after64(iocg->activated_at, ioc->period_at))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2225) 			usage_dur = max_t(u64, now.now - iocg->activated_at, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2226) 		else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2227) 			usage_dur = max_t(u64, now.now - ioc->period_at, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2228) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2229) 		usage = clamp_t(u32,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2230) 				DIV64_U64_ROUND_UP(usage_us * WEIGHT_ONE,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2231) 						   usage_dur),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2232) 				1, WEIGHT_ONE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2234) 		/* see whether there's surplus vtime */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2235) 		WARN_ON_ONCE(!list_empty(&iocg->surplus_list));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2236) 		if (hw_inuse < hw_active ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2237) 		    (!waitqueue_active(&iocg->waitq) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2238) 		     time_before64(vtime, now.vnow - ioc->margins.low))) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2239) 			u32 hwa, old_hwi, hwm, new_hwi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2240) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2241) 			/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2242) 			 * Already donating or accumulated enough to start.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2243) 			 * Determine the donation amount.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2244) 			 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2245) 			current_hweight(iocg, &hwa, &old_hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2246) 			hwm = current_hweight_max(iocg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2247) 			new_hwi = hweight_after_donation(iocg, old_hwi, hwm,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2248) 							 usage, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2249) 			/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2250) 			 * Donation calculation assumes hweight_after_donation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2251) 			 * to be positive, a condition that a donor w/ hwa < 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2252) 			 * can't meet. Don't bother with donation if hwa is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2253) 			 * below 2. It won't make a meaningful difference
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2254) 			 * anyway.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2255) 			 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2256) 			if (new_hwi < hwm && hwa >= 2) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2257) 				iocg->hweight_donating = hwa;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2258) 				iocg->hweight_after_donation = new_hwi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2259) 				list_add(&iocg->surplus_list, &surpluses);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2260) 			} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2261) 				TRACE_IOCG_PATH(inuse_shortage, iocg, &now,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2262) 						iocg->inuse, iocg->active,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2263) 						iocg->hweight_inuse, new_hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2264) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2265) 				__propagate_weights(iocg, iocg->active,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2266) 						    iocg->active, true, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2267) 				nr_shortages++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2268) 			}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2269) 		} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2270) 			/* genuinely short on vtime */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2271) 			nr_shortages++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2272) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2273) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2274) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2275) 	if (!list_empty(&surpluses) && nr_shortages)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2276) 		transfer_surpluses(&surpluses, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2277) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2278) 	commit_weights(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2279) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2280) 	/* surplus list should be dissolved after use */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2281) 	list_for_each_entry_safe(iocg, tiocg, &surpluses, surplus_list)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2282) 		list_del_init(&iocg->surplus_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2283) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2284) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2285) 	 * If q is getting clogged or we're missing too much, we're issuing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2286) 	 * too much IO and should lower vtime rate.  If we're not missing
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2287) 	 * and experiencing shortages but not surpluses, we're too stingy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2288) 	 * and should increase vtime rate.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2289) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2290) 	prev_busy_level = ioc->busy_level;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2291) 	if (rq_wait_pct > RQ_WAIT_BUSY_PCT ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2292) 	    missed_ppm[READ] > ppm_rthr ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2293) 	    missed_ppm[WRITE] > ppm_wthr) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2294) 		/* clearly missing QoS targets, slow down vrate */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2295) 		ioc->busy_level = max(ioc->busy_level, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2296) 		ioc->busy_level++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2297) 	} else if (rq_wait_pct <= RQ_WAIT_BUSY_PCT * UNBUSY_THR_PCT / 100 &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2298) 		   missed_ppm[READ] <= ppm_rthr * UNBUSY_THR_PCT / 100 &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2299) 		   missed_ppm[WRITE] <= ppm_wthr * UNBUSY_THR_PCT / 100) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2300) 		/* QoS targets are being met with >25% margin */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2301) 		if (nr_shortages) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2302) 			/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2303) 			 * We're throttling while the device has spare
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2304) 			 * capacity.  If vrate was being slowed down, stop.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2305) 			 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2306) 			ioc->busy_level = min(ioc->busy_level, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2307) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2308) 			/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2309) 			 * If there are IOs spanning multiple periods, wait
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2310) 			 * them out before pushing the device harder.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2311) 			 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2312) 			if (!nr_lagging)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2313) 				ioc->busy_level--;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2314) 		} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2315) 			/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2316) 			 * Nobody is being throttled and the users aren't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2317) 			 * issuing enough IOs to saturate the device.  We
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2318) 			 * simply don't know how close the device is to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2319) 			 * saturation.  Coast.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2320) 			 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2321) 			ioc->busy_level = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2322) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2323) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2324) 		/* inside the hysteresis margin, we're good */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2325) 		ioc->busy_level = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2326) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2327) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2328) 	ioc->busy_level = clamp(ioc->busy_level, -1000, 1000);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2329) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2330) 	if (ioc->busy_level > 0 || (ioc->busy_level < 0 && !nr_lagging)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2331) 		u64 vrate = ioc->vtime_base_rate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2332) 		u64 vrate_min = ioc->vrate_min, vrate_max = ioc->vrate_max;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2333) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2334) 		/* rq_wait signal is always reliable, ignore user vrate_min */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2335) 		if (rq_wait_pct > RQ_WAIT_BUSY_PCT)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2336) 			vrate_min = VRATE_MIN;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2337) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2338) 		/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2339) 		 * If vrate is out of bounds, apply clamp gradually as the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2340) 		 * bounds can change abruptly.  Otherwise, apply busy_level
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2341) 		 * based adjustment.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2342) 		 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2343) 		if (vrate < vrate_min) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2344) 			vrate = div64_u64(vrate * (100 + VRATE_CLAMP_ADJ_PCT),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2345) 					  100);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2346) 			vrate = min(vrate, vrate_min);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2347) 		} else if (vrate > vrate_max) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2348) 			vrate = div64_u64(vrate * (100 - VRATE_CLAMP_ADJ_PCT),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2349) 					  100);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2350) 			vrate = max(vrate, vrate_max);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2351) 		} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2352) 			int idx = min_t(int, abs(ioc->busy_level),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2353) 					ARRAY_SIZE(vrate_adj_pct) - 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2354) 			u32 adj_pct = vrate_adj_pct[idx];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2355) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2356) 			if (ioc->busy_level > 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2357) 				adj_pct = 100 - adj_pct;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2358) 			else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2359) 				adj_pct = 100 + adj_pct;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2360) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2361) 			vrate = clamp(DIV64_U64_ROUND_UP(vrate * adj_pct, 100),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2362) 				      vrate_min, vrate_max);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2363) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2364) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2365) 		trace_iocost_ioc_vrate_adj(ioc, vrate, missed_ppm, rq_wait_pct,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2366) 					   nr_lagging, nr_shortages);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2367) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2368) 		ioc->vtime_base_rate = vrate;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2369) 		ioc_refresh_margins(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2370) 	} else if (ioc->busy_level != prev_busy_level || nr_lagging) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2371) 		trace_iocost_ioc_vrate_adj(ioc, atomic64_read(&ioc->vtime_rate),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2372) 					   missed_ppm, rq_wait_pct, nr_lagging,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2373) 					   nr_shortages);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2374) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2375) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2376) 	ioc_refresh_params(ioc, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2377) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2378) 	ioc_forgive_debts(ioc, usage_us_sum, nr_debtors, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2379) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2380) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2381) 	 * This period is done.  Move onto the next one.  If nothing's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2382) 	 * going on with the device, stop the timer.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2383) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2384) 	atomic64_inc(&ioc->cur_period);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2385) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2386) 	if (ioc->running != IOC_STOP) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2387) 		if (!list_empty(&ioc->active_iocgs)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2388) 			ioc_start_period(ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2389) 		} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2390) 			ioc->busy_level = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2391) 			ioc->vtime_err = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2392) 			ioc->running = IOC_IDLE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2393) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2394) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2395) 		ioc_refresh_vrate(ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2396) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2397) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2398) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2399) }
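
The clamp_t() conversion in the loop above turns microseconds of consumed device time over the activity window into a WEIGHT_ONE-based fixed-point ratio. A minimal userspace sketch of that arithmetic, with a round-up division standing in for DIV64_U64_ROUND_UP() and WEIGHT_ONE assumed to be 1 << 16 here:

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE (1u << 16)	/* fixed-point 1.0, as in this file */

/* round-up division, standing in for DIV64_U64_ROUND_UP() */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
	return (a + b - 1) / b;
}

/* device time consumed (us) over the window (us) -> fixed-point ratio,
 * clamped to [1, WEIGHT_ONE] like the clamp_t() in the timer loop */
static uint32_t usage_ratio(uint64_t usage_us, uint64_t usage_dur)
{
	uint64_t r;

	if (usage_dur < 1)
		usage_dur = 1;		/* mirrors the max_t() guards */
	r = div_round_up(usage_us * WEIGHT_ONE, usage_dur);
	if (r < 1)
		r = 1;
	if (r > WEIGHT_ONE)
		r = WEIGHT_ONE;
	return (uint32_t)r;
}

int main(void)
{
	/* 30ms of device time over a 100ms window ~ 30% of WEIGHT_ONE */
	printf("%u of %u\n", usage_ratio(30000, 100000), WEIGHT_ONE);
	return 0;
}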
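The busy_level-driven branch at the bottom of the timer scales an in-bounds vrate by 100 ± vrate_adj_pct[|busy_level|] and clamps the result. A sketch of that adjustment with a made-up table (the kernel's vrate_adj_pct values differ):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* hypothetical adjustment table; the kernel's vrate_adj_pct differs */
static const uint32_t adj_pct_tbl[] = { 0, 2, 2, 2, 5, 5, 5, 10 };
#define TBL_LEN (sizeof(adj_pct_tbl) / sizeof(adj_pct_tbl[0]))

static uint64_t clamp64(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* mirror the in-bounds branch: scale vrate by 100 +/- adj_pct */
static uint64_t adjust_vrate(uint64_t vrate, int busy_level,
			     uint64_t vmin, uint64_t vmax)
{
	size_t idx = (size_t)abs(busy_level);
	uint32_t adj;

	if (idx >= TBL_LEN)
		idx = TBL_LEN - 1;
	adj = adj_pct_tbl[idx];

	/* busy -> slow down, idle -> speed up */
	adj = busy_level > 0 ? 100 - adj : 100 + adj;
	return clamp64((vrate * adj + 99) / 100, vmin, vmax);
}

int main(void)
{
	printf("%llu\n", (unsigned long long)
	       adjust_vrate(1000, 3, 100, 2000));	/* busy: below 1000 */
	printf("%llu\n", (unsigned long long)
	       adjust_vrate(1000, -3, 100, 2000));	/* idle: above 1000 */
	return 0;
}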
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2400) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2401) static u64 adjust_inuse_and_calc_cost(struct ioc_gq *iocg, u64 vtime,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2402) 				      u64 abs_cost, struct ioc_now *now)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2403) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2404) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2405) 	struct ioc_margins *margins = &ioc->margins;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2406) 	u32 __maybe_unused old_inuse = iocg->inuse, __maybe_unused old_hwi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2407) 	u32 hwi, adj_step;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2408) 	s64 margin;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2409) 	u64 cost, new_inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2410) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2411) 	current_hweight(iocg, NULL, &hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2412) 	old_hwi = hwi;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2413) 	cost = abs_cost_to_cost(abs_cost, hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2414) 	margin = now->vnow - vtime - cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2415) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2416) 	/* debt handling owns inuse for debtors */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2417) 	if (iocg->abs_vdebt)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2418) 		return cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2419) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2420) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2421) 	 * We only increase inuse during a period and do so only if the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2422) 	 * margin has deteriorated since the previous adjustment.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2423) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2424) 	if (margin >= iocg->saved_margin || margin >= margins->low ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2425) 	    iocg->inuse == iocg->active)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2426) 		return cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2427) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2428) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2430) 	/* we own inuse only when @iocg is in the normal active state */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2431) 	if (iocg->abs_vdebt || list_empty(&iocg->active_list)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2432) 		spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2433) 		return cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2434) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2435) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2436) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2437) 	 * Bump up inuse until @abs_cost fits in the existing budget.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2438) 	 * adj_step must be determined after acquiring ioc->lock - we might
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2439) 	 * have raced with another thread on activation and could otherwise
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2440) 	 * read iocg->active as 0 before holding ioc->lock, which would
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2441) 	 * lead to an infinite loop.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2442) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2443) 	new_inuse = iocg->inuse;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2444) 	adj_step = DIV_ROUND_UP(iocg->active * INUSE_ADJ_STEP_PCT, 100);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2445) 	do {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2446) 		new_inuse = new_inuse + adj_step;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2447) 		propagate_weights(iocg, iocg->active, new_inuse, true, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2448) 		current_hweight(iocg, NULL, &hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2449) 		cost = abs_cost_to_cost(abs_cost, hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2450) 	} while (time_after64(vtime + cost, now->vnow) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2451) 		 iocg->inuse != iocg->active);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2452) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2453) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2454) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2455) 	TRACE_IOCG_PATH(inuse_adjust, iocg, now,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2456) 			old_inuse, iocg->inuse, old_hwi, hwi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2457) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2458) 	return cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2459) }
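
The bump loop above works because the local charge for a fixed absolute cost shrinks as hweight_inuse grows. A hedged sketch, assuming abs_cost_to_cost() scales by WEIGHT_ONE / hw_inuse, which is consistent with how cost_to_abs_cost() is used in the timer:

#include <stdio.h>
#include <stdint.h>

#define WEIGHT_ONE (1u << 16)	/* fixed-point 1.0, as in this file */

/* assumption: the local charge scales inversely with hweight_inuse */
static uint64_t abs_to_local_cost(uint64_t abs_cost, uint32_t hwi)
{
	return (abs_cost * WEIGHT_ONE + hwi - 1) / hwi;
}

int main(void)
{
	uint64_t abs_cost = 1000;

	/* at a 50% hierarchical share the local charge doubles ... */
	printf("hwi=50%%:  %llu\n", (unsigned long long)
	       abs_to_local_cost(abs_cost, WEIGHT_ONE / 2));
	/* ... raising inuse toward active pushes hwi up and the charge down */
	printf("hwi=100%%: %llu\n", (unsigned long long)
	       abs_to_local_cost(abs_cost, WEIGHT_ONE));
	return 0;
}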
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2460) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2461) static void calc_vtime_cost_builtin(struct bio *bio, struct ioc_gq *iocg,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2462) 				    bool is_merge, u64 *costp)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2463) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2464) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2465) 	u64 coef_seqio, coef_randio, coef_page;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2466) 	u64 pages = max_t(u64, bio_sectors(bio) >> IOC_SECT_TO_PAGE_SHIFT, 1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2467) 	u64 seek_pages = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2468) 	u64 cost = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2469) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2470) 	switch (bio_op(bio)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2471) 	case REQ_OP_READ:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2472) 		coef_seqio	= ioc->params.lcoefs[LCOEF_RSEQIO];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2473) 		coef_randio	= ioc->params.lcoefs[LCOEF_RRANDIO];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2474) 		coef_page	= ioc->params.lcoefs[LCOEF_RPAGE];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2475) 		break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2476) 	case REQ_OP_WRITE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2477) 		coef_seqio	= ioc->params.lcoefs[LCOEF_WSEQIO];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2478) 		coef_randio	= ioc->params.lcoefs[LCOEF_WRANDIO];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2479) 		coef_page	= ioc->params.lcoefs[LCOEF_WPAGE];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2480) 		break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2481) 	default:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2482) 		goto out;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2483) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2484) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2485) 	if (iocg->cursor) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2486) 		seek_pages = abs(bio->bi_iter.bi_sector - iocg->cursor);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2487) 		seek_pages >>= IOC_SECT_TO_PAGE_SHIFT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2488) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2489) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2490) 	if (!is_merge) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2491) 		if (seek_pages > LCOEF_RANDIO_PAGES) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2492) 			cost += coef_randio;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2493) 		} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2494) 			cost += coef_seqio;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2495) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2496) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2497) 	cost += pages * coef_page;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2498) out:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2499) 	*costp = cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2500) }
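
The linear model above charges one seqio-or-randio coefficient per IO (chosen by seek distance from the cursor and skipped for merges) plus a per-page size term. A self-contained sketch with hypothetical coefficients (real values live in ioc->params.lcoefs) and a stand-in for LCOEF_RANDIO_PAGES:

#include <stdio.h>
#include <stdint.h>

/* hypothetical coefficients; the real ones come from ioc->params.lcoefs */
#define COEF_SEQIO	1000ULL
#define COEF_RANDIO	5000ULL
#define COEF_PAGE	  10ULL
#define RANDIO_PAGES	4096ULL		/* stand-in for LCOEF_RANDIO_PAGES */
#define SECT_TO_PAGE_SHIFT 3		/* 512B sectors -> 4K pages */

static uint64_t bio_cost(uint64_t start_sect, uint64_t nr_sects,
			 uint64_t cursor, int is_merge)
{
	uint64_t pages = nr_sects >> SECT_TO_PAGE_SHIFT;
	uint64_t seek_pages = (start_sect > cursor ?
			       start_sect - cursor :
			       cursor - start_sect) >> SECT_TO_PAGE_SHIFT;
	uint64_t cost = 0;

	if (pages < 1)
		pages = 1;
	/* merges extend an existing IO: charge only the size component */
	if (!is_merge)
		cost += seek_pages > RANDIO_PAGES ? COEF_RANDIO : COEF_SEQIO;
	return cost + pages * COEF_PAGE;
}

int main(void)
{
	printf("seq 64K: %llu\n",
	       (unsigned long long)bio_cost(1000, 128, 1000, 0));
	printf("rand 4K: %llu\n",
	       (unsigned long long)bio_cost(9000000, 8, 0, 0));
	return 0;
}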
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2501) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2502) static u64 calc_vtime_cost(struct bio *bio, struct ioc_gq *iocg, bool is_merge)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2503) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2504) 	u64 cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2505) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2506) 	calc_vtime_cost_builtin(bio, iocg, is_merge, &cost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2507) 	return cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2508) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2509) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2510) static void calc_size_vtime_cost_builtin(struct request *rq, struct ioc *ioc,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2511) 					 u64 *costp)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2512) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2513) 	unsigned int pages = blk_rq_stats_sectors(rq) >> IOC_SECT_TO_PAGE_SHIFT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2514) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2515) 	switch (req_op(rq)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2516) 	case REQ_OP_READ:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2517) 		*costp = pages * ioc->params.lcoefs[LCOEF_RPAGE];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2518) 		break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2519) 	case REQ_OP_WRITE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2520) 		*costp = pages * ioc->params.lcoefs[LCOEF_WPAGE];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2521) 		break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2522) 	default:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2523) 		*costp = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2524) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2525) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2526) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2527) static u64 calc_size_vtime_cost(struct request *rq, struct ioc *ioc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2528) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2529) 	u64 cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2530) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2531) 	calc_size_vtime_cost_builtin(rq, ioc, &cost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2532) 	return cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2533) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2534) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2535) static void ioc_rqos_throttle(struct rq_qos *rqos, struct bio *bio)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2536) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2537) 	struct blkcg_gq *blkg = bio->bi_blkg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2538) 	struct ioc *ioc = rqos_to_ioc(rqos);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2539) 	struct ioc_gq *iocg = blkg_to_iocg(blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2540) 	struct ioc_now now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2541) 	struct iocg_wait wait;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2542) 	u64 abs_cost, cost, vtime;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2543) 	bool use_debt, ioc_locked;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2544) 	unsigned long flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2545) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2546) 	/* bypass IOs if disabled, still initializing, or for root cgroup */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2547) 	if (!ioc->enabled || !iocg || !iocg->level)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2548) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2549) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2550) 	/* calculate the absolute vtime cost */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2551) 	abs_cost = calc_vtime_cost(bio, iocg, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2552) 	if (!abs_cost)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2553) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2554) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2555) 	if (!iocg_activate(iocg, &now))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2556) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2557) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2558) 	iocg->cursor = bio_end_sector(bio);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2559) 	vtime = atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2560) 	cost = adjust_inuse_and_calc_cost(iocg, vtime, abs_cost, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2561) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2562) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2563) 	 * If no one's waiting and within budget, issue right away.  The
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2564) 	 * tests are racy but the races aren't systemic - we only miss once
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2565) 	 * in a while, which is fine.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2566) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2567) 	if (!waitqueue_active(&iocg->waitq) && !iocg->abs_vdebt &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2568) 	    time_before_eq64(vtime + cost, now.vnow)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2569) 		iocg_commit_bio(iocg, bio, abs_cost, cost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2570) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2571) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2572) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2573) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2574) 	 * We're over budget. This can be handled in two ways. IOs which may
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2575) 	 * cause priority inversions are punted to @ioc->aux_iocg and charged as
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2576) 	 * debt. Otherwise, the issuer is blocked on @iocg->waitq. Debt handling
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2577) 	 * requires @ioc->lock, waitq handling @iocg->waitq.lock. Determine
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2578) 	 * whether debt handling is needed and acquire locks accordingly.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2579) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2580) 	use_debt = bio_issue_as_root_blkg(bio) || fatal_signal_pending(current);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2581) 	ioc_locked = use_debt || READ_ONCE(iocg->abs_vdebt);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2582) retry_lock:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2583) 	iocg_lock(iocg, ioc_locked, &flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2584) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2585) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2586) 	 * @iocg must stay activated for debt and waitq handling. Deactivation
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2587) 	 * is synchronized against both ioc->lock and waitq.lock and we won't
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2588) 	 * get deactivated as long as we're waiting or have debt, so we're good
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2589) 	 * if we're activated here. In the unlikely case that we aren't, just
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2590) 	 * issue the IO.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2591) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2592) 	if (unlikely(list_empty(&iocg->active_list))) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2593) 		iocg_unlock(iocg, ioc_locked, &flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2594) 		iocg_commit_bio(iocg, bio, abs_cost, cost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2595) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2596) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2597) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2598) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2599) 	 * We're over budget. If @bio has to be issued regardless, remember
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2600) 	 * the abs_cost instead of advancing vtime. iocg_kick_waitq() will pay
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2601) 	 * off the debt before waking more IOs.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2602) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2603) 	 * This way, the debt is continuously paid off each period with the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2604) 	 * actual budget available to the cgroup. If we just wound vtime, we
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2605) 	 * would incorrectly use the current hw_inuse for the entire amount
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2606) 	 * which, for example, can lead to the cgroup staying blocked for a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2607) 	 * long time even with substantially raised hw_inuse.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2608) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2609) 	 * An iocg with vdebt should stay online so that the timer can keep
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2610) 	 * deducting its vdebt and [de]activating the use_delay mechanism
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2611) 	 * accordingly. We don't want to race against the timer trying to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2612) 	 * clear them and leave @iocg inactive w/ dangling use_delay heavily
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2613) 	 * penalizing the cgroup and its descendants.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2614) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2615) 	if (use_debt) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2616) 		iocg_incur_debt(iocg, abs_cost, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2617) 		if (iocg_kick_delay(iocg, &now))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2618) 			blkcg_schedule_throttle(rqos->q,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2619) 					(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2620) 		iocg_unlock(iocg, ioc_locked, &flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2621) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2622) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2623) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2624) 	/* guarantee that iocgs w/ waiters have maximum inuse */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2625) 	if (!iocg->abs_vdebt && iocg->inuse != iocg->active) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2626) 		if (!ioc_locked) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2627) 			iocg_unlock(iocg, false, &flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2628) 			ioc_locked = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2629) 			goto retry_lock;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2630) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2631) 		propagate_weights(iocg, iocg->active, iocg->active, true,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2632) 				  &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2633) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2634) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2635) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2636) 	 * Append self to the waitq and schedule the wakeup timer if we're
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2637) 	 * the first waiter.  The timer duration is calculated based on the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2638) 	 * current vrate.  vtime and hweight changes can make it too short
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2639) 	 * or too long.  Each wait entry records the absolute cost it's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2640) 	 * waiting for to allow re-evaluation using a custom wait entry.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2641) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2642) 	 * If too short, the timer simply reschedules itself.  If too long,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2643) 	 * the period timer will notice and trigger wakeups.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2644) 	 *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2645) 	 * All waiters are on iocg->waitq and the wait states are
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2646) 	 * synchronized using waitq.lock.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2647) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2648) 	init_waitqueue_func_entry(&wait.wait, iocg_wake_fn);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2649) 	wait.wait.private = current;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2650) 	wait.bio = bio;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2651) 	wait.abs_cost = abs_cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2652) 	wait.committed = false;	/* will be set true by waker */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2653) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2654) 	__add_wait_queue_entry_tail(&iocg->waitq, &wait.wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2655) 	iocg_kick_waitq(iocg, ioc_locked, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2656) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2657) 	iocg_unlock(iocg, ioc_locked, &flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2658) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2659) 	while (true) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2660) 		set_current_state(TASK_UNINTERRUPTIBLE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2661) 		if (wait.committed)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2662) 			break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2663) 		io_schedule();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2664) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2665) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2666) 	/* waker already committed us, proceed */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2667) 	finish_wait(&iocg->waitq, &wait.wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2668) }
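
The wait loop above is the classic "sleep until the waker flips a flag" pattern: the waiter never dequeues itself, the wakeup function commits the cost and marks the entry done before waking. A rough userspace analogue of the committed-flag wait using a condition variable in place of the waitqueue (illustrative only; the kernel side sleeps via set_current_state() and io_schedule()):

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* rough analogue of struct iocg_wait: the waker flips 'committed'
 * once enough budget has been paid, then wakes the sleeper */
struct wait_entry {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool committed;
	uint64_t abs_cost;	/* what this waiter still needs */
};

static void *waker(void *arg)
{
	struct wait_entry *w = arg;

	pthread_mutex_lock(&w->lock);
	w->committed = true;		/* budget granted */
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

int main(void)
{
	struct wait_entry w = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
		false, 1000,
	};
	pthread_t t;

	pthread_create(&t, NULL, waker, &w);
	pthread_mutex_lock(&w.lock);
	while (!w.committed)		/* mirrors the wait.committed loop */
		pthread_cond_wait(&w.cond, &w.lock);
	pthread_mutex_unlock(&w.lock);
	pthread_join(t, NULL);
	puts("committed, IO proceeds");
	return 0;
}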
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2669) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2670) static void ioc_rqos_merge(struct rq_qos *rqos, struct request *rq,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2671) 			   struct bio *bio)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2672) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2673) 	struct ioc_gq *iocg = blkg_to_iocg(bio->bi_blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2674) 	struct ioc *ioc = rqos_to_ioc(rqos);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2675) 	sector_t bio_end = bio_end_sector(bio);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2676) 	struct ioc_now now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2677) 	u64 vtime, abs_cost, cost;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2678) 	unsigned long flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2679) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2680) 	/* bypass if disabled, still initializing, or for root cgroup */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2681) 	if (!ioc->enabled || !iocg || !iocg->level)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2682) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2683) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2684) 	abs_cost = calc_vtime_cost(bio, iocg, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2685) 	if (!abs_cost)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2686) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2687) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2688) 	ioc_now(ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2689) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2690) 	vtime = atomic64_read(&iocg->vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2691) 	cost = adjust_inuse_and_calc_cost(iocg, vtime, abs_cost, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2692) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2693) 	/* update cursor if backmerging into the request at the cursor */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2694) 	if (blk_rq_pos(rq) < bio_end &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2695) 	    blk_rq_pos(rq) + blk_rq_sectors(rq) == iocg->cursor)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2696) 		iocg->cursor = bio_end;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2697) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2698) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2699) 	 * Charge if there's enough vtime budget and the existing request has
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2700) 	 * cost assigned.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2701) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2702) 	if (rq->bio && rq->bio->bi_iocost_cost &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2703) 	    time_before_eq64(atomic64_read(&iocg->vtime) + cost, now.vnow)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2704) 		iocg_commit_bio(iocg, bio, abs_cost, cost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2705) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2706) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2707) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2708) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2709) 	 * Otherwise, account it as debt if @iocg is online, which it should
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2710) 	 * be for the vast majority of cases. See debt handling in
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2711) 	 * ioc_rqos_throttle() for details.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2712) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2713) 	spin_lock_irqsave(&ioc->lock, flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2714) 	spin_lock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2715) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2716) 	if (likely(!list_empty(&iocg->active_list))) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2717) 		iocg_incur_debt(iocg, abs_cost, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2718) 		if (iocg_kick_delay(iocg, &now))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2719) 			blkcg_schedule_throttle(rqos->q,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2720) 					(bio->bi_opf & REQ_SWAP) == REQ_SWAP);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2721) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2722) 		iocg_commit_bio(iocg, bio, abs_cost, cost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2723) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2724) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2725) 	spin_unlock(&iocg->waitq.lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2726) 	spin_unlock_irqrestore(&ioc->lock, flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2727) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2728) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2729) static void ioc_rqos_done_bio(struct rq_qos *rqos, struct bio *bio)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2730) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2731) 	struct ioc_gq *iocg = blkg_to_iocg(bio->bi_blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2732) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2733) 	if (iocg && bio->bi_iocost_cost)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2734) 		atomic64_add(bio->bi_iocost_cost, &iocg->done_vtime);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2735) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2736) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2737) static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2738) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2739) 	struct ioc *ioc = rqos_to_ioc(rqos);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2740) 	struct ioc_pcpu_stat *ccs;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2741) 	u64 on_q_ns, rq_wait_ns, size_nsec;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2742) 	int pidx, rw;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2743) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2744) 	if (!ioc->enabled || !rq->alloc_time_ns || !rq->start_time_ns)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2745) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2746) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2747) 	switch (req_op(rq) & REQ_OP_MASK) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2748) 	case REQ_OP_READ:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2749) 		pidx = QOS_RLAT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2750) 		rw = READ;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2751) 		break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2752) 	case REQ_OP_WRITE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2753) 		pidx = QOS_WLAT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2754) 		rw = WRITE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2755) 		break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2756) 	default:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2757) 		return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2758) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2759) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2760) 	on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2761) 	rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2762) 	size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2763) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2764) 	ccs = get_cpu_ptr(ioc->pcpu_stat);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2765) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2766) 	if (on_q_ns <= size_nsec ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2767) 	    on_q_ns - size_nsec <= ioc->params.qos[pidx] * NSEC_PER_USEC)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2768) 		local_inc(&ccs->missed[rw].nr_met);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2769) 	else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2770) 		local_inc(&ccs->missed[rw].nr_missed);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2771) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2772) 	local64_add(rq_wait_ns, &ccs->rq_wait_ns);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2773) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2774) 	put_cpu_ptr(ccs);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2775) }
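
A small sketch of the met/missed test above: the expected transfer time (size_nsec) is deducted from the on-queue time before comparing against the per-op latency target, so big IOs aren't graded as misses merely for being big. Names here are illustrative:

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define NSEC_PER_USEC 1000ULL

/* on_q_ns minus the expected transfer time vs. the latency target */
static bool latency_met(uint64_t on_q_ns, uint64_t size_nsec,
			uint64_t target_us)
{
	if (on_q_ns <= size_nsec)
		return true;
	return on_q_ns - size_nsec <= target_us * NSEC_PER_USEC;
}

int main(void)
{
	/* 3ms on queue, 2.5ms of that is pure transfer, 1ms target: met */
	printf("%d\n", latency_met(3000000, 2500000, 1000));
	/* 3ms on queue, tiny transfer, 1ms target: missed */
	printf("%d\n", latency_met(3000000, 10000, 1000));
	return 0;
}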
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2776) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2777) static void ioc_rqos_queue_depth_changed(struct rq_qos *rqos)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2778) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2779) 	struct ioc *ioc = rqos_to_ioc(rqos);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2780) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2781) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2782) 	ioc_refresh_params(ioc, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2783) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2784) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2785) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2786) static void ioc_rqos_exit(struct rq_qos *rqos)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2787) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2788) 	struct ioc *ioc = rqos_to_ioc(rqos);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2789) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2790) 	blkcg_deactivate_policy(rqos->q, &blkcg_policy_iocost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2791) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2792) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2793) 	ioc->running = IOC_STOP;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2794) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2795) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2796) 	del_timer_sync(&ioc->timer);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2797) 	free_percpu(ioc->pcpu_stat);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2798) 	kfree(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2799) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2800) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2801) static struct rq_qos_ops ioc_rqos_ops = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2802) 	.throttle = ioc_rqos_throttle,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2803) 	.merge = ioc_rqos_merge,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2804) 	.done_bio = ioc_rqos_done_bio,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2805) 	.done = ioc_rqos_done,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2806) 	.queue_depth_changed = ioc_rqos_queue_depth_changed,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2807) 	.exit = ioc_rqos_exit,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2808) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2809) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2810) static int blk_iocost_init(struct request_queue *q)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2811) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2812) 	struct ioc *ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2813) 	struct rq_qos *rqos;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2814) 	int i, cpu, ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2815) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2816) 	ioc = kzalloc(sizeof(*ioc), GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2817) 	if (!ioc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2818) 		return -ENOMEM;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2819) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2820) 	ioc->pcpu_stat = alloc_percpu(struct ioc_pcpu_stat);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2821) 	if (!ioc->pcpu_stat) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2822) 		kfree(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2823) 		return -ENOMEM;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2824) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2825) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2826) 	for_each_possible_cpu(cpu) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2827) 		struct ioc_pcpu_stat *ccs = per_cpu_ptr(ioc->pcpu_stat, cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2828) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2829) 		for (i = 0; i < ARRAY_SIZE(ccs->missed); i++) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2830) 			local_set(&ccs->missed[i].nr_met, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2831) 			local_set(&ccs->missed[i].nr_missed, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2832) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2833) 		local64_set(&ccs->rq_wait_ns, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2834) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2835) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2836) 	rqos = &ioc->rqos;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2837) 	rqos->id = RQ_QOS_COST;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2838) 	rqos->ops = &ioc_rqos_ops;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2839) 	rqos->q = q;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2840) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2841) 	spin_lock_init(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2842) 	timer_setup(&ioc->timer, ioc_timer_fn, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2843) 	INIT_LIST_HEAD(&ioc->active_iocgs);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2844) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2845) 	ioc->running = IOC_IDLE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2846) 	ioc->vtime_base_rate = VTIME_PER_USEC;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2847) 	atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2848) 	seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2849) 	ioc->period_at = ktime_to_us(ktime_get());
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2850) 	atomic64_set(&ioc->cur_period, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2851) 	atomic_set(&ioc->hweight_gen, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2852) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2853) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2854) 	ioc->autop_idx = AUTOP_INVALID;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2855) 	ioc_refresh_params(ioc, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2856) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2857) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2858) 	/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2859) 	 * rqos must be added before activation to allow iocg_pd_init() to
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2860) 	 * lookup the ioc from q. This means that the rqos methods may get
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2861) 	 * called before policy activation completes, so they can't assume
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2862) 	 * that the target bio has an associated iocg and must test for NULL.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2863) 	 */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2864) 	rq_qos_add(q, rqos);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2865) 	ret = blkcg_activate_policy(q, &blkcg_policy_iocost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2866) 	if (ret) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2867) 		rq_qos_del(q, rqos);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2868) 		free_percpu(ioc->pcpu_stat);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2869) 		kfree(ioc);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2870) 		return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2871) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2872) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2873) }
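
The error path above unwinds in strict reverse order of setup: a failed policy activation removes the rqos, then frees the percpu stats, then the ioc itself. A generic sketch of that unwind discipline (all names hypothetical, not kernel APIs):

#include <stdio.h>
#include <stdlib.h>

/* later steps are undone in reverse on failure */
struct ctrl { int *stats; };

static int register_hook(struct ctrl *c) { (void)c; return 0; }
static void unregister_hook(struct ctrl *c) { (void)c; }
static int activate_policy(struct ctrl *c) { (void)c; return -1; /* fail */ }

static int ctrl_init(void)
{
	struct ctrl *c = calloc(1, sizeof(*c));

	if (!c)
		return -1;
	c->stats = calloc(16, sizeof(*c->stats));
	if (!c->stats) {
		free(c);
		return -1;
	}
	if (register_hook(c)) {
		free(c->stats);
		free(c);
		return -1;
	}
	if (activate_policy(c)) {	/* mirrors blkcg_activate_policy() failing */
		unregister_hook(c);
		free(c->stats);
		free(c);
		return -1;
	}
	return 0;
}

int main(void) { printf("%d\n", ctrl_init()); return 0; }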
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2874) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2875) static struct blkcg_policy_data *ioc_cpd_alloc(gfp_t gfp)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2876) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2877) 	struct ioc_cgrp *iocc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2878) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2879) 	iocc = kzalloc(sizeof(struct ioc_cgrp), gfp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2880) 	if (!iocc)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2881) 		return NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2882) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2883) 	iocc->dfl_weight = CGROUP_WEIGHT_DFL * WEIGHT_ONE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2884) 	return &iocc->cpd;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2885) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2886) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2887) static void ioc_cpd_free(struct blkcg_policy_data *cpd)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2888) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2889) 	kfree(container_of(cpd, struct ioc_cgrp, cpd));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2890) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2891) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2892) static struct blkg_policy_data *ioc_pd_alloc(gfp_t gfp, struct request_queue *q,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2893) 					     struct blkcg *blkcg)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2894) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2895) 	int levels = blkcg->css.cgroup->level + 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2896) 	struct ioc_gq *iocg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2897) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2898) 	iocg = kzalloc_node(struct_size(iocg, ancestors, levels), gfp, q->node);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2899) 	if (!iocg)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2900) 		return NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2901) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2902) 	iocg->pcpu_stat = alloc_percpu_gfp(struct iocg_pcpu_stat, gfp);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2903) 	if (!iocg->pcpu_stat) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2904) 		kfree(iocg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2905) 		return NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2906) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2907) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2908) 	return &iocg->pd;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2909) }
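
struct_size(iocg, ancestors, levels) sizes a single allocation for the struct plus its level-indexed flexible array. A userspace sketch of the same pattern, with calloc() standing in for kzalloc_node():

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

struct node {
	int level;
	struct node *ancestors[];	/* flexible array member */
};

static struct node *node_alloc(int levels)
{
	/* struct_size(iocg, ancestors, levels) boils down to this */
	return calloc(1, sizeof(struct node) +
			 (size_t)levels * sizeof(struct node *));
}

int main(void)
{
	struct node *n = node_alloc(3);

	if (!n)
		return 1;
	n->level = 2;
	printf("allocated for %d ancestor slots\n", n->level + 1);
	free(n);
	return 0;
}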
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2910) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2911) static void ioc_pd_init(struct blkg_policy_data *pd)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2912) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2913) 	struct ioc_gq *iocg = pd_to_iocg(pd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2914) 	struct blkcg_gq *blkg = pd_to_blkg(&iocg->pd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2915) 	struct ioc *ioc = q_to_ioc(blkg->q);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2916) 	struct ioc_now now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2917) 	struct blkcg_gq *tblkg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2918) 	unsigned long flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2919) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2920) 	ioc_now(ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2921) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2922) 	iocg->ioc = ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2923) 	atomic64_set(&iocg->vtime, now.vnow);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2924) 	atomic64_set(&iocg->done_vtime, now.vnow);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2925) 	atomic64_set(&iocg->active_period, atomic64_read(&ioc->cur_period));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2926) 	INIT_LIST_HEAD(&iocg->active_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2927) 	INIT_LIST_HEAD(&iocg->walk_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2928) 	INIT_LIST_HEAD(&iocg->surplus_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2929) 	iocg->hweight_active = WEIGHT_ONE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2930) 	iocg->hweight_inuse = WEIGHT_ONE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2931) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2932) 	init_waitqueue_head(&iocg->waitq);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2933) 	hrtimer_init(&iocg->waitq_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2934) 	iocg->waitq_timer.function = iocg_waitq_timer_fn;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2935) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2936) 	iocg->level = blkg->blkcg->css.cgroup->level;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2937) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2938) 	for (tblkg = blkg; tblkg; tblkg = tblkg->parent) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2939) 		struct ioc_gq *tiocg = blkg_to_iocg(tblkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2940) 		iocg->ancestors[tiocg->level] = tiocg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2941) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2942) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2943) 	spin_lock_irqsave(&ioc->lock, flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2944) 	weight_updated(iocg, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2945) 	spin_unlock_irqrestore(&ioc->lock, flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2946) }
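
The tblkg loop above fills ancestors[] so that each cgroup level maps to its iocg, by walking parent pointers up from the leaf. The same walk in miniature (hypothetical struct, not the kernel's blkcg_gq):

#include <stdio.h>
#include <stddef.h>

struct g { int level; struct g *parent; };

/* fill slots so anc[l] is the ancestor at cgroup level l,
 * mirroring the tblkg loop in ioc_pd_init() */
static void fill_ancestors(struct g *leaf, struct g **anc)
{
	for (struct g *t = leaf; t; t = t->parent)
		anc[t->level] = t;
}

int main(void)
{
	struct g root = { 0, NULL }, mid = { 1, &root }, leaf = { 2, &mid };
	struct g *anc[3];
	int i;

	fill_ancestors(&leaf, anc);
	for (i = 0; i < 3; i++)
		printf("level %d -> %p\n", i, (void *)anc[i]);
	return 0;
}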
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2947) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2948) static void ioc_pd_free(struct blkg_policy_data *pd)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2949) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2950) 	struct ioc_gq *iocg = pd_to_iocg(pd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2951) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2952) 	unsigned long flags;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2953) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2954) 	if (ioc) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2955) 		spin_lock_irqsave(&ioc->lock, flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2956) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2957) 		if (!list_empty(&iocg->active_list)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2958) 			struct ioc_now now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2959) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2960) 			ioc_now(ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2961) 			propagate_weights(iocg, 0, 0, false, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2962) 			list_del_init(&iocg->active_list);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2963) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2964) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2965) 		WARN_ON_ONCE(!list_empty(&iocg->walk_list));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2966) 		WARN_ON_ONCE(!list_empty(&iocg->surplus_list));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2967) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2968) 		spin_unlock_irqrestore(&ioc->lock, flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2969) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2970) 		hrtimer_cancel(&iocg->waitq_timer);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2971) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2972) 	free_percpu(iocg->pcpu_stat);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2973) 	kfree(iocg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2974) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2975) 
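/*
 * Emit this iocg's " cost.*" fields (appended to io.stat by the blkcg
 * core).  The root iocg (level 0) additionally reports the current
 * vrate as a percentage with two decimals; cost.wait/indebt/indelay
 * are only printed when blkcg_debug_stats is set.
 */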
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2976) static size_t ioc_pd_stat(struct blkg_policy_data *pd, char *buf, size_t size)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2977) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2978) 	struct ioc_gq *iocg = pd_to_iocg(pd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2979) 	struct ioc *ioc = iocg->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2980) 	size_t pos = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2981) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2982) 	if (!ioc->enabled)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2983) 		return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2984) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2985) 	if (iocg->level == 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2986) 		unsigned vp10k = DIV64_U64_ROUND_CLOSEST(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2987) 			ioc->vtime_base_rate * 10000,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2988) 			VTIME_PER_USEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2989) 		pos += scnprintf(buf + pos, size - pos, " cost.vrate=%u.%02u",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2990) 				  vp10k / 100, vp10k % 100);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2991) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2992) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2993) 	pos += scnprintf(buf + pos, size - pos, " cost.usage=%llu",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2994) 			 iocg->last_stat.usage_us);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2995) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2996) 	if (blkcg_debug_stats)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2997) 		pos += scnprintf(buf + pos, size - pos,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2998) 				 " cost.wait=%llu cost.indebt=%llu cost.indelay=%llu",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 2999) 				 iocg->last_stat.wait_us,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3000) 				 iocg->last_stat.indebt_us,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3001) 				 iocg->last_stat.indelay_us);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3002) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3003) 	return pos;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3004) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3005) 
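/*
 * io.weight read side: ioc_weight_show() prints the cgroup-wide default
 * first, then one "MAJ:MIN WEIGHT" line per device that has an explicit
 * per-device weight configured (cfg_weight != 0).
 */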
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3006) static u64 ioc_weight_prfill(struct seq_file *sf, struct blkg_policy_data *pd,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3007) 			     int off)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3008) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3009) 	const char *dname = blkg_dev_name(pd->blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3010) 	struct ioc_gq *iocg = pd_to_iocg(pd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3011) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3012) 	if (dname && iocg->cfg_weight)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3013) 		seq_printf(sf, "%s %u\n", dname, iocg->cfg_weight / WEIGHT_ONE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3014) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3015) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3016) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3018) static int ioc_weight_show(struct seq_file *sf, void *v)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3019) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3020) 	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3021) 	struct ioc_cgrp *iocc = blkcg_to_iocc(blkcg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3022) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3023) 	seq_printf(sf, "default %u\n", iocc->dfl_weight / WEIGHT_ONE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3024) 	blkcg_print_blkgs(sf, blkcg, ioc_weight_prfill,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3025) 			  &blkcg_policy_iocost, seq_cft(sf)->private, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3026) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3027) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3028) 
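/*
 * io.weight write side.  Two input forms are accepted:
 *
 *   "default WEIGHT" or "WEIGHT"  - set the cgroup default weight
 *   "MAJ:MIN WEIGHT|default"      - set or clear a per-device override
 *
 * WEIGHT must lie in [CGROUP_WEIGHT_MIN, CGROUP_WEIGHT_MAX]; "default"
 * on a device line clears the override by setting cfg_weight to zero.
 */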
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3029) static ssize_t ioc_weight_write(struct kernfs_open_file *of, char *buf,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3030) 				size_t nbytes, loff_t off)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3031) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3032) 	struct blkcg *blkcg = css_to_blkcg(of_css(of));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3033) 	struct ioc_cgrp *iocc = blkcg_to_iocc(blkcg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3034) 	struct blkg_conf_ctx ctx;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3035) 	struct ioc_now now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3036) 	struct ioc_gq *iocg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3037) 	u32 v;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3038) 	int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3039) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3040) 	if (!strchr(buf, ':')) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3041) 		struct blkcg_gq *blkg;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3042) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3043) 		if (!sscanf(buf, "default %u", &v) && !sscanf(buf, "%u", &v))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3044) 			return -EINVAL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3045) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3046) 		if (v < CGROUP_WEIGHT_MIN || v > CGROUP_WEIGHT_MAX)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3047) 			return -EINVAL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3048) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3049) 		spin_lock_irq(&blkcg->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3050) 		iocc->dfl_weight = v * WEIGHT_ONE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3051) 		hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3052) 			struct ioc_gq *iocg = blkg_to_iocg(blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3053) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3054) 			if (iocg) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3055) 				spin_lock(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3056) 				ioc_now(iocg->ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3057) 				weight_updated(iocg, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3058) 				spin_unlock(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3059) 			}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3060) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3061) 		spin_unlock_irq(&blkcg->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3062) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3063) 		return nbytes;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3064) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3065) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3066) 	ret = blkg_conf_prep(blkcg, &blkcg_policy_iocost, buf, &ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3067) 	if (ret)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3068) 		return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3069) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3070) 	iocg = blkg_to_iocg(ctx.blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3071) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3072) 	if (!strncmp(ctx.body, "default", 7)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3073) 		v = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3074) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3075) 		if (!sscanf(ctx.body, "%u", &v))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3076) 			goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3077) 		if (v < CGROUP_WEIGHT_MIN || v > CGROUP_WEIGHT_MAX)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3078) 			goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3079) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3080) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3081) 	spin_lock(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3082) 	iocg->cfg_weight = v * WEIGHT_ONE;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3083) 	ioc_now(iocg->ioc, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3084) 	weight_updated(iocg, &now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3085) 	spin_unlock(&iocg->ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3086) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3087) 	blkg_conf_finish(&ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3088) 	return nbytes;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3089) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3090) einval:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3091) 	blkg_conf_finish(&ctx);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3092) 	return -EINVAL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3093) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3094) 
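/*
 * io.cost.qos read side: one line per device, for example:
 *
 *   8:16 enable=1 ctrl=auto rpct=95.00 rlat=5000 wpct=95.00 wlat=5000
 *        min=50.00 max=150.00
 *
 * (single line per device, wrapped here for width; device number and
 * values are illustrative only.)  The percentile and vrate fields are
 * fixed-point with two decimals.
 */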
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3095) static u64 ioc_qos_prfill(struct seq_file *sf, struct blkg_policy_data *pd,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3096) 			  int off)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3097) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3098) 	const char *dname = blkg_dev_name(pd->blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3099) 	struct ioc *ioc = pd_to_iocg(pd)->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3100) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3101) 	if (!dname)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3102) 		return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3103) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3104) 	seq_printf(sf, "%s enable=%d ctrl=%s rpct=%u.%02u rlat=%u wpct=%u.%02u wlat=%u min=%u.%02u max=%u.%02u\n",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3105) 		   dname, ioc->enabled, ioc->user_qos_params ? "user" : "auto",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3106) 		   ioc->params.qos[QOS_RPPM] / 10000,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3107) 		   ioc->params.qos[QOS_RPPM] % 10000 / 100,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3108) 		   ioc->params.qos[QOS_RLAT],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3109) 		   ioc->params.qos[QOS_WPPM] / 10000,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3110) 		   ioc->params.qos[QOS_WPPM] % 10000 / 100,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3111) 		   ioc->params.qos[QOS_WLAT],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3112) 		   ioc->params.qos[QOS_MIN] / 10000,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3113) 		   ioc->params.qos[QOS_MIN] % 10000 / 100,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3114) 		   ioc->params.qos[QOS_MAX] / 10000,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3115) 		   ioc->params.qos[QOS_MAX] % 10000 / 100);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3116) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3117) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3118) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3119) static int ioc_qos_show(struct seq_file *sf, void *v)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3120) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3121) 	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3122) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3123) 	blkcg_print_blkgs(sf, blkcg, ioc_qos_prfill,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3124) 			  &blkcg_policy_iocost, seq_cft(sf)->private, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3125) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3126) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3127) 
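/*
 * Token tables for io.cost.qos writes.  The ctrl tokens toggle the
 * controller and parameter ownership; the qos tokens carry the
 * tunables.  %s is used for the fixed-point fields so they can be
 * parsed with cgroup_parse_float() below.
 */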
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3128) static const match_table_t qos_ctrl_tokens = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3129) 	{ QOS_ENABLE,		"enable=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3130) 	{ QOS_CTRL,		"ctrl=%s"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3131) 	{ NR_QOS_CTRL_PARAMS,	NULL		},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3132) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3133) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3134) static const match_table_t qos_tokens = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3135) 	{ QOS_RPPM,		"rpct=%s"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3136) 	{ QOS_RLAT,		"rlat=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3137) 	{ QOS_WPPM,		"wpct=%s"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3138) 	{ QOS_WLAT,		"wlat=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3139) 	{ QOS_MIN,		"min=%s"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3140) 	{ QOS_MAX,		"max=%s"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3141) 	{ NR_QOS_PARAMS,	NULL		},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3142) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3143) 
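/*
 * io.cost.qos write side.  Input is the device number followed by
 * space-separated key=value pairs, e.g. (device number illustrative):
 *
 *   echo "8:16 enable=1 ctrl=user rpct=95.00 rlat=10000" > io.cost.qos
 *
 * Writing any qos parameter switches ctrl to "user"; "ctrl=auto"
 * reverts to the automatically selected parameters on the next
 * ioc_refresh_params().
 */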
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3144) static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3145) 			     size_t nbytes, loff_t off)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3146) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3147) 	struct gendisk *disk;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3148) 	struct ioc *ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3149) 	u32 qos[NR_QOS_PARAMS];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3150) 	bool enable, user;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3151) 	char *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3152) 	int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3153) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3154) 	disk = blkcg_conf_get_disk(&input);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3155) 	if (IS_ERR(disk))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3156) 		return PTR_ERR(disk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3157) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3158) 	ioc = q_to_ioc(disk->queue);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3159) 	if (!ioc) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3160) 		ret = blk_iocost_init(disk->queue);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3161) 		if (ret)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3162) 			goto err;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3163) 		ioc = q_to_ioc(disk->queue);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3164) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3165) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3166) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3167) 	memcpy(qos, ioc->params.qos, sizeof(qos));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3168) 	enable = ioc->enabled;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3169) 	user = ioc->user_qos_params;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3170) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3171) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3172) 	while ((p = strsep(&input, " \t\n"))) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3173) 		substring_t args[MAX_OPT_ARGS];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3174) 		char buf[32];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3175) 		int tok;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3176) 		s64 v;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3177) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3178) 		if (!*p)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3179) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3180) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3181) 		switch (match_token(p, qos_ctrl_tokens, args)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3182) 		case QOS_ENABLE:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3183) 			if (match_u64(&args[0], &v))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3183) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3184) 			enable = v;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3185) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3186) 		case QOS_CTRL:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3187) 			match_strlcpy(buf, &args[0], sizeof(buf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3188) 			if (!strcmp(buf, "auto"))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3189) 				user = false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3190) 			else if (!strcmp(buf, "user"))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3191) 				user = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3192) 			else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3193) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3194) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3195) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3196) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3197) 		tok = match_token(p, qos_tokens, args);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3198) 		switch (tok) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3199) 		case QOS_RPPM:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3200) 		case QOS_WPPM:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3201) 			if (match_strlcpy(buf, &args[0], sizeof(buf)) >=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3202) 			    sizeof(buf))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3203) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3204) 			if (cgroup_parse_float(buf, 2, &v))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3205) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3206) 			if (v < 0 || v > 10000)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3207) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3208) 			qos[tok] = v * 100;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3209) 			break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3210) 		case QOS_RLAT:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3211) 		case QOS_WLAT:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3212) 			if (match_u64(&args[0], &v))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3213) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3214) 			qos[tok] = v;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3215) 			break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3216) 		case QOS_MIN:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3217) 		case QOS_MAX:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3218) 			if (match_strlcpy(buf, &args[0], sizeof(buf)) >=
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3219) 			    sizeof(buf))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3220) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3221) 			if (cgroup_parse_float(buf, 2, &v))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3222) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3223) 			if (v < 0)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3224) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3225) 			qos[tok] = clamp_t(s64, v * 100,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3226) 					   VRATE_MIN_PPM, VRATE_MAX_PPM);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3227) 			break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3228) 		default:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3229) 			goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3230) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3231) 		user = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3232) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3233) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3234) 	if (qos[QOS_MIN] > qos[QOS_MAX])
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3235) 		goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3236) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3237) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3238) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3239) 	if (enable) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3240) 		blk_stat_enable_accounting(ioc->rqos.q);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3241) 		blk_queue_flag_set(QUEUE_FLAG_RQ_ALLOC_TIME, ioc->rqos.q);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3242) 		ioc->enabled = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3243) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3244) 		blk_queue_flag_clear(QUEUE_FLAG_RQ_ALLOC_TIME, ioc->rqos.q);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3245) 		ioc->enabled = false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3246) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3247) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3248) 	if (user) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3249) 		memcpy(ioc->params.qos, qos, sizeof(qos));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3250) 		ioc->user_qos_params = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3251) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3252) 		ioc->user_qos_params = false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3253) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3254) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3255) 	ioc_refresh_params(ioc, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3256) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3257) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3258) 	put_disk_and_module(disk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3259) 	return nbytes;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3260) einval:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3261) 	ret = -EINVAL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3262) err:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3263) 	put_disk_and_module(disk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3264) 	return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3265) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3266) 
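/*
 * io.cost.model read side: report whether the linear model coefficients
 * were supplied by the user or derived automatically, followed by the
 * read/write bytes-per-second and sequential/random IOPS coefficients.
 */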
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3267) static u64 ioc_cost_model_prfill(struct seq_file *sf,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3268) 				 struct blkg_policy_data *pd, int off)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3269) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3270) 	const char *dname = blkg_dev_name(pd->blkg);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3271) 	struct ioc *ioc = pd_to_iocg(pd)->ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3272) 	u64 *u = ioc->params.i_lcoefs;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3273) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3274) 	if (!dname)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3275) 		return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3276) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3277) 	seq_printf(sf, "%s ctrl=%s model=linear "
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3278) 		   "rbps=%llu rseqiops=%llu rrandiops=%llu "
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3279) 		   "wbps=%llu wseqiops=%llu wrandiops=%llu\n",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3280) 		   dname, ioc->user_cost_model ? "user" : "auto",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3281) 		   u[I_LCOEF_RBPS], u[I_LCOEF_RSEQIOPS], u[I_LCOEF_RRANDIOPS],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3282) 		   u[I_LCOEF_WBPS], u[I_LCOEF_WSEQIOPS], u[I_LCOEF_WRANDIOPS]);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3283) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3284) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3285) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3286) static int ioc_cost_model_show(struct seq_file *sf, void *v)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3287) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3288) 	struct blkcg *blkcg = css_to_blkcg(seq_css(sf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3289) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3290) 	blkcg_print_blkgs(sf, blkcg, ioc_cost_model_prfill,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3291) 			  &blkcg_policy_iocost, seq_cft(sf)->private, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3292) 	return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3293) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3294) 
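/* Token tables for io.cost.model writes; "linear" is the only model. */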
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3295) static const match_table_t cost_ctrl_tokens = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3296) 	{ COST_CTRL,		"ctrl=%s"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3297) 	{ COST_MODEL,		"model=%s"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3298) 	{ NR_COST_CTRL_PARAMS,	NULL		},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3299) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3300) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3301) static const match_table_t i_lcoef_tokens = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3302) 	{ I_LCOEF_RBPS,		"rbps=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3303) 	{ I_LCOEF_RSEQIOPS,	"rseqiops=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3304) 	{ I_LCOEF_RRANDIOPS,	"rrandiops=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3305) 	{ I_LCOEF_WBPS,		"wbps=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3306) 	{ I_LCOEF_WSEQIOPS,	"wseqiops=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3307) 	{ I_LCOEF_WRANDIOPS,	"wrandiops=%u"	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3308) 	{ NR_I_LCOEFS,		NULL		},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3309) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3310) 
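/*
 * io.cost.model write side, e.g. (device number and coefficient values
 * illustrative only):
 *
 *   echo "8:16 ctrl=user rbps=2000000000 rrandiops=75000" > io.cost.model
 *
 * Any coefficient write implies ctrl=user; ioc_refresh_params() then
 * recomputes the internal cost coefficients from the new values.
 */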
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3311) static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3312) 				    size_t nbytes, loff_t off)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3313) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3314) 	struct gendisk *disk;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3315) 	struct ioc *ioc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3316) 	u64 u[NR_I_LCOEFS];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3317) 	bool user;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3318) 	char *p;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3319) 	int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3320) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3321) 	disk = blkcg_conf_get_disk(&input);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3322) 	if (IS_ERR(disk))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3323) 		return PTR_ERR(disk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3324) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3325) 	ioc = q_to_ioc(disk->queue);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3326) 	if (!ioc) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3327) 		ret = blk_iocost_init(disk->queue);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3328) 		if (ret)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3329) 			goto err;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3330) 		ioc = q_to_ioc(disk->queue);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3331) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3332) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3333) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3334) 	memcpy(u, ioc->params.i_lcoefs, sizeof(u));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3335) 	user = ioc->user_cost_model;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3336) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3337) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3338) 	while ((p = strsep(&input, " \t\n"))) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3339) 		substring_t args[MAX_OPT_ARGS];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3340) 		char buf[32];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3341) 		int tok;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3342) 		u64 v;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3343) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3344) 		if (!*p)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3345) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3346) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3347) 		switch (match_token(p, cost_ctrl_tokens, args)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3348) 		case COST_CTRL:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3349) 			match_strlcpy(buf, &args[0], sizeof(buf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3350) 			if (!strcmp(buf, "auto"))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3351) 				user = false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3352) 			else if (!strcmp(buf, "user"))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3353) 				user = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3354) 			else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3355) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3356) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3357) 		case COST_MODEL:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3358) 			match_strlcpy(buf, &args[0], sizeof(buf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3359) 			if (strcmp(buf, "linear"))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3360) 				goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3361) 			continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3362) 		}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3363) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3364) 		tok = match_token(p, i_lcoef_tokens, args);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3365) 		if (tok == NR_I_LCOEFS)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3366) 			goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3367) 		if (match_u64(&args[0], &v))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3368) 			goto einval;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3369) 		u[tok] = v;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3370) 		user = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3371) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3372) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3373) 	spin_lock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3374) 	if (user) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3375) 		memcpy(ioc->params.i_lcoefs, u, sizeof(u));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3376) 		ioc->user_cost_model = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3377) 	} else {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3378) 		ioc->user_cost_model = false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3379) 	}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3380) 	ioc_refresh_params(ioc, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3381) 	spin_unlock_irq(&ioc->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3382) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3383) 	put_disk_and_module(disk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3384) 	return nbytes;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3385) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3386) einval:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3387) 	ret = -EINVAL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3388) err:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3389) 	put_disk_and_module(disk);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3390) 	return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3391) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3392) 
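/*
 * Interface files.  With the "io" controller prefix these appear as
 * io.weight (non-root cgroups only) and io.cost.qos / io.cost.model
 * (root cgroup only).
 */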
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3393) static struct cftype ioc_files[] = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3394) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3395) 		.name = "weight",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3396) 		.flags = CFTYPE_NOT_ON_ROOT,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3397) 		.seq_show = ioc_weight_show,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3398) 		.write = ioc_weight_write,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3399) 	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3400) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3401) 		.name = "cost.qos",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3402) 		.flags = CFTYPE_ONLY_ON_ROOT,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3403) 		.seq_show = ioc_qos_show,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3404) 		.write = ioc_qos_write,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3405) 	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3406) 	{
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3407) 		.name = "cost.model",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3408) 		.flags = CFTYPE_ONLY_ON_ROOT,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3409) 		.seq_show = ioc_cost_model_show,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3410) 		.write = ioc_cost_model_write,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3411) 	},
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3412) 	{}
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3413) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3414) 
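/*
 * Policy glue: cpd_* callbacks manage the per-blkcg data holding the
 * default weight, pd_* callbacks manage the per-(cgroup, device) iocg,
 * and pd_stat_fn contributes the cost.* fields to io.stat.
 */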
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3415) static struct blkcg_policy blkcg_policy_iocost = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3416) 	.dfl_cftypes	= ioc_files,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3417) 	.cpd_alloc_fn	= ioc_cpd_alloc,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3418) 	.cpd_free_fn	= ioc_cpd_free,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3419) 	.pd_alloc_fn	= ioc_pd_alloc,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3420) 	.pd_init_fn	= ioc_pd_init,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3421) 	.pd_free_fn	= ioc_pd_free,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3422) 	.pd_stat_fn	= ioc_pd_stat,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3423) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3424) 
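/* Register/unregister the iocost policy with the blkcg core. */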
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3425) static int __init ioc_init(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3426) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3427) 	return blkcg_policy_register(&blkcg_policy_iocost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3428) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3429) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3430) static void __exit ioc_exit(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3431) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3432) 	blkcg_policy_unregister(&blkcg_policy_iocost);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3433) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3434) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3435) module_init(ioc_init);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 3436) module_exit(ioc_exit);