/*
 * Pressure stall information for CPU, memory and IO
 *
 * Copyright (c) 2018 Facebook, Inc.
 * Author: Johannes Weiner <hannes@cmpxchg.org>
 *
 * Polling support by Suren Baghdasaryan <surenb@google.com>
 * Copyright (c) 2018 Google, Inc.
 *
 * When CPU, memory and IO are contended, tasks experience delays that
 * reduce throughput and introduce latencies into the workload. Memory
 * and IO contention, in addition, can cause a full loss of forward
 * progress in which the CPU goes idle.
 *
 * This code aggregates individual task delays into resource pressure
 * metrics that indicate problems with both workload health and
 * resource utilization.
 *
 * Model
 *
 * The time in which a task can execute on a CPU is our baseline for
 * productivity. Pressure expresses the amount of time in which this
 * potential cannot be realized due to resource contention.
 *
 * This concept of productivity has two components: the workload and
 * the CPU. To measure the impact of pressure on both, we define two
 * contention states for a resource: SOME and FULL.
 *
 * In the SOME state of a given resource, one or more tasks are
 * delayed on that resource. This affects the workload's ability to
 * perform work, but the CPU may still be executing other tasks.
 *
 * In the FULL state of a given resource, all non-idle tasks are
 * delayed on that resource such that nobody is advancing and the CPU
 * goes idle. This leaves both workload and CPU unproductive.
 *
 * (Naturally, the FULL state doesn't exist for the CPU resource.)
 *
 *	SOME = nr_delayed_tasks != 0
 *	FULL = nr_delayed_tasks != 0 && nr_running_tasks == 0
 *
 * The percentage of wallclock time spent in those compound stall
 * states gives pressure numbers between 0 and 100 for each resource,
 * where the SOME percentage indicates workload slowdowns and the FULL
 * percentage indicates reduced CPU utilization:
 *
 *	%SOME = time(SOME) / period
 *	%FULL = time(FULL) / period
 *
 * Multiple CPUs
 *
 * The more tasks and available CPUs there are, the more work can be
 * performed concurrently. This means that the potential that can go
 * unrealized due to resource contention *also* scales with non-idle
 * tasks and CPUs.
 *
 * Consider a scenario where 257 number crunching tasks are trying to
 * run concurrently on 256 CPUs. If we simply aggregated the task
 * states, we would have to conclude a CPU SOME pressure number of
 * 100%, since *somebody* is waiting on a runqueue at all
 * times. However, that is clearly not the amount of contention the
 * workload is experiencing: only one out of 256 possible execution
 * threads will be contended at any given time, or about 0.4%.
 *
 * Conversely, consider a scenario of 4 tasks and 4 CPUs where at any
 * given time *one* of the tasks is delayed due to a lack of memory.
 * Again, looking purely at the task state would yield a memory FULL
 * pressure number of 0%, since *somebody* is always making forward
 * progress. But again this wouldn't capture the amount of execution
 * potential lost, which is 1 out of 4 CPUs, or 25%.
 *
 * To calculate wasted potential (pressure) with multiple processors,
 * we have to base our calculation on the number of non-idle tasks in
 * conjunction with the number of available CPUs, which is the number
 * of potential execution threads. SOME becomes then the proportion of
 * delayed tasks to possible threads, and FULL is the share of possible
 * threads that are unproductive due to delays:
 *
 *	threads = min(nr_nonidle_tasks, nr_cpus)
 *	   SOME = min(nr_delayed_tasks / threads, 1)
 *	   FULL = (threads - min(nr_running_tasks, threads)) / threads
 *
 * For the 257 number crunchers on 256 CPUs, this yields:
 *
 *	threads = min(257, 256)
 *	   SOME = min(1 / 256, 1)             = 0.4%
 *	   FULL = (256 - min(257, 256)) / 256 = 0%
 *
 * For the 1 out of 4 memory-delayed tasks, this yields:
 *
 *	threads = min(4, 4)
 *	   SOME = min(1 / 4, 1)         = 25%
 *	   FULL = (4 - min(3, 4)) / 4   = 25%
 *
 * [ Substitute nr_cpus with 1, and you can see that it's a natural
 *   extension of the single-CPU model. ]
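 *
 * [ Spelling that substitution out (a sketch using the formulas above
 *   with nr_cpus = 1):
 *
 *	threads = min(nr_nonidle_tasks, 1) = 1
 *	   SOME = min(nr_delayed_tasks / 1, 1)
 *	   FULL = (1 - min(nr_running_tasks, 1)) / 1
 *
 *   SOME is nonzero exactly when a task is delayed, and FULL when in
 *   addition nothing is running. A fully idle CPU accrues no non-idle
 *   time and therefore contributes nothing, which recovers the SOME
 *   and FULL definitions given for the single CPU above. ]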
 *
 * Implementation
 *
 * To assess the precise time spent in each such state, we would have
 * to freeze the system on task changes and start/stop the state
 * clocks accordingly. Obviously that doesn't scale in practice.
 *
 * Because the scheduler aims to distribute the compute load evenly
 * among the available CPUs, we can track task state locally to each
 * CPU and, at much lower frequency, extrapolate the global state for
 * the cumulative stall times and the running averages.
 *
 * For each runqueue, we track:
 *
 *	   tSOME[cpu] = time(nr_delayed_tasks[cpu] != 0)
 *	   tFULL[cpu] = time(nr_delayed_tasks[cpu] && !nr_running_tasks[cpu])
 *	tNONIDLE[cpu] = time(nr_nonidle_tasks[cpu] != 0)
 *
 * and then periodically aggregate:
 *
 *	tNONIDLE = sum(tNONIDLE[i])
 *
 *	   tSOME = sum(tSOME[i] * tNONIDLE[i]) / tNONIDLE
 *	   tFULL = sum(tFULL[i] * tNONIDLE[i]) / tNONIDLE
 *
 *	   %SOME = tSOME / period
 *	   %FULL = tFULL / period
 *
 * This gives us an approximation of pressure that is practical
 * cost-wise, yet way more sensitive and accurate than periodic
 * sampling of the aggregate task states would be.
 */

#include "../workqueue_internal.h"
#include <linux/sched/loadavg.h>
#include <linux/seq_file.h>
#include <linux/proc_fs.h>
#include <linux/seqlock.h>
#include <linux/uaccess.h>
#include <linux/cgroup.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/ctype.h>
#include <linux/file.h>
#include <linux/poll.h>
#include <linux/psi.h>
#include "sched.h"

#include <trace/hooks/psi.h>

static int psi_bug __read_mostly;

DEFINE_STATIC_KEY_FALSE(psi_disabled);
DEFINE_STATIC_KEY_TRUE(psi_cgroups_enabled);

#ifdef CONFIG_PSI_DEFAULT_DISABLED
static bool psi_enable;
#else
static bool psi_enable = true;
#endif
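
/*
 * "psi=" on the kernel command line overrides the Kconfig default:
 * booting with psi=1 enables tracking even when
 * CONFIG_PSI_DEFAULT_DISABLED is set, psi=0 disables it.
 */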
static int __init setup_psi(char *str)
{
	return kstrtobool(str, &psi_enable) == 0;
}
__setup("psi=", setup_psi);

/* Running averages - we need to be higher-res than loadavg */
#define PSI_FREQ	(2*HZ+1)	/* 2 sec intervals */
#define EXP_10s		1677		/* 1/exp(2s/10s) as fixed-point */
#define EXP_60s		1981		/* 1/exp(2s/60s) */
#define EXP_300s	2034		/* 1/exp(2s/300s) */
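
/*
 * For reference, these are FIXED_1 / exp(sampling period / averaging
 * window) in the loadavg fixed-point format, with FIXED_1 == 2048:
 *
 *	EXP_10s = 2048 / exp(2/10) ~= 2048 / 1.2214 ~= 1677
 *	EXP_60s = 2048 / exp(2/60) ~= 2048 / 1.0339 ~= 1981
 */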

/* PSI trigger definitions */
#define WINDOW_MIN_US 500000	/* Min window size is 500ms */
#define WINDOW_MAX_US 10000000	/* Max window size is 10s */
#define UPDATES_PER_WINDOW 10	/* 10 updates per window */
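
/*
 * A minimal userspace sketch of the trigger interface these limits
 * apply to (error handling omitted; the path assumes the system-wide
 * /proc/pressure files, cgroup2 exposes the same format in
 * cpu.pressure, memory.pressure and io.pressure):
 *
 *	int fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
 *	struct pollfd pfd = { .fd = fd, .events = POLLPRI };
 *
 *	write(fd, "some 150000 1000000", 19); // 150ms stall in a 1s window
 *	poll(&pfd, 1, -1);                    // POLLPRI fires on the trigger
 */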

/* Sampling frequency in nanoseconds */
static u64 psi_period __read_mostly;

/* System-level pressure and stall tracking */
static DEFINE_PER_CPU(struct psi_group_cpu, system_group_pcpu);
struct psi_group psi_system = {
	.pcpu = &system_group_pcpu,
};

static void psi_avgs_work(struct work_struct *work);

static void poll_timer_fn(struct timer_list *t);

static void group_init(struct psi_group *group)
{
	int cpu;

	for_each_possible_cpu(cpu)
		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
	group->avg_last_update = sched_clock();
	group->avg_next_update = group->avg_last_update + psi_period;
	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
	mutex_init(&group->avgs_lock);
	/* Init trigger-related members */
	atomic_set(&group->poll_scheduled, 0);
	mutex_init(&group->trigger_lock);
	INIT_LIST_HEAD(&group->triggers);
	memset(group->nr_triggers, 0, sizeof(group->nr_triggers));
	group->poll_states = 0;
	group->poll_min_period = U32_MAX;
	memset(group->polling_total, 0, sizeof(group->polling_total));
	group->polling_next_update = ULLONG_MAX;
	group->polling_until = 0;
	init_waitqueue_head(&group->poll_wait);
	timer_setup(&group->poll_timer, poll_timer_fn, 0);
	rcu_assign_pointer(group->poll_task, NULL);
}

void __init psi_init(void)
{
	if (!psi_enable) {
		static_branch_enable(&psi_disabled);
		return;
	}

	if (!cgroup_psi_enabled())
		static_branch_disable(&psi_cgroups_enabled);

	psi_period = jiffies_to_nsecs(PSI_FREQ);
	group_init(&psi_system);
}

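/*
 * Map a CPU's task counts to the PSI state bits. Note that CPU SOME is
 * defined as having more runnable tasks than the one currently on the
 * CPU, i.e. tasks that are waiting for the CPU rather than using it.
 */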
static bool test_state(unsigned int *tasks, enum psi_states state)
{
	switch (state) {
	case PSI_IO_SOME:
		return tasks[NR_IOWAIT];
	case PSI_IO_FULL:
		return tasks[NR_IOWAIT] && !tasks[NR_RUNNING];
	case PSI_MEM_SOME:
		return tasks[NR_MEMSTALL];
	case PSI_MEM_FULL:
		return tasks[NR_MEMSTALL] && !tasks[NR_RUNNING];
	case PSI_CPU_SOME:
		return tasks[NR_RUNNING] > tasks[NR_ONCPU];
	case PSI_NONIDLE:
		return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
			tasks[NR_RUNNING];
	default:
		return false;
	}
}

static void get_recent_times(struct psi_group *group, int cpu,
			     enum psi_aggregators aggregator, u32 *times,
			     u32 *pchanged_states)
{
	struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu);
	u64 now, state_start;
	enum psi_states s;
	unsigned int seq;
	u32 state_mask;

	*pchanged_states = 0;

	/* Snapshot a coherent view of the CPU state */
	do {
		seq = read_seqcount_begin(&groupc->seq);
		now = cpu_clock(cpu);
		memcpy(times, groupc->times, sizeof(groupc->times));
		state_mask = groupc->state_mask;
		state_start = groupc->state_start;
	} while (read_seqcount_retry(&groupc->seq, seq));

	/* Calculate state time deltas against the previous snapshot */
	for (s = 0; s < NR_PSI_STATES; s++) {
		u32 delta;
		/*
		 * In addition to already concluded states, we also
		 * incorporate currently active states on the CPU,
		 * since states may last for many sampling periods.
		 *
		 * This way we keep our delta sampling buckets small
		 * (u32) and our reported pressure close to what's
		 * actually happening.
		 */
		if (state_mask & (1 << s))
			times[s] += now - state_start;

		delta = times[s] - groupc->times_prev[aggregator][s];
		groupc->times_prev[aggregator][s] = times[s];

		times[s] = delta;
		if (delta)
			*pchanged_states |= (1 << s);
	}
}

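/*
 * Fold one sample into the running averages. @time is the stall time
 * observed during the elapsed @period; it is converted to a fixed-point
 * percentage and decayed into the 10s/60s/300s averages the same way
 * the load average is. Periods with no activity are first decayed with
 * zero samples via @missed_periods.
 */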
static void calc_avgs(unsigned long avg[3], int missed_periods,
		      u64 time, u64 period)
{
	unsigned long pct;

	/* Fill in zeroes for periods of no activity */
	if (missed_periods) {
		avg[0] = calc_load_n(avg[0], EXP_10s, 0, missed_periods);
		avg[1] = calc_load_n(avg[1], EXP_60s, 0, missed_periods);
		avg[2] = calc_load_n(avg[2], EXP_300s, 0, missed_periods);
	}

	/* Sample the most recent active period */
	pct = div_u64(time * 100, period);
	pct *= FIXED_1;
	avg[0] = calc_load(avg[0], EXP_10s, pct);
	avg[1] = calc_load(avg[1], EXP_60s, pct);
	avg[2] = calc_load(avg[2], EXP_300s, pct);
}

static void collect_percpu_times(struct psi_group *group,
				 enum psi_aggregators aggregator,
				 u32 *pchanged_states)
{
	u64 deltas[NR_PSI_STATES - 1] = { 0, };
	unsigned long nonidle_total = 0;
	u32 changed_states = 0;
	int cpu;
	int s;

	/*
	 * Collect the per-cpu time buckets and average them into a
	 * single time sample that is normalized to wallclock time.
	 *
	 * For averaging, each CPU is weighted by its non-idle time in
	 * the sampling period. This eliminates artifacts from uneven
	 * loading, or even entirely idle CPUs.
	 */
	for_each_possible_cpu(cpu) {
		u32 times[NR_PSI_STATES];
		u32 nonidle;
		u32 cpu_changed_states;

		get_recent_times(group, cpu, aggregator, times,
				 &cpu_changed_states);
		changed_states |= cpu_changed_states;

		nonidle = nsecs_to_jiffies(times[PSI_NONIDLE]);
		nonidle_total += nonidle;

		for (s = 0; s < PSI_NONIDLE; s++)
			deltas[s] += (u64)times[s] * nonidle;
	}

	/*
	 * Integrate the sample into the running statistics that are
	 * reported to userspace: the cumulative stall times and the
	 * decaying averages.
	 *
	 * Pressure percentages are sampled at PSI_FREQ. We might be
	 * called more often when the user polls more frequently than
	 * that; we might be called less often when there is no task
	 * activity, thus no data, and clock ticks are sporadic. The
	 * below handles both.
	 */

	/* total= */
	for (s = 0; s < NR_PSI_STATES - 1; s++)
		group->total[aggregator][s] +=
			div_u64(deltas[s], max(nonidle_total, 1UL));

	if (pchanged_states)
		*pchanged_states = changed_states;
}

static u64 update_averages(struct psi_group *group, u64 now)
{
	unsigned long missed_periods = 0;
	u64 expires, period;
	u64 avg_next_update;
	int s;

	/* avgX= */
	expires = group->avg_next_update;
	if (now - expires >= psi_period)
		missed_periods = div_u64(now - expires, psi_period);

	/*
	 * The periodic clock tick can get delayed for various
	 * reasons, especially on loaded systems. To avoid clock
	 * drift, we schedule the clock in fixed psi_period intervals.
	 * But the deltas we sample out of the per-cpu buckets above
	 * are based on the actual time elapsing between clock ticks.
	 */
	avg_next_update = expires + ((1 + missed_periods) * psi_period);
	period = now - (group->avg_last_update + (missed_periods * psi_period));
	group->avg_last_update = now;

	for (s = 0; s < NR_PSI_STATES - 1; s++) {
		u32 sample;

		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
		/*
		 * Due to the lockless sampling of the time buckets,
		 * recorded time deltas can slip into the next period,
		 * which under full pressure can result in samples in
		 * excess of the period length.
		 *
		 * We don't want to report nonsensical pressures in
		 * excess of 100%, nor do we want to drop such events
		 * on the floor. Instead we punt any overage into the
		 * future until pressure subsides. By doing this we
		 * don't underreport the occurring pressure curve, we
		 * just report it delayed by one period length.
		 *
		 * The error isn't cumulative. As soon as another
		 * delta slips from a period P to P+1, by definition
		 * it frees up its time T in P.
		 */
		if (sample > period)
			sample = period;
		group->avg_total[s] += sample;
		calc_avgs(group->avg[s], missed_periods, sample, period);
	}

	return avg_next_update;
}

static void psi_avgs_work(struct work_struct *work)
{
	struct delayed_work *dwork;
	struct psi_group *group;
	u32 changed_states;
	bool nonidle;
	u64 now;

	dwork = to_delayed_work(work);
	group = container_of(dwork, struct psi_group, avgs_work);

	mutex_lock(&group->avgs_lock);

	now = sched_clock();

	collect_percpu_times(group, PSI_AVGS, &changed_states);
	nonidle = changed_states & (1 << PSI_NONIDLE);
	/*
	 * If there is task activity, periodically fold the per-cpu
	 * times and feed samples into the running averages. If things
	 * are idle and there is no data to process, stop the clock.
	 * Once restarted, we'll catch up the running averages in one
	 * go - see calc_avgs() and missed_periods.
	 */
	if (now >= group->avg_next_update)
		group->avg_next_update = update_averages(group, now);

	if (nonidle) {
		schedule_delayed_work(dwork, nsecs_to_jiffies(
				group->avg_next_update - now) + 1);
	}

	mutex_unlock(&group->avgs_lock);
}

/* Trigger tracking window manipulations */
static void window_reset(struct psi_window *win, u64 now, u64 value,
			 u64 prev_growth)
{
	win->start_time = now;
	win->start_value = value;
	win->prev_growth = prev_growth;
}

/*
 * PSI growth tracking window update and growth calculation routine.
 *
 * This approximates a sliding tracking window by interpolating
 * partially elapsed windows using historical growth data from the
 * previous intervals. This minimizes memory requirements (by not storing
 * all the intermediate values in the previous window) and simplifies
 * the calculations. It works well because the PSI signal changes only
 * in the positive direction and, over relatively small window sizes,
 * the growth is close to linear.
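 *
 * For instance, with a 1s window that is 400ms into its current period,
 * the reported growth is the growth observed so far in this period plus
 * 60% of the previous window's growth (prev_growth * remaining / size).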
 */
static u64 window_update(struct psi_window *win, u64 now, u64 value)
{
	u64 elapsed;
	u64 growth;

	elapsed = now - win->start_time;
	growth = value - win->start_value;
	/*
	 * After each tracking window passes win->start_value and
	 * win->start_time get reset and win->prev_growth stores
	 * the average per-window growth of the previous window.
	 * win->prev_growth is then used to interpolate additional
	 * growth from the previous window assuming it was linear.
	 */
	if (elapsed > win->size)
		window_reset(win, now, value, growth);
	else {
		u32 remaining;

		remaining = win->size - elapsed;
		growth += div64_u64(win->prev_growth * remaining, win->size);
	}

	return growth;
}

static void init_triggers(struct psi_group *group, u64 now)
{
	struct psi_trigger *t;

	list_for_each_entry(t, &group->triggers, node)
		window_reset(&t->win, now,
				group->total[PSI_POLL][t->state], 0);
	memcpy(group->polling_total, group->total[PSI_POLL],
		   sizeof(group->polling_total));
	group->polling_next_update = now + group->poll_min_period;
}

static u64 update_triggers(struct psi_group *group, u64 now)
{
	struct psi_trigger *t;
	bool new_stall = false;
	u64 *total = group->total[PSI_POLL];

	/*
	 * On subsequent updates, calculate growth deltas and let
	 * watchers know when their specified thresholds are exceeded.
	 */
	list_for_each_entry(t, &group->triggers, node) {
		u64 growth;

		/* Check for stall activity */
		if (group->polling_total[t->state] == total[t->state])
			continue;

		/*
		 * Multiple triggers might be looking at the same state,
		 * remember to update group->polling_total[] once we've
		 * been through all of them. Also remember to extend the
		 * polling time if we see new stall activity.
		 */
		new_stall = true;

		/* Calculate growth since last update */
		growth = window_update(&t->win, now, total[t->state]);
		if (growth < t->threshold)
			continue;

		/* Limit event signaling to once per window */
		if (now < t->last_event_time + t->win.size)
			continue;

		trace_android_vh_psi_event(t);

		/* Generate an event */
		if (cmpxchg(&t->event, 0, 1) == 0)
			wake_up_interruptible(&t->event_wait);
		t->last_event_time = now;
	}

	trace_android_vh_psi_group(group);

	if (new_stall)
		memcpy(group->polling_total, total,
				sizeof(group->polling_total));

	return now + group->poll_min_period;
}

/* Schedule polling if it's not already scheduled or forced. */
static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
				   bool force)
{
	struct task_struct *task;

	/*
	 * atomic_xchg should be called even when !force to provide a
	 * full memory barrier (see the comment inside psi_poll_work).
	 */
	if (atomic_xchg(&group->poll_scheduled, 1) && !force)
		return;

	rcu_read_lock();

	task = rcu_dereference(group->poll_task);
	/*
	 * kworker might be NULL in case psi_trigger_destroy races with
	 * psi_task_change (hotpath) which can't use locks
	 */
	if (likely(task))
		mod_timer(&group->poll_timer, jiffies + delay);
	else
		atomic_set(&group->poll_scheduled, 0);

	rcu_read_unlock();
}

static void psi_poll_work(struct psi_group *group)
{
	bool force_reschedule = false;
	u32 changed_states;
	u64 now;

	mutex_lock(&group->trigger_lock);

	now = sched_clock();

	if (now > group->polling_until) {
		/*
		 * We are either about to start or might stop polling if no
		 * state change was recorded. Resetting poll_scheduled leaves
		 * a small window for psi_group_change to sneak in and schedule
		 * an immediate poll_work before we get to rescheduling. One
		 * potential extra wakeup at the end of the polling window
		 * should be negligible and polling_next_update still keeps
		 * updates correctly on schedule.
		 */
		atomic_set(&group->poll_scheduled, 0);
		/*
		 * A task change can race with the poll worker that is supposed to
		 * report on it. To avoid missing events, ensure ordering between
		 * poll_scheduled and the task state accesses, such that if the poll
		 * worker misses the state update, the task change is guaranteed to
		 * reschedule the poll worker:
		 *
		 * poll worker:
		 *   atomic_set(poll_scheduled, 0)
		 *   smp_mb()
		 *   LOAD states
		 *
		 * task change:
		 *   STORE states
		 *   if atomic_xchg(poll_scheduled, 1) == 0:
		 *     schedule poll worker
		 *
		 * The atomic_xchg() implies a full barrier.
		 */
		smp_mb();
	} else {
		/* Polling window is not over, keep rescheduling */
		force_reschedule = true;
	}


	collect_percpu_times(group, PSI_POLL, &changed_states);

	if (changed_states & group->poll_states) {
		/* Initialize trigger windows when entering polling mode */
		if (now > group->polling_until)
			init_triggers(group, now);

		/*
		 * Keep the monitor active for at least the duration of the
		 * minimum tracking window as long as monitor states are
		 * changing.
		 */
		group->polling_until = now +
			group->poll_min_period * UPDATES_PER_WINDOW;
	}

	if (now > group->polling_until) {
		group->polling_next_update = ULLONG_MAX;
		goto out;
	}

	if (now >= group->polling_next_update)
		group->polling_next_update = update_triggers(group, now);

	psi_schedule_poll_work(group,
		nsecs_to_jiffies(group->polling_next_update - now) + 1,
		force_reschedule);

out:
	mutex_unlock(&group->trigger_lock);
}

static int psi_poll_worker(void *data)
{
	struct psi_group *group = (struct psi_group *)data;

	sched_set_fifo_low(current);

	while (true) {
		wait_event_interruptible(group->poll_wait,
				atomic_cmpxchg(&group->poll_wakeup, 1, 0) ||
				kthread_should_stop());
		if (kthread_should_stop())
			break;

		psi_poll_work(group);
	}
	return 0;
}

static void poll_timer_fn(struct timer_list *t)
{
	struct psi_group *group = from_timer(group, t, poll_timer);

	atomic_set(&group->poll_wakeup, 1);
	wake_up_interruptible(&group->poll_wait);
}

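/*
 * Account the time since the last state change to every state that was
 * active in this CPU's state_mask, then restart the state clock.
 */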
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 697) static void record_times(struct psi_group_cpu *groupc, int cpu,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 698) bool memstall_tick)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 699) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 700) u32 delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 701) u64 now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 702)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 703) now = cpu_clock(cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 704) delta = now - groupc->state_start;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 705) groupc->state_start = now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 706)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 707) if (groupc->state_mask & (1 << PSI_IO_SOME)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 708) groupc->times[PSI_IO_SOME] += delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 709) if (groupc->state_mask & (1 << PSI_IO_FULL))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 710) groupc->times[PSI_IO_FULL] += delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 711) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 712)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 713) if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 714) groupc->times[PSI_MEM_SOME] += delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 715) if (groupc->state_mask & (1 << PSI_MEM_FULL))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 716) groupc->times[PSI_MEM_FULL] += delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 717) else if (memstall_tick) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 718) u32 sample;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 719) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 720) * Since we care about lost potential, a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 721) * memstall is FULL when there are no other
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 722) * working tasks, but also when the CPU is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 723) * actively reclaiming and nothing productive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 724) * could run even if it were runnable.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 725) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 726) * When the timer tick sees a reclaiming CPU,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 727) * regardless of runnable tasks, sample a FULL
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 728) * tick (or less if it hasn't been a full tick
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 729) * since the last state change).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 730) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 731) sample = min(delta, (u32)jiffies_to_nsecs(1));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 732) groupc->times[PSI_MEM_FULL] += sample;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 733) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 734) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 735)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 736) if (groupc->state_mask & (1 << PSI_CPU_SOME))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 737) groupc->times[PSI_CPU_SOME] += delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 738)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 739) if (groupc->state_mask & (1 << PSI_NONIDLE))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 740) groupc->times[PSI_NONIDLE] += delta;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 741) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 742)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 743) static void psi_group_change(struct psi_group *group, int cpu,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 744) unsigned int clear, unsigned int set,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 745) bool wake_clock)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 746) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 747) struct psi_group_cpu *groupc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 748) u32 state_mask = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 749) unsigned int t, m;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 750) enum psi_states s;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 751)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 752) groupc = per_cpu_ptr(group->pcpu, cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 753)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 754) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 755) * First we assess the aggregate resource states this CPU's
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 756) * tasks have been in since the last change, and account any
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 757) * SOME and FULL time these may have resulted in.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 758) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 759) * Then we update the task counts according to the state
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 760) * change requested through the @clear and @set bits.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 761) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 762) write_seqcount_begin(&groupc->seq);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 763)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 764) record_times(groupc, cpu, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 765)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 766) for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 767) if (!(m & (1 << t)))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 768) continue;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 769) if (groupc->tasks[t]) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 770) groupc->tasks[t]--;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 771) } else if (!psi_bug) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 772) printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u] clear=%x set=%x\n",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 773) cpu, t, groupc->tasks[0],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 774) groupc->tasks[1], groupc->tasks[2],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 775) groupc->tasks[3], clear, set);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 776) psi_bug = 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 777) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 778) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 779)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 780) for (t = 0; set; set &= ~(1 << t), t++)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 781) if (set & (1 << t))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 782) groupc->tasks[t]++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 783)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 784) /* Calculate state mask representing active states */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 785) for (s = 0; s < NR_PSI_STATES; s++) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 786) if (test_state(groupc->tasks, s))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 787) state_mask |= (1 << s);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 788) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 789) groupc->state_mask = state_mask;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 790)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 791) write_seqcount_end(&groupc->seq);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 792)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 793) if (state_mask & group->poll_states)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 794) psi_schedule_poll_work(group, 1, false);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 795)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 796) if (wake_clock && !delayed_work_pending(&group->avgs_work))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 797) schedule_delayed_work(&group->avgs_work, PSI_FREQ);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 798) }
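/*
 * Illustration of the state mask computed above, following the
 * SOME/FULL definitions: a CPU whose only task is blocked in memstall
 * ends up with PSI_MEM_SOME, PSI_MEM_FULL and PSI_NONIDLE set; adding
 * one runnable task clears PSI_MEM_FULL while PSI_MEM_SOME remains.
 */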
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 799)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 800) static struct psi_group *iterate_groups(struct task_struct *task, void **iter)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 801) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 802) if (*iter == &psi_system)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 803) return NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 804)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 805) #ifdef CONFIG_CGROUPS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 806) if (static_branch_likely(&psi_cgroups_enabled)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 807) struct cgroup *cgroup = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 808)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 809) if (!*iter)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 810) cgroup = task->cgroups->dfl_cgrp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 811) else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 812) cgroup = cgroup_parent(*iter);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 813)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 814) if (cgroup && cgroup_parent(cgroup)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 815) *iter = cgroup;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 816) return cgroup_psi(cgroup);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 817) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 818) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 819) #endif
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 820) *iter = &psi_system;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 821) return &psi_system;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 822) }
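/*
 * Iteration order sketch: for a task in cgroup /a/b, successive calls
 * return the psi group of b, then of a, then &psi_system, then NULL.
 * The root cgroup is skipped since its pressure is identical to the
 * system-wide psi_system state.
 */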
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 823)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 824) static void psi_flags_change(struct task_struct *task, int clear, int set)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 825) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 826) if (((task->psi_flags & set) ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 827) (task->psi_flags & clear) != clear) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 828) !psi_bug) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 829) printk_deferred(KERN_ERR "psi: inconsistent task state! task=%d:%s cpu=%d psi_flags=%x clear=%x set=%x\n",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 830) task->pid, task->comm, task_cpu(task),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 831) task->psi_flags, clear, set);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 832) psi_bug = 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 833) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 834)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 835) task->psi_flags &= ~clear;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 836) task->psi_flags |= set;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 837) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 838)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 839) void psi_task_change(struct task_struct *task, int clear, int set)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 840) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 841) int cpu = task_cpu(task);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 842) struct psi_group *group;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 843) bool wake_clock = true;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 844) void *iter = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 845)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 846) if (!task->pid)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 847) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 848)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 849) psi_flags_change(task, clear, set);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 850)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 851) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 852) * Periodic aggregation shuts off if there is a period of no
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 853) * task changes, so we wake it back up if necessary. However,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 854) * don't do this if the task change is the aggregation worker
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 855) * itself going to sleep, or we'll ping-pong forever.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 856) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 857) if (unlikely((clear & TSK_RUNNING) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 858) (task->flags & PF_WQ_WORKER) &&
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 859) wq_worker_last_func(task) == psi_avgs_work))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 860) wake_clock = false;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 861)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 862) while ((group = iterate_groups(task, &iter)))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 863) psi_group_change(group, cpu, clear, set, wake_clock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 864) }
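/*
 * Caller sketch (illustrative; the real call sites are the scheduler's
 * psi_enqueue/psi_dequeue hooks): a task voluntarily going to sleep is
 * dequeued with something like
 *
 *	psi_task_change(p, TSK_RUNNING | TSK_ONCPU,
 *			p->in_iowait ? TSK_IOWAIT : 0);
 *
 * dropping its running/oncpu contribution and, where applicable,
 * adding it to the iowait count in each group along its cgroup path.
 */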
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 865)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 866) void psi_task_switch(struct task_struct *prev, struct task_struct *next,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 867) bool sleep)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 868) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 869) struct psi_group *group, *common = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 870) int cpu = task_cpu(prev);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 871) void *iter;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 872)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 873) if (next->pid) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 874) psi_flags_change(next, 0, TSK_ONCPU);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 875) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 876) * When moving state between tasks, the group that
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 877) * contains them both does not change: we can stop
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 878) * updating the tree once we reach the first common
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 879)  * ancestor. Iterate @next's ancestors until we find
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 880)  * the group in which @prev is still accounted ONCPU.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 881) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 882) iter = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 883) while ((group = iterate_groups(next, &iter))) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 884) if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 885) common = group;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 886) break;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 887) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 888)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 889) psi_group_change(group, cpu, 0, TSK_ONCPU, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 890) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 891) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 892)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 893) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 894) * If this is a voluntary sleep, dequeue will have taken care
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 895) * of the outgoing TSK_ONCPU alongside TSK_RUNNING already. We
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 896) * only need to deal with it during preemption.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 897) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 898) if (sleep)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 899) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 900)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 901) if (prev->pid) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 902) psi_flags_change(prev, TSK_ONCPU, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 903)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 904) iter = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 905) while ((group = iterate_groups(prev, &iter)) && group != common)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 906) psi_group_change(group, cpu, TSK_ONCPU, 0, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 907) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 908) }
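/*
 * Example of the common-ancestor cutoff: if @next preempts @prev, with
 * @prev in cgroup /p/a and @next in /p/b, the first loop sets
 * TSK_ONCPU for b and stops at /p, where @prev is still accounted
 * ONCPU; the second loop then clears TSK_ONCPU for a only, leaving /p
 * and the system group untouched since their ONCPU state is unchanged.
 */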
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 909)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 910) void psi_memstall_tick(struct task_struct *task, int cpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 911) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 912) struct psi_group *group;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 913) void *iter = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 914)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 915) while ((group = iterate_groups(task, &iter))) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 916) struct psi_group_cpu *groupc;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 917)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 918) groupc = per_cpu_ptr(group->pcpu, cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 919) write_seqcount_begin(&groupc->seq);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 920) record_times(groupc, cpu, true);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 921) write_seqcount_end(&groupc->seq);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 922) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 923) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 924)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 925) /**
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 926) * psi_memstall_enter - mark the beginning of a memory stall section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 927) * @flags: flags to handle nested sections
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 928) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 929) * Marks the calling task as being stalled due to a lack of memory,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 930) * such as waiting for a refault or performing reclaim.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 931) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 932) void psi_memstall_enter(unsigned long *flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 933) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 934) struct rq_flags rf;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 935) struct rq *rq;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 936)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 937) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 938) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 939)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 940) *flags = current->in_memstall;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 941) if (*flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 942) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 943) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 944) * in_memstall setting & accounting needs to be atomic wrt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 945) * changes to the task's scheduling state, otherwise we can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 946) * race with CPU migration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 947) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 948) rq = this_rq_lock_irq(&rf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 949)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 950) current->in_memstall = 1;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 951) psi_task_change(current, 0, TSK_MEMSTALL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 952)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 953) rq_unlock_irq(rq, &rf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 954) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 955)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 956) /**
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 957)  * psi_memstall_leave - mark the end of a memory stall section
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 958)  * @flags: flags to handle nested memory stall sections
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 959) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 960) * Marks the calling task as no longer stalled due to lack of memory.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 961) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 962) void psi_memstall_leave(unsigned long *flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 963) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 964) struct rq_flags rf;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 965) struct rq *rq;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 966)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 967) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 968) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 969)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 970) if (*flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 971) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 972) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 973) * in_memstall clearing & accounting needs to be atomic wrt
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 974) * changes to the task's scheduling state, otherwise we could
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 975) * race with CPU migration.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 976) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 977) rq = this_rq_lock_irq(&rf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 978)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 979) current->in_memstall = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 980) psi_task_change(current, TSK_MEMSTALL, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 981)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 982) rq_unlock_irq(rq, &rf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 983) }
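/*
 * Usage sketch (callers live elsewhere, e.g. the reclaim and refault
 * paths): bracket a stalled section with enter/leave and let @flags
 * take care of nesting:
 *
 *	unsigned long pflags;
 *
 *	psi_memstall_enter(&pflags);
 *	... wait for the refault or perform direct reclaim ...
 *	psi_memstall_leave(&pflags);
 */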
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 984)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 985) #ifdef CONFIG_CGROUPS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 986) int psi_cgroup_alloc(struct cgroup *cgroup)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 987) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 988) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 989) return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 990)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 991) cgroup->psi.pcpu = alloc_percpu(struct psi_group_cpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 992) if (!cgroup->psi.pcpu)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 993) return -ENOMEM;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 994) group_init(&cgroup->psi);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 995) return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 996) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 997)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 998) void psi_cgroup_free(struct cgroup *cgroup)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 999) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1000) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1001) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1002)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1003) cancel_delayed_work_sync(&cgroup->psi.avgs_work);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1004) free_percpu(cgroup->psi.pcpu);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) /* All triggers must be removed by now */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) WARN_ONCE(cgroup->psi.poll_states, "psi: trigger leak\n");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009) /**
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010) * cgroup_move_task - move task to a different cgroup
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011) * @task: the task
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) * @to: the target css_set
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1013) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1014) * Move task to a new cgroup and safely migrate its associated stall
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1015) * state between the different groups.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1016) *
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1017) * This function acquires the task's rq lock to lock out concurrent
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) * changes to the task's scheduling state and - in case the task is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019) * running - concurrent changes to its stall state.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) void cgroup_move_task(struct task_struct *task, struct css_set *to)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1023) unsigned int task_flags = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1024) struct rq_flags rf;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1025) struct rq *rq;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1026)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1027) if (static_branch_likely(&psi_disabled)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1028) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) * Lame to do this here, but the scheduler cannot be locked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030) * from the outside, so we move cgroups from inside sched/.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032) rcu_assign_pointer(task->cgroups, to);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036) rq = task_rq_lock(task, &rf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038) if (task_on_rq_queued(task)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039) task_flags = TSK_RUNNING;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040) if (task_current(rq, task))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) task_flags |= TSK_ONCPU;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042) } else if (task->in_iowait)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043) task_flags = TSK_IOWAIT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) if (task->in_memstall)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1046) task_flags |= TSK_MEMSTALL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048) if (task_flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049) psi_task_change(task, task_flags, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051) /* See comment above */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052) rcu_assign_pointer(task->cgroups, to);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054) if (task_flags)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055) psi_task_change(task, 0, task_flags);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) task_rq_unlock(rq, task, &rf);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1058) }
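/*
 * Example for the flag shuffle above: a currently-running task that is
 * also in memstall is moved with task_flags = TSK_RUNNING | TSK_ONCPU |
 * TSK_MEMSTALL; those flags are cleared in the groups of the old
 * css_set, the cgroup pointer is switched, and the same flags are set
 * again in the groups of the new css_set.
 */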
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059) #endif /* CONFIG_CGROUPS */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061) int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063) int full;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064) u64 now;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1066) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1067) return -EOPNOTSUPP;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1068)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) /* Update averages before reporting them */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070) mutex_lock(&group->avgs_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071) now = sched_clock();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072) collect_percpu_times(group, PSI_AVGS, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073) if (now >= group->avg_next_update)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074) group->avg_next_update = update_averages(group, now);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075) mutex_unlock(&group->avgs_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077) for (full = 0; full < 2 - (res == PSI_CPU); full++) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) unsigned long avg[3];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079) u64 total;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080) int w;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1082) for (w = 0; w < 3; w++)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) avg[w] = group->avg[res * 2 + full][w];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) total = div_u64(group->total[PSI_AVGS][res * 2 + full],
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085) NSEC_PER_USEC);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) seq_printf(m, "%s avg10=%lu.%02lu avg60=%lu.%02lu avg300=%lu.%02lu total=%llu\n",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1088) full ? "full" : "some",
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1089) LOAD_INT(avg[0]), LOAD_FRAC(avg[0]),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1090) LOAD_INT(avg[1]), LOAD_FRAC(avg[1]),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) LOAD_INT(avg[2]), LOAD_FRAC(avg[2]),
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) total);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) }
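/*
 * Example /proc/pressure output produced by psi_show() (illustrative
 * numbers):
 *
 *	some avg10=0.32 avg60=0.18 avg300=0.02 total=2501487
 *	full avg10=0.00 avg60=0.01 avg300=0.00 total=190654
 *
 * The "full" line is omitted for the CPU resource, matching the
 * 2 - (res == PSI_CPU) loop bound above.
 */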
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098) static int psi_io_show(struct seq_file *m, void *v)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100) return psi_show(m, &psi_system, PSI_IO);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103) static int psi_memory_show(struct seq_file *m, void *v)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) return psi_show(m, &psi_system, PSI_MEM);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108) static int psi_cpu_show(struct seq_file *m, void *v)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) return psi_show(m, &psi_system, PSI_CPU);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) static int psi_io_open(struct inode *inode, struct file *file)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) return single_open(file, psi_io_show, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118) static int psi_memory_open(struct inode *inode, struct file *file)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) return single_open(file, psi_memory_show, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1121) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) static int psi_cpu_open(struct inode *inode, struct file *file)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125) return single_open(file, psi_cpu_show, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128) struct psi_trigger *psi_trigger_create(struct psi_group *group,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129) char *buf, size_t nbytes, enum psi_res res)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1131) struct psi_trigger *t;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) enum psi_states state;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) u32 threshold_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1134) u32 window_us;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) return ERR_PTR(-EOPNOTSUPP);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) state = PSI_IO_SOME + res * 2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141) else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142) state = PSI_IO_FULL + res * 2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143) else
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) return ERR_PTR(-EINVAL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1145)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1146) if (state >= PSI_NONIDLE)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) return ERR_PTR(-EINVAL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149) if (window_us < WINDOW_MIN_US ||
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150) window_us > WINDOW_MAX_US)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) return ERR_PTR(-EINVAL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) /* Check threshold */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1154) if (threshold_us == 0 || threshold_us > window_us)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1155) return ERR_PTR(-EINVAL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1156)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1157) t = kmalloc(sizeof(*t), GFP_KERNEL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1158) if (!t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1159) return ERR_PTR(-ENOMEM);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1160)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1161) t->group = group;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1162) t->state = state;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1163) t->threshold = threshold_us * NSEC_PER_USEC;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1164) t->win.size = window_us * NSEC_PER_USEC;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1165) window_reset(&t->win, 0, 0, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1166)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1167) t->event = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1168) t->last_event_time = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1169) init_waitqueue_head(&t->event_wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1170)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1171) mutex_lock(&group->trigger_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1172)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1173) if (!rcu_access_pointer(group->poll_task)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1174) struct task_struct *task;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1175)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1176) task = kthread_create(psi_poll_worker, group, "psimon");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1177) if (IS_ERR(task)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1178) kfree(t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1179) mutex_unlock(&group->trigger_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1180) return ERR_CAST(task);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1181) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1182) atomic_set(&group->poll_wakeup, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1183) wake_up_process(task);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1184) rcu_assign_pointer(group->poll_task, task);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1185) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1186)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1187) list_add(&t->node, &group->triggers);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1188) group->poll_min_period = min(group->poll_min_period,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1189) div_u64(t->win.size, UPDATES_PER_WINDOW));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1190) group->nr_triggers[t->state]++;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1191) group->poll_states |= (1 << t->state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1192)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1193) mutex_unlock(&group->trigger_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1194)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1195) return t;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1196) }
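/*
 * Trigger format example: writing "some 150000 1000000" to a pressure
 * file requests an event whenever 150ms of SOME stall accumulates
 * within any 1s window; "full 100000 1000000" does the same for FULL
 * stall. The window must lie within [WINDOW_MIN_US, WINDOW_MAX_US] and
 * the threshold may not be zero or exceed the window, as checked above.
 */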
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1197)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1198) void psi_trigger_destroy(struct psi_trigger *t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1199) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1200) struct psi_group *group;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1201) struct task_struct *task_to_destroy = NULL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1202)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1203) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1204) * We do not check psi_disabled since it might have been disabled after
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1205) * the trigger got created.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1206) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1207) if (!t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1208) return;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1209)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1210) group = t->group;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1211) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1212)  * Wake up waiters to stop polling. This can happen if the cgroup is
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1213)  * deleted from under a polling process.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1214) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1215) wake_up_interruptible(&t->event_wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1216)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1217) mutex_lock(&group->trigger_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1218)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1219) if (!list_empty(&t->node)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1220) struct psi_trigger *tmp;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1221) u64 period = ULLONG_MAX;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1222)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1223) list_del(&t->node);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1224) group->nr_triggers[t->state]--;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1225) if (!group->nr_triggers[t->state])
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1226) group->poll_states &= ~(1 << t->state);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1227) /* reset min update period for the remaining triggers */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1228) list_for_each_entry(tmp, &group->triggers, node)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1229) period = min(period, div_u64(tmp->win.size,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1230) UPDATES_PER_WINDOW));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1231) group->poll_min_period = period;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1232) /* Destroy poll_task when the last trigger is destroyed */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1233) if (group->poll_states == 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1234) group->polling_until = 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1235) task_to_destroy = rcu_dereference_protected(
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1236) group->poll_task,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1237) lockdep_is_held(&group->trigger_lock));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1238) rcu_assign_pointer(group->poll_task, NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1239) del_timer(&group->poll_timer);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1240) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1241) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1242)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1243) mutex_unlock(&group->trigger_lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1244)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1245) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1246) * Wait for psi_schedule_poll_work RCU to complete its read-side
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1247) * critical section before destroying the trigger and optionally the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1248) * poll_task.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1249) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1250) synchronize_rcu();
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1251) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1252) * Stop kthread 'psimon' after releasing trigger_lock to prevent a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1253) * deadlock while waiting for psi_poll_work to acquire trigger_lock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1254) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1255) if (task_to_destroy) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1256) /*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1257) * After the RCU grace period has expired, the worker
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1258) * can no longer be found through group->poll_task.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1259) */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1260) kthread_stop(task_to_destroy);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1261) atomic_set(&group->poll_scheduled, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1262) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1263) kfree(t);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1264) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1265)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1266) __poll_t psi_trigger_poll(void **trigger_ptr,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1267) struct file *file, poll_table *wait)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1268) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1269) __poll_t ret = DEFAULT_POLLMASK;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1270) struct psi_trigger *t;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1271)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1272) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1273) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1274)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1275) t = smp_load_acquire(trigger_ptr);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1276) if (!t)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1277) return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1278)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1279) poll_wait(file, &t->event_wait, wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1280)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1281) if (cmpxchg(&t->event, 1, 0) == 1)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1282) ret |= EPOLLPRI;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1283)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1284) return ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1285) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1286)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1287) static ssize_t psi_write(struct file *file, const char __user *user_buf,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1288) size_t nbytes, enum psi_res res)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1289) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1290) char buf[32];
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1291) size_t buf_size;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1292) struct seq_file *seq;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1293) struct psi_trigger *new;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1294)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1295) if (static_branch_likely(&psi_disabled))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1296) return -EOPNOTSUPP;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1297)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1298) if (!nbytes)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1299) return -EINVAL;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1300)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1301) buf_size = min(nbytes, sizeof(buf));
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1302) if (copy_from_user(buf, user_buf, buf_size))
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1303) return -EFAULT;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1304)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1305) buf[buf_size - 1] = '\0';
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1306)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1307) seq = file->private_data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1308)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1309) /* Take seq->lock to protect seq->private from concurrent writes */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1310) mutex_lock(&seq->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1311)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1312) /* Allow only one trigger per file descriptor */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1313) if (seq->private) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1314) mutex_unlock(&seq->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1315) return -EBUSY;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1316) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1317)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1318) new = psi_trigger_create(&psi_system, buf, nbytes, res);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1319) if (IS_ERR(new)) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1320) mutex_unlock(&seq->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1321) return PTR_ERR(new);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1322) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1323)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1324) smp_store_release(&seq->private, new);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1325) mutex_unlock(&seq->lock);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1326)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1327) return nbytes;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1328) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1329)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1330) static ssize_t psi_io_write(struct file *file, const char __user *user_buf,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1331) size_t nbytes, loff_t *ppos)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1332) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1333) return psi_write(file, user_buf, nbytes, PSI_IO);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1334) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1335)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1336) static ssize_t psi_memory_write(struct file *file, const char __user *user_buf,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1337) size_t nbytes, loff_t *ppos)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1338) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1339) return psi_write(file, user_buf, nbytes, PSI_MEM);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1340) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1341)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1342) static ssize_t psi_cpu_write(struct file *file, const char __user *user_buf,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1343) size_t nbytes, loff_t *ppos)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1344) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1345) return psi_write(file, user_buf, nbytes, PSI_CPU);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1346) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1347)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1348) static __poll_t psi_fop_poll(struct file *file, poll_table *wait)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1349) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1350) struct seq_file *seq = file->private_data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1351)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1352) return psi_trigger_poll(&seq->private, file, wait);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1353) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1354)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1355) static int psi_fop_release(struct inode *inode, struct file *file)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1356) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1357) struct seq_file *seq = file->private_data;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1358)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1359) psi_trigger_destroy(seq->private);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1360) return single_release(inode, file);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1361) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1362)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1363) static const struct proc_ops psi_io_proc_ops = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1364) .proc_open = psi_io_open,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1365) .proc_read = seq_read,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1366) .proc_lseek = seq_lseek,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1367) .proc_write = psi_io_write,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1368) .proc_poll = psi_fop_poll,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1369) .proc_release = psi_fop_release,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1370) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1371)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1372) static const struct proc_ops psi_memory_proc_ops = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1373) .proc_open = psi_memory_open,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1374) .proc_read = seq_read,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1375) .proc_lseek = seq_lseek,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1376) .proc_write = psi_memory_write,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1377) .proc_poll = psi_fop_poll,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1378) .proc_release = psi_fop_release,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1379) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1380)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1381) static const struct proc_ops psi_cpu_proc_ops = {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1382) .proc_open = psi_cpu_open,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1383) .proc_read = seq_read,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1384) .proc_lseek = seq_lseek,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1385) .proc_write = psi_cpu_write,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1386) .proc_poll = psi_fop_poll,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1387) .proc_release = psi_fop_release,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1388) };
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1389)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1390) static int __init psi_proc_init(void)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1391) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1392) if (psi_enable) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1393) proc_mkdir("pressure", NULL);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1394) proc_create("pressure/io", 0, NULL, &psi_io_proc_ops);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1395) proc_create("pressure/memory", 0, NULL, &psi_memory_proc_ops);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1396) proc_create("pressure/cpu", 0, NULL, &psi_cpu_proc_ops);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1397) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1398) return 0;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1399) }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1400) module_init(psi_proc_init);
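/*
 * Userspace consumer sketch (not part of this file; shown as a comment
 * for context, using the documented pressure-file interface created by
 * psi_proc_init() above):
 *
 *	#include <fcntl.h>
 *	#include <poll.h>
 *	#include <stdio.h>
 *	#include <string.h>
 *	#include <unistd.h>
 *
 *	int main(void)
 *	{
 *		const char trig[] = "some 150000 1000000";
 *		struct pollfd pfd;
 *		int fd;
 *
 *		fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
 *		if (fd < 0)
 *			return 1;
 *		if (write(fd, trig, strlen(trig) + 1) < 0) {
 *			close(fd);
 *			return 1;
 *		}
 *		pfd.fd = fd;
 *		pfd.events = POLLPRI;
 *		while (poll(&pfd, 1, -1) > 0) {
 *			if (pfd.revents & POLLPRI)
 *				printf("memory pressure threshold crossed\n");
 *		}
 *		close(fd);
 *		return 0;
 *	}
 */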