.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===========================================
User Interface for Resource Control feature
===========================================

:Copyright: |copy| 2016 Intel Corporation
:Authors: - Fenghua Yu <fenghua.yu@intel.com>
          - Tony Luck <tony.luck@intel.com>
          - Vikas Shivappa <vikas.shivappa@intel.com>


Intel refers to this feature as Intel Resource Director Technology (Intel(R) RDT).
AMD refers to this feature as AMD Platform Quality of Service (AMD QoS).

This feature is enabled by the CONFIG_X86_CPU_RESCTRL kernel configuration
option, and its presence is indicated by the following x86 /proc/cpuinfo
flag bits:

=============================================	================================
RDT (Resource Director Technology) Allocation	"rdt_a"
CAT (Cache Allocation Technology)		"cat_l3", "cat_l2"
CDP (Code and Data Prioritization)		"cdp_l3", "cdp_l2"
CQM (Cache QoS Monitoring)			"cqm_llc", "cqm_occup_llc"
MBM (Memory Bandwidth Monitoring)		"cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation)		"mba"
=============================================	================================

To use the feature mount the file system::

 # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl

mount options are:

"cdp":
	Enable code/data prioritization in L3 cache allocations.
"cdpl2":
	Enable code/data prioritization in L2 cache allocations.
"mba_MBps":
	Enable the MBA Software Controller (mba_sc) to specify MBA
	bandwidth in MBps.

L2 and L3 CDP are controlled separately.
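
For example, to enable L3 code/data prioritization at mount time::

  # mount -t resctrl resctrl -o cdp /sys/fs/resctrl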

RDT features are orthogonal. A particular system may support only
monitoring, only control, or both monitoring and control.  Cache
pseudo-locking is a unique way of using cache control to "pin" or
"lock" data in the cache. Details can be found in
"Cache Pseudo-Locking".

The mount succeeds if either allocation or monitoring is present, but
only those files and directories supported by the system will be created.
For more details on the behavior of the interface during monitoring
and allocation, see the "Resource alloc and monitor groups" section.

Info directory
==============

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
names reflect the resource names.

Each subdirectory contains the following files with respect to
allocation:

The cache resource (L3/L2) subdirectory contains the following files
related to allocation:

"num_closids":
		The number of CLOSIDs which are valid for this
		resource. The kernel uses the smallest number of
		CLOSIDs of all enabled resources as the limit.
"cbm_mask":
		The bitmask which is valid for this resource.
		This mask is equivalent to 100%.
"min_cbm_bits":
		The minimum number of consecutive bits which
		must be set when writing a mask.

"shareable_bits":
		Bitmask of shareable resource with other executing
		entities (e.g. I/O). The user can use this when
		setting up exclusive cache partitions. Note that
		some platforms support devices that have their
		own settings for cache use which can over-ride
		these bits.
"bit_usage":
		Annotated capacity bitmasks showing how all
		instances of the resource are used. The legend is:

			"0":
			      Corresponding region is unused. When the system's
			      resources have been allocated and a "0" is found
			      in "bit_usage" it is a sign that resources are
			      wasted.

			"H":
			      Corresponding region is used by hardware only
			      but available for software use. If a resource
			      has bits set in "shareable_bits" but not all
			      of these bits appear in the resource groups'
			      schematas then the bits appearing in
			      "shareable_bits" but in no resource group will
			      be marked as "H".
			"X":
			      Corresponding region is available for sharing and
			      used by hardware and software. These are the
			      bits that appear in "shareable_bits" as
			      well as a resource group's allocation.
			"S":
			      Corresponding region is used by software
			      and available for sharing.
			"E":
			      Corresponding region is used exclusively by
			      one resource group. No sharing allowed.
			"P":
			      Corresponding region is pseudo-locked. No
			      sharing allowed.

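For example, on a system with a 20-bit L3 capacity bitmask these files
might read as follows (the values are illustrative and vary by CPU
model)::

  # cat /sys/fs/resctrl/info/L3/num_closids
  16
  # cat /sys/fs/resctrl/info/L3/cbm_mask
  fffff
  # cat /sys/fs/resctrl/info/L3/min_cbm_bits
  1
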
The memory bandwidth (MB) subdirectory contains the following files
with respect to allocation:

"min_bandwidth":
		The minimum memory bandwidth percentage which
		the user can request.

"bandwidth_gran":
		The granularity in which the memory bandwidth
		percentage is allocated. The allocated
		b/w percentage is rounded off to the next
		control step available on the hardware. The
		available bandwidth control steps are:
		min_bandwidth + N * bandwidth_gran.

"delay_linear":
		Indicates if the delay scale is linear or
		non-linear. This field is purely informational.

"thread_throttle_mode":
		Indicator on Intel systems of how tasks running on threads
		of a physical core are throttled in cases where they
		request different memory bandwidth percentages:

		"max":
			the smallest percentage is applied
			to all threads
		"per-thread":
			bandwidth percentages are directly applied to
			the threads running on the core

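For example, with a "min_bandwidth" of 10 and a "bandwidth_gran" of 10
the valid control steps are 10, 20, ..., 100. A request for an
intermediate value such as::

  # echo "MB:0=35" > schemata

would be rounded to the next available control step, here 40.
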
If RDT monitoring is available there will be an "L3_MON" directory
with the following files:

"num_rmids":
		The number of RMIDs available. This is the
		upper bound for how many "CTRL_MON" + "MON"
		groups can be created.

"mon_features":
		Lists the monitoring events if
		monitoring is enabled for the resource.

"max_threshold_occupancy":
		Read/write file that provides the largest value (in
		bytes) at which a previously used LLC_occupancy
		counter can be considered for re-use.

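For example, listing the available monitoring events (output is
illustrative)::

  # cat /sys/fs/resctrl/info/L3_MON/mon_features
  llc_occupancy
  mbm_total_bytes
  mbm_local_bytes
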
Finally, in the top level of the "info" directory there is a file
named "last_cmd_status". This is reset with every "command" issued
via the file system (making new directories or writing to any of the
control files). If the command was successful, it will read as "ok".
If the command failed, it will provide more information than can be
conveyed in the error returns from file operations. E.g.
::

	# echo L3:0=f7 > schemata
	bash: echo: write error: Invalid argument
	# cat info/last_cmd_status
	mask f7 has non-consecutive 1-bits

Resource alloc and monitor groups
=================================

Resource groups are represented as directories in the resctrl file
system.  The default group is the root directory which, immediately
after mounting, owns all the tasks and cpus in the system and can make
full use of all resources.

On a system with RDT control features additional directories can be
created in the root directory that specify different amounts of each
resource (see "schemata" below). The root and these additional top level
directories are referred to as "CTRL_MON" groups below.

On a system with RDT monitoring the root directory and other top level
directories contain a directory named "mon_groups" in which additional
directories can be created to monitor subsets of tasks in the CTRL_MON
group that is their ancestor. These are called "MON" groups in the rest
of this document.

Removing a directory will move all tasks and cpus owned by the group it
represents to the parent. Removing one of the created CTRL_MON groups
will automatically remove all MON groups below it.

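For example, a CTRL_MON group and a MON group below it are created by
simply making directories (the names "p0" and "m0" are arbitrary)::

  # mkdir /sys/fs/resctrl/p0
  # mkdir /sys/fs/resctrl/p0/mon_groups/m0
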
All groups contain the following files:

"tasks":
	Reading this file shows the list of all tasks that belong to
	this group. Writing a task id to the file will add that task to
	the group. If the group is a CTRL_MON group the task is removed from
	whichever previous CTRL_MON group owned the task and also from
	any MON group that owned the task. If the group is a MON group,
	then the task must already belong to the CTRL_MON parent of this
	group. The task is removed from any previous MON group.

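For example, to move the task with PID 1234 into group "p0" (the PID is
illustrative)::

  # echo 1234 > /sys/fs/resctrl/p0/tasks
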
"cpus":
	Reading this file shows a bitmask of the logical CPUs owned by
	this group. Writing a mask to this file will add and remove
	CPUs to/from this group. As with the tasks file, a hierarchy is
	maintained where MON groups may only include CPUs owned by the
	parent CTRL_MON group.
	When the resource group is in pseudo-locked mode this file will
	only be readable, reflecting the CPUs associated with the
	pseudo-locked region.

"cpus_list":
	Just like "cpus", only using ranges of CPUs instead of bitmasks.

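For example, to give CPUs 0 and 1 to group "p0" via either interface::

  # echo 3 > /sys/fs/resctrl/p0/cpus
  # echo 0-1 > /sys/fs/resctrl/p0/cpus_list
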
When control is enabled all CTRL_MON groups will also contain:

"schemata":
	A list of all the resources available to this group.
	Each resource has its own line and format - see below for details.

"size":
	Mirrors the display of the "schemata" file to display the size in
	bytes of each allocation instead of the bits representing the
	allocation.

"mode":
	The "mode" of the resource group dictates the sharing of its
	allocations. A "shareable" resource group allows sharing of its
	allocations while an "exclusive" resource group does not. A
	cache pseudo-locked region is created by first writing
	"pseudo-locksetup" to the "mode" file before writing the cache
	pseudo-locked region's schemata to the resource group's "schemata"
	file. On successful pseudo-locked region creation the mode will
	automatically change to "pseudo-locked".

When monitoring is enabled all MON groups will also contain:

"mon_data":
	This contains a set of files organized by L3 domain and by
	RDT event. E.g. on a system with two L3 domains there will
	be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
	directories has one file per event (e.g. "llc_occupancy",
	"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
	files provide a readout of the current value of the event for
	all tasks in the group. In CTRL_MON groups these files provide
	the sum for all tasks in the CTRL_MON group and all tasks in
	MON groups. Please see the example section for more details on usage.

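For example, reading the current LLC occupancy (in bytes) of all tasks
in group "p0" on L3 domain 0 (the value shown is illustrative)::

  # cat /sys/fs/resctrl/p0/mon_data/mon_L3_00/llc_occupancy
  3145728
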
Resource allocation rules
-------------------------

When a task is running the following rules define which resources are
available to it:

1) If the task is a member of a non-default group, then the schemata
   for that group is used.

2) Else if the task belongs to the default group, but is running on a
   CPU that is assigned to some specific group, then the schemata for the
   CPU's group is used.

3) Otherwise the schemata for the default group is used.

Resource monitoring rules
-------------------------
1) If a task is a member of a MON group, or of a non-default CTRL_MON
   group, then RDT events for the task will be reported in that group.

2) If a task is a member of the default CTRL_MON group, but is running
   on a CPU that is assigned to some specific group, then the RDT events
   for the task will be reported in that group.

3) Otherwise RDT events for the task will be reported in the root level
   "mon_data" group.


Notes on cache occupancy monitoring and control
===============================================
When moving a task from one group to another you should remember that
this only affects *new* cache allocations by the task. E.g. you may have
a task in a monitor group showing 3 MB of cache occupancy. If you move
it to a new group and immediately check the occupancy of the old and new
groups you will likely see that the old group is still showing 3 MB and
the new group zero. When the task accesses locations still in cache from
before the move, the h/w does not update any counters. On a busy system
you will likely see the occupancy in the old group go down as cache lines
are evicted and re-used while the occupancy in the new group rises as
the task accesses memory and loads into the cache are counted based on
membership in the new group.

The same applies to cache allocation control. Moving a task to a group
with a smaller cache partition will not evict any cache lines. The
process may continue to use them from the old partition.

Hardware uses a CLOSid (Class of service ID) and an RMID (Resource
monitoring ID) to identify a control group and a monitoring group
respectively. Each of the resource groups is mapped to these IDs based
on the kind of group. The number of CLOSids and RMIDs is limited by the
hardware, hence the creation of a "CTRL_MON" directory may fail if we
run out of either CLOSIDs or RMIDs, and creation of a "MON" group may
fail if we run out of RMIDs.

max_threshold_occupancy - generic concepts
------------------------------------------

Note that an RMID once freed may not be immediately available for use as
the RMID is still tagged to the cache lines of its previous user. Hence
such RMIDs are placed on a limbo list and periodically checked to see
whether their cache occupancy has gone down. If at some point the system
has many limbo RMIDs that are not yet ready to be used, the user may see
an -EBUSY error during mkdir.

max_threshold_occupancy is a user-configurable value to determine the
occupancy at which an RMID can be freed.

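For example, to allow an RMID to be recycled once its measured occupancy
drops below 15 MB (the value is illustrative)::

  # echo 15728640 > /sys/fs/resctrl/info/L3_MON/max_threshold_occupancy
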
Schemata files - general concepts
---------------------------------
Each line in the file describes one resource. The line starts with
the name of the resource, followed by specific values to be applied
in each of the instances of that resource on the system.

Cache IDs
---------
On current generation systems there is one L3 cache per socket and L2
caches are generally just shared by the hyperthreads on a core, but this
isn't an architectural requirement. We could have multiple separate L3
caches on a socket, or multiple cores could share an L2 cache. So instead
of using "socket" or "core" to define the set of logical cpus sharing
a resource we use a "Cache ID". At a given cache level this will be a
unique number across the whole system (but it isn't guaranteed to be a
contiguous sequence, there may be gaps).  To find the ID for each logical
CPU look in /sys/devices/system/cpu/cpu*/cache/index*/id

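For example, the L3 cache ID of CPU 0 can typically be read as follows
(on most x86 systems "index3" corresponds to the L3 cache; the output is
illustrative)::

  # cat /sys/devices/system/cpu/cpu0/cache/index3/id
  0
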
Cache Bit Masks (CBM)
---------------------
For cache resources we describe the portion of the cache that is available
for allocation using a bitmask. The maximum value of the mask is defined
by each cpu model (and may be different for different cache levels). It
is found using CPUID, but is also provided in the "info" directory of
the resctrl file system in "info/{resource}/cbm_mask". Intel hardware
requires that these masks have all the '1' bits in a contiguous block. So
0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

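A sketch of such a four-way partitioning into groups "p0" through "p3"
on cache ID 0 (the group names are arbitrary)::

  # mkdir p0 p1 p2 p3
  # echo "L3:0=1f" > p0/schemata
  # echo "L3:0=3e0" > p1/schemata
  # echo "L3:0=7c00" > p2/schemata
  # echo "L3:0=f8000" > p3/schemata
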
Memory bandwidth Allocation and monitoring
==========================================

For the memory bandwidth resource, by default the user controls the
resource by indicating the percentage of total memory bandwidth.

The minimum bandwidth percentage value for each cpu model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the cpu model and can
be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are rounded
to the next control step available on the hardware.

The bandwidth throttling is a core specific mechanism on some Intel
SKUs. Using a high bandwidth and a low bandwidth setting on two threads
sharing a core may result in both threads being throttled to use the
low bandwidth (see "thread_throttle_mode").

The fact that Memory bandwidth allocation (MBA) may be a core
specific mechanism whereas memory bandwidth monitoring (MBM) is done at
the package level may lead to confusion when users try to apply control
via the MBA and then monitor the bandwidth to see if the controls are
effective. Below are such scenarios:

1. User may *not* see increase in actual bandwidth when percentage
   values are increased:

This can occur when aggregate L2 external bandwidth is more than L3
external bandwidth. Consider an SKL SKU with 24 cores on a package and
where L2 external bandwidth is 10GBps (hence aggregate L2 external
bandwidth is 240GBps) and L3 external bandwidth is 100GBps. Now a
workload with '20 threads, having 50% bandwidth, each consuming 5GBps'
consumes the max L3 bandwidth of 100GBps although the percentage value
specified is only 50% << 100%. Hence increasing the bandwidth percentage
will not yield any more bandwidth. This is because although the L2
external bandwidth still has capacity, the L3 external bandwidth is
fully used. Also note that this would be dependent on the number of
cores the benchmark is run on.

2. Same bandwidth percentage may mean different actual bandwidth
   depending on # of threads:

For the same SKU in #1, a 'single thread, with 10% bandwidth' and '4
threads, with 10% bandwidth' can consume up to 10GBps and 40GBps although
they have the same percentage bandwidth of 10%. This is simply because as
threads start using more cores in an rdtgroup, the actual bandwidth may
increase or vary although the user specified bandwidth percentage is the
same.

In order to mitigate this and make the interface more user friendly,
resctrl added support for specifying the bandwidth in MBps as well.  The
kernel underneath would use a software feedback mechanism or a "Software
Controller (mba_sc)" which reads the actual bandwidth using MBM counters
and adjusts the memory bandwidth percentages to ensure::

	"actual bandwidth < user specified bandwidth".

By default, the schemata would take the bandwidth percentage values,
whereas the user can switch to the "MBA software controller" mode using
the mount option 'mba_MBps'. The schemata format is specified in the
below sections.

L3 schemata file details (code and data prioritization disabled)
----------------------------------------------------------------
With CDP disabled the L3 schemata format is::

	L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L3 schemata file details (CDP enabled via mount option to resctrl)
------------------------------------------------------------------
When CDP is enabled L3 control is split into two separate resources
so you can specify independent masks for code and data like this::

	L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
	L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

L2 schemata file details
------------------------
CDP is supported at L2 using the 'cdpl2' mount option. The schemata
format is either::

	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

or::

	L2DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
	L2CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...

Memory bandwidth Allocation (default mode)
------------------------------------------

The memory b/w domain is the L3 cache.
::

	MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

Memory bandwidth Allocation specified in MBps
---------------------------------------------

The memory bandwidth domain is the L3 cache.
::

	MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...

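For example, a sketch of using the software controller to cap group
"p0" at roughly 1GBps on domain 0 (the group name and value are
illustrative)::

  # mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
  # mkdir /sys/fs/resctrl/p0
  # echo "MB:0=1024" > /sys/fs/resctrl/p0/schemata
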
Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
on all domains. When writing you only need to specify those values
which you wish to change.  E.g.
::

  # cat schemata
  L3DATA:0=fffff;1=fffff;2=fffff;3=fffff
  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
  # echo "L3DATA:2=3c0;" > schemata
  # cat schemata
  L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
  L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

Cache Pseudo-Locking
====================
CAT enables a user to specify the amount of cache space that an
application can fill. Cache pseudo-locking builds on the fact that a
CPU can still read and write data pre-allocated outside its current
allocated area on a cache hit. With cache pseudo-locking, data can be
preloaded into a reserved portion of cache that no application can
fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an
application can map it into its virtual address space and thus have
a region of memory with reduced average read latency.

The creation of a cache pseudo-locked region is triggered by a request
from the user to do so that is accompanied by a schemata of the region
to be pseudo-locked. The cache pseudo-locked region is created as follows:

- Create a CAT allocation CLOSNEW with a CBM matching the schemata
  from the user of the cache region that will contain the pseudo-locked
  memory. This region must not overlap with any current CAT allocation/CLOS
  on the system and no future overlap with this cache region is allowed
  while the pseudo-locked region exists.
- Create a contiguous region of memory of the same size as the cache
  region.
- Flush the cache, disable hardware prefetchers, disable preemption.
- Make CLOSNEW the active CLOS and touch the allocated memory to load
  it into the cache.
- Set the previous CLOS as active.
- At this point the closid CLOSNEW can be released - the cache
  pseudo-locked region is protected as long as its CBM does not appear in
  any CAT allocation. Even though the cache pseudo-locked region will from
  this point on not appear in any CBM of any CLOS an application running with
  any CLOS will be able to access the memory in the pseudo-locked region since
  the region continues to serve cache hits.
- The contiguous region of memory loaded into the cache is exposed to
  user-space as a character device.

Cache pseudo-locking increases the probability that data will remain
in the cache via carefully configuring the CAT feature and controlling
application behavior. There is no guarantee that data is placed in
cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
"locked" data from cache. Power management C-states may shrink or
power off cache. Deeper C-states will automatically be restricted on
pseudo-locked region creation.

It is required that an application using a pseudo-locked region runs
with affinity to the cores (or a subset of the cores) associated
with the cache on which the pseudo-locked region resides. A sanity check
within the code will not allow an application to map pseudo-locked memory
unless it runs with affinity to cores associated with the cache on which the
pseudo-locked region resides. The sanity check is only done during the
initial mmap() handling; there is no enforcement afterwards and the
application itself needs to ensure it remains affine to the correct cores.

Pseudo-locking is accomplished in two stages:

1) During the first stage the system administrator allocates a portion
   of cache that should be dedicated to pseudo-locking. At this time an
   equivalent portion of memory is allocated, loaded into the allocated
   cache portion, and exposed as a character device.
2) During the second stage a user-space application maps (mmap()) the
   pseudo-locked memory into its address space.

Cache Pseudo-Locking Interface
------------------------------
A pseudo-locked region is created using the resctrl interface as follows:

1) Create a new resource group by creating a new directory in /sys/fs/resctrl.
2) Change the new resource group's mode to "pseudo-locksetup" by writing
   "pseudo-locksetup" to the "mode" file.
3) Write the schemata of the pseudo-locked region to the "schemata" file. All
   bits within the schemata should be "unused" according to the "bit_usage"
   file.
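
A minimal sketch of these steps for a region named "newlock" (the name
and the schemata written are illustrative)::

  # mkdir /sys/fs/resctrl/newlock
  # echo pseudo-locksetup > /sys/fs/resctrl/newlock/mode
  # echo "L2:1=3" > /sys/fs/resctrl/newlock/schemata
  # cat /sys/fs/resctrl/newlock/mode
  pseudo-locked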

On successful pseudo-locked region creation the "mode" file will contain
"pseudo-locked" and a new character device with the same name as the resource
group will exist in /dev/pseudo_lock. This character device can be mmap()'ed
by user space in order to obtain access to the pseudo-locked memory region.

An example of cache pseudo-locked region creation and usage can be found below.

Cache Pseudo-Locking Debugging Interface
----------------------------------------
The pseudo-locking debugging interface is enabled by default (if
CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl.

There is no explicit way for the kernel to test if a provided memory
location is present in the cache. The pseudo-locking debugging interface uses
the tracing infrastructure to provide two ways to measure cache residency of
the pseudo-locked region:

1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
   from these measurements are best visualized using a hist trigger (see
   example below). In this test the pseudo-locked region is traversed at
   a stride of 32 bytes while hardware prefetchers and preemption
   are disabled. This also provides a substitute visualization of cache
   hits and misses.
2) Cache hit and miss measurements using model-specific precision counters if
   available. Depending on the levels of cache on the system the pseudo_lock_l2
   and pseudo_lock_l3 tracepoints are available.

When a pseudo-locked region is created a new debugfs directory is created for
it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single
write-only file, pseudo_lock_measure, is present in this directory. The
measurement of the pseudo-locked region depends on the number written to this
debugfs file:

1:
     writing "1" to the pseudo_lock_measure file will trigger the latency
     measurement captured in the pseudo_lock_mem_latency tracepoint. See
     example below.
2:
     writing "2" to the pseudo_lock_measure file will trigger the L2 cache
     residency (cache hits and misses) measurement captured in the
     pseudo_lock_l2 tracepoint. See example below.
3:
     writing "3" to the pseudo_lock_measure file will trigger the L3 cache
     residency (cache hits and misses) measurement captured in the
     pseudo_lock_l3 tracepoint.

All measurements are recorded with the tracing infrastructure. This requires
the relevant tracepoints to be enabled before the measurement is triggered.

Example of latency debugging interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this example a pseudo-locked region named "newlock" was created. Here is
how we can measure the latency in cycles of reading from this region and
visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS
is set::

  # :> /sys/kernel/debug/tracing/trace
  # echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
  # echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
  # echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
  # echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
  # cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist

  # event histogram
  #
  # trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
  #

  { latency:        456 } hitcount:          1
  { latency:         50 } hitcount:         83
  { latency:         36 } hitcount:         96
  { latency:         44 } hitcount:        174
  { latency:         48 } hitcount:        195
  { latency:         46 } hitcount:        262
  { latency:         42 } hitcount:        693
  { latency:         40 } hitcount:       3204
  { latency:         38 } hitcount:       3484

  Totals:
      Hits: 8192
      Entries: 9
    Dropped: 0

^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  638) Example of cache hits/misses debugging
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  639) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  640) In this example a pseudo-locked region named "newlock" was created on the L2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  641) cache of a platform. Here is how we can obtain details of the cache hits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  642) and misses using the platform's precision counters.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  643) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  644) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  645)   # :> /sys/kernel/debug/tracing/trace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  646)   # echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  647)   # echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  648)   # echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  649)   # cat /sys/kernel/debug/tracing/trace
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  650) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  651)   # tracer: nop
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  652)   #
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  653)   #                              _-----=> irqs-off
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  654)   #                             / _----=> need-resched
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  655)   #                            | / _---=> hardirq/softirq
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  656)   #                            || / _--=> preempt-depth
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  657)   #                            ||| /     delay
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  658)   #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  659)   #              | |       |   ||||       |         |
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  660)   pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  661) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  662) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  663) Examples for RDT allocation usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  664) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  665) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  666) 1) Example 1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  667) 
Consider a two-socket machine (one L3 cache per socket) with just four bits
for cache bit masks, a minimum memory b/w of 10% and a memory bandwidth
granularity of 10%.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  671) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  672) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  673)   # mount -t resctrl resctrl /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  674)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  675)   # mkdir p0 p1
  # echo -e "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
  # echo -e "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  678) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  679) The default resource group is unmodified, so we have access to all parts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  680) of all caches (its schemata file reads "L3:0=f;1=f").
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  681) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  682) Tasks that are under the control of group "p0" may only allocate from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  683) "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  684) Tasks in group "p1" use the "lower" 50% of cache on both sockets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  685) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  686) Similarly, tasks that are under the control of group "p0" may use a
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  687) maximum memory b/w of 50% on socket0 and 50% on socket 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  688) Tasks in group "p1" may also use 50% memory b/w on both sockets.
Note that unlike cache masks, memory b/w cannot specify whether these
allocations can overlap or not. The allocation specifies the maximum
b/w that the group may be able to use and the system admin can configure
the b/w accordingly.
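
The platform limits assumed in this example can be read back from the
resctrl info directory; a sketch (the output values correspond to the
10% minimum and 10% granularity assumed above)::

  # cat /sys/fs/resctrl/info/MB/min_bandwidth
  10
  # cat /sys/fs/resctrl/info/MB/bandwidth_gran
  10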
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  693) 
If resctrl is using the software controller (mba_sc) then the user can
enter the maximum b/w in MBps rather than as percentage values.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  696) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  697) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  698)   # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  699)   # echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  700) 
In the above example the tasks in "p1" and "p0" on socket 0 would use a
maximum b/w of 1024 MBps, whereas on socket 1 they would use 500 MBps.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  703) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  704) 2) Example 2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  705) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  706) Again two sockets, but this time with a more realistic 20-bit mask.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  707) 
Two real-time tasks, pid=1234 running on processor 0 and pid=5678 running
on processor 1, on socket 0 of a two-socket, dual-core machine. To avoid
noisy neighbors, each of the two real-time tasks exclusively occupies one
quarter of the L3 cache on socket 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  712) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  713) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  714)   # mount -t resctrl resctrl /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  715)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  716) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  717) First we reset the schemata for the default group so that the "upper"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  718) 50% of the L3 cache on socket 0 and 50% of memory b/w cannot be used by
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  719) ordinary tasks::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  720) 
  # echo -e "L3:0=3ff;1=fffff\nMB:0=50;1=100" > schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  722) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  723) Next we make a resource group for our first real time task and give
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  724) it access to the "top" 25% of the cache on socket 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  725) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  726) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  727)   # mkdir p0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  728)   # echo "L3:0=f8000;1=fffff" > p0/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  729) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  730) Finally we move our first real time task into this resource group. We
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  731) also use taskset(1) to ensure the task always runs on a dedicated CPU
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  732) on socket 0. Most uses of resource groups will also constrain which
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  733) processors tasks run on.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  734) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  735) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  736)   # echo 1234 > p0/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  737)   # taskset -cp 1 1234
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  738) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  739) Ditto for the second real time task (with the remaining 25% of cache)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  740) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  741)   # mkdir p1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  742)   # echo "L3:0=7c00;1=fffff" > p1/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  743)   # echo 5678 > p1/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  744)   # taskset -cp 2 5678
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  745) 
For the same two-socket system with the memory b/w resource and CAT L3 the
schemata would look like this (assume min_bandwidth is 10 and
bandwidth_gran is 10):
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  749) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  750) For our first real time task this would request 20% memory b/w on socket 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  751) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  752) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  753)   # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  754) 
For our second real time task this would request another 20% memory b/w
on socket 0.
::

  # echo -e "L3:0=7c00;1=fffff\nMB:0=20;1=100" > p1/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  760) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  761) 3) Example 3
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  762) 
A single-socket system which has real-time tasks running on cores 4-7 and
a non real-time workload assigned to cores 0-3. The real-time tasks share
text and data, so a per-task association is not required; due to
interaction with the kernel it is desired that the kernel on these cores
shares L3 with the tasks.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  768) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  769) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  770)   # mount -t resctrl resctrl /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  771)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  772) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  773) First we reset the schemata for the default group so that the "upper"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  774) 50% of the L3 cache on socket 0, and 50% of memory bandwidth on socket 0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  775) cannot be used by ordinary tasks::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  776) 
  # echo -e "L3:0=3ff\nMB:0=50" > schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  778) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  779) Next we make a resource group for our real time cores and give it access
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  780) to the "top" 50% of the cache on socket 0 and 50% of memory bandwidth on
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  781) socket 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  782) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  783) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  784)   # mkdir p0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  785)   # echo "L3:0=ffc00\nMB:0=50" > p0/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  786) 
Finally we move cores 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache. They should also
get 50% of memory bandwidth, assuming that cores 4-7 are SMT siblings and
only the real-time threads are scheduled on them. The mask F0 (binary
11110000) selects CPUs 4-7.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  791) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  792) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  793)   # echo F0 > p0/cpus
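
To confirm the assignment, the group's CPUs can be read back; a sketch
(the cpus_list output assumes the CPUs assigned above)::

  # cat p0/cpus_list
  4-7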
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  794) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  795) 4) Example 4
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  796) 
The resource groups in previous examples were all in the default "shareable"
mode allowing sharing of their cache allocations. If one resource group
configures a cache allocation then nothing prevents another resource group
from overlapping with that allocation.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  801) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  802) In this example a new exclusive resource group will be created on a L2 CAT
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  803) system with two L2 cache instances that can be configured with an 8-bit
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  804) capacity bitmask. The new exclusive resource group will be configured to use
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  805) 25% of each cache instance.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  806) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  807) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  808)   # mount -t resctrl resctrl /sys/fs/resctrl/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  809)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  810) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  811) First, we observe that the default group is configured to allocate to all L2
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  812) cache::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  813) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  814)   # cat schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  815)   L2:0=ff;1=ff
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  816) 
We could attempt to make the new resource group exclusive at this point,
but that will fail because its schemata overlaps with the schemata of the
default group::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  819) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  820)   # mkdir p0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  821)   # echo 'L2:0=0x3;1=0x3' > p0/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  822)   # cat p0/mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  823)   shareable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  824)   # echo exclusive > p0/mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  825)   -sh: echo: write error: Invalid argument
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  826)   # cat info/last_cmd_status
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  827)   schemata overlaps
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  828) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  829) To ensure that there is no overlap with another resource group the default
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  830) resource group's schemata has to change, making it possible for the new
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  831) resource group to become exclusive.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  832) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  833) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  834)   # echo 'L2:0=0xfc;1=0xfc' > schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  835)   # echo exclusive > p0/mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  836)   # grep . p0/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  837)   p0/cpus:0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  838)   p0/mode:exclusive
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  839)   p0/schemata:L2:0=03;1=03
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  840)   p0/size:L2:0=262144;1=262144
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  841) 
A newly created resource group will not overlap with an exclusive resource
group::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  844) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  845)   # mkdir p1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  846)   # grep . p1/*
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  847)   p1/cpus:0
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  848)   p1/mode:shareable
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  849)   p1/schemata:L2:0=fc;1=fc
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  850)   p1/size:L2:0=786432;1=786432
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  851) 
The bit_usage file will reflect how the cache is used ("S" marks bits
available for sharing, "E" marks bits used by an exclusive group)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  853) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  854)   # cat info/L2/bit_usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  855)   0=SSSSSSEE;1=SSSSSSEE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  856) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  857) A resource group cannot be forced to overlap with an exclusive resource group::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  858) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  859)   # echo 'L2:0=0x1;1=0x1' > p1/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  860)   -sh: echo: write error: Invalid argument
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  861)   # cat info/last_cmd_status
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  862)   overlaps with exclusive group
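
A write that does not overlap the bits of the exclusive group is still
accepted; a sketch continuing the groups created above (the read-back
format follows the earlier "grep . p1/*" output)::

  # echo 'L2:0=0x30;1=0x30' > p1/schemata
  # cat p1/schemata
  L2:0=30;1=30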
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  863) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  864) Example of Cache Pseudo-Locking
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  865) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lock a portion of the L2 cache of cache id 1 using CBM 0x3. The
pseudo-locked region is exposed at /dev/pseudo_lock/newlock and can be
passed to an application as the argument to mmap().
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  869) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  870) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  871)   # mount -t resctrl resctrl /sys/fs/resctrl/
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  872)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  873) 
Ensure that there are bits available that can be pseudo-locked. Since only
unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
removed from the default resource group's schemata::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  877) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  878)   # cat info/L2/bit_usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  879)   0=SSSSSSSS;1=SSSSSSSS
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  880)   # echo 'L2:1=0xfc' > schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  881)   # cat info/L2/bit_usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  882)   0=SSSSSSSS;1=SSSSSS00
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  883) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  884) Create a new resource group that will be associated with the pseudo-locked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  885) region, indicate that it will be used for a pseudo-locked region, and
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  886) configure the requested pseudo-locked region capacity bitmask::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  887) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  888)   # mkdir newlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  889)   # echo pseudo-locksetup > newlock/mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  890)   # echo 'L2:1=0x3' > newlock/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  891) 
On success the resource group's mode will change to pseudo-locked, the
bit_usage will reflect the pseudo-locked region ("P" bits), and the
character device exposing the pseudo-locked region will exist::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  895) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  896)   # cat newlock/mode
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  897)   pseudo-locked
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  898)   # cat info/L2/bit_usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  899)   0=SSSSSSSS;1=SSSSSSPP
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  900)   # ls -l /dev/pseudo_lock/newlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  901)   crw------- 1 root root 243, 0 Apr  3 05:01 /dev/pseudo_lock/newlock
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  902) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  903) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  904) 
  /*
   * Example code to access one page of pseudo-locked cache region
   * from user space.
   */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  909)   #define _GNU_SOURCE
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  910)   #include <fcntl.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  911)   #include <sched.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  912)   #include <stdio.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  913)   #include <stdlib.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  914)   #include <unistd.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  915)   #include <sys/mman.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  916) 
  /*
   * It is required that the application runs with affinity to only
   * cores associated with the pseudo-locked region. Here the cpu
   * is hardcoded for convenience of example.
   */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  922)   static int cpuid = 2;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  923) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  924)   int main(int argc, char *argv[])
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  925)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  926)     cpu_set_t cpuset;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  927)     long page_size;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  928)     void *mapping;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  929)     int dev_fd;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  930)     int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  931) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  932)     page_size = sysconf(_SC_PAGESIZE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  933) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  934)     CPU_ZERO(&cpuset);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  935)     CPU_SET(cpuid, &cpuset);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  936)     ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  937)     if (ret < 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  938)       perror("sched_setaffinity");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  939)       exit(EXIT_FAILURE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  940)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  941) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  942)     dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  943)     if (dev_fd < 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  944)       perror("open");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  945)       exit(EXIT_FAILURE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  946)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  947) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  948)     mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  949)             dev_fd, 0);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  950)     if (mapping == MAP_FAILED) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  951)       perror("mmap");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  952)       close(dev_fd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  953)       exit(EXIT_FAILURE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  954)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  955) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  956)     /* Application interacts with pseudo-locked memory @mapping */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  957) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  958)     ret = munmap(mapping, page_size);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  959)     if (ret < 0) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  960)       perror("munmap");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  961)       close(dev_fd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  962)       exit(EXIT_FAILURE);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  963)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  964) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  965)     close(dev_fd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  966)     exit(EXIT_SUCCESS);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  967)   }
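
As a sketch of the "interacts with pseudo-locked memory" step above, an
application might stride through the mapping one cache line at a time
(illustrative only, not part of the original example; a 64-byte cache
line size is assumed)::

  #define CACHE_LINE_SIZE 64

  /* Read one byte from every cache line of the pseudo-locked region. */
  static void read_mapping(unsigned char *mem, long size)
  {
    volatile unsigned char sink;
    long i;

    for (i = 0; i < size; i += CACHE_LINE_SIZE)
      sink = mem[i];
  }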
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  968) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  969) Locking between applications
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  970) ----------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  971) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  972) Certain operations on the resctrl filesystem, composed of read/writes
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  973) to/from multiple files, must be atomic.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  974) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  975) As an example, the allocation of an exclusive reservation of L3 cache
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  976) involves:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  977) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  978)   1. Read the cbmmasks from each directory or the per-resource "bit_usage"
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  979)   2. Find a contiguous set of bits in the global CBM bitmask that is clear
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  980)      in any of the directory cbmmasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  981)   3. Create a new directory
  4. Write the bits found in step 2 to the "schemata" file of the new directory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  983) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  984) If two applications attempt to allocate space concurrently then they can
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  985) end up allocating the same bits so the reservations are shared instead of
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  986) exclusive.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  987) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  988) To coordinate atomic operations on the resctrlfs and to avoid the problem
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  989) above, the following locking procedure is recommended:
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  990) 
Locking is based on flock(2), which is available in libc and also as a
shell command.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300  993) 
Write lock:

 A) Take flock(LOCK_EX) on /sys/fs/resctrl
 B) Read/write the directory structure.
 C) Release the lock with flock(LOCK_UN)

Read lock:

 A) Take flock(LOCK_SH) on /sys/fs/resctrl
 B) If successful, read the directory structure.
 C) Release the lock with flock(LOCK_UN)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1005) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1006) Example with bash::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1007) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1008)   # Atomically read directory structure
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1009)   $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1010) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1011)   # Read directory contents and create new subdirectory
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1012) 
  $ cat create-dir.sh
  find /sys/fs/resctrl/ > output.txt
  mask=$(function-of output.txt)
  mkdir /sys/fs/resctrl/newres/
  echo "$mask" > /sys/fs/resctrl/newres/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1018) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1019)   $ flock /sys/fs/resctrl/ ./create-dir.sh
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1020) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1021) Example with C::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1022) 
  /*
   * Example code to take advisory locks
   * before accessing the resctrl filesystem
   */
  #include <sys/file.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1029) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1030)   void resctrl_take_shared_lock(int fd)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1031)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1032)     int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1033) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1034)     /* take shared lock on resctrl filesystem */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1035)     ret = flock(fd, LOCK_SH);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1036)     if (ret) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1037)       perror("flock");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1038)       exit(-1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1039)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1040)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1041) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1042)   void resctrl_take_exclusive_lock(int fd)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1043)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1044)     int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1045) 
    /* take exclusive lock on resctrl filesystem */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1047)     ret = flock(fd, LOCK_EX);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1048)     if (ret) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1049)       perror("flock");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1050)       exit(-1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1051)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1052)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1053) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1054)   void resctrl_release_lock(int fd)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1055)   {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1056)     int ret;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1057) 
    /* release lock on resctrl filesystem */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1059)     ret = flock(fd, LOCK_UN);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1060)     if (ret) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1061)       perror("flock");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1062)       exit(-1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1063)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1064)   }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1065) 
  int main(void)
  {
    int fd;
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1069) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1070)     fd = open("/sys/fs/resctrl", O_DIRECTORY);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1071)     if (fd == -1) {
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1072)       perror("open");
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1073)       exit(-1);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1074)     }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1075)     resctrl_take_shared_lock(fd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1076)     /* code to read directory contents */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1077)     resctrl_release_lock(fd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1078) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1079)     resctrl_take_exclusive_lock(fd);
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1080)     /* code to read and write directory contents */
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1081)     resctrl_release_lock(fd);
    close(fd);

    return 0;
  }
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1083) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1084) Examples for RDT Monitoring along with allocation usage
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1085) =======================================================
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1086) Reading monitored data
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1087) ----------------------
Reading an event file (for example: mon_data/mon_L3_00/llc_occupancy) shows
the current snapshot of LLC occupancy of the corresponding MON group or
CTRL_MON group.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1091) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1092) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1093) Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1094) ------------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1095) On a two socket machine (one L3 cache per socket) with just four bits
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1096) for cache bit masks::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1097) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1098)   # mount -t resctrl resctrl /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1099)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1100)   # mkdir p0 p1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1101)   # echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1102)   # echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1103)   # echo 5678 > p1/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1104)   # echo 5679 > p1/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1105) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1106) The default resource group is unmodified, so we have access to all parts
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1107) of all caches (its schemata file reads "L3:0=f;1=f").
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1108) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1109) Tasks that are under the control of group "p0" may only allocate from the
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1110) "lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1111) Tasks in group "p1" use the "lower" 50% of cache on both sockets.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1112) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1113) Create monitor groups and assign a subset of tasks to each monitor group.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1114) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1115) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1116)   # cd /sys/fs/resctrl/p1/mon_groups
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1117)   # mkdir m11 m12
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1118)   # echo 5678 > m11/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1119)   # echo 5679 > m12/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1120) 
Fetch the data (shown in bytes)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1122) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1123) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1124)   # cat m11/mon_data/mon_L3_00/llc_occupancy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1125)   16234000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1126)   # cat m11/mon_data/mon_L3_01/llc_occupancy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1127)   14789000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1128)   # cat m12/mon_data/mon_L3_00/llc_occupancy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1129)   16789000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1130) 
The parent CTRL_MON group shows the aggregated data.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1132) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1133) 
  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1135)   31234000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1136) 
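Monitoring data can also be consumed programmatically; a minimal sketch
(the path assumes the "p1" group created above)::

  /*
   * Illustrative sketch: print the LLC occupancy of the "p1" group
   * once per second, five times. The file is reopened on each
   * iteration to get a fresh snapshot.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
    char buf[64];
    int i;

    for (i = 0; i < 5; i++) {
      FILE *f = fopen("/sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy", "r");

      if (!f) {
        perror("fopen");
        exit(EXIT_FAILURE);
      }
      if (fgets(buf, sizeof(buf), f))
        printf("llc_occupancy: %s", buf);
      fclose(f);
      sleep(1);
    }
    return 0;
  }
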
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1137) Example 2 (Monitor a task from its creation)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1138) --------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1139) On a two socket machine (one L3 cache per socket)::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1140) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1141)   # mount -t resctrl resctrl /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1142)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1143)   # mkdir p0 p1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1144) 
An RMID is allocated to the group once it is created and hence the <cmd>
below is monitored from its creation (writing $$ first moves the shell
itself into p1, so <cmd> starts inside the group).
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1147) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1148) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1149)   # echo $$ > /sys/fs/resctrl/p1/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1150)   # <cmd>
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1151) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1152) Fetch the data::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1153) 
  # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1155)   31789000
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1156) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1157) Example 3 (Monitor without CAT support or before creating CAT groups)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1158) ---------------------------------------------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1159) 
Assume a system like HSW (Haswell) that has only CQM and no CAT support.
In this case resctrl will still mount but CTRL_MON directories cannot be
created. The user can, however, create different MON groups within the
root group and thereby monitor all tasks, including kernel threads.

This can also be used to profile jobs' cache size footprint before
allocating them to different allocation groups.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1167) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1168) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1169)   # mount -t resctrl resctrl /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1170)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1171)   # mkdir mon_groups/m01
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1172)   # mkdir mon_groups/m02
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1173) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1174)   # echo 3478 > /sys/fs/resctrl/mon_groups/m01/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1175)   # echo 2467 > /sys/fs/resctrl/mon_groups/m02/tasks
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1176) 
Monitor the groups separately and also get per-domain data. From the
output below it is apparent that the tasks are mostly doing work on
domain (socket) 0.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1180) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1181) 
  # cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_00/llc_occupancy
  31234000
  # cat /sys/fs/resctrl/mon_groups/m01/mon_data/mon_L3_01/llc_occupancy
  34555
  # cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_00/llc_occupancy
  31234000
  # cat /sys/fs/resctrl/mon_groups/m02/mon_data/mon_L3_01/llc_occupancy
  32789
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1190) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1191) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1192) Example 4 (Monitor real time tasks)
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1193) -----------------------------------
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1194) 
A single-socket system has real-time tasks running on cores 4-7 and
non-real-time tasks on the other CPUs. We want to monitor the cache
occupancy of the real-time threads on these cores.
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1198) ::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1199) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1200)   # mount -t resctrl resctrl /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1201)   # cd /sys/fs/resctrl
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1202)   # mkdir p1
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1203) 
Move CPUs 4-7 over to p1::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1205) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1206)   # echo f0 > p1/cpus
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1207) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1208) View the llc occupancy snapshot::
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1209) 
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1210)   # cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
^8f3ce5b39 (kx 2023-10-28 12:00:06 +0300 1211)   11234000