Orange Pi5 kernel

Deprecated Linux kernel 5.10.110 for OrangePi 5/5B/5+ boards

===============
RDMA Controller
===============

.. Contents

   1. Overview
     1-1. What is RDMA controller?
     1-2. Why RDMA controller needed?
     1-3. How is RDMA controller implemented?
   2. Usage Examples

1. Overview
===========

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows the user to limit the RDMA/IB-specific resources
that a given set of processes can use. These processes are grouped using the
RDMA controller.

The RDMA controller defines two resources, hca_handle and hca_object, which
can be limited for the processes of a cgroup.

1-2. Why RDMA controller needed?
--------------------------------

Currently, user space applications can easily take away all the RDMA
verb-specific resources such as AH, CQ, QP, MR, etc. As a result, other
applications in other cgroups or kernel space ULPs may not even get a chance
to allocate any RDMA resources. This can lead to service unavailability.

Therefore, an RDMA controller is needed through which the resource
consumption of processes can be limited. Through this controller, the
different RDMA resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows limits to be configured for resources. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
Each resource pool is currently limited to 64 resources by the RDMA cgroup;
this can be extended later if required.

The resource pool object is linked to the cgroup css. Typically there are
0 to 4 resource pool instances per cgroup, per device in most use cases,
but nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no known use
case or requirement for such a configuration either.

Since RDMA resources can be allocated from any process and can be freed by
any of the child processes which share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from
one cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources around the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed when uncharging the
resource. This also allows a process that migrated with active RDMA
resources to charge new resources to its new owner cgroup. It also allows
the resources of a process that has migrated to a new cgroup to be uncharged
from the previously charged cgroup, even though that is not a primary use
case.

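The charge/uncharge ownership rule can be illustrated with a small user space
model. This is only a sketch in Python, not the kernel implementation: the
names ``RdmaCgroup``, ``charge`` and ``uncharge`` are hypothetical, and limits
are ignored so that only the ownership handling is shown.

```python
class RdmaCgroup:
    """Toy stand-in for an RDMA cgroup: usage per (device, resource)."""

    def __init__(self, name):
        self.name = name
        self.usage = {}  # (device, resource) -> number of charged resources


def charge(task_cgroup, device, resource):
    """Charge one resource to the task's current cgroup; return the owner."""
    key = (device, resource)
    task_cgroup.usage[key] = task_cgroup.usage.get(key, 0) + 1
    return task_cgroup  # the caller keeps this for the later uncharge


def uncharge(owner, device, resource):
    """Uncharge from the cgroup that was returned at charge time."""
    owner.usage[(device, resource)] -= 1


a, b = RdmaCgroup("a"), RdmaCgroup("b")

# The process runs in cgroup "a" and allocates one object.
first_owner = charge(a, "mlx4_0", "hca_object")

# The process migrates to cgroup "b"; new resources are charged to "b".
second_owner = charge(b, "mlx4_0", "hca_object")

# Freeing the first object uncharges cgroup "a", not the current cgroup.
uncharge(first_owner, "mlx4_0", "hca_object")
```

Because the owner is carried with the resource, no ownership transfer is
needed when the process moves between cgroups.
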
A resource pool object is created in the following situations.

(a) The user sets a limit and no resource pool yet exists for the device of
    interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
    charge the resource. The pool is created so that resources are uncharged
    correctly when applications start running without limits and limits are
    enforced later, before the resources are uncharged; otherwise the usage
    count would drop below zero.

A resource pool is destroyed if all of its resource limits are set to max
and the last resource is being deallocated.

The user should set all limits to the max value to remove/unconfigure the
resource pool for a particular device.

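The resource pool life cycle described above can be sketched as follows. This
is a simplified Python model under stated assumptions, not the kernel code:
the class and method names are hypothetical, ``MAX`` stands in for the ``max``
keyword, and the model also frees an unused pool when its limits are reset to
max, which is an assumption of this sketch.

```python
MAX = float("inf")  # stands in for the "max" keyword
RESOURCES = ("hca_handle", "hca_object")


class ResourcePool:
    def __init__(self):
        self.limit = {r: MAX for r in RESOURCES}
        self.usage = {r: 0 for r in RESOURCES}

    def is_idle(self):
        # All limits at max and nothing charged: the pool carries no state.
        return (all(v == MAX for v in self.limit.values())
                and all(v == 0 for v in self.usage.values()))


class Cgroup:
    def __init__(self):
        self.pools = {}  # device name -> ResourcePool

    def _pool(self, device):
        # Cases (a) and (b): create on first limit write or first charge.
        return self.pools.setdefault(device, ResourcePool())

    def _maybe_free(self, device):
        if self.pools[device].is_idle():
            del self.pools[device]

    def set_limit(self, device, **limits):
        for name in limits:
            if name not in RESOURCES:
                raise ValueError("unknown resource: " + name)
        self._pool(device).limit.update(limits)
        self._maybe_free(device)

    def try_charge(self, device, resource):
        pool = self._pool(device)
        if pool.usage[resource] + 1 > pool.limit[resource]:
            return False
        pool.usage[resource] += 1
        return True

    def uncharge(self, device, resource):
        self.pools[device].usage[resource] -= 1
        self._maybe_free(device)


cg = Cgroup()
first = cg.try_charge("mlx4_0", "hca_object")   # case (b): no limits yet
cg.set_limit("mlx4_0", hca_object=1)            # limit enforced later
second = cg.try_charge("mlx4_0", "hca_object")  # refused, usage is at limit
cg.set_limit("mlx4_0", hca_object=MAX)          # limits back to max
still_there = "mlx4_0" in cg.pools              # one object is still charged
cg.uncharge("mlx4_0", "hca_object")             # last resource deallocated
```

In this model the pool created by the first charge keeps the usage count
correct once the limit is applied, and it disappears only after the limits
are back at max and the last object has been uncharged.
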
The IB stack honors the limits enforced by the RDMA controller. When an
application queries the maximum resource limits of an IB device, the stack
returns the minimum of what the user configured for the given cgroup and
what the IB device supports.

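As a minimal illustration of that rule, the value reported to an application
can be computed as below. The function name ``effective_limit`` and the device
maximums used in the calls are made up for this example; only the "minimum of
both" rule comes from the text above.

```python
def effective_limit(configured, device_supported):
    """Return the limit an application would see for one resource.

    `configured` is the cgroup limit as written to rdma.max: an integer or
    the string "max" (no cgroup limit). `device_supported` is the maximum
    the device itself supports.
    """
    if configured == "max":
        return device_supported
    return min(int(configured), device_supported)


limited_by_cgroup = effective_limit(2000, 65536)
unlimited_cgroup = effective_limit("max", 65536)
limited_by_device = effective_limit(2000, 1024)
```
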
The following resources can be accounted by the RDMA controller.

  ==========    =============================
  hca_handle    Maximum number of HCA Handles
  hca_object    Maximum number of HCA Objects
  ==========    =============================

2. Usage Examples
=================

(a) Configure resource limit::

	echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
	echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit::

	cat /sys/fs/cgroup/rdma/2/rdma.max
	#Output:
	mlx4_0 hca_handle=2 hca_object=2000
	ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage::

	cat /sys/fs/cgroup/rdma/2/rdma.current
	#Output:
	mlx4_0 hca_handle=1 hca_object=20
	ocrdma1 hca_handle=1 hca_object=23

(d) Delete resource limit::

	echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
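
For scripting, the output of the queries in (b) and (c) can be parsed with a
few lines of Python. This is a sketch that assumes only the line format shown
in those examples (device name followed by ``name=value`` pairs); the function
name ``parse_rdma_file`` is hypothetical, and the sample text is copied from
example (b) rather than read from a live cgroup.

```python
def parse_rdma_file(text):
    """Parse rdma.max / rdma.current content into {device: {resource: value}}.

    A value of "max" is returned as None, meaning no limit is configured.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        device, pairs = fields[0], fields[1:]
        entry = {}
        for pair in pairs:
            name, _, value = pair.partition("=")
            entry[name] = None if value == "max" else int(value)
        result[device] = entry
    return result


sample = """\
mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max
"""
limits = parse_rdma_file(sample)
```

On a real system the text would come from reading a file such as
``/sys/fs/cgroup/rdma/2/rdma.max`` instead of the sample string.
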