===============
RDMA Controller
===============

.. Contents

   1. Overview
     1-1. What is RDMA controller?
     1-2. Why is RDMA controller needed?
     1-3. How is RDMA controller implemented?
   2. Usage Examples

1. Overview
===========

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows the user to limit the RDMA/IB-specific resources
that a given set of processes can use. These processes are grouped using the
RDMA controller.

The RDMA controller defines two resources, hca_handle and hca_object, which
can be limited for the processes of a cgroup.

1-2. Why is RDMA controller needed?
-----------------------------------

Currently, user space applications can easily take away all the RDMA
verb-specific resources such as AH, CQ, QP, MR, etc. As a result, other
applications in other cgroups or kernel space ULPs may not even get a chance
to allocate any RDMA resources. This can lead to service unavailability.

Therefore, an RDMA controller is needed through which the resource
consumption of processes can be limited. Through this controller, different
RDMA resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows limits to be configured for resources. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
Each resource pool currently supports up to 64 resources, a limit imposed by
the RDMA cgroup that can be extended later if required.

This resource pool object is linked to the cgroup css. In most use cases there
are typically 0 to 4 resource pool instances per cgroup, per device, but
nothing prevents having more. At present, hundreds of RDMA devices per single
cgroup may not be handled optimally; however, there is no known use case or
requirement for such a configuration either.
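The per-cgroup, per-device accounting described above can be pictured with a
small user-space model. This is only an illustrative sketch in Python, not the
kernel data structure: the names ``ResourcePool``, ``RdmaCgroup`` and
``try_charge`` are hypothetical, and ``float("inf")`` stands in for the ``max``
setting.

```python
from dataclasses import dataclass, field

MAX = float("inf")  # stands in for the "max" (unlimited) setting
RESOURCES = ("hca_handle", "hca_object")


@dataclass
class ResourcePool:
    """Limits and usage counters for one (cgroup, device) pair."""
    limit: dict = field(default_factory=lambda: {r: MAX for r in RESOURCES})
    usage: dict = field(default_factory=lambda: {r: 0 for r in RESOURCES})


class RdmaCgroup:
    """Hypothetical model of one cgroup's RDMA accounting."""

    def __init__(self, name):
        self.name = name
        self.pools = {}  # device name -> ResourcePool

    def pool(self, device):
        # One pool per device, created on first use.
        return self.pools.setdefault(device, ResourcePool())

    def try_charge(self, device, resource):
        pool = self.pool(device)
        if pool.usage[resource] + 1 > pool.limit[resource]:
            return False  # limit reached, the allocation is refused
        pool.usage[resource] += 1
        return True


cg = RdmaCgroup("1")
cg.pool("mlx4_0").limit["hca_handle"] = 2
# The third charge exceeds the limit of 2 and is refused.
results = [cg.try_charge("mlx4_0", "hca_handle") for _ in range(3)]
```

Only devices that were actually configured or charged get a pool in this model,
which matches the small number of pool instances expected per cgroup.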

Since RDMA resources can be allocated by any process and can be freed by any
of the child processes that share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources to the css also ensures that
cgroups can be deleted after their processes have migrated. This also allows
process migration with active resources, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed when uncharging the
resource. This allows a process that migrated with active RDMA resources to
charge new resources to its new owner cgroup. It also allows a resource to be
uncharged from the previously charged cgroup after its process has migrated
to a new cgroup, even though that is not a primary use case.
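The charge/uncharge contract can be sketched as follows. This is a simplified
Python model under stated assumptions, not the kernel API: ``Cgroup``,
``Process``, ``charge`` and ``uncharge`` are hypothetical names, and a single
counter replaces the per-device, per-resource accounting. The point it shows is
that the caller keeps the owner returned at charge time and hands it back at
uncharge time, so migration never requires moving ownership.

```python
class Cgroup:
    def __init__(self, name):
        self.name = name
        self.usage = 0


class Process:
    def __init__(self, cgroup):
        self.cgroup = cgroup


def charge(process):
    """Charge one resource and return the owner cgroup to the caller."""
    owner = process.cgroup
    owner.usage += 1
    return owner


def uncharge(owner):
    """Uncharge against the cgroup returned by charge(), not the current one."""
    owner.usage -= 1


old, new = Cgroup("old"), Cgroup("new")
proc = Process(old)

first_owner = charge(proc)    # charged to "old"
proc.cgroup = new             # process migrates with an active resource
second_owner = charge(proc)   # the new resource is charged to "new"

uncharge(first_owner)         # released against "old", where it was charged
```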

A resource pool object is created in the following situations:

(a) The user sets a limit and no resource pool exists yet for the device of
    interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to charge
    the resource. The pool is created so that resources charged while
    applications run without limits are uncharged correctly if limits are
    enforced later; otherwise the usage count would drop below zero.

A resource pool is destroyed if all of its resource limits are set to max and
the last resource is being deallocated.

The user should set all limits to max if the intent is to remove/unconfigure
the resource pool for a particular device.
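The pool lifecycle above can be modelled roughly as below. This is a
hypothetical Python sketch, not kernel code: the ``Pools`` class and its method
names are invented for illustration, limit enforcement is left out to keep the
focus on creation and destruction, and the check made after a limit is reset to
``max`` is an assumption of this model.

```python
MAX = float("inf")  # stands in for the "max" setting
RESOURCES = ("hca_handle", "hca_object")


class Pools:
    """Hypothetical model of pool creation and destruction for one cgroup."""

    def __init__(self):
        self.pools = {}  # device -> {"limit": {...}, "usage": {...}}

    def _get_or_create(self, device):
        if device not in self.pools:
            self.pools[device] = {
                "limit": {r: MAX for r in RESOURCES},
                "usage": {r: 0 for r in RESOURCES},
            }
        return self.pools[device]

    def _maybe_destroy(self, device):
        pool = self.pools[device]
        all_max = all(v == MAX for v in pool["limit"].values())
        unused = all(v == 0 for v in pool["usage"].values())
        if all_max and unused:
            del self.pools[device]

    def set_limit(self, device, resource, value):
        # Case (a): setting a limit creates the pool if needed.
        self._get_or_create(device)["limit"][resource] = value
        self._maybe_destroy(device)  # assumption: also checked here

    def charge(self, device, resource):
        # Case (b): charging without any configured limit creates the pool.
        self._get_or_create(device)["usage"][resource] += 1

    def uncharge(self, device, resource):
        self.pools[device]["usage"][resource] -= 1
        self._maybe_destroy(device)


p = Pools()
p.charge("mlx4_0", "hca_object")            # (b): pool created by a charge
created_by_charge = "mlx4_0" in p.pools
p.uncharge("mlx4_0", "hca_object")          # last resource freed, limits all max
destroyed_after_uncharge = "mlx4_0" not in p.pools

p.set_limit("ocrdma1", "hca_handle", 3)     # (a): pool created by a limit
created_by_limit = "ocrdma1" in p.pools
p.set_limit("ocrdma1", "hca_handle", MAX)   # all limits max, nothing in use
```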

The IB stack honors limits enforced by the RDMA controller. When an
application queries the maximum resource limits of an IB device, the stack
returns the minimum of what the user configured for the given cgroup and what
the IB device supports.
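As a worked example of that minimum rule, the sketch below combines a cgroup
configuration with device capabilities. It is illustrative Python only: the
function name ``effective_limits`` is hypothetical and the device capability
numbers are made up, not taken from any real HCA.

```python
MAX = float("inf")  # stands in for the "max" (unconfigured) setting


def effective_limits(configured, device_caps):
    """Per resource, report the smaller of the cgroup limit and the device cap."""
    return {
        resource: min(configured.get(resource, MAX), cap)
        for resource, cap in device_caps.items()
    }


configured = {"hca_object": 2000}                        # hca_handle left at max
device_caps = {"hca_handle": 1024, "hca_object": 65536}  # made-up capabilities

limits = effective_limits(configured, device_caps)
```

Here ``hca_object`` is capped by the cgroup (2000), while ``hca_handle`` falls
back to what the device supports (1024) because no cgroup limit was set.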

The following resources can be accounted by the RDMA controller:

  ==========    =============================
  hca_handle    Maximum number of HCA Handles
  hca_object    Maximum number of HCA Objects
  ==========    =============================

2. Usage Examples
=================

(a) Configure resource limit::

        echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
        echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit::

        cat /sys/fs/cgroup/rdma/2/rdma.max
        #Output:
        mlx4_0 hca_handle=2 hca_object=2000
        ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage::

        cat /sys/fs/cgroup/rdma/2/rdma.current
        #Output:
        mlx4_0 hca_handle=1 hca_object=20
        ocrdma1 hca_handle=1 hca_object=23

(d) Delete resource limit::

        echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
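For scripting, the line format used by ``rdma.max`` and ``rdma.current`` in the
examples above (a device name followed by ``name=value`` pairs) is easy to
parse and to generate. The helpers below are a minimal Python sketch;
``parse_rdma_line`` and ``format_rdma_line`` are hypothetical names, and
representing ``max`` as ``None`` is a choice made for this example.

```python
def parse_rdma_line(line):
    """Split 'device name=value ...' into (device, {name: int or None})."""
    device, *pairs = line.split()
    values = {}
    for pair in pairs:
        name, _, raw = pair.partition("=")
        values[name] = None if raw == "max" else int(raw)  # None means "max"
    return device, values


def format_rdma_line(device, values):
    """Build a line suitable for writing to rdma.max."""
    parts = [f"{name}={'max' if v is None else v}" for name, v in values.items()]
    return " ".join([device] + parts)


device, limits = parse_rdma_line("ocrdma1 hca_handle=3 hca_object=max")
reset_line = format_rdma_line("mlx4_0", {"hca_handle": None, "hca_object": None})
```

``reset_line`` reproduces the string written in example (d) to remove the
limits for a device.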