.. SPDX-License-Identifier: GPL-2.0

=============
Page Pool API
=============

The page_pool allocator is optimized for the XDP mode that uses one frame
per page, but it can fall back to the regular page allocator APIs.

Basic use involves replacing alloc_pages() calls with
page_pool_alloc_pages() calls, and replacing dev_alloc_pages() calls with
page_pool_dev_alloc_pages() calls.

The API keeps track of in-flight pages in order to let API users know when
it is safe to free a page_pool object. Thus, API users must call
page_pool_release_page() when a page leaves the page_pool, or call
page_pool_put_page() where appropriate, in order to maintain correct
accounting.

API users must call page_pool_put_page() exactly once per page: it will
either recycle the page or, in case of refcnt > 1, release the DMA mapping
and update the in-flight accounting.
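
A minimal sketch of this allocate/use/return cycle is shown below; the
``pool`` variable is assumed to come from an earlier page_pool_create() call,
and the frame processing step is a placeholder:

.. code-block:: c

    struct page *page;

    page = page_pool_dev_alloc_pages(pool);
    if (!page)
        return -ENOMEM;

    /* ... hand the page to the hardware, receive and process a frame ... */

    /* Return the page exactly once: the pool either recycles it or, if the
     * refcnt was elevated, unmaps it and updates the in-flight accounting.
     * -1 requests a DMA sync of the full pool->max_len area; false means
     * this is not a NAPI/safe context.
     */
    page_pool_put_page(pool, page, -1, false);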

Architecture overview
=====================

.. code-block:: none

    +------------------+
    |       Driver     |
    +------------------+
            ^
            |
            |
            |
            v
    +--------------------------------------------+
    |                request memory               |
    +--------------------------------------------+
        ^                                  ^
        |                                  |
        | Pool empty                       | Pool has entries
        |                                  |
        v                                  v
    +-----------------------+     +------------------------+
    | alloc (and map) pages |     |  get page from cache   |
    +-----------------------+     +------------------------+
                                    ^                    ^
                                    |                    |
                                    | cache available    | No entries, refill
                                    |                    | from ptr-ring
                                    |                    |
                                    v                    v
                                  +-----------------+     +------------------+
                                  |   Fast cache    |     |  ptr-ring cache  |
                                  +-----------------+     +------------------+

API interface
=============

The number of pools created **must** match the number of hardware queues
unless hardware restrictions make that impossible; anything else would defeat
the purpose of the page pool, which is to allocate pages quickly from a
per-queue cache without locking. This lockless guarantee naturally comes from
running under a NAPI softirq. The protection doesn't strictly have to be
NAPI; any guarantee that allocating a page will cause no race conditions is
enough. A per-queue creation sketch follows the parameter list below.

* page_pool_create(): Create a pool.
    * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
    * order: 2^order pages per allocation
    * pool_size: size of the ptr_ring
    * nid: preferred NUMA node for allocation
    * dev: struct device. Used for DMA operations
    * dma_dir: DMA direction
    * max_len: max DMA sync memory size
    * offset: DMA address offset
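
As referenced above, a sketch of per-queue pool creation follows; ``priv``,
``rxq``, ``NUM_RX_QUEUES`` and ``RX_DESC_NUM`` are hypothetical
driver-specific names:

.. code-block:: c

    int i;

    for (i = 0; i < NUM_RX_QUEUES; i++) {
        struct page_pool_params pp_params = { 0 };

        pp_params.order = 0;
        pp_params.flags = PP_FLAG_DMA_MAP;
        pp_params.pool_size = RX_DESC_NUM;
        pp_params.nid = NUMA_NO_NODE;
        pp_params.dev = priv->dev;
        pp_params.dma_dir = DMA_FROM_DEVICE;

        /* One pool per hardware RX queue keeps the fast path lockless */
        priv->rxq[i].page_pool = page_pool_create(&pp_params);
        if (IS_ERR(priv->rxq[i].page_pool))
            return PTR_ERR(priv->rxq[i].page_pool);
    }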

* page_pool_put_page(): The outcome of this depends on the page refcnt. If the
  driver bumped the refcnt > 1, this will unmap the page. If the page refcnt is
  1, the allocator owns the page and will try to recycle it in one of the pool
  caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device
  using dma_sync_single_range_for_device().

* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync
  the entire memory area configured in pool->max_len.

* page_pool_recycle_direct(): Similar to page_pool_put_full_page(), but the
  caller must guarantee a safe context (e.g. NAPI), since it will recycle the
  page directly into the pool fast cache.

* page_pool_release_page(): Unmap the page (if mapped) and account for it on
  inflight counters.

* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
  caches.

* page_pool_get_dma_addr(): Retrieve the stored DMA address.

* page_pool_get_dma_dir(): Retrieve the stored DMA direction.
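
For example, the two helpers above can be combined to sync a received buffer
for CPU access before parsing it; ``priv->dev`` (the struct device passed in
the pool params), ``page``, ``headroom`` and ``pkt_len`` are hypothetical
driver state:

.. code-block:: c

    dma_addr_t dma = page_pool_get_dma_addr(page);
    enum dma_data_direction dir = page_pool_get_dma_dir(page_pool);

    /* Make the packet bytes visible to the CPU before reading them */
    dma_sync_single_for_cpu(priv->dev, dma + headroom, pkt_len, dir);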

Coding examples
===============

Registration
------------

.. code-block:: c

    /* Page pool registration */
    struct page_pool_params pp_params = { 0 };
    struct xdp_rxq_info xdp_rxq;
    int err;

    pp_params.order = 0;
    /* internal DMA mapping in page_pool */
    pp_params.flags = PP_FLAG_DMA_MAP;
    pp_params.pool_size = DESC_NUM;
    pp_params.nid = NUMA_NO_NODE;
    pp_params.dev = priv->dev;
    pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;
    page_pool = page_pool_create(&pp_params);
    if (IS_ERR(page_pool)) {
        err = PTR_ERR(page_pool);
        goto err_out;
    }

    /* queue_index 0, napi_id 0 (no busy-poll association) */
    err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0, 0);
    if (err)
        goto err_out;

    err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool);
    if (err)
        goto err_out;

NAPI poller
-----------

.. code-block:: c

    /* NAPI Rx poller; some_error, packet_is_xdp, verdict, done, budget,
     * page and new_page are schematic placeholders.
     */
    enum dma_data_direction dma_dir;

    dma_dir = page_pool_get_dma_dir(dring->page_pool);
    while (done < budget) {
        if (some_error) {
            /* Errored frames go straight back to the fast cache */
            page_pool_recycle_direct(page_pool, page);
            continue;
        }
        if (packet_is_xdp) {
            if (verdict == XDP_DROP)
                /* Dropped XDP frames are recycled as well */
                page_pool_recycle_direct(page_pool, page);
        } else { /* the packet is passed up the stack as an skb */
            /* The page leaves the pool: unmap it, update inflight counters */
            page_pool_release_page(page_pool, page);
            new_page = page_pool_dev_alloc_pages(page_pool);
        }
    }

Driver unload
-------------

.. code-block:: c

    /* Driver unload: pages still held by the driver go back to the pool */
    page_pool_put_full_page(page_pool, page, false);
    /* Unregistering the mem model also disconnects the page pool */
    xdp_rxq_info_unreg(&xdp_rxq);
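
For a pool that was never handed to xdp_rxq_info_reg_mem_model() (e.g. one
used only for Tx buffer recycling), there is no mem model to unregister and
the driver is expected to free the pool itself. A minimal sketch using
page_pool_destroy(), which defers the actual free until all inflight pages
have been returned:

.. code-block:: c

    /* The pool is only really freed once inflight accounting hits zero */
    page_pool_destroy(page_pool);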