* [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation
[not found] <20240103095650.25769-1-linyunsheng@huawei.com>
@ 2024-01-03 9:56 ` Yunsheng Lin
2024-01-05 15:35 ` Alexander H Duyck
2024-01-03 9:56 ` [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill() Yunsheng Lin
2024-01-03 9:56 ` [PATCH net-next 5/6] net: introduce page_frag_cache_drain() Yunsheng Lin
2 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2024-01-03 9:56 UTC (permalink / raw)
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Alexander Duyck,
Michael S. Tsirkin, Jason Wang, Andrew Morton, Eric Dumazet, kvm,
virtualization, linux-mm
Currently there are three page frag implementations
which all try to allocate an order-3 page and, if that
fails, fall back to allocating an order-0 page. Each of
them allows the order-3 allocation to fail under certain
conditions by using specific gfp bits.
The gfp bits used for the order-3 allocation differ
between the implementations: __GFP_NOMEMALLOC is or'd in
to forbid access to the emergency memory reserves in
__page_frag_cache_refill(), but it is not or'd in the
other implementations; __GFP_DIRECT_RECLAIM is masked
off to avoid direct reclaim in skb_page_frag_refill(),
but it is not masked off in __page_frag_cache_refill().
This patch unifies the gfp bits used by the different
implementations by or'ing in __GFP_NOMEMALLOC and
masking off __GFP_DIRECT_RECLAIM for the order-3
allocation, to avoid putting unnecessary pressure on mm.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
---
drivers/vhost/net.c | 2 +-
mm/page_alloc.c | 4 ++--
net/core/sock.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f2ed7167c848..e574e21cc0ca 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
/* Avoid direct reclaim but allow kswapd to wake */
pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
__GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY,
+ __GFP_NORETRY | __GFP_NOMEMALLOC,
SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9a16305cf985..1f0b36dd81b5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4693,8 +4693,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
gfp_t gfp = gfp_mask;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
- gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
- __GFP_NOMEMALLOC;
+ gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
+ __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
PAGE_FRAG_CACHE_MAX_ORDER);
nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
diff --git a/net/core/sock.c b/net/core/sock.c
index 446e945f736b..d643332c3ee5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2900,7 +2900,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
/* Avoid direct reclaim but allow kswapd to wake */
pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
__GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY,
+ __GFP_NORETRY | __GFP_NOMEMALLOC,
SKB_FRAG_PAGE_ORDER);
if (likely(pfrag->page)) {
pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
--
2.33.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill()
[not found] <20240103095650.25769-1-linyunsheng@huawei.com>
2024-01-03 9:56 ` [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation Yunsheng Lin
@ 2024-01-03 9:56 ` Yunsheng Lin
2024-01-05 16:06 ` Alexander H Duyck
2024-01-03 9:56 ` [PATCH net-next 5/6] net: introduce page_frag_cache_drain() Yunsheng Lin
2 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2024-01-03 9:56 UTC (permalink / raw)
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Jason Wang,
Michael S. Tsirkin, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, kvm, virtualization, bpf
The page frag in vhost_net_page_frag_refill() uses the
'struct page_frag' from skb_page_frag_refill(), but its
implementation is now similar to page_frag_alloc_align().
This patch removes vhost_net_page_frag_refill() by using
'struct page_frag_cache' instead of 'struct page_frag',
and allocating frags using page_frag_alloc_align().
The added benefit is that it not only unifies the page
frag implementation a little, but also yields about a
0.5% performance boost when tested with the
vhost_net_test introduced in the last patch.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 93 ++++++++++++++-------------------------------
1 file changed, 29 insertions(+), 64 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e574e21cc0ca..805e11d598e4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -141,10 +141,8 @@ struct vhost_net {
unsigned tx_zcopy_err;
/* Flush in progress. Protected by tx vq lock. */
bool tx_flush;
- /* Private page frag */
- struct page_frag page_frag;
- /* Refcount bias of page frag */
- int refcnt_bias;
+ /* Private page frag cache */
+ struct page_frag_cache pf_cache;
};
static unsigned vhost_net_zcopy_mask __read_mostly;
@@ -655,41 +653,6 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len)
!vhost_vq_avail_empty(vq->dev, vq);
}
-static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
- struct page_frag *pfrag, gfp_t gfp)
-{
- if (pfrag->page) {
- if (pfrag->offset + sz <= pfrag->size)
- return true;
- __page_frag_cache_drain(pfrag->page, net->refcnt_bias);
- }
-
- pfrag->offset = 0;
- net->refcnt_bias = 0;
- if (SKB_FRAG_PAGE_ORDER) {
- /* Avoid direct reclaim but allow kswapd to wake */
- pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
- __GFP_COMP | __GFP_NOWARN |
- __GFP_NORETRY | __GFP_NOMEMALLOC,
- SKB_FRAG_PAGE_ORDER);
- if (likely(pfrag->page)) {
- pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
- goto done;
- }
- }
- pfrag->page = alloc_page(gfp);
- if (likely(pfrag->page)) {
- pfrag->size = PAGE_SIZE;
- goto done;
- }
- return false;
-
-done:
- net->refcnt_bias = USHRT_MAX;
- page_ref_add(pfrag->page, USHRT_MAX - 1);
- return true;
-}
-
#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
@@ -699,7 +662,6 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
struct vhost_net *net = container_of(vq->dev, struct vhost_net,
dev);
struct socket *sock = vhost_vq_get_backend(vq);
- struct page_frag *alloc_frag = &net->page_frag;
struct virtio_net_hdr *gso;
struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp];
struct tun_xdp_hdr *hdr;
@@ -710,6 +672,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
int sock_hlen = nvq->sock_hlen;
void *buf;
int copied;
+ int ret;
if (unlikely(len < nvq->sock_hlen))
return -EFAULT;
@@ -719,18 +682,17 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
return -ENOSPC;
buflen += SKB_DATA_ALIGN(len + pad);
- alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
- if (unlikely(!vhost_net_page_frag_refill(net, buflen,
- alloc_frag, GFP_KERNEL)))
+ buf = page_frag_alloc_align(&net->pf_cache, buflen, GFP_KERNEL,
+ SMP_CACHE_BYTES);
+ if (unlikely(!buf))
return -ENOMEM;
- buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
- copied = copy_page_from_iter(alloc_frag->page,
- alloc_frag->offset +
- offsetof(struct tun_xdp_hdr, gso),
- sock_hlen, from);
- if (copied != sock_hlen)
- return -EFAULT;
+ copied = copy_from_iter(buf + offsetof(struct tun_xdp_hdr, gso),
+ sock_hlen, from);
+ if (copied != sock_hlen) {
+ ret = -EFAULT;
+ goto err;
+ }
hdr = buf;
gso = &hdr->gso;
@@ -743,27 +705,30 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
vhost16_to_cpu(vq, gso->csum_start) +
vhost16_to_cpu(vq, gso->csum_offset) + 2);
- if (vhost16_to_cpu(vq, gso->hdr_len) > len)
- return -EINVAL;
+ if (vhost16_to_cpu(vq, gso->hdr_len) > len) {
+ ret = -EINVAL;
+ goto err;
+ }
}
len -= sock_hlen;
- copied = copy_page_from_iter(alloc_frag->page,
- alloc_frag->offset + pad,
- len, from);
- if (copied != len)
- return -EFAULT;
+ copied = copy_from_iter(buf + pad, len, from);
+ if (copied != len) {
+ ret = -EFAULT;
+ goto err;
+ }
xdp_init_buff(xdp, buflen, NULL);
xdp_prepare_buff(xdp, buf, pad, len, true);
hdr->buflen = buflen;
- --net->refcnt_bias;
- alloc_frag->offset += buflen;
-
++nvq->batched_xdp;
return 0;
+
+err:
+ page_frag_free(buf);
+ return ret;
}
static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
@@ -1353,8 +1318,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
vqs[VHOST_NET_VQ_RX]);
f->private_data = n;
- n->page_frag.page = NULL;
- n->refcnt_bias = 0;
+ n->pf_cache.va = NULL;
return 0;
}
@@ -1422,8 +1386,9 @@ static int vhost_net_release(struct inode *inode, struct file *f)
kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
kfree(n->dev.vqs);
- if (n->page_frag.page)
- __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
+ if (n->pf_cache.va)
+ __page_frag_cache_drain(virt_to_head_page(n->pf_cache.va),
+ n->pf_cache.pagecnt_bias);
kvfree(n);
return 0;
}
--
2.33.0
* [PATCH net-next 5/6] net: introduce page_frag_cache_drain()
[not found] <20240103095650.25769-1-linyunsheng@huawei.com>
2024-01-03 9:56 ` [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation Yunsheng Lin
2024-01-03 9:56 ` [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill() Yunsheng Lin
@ 2024-01-03 9:56 ` Yunsheng Lin
2024-01-05 15:48 ` Alexander H Duyck
2 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2024-01-03 9:56 UTC (permalink / raw)
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Jason Wang, Jeroen de Borst,
Praveen Kaligineedi, Shailend Chand, Eric Dumazet, Felix Fietkau,
John Crispin, Sean Wang, Mark Lee, Lorenzo Bianconi,
Matthias Brugger, AngeloGioacchino Del Regno, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Michael S. Tsirkin, Andrew Morton, linux-arm-kernel,
linux-mediatek, linux-nvme, kvm, virtualization, linux-mm
When draining a page_frag_cache, most users perform
similar steps, so introduce an API to avoid code
duplication.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/ethernet/google/gve/gve_main.c | 11 ++---------
drivers/net/ethernet/mediatek/mtk_wed_wo.c | 17 ++---------------
drivers/nvme/host/tcp.c | 7 +------
drivers/nvme/target/tcp.c | 4 +---
drivers/vhost/net.c | 4 +---
include/linux/gfp.h | 2 ++
mm/page_alloc.c | 10 ++++++++++
7 files changed, 19 insertions(+), 36 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 619bf63ec935..d976190b0f4d 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1278,17 +1278,10 @@ static void gve_unreg_xdp_info(struct gve_priv *priv)
static void gve_drain_page_cache(struct gve_priv *priv)
{
- struct page_frag_cache *nc;
int i;
- for (i = 0; i < priv->rx_cfg.num_queues; i++) {
- nc = &priv->rx[i].page_cache;
- if (nc->va) {
- __page_frag_cache_drain(virt_to_page(nc->va),
- nc->pagecnt_bias);
- nc->va = NULL;
- }
- }
+ for (i = 0; i < priv->rx_cfg.num_queues; i++)
+ page_frag_cache_drain(&priv->rx[i].page_cache);
}
static int gve_open(struct net_device *dev)
diff --git a/drivers/net/ethernet/mediatek/mtk_wed_wo.c b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
index d58b07e7e123..7063c78bd35f 100644
--- a/drivers/net/ethernet/mediatek/mtk_wed_wo.c
+++ b/drivers/net/ethernet/mediatek/mtk_wed_wo.c
@@ -286,7 +286,6 @@ mtk_wed_wo_queue_free(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
static void
mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
{
- struct page *page;
int i;
for (i = 0; i < q->n_desc; i++) {
@@ -301,19 +300,12 @@ mtk_wed_wo_queue_tx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
entry->buf = NULL;
}
- if (!q->cache.va)
- return;
-
- page = virt_to_page(q->cache.va);
- __page_frag_cache_drain(page, q->cache.pagecnt_bias);
- memset(&q->cache, 0, sizeof(q->cache));
+ page_frag_cache_drain(&q->cache);
}
static void
mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
{
- struct page *page;
-
for (;;) {
void *buf = mtk_wed_wo_dequeue(wo, q, NULL, true);
@@ -323,12 +315,7 @@ mtk_wed_wo_queue_rx_clean(struct mtk_wed_wo *wo, struct mtk_wed_wo_queue *q)
skb_free_frag(buf);
}
- if (!q->cache.va)
- return;
-
- page = virt_to_page(q->cache.va);
- __page_frag_cache_drain(page, q->cache.pagecnt_bias);
- memset(&q->cache, 0, sizeof(q->cache));
+ page_frag_cache_drain(&q->cache);
}
static void
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 08805f027810..c80037a78066 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1344,7 +1344,6 @@ static int nvme_tcp_alloc_async_req(struct nvme_tcp_ctrl *ctrl)
static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
{
- struct page *page;
struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
struct nvme_tcp_queue *queue = &ctrl->queues[qid];
unsigned int noreclaim_flag;
@@ -1355,11 +1354,7 @@ static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
if (queue->hdr_digest || queue->data_digest)
nvme_tcp_free_crypto(queue);
- if (queue->pf_cache.va) {
- page = virt_to_head_page(queue->pf_cache.va);
- __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
- queue->pf_cache.va = NULL;
- }
+ page_frag_cache_drain(&queue->pf_cache);
noreclaim_flag = memalloc_noreclaim_save();
/* ->sock will be released by fput() */
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 4cc27856aa8f..11237557cfc5 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1576,7 +1576,6 @@ static void nvmet_tcp_free_cmd_data_in_buffers(struct nvmet_tcp_queue *queue)
static void nvmet_tcp_release_queue_work(struct work_struct *w)
{
- struct page *page;
struct nvmet_tcp_queue *queue =
container_of(w, struct nvmet_tcp_queue, release_work);
@@ -1600,8 +1599,7 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
if (queue->hdr_digest || queue->data_digest)
nvmet_tcp_free_crypto(queue);
ida_free(&nvmet_tcp_queue_ida, queue->idx);
- page = virt_to_head_page(queue->pf_cache.va);
- __page_frag_cache_drain(page, queue->pf_cache.pagecnt_bias);
+ page_frag_cache_drain(&queue->pf_cache);
kfree(queue);
}
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 805e11d598e4..4b2fcb228a0a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1386,9 +1386,7 @@ static int vhost_net_release(struct inode *inode, struct file *f)
kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
kfree(n->dev.vqs);
- if (n->pf_cache.va)
- __page_frag_cache_drain(virt_to_head_page(n->pf_cache.va),
- n->pf_cache.pagecnt_bias);
+ page_frag_cache_drain(&n->pf_cache);
kvfree(n);
return 0;
}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index bbd75976541e..03ba079655d3 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -316,6 +316,8 @@ extern void *page_frag_alloc_align(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask,
unsigned int align);
+void page_frag_cache_drain(struct page_frag_cache *nc);
+
static inline void *page_frag_alloc(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 083e0c38fb62..5a0e68edcb05 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4716,6 +4716,16 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
}
EXPORT_SYMBOL(__page_frag_cache_drain);
+void page_frag_cache_drain(struct page_frag_cache *nc)
+{
+ if (!nc->va)
+ return;
+
+ __page_frag_cache_drain(virt_to_head_page(nc->va), nc->pagecnt_bias);
+ nc->va = NULL;
+}
+EXPORT_SYMBOL(page_frag_cache_drain);
+
void *page_frag_alloc_align(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask,
unsigned int align)
--
2.33.0
* Re: [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation
2024-01-03 9:56 ` [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation Yunsheng Lin
@ 2024-01-05 15:35 ` Alexander H Duyck
2024-01-08 8:25 ` Yunsheng Lin
0 siblings, 1 reply; 10+ messages in thread
From: Alexander H Duyck @ 2024-01-05 15:35 UTC (permalink / raw)
To: Yunsheng Lin, davem, kuba, pabeni
Cc: netdev, linux-kernel, Michael S. Tsirkin, Jason Wang,
Andrew Morton, Eric Dumazet, kvm, virtualization, linux-mm
On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
> Currently there are three page frag implementations
> which all try to allocate an order-3 page and, if that
> fails, fall back to allocating an order-0 page. Each of
> them allows the order-3 allocation to fail under certain
> conditions by using specific gfp bits.
>
> The gfp bits used for the order-3 allocation differ
> between the implementations: __GFP_NOMEMALLOC is or'd in
> to forbid access to the emergency memory reserves in
> __page_frag_cache_refill(), but it is not or'd in the
> other implementations; __GFP_DIRECT_RECLAIM is masked
> off to avoid direct reclaim in skb_page_frag_refill(),
> but it is not masked off in __page_frag_cache_refill().
>
> This patch unifies the gfp bits used by the different
> implementations by or'ing in __GFP_NOMEMALLOC and
> masking off __GFP_DIRECT_RECLAIM for the order-3
> allocation, to avoid putting unnecessary pressure on mm.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> ---
> drivers/vhost/net.c | 2 +-
> mm/page_alloc.c | 4 ++--
> net/core/sock.c | 2 +-
> 3 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index f2ed7167c848..e574e21cc0ca 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
> /* Avoid direct reclaim but allow kswapd to wake */
> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> __GFP_COMP | __GFP_NOWARN |
> - __GFP_NORETRY,
> + __GFP_NORETRY | __GFP_NOMEMALLOC,
> SKB_FRAG_PAGE_ORDER);
> if (likely(pfrag->page)) {
> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9a16305cf985..1f0b36dd81b5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4693,8 +4693,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> gfp_t gfp = gfp_mask;
>
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
> - __GFP_NOMEMALLOC;
> + gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
> + __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
> PAGE_FRAG_CACHE_MAX_ORDER);
> nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 446e945f736b..d643332c3ee5 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2900,7 +2900,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> /* Avoid direct reclaim but allow kswapd to wake */
> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> __GFP_COMP | __GFP_NOWARN |
> - __GFP_NORETRY,
> + __GFP_NORETRY | __GFP_NOMEMALLOC,
> SKB_FRAG_PAGE_ORDER);
> if (likely(pfrag->page)) {
> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
Looks fine to me.
One thing you may want to consider would be to place this all in an
inline function that could just consolidate all the code.
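[As a rough sketch of that suggestion: the helper name and flag values below are invented for illustration and are not the kernel's real gfp definitions; only the bit arithmetic mirrors the patch. Such a consolidating inline could look like:]

```c
#include <assert.h>

/* Illustration only: placeholder flag values, not the kernel's
 * actual gfp definitions. */
typedef unsigned int gfp_t;
#define __GFP_DIRECT_RECLAIM	0x400u
#define __GFP_COMP		0x4000u
#define __GFP_NOWARN		0x200u
#define __GFP_NORETRY		0x1000u
#define __GFP_NOMEMALLOC	0x10000u

/* One possible shape for the consolidating helper: compute the
 * unified gfp mask for the order-3 attempt in a single place,
 * forbidding both direct reclaim and emergency-reserve access. */
static inline gfp_t page_frag_order3_gfp(gfp_t gfp)
{
	return (gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | __GFP_NOWARN |
	       __GFP_NORETRY | __GFP_NOMEMALLOC;
}
```

[All three call sites would then pass page_frag_order3_gfp(gfp) to alloc_pages() instead of open-coding the mask.]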
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
* Re: [PATCH net-next 5/6] net: introduce page_frag_cache_drain()
2024-01-03 9:56 ` [PATCH net-next 5/6] net: introduce page_frag_cache_drain() Yunsheng Lin
@ 2024-01-05 15:48 ` Alexander H Duyck
0 siblings, 0 replies; 10+ messages in thread
From: Alexander H Duyck @ 2024-01-05 15:48 UTC (permalink / raw)
To: Yunsheng Lin, davem, kuba, pabeni
Cc: netdev, linux-kernel, Jason Wang, Jeroen de Borst,
Praveen Kaligineedi, Shailend Chand, Eric Dumazet, Felix Fietkau,
John Crispin, Sean Wang, Mark Lee, Lorenzo Bianconi,
Matthias Brugger, AngeloGioacchino Del Regno, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni,
Michael S. Tsirkin, Andrew Morton, linux-arm-kernel,
linux-mediatek, linux-nvme, kvm, virtualization, linux-mm
On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
> When draining a page_frag_cache, most users perform
> similar steps, so introduce an API to avoid code
> duplication.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
Looks good to me.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
* Re: [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill()
2024-01-03 9:56 ` [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill() Yunsheng Lin
@ 2024-01-05 16:06 ` Alexander H Duyck
2024-01-08 9:06 ` Yunsheng Lin
0 siblings, 1 reply; 10+ messages in thread
From: Alexander H Duyck @ 2024-01-05 16:06 UTC (permalink / raw)
To: Yunsheng Lin, davem, kuba, pabeni
Cc: netdev, linux-kernel, Jason Wang, Michael S. Tsirkin,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, kvm, virtualization, bpf
On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
> The page frag in vhost_net_page_frag_refill() uses the
> 'struct page_frag' from skb_page_frag_refill(), but its
> implementation is now similar to page_frag_alloc_align().
>
> This patch removes vhost_net_page_frag_refill() by using
> 'struct page_frag_cache' instead of 'struct page_frag',
> and allocating frags using page_frag_alloc_align().
>
> The added benefit is that it not only unifies the page
> frag implementation a little, but also yields about a
> 0.5% performance boost when tested with the
> vhost_net_test introduced in the last patch.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> Acked-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/net.c | 93 ++++++++++++++-------------------------------
> 1 file changed, 29 insertions(+), 64 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index e574e21cc0ca..805e11d598e4 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -141,10 +141,8 @@ struct vhost_net {
> unsigned tx_zcopy_err;
> /* Flush in progress. Protected by tx vq lock. */
> bool tx_flush;
> - /* Private page frag */
> - struct page_frag page_frag;
> - /* Refcount bias of page frag */
> - int refcnt_bias;
> + /* Private page frag cache */
> + struct page_frag_cache pf_cache;
> };
>
> static unsigned vhost_net_zcopy_mask __read_mostly;
> @@ -655,41 +653,6 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len)
> !vhost_vq_avail_empty(vq->dev, vq);
> }
>
> -static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
> - struct page_frag *pfrag, gfp_t gfp)
> -{
> - if (pfrag->page) {
> - if (pfrag->offset + sz <= pfrag->size)
> - return true;
> - __page_frag_cache_drain(pfrag->page, net->refcnt_bias);
> - }
> -
> - pfrag->offset = 0;
> - net->refcnt_bias = 0;
> - if (SKB_FRAG_PAGE_ORDER) {
> - /* Avoid direct reclaim but allow kswapd to wake */
> - pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> - __GFP_COMP | __GFP_NOWARN |
> - __GFP_NORETRY | __GFP_NOMEMALLOC,
> - SKB_FRAG_PAGE_ORDER);
> - if (likely(pfrag->page)) {
> - pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> - goto done;
> - }
> - }
> - pfrag->page = alloc_page(gfp);
> - if (likely(pfrag->page)) {
> - pfrag->size = PAGE_SIZE;
> - goto done;
> - }
> - return false;
> -
> -done:
> - net->refcnt_bias = USHRT_MAX;
> - page_ref_add(pfrag->page, USHRT_MAX - 1);
> - return true;
> -}
> -
> #define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
>
> static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> @@ -699,7 +662,6 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> struct vhost_net *net = container_of(vq->dev, struct vhost_net,
> dev);
> struct socket *sock = vhost_vq_get_backend(vq);
> - struct page_frag *alloc_frag = &net->page_frag;
> struct virtio_net_hdr *gso;
> struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp];
> struct tun_xdp_hdr *hdr;
> @@ -710,6 +672,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> int sock_hlen = nvq->sock_hlen;
> void *buf;
> int copied;
> + int ret;
>
> if (unlikely(len < nvq->sock_hlen))
> return -EFAULT;
> @@ -719,18 +682,17 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> return -ENOSPC;
>
> buflen += SKB_DATA_ALIGN(len + pad);
> - alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
> - if (unlikely(!vhost_net_page_frag_refill(net, buflen,
> - alloc_frag, GFP_KERNEL)))
> + buf = page_frag_alloc_align(&net->pf_cache, buflen, GFP_KERNEL,
> + SMP_CACHE_BYTES);
If your changes from patch 1 are just to make it fit into this layout
might I suggest just splitting up page_frag_alloc_align into an inline
that accepts the arguments you have here, and adding
__page_frag_alloc_align which is passed the mask the original function
expected. By doing that you should be able to maintain the same level
of performance and still get most of the code cleanup.
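[For context, the align/align_mask conversion behind that suggestion can be demonstrated in plain userspace C; the function names here are made up for the demo and differ from the kernel's real ones:]

```c
#include <assert.h>

/* The inline wrapper would turn a power-of-two alignment into the
 * mask form (-align), letting the underlying function align an
 * offset with a single AND. Demo code only, not kernel code. */
static unsigned int apply_align_mask(unsigned int offset,
				     unsigned int align_mask)
{
	return offset & align_mask;
}

static unsigned int align_offset_down(unsigned int offset,
				      unsigned int align)
{
	/* what a page_frag_alloc_align()-style wrapper would pass down */
	return apply_align_mask(offset, -align);
}
```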
> + if (unlikely(!buf))
> return -ENOMEM;
>
> - buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> - copied = copy_page_from_iter(alloc_frag->page,
> - alloc_frag->offset +
> - offsetof(struct tun_xdp_hdr, gso),
> - sock_hlen, from);
> - if (copied != sock_hlen)
> - return -EFAULT;
> + copied = copy_from_iter(buf + offsetof(struct tun_xdp_hdr, gso),
> + sock_hlen, from);
> + if (copied != sock_hlen) {
> + ret = -EFAULT;
> + goto err;
> + }
>
> hdr = buf;
> gso = &hdr->gso;
> @@ -743,27 +705,30 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> vhost16_to_cpu(vq, gso->csum_start) +
> vhost16_to_cpu(vq, gso->csum_offset) + 2);
>
> - if (vhost16_to_cpu(vq, gso->hdr_len) > len)
> - return -EINVAL;
> + if (vhost16_to_cpu(vq, gso->hdr_len) > len) {
> + ret = -EINVAL;
> + goto err;
> + }
> }
>
> len -= sock_hlen;
> - copied = copy_page_from_iter(alloc_frag->page,
> - alloc_frag->offset + pad,
> - len, from);
> - if (copied != len)
> - return -EFAULT;
> + copied = copy_from_iter(buf + pad, len, from);
> + if (copied != len) {
> + ret = -EFAULT;
> + goto err;
> + }
>
> xdp_init_buff(xdp, buflen, NULL);
> xdp_prepare_buff(xdp, buf, pad, len, true);
> hdr->buflen = buflen;
>
> - --net->refcnt_bias;
> - alloc_frag->offset += buflen;
> -
> ++nvq->batched_xdp;
>
> return 0;
> +
> +err:
> + page_frag_free(buf);
> + return ret;
> }
>
> static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> @@ -1353,8 +1318,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> vqs[VHOST_NET_VQ_RX]);
>
> f->private_data = n;
> - n->page_frag.page = NULL;
> - n->refcnt_bias = 0;
> + n->pf_cache.va = NULL;
>
> return 0;
> }
> @@ -1422,8 +1386,9 @@ static int vhost_net_release(struct inode *inode, struct file *f)
> kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
> kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
> kfree(n->dev.vqs);
> - if (n->page_frag.page)
> - __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
> + if (n->pf_cache.va)
> + __page_frag_cache_drain(virt_to_head_page(n->pf_cache.va),
> + n->pf_cache.pagecnt_bias);
> kvfree(n);
> return 0;
> }
I would recommend reordering this patch with patch 5. Then you could
remove the block that is setting "n->pf_cache.va = NULL" above and just
make use of page_frag_cache_drain in the lower block which would also
return the va to NULL.
* Re: [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation
2024-01-05 15:35 ` Alexander H Duyck
@ 2024-01-08 8:25 ` Yunsheng Lin
2024-01-08 16:13 ` Alexander Duyck
0 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2024-01-08 8:25 UTC (permalink / raw)
To: Alexander H Duyck, davem, kuba, pabeni
Cc: netdev, linux-kernel, Michael S. Tsirkin, Jason Wang,
Andrew Morton, Eric Dumazet, kvm, virtualization, linux-mm
On 2024/1/5 23:35, Alexander H Duyck wrote:
> On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
>> Currently there are three page frag implementations
>> which all try to allocate an order-3 page and, if that
>> fails, fall back to allocating an order-0 page. Each of
>> them allows the order-3 allocation to fail under certain
>> conditions by using specific gfp bits.
>>
>> The gfp bits used for the order-3 allocation differ
>> between the implementations: __GFP_NOMEMALLOC is or'd in
>> to forbid access to the emergency memory reserves in
>> __page_frag_cache_refill(), but it is not or'd in the
>> other implementations; __GFP_DIRECT_RECLAIM is masked
>> off to avoid direct reclaim in skb_page_frag_refill(),
>> but it is not masked off in __page_frag_cache_refill().
>>
>> This patch unifies the gfp bits used by the different
>> implementations by or'ing in __GFP_NOMEMALLOC and
>> masking off __GFP_DIRECT_RECLAIM for the order-3
>> allocation, to avoid putting unnecessary pressure on mm.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>> ---
>> drivers/vhost/net.c | 2 +-
>> mm/page_alloc.c | 4 ++--
>> net/core/sock.c | 2 +-
>> 3 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index f2ed7167c848..e574e21cc0ca 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
>> /* Avoid direct reclaim but allow kswapd to wake */
>> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>> __GFP_COMP | __GFP_NOWARN |
>> - __GFP_NORETRY,
>> + __GFP_NORETRY | __GFP_NOMEMALLOC,
>> SKB_FRAG_PAGE_ORDER);
>> if (likely(pfrag->page)) {
>> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 9a16305cf985..1f0b36dd81b5 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -4693,8 +4693,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
>> gfp_t gfp = gfp_mask;
>>
>> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
>> - __GFP_NOMEMALLOC;
>> + gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
>> + __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
>> page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
>> PAGE_FRAG_CACHE_MAX_ORDER);
>> nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 446e945f736b..d643332c3ee5 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2900,7 +2900,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>> /* Avoid direct reclaim but allow kswapd to wake */
>> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>> __GFP_COMP | __GFP_NOWARN |
>> - __GFP_NORETRY,
>> + __GFP_NORETRY | __GFP_NOMEMALLOC,
>> SKB_FRAG_PAGE_ORDER);
>> if (likely(pfrag->page)) {
>> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>
> Looks fine to me.
>
> One thing you may want to consider would be to place this all in an
> inline function that could just consolidate all the code.
Do you think it is possible to further unify the implementations of
'struct page_frag_cache' and 'struct page_frag', so that adding an
inline function for the above becomes unnecessary?
>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>
> .
>
* Re: [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill()
2024-01-05 16:06 ` Alexander H Duyck
@ 2024-01-08 9:06 ` Yunsheng Lin
2024-01-08 15:45 ` Alexander Duyck
0 siblings, 1 reply; 10+ messages in thread
From: Yunsheng Lin @ 2024-01-08 9:06 UTC (permalink / raw)
To: Alexander H Duyck, davem, kuba, pabeni
Cc: netdev, linux-kernel, Jason Wang, Michael S. Tsirkin,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, kvm, virtualization, bpf
On 2024/1/6 0:06, Alexander H Duyck wrote:
>>
>> static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>> @@ -1353,8 +1318,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>> vqs[VHOST_NET_VQ_RX]);
>>
>> f->private_data = n;
>> - n->page_frag.page = NULL;
>> - n->refcnt_bias = 0;
>> + n->pf_cache.va = NULL;
>>
>> return 0;
>> }
>> @@ -1422,8 +1386,9 @@ static int vhost_net_release(struct inode *inode, struct file *f)
>> kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
>> kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
>> kfree(n->dev.vqs);
>> - if (n->page_frag.page)
>> - __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
>> + if (n->pf_cache.va)
>> + __page_frag_cache_drain(virt_to_head_page(n->pf_cache.va),
>> + n->pf_cache.pagecnt_bias);
>> kvfree(n);
>> return 0;
>> }
>
> I would recommend reordering this patch with patch 5. Then you could
> remove the block that is setting "n->pf_cache.va = NULL" above and just
> make use of page_frag_cache_drain in the lower block which would also
> return the va to NULL.
I am not sure we can, as there is no zeroing of 'struct vhost_net' in
vhost_net_open().
If we don't have "n->pf_cache.va = NULL", don't we use uninitialized data
when calling page_frag_alloc_align() for the first time?
> .
>
* Re: [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill()
2024-01-08 9:06 ` Yunsheng Lin
@ 2024-01-08 15:45 ` Alexander Duyck
0 siblings, 0 replies; 10+ messages in thread
From: Alexander Duyck @ 2024-01-08 15:45 UTC (permalink / raw)
To: Yunsheng Lin
Cc: davem, kuba, pabeni, netdev, linux-kernel, Jason Wang,
Michael S. Tsirkin, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, kvm, virtualization, bpf
On Mon, Jan 8, 2024 at 1:06 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/1/6 0:06, Alexander H Duyck wrote:
> >>
> >> static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> >> @@ -1353,8 +1318,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
> >> vqs[VHOST_NET_VQ_RX]);
> >>
> >> f->private_data = n;
> >> - n->page_frag.page = NULL;
> >> - n->refcnt_bias = 0;
> >> + n->pf_cache.va = NULL;
> >>
> >> return 0;
> >> }
> >> @@ -1422,8 +1386,9 @@ static int vhost_net_release(struct inode *inode, struct file *f)
> >> kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
> >> kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
> >> kfree(n->dev.vqs);
> >> - if (n->page_frag.page)
> >> - __page_frag_cache_drain(n->page_frag.page, n->refcnt_bias);
> >> + if (n->pf_cache.va)
> >> + __page_frag_cache_drain(virt_to_head_page(n->pf_cache.va),
> >> + n->pf_cache.pagecnt_bias);
> >> kvfree(n);
> >> return 0;
> >> }
> >
> > I would recommend reordering this patch with patch 5. Then you could
> > remove the block that is setting "n->pf_cache.va = NULL" above and just
> > make use of page_frag_cache_drain in the lower block which would also
> > return the va to NULL.
>
> I am not sure we can, as there is no zeroing of 'struct vhost_net' in
> vhost_net_open().
>
> If we don't have "n->pf_cache.va = NULL", don't we use uninitialized data
> when calling page_frag_alloc_align() for the first time?
I see. So kvmalloc is used instead of kvzalloc when allocating the
structure. That might be an opportunity to clean things up a bit by
making that change to reduce the risk of some piece of memory
initialization being missed.
That said, I still think reordering the two patches might be useful as
it would help to make it so that the change you make to vhost_net is
encapsulated in one patch to fully enable the use of the new page pool
API.
* Re: [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation
2024-01-08 8:25 ` Yunsheng Lin
@ 2024-01-08 16:13 ` Alexander Duyck
0 siblings, 0 replies; 10+ messages in thread
From: Alexander Duyck @ 2024-01-08 16:13 UTC (permalink / raw)
To: Yunsheng Lin
Cc: davem, kuba, pabeni, netdev, linux-kernel, Michael S. Tsirkin,
Jason Wang, Andrew Morton, Eric Dumazet, kvm, virtualization,
linux-mm
On Mon, Jan 8, 2024 at 12:25 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2024/1/5 23:35, Alexander H Duyck wrote:
> > On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
> >> Currently there seem to be three page frag implementations
> >> which all try to allocate an order 3 page; if that fails, they
> >> fall back to allocating an order 0 page, and each of them
> >> allows the order 3 page allocation to fail under certain
> >> conditions by using specific gfp bits.
> >>
> >> The gfp bits for order 3 page allocation differ
> >> between the implementations: __GFP_NOMEMALLOC is
> >> or'd in to forbid access to the emergency reserves in
> >> __page_frag_cache_refill(), but it is not or'd in the other
> >> implementations; __GFP_DIRECT_RECLAIM is masked off to avoid
> >> direct reclaim in skb_page_frag_refill(), but it is not
> >> masked off in __page_frag_cache_refill().
> >>
> >> This patch unifies the gfp bits used by the different
> >> implementations by or'ing in __GFP_NOMEMALLOC and masking off
> >> __GFP_DIRECT_RECLAIM for order 3 page allocation, to avoid
> >> possible pressure on mm.
> >>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >> CC: Alexander Duyck <alexander.duyck@gmail.com>
> >> ---
> >> drivers/vhost/net.c | 2 +-
> >> mm/page_alloc.c | 4 ++--
> >> net/core/sock.c | 2 +-
> >> 3 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index f2ed7167c848..e574e21cc0ca 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -670,7 +670,7 @@ static bool vhost_net_page_frag_refill(struct vhost_net *net, unsigned int sz,
> >> /* Avoid direct reclaim but allow kswapd to wake */
> >> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> >> __GFP_COMP | __GFP_NOWARN |
> >> - __GFP_NORETRY,
> >> + __GFP_NORETRY | __GFP_NOMEMALLOC,
> >> SKB_FRAG_PAGE_ORDER);
> >> if (likely(pfrag->page)) {
> >> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 9a16305cf985..1f0b36dd81b5 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -4693,8 +4693,8 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> >> gfp_t gfp = gfp_mask;
> >>
> >> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> >> - gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
> >> - __GFP_NOMEMALLOC;
> >> + gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
> >> + __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> >> page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
> >> PAGE_FRAG_CACHE_MAX_ORDER);
> >> nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
> >> diff --git a/net/core/sock.c b/net/core/sock.c
> >> index 446e945f736b..d643332c3ee5 100644
> >> --- a/net/core/sock.c
> >> +++ b/net/core/sock.c
> >> @@ -2900,7 +2900,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> >> /* Avoid direct reclaim but allow kswapd to wake */
> >> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> >> __GFP_COMP | __GFP_NOWARN |
> >> - __GFP_NORETRY,
> >> + __GFP_NORETRY | __GFP_NOMEMALLOC,
> >> SKB_FRAG_PAGE_ORDER);
> >> if (likely(pfrag->page)) {
> >> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> >
> > Looks fine to me.
> >
> > One thing you may want to consider would be to place this all in an
> > inline function that could just consolidate all the code.
>
> Do you think it is possible to further unify the implementations of
> 'struct page_frag_cache' and 'struct page_frag', so that adding an
> inline function for the above becomes unnecessary?
Actually skb_page_frag_refill functions more like how the Intel
drivers handle fragments: it keeps slicing pieces off the current
page until it runs out and allocates a new one, or reuses the page
once its reference count drops to one, without pre-allocating the
references.
However, with that said, many of the core bits are the same, so it
might be possible to look at unifying at least pieces of this. For
example, page_frag has the same first three members as
page_frag_cache, so it might be possible to refactor things further
to unify more of the frag_refill logic.
end of thread, other threads:[~2024-01-08 16:14 UTC | newest]
Thread overview: 10+ messages
[not found] <20240103095650.25769-1-linyunsheng@huawei.com>
2024-01-03 9:56 ` [PATCH net-next 2/6] page_frag: unify gfp bits for order 3 page allocation Yunsheng Lin
2024-01-05 15:35 ` Alexander H Duyck
2024-01-08 8:25 ` Yunsheng Lin
2024-01-08 16:13 ` Alexander Duyck
2024-01-03 9:56 ` [PATCH net-next 4/6] vhost/net: remove vhost_net_page_frag_refill() Yunsheng Lin
2024-01-05 16:06 ` Alexander H Duyck
2024-01-08 9:06 ` Yunsheng Lin
2024-01-08 15:45 ` Alexander Duyck
2024-01-03 9:56 ` [PATCH net-next 5/6] net: introduce page_frag_cache_drain() Yunsheng Lin
2024-01-05 15:48 ` Alexander H Duyck