The Linux Kernel Mailing List
* [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool
@ 2026-06-26 10:26 Breno Leitao
  2026-06-26 10:26 ` [PATCH RFC 1/4] fs/pipe: make the prealloc pool per-pipe infrastructure Breno Leitao
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 10:26 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, oleg, mjguzik, josh, Jan Kara,
	jlayton
  Cc: axboe, shakeel.butt, linux-fsdevel, linux-kernel, Breno Leitao,
	kernel-team

TL;DR: This simplifies the pipe code, unifies the page pools, reduces the
code by 11 lines, and improves the microbenchmark by up to 23%, so it's
probably wrong (!?).

Summary:
========

I've spent some time converging tmp_page[] and the on-stack
anon_pipe_prealloc pool of pages into a single per-pipe pool, as
discussed previously in a few places, most recently at:

https://lore.kernel.org/all/ajLA_zxsYyKISkwp@redhat.com/

Problem:
========

1) We have two types of page caches in the pipe mechanism today:
   * tmp_page[]
   * anon_pipe_prealloc

2) They operate in different ways:
   * tmp_page[] is protected by the pipe lock
     * per-pipe, persistent, 2 pages
   * anon_pipe_prealloc is an on-stack pool, not lock protected
     * burst, up to 8 pages

Proposal/Design:
================

1) Keep the same page budget as today
  a) up to two per-pipe persistent pages
  b) burst of up to 8 pages

2) no pages are allocated unless necessary
   * Pages are _ONLY_ allocated based on the length of the write,
     minus the pages already available in the pool.
   * No page is allocated but left unused

3) keep allocation and freeing outside of the lock
   * only the assignment of pages stays lock-protected
   * Currently, tmp_page[] pages are allocated under the lock; this
     series moves that allocation outside it (hence the performance
     numbers)
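
For illustration only, the "length of the write minus what is already in
the pool" rule from point 2 can be sketched in userspace C as below. The
names pool_pages_wanted(), pool_pages_needed() and SKETCH_PAGE_SIZE are
made up for this sketch, and the 4 KiB page size is an assumption; the
arithmetic mirrors the want/have/need computation in anon_pipe_prefill().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */
#define PIPE_PREALLOC_MAX	8u	/* pool capacity, as in this series */

/* Pages a write of total_len bytes could take from the pool (capped). */
static unsigned int pool_pages_wanted(size_t total_len)
{
	/* Round up without risking overflow on very large lengths. */
	size_t pages = total_len / SKETCH_PAGE_SIZE +
		       (total_len % SKETCH_PAGE_SIZE != 0);

	return pages > PIPE_PREALLOC_MAX ? PIPE_PREALLOC_MAX
					 : (unsigned int)pages;
}

/* Pages still to allocate once the already cached pages are counted. */
static unsigned int pool_pages_needed(size_t total_len, unsigned int cached)
{
	unsigned int want, have;

	assert(cached <= PIPE_PREALLOC_MAX);	/* the pool never holds more */

	want = pool_pages_wanted(total_len);
	have = cached < want ? cached : want;

	return want - have;
}
```

With these assumptions, a 64 KiB write against a pool holding 2 pages
needs 6 new pages (16 wanted, capped at 8, minus 2 cached), and a
100-byte write against the same pool needs none.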

How:
====

1) replace tmp_page[] with anon_pipe_prealloc in pipe_inode_info
2) at write (anon_pipe_write), allocate the pages outside the lock in a helper
   called anon_pipe_prefill()
   a) the assignment into the pool must be lock protected
      * anon_pipe_prefill() does it
   b) anon_pipe_prefill() can populate up to PIPE_PREALLOC_MAX pages in the
      pool
3) once anon_pipe_write is done, the pool is trimmed back to at most
   PIPE_PREALLOC_KEEP (2) pages by anon_pipe_trim_pool()
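
A rough userspace model of the steps above, for readers who want to see
the lock/unlock boundaries in one place. This is a sketch, not the kernel
code: struct pool_model and the pool_model_*() names are invented here,
malloc()/free() stand in for alloc_page()/put_page(), a pthread mutex
stands in for pipe->mutex, and 4 KiB pages are assumed. Unlike
anon_pipe_prefill(), the sketch samples the pool count under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MODEL_PAGE_BYTES	4096u	/* assumption: 4 KiB pages */
#define MODEL_POOL_MAX		8u	/* plays the role of PIPE_PREALLOC_MAX */
#define MODEL_POOL_KEEP		2u	/* plays the role of PIPE_PREALLOC_KEEP */

struct pool_model {
	pthread_mutex_t lock;		/* stand-in for pipe->mutex */
	void *pages[MODEL_POOL_MAX];
	unsigned int count;
};

/* Top the pool up for a write of total_len bytes. */
static void pool_model_prefill(struct pool_model *p, size_t total_len)
{
	void *staged[MODEL_POOL_MAX];
	size_t pages = total_len / MODEL_PAGE_BYTES +
		       (total_len % MODEL_PAGE_BYTES != 0);
	unsigned int want, have, n = 0;

	want = pages > MODEL_POOL_MAX ? MODEL_POOL_MAX : (unsigned int)pages;

	pthread_mutex_lock(&p->lock);
	have = p->count < want ? p->count : want;
	pthread_mutex_unlock(&p->lock);

	/* Allocate only the shortfall, without holding the lock. */
	while (n < want - have) {
		void *page = malloc(MODEL_PAGE_BYTES);

		if (!page)
			break;
		staged[n++] = page;
	}

	/* Only the assignment into the pool is lock protected. */
	pthread_mutex_lock(&p->lock);
	while (n && p->count < MODEL_POOL_MAX)
		p->pages[p->count++] = staged[--n];
	pthread_mutex_unlock(&p->lock);

	/* Whatever did not fit is freed, again outside the lock. */
	while (n)
		free(staged[--n]);
}

/* Trim the pool back to MODEL_POOL_KEEP pages after the write. */
static void pool_model_trim(struct pool_model *p)
{
	void *excess[MODEL_POOL_MAX];
	unsigned int n = 0;

	pthread_mutex_lock(&p->lock);
	assert(p->count <= MODEL_POOL_MAX);
	while (p->count > MODEL_POOL_KEEP)
		excess[n++] = p->pages[--p->count];
	pthread_mutex_unlock(&p->lock);

	while (n)
		free(excess[--n]);
}
```

In this model, pool_model_prefill() for a 64 KiB write on an empty pool
stages 8 pages, and pool_model_trim() afterwards leaves 2 of them behind
for the next write.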

Testing:
========

Tested on a bare-metal Intel(R) Xeon(R) Platinum 8321HC (52 CPUs) using the
pipe_bench selftest (tools/testing/selftests/pipe/pipe_bench).

Two kernels were built from the same configuration (no debug options),
differing only by this series:

  - baseline: on-stack anon_pipe_prealloc pool + tmp_page[]
    Commit 4e5dfb7c84012 ("Add linux-next specific files for 20260623")
  - patched:  this series (unified per-pipe pool)

Each kernel was booted on the same host and benchmarked with 5 writers /
5 readers, 64 KiB messages, 5s per run, with and without memory pressure
(stress-ng --vm 4 --vm-bytes 80%). Comparing writes/s and average write
latency:

  - no memory pressure:    ~+11% throughput, ~-10% avg write latency
  - under memory pressure: ~+23% throughput, ~-18% avg write latency

The improvement comes from the larger persistent cache (up to 8 reusable
pages vs the old 2-page tmp_page cache), which reduces alloc_page()/
free_page() traffic; the effect is largest when reclaim is active.

Future:
=======

If this approach is accepted, we could keep all allocated pages in the pool
and rely on a shrinker to trim it under memory pressure.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
Breno Leitao (4):
      fs/pipe: make the prealloc pool per-pipe infrastructure
      fs/pipe: add per-pipe pool push, prefill and trim helpers
      fs/pipe: switch the write path to the per-pipe pool
      fs/pipe: remove the old on-stack prealloc helpers and tmp_page[2]

 fs/pipe.c                 | 162 +++++++++++++++++++---------------------------
 include/linux/pipe_fs_i.h |  21 +++++-
 2 files changed, 86 insertions(+), 97 deletions(-)
---
base-commit: 4e5dfb7c84012007c3c7061126491bbc92d71bf1
change-id: 20260625-b4-pipe-unification-aba7b8525de7

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* [PATCH RFC 1/4] fs/pipe: make the prealloc pool per-pipe infrastructure
  2026-06-26 10:26 [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Breno Leitao
@ 2026-06-26 10:26 ` Breno Leitao
  2026-06-26 10:26 ` [PATCH RFC 2/4] fs/pipe: add per-pipe pool push, prefill and trim helpers Breno Leitao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 10:26 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, oleg, mjguzik, josh, Jan Kara,
	jlayton
  Cc: axboe, shakeel.butt, linux-fsdevel, linux-kernel, Breno Leitao,
	kernel-team

Move struct anon_pipe_prealloc and PIPE_PREALLOC_MAX to pipe_fs_i.h and
embed the pool in each pipe via a new prealloc field, next to the existing
tmp_page[2] cache (which will be removed by the end of this patchset).
Add PIPE_PREALLOC_KEEP for the post-write trim target.

The on-stack prealloc pool used by anon_pipe_write() is unchanged; this
only adds the per-pipe storage that later patches switch the read/write
paths over to.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 fs/pipe.c                 |  7 -------
 include/linux/pipe_fs_i.h | 19 +++++++++++++++++++
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 429b0714ec575..325fd9757dbdd 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -111,13 +111,6 @@ void pipe_double_lock(struct pipe_inode_info *pipe1,
 	pipe_lock(pipe2);
 }
 
-#define PIPE_PREALLOC_MAX 8
-
-struct anon_pipe_prealloc {
-	struct page *pages[PIPE_PREALLOC_MAX];
-	unsigned int count;
-};
-
 /*
  * Pre-allocate pages outside pipe->mutex for multi-page writes.
  * alloc_page() with GFP_HIGHUSER can sleep in reclaim and runs memcg
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index a1eeed8006694..796860cbddf30 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -13,6 +13,9 @@
 #define PIPE_BUF_FLAG_LOSS	0x40	/* Message loss happened after this buffer */
 #endif
 
+#define PIPE_PREALLOC_MAX	8	/* max pages in prealloc pool */
+#define PIPE_PREALLOC_KEEP	2	/* keep at most this many after trim */
+
 /**
  *	struct pipe_buffer - a linux kernel pipe buffer
  *	@page: the page containing the data for the pipe buffer
@@ -57,6 +60,20 @@ union pipe_index {
 	};
 };
 
+/**
+ *	struct anon_pipe_prealloc - per-pipe page preallocation pool
+ *	@pages: array of cached pages (pool)
+ *	@count: number of pages currently in the pool
+ *
+ * Each pipe keeps a small bounded pool of preallocated pages to reduce
+ * allocation overhead during writes. The pool is bounded at PIPE_PREALLOC_MAX
+ * and trimmed down to PIPE_PREALLOC_KEEP after a write completes.
+ */
+struct anon_pipe_prealloc {
+	struct page *pages[PIPE_PREALLOC_MAX];
+	unsigned int count;
+};
+
 /**
  *	struct pipe_inode_info - a linux kernel pipe
  *	@mutex: mutex protecting the whole thing
@@ -68,6 +85,7 @@ union pipe_index {
  *	@ring_size: total number of buffers (should be a power of 2)
  *	@nr_accounted: The amount this pipe accounts for in user->pipe_bufs
  *	@tmp_page: cached released page
+ *	@prealloc: per-pipe page preallocation pool
  *	@readers: number of current readers of this pipe
  *	@writers: number of current writers of this pipe
  *	@files: number of struct file referring this pipe (protected by ->i_lock)
@@ -99,6 +117,7 @@ struct pipe_inode_info {
 	bool note_loss;
 #endif
 	struct page *tmp_page[2];
+	struct anon_pipe_prealloc prealloc;
 	struct fasync_struct *fasync_readers;
 	struct fasync_struct *fasync_writers;
 	struct pipe_buffer *bufs;

-- 
2.53.0-Meta



* [PATCH RFC 2/4] fs/pipe: add per-pipe pool push, prefill and trim helpers
  2026-06-26 10:26 [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Breno Leitao
  2026-06-26 10:26 ` [PATCH RFC 1/4] fs/pipe: make the prealloc pool per-pipe infrastructure Breno Leitao
@ 2026-06-26 10:26 ` Breno Leitao
  2026-06-26 10:26 ` [PATCH RFC 3/4] fs/pipe: switch the write path to the per-pipe pool Breno Leitao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 10:26 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, oleg, mjguzik, josh, Jan Kara,
	jlayton
  Cc: axboe, shakeel.butt, linux-fsdevel, linux-kernel, Breno Leitao,
	kernel-team

Add the helpers the per-pipe pool needs: anon_pipe_prealloc_push() to
return a page to the pool, anon_pipe_prefill() to top the pipe's pool up
before the lock, and anon_pipe_trim_pool() to drop it back to
PIPE_PREALLOC_KEEP after a write. anon_pipe_prealloc_pop() already exists
and is reused.

anon_pipe_prefill() and anon_pipe_trim_pool() have no callers yet and are
marked __maybe_unused; the next patch wires them into anon_pipe_write()
and drops the annotation. The old on-stack pool helpers are removed in
the final patch.
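
As a hedged illustration of the push/pop contract these helpers rely on,
here is a small userspace model of a bounded LIFO pool. struct page_stack
and the page_stack_*() names are invented for the sketch, and returning
NULL from an empty pop is an assumption based on how anon_pipe_get_page()
checks the result. The point is that a failed push leaves ownership of
the page with the caller, who must then free it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_MAX 8u	/* plays the role of PIPE_PREALLOC_MAX */

struct page_stack {
	void *slots[STACK_MAX];
	unsigned int count;
};

/* Returns true if the page was stored, false if the pool is full. */
static bool page_stack_push(struct page_stack *s, void *page)
{
	assert(page != NULL);	/* the pool never stores NULL */

	if (s->count >= STACK_MAX)
		return false;	/* caller still owns the page */
	s->slots[s->count++] = page;
	return true;
}

/* Returns the most recently pushed page, or NULL if the pool is empty. */
static void *page_stack_pop(struct page_stack *s)
{
	if (!s->count)
		return NULL;
	return s->slots[--s->count];
}
```

Because the pool is a stack, the page handed out next is the one cached
most recently, which is also the one most likely to still be cache-hot.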

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 fs/pipe.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index 325fd9757dbdd..93bdc7a846bd6 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -155,6 +155,72 @@ static struct page *anon_pipe_prealloc_pop(struct anon_pipe_prealloc *prealloc)
 	return prealloc->pages[prealloc->count];
 }
 
+/* Push a page to the prealloc pool. Returns true if added, false if full. */
+static bool anon_pipe_prealloc_push(struct anon_pipe_prealloc *prealloc,
+				    struct page *page)
+{
+	if (prealloc->count >= PIPE_PREALLOC_MAX)
+		return false;
+	prealloc->pages[prealloc->count++] = page;
+	return true;
+}
+
+/*
+ * Top up the pipe's own pool before taking pipe->mutex, allocating only the
+ * shortfall outside the lock, then briefly take the lock to push the pages in.
+ * anon_pipe_get_page() then drains the pool instead of allocating under the lock.
+ */
+static void __maybe_unused anon_pipe_prefill(struct pipe_inode_info *pipe,
+					     size_t total_len)
+{
+	struct page *pages[PIPE_PREALLOC_MAX];
+	unsigned int want, have, need, n = 0;
+
+	want = min_t(unsigned int, DIV_ROUND_UP(total_len, PAGE_SIZE),
+		     PIPE_PREALLOC_MAX);
+	have = min_t(unsigned int, pipe->prealloc.count, want);
+	need = want - have;
+
+	if (!need)
+		return;
+
+	while (n < need) {
+		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
+
+		if (!page)
+			break;
+		pages[n++] = page;
+	}
+	if (!n)
+		return;
+
+	mutex_lock(&pipe->mutex);
+	while (n && anon_pipe_prealloc_push(&pipe->prealloc, pages[n - 1]))
+		n--;
+	mutex_unlock(&pipe->mutex);
+
+	while (n)
+		put_page(pages[--n]);
+}
+
+/* Trim the pool down to PIPE_PREALLOC_KEEP, freeing the excess unlocked. */
+static void __maybe_unused anon_pipe_trim_pool(struct pipe_inode_info *pipe)
+{
+	struct page *excess[PIPE_PREALLOC_MAX];
+	unsigned int nexcess = 0;
+
+	if (pipe->prealloc.count <= PIPE_PREALLOC_KEEP)
+		return;
+
+	mutex_lock(&pipe->mutex);
+	while (pipe->prealloc.count > PIPE_PREALLOC_KEEP)
+		excess[nexcess++] = anon_pipe_prealloc_pop(&pipe->prealloc);
+	mutex_unlock(&pipe->mutex);
+
+	while (nexcess)
+		put_page(excess[--nexcess]);
+}
+
 static struct page *anon_pipe_get_page(struct pipe_inode_info *pipe,
 				       struct anon_pipe_prealloc *prealloc)
 {

-- 
2.53.0-Meta



* [PATCH RFC 3/4] fs/pipe: switch the write path to the per-pipe pool
  2026-06-26 10:26 [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Breno Leitao
  2026-06-26 10:26 ` [PATCH RFC 1/4] fs/pipe: make the prealloc pool per-pipe infrastructure Breno Leitao
  2026-06-26 10:26 ` [PATCH RFC 2/4] fs/pipe: add per-pipe pool push, prefill and trim helpers Breno Leitao
@ 2026-06-26 10:26 ` Breno Leitao
  2026-06-26 10:26 ` [PATCH RFC 4/4] fs/pipe: remove the old on-stack prealloc helpers and tmp_page[2] Breno Leitao
  2026-07-03 10:19 ` [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Christian Brauner
  4 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 10:26 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, oleg, mjguzik, josh, Jan Kara,
	jlayton
  Cc: axboe, shakeel.butt, linux-fsdevel, linux-kernel, Breno Leitao,
	kernel-team

Replace the per-write on-stack prealloc pool with the pipe's persistent
pool: anon_pipe_write() now tops up pipe->prealloc before the lock via
anon_pipe_prefill() and trims it after the write via anon_pipe_trim_pool(),
and anon_pipe_get_page()/anon_pipe_put_page() drain and refill that pool
directly. Free the pool, instead of tmp_page[2], on teardown.

This leaves the old on-stack helpers (anon_pipe_get_page_prealloc,
anon_pipe_refill_tmp_pages, anon_pipe_free_pages) and tmp_page[2] without
callers; they are marked __maybe_unused here and removed in the next patch.
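
The recycling rule in the new anon_pipe_put_page() can be modelled in
userspace as below. This is only a sketch: struct model_page, struct
recycle_pool and model_put_page() are hypothetical names, and a plain
counter stands in for page_count()/put_page(). A page goes back into the
pool only when the pipe holds the last reference and the pool has room;
otherwise the reference is simply dropped.

```c
#include <assert.h>
#include <stdbool.h>

#define RECYCLE_MAX 8u	/* plays the role of PIPE_PREALLOC_MAX */

struct model_page {
	unsigned int refcount;	/* stand-in for page_count() */
};

struct recycle_pool {
	struct model_page *pages[RECYCLE_MAX];
	unsigned int count;
};

/* Returns true if the page was cached, false if a reference was dropped. */
static bool model_put_page(struct recycle_pool *pool, struct model_page *page)
{
	assert(page->refcount > 0);

	/* Sole owner and room in the pool: keep the page for a later write. */
	if (page->refcount == 1 && pool->count < RECYCLE_MAX) {
		pool->pages[pool->count++] = page;
		return true;
	}

	/* Shared page or full pool: drop our reference instead. */
	page->refcount--;
	return false;
}
```

A page with refcount 2 is never cached in this model, since another
holder could still be reading it; it only loses one reference.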

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 fs/pipe.c | 57 ++++++++++++++++++---------------------------------------
 1 file changed, 18 insertions(+), 39 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 93bdc7a846bd6..070fba8c865c1 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -124,8 +124,8 @@ void pipe_double_lock(struct pipe_inode_info *pipe1,
  * pipe->mutex hold-time being shrunk. Any shortfall is covered by the
  * in-lock alloc_page() fallback in anon_pipe_get_page().
  */
-static void anon_pipe_get_page_prealloc(struct anon_pipe_prealloc *prealloc,
-					size_t total_len)
+static void __maybe_unused anon_pipe_get_page_prealloc(struct anon_pipe_prealloc *prealloc,
+						       size_t total_len)
 {
 	unsigned int want, i;
 	struct page *page;
@@ -170,8 +170,7 @@ static bool anon_pipe_prealloc_push(struct anon_pipe_prealloc *prealloc,
  * shortfall outside the lock, then briefly take the lock to push the pages in.
  * anon_pipe_get_page() then drains the pool instead of allocating under the lock.
  */
-static void __maybe_unused anon_pipe_prefill(struct pipe_inode_info *pipe,
-					     size_t total_len)
+static void anon_pipe_prefill(struct pipe_inode_info *pipe, size_t total_len)
 {
 	struct page *pages[PIPE_PREALLOC_MAX];
 	unsigned int want, have, need, n = 0;
@@ -204,7 +203,7 @@ static void __maybe_unused anon_pipe_prefill(struct pipe_inode_info *pipe,
 }
 
 /* Trim the pool down to PIPE_PREALLOC_KEEP, freeing the excess unlocked. */
-static void __maybe_unused anon_pipe_trim_pool(struct pipe_inode_info *pipe)
+static void anon_pipe_trim_pool(struct pipe_inode_info *pipe)
 {
 	struct page *excess[PIPE_PREALLOC_MAX];
 	unsigned int nexcess = 0;
@@ -221,39 +220,24 @@ static void __maybe_unused anon_pipe_trim_pool(struct pipe_inode_info *pipe)
 		put_page(excess[--nexcess]);
 }
 
-static struct page *anon_pipe_get_page(struct pipe_inode_info *pipe,
-				       struct anon_pipe_prealloc *prealloc)
+static struct page *anon_pipe_get_page(struct pipe_inode_info *pipe)
 {
 	struct page *page;
 
-	/* Drain prealloc first to keep tmp_page[] hot for later small writes. */
-	page = anon_pipe_prealloc_pop(prealloc);
+	/* Drain the prealloc pool before allocating. Called with mutex held. */
+	page = anon_pipe_prealloc_pop(&pipe->prealloc);
 	if (page)
 		return page;
 
-	for (int i = 0; i < ARRAY_SIZE(pipe->tmp_page); i++) {
-		if (pipe->tmp_page[i]) {
-			page = pipe->tmp_page[i];
-			pipe->tmp_page[i] = NULL;
-			return page;
-		}
-	}
-
-	/* FWIW: This is called with pipe->mutex held */
 	return alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
 }
 
 static void anon_pipe_put_page(struct pipe_inode_info *pipe,
 			       struct page *page)
 {
-	if (page_count(page) == 1) {
-		for (int i = 0; i < ARRAY_SIZE(pipe->tmp_page); i++) {
-			if (!pipe->tmp_page[i]) {
-				pipe->tmp_page[i] = page;
-				return;
-			}
-		}
-	}
+	if (page_count(page) == 1 &&
+	    anon_pipe_prealloc_push(&pipe->prealloc, page))
+		return;
 
 	put_page(page);
 }
@@ -262,8 +246,8 @@ static void anon_pipe_put_page(struct pipe_inode_info *pipe,
  * Stash leftover prealloc pages in tmp_page[] so the next write to this
  * pipe gets a hot page without entering the allocator.
  */
-static void anon_pipe_refill_tmp_pages(struct pipe_inode_info *pipe,
-				       struct anon_pipe_prealloc *prealloc)
+static void __maybe_unused anon_pipe_refill_tmp_pages(struct pipe_inode_info *pipe,
+						      struct anon_pipe_prealloc *prealloc)
 {
 	int i, idx;
 
@@ -282,7 +266,7 @@ static void anon_pipe_refill_tmp_pages(struct pipe_inode_info *pipe,
 }
 
 /* Runs after mutex_unlock() to keep put_page() out of the critical section. */
-static void anon_pipe_free_pages(struct anon_pipe_prealloc *prealloc)
+static void __maybe_unused anon_pipe_free_pages(struct anon_pipe_prealloc *prealloc)
 {
 	while (prealloc->count) {
 		prealloc->count--;
@@ -583,7 +567,6 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *filp = iocb->ki_filp;
 	struct pipe_inode_info *pipe = filp->private_data;
-	struct anon_pipe_prealloc prealloc;
 	unsigned int head;
 	ssize_t ret = 0;
 	size_t total_len = iov_iter_count(from);
@@ -607,8 +590,7 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from)
 	if (unlikely(total_len == 0))
 		return 0;
 
-	anon_pipe_get_page_prealloc(&prealloc, total_len);
-
+	anon_pipe_prefill(pipe, total_len);
 	mutex_lock(&pipe->mutex);
 
 	if (!pipe->readers) {
@@ -666,7 +648,7 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from)
 			struct page *page;
 			int copied;
 
-			page = anon_pipe_get_page(pipe, &prealloc);
+			page = anon_pipe_get_page(pipe);
 			if (unlikely(!page)) {
 				if (!ret)
 					ret = -ENOMEM;
@@ -730,11 +712,10 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from)
 		wake_next_writer = true;
 	}
 out:
-	anon_pipe_refill_tmp_pages(pipe, &prealloc);
 	if (pipe_is_full(pipe))
 		wake_next_writer = false;
 	mutex_unlock(&pipe->mutex);
-	anon_pipe_free_pages(&prealloc);
+	anon_pipe_trim_pool(pipe);
 
 	/*
 	 * If we do do a wakeup event, we do a 'sync' wakeup, because we
@@ -1015,10 +996,8 @@ void free_pipe_info(struct pipe_inode_info *pipe)
 	if (pipe->watch_queue)
 		put_watch_queue(pipe->watch_queue);
 #endif
-	for (i = 0; i < ARRAY_SIZE(pipe->tmp_page); i++) {
-		if (pipe->tmp_page[i])
-			__free_page(pipe->tmp_page[i]);
-	}
+	for (i = 0; i < pipe->prealloc.count; i++)
+		__free_page(pipe->prealloc.pages[i]);
 	kfree(pipe->bufs);
 	kfree(pipe);
 }

-- 
2.53.0-Meta



* [PATCH RFC 4/4] fs/pipe: remove the old on-stack prealloc helpers and tmp_page[2]
  2026-06-26 10:26 [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Breno Leitao
                   ` (2 preceding siblings ...)
  2026-06-26 10:26 ` [PATCH RFC 3/4] fs/pipe: switch the write path to the per-pipe pool Breno Leitao
@ 2026-06-26 10:26 ` Breno Leitao
  2026-07-03 10:19 ` [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Christian Brauner
  4 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-06-26 10:26 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, oleg, mjguzik, josh, Jan Kara,
	jlayton
  Cc: axboe, shakeel.butt, linux-fsdevel, linux-kernel, Breno Leitao,
	kernel-team

With the write path converted to the per-pipe pool, the old on-stack
prealloc helpers (anon_pipe_get_page_prealloc, anon_pipe_refill_tmp_pages,
anon_pipe_free_pages) and the tmp_page[2] cache have no remaining users.
Remove them.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 fs/pipe.c                 | 66 -----------------------------------------------
 include/linux/pipe_fs_i.h |  2 --
 2 files changed, 68 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 070fba8c865c1..108e498aee47e 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -111,40 +111,6 @@ void pipe_double_lock(struct pipe_inode_info *pipe1,
 	pipe_lock(pipe2);
 }
 
-/*
- * Pre-allocate pages outside pipe->mutex for multi-page writes.
- * alloc_page() with GFP_HIGHUSER can sleep in reclaim and runs memcg
- * charging; doing it under the mutex stalls a concurrent reader.
- *
- * Loop alloc_page() instead of alloc_pages_bulk_*(): the bulk path refuses
- * __GFP_ACCOUNT under memcg (see commit 8dcb3060d81d "memcg: page_alloc:
- * skip bulk allocator for __GFP_ACCOUNT") and silently degrades to a single
- * page. A per-page loop keeps memcg accounting and the task NUMA mempolicy
- * honoured for every page; the per-call overhead is small compared to the
- * pipe->mutex hold-time being shrunk. Any shortfall is covered by the
- * in-lock alloc_page() fallback in anon_pipe_get_page().
- */
-static void __maybe_unused anon_pipe_get_page_prealloc(struct anon_pipe_prealloc *prealloc,
-						       size_t total_len)
-{
-	unsigned int want, i;
-	struct page *page;
-
-	prealloc->count = 0;
-	if (total_len <= PAGE_SIZE)
-		return;
-
-	want = min_t(unsigned int, DIV_ROUND_UP(total_len, PAGE_SIZE),
-		     PIPE_PREALLOC_MAX);
-
-	for (i = 0; i < want; i++) {
-		page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
-		if (!page)
-			break;
-		prealloc->pages[prealloc->count++] = page;
-	}
-}
-
 static struct page *anon_pipe_prealloc_pop(struct anon_pipe_prealloc *prealloc)
 {
 	if (!prealloc->count)
@@ -242,38 +208,6 @@ static void anon_pipe_put_page(struct pipe_inode_info *pipe,
 	put_page(page);
 }
 
-/*
- * Stash leftover prealloc pages in tmp_page[] so the next write to this
- * pipe gets a hot page without entering the allocator.
- */
-static void __maybe_unused anon_pipe_refill_tmp_pages(struct pipe_inode_info *pipe,
-						      struct anon_pipe_prealloc *prealloc)
-{
-	int i, idx;
-
-	if (!prealloc->count)
-		return;
-
-	for (i = 0; i < ARRAY_SIZE(pipe->tmp_page); i++) {
-		if (pipe->tmp_page[i])
-			continue;
-		if (!prealloc->count)
-			return;
-		idx = --prealloc->count;
-		pipe->tmp_page[i] = prealloc->pages[idx];
-		prealloc->pages[idx] = NULL;
-	}
-}
-
-/* Runs after mutex_unlock() to keep put_page() out of the critical section. */
-static void __maybe_unused anon_pipe_free_pages(struct anon_pipe_prealloc *prealloc)
-{
-	while (prealloc->count) {
-		prealloc->count--;
-		put_page(prealloc->pages[prealloc->count]);
-	}
-}
-
 static void anon_pipe_buf_release(struct pipe_inode_info *pipe,
 				  struct pipe_buffer *buf)
 {
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 796860cbddf30..6bd0d956691cb 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -84,7 +84,6 @@ struct anon_pipe_prealloc {
  *	@max_usage: The maximum number of slots that may be used in the ring
  *	@ring_size: total number of buffers (should be a power of 2)
  *	@nr_accounted: The amount this pipe accounts for in user->pipe_bufs
- *	@tmp_page: cached released page
  *	@prealloc: per-pipe page preallocation pool
  *	@readers: number of current readers of this pipe
  *	@writers: number of current writers of this pipe
@@ -116,7 +115,6 @@ struct pipe_inode_info {
 #ifdef CONFIG_WATCH_QUEUE
 	bool note_loss;
 #endif
-	struct page *tmp_page[2];
 	struct anon_pipe_prealloc prealloc;
 	struct fasync_struct *fasync_readers;
 	struct fasync_struct *fasync_writers;

-- 
2.53.0-Meta



* Re: [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool
  2026-06-26 10:26 [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Breno Leitao
                   ` (3 preceding siblings ...)
  2026-06-26 10:26 ` [PATCH RFC 4/4] fs/pipe: remove the old on-stack prealloc helpers and tmp_page[2] Breno Leitao
@ 2026-07-03 10:19 ` Christian Brauner
  2026-07-03 15:27   ` Breno Leitao
  4 siblings, 1 reply; 7+ messages in thread
From: Christian Brauner @ 2026-07-03 10:19 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Alexander Viro, Christian Brauner, oleg, mjguzik, josh, Jan Kara,
	jlayton, axboe, shakeel.butt, linux-fsdevel, linux-kernel,
	kernel-team

On 2026-06-26 03:26 -0700, Breno Leitao wrote:
> TL;DR: This simplifies the pipe code, unify the page pools, reduce the
> code by 11 lines, and improves the microbenchmark by up to 23% — so it's
> probably wrong (!?).
> 
> Summary:
> =======
> 
> I've spent some time converging tmp_page[] and the on-stack
> anon_pipe_prealloc pool of pages into a single per-pipe pool, as
> discussed previously in a few places, most recently at:
> 
> https://lore.kernel.org/all/ajLA_zxsYyKISkwp@redhat.com/

I think this makes sense. Sashiko has some comments on missing trim for
readers you might want to consider.



* Re: [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool
  2026-07-03 10:19 ` [PATCH RFC 0/4] fs/pipe: unify the page pools into a single per-pipe pool Christian Brauner
@ 2026-07-03 15:27   ` Breno Leitao
  0 siblings, 0 replies; 7+ messages in thread
From: Breno Leitao @ 2026-07-03 15:27 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alexander Viro, oleg, mjguzik, josh, Jan Kara, jlayton, axboe,
	shakeel.butt, linux-fsdevel, linux-kernel, kernel-team

Hello Christian,

On Fri, Jul 03, 2026 at 12:19:50PM +0200, Christian Brauner wrote:
> On 2026-06-26 03:26 -0700, Breno Leitao wrote:
> > TL;DR: This simplifies the pipe code, unify the page pools, reduce the
> > code by 11 lines, and improves the microbenchmark by up to 23% — so it's
> > probably wrong (!?).
> > 
> > Summary:
> > =======
> > 
> > I've spent some time converging tmp_page[] and the on-stack
> > anon_pipe_prealloc pool of pages into a single per-pipe pool, as
> > discussed previously in a few places, most recently at:
> > 
> > https://lore.kernel.org/all/ajLA_zxsYyKISkwp@redhat.com/
> 
> I think this makes sense. Sashiko has some comments on missing trim for
> readers you might want to consider.

Thanks for looking at it.

Agreed, we want to trim on the reader side as well, otherwise we might
end up with more than 2 (current default) pages in the pipe structure.
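
To make the concern concrete, here is a tiny userspace model of the pool
count only. It is a sketch under assumptions: count_after_read() and
count_after_trim() are made-up names, and it assumes every page a reader
releases is solely owned by the pipe and therefore eligible for the pool.

```c
#include <assert.h>

#define SIM_POOL_MAX	8u	/* plays the role of PIPE_PREALLOC_MAX */
#define SIM_POOL_KEEP	2u	/* plays the role of PIPE_PREALLOC_KEEP */

/* Pool size after a reader releases `released` pages and nothing trims. */
static unsigned int count_after_read(unsigned int count, unsigned int released)
{
	unsigned int room;

	assert(count <= SIM_POOL_MAX);
	room = SIM_POOL_MAX - count;

	return count + (released < room ? released : room);
}

/* Pool size after a trim down to SIM_POOL_KEEP. */
static unsigned int count_after_trim(unsigned int count)
{
	return count > SIM_POOL_KEEP ? SIM_POOL_KEEP : count;
}
```

Starting from a trimmed pool of 2 pages, a reader that drains a 64 KiB
message (16 pages) leaves count_after_read(2, 16) == 8 pages cached, and
they stay there until the next write trims the pool.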

I will update this RFC and resend as a v2.

--breno

