public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: richard@nod.at, akpm@linux-foundation.org,
	jserv@ccns.ncku.edu.tw, eleanor15x@gmail.com,
	marscheng@google.com, linux-mtd@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/list_sort: introduce list_sort_nonatomic() and remove dummy cmp() calls
Date: Tue, 17 Mar 2026 20:32:18 +0800	[thread overview]
Message-ID: <ablJ0pe55s1-AkSh@google.com> (raw)
In-Reply-To: <7bb23ce1-a0f3-b576-ad79-c9fa746f11ed@huawei.com>

Hi Zhihao,

On Tue, Mar 17, 2026 at 12:05:54PM +0800, Zhihao Cheng wrote:
> 在 2026/3/16 3:39, Kuan-Wei Chiu 写道:
> > Historically, list_sort() implemented a hack in merge_final():
> >      if (unlikely(!++count))
> >          cmp(priv, b, b);
> > 
> > This was designed specifically so that callers could periodically
> > invoke cond_resched() within their comparison functions when merging
> > highly unbalanced lists.
> > 
> > However, an audit of the kernel tree reveals that only fs/ubifs/ relies
> > on this mechanism. For the vast majority of list_sort() users (such as
> > block layer IO schedulers and file systems), this results in completely
> > wasted function calls. In the worst-case scenario (merging an already
> > sorted list where 'a' is exhausted quickly), this results in
> > approximately (N/2)/256 unnecessary cmp() calls.
> > 
> > To clean up this API while ensuring behavior compatibility:
> > 1. Introduce list_sort_nonatomic(), which explicitly calls
> >     cond_resched() internally when count overflows.
> > 2. Remove the dummy cmp(priv, b, b) fallback for standard list_sort(),
> >     saving unnecessary function calls and improving determinism.
> > 3. Convert the sole user (fs/ubifs/) to the new API.
> > 
> > Note that ubifs still maintains cond_resched() inside its own
> > comparison functions. This patch does not alter the frequency or timing
> > of those scheduling points, guaranteeing no regressions for ubifs,
> > while benefiting all other kernel users.
> > 
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> >   fs/ubifs/gc.c             |   4 +-
> >   fs/ubifs/replay.c         |   2 +-
> >   include/linux/list_sort.h |   3 +
> >   lib/list_sort.c           | 166 +++++++++++++++++++++-----------------
> >   4 files changed, 100 insertions(+), 75 deletions(-)
> 
> lgtm for UBIFS.
> 
> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>

Thanks for your review!

> 
> one small nit below.
> 
> > 
> [...]
> > diff --git a/include/linux/list_sort.h b/include/linux/list_sort.h
> > index 453105f74e05..f7af29073d48 100644
> > --- a/include/linux/list_sort.h
> > +++ b/include/linux/list_sort.h
> > @@ -11,4 +11,7 @@ typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
> >   __attribute__((nonnull(2,3)))
> >   void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
> > +
> > +__attribute__((nonnull(2, 3)))
> > +void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp);
> >   #endif
> > diff --git a/lib/list_sort.c b/lib/list_sort.c
> > index a310ecb7ccc0..788bfc26cf7b 100644
> > --- a/lib/list_sort.c
> > +++ b/lib/list_sort.c
> > @@ -3,6 +3,7 @@
> >   #include <linux/export.h>
> >   #include <linux/list_sort.h>
> >   #include <linux/list.h>
> > +#include <linux/sched.h>
> >   /*
> >    * Returns a list organized in an intermediate format suited
> > @@ -47,7 +48,7 @@ static struct list_head *merge(void *priv, list_cmp_func_t cmp,
> >    */
> >   __attribute__((nonnull(2,3,4,5)))
> >   static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> > -			struct list_head *a, struct list_head *b)
> > +			struct list_head *a, struct list_head *b, bool may_schedule)
> >   {
> >   	struct list_head *tail = head;
> >   	u8 count = 0;
> > @@ -79,12 +80,11 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> >   		/*
> >   		 * If the merge is highly unbalanced (e.g. the input is
> >   		 * already sorted), this loop may run many iterations.
> > -		 * Continue callbacks to the client even though no
> > -		 * element comparison is needed, so the client's cmp()
> > -		 * routine can invoke cond_resched() periodically.
> > +		 * If may_schedule is true, periodically invoke cond_resched()
> > +		 * to avoid soft lockups.
> >   		 */
> > -		if (unlikely(!++count))
> > -			cmp(priv, b, b);
> > +		if (may_schedule && unlikely(!++count))
> > +			cond_resched();
> The cond_resched() already has a judgment on whether to schedule out, so the
> 'count' could be removed?

However, I think keeping the u8 count rate-limiter makes more sense
here due to the overhead difference.

Evaluating unlikely(!++count) is essentially a single ALU instruction
(register increment) and a zero-flag check, which has virtually zero
cost. On the other hand, cond_resched() is a macro that does much more
than a simple flag check. Depending on the kernel config, it often
invokes __might_resched() (which reads current to check task_struct
states, irq flags, etc.) and makes a call to __cond_resched().
Evaluating all of this heavy machinery on every single iteration of
such a tight loop would probably introduce noticeable overhead.

Actually, your comment brings up another thought I wanted to discuss.

Since we are introducing list_sort_nonatomic(), I wonder if we should
eventually move the cond_resched() out of UBIFS's cmp() functions
entirely and handle it inside list_sort_nonatomic().

Right now, because the cmp() callback is inherently invoked at every
step of the merge process, UBIFS ends up evaluating the cond_resched()
macro every 3 or 4 pointer assignments during the main sort. While
UBIFS needs to prevent soft lockups on huge lists, checking for resched
at such a micro-granularity still feels excessive and likely leaves
performance on the table.

I didn't make this change in the current patch because I don't have the
proper UBIFS hardware/setup to benchmark the performance difference,
and I wanted to keep the current scheduling frequency exactly the same
to guarantee no regressions. But I'd love to hear your thoughts on
whether reducing the frequency and moving it out of UBIFS's cmp() is
something worth doing in the future.

Regards,
Kuan-Wei

  reply	other threads:[~2026-03-17 12:32 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-15 19:39 [PATCH] lib/list_sort: introduce list_sort_nonatomic() and remove dummy cmp() calls Kuan-Wei Chiu
2026-03-16  7:25 ` Richard Weinberger
2026-03-16 18:04   ` Kuan-Wei Chiu
2026-03-16 21:49     ` Richard Weinberger
2026-03-17 14:22     ` Christoph Hellwig
2026-03-17 14:38       ` Richard Weinberger
2026-03-17 14:40         ` Christoph Hellwig
2026-03-17 16:08           ` Kuan-Wei Chiu
2026-03-17  4:05 ` Zhihao Cheng
2026-03-17 12:32   ` Kuan-Wei Chiu [this message]
2026-03-17 13:22     ` Zhihao Cheng
2026-03-17 14:15       ` Kuan-Wei Chiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ablJ0pe55s1-AkSh@google.com \
    --to=visitorckw@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengzhihao1@huawei.com \
    --cc=eleanor15x@gmail.com \
    --cc=jserv@ccns.ncku.edu.tw \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=marscheng@google.com \
    --cc=richard@nod.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox