From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: richard@nod.at, akpm@linux-foundation.org,
jserv@ccns.ncku.edu.tw, eleanor15x@gmail.com,
marscheng@google.com, linux-mtd@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/list_sort: introduce list_sort_nonatomic() and remove dummy cmp() calls
Date: Tue, 17 Mar 2026 22:15:29 +0800 [thread overview]
Message-ID: <abliATuggWn3aCbC@google.com> (raw)
In-Reply-To: <3fec3dbc-2835-e056-4394-d2dcaae3b80a@huawei.com>
Hi Zhihao,
On Tue, Mar 17, 2026 at 09:22:26PM +0800, Zhihao Cheng wrote:
> 在 2026/3/17 20:32, Kuan-Wei Chiu 写道:
> > Hi Zhihao,
> >
> > On Tue, Mar 17, 2026 at 12:05:54PM +0800, Zhihao Cheng wrote:
> > > 在 2026/3/16 3:39, Kuan-Wei Chiu 写道:
> > > > Historically, list_sort() implemented a hack in merge_final():
> > > > if (unlikely(!++count))
> > > > cmp(priv, b, b);
> > > >
> > > > This was designed specifically so that callers could periodically
> > > > invoke cond_resched() within their comparison functions when merging
> > > > highly unbalanced lists.
> > > >
> > > > However, an audit of the kernel tree reveals that only fs/ubifs/ relies
> > > > on this mechanism. For the vast majority of list_sort() users (such as
> > > > block layer IO schedulers and file systems), this results in completely
> > > > wasted function calls. In the worst-case scenario (merging an already
> > > > sorted list where 'a' is exhausted quickly), this results in
> > > > approximately (N/2)/256 unnecessary cmp() calls.
> > > >
> > > > To clean up this API while ensuring behavior compatibility:
> > > > 1. Introduce list_sort_nonatomic(), which explicitly calls
> > > > cond_resched() internally when count overflows.
> > > > 2. Remove the dummy cmp(priv, b, b) fallback for standard list_sort(),
> > > > saving unnecessary function calls and improving determinism.
> > > > 3. Convert the sole user (fs/ubifs/) to the new API.
> > > >
> > > > Note that ubifs still maintains cond_resched() inside its own
> > > > comparison functions. This patch does not alter the frequency or timing
> > > > of those scheduling points, guaranteeing no regressions for ubifs,
> > > > while benefiting all other kernel users.
> > > >
> > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > ---
> > > > fs/ubifs/gc.c | 4 +-
> > > > fs/ubifs/replay.c | 2 +-
> > > > include/linux/list_sort.h | 3 +
> > > > lib/list_sort.c | 166 +++++++++++++++++++++-----------------
> > > > 4 files changed, 100 insertions(+), 75 deletions(-)
> > >
> > > lgtm for UBIFS.
> > >
> > > Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
> >
> > Thanks for your review!
> >
> > >
> > > one small nit below.
> > >
> > > >
> > > [...]
> > > > diff --git a/include/linux/list_sort.h b/include/linux/list_sort.h
> > > > index 453105f74e05..f7af29073d48 100644
> > > > --- a/include/linux/list_sort.h
> > > > +++ b/include/linux/list_sort.h
> > > > @@ -11,4 +11,7 @@ typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
> > > > __attribute__((nonnull(2,3)))
> > > > void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
> > > > +
> > > > +__attribute__((nonnull(2, 3)))
> > > > +void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp);
> > > > #endif
> > > > diff --git a/lib/list_sort.c b/lib/list_sort.c
> > > > index a310ecb7ccc0..788bfc26cf7b 100644
> > > > --- a/lib/list_sort.c
> > > > +++ b/lib/list_sort.c
> > > > @@ -3,6 +3,7 @@
> > > > #include <linux/export.h>
> > > > #include <linux/list_sort.h>
> > > > #include <linux/list.h>
> > > > +#include <linux/sched.h>
> > > > /*
> > > > * Returns a list organized in an intermediate format suited
> > > > @@ -47,7 +48,7 @@ static struct list_head *merge(void *priv, list_cmp_func_t cmp,
> > > > */
> > > > __attribute__((nonnull(2,3,4,5)))
> > > > static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> > > > - struct list_head *a, struct list_head *b)
> > > > + struct list_head *a, struct list_head *b, bool may_schedule)
> > > > {
> > > > struct list_head *tail = head;
> > > > u8 count = 0;
> > > > @@ -79,12 +80,11 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> > > > /*
> > > > * If the merge is highly unbalanced (e.g. the input is
> > > > * already sorted), this loop may run many iterations.
> > > > - * Continue callbacks to the client even though no
> > > > - * element comparison is needed, so the client's cmp()
> > > > - * routine can invoke cond_resched() periodically.
> > > > + * If may_schedule is true, periodically invoke cond_resched()
> > > > + * to avoid soft lockups.
> > > > */
> > > > - if (unlikely(!++count))
> > > > - cmp(priv, b, b);
> > > > + if (may_schedule && unlikely(!++count))
> > > > + cond_resched();
> > > The cond_resched() already has a judgment on whether to schedule out, so the
> > > 'count' could be removed?
> >
> > However, I think keeping the u8 count rate-limiter makes more sense
> > here due to the overhead difference.
> >
> > Evaluating unlikely(!++count) is essentially a single ALU instruction
> > (register increment) and a zero-flag check, which has virtually zero
> > cost. On the other hand, cond_resched() is a macro that does much more
> > than a simple flag check. Depending on the kernel config, it often
> > invokes __might_resched() (which reads current to check task_struct
> > states, irq flags, etc.) and makes a call to __cond_resched().
> > Evaluating all of this heavy machinery on every single iteration of
> > such a tight loop would probably introduce noticeable overhead.
> >
> > Actually, your comment brings up another thought I wanted to discuss.
> >
> > Since we are introducing list_sort_nonatomic(), I wonder if we should
> > eventually move the cond_resched() out of UBIFS's cmp() functions
> > entirely and handle it inside list_sort_nonatomic().
> >
> > Right now, because the cmp() callback is inherently invoked at every
> > step of the merge process, UBIFS ends up evaluating the cond_resched()
> > macro every 3 or 4 pointer assignments during the main sort. While
> > UBIFS needs to prevent soft lockups on huge lists, checking for resched
> > at such a micro-granularity still feels excessive and likely leaves
> > performance on the table.
>
> In my humble opinion, I don't think frequent 'cond_resched' calling will
> bring observable performance impact, and there are many examples in kernel
> hotspot paths(eg.
> blk_mq_prealloc_tag_set_tags/blk_rq_poll_completion/__blk_mq_alloc_rq_maps
> ...). For list_sort(), I prefer the aim of code cleanup is to make the code
> more readable. I am neutral on code cleanup for the current implementation
> of list_sort.
OK. Since this only affects UBIFS, if performance isn't a concern for
UBIFS, I'll drop the !++count check in v2.
Additionally, I plan to remove cond_resched() from UBIFS's cmp()
functions in v2 and move it directly into merge() and merge_final()
inside list_sort. I think this will make the code much more readable
compared to hiding the scheduling points inside the comparison
callbacks.
Regards,
Kuan-Wei
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
WARNING: multiple messages have this Message-ID (diff)
From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: richard@nod.at, akpm@linux-foundation.org,
jserv@ccns.ncku.edu.tw, eleanor15x@gmail.com,
marscheng@google.com, linux-mtd@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/list_sort: introduce list_sort_nonatomic() and remove dummy cmp() calls
Date: Tue, 17 Mar 2026 22:15:29 +0800 [thread overview]
Message-ID: <abliATuggWn3aCbC@google.com> (raw)
In-Reply-To: <3fec3dbc-2835-e056-4394-d2dcaae3b80a@huawei.com>
Hi Zhihao,
On Tue, Mar 17, 2026 at 09:22:26PM +0800, Zhihao Cheng wrote:
> 在 2026/3/17 20:32, Kuan-Wei Chiu 写道:
> > Hi Zhihao,
> >
> > On Tue, Mar 17, 2026 at 12:05:54PM +0800, Zhihao Cheng wrote:
> > > 在 2026/3/16 3:39, Kuan-Wei Chiu 写道:
> > > > Historically, list_sort() implemented a hack in merge_final():
> > > > if (unlikely(!++count))
> > > > cmp(priv, b, b);
> > > >
> > > > This was designed specifically so that callers could periodically
> > > > invoke cond_resched() within their comparison functions when merging
> > > > highly unbalanced lists.
> > > >
> > > > However, an audit of the kernel tree reveals that only fs/ubifs/ relies
> > > > on this mechanism. For the vast majority of list_sort() users (such as
> > > > block layer IO schedulers and file systems), this results in completely
> > > > wasted function calls. In the worst-case scenario (merging an already
> > > > sorted list where 'a' is exhausted quickly), this results in
> > > > approximately (N/2)/256 unnecessary cmp() calls.
> > > >
> > > > To clean up this API while ensuring behavior compatibility:
> > > > 1. Introduce list_sort_nonatomic(), which explicitly calls
> > > > cond_resched() internally when count overflows.
> > > > 2. Remove the dummy cmp(priv, b, b) fallback for standard list_sort(),
> > > > saving unnecessary function calls and improving determinism.
> > > > 3. Convert the sole user (fs/ubifs/) to the new API.
> > > >
> > > > Note that ubifs still maintains cond_resched() inside its own
> > > > comparison functions. This patch does not alter the frequency or timing
> > > > of those scheduling points, guaranteeing no regressions for ubifs,
> > > > while benefiting all other kernel users.
> > > >
> > > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > > ---
> > > > fs/ubifs/gc.c | 4 +-
> > > > fs/ubifs/replay.c | 2 +-
> > > > include/linux/list_sort.h | 3 +
> > > > lib/list_sort.c | 166 +++++++++++++++++++++-----------------
> > > > 4 files changed, 100 insertions(+), 75 deletions(-)
> > >
> > > lgtm for UBIFS.
> > >
> > > Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
> >
> > Thanks for your review!
> >
> > >
> > > one small nit below.
> > >
> > > >
> > > [...]
> > > > diff --git a/include/linux/list_sort.h b/include/linux/list_sort.h
> > > > index 453105f74e05..f7af29073d48 100644
> > > > --- a/include/linux/list_sort.h
> > > > +++ b/include/linux/list_sort.h
> > > > @@ -11,4 +11,7 @@ typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
> > > > __attribute__((nonnull(2,3)))
> > > > void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
> > > > +
> > > > +__attribute__((nonnull(2, 3)))
> > > > +void list_sort_nonatomic(void *priv, struct list_head *head, list_cmp_func_t cmp);
> > > > #endif
> > > > diff --git a/lib/list_sort.c b/lib/list_sort.c
> > > > index a310ecb7ccc0..788bfc26cf7b 100644
> > > > --- a/lib/list_sort.c
> > > > +++ b/lib/list_sort.c
> > > > @@ -3,6 +3,7 @@
> > > > #include <linux/export.h>
> > > > #include <linux/list_sort.h>
> > > > #include <linux/list.h>
> > > > +#include <linux/sched.h>
> > > > /*
> > > > * Returns a list organized in an intermediate format suited
> > > > @@ -47,7 +48,7 @@ static struct list_head *merge(void *priv, list_cmp_func_t cmp,
> > > > */
> > > > __attribute__((nonnull(2,3,4,5)))
> > > > static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> > > > - struct list_head *a, struct list_head *b)
> > > > + struct list_head *a, struct list_head *b, bool may_schedule)
> > > > {
> > > > struct list_head *tail = head;
> > > > u8 count = 0;
> > > > @@ -79,12 +80,11 @@ static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
> > > > /*
> > > > * If the merge is highly unbalanced (e.g. the input is
> > > > * already sorted), this loop may run many iterations.
> > > > - * Continue callbacks to the client even though no
> > > > - * element comparison is needed, so the client's cmp()
> > > > - * routine can invoke cond_resched() periodically.
> > > > + * If may_schedule is true, periodically invoke cond_resched()
> > > > + * to avoid soft lockups.
> > > > */
> > > > - if (unlikely(!++count))
> > > > - cmp(priv, b, b);
> > > > + if (may_schedule && unlikely(!++count))
> > > > + cond_resched();
> > > The cond_resched() already has a judgment on whether to schedule out, so the
> > > 'count' could be removed?
> >
> > However, I think keeping the u8 count rate-limiter makes more sense
> > here due to the overhead difference.
> >
> > Evaluating unlikely(!++count) is essentially a single ALU instruction
> > (register increment) and a zero-flag check, which has virtually zero
> > cost. On the other hand, cond_resched() is a macro that does much more
> > than a simple flag check. Depending on the kernel config, it often
> > invokes __might_resched() (which reads current to check task_struct
> > states, irq flags, etc.) and makes a call to __cond_resched().
> > Evaluating all of this heavy machinery on every single iteration of
> > such a tight loop would probably introduce noticeable overhead.
> >
> > Actually, your comment brings up another thought I wanted to discuss.
> >
> > Since we are introducing list_sort_nonatomic(), I wonder if we should
> > eventually move the cond_resched() out of UBIFS's cmp() functions
> > entirely and handle it inside list_sort_nonatomic().
> >
> > Right now, because the cmp() callback is inherently invoked at every
> > step of the merge process, UBIFS ends up evaluating the cond_resched()
> > macro every 3 or 4 pointer assignments during the main sort. While
> > UBIFS needs to prevent soft lockups on huge lists, checking for resched
> > at such a micro-granularity still feels excessive and likely leaves
> > performance on the table.
>
> In my humble opinion, I don't think frequent 'cond_resched' calling will
> bring observable performance impact, and there are many examples in kernel
> hotspot paths(eg.
> blk_mq_prealloc_tag_set_tags/blk_rq_poll_completion/__blk_mq_alloc_rq_maps
> ...). For list_sort(), I prefer the aim of code cleanup is to make the code
> more readable. I am neutral on code cleanup for the current implementation
> of list_sort.
OK. Since this only affects UBIFS, if performance isn't a concern for
UBIFS, I'll drop the !++count check in v2.
Additionally, I plan to remove cond_resched() from UBIFS's cmp()
functions in v2 and move it directly into merge() and merge_final()
inside list_sort. I think this will make the code much more readable
compared to hiding the scheduling points inside the comparison
callbacks.
Regards,
Kuan-Wei
next prev parent reply other threads:[~2026-03-17 14:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-15 19:39 [PATCH] lib/list_sort: introduce list_sort_nonatomic() and remove dummy cmp() calls Kuan-Wei Chiu
2026-03-15 19:39 ` Kuan-Wei Chiu
2026-03-16 7:25 ` Richard Weinberger
2026-03-16 7:25 ` Richard Weinberger
2026-03-16 18:04 ` Kuan-Wei Chiu
2026-03-16 18:04 ` Kuan-Wei Chiu
2026-03-16 21:49 ` Richard Weinberger
2026-03-16 21:49 ` Richard Weinberger
2026-03-17 14:22 ` Christoph Hellwig
2026-03-17 14:22 ` Christoph Hellwig
2026-03-17 14:38 ` Richard Weinberger
2026-03-17 14:38 ` Richard Weinberger
2026-03-17 14:40 ` Christoph Hellwig
2026-03-17 14:40 ` Christoph Hellwig
2026-03-17 16:08 ` Kuan-Wei Chiu
2026-03-17 16:08 ` Kuan-Wei Chiu
2026-03-17 4:05 ` Zhihao Cheng
2026-03-17 4:05 ` Zhihao Cheng
2026-03-17 12:32 ` Kuan-Wei Chiu
2026-03-17 12:32 ` Kuan-Wei Chiu
2026-03-17 13:22 ` Zhihao Cheng
2026-03-17 13:22 ` Zhihao Cheng
2026-03-17 14:15 ` Kuan-Wei Chiu [this message]
2026-03-17 14:15 ` Kuan-Wei Chiu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abliATuggWn3aCbC@google.com \
--to=visitorckw@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=chengzhihao1@huawei.com \
--cc=eleanor15x@gmail.com \
--cc=jserv@ccns.ncku.edu.tw \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mtd@lists.infradead.org \
--cc=marscheng@google.com \
--cc=richard@nod.at \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.