* Re: [PATCH v2] scsi: bsg: read io_uring command fields once
From: Yang Xiuwei @ 2026-06-25 3:25 UTC (permalink / raw)
To: James E . J . Bottomley, Martin K . Petersen
Cc: Yang Xiuwei, Rahul Chandelkar, Jens Axboe, FUJITA Tomonori,
linux-scsi, linux-block, io-uring, Bart Van Assche,
Caleb Sander Mateos
In-Reply-To: <20260527191817.142769-1-rc@rexion.ai>
Hi James, Martin,
Friendly ping on v2: is anything else needed before pick-up?
Thanks,
Yang Xiuwei
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Kaitao Cheng @ 2026-06-25 3:01 UTC (permalink / raw)
To: David Laight, Christian König, Jani Nikula,
David Hildenbrand (Arm), Alexei Starovoitov
Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
Alexander Viro, Christian Brauner, Daniel Borkmann,
Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
Paul E. McKenney, Shakeel Butt, David Howells, Simona Vetter,
Randy Dunlap, Luca Ceresoli, Philipp Stanner, linux-block,
linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel, io-uring,
audit, bpf, netdev, dri-devel, linux-perf-users,
linux-trace-kernel, kexec, live-patching, linux-modules,
linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
damon, llvm, Kaitao Cheng, Muchun Song
In-Reply-To: <20260624152324.3def88ce@pumpkin>
On 2026/6/24 22:23, David Laight wrote:
> On Wed, 24 Jun 2026 15:23:47 +0200
> Christian König <christian.koenig@amd.com> wrote:
>> On 6/24/26 15:14, Kaitao Cheng wrote:
>>> On 2026/6/22 16:42, David Laight wrote:
>>>> On Mon, 22 Jun 2026 12:05:31 +0800
>>>> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>>
>>>>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>>
>>>>> The list_for_each*_safe() helpers are used when the loop body may
>>>>> remove the current entry. Their API exposes the temporary cursor at
>>>>> every call site, even though most users only need it for the iterator
>>>>> implementation and never reference it in the loop body.
>>>>>
>>>>> Add *_mutable() variants for list and hlist iteration. The new helpers
>>>>> support both forms: callers may keep passing an explicit temporary cursor
>>>>> when they need to inspect or reset it, or omit it and let the helper use
>>>>> a unique internal cursor.
>>>>
>>>> I'm not really sure 'mutable' means anything either.
>>>> It is possible to make it valid for the loop body (or even other threads)
>>>> to delete arbitrary list items - but that needs significant extra overheads.
>>>>
>>>> It might be worth doing something that doesn't need the extra variable,
>>>> but there is little point doing all the churn just to rename things.
>>>>
>>>>>
>>>>> This makes call sites that only mutate the list through the current entry
>>>>> less noisy, while keeping the existing *_safe() helpers available for
>>>>> compatibility.
>>>>>
>>>>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>>> ---
>>>>> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>>>>> 1 file changed, 231 insertions(+), 38 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/list.h b/include/linux/list.h
>>>>> index 09d979976b3b..1081def7cea9 100644
>>>>> --- a/include/linux/list.h
>>>>> +++ b/include/linux/list.h
>>>>> @@ -7,6 +7,7 @@
>>>>> #include <linux/stddef.h>
>>>>> #include <linux/poison.h>
>>>>> #include <linux/const.h>
>>>>> +#include <linux/args.h>
>>>>>
>>>>> #include <asm/barrier.h>
>>>>>
>>>>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>>>>> #define list_for_each_prev(pos, head) \
>>>>> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>>>>
>>>>> -/**
>>>>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>>>>> - * @pos: the &struct list_head to use as a loop cursor.
>>>>> - * @n: another &struct list_head to use as temporary storage
>>>>> - * @head: the head for your list.
>>>>> +/*
>>>>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>>>>> */
>>>>> #define list_for_each_safe(pos, n, head) \
>>>>> for (pos = (head)->next, n = pos->next; \
>>>>> !list_is_head(pos, (head)); \
>>>>> pos = n, n = pos->next)
>>>>>
>>>>> +#define __list_for_each_mutable_internal(pos, tmp, head) \
>>>>> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
>>>>
>>>> Use auto
>>>>
>>>>> + !list_is_head(pos, (head)); \
>>>>> + pos = tmp, tmp = pos->next)
>>>>> +
>>>>> +#define __list_for_each_mutable1(pos, head) \
>>>>> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>>>>> +
>>>>> +#define __list_for_each_mutable2(pos, next, head) \
>>>>> + list_for_each_safe(pos, next, head)
>>>>> +
>>>>> /**
>>>>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>>>>> + * list_for_each_mutable - iterate over a list safe against entry removal
>>>>> * @pos: the &struct list_head to use as a loop cursor.
>>>>> - * @n: another &struct list_head to use as temporary storage
>>>>> - * @head: the head for your list.
>>>>> + * @...: either (head) or (next, head)
>>>>> + *
>>>>> + * next: another &struct list_head to use as optional temporary storage.
>>>>> + * The temporary cursor is internal unless explicitly supplied by
>>>>> + * the caller.
>>>>> + * head: the head for your list.
>>>>> + */
>>>>> +#define list_for_each_mutable(pos, ...) \
>>>>> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
>>>>> + (pos, __VA_ARGS__)
>>>>
>>>> The variable argument count logic really just slows down compilation.
>>>> Maybe there aren't enough copies of this code to make that significant.
>>>> But just because you can do it doesn't mean it is a good idea.
>>>> I'm also not sure it really adds anything to the readability.
>>>>
>>>> And, if you are going to make the middle argument optional there is
>>>> no need to change the macro name.
>>>
>>> Christian König and Jani Nikula also disagree with the variadic-argument
>>> implementation approach. If we abandon that method, it means we will
>>> inevitably need to add some new macros. If mutable is not a good name,
>>> suggestions for better alternatives would be welcome; coming up with a
>>> suitable name is indeed rather tricky.
>>
>> I don't think you need to add a new macro for the specific use case that people want to modify the next element of the iteration.
>>
>> If I remember your numbers correctly that is a really corner case and keeping using the existing *_safe() macros for that sounds perfectly fine to me.
>
> IIRC currently you have a choice of either:
>     define                  Item that can't be deleted
>     list_for_each()         The current item.
>     list_for_each_safe()    The next item.
> There is also likely to be code that updates the variables to allow
> for other scenarios.
>
> Note that if you increase a reference count and release a lock, then
> list_for_each() is likely safer than list_for_each_safe() :-)
>
> list.h has 9 variants of the 'safe' loop.
> The bloat of another 9 is getting excessive.
>
> It has to be said that this is one of my least favourite type of list...
Hi Christian König, David Laight, Jani Nikula, David Hildenbrand,
Andy Shevchenko, Alexei Starovoitov,
To make the discussion easier, here is a summary of the approaches that
are currently possible, with a brief description of their pros and cons,
using the list_for_each_entry* interfaces as examples.
1. Add list_for_each_entry_mutable, while keeping list_for_each_entry
   and list_for_each_entry_safe unchanged. list_for_each_entry_mutable
   would be used specifically for safe-deletion scenarios that do not
   need to expose the temporary cursor externally. For the code, see
   the v1 version.
   Pros: Does not depend on immediate per-subsystem adaptation and can be
         merged directly.
   Cons: Requires adding a whole set of mutable interfaces, which makes the
         code somewhat redundant.
2. Directly optimize away the temporary cursor in list_for_each_entry_safe
   and define it inside the loop instead, changing the interface from four
   arguments to three.
   Pros: Does not add redundant interfaces.
   Cons: (1) Users need to manually update the special cases that use the
             traversal variable of list_for_each_entry_safe; the new
             list_for_each_entry_safe would no longer apply there, so
             those loops would need to be open-coded.
         (2) Because the macro arguments change, all list_for_each_entry_safe
             callers would need to be modified and merged together, and it
             is difficult to merge such a large amount of code at once.
3. Use a variadic macro approach to optimize list_for_each_entry_safe,
   so that it supports both three and four arguments.
   Pros: (1) Does not add redundant interfaces.
         (2) Does not depend on immediate per-subsystem adaptation and can
             be merged directly.
   Cons: (1) Increases compile time.
         (2) Makes the interface harder for users to use.
4. Optimize list_for_each_entry by defining the temporary cursor internally,
   making it compatible with the functionality of list_for_each_entry_safe.
   For the code, see the v2 version.
   Pros: (1) Does not add redundant interfaces.
         (2) The number of externally visible arguments of list_for_each_entry
             remains unchanged, still three.
   Cons: (1) list_for_each_entry and list_for_each_entry_safe would be merged
             into one, and list_for_each_entry_safe would gradually be
             deprecated.
         (2) Users need to manually update the special cases that use the
             traversal variable of list_for_each_entry; the new
             list_for_each_entry would no longer apply there, so those
             loops would need to be open-coded. There are 15 such cases
             in total.
5. Use a variadic macro approach to optimize list_for_each_entry, so that
   it supports both three and four arguments.
   Pros: (1) Does not add redundant interfaces.
         (2) Does not depend on immediate per-subsystem adaptation and can be
             merged directly.
   Cons: (1) Increases compile time.
         (2) list_for_each_entry and list_for_each_entry_safe would be merged
             into one, and list_for_each_entry_safe would gradually be
             deprecated.
6. Make no changes, keep the current logic as it is, and close the current
   email discussion.
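As a rough illustration of what "defining the temporary cursor inside the
loop" (approaches 2 and 4) could look like, here is a minimal userspace
sketch. It is not the v1 or v2 code: the list helpers are simplified
stand-ins for the kernel ones, and list_for_each_entry_hidden and
hidden_next are hypothetical names chosen for the example (the patch under
discussion uses __UNIQUE_ID() to name the cursor instead).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's list primitives. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = item->prev = NULL;	/* a stale cursor would now fault */
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Hypothetical three-argument iterator. The saved next pointer lives in
 * the for-init clause, so the caller never declares or sees it.
 */
#define list_for_each_entry_hidden(pos, head, member)                        \
	for (struct list_head *hidden_next =                                  \
		((pos) = container_of((head)->next, __typeof__(*(pos)), member), \
		 (pos)->member.next);                                           \
	     &(pos)->member != (head);                                         \
	     (pos) = container_of(hidden_next, __typeof__(*(pos)), member),    \
	     hidden_next = (pos)->member.next)

struct item {
	int value;
	struct list_head node;
};

/* Remove every even-valued item while iterating; return how many remain. */
static int remove_even_items(void)
{
	struct list_head head;
	struct item items[6];
	struct item *pos;
	int remaining = 0;

	list_init(&head);
	for (int i = 0; i < 6; i++) {
		items[i].value = i;
		list_add_tail(&items[i].node, &head);
	}

	/* Deleting the current entry is fine: the successor was saved first. */
	list_for_each_entry_hidden(pos, &head, node) {
		if (pos->value % 2 == 0)
			list_del(&pos->node);
	}

	list_for_each_entry_hidden(pos, &head, node)
		remaining++;

	return remaining;
}
```

With six items valued 0..5, remove_even_items() leaves the three odd ones.
The trade-off named in the cons above is visible here: a loop body that
wants to inspect or reset the saved pointer has no name to refer to.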
Which of the six solutions above do people prefer?
--
Thanks
Kaitao Cheng
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: David Laight @ 2026-06-24 14:23 UTC (permalink / raw)
To: Christian König
Cc: Kaitao Cheng, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt, David Howells,
Simona Vetter, Randy Dunlap, Luca Ceresoli, Philipp Stanner,
linux-block, linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel,
io-uring, audit, bpf, netdev, dri-devel, linux-perf-users,
linux-trace-kernel, kexec, live-patching, linux-modules,
linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
damon, llvm, Kaitao Cheng
In-Reply-To: <cf8467c7-b98f-44a5-9cf9-60b43b5da711@amd.com>
On Wed, 24 Jun 2026 15:23:47 +0200
Christian König <christian.koenig@amd.com> wrote:
> On 6/24/26 15:14, Kaitao Cheng wrote:
> >
> >
> > On 2026/6/22 16:42, David Laight wrote:
> >> On Mon, 22 Jun 2026 12:05:31 +0800
> >> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
> >>
> >>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
> >>>
> >>> The list_for_each*_safe() helpers are used when the loop body may
> >>> remove the current entry. Their API exposes the temporary cursor at
> >>> every call site, even though most users only need it for the iterator
> >>> implementation and never reference it in the loop body.
> >>>
> >>> Add *_mutable() variants for list and hlist iteration. The new helpers
> >>> support both forms: callers may keep passing an explicit temporary cursor
> >>> when they need to inspect or reset it, or omit it and let the helper use
> >>> a unique internal cursor.
> >>
> >> I'm not really sure 'mutable' means anything either.
> >> It is possible to make it valid for the loop body (or even other threads)
> >> to delete arbitrary list items - but that needs significant extra overheads.
> >>
> >> It might be worth doing something that doesn't need the extra variable,
> >> but there is little point doing all the churn just to rename things.
> >>
> >>>
> >>> This makes call sites that only mutate the list through the current entry
> >>> less noisy, while keeping the existing *_safe() helpers available for
> >>> compatibility.
> >>>
> >>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> >>> ---
> >>> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
> >>> 1 file changed, 231 insertions(+), 38 deletions(-)
> >>>
> >>> diff --git a/include/linux/list.h b/include/linux/list.h
> >>> index 09d979976b3b..1081def7cea9 100644
> >>> --- a/include/linux/list.h
> >>> +++ b/include/linux/list.h
> >>> @@ -7,6 +7,7 @@
> >>> #include <linux/stddef.h>
> >>> #include <linux/poison.h>
> >>> #include <linux/const.h>
> >>> +#include <linux/args.h>
> >>>
> >>> #include <asm/barrier.h>
> >>>
> >>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
> >>> #define list_for_each_prev(pos, head) \
> >>> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
> >>>
> >>> -/**
> >>> - * list_for_each_safe - iterate over a list safe against removal of list entry
> >>> - * @pos: the &struct list_head to use as a loop cursor.
> >>> - * @n: another &struct list_head to use as temporary storage
> >>> - * @head: the head for your list.
> >>> +/*
> >>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
> >>> */
> >>> #define list_for_each_safe(pos, n, head) \
> >>> for (pos = (head)->next, n = pos->next; \
> >>> !list_is_head(pos, (head)); \
> >>> pos = n, n = pos->next)
> >>>
> >>> +#define __list_for_each_mutable_internal(pos, tmp, head) \
> >>> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
> >>
> >> Use auto
> >>
> >>> + !list_is_head(pos, (head)); \
> >>> + pos = tmp, tmp = pos->next)
> >>> +
> >>> +#define __list_for_each_mutable1(pos, head) \
> >>> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
> >>> +
> >>> +#define __list_for_each_mutable2(pos, next, head) \
> >>> + list_for_each_safe(pos, next, head)
> >>> +
> >>> /**
> >>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
> >>> + * list_for_each_mutable - iterate over a list safe against entry removal
> >>> * @pos: the &struct list_head to use as a loop cursor.
> >>> - * @n: another &struct list_head to use as temporary storage
> >>> - * @head: the head for your list.
> >>> + * @...: either (head) or (next, head)
> >>> + *
> >>> + * next: another &struct list_head to use as optional temporary storage.
> >>> + * The temporary cursor is internal unless explicitly supplied by
> >>> + * the caller.
> >>> + * head: the head for your list.
> >>> + */
> >>> +#define list_for_each_mutable(pos, ...) \
> >>> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
> >>> + (pos, __VA_ARGS__)
> >>
> >> The variable argument count logic really just slows down compilation.
> >> Maybe there aren't enough copies of this code to make that significant.
> >> But just because you can do it doesn't mean it is a good idea.
> >> I'm also not sure it really adds anything to the readability.
> >>
> >> And, if you are going to make the middle argument optional there is
> >> no need to change the macro name.
> >
> > Christian König and Jani Nikula also disagree with the variadic-argument
> > implementation approach. If we abandon that method, it means we will
> > inevitably need to add some new macros. If mutable is not a good name,
> > suggestions for better alternatives would be welcome; coming up with a
> > suitable name is indeed rather tricky.
>
> I don't think you need to add a new macro for the specific use case that people want to modify the next element of the iteration.
>
> If I remember your numbers correctly that is a really corner case and keeping using the existing *_safe() macros for that sounds perfectly fine to me.
IIRC currently you have a choice of either:
    define                  Item that can't be deleted
    list_for_each()         The current item.
    list_for_each_safe()    The next item.
There is also likely to be code that updates the variables to allow
for other scenarios.
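The second row of that table can be made concrete with a small userspace
sketch. It is an illustration under stated assumptions, not kernel code:
list_unlink() is a hypothetical helper that, unlike the kernel's
list_del(), leaves the removed node's own pointers intact so the stale
cursor can be followed without faulting, and the removed flag exists only
so the example can count what happens.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical: unlink without poisoning the removed node's pointers. */
static void list_unlink(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the list_for_each_safe() quoted in the patch. */
#define list_for_each_safe(pos, n, head)          \
	for (pos = (head)->next, n = pos->next;   \
	     pos != (head);                       \
	     pos = n, n = pos->next)

struct item {
	int value;
	int removed;
	struct list_head node;
};

/*
 * The body removes the *next* entry while visiting the first one.
 * Returns how many already-removed entries the loop still visited.
 */
static int visits_to_removed_items(void)
{
	struct list_head head, *pos, *n;
	struct item items[4];
	int stale_visits = 0;

	list_init(&head);
	for (int i = 0; i < 4; i++) {
		items[i].value = i;
		items[i].removed = 0;
		list_add_tail(&items[i].node, &head);
	}

	list_for_each_safe(pos, n, &head) {
		struct item *cur = container_of(pos, struct item, node);

		if (cur->removed) {
			/* Reached through the cached cursor "n". */
			stale_visits++;
			continue;
		}
		if (cur->value == 0) {
			struct item *victim =
				container_of(pos->next, struct item, node);

			list_unlink(&victim->node);
			victim->removed = 1;
		}
	}

	return stale_visits;
}
```

Here the loop visits the removed entry once, because "n" was captured
before the body ran. With the kernel's list_del(), which poisons the
removed node's pointers, following that cursor would be expected to
fault rather than silently continue.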
Note that if you increase a reference count and release a lock, then
list_for_each() is likely safer than list_for_each_safe() :-)
list.h has 9 variants of the 'safe' loop.
The bloat of another 9 is getting excessive.
It has to be said that this is one of my least favourite types of list...
David
>
> Regards,
> Christian.
* Re: [PATCH v3] io_uring: annotate remote tasks for kcoverage
From: Jens Axboe @ 2026-06-24 14:16 UTC (permalink / raw)
To: Jann Horn, robert; +Cc: io-uring, Dmitry Vyukov, Andrey Konovalov, kasan-dev
In-Reply-To: <CAG48ez02Sio8ZENVK3gUWM+8j6NgG9LxtnDV=v+FSqsqs_KfnA@mail.gmail.com>
On 6/23/26 10:37 AM, Jann Horn wrote:
> On Tue, May 26, 2026 at 6:49 PM Robert Femmer <robert@fmmr.tech> wrote:
>> Fuzzers use coverage information to guide generation of test cases
>> towards new or interesting code paths. Syzkaller, specifically, makes
>> use of kcoverage (CONFIG_KCOV). Coverage information is not collected for
>> kernel tasks unless annotated by kcov_remote_start and kcov_remote_stop.
>> This patch annotates io-uring's work queue and sqpoll tasks.
>
> I think this is a useful change overall.
Agreed, I'm mostly waiting on Andrey and Robert to hash out the details,
and then we can get this landed for 7.3. I'm on vacation for the coming
weeks, so not much is going on on my end, work-wise.
--
Jens Axboe
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Christian König @ 2026-06-24 13:23 UTC (permalink / raw)
To: Kaitao Cheng, David Laight
Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt, David Howells,
Simona Vetter, Randy Dunlap, Luca Ceresoli, Philipp Stanner,
linux-block, linux-kernel, cgroups, linux-ntfs-dev, linux-fsdevel,
io-uring, audit, bpf, netdev, dri-devel, linux-perf-users,
linux-trace-kernel, kexec, live-patching, linux-modules,
linux-crypto, linux-pm, rcu, sched-ext, linux-mm, virtualization,
damon, llvm, Kaitao Cheng
In-Reply-To: <351a6b67-b394-4c58-aee2-88b6c8089ad5@linux.dev>
On 6/24/26 15:14, Kaitao Cheng wrote:
>
>
> On 2026/6/22 16:42, David Laight wrote:
>> On Mon, 22 Jun 2026 12:05:31 +0800
>> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>
>>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>>
>>> The list_for_each*_safe() helpers are used when the loop body may
>>> remove the current entry. Their API exposes the temporary cursor at
>>> every call site, even though most users only need it for the iterator
>>> implementation and never reference it in the loop body.
>>>
>>> Add *_mutable() variants for list and hlist iteration. The new helpers
>>> support both forms: callers may keep passing an explicit temporary cursor
>>> when they need to inspect or reset it, or omit it and let the helper use
>>> a unique internal cursor.
>>
>> I'm not really sure 'mutable' means anything either.
>> It is possible to make it valid for the loop body (or even other threads)
>> to delete arbitrary list items - but that needs significant extra overheads.
>>
>> It might be worth doing something that doesn't need the extra variable,
>> but there is little point doing all the churn just to rename things.
>>
>>>
>>> This makes call sites that only mutate the list through the current entry
>>> less noisy, while keeping the existing *_safe() helpers available for
>>> compatibility.
>>>
>>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>>> ---
>>> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>>> 1 file changed, 231 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/include/linux/list.h b/include/linux/list.h
>>> index 09d979976b3b..1081def7cea9 100644
>>> --- a/include/linux/list.h
>>> +++ b/include/linux/list.h
>>> @@ -7,6 +7,7 @@
>>> #include <linux/stddef.h>
>>> #include <linux/poison.h>
>>> #include <linux/const.h>
>>> +#include <linux/args.h>
>>>
>>> #include <asm/barrier.h>
>>>
>>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>>> #define list_for_each_prev(pos, head) \
>>> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>>
>>> -/**
>>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>>> - * @pos: the &struct list_head to use as a loop cursor.
>>> - * @n: another &struct list_head to use as temporary storage
>>> - * @head: the head for your list.
>>> +/*
>>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>>> */
>>> #define list_for_each_safe(pos, n, head) \
>>> for (pos = (head)->next, n = pos->next; \
>>> !list_is_head(pos, (head)); \
>>> pos = n, n = pos->next)
>>>
>>> +#define __list_for_each_mutable_internal(pos, tmp, head) \
>>> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
>>
>> Use auto
>>
>>> + !list_is_head(pos, (head)); \
>>> + pos = tmp, tmp = pos->next)
>>> +
>>> +#define __list_for_each_mutable1(pos, head) \
>>> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>>> +
>>> +#define __list_for_each_mutable2(pos, next, head) \
>>> + list_for_each_safe(pos, next, head)
>>> +
>>> /**
>>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>>> + * list_for_each_mutable - iterate over a list safe against entry removal
>>> * @pos: the &struct list_head to use as a loop cursor.
>>> - * @n: another &struct list_head to use as temporary storage
>>> - * @head: the head for your list.
>>> + * @...: either (head) or (next, head)
>>> + *
>>> + * next: another &struct list_head to use as optional temporary storage.
>>> + * The temporary cursor is internal unless explicitly supplied by
>>> + * the caller.
>>> + * head: the head for your list.
>>> + */
>>> +#define list_for_each_mutable(pos, ...) \
>>> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
>>> + (pos, __VA_ARGS__)
>>
>> The variable argument count logic really just slows down compilation.
>> Maybe there aren't enough copies of this code to make that significant.
>> But just because you can do it doesn't mean it is a good idea.
>> I'm also not sure it really adds anything to the readability.
>>
>> And, if you are going to make the middle argument optional there is
>> no need to change the macro name.
>
> Christian König and Jani Nikula also disagree with the variadic-argument
> implementation approach. If we abandon that method, it means we will
> inevitably need to add some new macros. If mutable is not a good name,
> suggestions for better alternatives would be welcome; coming up with a
> suitable name is indeed rather tricky.
I don't think you need to add a new macro for the specific use case where
people want to modify the next element of the iteration.
If I remember your numbers correctly, that is really a corner case, and
continuing to use the existing *_safe() macros for it sounds perfectly
fine to me.
Regards,
Christian.
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Kaitao Cheng @ 2026-06-24 13:14 UTC (permalink / raw)
To: David Laight
Cc: Andrew Morton, David Hildenbrand, Jens Axboe, Tejun Heo,
Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
Christian König, David Howells, Simona Vetter, Randy Dunlap,
Luca Ceresoli, Philipp Stanner, linux-block, linux-kernel,
cgroups, linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf,
netdev, dri-devel, linux-perf-users, linux-trace-kernel, kexec,
live-patching, linux-modules, linux-crypto, linux-pm, rcu,
sched-ext, linux-mm, virtualization, damon, llvm, Kaitao Cheng
In-Reply-To: <20260622094242.64531b9a@pumpkin>
On 2026/6/22 16:42, David Laight wrote:
> On Mon, 22 Jun 2026 12:05:31 +0800
> Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>
>> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>>
>> The list_for_each*_safe() helpers are used when the loop body may
>> remove the current entry. Their API exposes the temporary cursor at
>> every call site, even though most users only need it for the iterator
>> implementation and never reference it in the loop body.
>>
>> Add *_mutable() variants for list and hlist iteration. The new helpers
>> support both forms: callers may keep passing an explicit temporary cursor
>> when they need to inspect or reset it, or omit it and let the helper use
>> a unique internal cursor.
>
> I'm not really sure 'mutable' means anything either.
> It is possible to make it valid for the loop body (or even other threads)
> to delete arbitrary list items - but that needs significant extra overheads.
>
> It might be worth doing something that doesn't need the extra variable,
> but there is little point doing all the churn just to rename things.
>
>>
>> This makes call sites that only mutate the list through the current entry
>> less noisy, while keeping the existing *_safe() helpers available for
>> compatibility.
>>
>> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
>> ---
>> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 231 insertions(+), 38 deletions(-)
>>
>> diff --git a/include/linux/list.h b/include/linux/list.h
>> index 09d979976b3b..1081def7cea9 100644
>> --- a/include/linux/list.h
>> +++ b/include/linux/list.h
>> @@ -7,6 +7,7 @@
>> #include <linux/stddef.h>
>> #include <linux/poison.h>
>> #include <linux/const.h>
>> +#include <linux/args.h>
>>
>> #include <asm/barrier.h>
>>
>> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
>> #define list_for_each_prev(pos, head) \
>> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>>
>> -/**
>> - * list_for_each_safe - iterate over a list safe against removal of list entry
>> - * @pos: the &struct list_head to use as a loop cursor.
>> - * @n: another &struct list_head to use as temporary storage
>> - * @head: the head for your list.
>> +/*
>> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
>> */
>> #define list_for_each_safe(pos, n, head) \
>> for (pos = (head)->next, n = pos->next; \
>> !list_is_head(pos, (head)); \
>> pos = n, n = pos->next)
>>
>> +#define __list_for_each_mutable_internal(pos, tmp, head) \
>> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
>
> Use auto
>
>> + !list_is_head(pos, (head)); \
>> + pos = tmp, tmp = pos->next)
>> +
>> +#define __list_for_each_mutable1(pos, head) \
>> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
>> +
>> +#define __list_for_each_mutable2(pos, next, head) \
>> + list_for_each_safe(pos, next, head)
>> +
>> /**
>> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
>> + * list_for_each_mutable - iterate over a list safe against entry removal
>> * @pos: the &struct list_head to use as a loop cursor.
>> - * @n: another &struct list_head to use as temporary storage
>> - * @head: the head for your list.
>> + * @...: either (head) or (next, head)
>> + *
>> + * next: another &struct list_head to use as optional temporary storage.
>> + * The temporary cursor is internal unless explicitly supplied by
>> + * the caller.
>> + * head: the head for your list.
>> + */
>> +#define list_for_each_mutable(pos, ...) \
>> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
>> + (pos, __VA_ARGS__)
>
> The variable argument count logic really just slows down compilation.
> Maybe there aren't enough copies of this code to make that significant.
> But just because you can do it doesn't mean it is a good idea.
> I'm also not sure it really adds anything to the readability.
>
> And, if you are going to make the middle argument optional there is
> no need to change the macro name.
Christian König and Jani Nikula also disagree with the variadic-argument
implementation approach. If we abandon that method, it means we will
inevitably need to add some new macros. If mutable is not a good name,
suggestions for better alternatives would be welcome; coming up with a
suitable name is indeed rather tricky.
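For readers following the thread, the variadic-argument dispatch being
debated can be sketched in userspace roughly as follows. This is a minimal
sketch and not the patch itself: CONCAT and COUNT_ARGS are simplified
stand-ins for the kernel's CONCATENATE() and COUNT_ARGS(), the sketch_*
macro names are hypothetical, and a fixed cursor name (hidden_next) is used
where the patch uses __UNIQUE_ID().

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void list_del(struct list_head *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->next = item->prev = NULL;
}

/* Simplified argument counting; handles one to three arguments only. */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)
#define COUNT_ARGS_(_1, _2, _3, n, ...) n
#define COUNT_ARGS(...) COUNT_ARGS_(__VA_ARGS__, 3, 2, 1, 0)

#define list_for_each_safe(pos, n, head)          \
	for (pos = (head)->next, n = pos->next;   \
	     pos != (head);                       \
	     pos = n, n = pos->next)

/* One trailing argument (head): the cursor is internal to the loop. */
#define sketch_for_each_mutable_1(pos, head)                                \
	for (struct list_head *hidden_next = ((pos) = (head)->next)->next;  \
	     (pos) != (head);                                               \
	     (pos) = hidden_next, hidden_next = (pos)->next)

/* Two trailing arguments (next, head): fall back to the _safe form. */
#define sketch_for_each_mutable_2(pos, next, head) \
	list_for_each_safe(pos, next, head)

/* Pick the _1 or _2 variant from the number of trailing arguments. */
#define sketch_for_each_mutable(pos, ...) \
	CONCAT(sketch_for_each_mutable_, COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)

/* Build a three-node list and delete every node using the chosen form. */
static int drain_three(int explicit_cursor)
{
	struct list_head head, nodes[3];
	struct list_head *pos, *tmp;
	int removed = 0;

	list_init(&head);
	for (int i = 0; i < 3; i++)
		list_add_tail(&nodes[i], &head);

	if (explicit_cursor) {
		sketch_for_each_mutable(pos, tmp, &head) {
			list_del(pos);
			removed++;
		}
	} else {
		sketch_for_each_mutable(pos, &head) {
			list_del(pos);
			removed++;
		}
	}

	return removed;
}
```

Both call forms remove all three nodes. The sketch also shows the shape of
the objection raised in review: the optional argument sits in the middle
of the list, and every use pays for the extra preprocessor expansion.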
--
Thanks
Kaitao Cheng
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-24 13:05 UTC (permalink / raw)
To: Jani Nikula, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt,
Christian König
Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, linux-kernel, cgroups,
linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
dri-devel, linux-perf-users, linux-trace-kernel, kexec,
live-patching, linux-modules, linux-crypto, linux-pm, rcu,
sched-ext, linux-mm, virtualization, damon, llvm, chengkaitao
In-Reply-To: <88f34c7fa5a3d1700cc8005818751d6aa31f09df@intel.com>
On 2026/6/22 16:37, Jani Nikula wrote:
> On Mon, 22 Jun 2026, Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>> Add *_mutable() iterator variants for list, hlist and llist. The new
>> helpers are variadic and support both forms. In the common case, the
>> caller omits the temporary cursor and the macro creates a unique internal
>> cursor with typeof(pos) and __UNIQUE_ID(). If a loop really needs an
>> explicit temporary cursor, the caller can still pass it and the helper
>> keeps the existing *_safe() behaviour.
>>
>> For example, a call site may use the shorter form:
>>
>> list_for_each_entry_mutable(pos, head, member)
>>
>> or keep the explicit temporary cursor form:
>>
>> list_for_each_entry_mutable(pos, tmp, head, member)
>
> I'm unconvinced it's a good idea to allow two forms with macro trickery,
> *especially* when it's not the last argument you can omit. I think it's
> a footgun.
>
> IMO stick with the first form only, and there'll always be the _safe
> variant that can be used when the temp pointer is needed.
Could we go back to the v1 version? What do you think of that
implementation approach?
https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
--
Thanks
Kaitao Cheng
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-24 12:58 UTC (permalink / raw)
To: David Hildenbrand (Arm), Alexei Starovoitov
Cc: Andrew Morton, Jens Axboe, Tejun Heo, Alexander Viro,
Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
Paul E. McKenney, Shakeel Butt, Christian König,
David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, LKML,
open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
io-uring, audit, bpf, Network Development, dri-devel,
linux-perf-use., linux-trace-kernel, kexec, live-patching,
linux-modules, Linux Crypto Mailing List, Linux Power Management,
rcu, sched-ext, linux-mm, virtualization, damon,
clang-built-linux, chengkaitao
In-Reply-To: <8f98a3a6-f97b-4673-964f-fb09c8879e2e@kernel.org>
On 2026/6/22 19:27, David Hildenbrand (Arm) wrote:
> On 6/22/26 07:28, Alexei Starovoitov wrote:
>> On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>>
>>> From: chengkaitao <chengkaitao@kylinos.cn>
>>>
>>> The list_for_each*_safe() helpers are used when the loop body may remove
>>> the current entry. Their current interface, however, forces every caller
>>> to define a temporary cursor outside the macro and pass it in, even when
>>> the caller never uses that cursor directly. For most call sites this
>>> extra cursor is just boilerplate required by the macro implementation.
>>>
>>> This is awkward because the saved next pointer is an internal detail of
>>> the iteration. Callers that only remove or move the current entry do not
>>> need to spell it out.
>>>
>>> The _safe() suffix has also caused confusion. Christian Koenig pointed
>>> out that the name is easy to read as a thread-safe variant, especially
>>> for beginners, even though it only means that the iterator keeps enough
>>> state to tolerate removal of the current entry. He suggested _mutable()
>>> as a clearer description of what the loop permits.
>>>
>>> Add *_mutable() iterator variants for list, hlist and llist. The new
>>> helpers are variadic and support both forms. In the common case, the
>>> caller omits the temporary cursor and the macro creates a unique internal
>>> cursor with typeof(pos) and __UNIQUE_ID(). If a loop really needs an
>>> explicit temporary cursor, the caller can still pass it and the helper
>>> keeps the existing *_safe() behaviour.
>>>
>>> For example, a call site may use the shorter form:
>>>
>>> list_for_each_entry_mutable(pos, head, member)
>>>
>>> or keep the explicit temporary cursor form:
>>>
>>> list_for_each_entry_mutable(pos, tmp, head, member)
>>>
>>> The existing *_safe() helpers remain available for compatibility. This
>>> series only converts users in mm, block, kernel, init and io_uring. If
>>> this approach looks acceptable, the remaining users can be converted in
>>> follow-up series.
>>>
>>> Changes in v3 (Christian König, Andy Shevchenko):
>>> - Convert safe list walks to mutable iterators
>>>
>>> Changes in v2 (Muchun Song, Andy Shevchenko):
>>> - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
>>> cursor change directly in the existing list_for_each_entry*() helpers.
>>> - Open-code special list walks that rely on updating the loop cursor in
>>> the body, preserving their existing traversal semantics.
>>>
>>> Link to v2:
>>> https://lore.kernel.org/all/20260609061347.93688-1-kaitao.cheng@linux.dev/
>>>
>>> Link to v1:
>>> https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
>>>
>>> Kaitao Cheng (7):
>>> list: Add mutable iterator variants
>>> llist: Add mutable iterator variants
>>> mm: Use mutable list iterators
>>> block: Use mutable list iterators
>>> kernel: Use mutable list iterators
>>> initramfs: Use mutable list iterator
>>> io_uring: Use mutable list iterators
>>>
>>> block/bfq-iosched.c | 17 +-
>>> block/blk-cgroup.c | 12 +-
>>> block/blk-flush.c | 4 +-
>>> block/blk-iocost.c | 18 +-
>>> block/blk-mq.c | 8 +-
>>> block/blk-throttle.c | 4 +-
>>> block/kyber-iosched.c | 4 +-
>>> block/partitions/ldm.c | 8 +-
>>> block/sed-opal.c | 4 +-
>>> include/linux/list.h | 269 ++++++++++++++++++++++++----
>>> include/linux/llist.h | 81 +++++++--
>>> init/initramfs.c | 5 +-
>>> io_uring/cancel.c | 6 +-
>>> io_uring/poll.c | 3 +-
>>> io_uring/rw.c | 4 +-
>>> io_uring/timeout.c | 8 +-
>>> io_uring/uring_cmd.c | 3 +-
>>> kernel/audit_tree.c | 4 +-
>>> kernel/audit_watch.c | 16 +-
>>> kernel/auditfilter.c | 4 +-
>>> kernel/auditsc.c | 4 +-
>>> kernel/bpf/arena.c | 10 +-
>>> kernel/bpf/arraymap.c | 8 +-
>>> kernel/bpf/bpf_local_storage.c | 3 +-
>>> kernel/bpf/bpf_lru_list.c | 25 ++-
>>> kernel/bpf/btf.c | 18 +-
>>> kernel/bpf/cgroup.c | 7 +-
>>> kernel/bpf/cpumap.c | 4 +-
>>> kernel/bpf/devmap.c | 10 +-
>>> kernel/bpf/helpers.c | 8 +-
>>> kernel/bpf/local_storage.c | 4 +-
>>> kernel/bpf/memalloc.c | 16 +-
>>> kernel/bpf/offload.c | 8 +-
>>> kernel/bpf/states.c | 4 +-
>>> kernel/bpf/stream.c | 4 +-
>>> kernel/bpf/verifier.c | 6 +-
>>> kernel/cgroup/cgroup-v1.c | 4 +-
>>> kernel/cgroup/cgroup.c | 54 +++---
>>> kernel/cgroup/dmem.c | 12 +-
>>> kernel/cgroup/rdma.c | 8 +-
>>> kernel/events/core.c | 44 +++--
>>> kernel/events/uprobes.c | 12 +-
>>> kernel/exit.c | 8 +-
>>> kernel/fail_function.c | 4 +-
>>> kernel/gcov/clang.c | 4 +-
>>> kernel/irq_work.c | 4 +-
>>> kernel/kexec_core.c | 4 +-
>>> kernel/kprobes.c | 16 +-
>>> kernel/livepatch/core.c | 4 +-
>>> kernel/livepatch/core.h | 4 +-
>>> kernel/liveupdate/kho_block.c | 4 +-
>>> kernel/liveupdate/luo_flb.c | 4 +-
>>> kernel/locking/rwsem.c | 2 +-
>>> kernel/locking/test-ww_mutex.c | 2 +-
>>> kernel/module/main.c | 11 +-
>>> kernel/padata.c | 4 +-
>>> kernel/power/snapshot.c | 8 +-
>>> kernel/power/wakelock.c | 4 +-
>>> kernel/printk/printk.c | 11 +-
>>> kernel/ptrace.c | 4 +-
>>> kernel/rcu/rcutorture.c | 3 +-
>>> kernel/rcu/tasks.h | 9 +-
>>> kernel/rcu/tree.c | 6 +-
>>> kernel/resource.c | 4 +-
>>> kernel/sched/core.c | 4 +-
>>> kernel/sched/ext.c | 22 +--
>>> kernel/sched/fair.c | 28 +--
>>> kernel/sched/topology.c | 4 +-
>>> kernel/sched/wait.c | 4 +-
>>> kernel/seccomp.c | 4 +-
>>> kernel/signal.c | 11 +-
>>> kernel/smp.c | 4 +-
>>> kernel/taskstats.c | 8 +-
>>> kernel/time/clockevents.c | 6 +-
>>> kernel/time/clocksource.c | 4 +-
>>> kernel/time/posix-cpu-timers.c | 4 +-
>>> kernel/time/posix-timers.c | 3 +-
>>> kernel/torture.c | 3 +-
>>> kernel/trace/bpf_trace.c | 4 +-
>>> kernel/trace/ftrace.c | 49 +++--
>>> kernel/trace/ring_buffer.c | 25 ++-
>>> kernel/trace/trace.c | 12 +-
>>> kernel/trace/trace_dynevent.c | 6 +-
>>> kernel/trace/trace_dynevent.h | 5 +-
>>> kernel/trace/trace_events.c | 35 ++--
>>> kernel/trace/trace_events_filter.c | 4 +-
>>> kernel/trace/trace_events_hist.c | 8 +-
>>> kernel/trace/trace_events_trigger.c | 17 +-
>>> kernel/trace/trace_events_user.c | 16 +-
>>> kernel/trace/trace_stat.c | 4 +-
>>> kernel/user-return-notifier.c | 3 +-
>>> kernel/workqueue.c | 16 +-
>>> mm/backing-dev.c | 8 +-
>>> mm/balloon.c | 8 +-
>>> mm/cma.c | 4 +-
>>> mm/compaction.c | 4 +-
>>> mm/damon/core.c | 4 +-
>>> mm/damon/sysfs-schemes.c | 4 +-
>>> mm/dmapool.c | 4 +-
>>> mm/huge_memory.c | 8 +-
>>> mm/hugetlb.c | 56 +++---
>>> mm/hugetlb_vmemmap.c | 16 +-
>>> mm/khugepaged.c | 14 +-
>>> mm/kmemleak.c | 7 +-
>>> mm/ksm.c | 25 +--
>>> mm/list_lru.c | 4 +-
>>> mm/memcontrol-v1.c | 8 +-
>>> mm/memory-failure.c | 12 +-
>>> mm/memory-tiers.c | 4 +-
>>> mm/migrate.c | 23 ++-
>>> mm/mmu_notifier.c | 9 +-
>>> mm/page_alloc.c | 8 +-
>>> mm/page_reporting.c | 2 +-
>>> mm/percpu.c | 11 +-
>>> mm/pgtable-generic.c | 4 +-
>>> mm/rmap.c | 10 +-
>>> mm/shmem.c | 9 +-
>>> mm/slab_common.c | 14 +-
>>> mm/slub.c | 33 ++--
>>> mm/swapfile.c | 4 +-
>>> mm/userfaultfd.c | 12 +-
>>> mm/vmalloc.c | 24 +--
>>> mm/vmscan.c | 7 +-
>>> mm/zsmalloc.c | 4 +-
>>> 124 files changed, 875 insertions(+), 681 deletions(-)
>>
>> Not sure what you were thinking, but this diff stat
>> is not landable.
>
> Agreed. If we decide we want this, I guess we should target per-subsystem
> conversions.
>
> If this goes through the MM tree, I would even appreciate doing this on a per-MM
> component granularity.
>
> (unless we have some magic "Linus converts all of them" script, which I doubt we
> will have)
I strongly agree with the point above.
> Is there a way forward to replace list_for_each_*_safe entirely, possibly just
> reusing the old name but simplifying the parameters?
David Laight, Christian König, and Jani Nikula do not agree with using
clever macro syntax to support both calling forms at the same time,
so for now it is not possible to keep the original macro name and only
simplify the parameter. I may revert to the v1 version and ask everyone
for their opinions again.
--
Thanks
Kaitao Cheng
^ permalink raw reply
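For readers following the thread, the single-form variant (no caller-visible cursor) can be sketched in userspace as below. This is an illustration under stated assumptions, not the v1 or v3 patch: the `demo_*` helpers and `struct item` are hypothetical, `__COUNTER__` stands in for the kernel's `__UNIQUE_ID()`, and `__typeof__` and `__COUNTER__` are GNU extensions accepted by GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

struct item {
	int val;
	struct list_head link;
};

static void demo_list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void demo_list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void demo_list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;	/* poison, so stale use is obvious */
}

#define demo_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_PASTE_(a, b) a##b
#define DEMO_PASTE(a, b) DEMO_PASTE_(a, b)

/* 'cur' is the saved next entry; it is declared inside the for statement. */
#define demo_for_each_entry_mutable_(pos, cur, head, member)		\
	for (__typeof__(pos) cur =					\
		((pos) = demo_entry((head)->next, __typeof__(*(pos)), member), \
		 demo_entry((pos)->member.next, __typeof__(*(pos)), member)); \
	     &(pos)->member != (head);					\
	     (pos) = cur,						\
	     cur = demo_entry(cur->member.next, __typeof__(*(pos)), member))

/* Public form: the cursor gets a unique, caller-invisible name. */
#define demo_for_each_entry_mutable(pos, head, member)			\
	demo_for_each_entry_mutable_(pos, DEMO_PASTE(demo_n_, __COUNTER__), \
				     head, member)

/* Build 1..4, delete the even entries while iterating, count what is left. */
static int demo_remove_even(void)
{
	struct list_head head;
	struct item items[4];
	struct item *pos;
	int left = 0;

	demo_list_init(&head);
	for (int i = 0; i < 4; i++) {
		items[i].val = i + 1;
		demo_list_add_tail(&items[i].link, &head);
	}

	demo_for_each_entry_mutable(pos, &head, link) {
		if (pos->val % 2 == 0)
			demo_list_del(&pos->link);
	}
	assert(items[1].link.next == NULL);	/* entry 2 really was unlinked */

	demo_for_each_entry_mutable(pos, &head, link)
		left++;
	return left;
}
```

Because the cursor name is generated per expansion, nested loops do not collide, and call sites that never touch the cursor no longer have to declare one. Loops that do need the cursor in the body would keep using the existing *_safe() helpers.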
* [PATCH v6] io_uring/register: add IORING_REGISTER_CLONE_FILES opcode
From: Harshal Chavan @ 2026-06-24 12:40 UTC (permalink / raw)
To: harshal24.chavan
Cc: axboe, gregkh, gustavoars, io-uring, kees, krisman,
linux-hardening, linux-kernel
In-Reply-To: <20260624073921.11037-1-harshal24.chavan@gmail.com>
Currently, if an application wants to duplicate registered file
descriptors from one io_uring instance to another, it must manually
unregister and re-register them, incurring unnecessary overhead.
Add IORING_REGISTER_CLONE_FILES to allow direct cloning of the file
table from a source ring to a destination ring. This implementation
strictly mirrors the io_clone_buffers UAPI, supporting partial offsets
and the IORING_REGISTER_DST_REPLACE flag.
To ensure lock synchronization safety, destination nodes are strictly
allocated as new, private io_rsrc_nodes rather than sharing references
across rings.
Signed-off-by: Harshal Chavan <harshal24.chavan@gmail.com>
---
Sorry for the noise on the previous email! I accidentally sent the patch
before running checkpatch and missed a whitespace error. This v6 corrects it.
v6:
- Fixed trailing whitespace checkpatch error.
v5:
- Added missing spacing in comment (Gabriel).
- Removed ctx->user and mm_account checks (Gabriel).
- Used !! for boolean conversion (Gabriel).
- Moved mutex_unlock unconditionally above the out label (Gabriel).
- liburing implementation and tests: https://github.com/axboe/liburing/pull/1606
v4:
- Updated Signed-off-by to use real name and moved above the scissors line (Greg KH).
v3:
- Rewrote the cloning loop to allocate private destination nodes via io_rsrc_node_alloc to fix non-atomic ref lock synchronization (Jens).
- Maintained partial offset/copy support to mirror io_clone_buffers UAPI (Jens).
- Gated the replacement free check on ctx->file_table.data.nr (Gabriel).
- Prevented self-cloning by checking ctx == src_ctx (Gabriel).
- Removed submitter_task check to allow cross-thread pooling setups (Gabriel).
v2:
- Dropped unrelated whitespace formatting changes from v1
---
include/uapi/linux/io_uring.h | 12 +++
io_uring/register.c | 6 ++
io_uring/rsrc.c | 145 ++++++++++++++++++++++++++++++++++
io_uring/rsrc.h | 1 +
4 files changed, 164 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 909fb7aea638..67fcc40f8dfc 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -723,6 +723,9 @@ enum io_uring_register_op {
/* register bpf filtering programs */
IORING_REGISTER_BPF_FILTER = 37,
+ /* clone file descriptors from another ring */
+ IORING_REGISTER_CLONE_FILES = 38,
+
/* this goes last */
IORING_REGISTER_LAST,
@@ -854,6 +857,15 @@ struct io_uring_clone_buffers {
__u32 pad[3];
};
+struct io_uring_clone_files {
+ __u32 src_fd;
+ __u32 flags;
+ __u32 src_off;
+ __u32 dst_off;
+ __u32 nr;
+ __u32 pad[3];
+};
+
struct io_uring_buf {
__u64 addr;
__u32 len;
diff --git a/io_uring/register.c b/io_uring/register.c
index dce5e2f9cf77..bbc8c506ea2d 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -924,6 +924,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
ret = io_register_clone_buffers(ctx, arg);
break;
+ case IORING_REGISTER_CLONE_FILES:
+ ret = -EINVAL;
+ if (!arg || nr_args != 1)
+ break;
+ ret = io_register_clone_files(ctx, arg);
+ break;
case IORING_REGISTER_ZCRX_IFQ:
ret = -EINVAL;
if (!arg || nr_args != 1)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 650303626be6..1d58c256b3a5 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1303,6 +1303,151 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
return ret;
}
+static int io_clone_file_node(struct io_ring_ctx *ctx,
+ struct io_rsrc_node *src_node,
+ int dst_index,
+ struct io_file_table *new_table)
+{
+ struct io_rsrc_node *dst_node;
+ struct file *file;
+
+ dst_node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
+ if (!dst_node)
+ return -ENOMEM;
+
+ file = io_slot_file(src_node);
+ get_file(file);
+ io_fixed_file_set(dst_node, file);
+
+ new_table->data.nodes[dst_index] = dst_node;
+ io_file_bitmap_set(new_table, dst_index);
+
+ return 0;
+}
+
+static int io_clone_files(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
+ struct io_uring_clone_files *arg)
+{
+ struct io_file_table new_file_table;
+ unsigned int dst_nr = ctx->file_table.data.nr;
+ unsigned int src_nr = src_ctx->file_table.data.nr;
+ unsigned int new_nr, i;
+
+ lockdep_assert_held(&ctx->uring_lock);
+ lockdep_assert_held(&src_ctx->uring_lock);
+
+ if (dst_nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
+ return -EBUSY;
+
+ if (!src_nr)
+ return -ENXIO;
+
+ if (!arg->nr)
+ arg->nr = src_nr;
+ else if (arg->nr > src_nr)
+ return -EINVAL;
+
+ if (check_add_overflow(arg->src_off, arg->nr, &i) || i > src_nr)
+ return -EINVAL;
+ if (check_add_overflow(arg->dst_off, arg->nr, &i))
+ return -EINVAL;
+
+ new_nr = max(dst_nr, arg->dst_off + arg->nr);
+ if (new_nr > IORING_MAX_FIXED_FILES)
+ return -EINVAL;
+
+ memset(&new_file_table, 0, sizeof(new_file_table));
+ if (!io_alloc_file_tables(ctx, &new_file_table, new_nr))
+ return -ENOMEM;
+
+ /* Copy original nodes from before the cloned range */
+ for (i = 0; i < min(arg->dst_off, dst_nr); i++) {
+ struct io_rsrc_node *src_node = io_rsrc_node_lookup(&ctx->file_table.data, i);
+
+ if (!src_node)
+ continue;
+ if (io_clone_file_node(ctx, src_node, i, &new_file_table))
+ goto out;
+ }
+
+ /* Copy the actual cloned range from the source ring */
+ for (i = 0; i < arg->nr; i++) {
+ struct io_rsrc_node *src_node = io_rsrc_node_lookup(&src_ctx->file_table.data,
+ arg->src_off + i);
+
+ if (!src_node)
+ continue;
+ if (io_clone_file_node(ctx, src_node, arg->dst_off + i, &new_file_table))
+ goto out;
+ }
+
+ /* Copy original nodes from after the cloned range */
+ for (i = arg->dst_off + arg->nr; i < dst_nr; i++) {
+ struct io_rsrc_node *src_node = io_rsrc_node_lookup(&ctx->file_table.data, i);
+
+ if (!src_node)
+ continue;
+ if (io_clone_file_node(ctx, src_node, i, &new_file_table))
+ goto out;
+ }
+
+ /* free the old file table if there is any data present */
+ if (dst_nr)
+ io_free_file_tables(ctx, &ctx->file_table);
+
+ WARN_ON_ONCE(ctx->file_table.data.nr);
+ ctx->file_table = new_file_table;
+ io_file_table_set_alloc_range(ctx, 0, ctx->file_table.data.nr);
+ return 0;
+
+out:
+ /* Error Path: Safely destroy whatever we partially built */
+ io_free_file_tables(ctx, &new_file_table);
+ return -ENOMEM;
+}
+
+int io_register_clone_files(struct io_ring_ctx *ctx, void __user *arg)
+{
+ struct io_uring_clone_files clone_arg;
+ struct io_ring_ctx *src_ctx;
+ bool registered_src;
+ struct file *file;
+ int ret;
+
+ if (copy_from_user(&clone_arg, arg, sizeof(clone_arg)))
+ return -EFAULT;
+ if (clone_arg.flags &
+ ~(IORING_REGISTER_SRC_REGISTERED | IORING_REGISTER_DST_REPLACE))
+ return -EINVAL;
+
+ if (memchr_inv(clone_arg.pad, 0, sizeof(clone_arg.pad)))
+ return -EINVAL;
+
+ registered_src = !!(clone_arg.flags & IORING_REGISTER_SRC_REGISTERED);
+ file = io_uring_ctx_get_file(clone_arg.src_fd, registered_src);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ src_ctx = file->private_data;
+ /* Same ring clone is not allowed */
+ if (src_ctx == ctx) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ mutex_unlock(&ctx->uring_lock);
+ lock_two_rings(ctx, src_ctx);
+
+ ret = io_clone_files(ctx, src_ctx, &clone_arg);
+
+ mutex_unlock(&src_ctx->uring_lock);
+
+out:
+ if (!registered_src)
+ fput(file);
+ return ret;
+}
+
void io_vec_free(struct iou_vec *iv)
{
if (!iv->iovec)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 44e3386f7c1c..32f5c47c46af 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -75,6 +75,7 @@ int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
const struct iovec __user *uvec, size_t uvec_segs);
int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
+int io_register_clone_files(struct io_ring_ctx *ctx, void __user *arg);
int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
unsigned int nr_args, u64 __user *tags);
--
2.54.0
^ permalink raw reply related
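The offset and count validation in io_clone_files() above is easy to misread in diff form, so here is a userspace sketch of the same sequence of checks. It is an approximation for illustration only: `demo_clone_new_nr` and `DEMO_MAX_FIXED_FILES` are hypothetical names, the limit value is an assumption standing in for IORING_MAX_FIXED_FILES, and `__builtin_add_overflow` (GCC/Clang) stands in for the kernel's check_add_overflow().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's IORING_MAX_FIXED_FILES limit. */
#define DEMO_MAX_FIXED_FILES (1U << 20)

/*
 * Return the size of the new destination table, or a negative errno,
 * following the order of checks in the patch.
 */
static long demo_clone_new_nr(uint32_t src_nr, uint32_t dst_nr,
			      uint32_t src_off, uint32_t dst_off,
			      uint32_t nr, int replace)
{
	uint32_t end, new_nr;

	/* A populated destination table needs the DST_REPLACE flag. */
	if (dst_nr && !replace)
		return -EBUSY;
	/* Nothing to clone from an empty source table. */
	if (!src_nr)
		return -ENXIO;
	/* nr == 0 means "the whole source table". */
	if (!nr)
		nr = src_nr;
	else if (nr > src_nr)
		return -EINVAL;

	/* The source range must fit inside the source table. */
	if (__builtin_add_overflow(src_off, nr, &end) || end > src_nr)
		return -EINVAL;
	/* The destination range must not wrap around. */
	if (__builtin_add_overflow(dst_off, nr, &end))
		return -EINVAL;

	new_nr = end > dst_nr ? end : dst_nr;
	if (new_nr > DEMO_MAX_FIXED_FILES)
		return -EINVAL;

	assert(new_nr >= dst_nr);	/* the destination table never shrinks */
	return (long)new_nr;
}
```

For example, cloning two entries to offset 2 of an eight-entry destination with replace set keeps the table at eight slots, while a source range that runs past the end of the source table is rejected with -EINVAL before any allocation happens.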
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Kaitao Cheng @ 2026-06-24 12:29 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Alexei Starovoitov, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Paul E. McKenney, Shakeel Butt, Christian König,
David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, LKML,
open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
io-uring, audit, bpf, Network Development, dri-devel,
linux-perf-use., linux-trace-kernel, kexec, live-patching,
linux-modules, Linux Crypto Mailing List, Linux Power Management,
rcu, sched-ext, linux-mm, virtualization, damon,
clang-built-linux, chengkaitao, Muchun Song
In-Reply-To: <ajkSftEbdGoiJXYs@ashevche-desk.local>
On 2026/6/22 18:46, Andy Shevchenko wrote:
> On Mon, Jun 22, 2026 at 02:15:01PM +0800, Kaitao Cheng wrote:
>> On 2026/6/22 13:28, Alexei Starovoitov wrote:
>>> On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>
> ...
>
>>>> block/bfq-iosched.c | 17 +-
>>>> block/blk-cgroup.c | 12 +-
>>>> block/blk-flush.c | 4 +-
>>>> block/blk-iocost.c | 18 +-
>>>> block/blk-mq.c | 8 +-
>>>> block/blk-throttle.c | 4 +-
>>>> block/kyber-iosched.c | 4 +-
>>>> block/partitions/ldm.c | 8 +-
>>>> block/sed-opal.c | 4 +-
>>>> include/linux/list.h | 269 ++++++++++++++++++++++++----
>>>> include/linux/llist.h | 81 +++++++--
>>>> init/initramfs.c | 5 +-
>>>> io_uring/cancel.c | 6 +-
>>>> io_uring/poll.c | 3 +-
>>>> io_uring/rw.c | 4 +-
>>>> io_uring/timeout.c | 8 +-
>>>> io_uring/uring_cmd.c | 3 +-
>>>> kernel/audit_tree.c | 4 +-
>>>> kernel/audit_watch.c | 16 +-
>>>> kernel/auditfilter.c | 4 +-
>>>> kernel/auditsc.c | 4 +-
>>>> kernel/bpf/arena.c | 10 +-
>>>> kernel/bpf/arraymap.c | 8 +-
>>>> kernel/bpf/bpf_local_storage.c | 3 +-
>>>> kernel/bpf/bpf_lru_list.c | 25 ++-
>>>> kernel/bpf/btf.c | 18 +-
>>>> kernel/bpf/cgroup.c | 7 +-
>>>> kernel/bpf/cpumap.c | 4 +-
>>>> kernel/bpf/devmap.c | 10 +-
>>>> kernel/bpf/helpers.c | 8 +-
>>>> kernel/bpf/local_storage.c | 4 +-
>>>> kernel/bpf/memalloc.c | 16 +-
>>>> kernel/bpf/offload.c | 8 +-
>>>> kernel/bpf/states.c | 4 +-
>>>> kernel/bpf/stream.c | 4 +-
>>>> kernel/bpf/verifier.c | 6 +-
>>>> kernel/cgroup/cgroup-v1.c | 4 +-
>>>> kernel/cgroup/cgroup.c | 54 +++---
>>>> kernel/cgroup/dmem.c | 12 +-
>>>> kernel/cgroup/rdma.c | 8 +-
>>>> kernel/events/core.c | 44 +++--
>>>> kernel/events/uprobes.c | 12 +-
>>>> kernel/exit.c | 8 +-
>>>> kernel/fail_function.c | 4 +-
>>>> kernel/gcov/clang.c | 4 +-
>>>> kernel/irq_work.c | 4 +-
>>>> kernel/kexec_core.c | 4 +-
>>>> kernel/kprobes.c | 16 +-
>>>> kernel/livepatch/core.c | 4 +-
>>>> kernel/livepatch/core.h | 4 +-
>>>> kernel/liveupdate/kho_block.c | 4 +-
>>>> kernel/liveupdate/luo_flb.c | 4 +-
>>>> kernel/locking/rwsem.c | 2 +-
>>>> kernel/locking/test-ww_mutex.c | 2 +-
>>>> kernel/module/main.c | 11 +-
>>>> kernel/padata.c | 4 +-
>>>> kernel/power/snapshot.c | 8 +-
>>>> kernel/power/wakelock.c | 4 +-
>>>> kernel/printk/printk.c | 11 +-
>>>> kernel/ptrace.c | 4 +-
>>>> kernel/rcu/rcutorture.c | 3 +-
>>>> kernel/rcu/tasks.h | 9 +-
>>>> kernel/rcu/tree.c | 6 +-
>>>> kernel/resource.c | 4 +-
>>>> kernel/sched/core.c | 4 +-
>>>> kernel/sched/ext.c | 22 +--
>>>> kernel/sched/fair.c | 28 +--
>>>> kernel/sched/topology.c | 4 +-
>>>> kernel/sched/wait.c | 4 +-
>>>> kernel/seccomp.c | 4 +-
>>>> kernel/signal.c | 11 +-
>>>> kernel/smp.c | 4 +-
>>>> kernel/taskstats.c | 8 +-
>>>> kernel/time/clockevents.c | 6 +-
>>>> kernel/time/clocksource.c | 4 +-
>>>> kernel/time/posix-cpu-timers.c | 4 +-
>>>> kernel/time/posix-timers.c | 3 +-
>>>> kernel/torture.c | 3 +-
>>>> kernel/trace/bpf_trace.c | 4 +-
>>>> kernel/trace/ftrace.c | 49 +++--
>>>> kernel/trace/ring_buffer.c | 25 ++-
>>>> kernel/trace/trace.c | 12 +-
>>>> kernel/trace/trace_dynevent.c | 6 +-
>>>> kernel/trace/trace_dynevent.h | 5 +-
>>>> kernel/trace/trace_events.c | 35 ++--
>>>> kernel/trace/trace_events_filter.c | 4 +-
>>>> kernel/trace/trace_events_hist.c | 8 +-
>>>> kernel/trace/trace_events_trigger.c | 17 +-
>>>> kernel/trace/trace_events_user.c | 16 +-
>>>> kernel/trace/trace_stat.c | 4 +-
>>>> kernel/user-return-notifier.c | 3 +-
>>>> kernel/workqueue.c | 16 +-
>>>> mm/backing-dev.c | 8 +-
>>>> mm/balloon.c | 8 +-
>>>> mm/cma.c | 4 +-
>>>> mm/compaction.c | 4 +-
>>>> mm/damon/core.c | 4 +-
>>>> mm/damon/sysfs-schemes.c | 4 +-
>>>> mm/dmapool.c | 4 +-
>>>> mm/huge_memory.c | 8 +-
>>>> mm/hugetlb.c | 56 +++---
>>>> mm/hugetlb_vmemmap.c | 16 +-
>>>> mm/khugepaged.c | 14 +-
>>>> mm/kmemleak.c | 7 +-
>>>> mm/ksm.c | 25 +--
>>>> mm/list_lru.c | 4 +-
>>>> mm/memcontrol-v1.c | 8 +-
>>>> mm/memory-failure.c | 12 +-
>>>> mm/memory-tiers.c | 4 +-
>>>> mm/migrate.c | 23 ++-
>>>> mm/mmu_notifier.c | 9 +-
>>>> mm/page_alloc.c | 8 +-
>>>> mm/page_reporting.c | 2 +-
>>>> mm/percpu.c | 11 +-
>>>> mm/pgtable-generic.c | 4 +-
>>>> mm/rmap.c | 10 +-
>>>> mm/shmem.c | 9 +-
>>>> mm/slab_common.c | 14 +-
>>>> mm/slub.c | 33 ++--
>>>> mm/swapfile.c | 4 +-
>>>> mm/userfaultfd.c | 12 +-
>>>> mm/vmalloc.c | 24 +--
>>>> mm/vmscan.c | 7 +-
>>>> mm/zsmalloc.c | 4 +-
>>>> 124 files changed, 875 insertions(+), 681 deletions(-)
>>>
>>> Not sure what you were thinking, but this diff stat
>>> is not landable.
>>
>> [PATCH v3 1/7] and [PATCH v3 2/7] contain the main logic and can
>> be merged directly. They are also compatible with the old API.
>> [PATCH v3 3/7] through [PATCH v3 7/7] are just simple interface
>> replacements and do not change any functional logic. They can be
>> left unmerged for now; individual modules can pick them up later
>> if needed.
>>
>> In v2, Andy Shevchenko mentioned: "If it's done by Linus himself
>> during the day when he prepares -rc1, it's fine."
>
> Yes, but you need to get his blessing first to go with this.
> Have you communicated with him on this?
Not yet, because the overall approach is still not mature. People
have different opinions on the implementation details and on how
to move this forward, so I think we should iterate through a few
versions first before making a final decision.
>> Even so, the
>> changes in this patch series are indeed quite large and touch
>> almost every subsystem. I have only converted part of them for
>> now, so I wanted to send this out first and see what people think.
>
> That's why it's better to provide a script to convert (e.g., coccinelle)
> instead of tons of patches.
I tried writing conversion scripts with Coccinelle, but there were
always cases that got missed. In contrast, I found that using AI
for focused replacements was actually more efficient.
As David Hildenbrand mentioned, "If we decide we want this, I guess
we should target per-subsystem conversions." I would like to provide
the new interface first; adapting each subsystem on demand later may
be easier to achieve.
--
Thanks
Kaitao Cheng
^ permalink raw reply
* [PATCH v4] io_uring: annotate remote tasks for kcoverage
From: Robert Femmer @ 2026-06-24 9:01 UTC (permalink / raw)
To: io-uring
Cc: Jens Axboe, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Jann Horn,
Robert Femmer
In-Reply-To: <CAG48ez02Sio8ZENVK3gUWM+8j6NgG9LxtnDV=v+FSqsqs_KfnA@mail.gmail.com>
Fuzzers use coverage information to guide generation of test cases
towards new or interesting code paths. Syzkaller, specifically, makes
use of kcoverage (CONFIG_KCOV). Coverage information is not collected for
kernel tasks unless they are annotated with kcov_remote_start and
kcov_remote_stop. This patch annotates io_uring's work queue and sqpoll
tasks.
Depends-On: 20260430-kcov-refactor-common-handle-v1-1-23a0c7a0ba38@google.com
Signed-off-by: Robert Femmer <robert@fmmr.tech>
---
include/linux/io_uring_types.h | 2 ++
io_uring/io-wq.c | 5 +++++
io_uring/io_uring.c | 2 ++
io_uring/sqpoll.c | 3 +++
4 files changed, 12 insertions(+)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 244392026c6d..b6590b2b350c 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -504,6 +504,8 @@ struct io_ring_ctx {
struct io_mapped_region ring_region;
/* used for optimised request parameter and wait argument passing */
struct io_mapped_region param_region;
+
+ struct kcov_common_handle_id kcov_handle;
};
/*
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 8cc7b47d3089..173299dfc9c2 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -19,6 +19,7 @@
#include <linux/mmu_context.h>
#include <linux/sched/sysctl.h>
#include <uapi/linux/io_uring.h>
+#include <linux/kcov.h>
#include "io-wq.h"
#include "slist.h"
@@ -639,6 +640,7 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
/* handle a whole dependent link */
do {
struct io_wq_work *next_hashed, *linked;
+ struct io_kiocb *req;
unsigned int work_flags = atomic_read(&work->flags);
unsigned int hash = __io_wq_is_hashed(work_flags)
? __io_get_work_hash(work_flags)
@@ -649,7 +651,10 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
if (do_kill &&
(work_flags & IO_WQ_WORK_UNBOUND))
atomic_or(IO_WQ_WORK_CANCEL, &work->flags);
+ req = container_of(work, struct io_kiocb, work);
+ kcov_remote_start_common(req->ctx->kcov_handle);
io_wq_submit_work(work);
+ kcov_remote_stop();
io_assign_current_work(worker, NULL);
linked = io_wq_free_work(work);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 103b6c88f252..ab7c3e45e238 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -59,6 +59,7 @@
#include <linux/audit.h>
#include <linux/security.h>
#include <linux/jump_label.h>
+#include <linux/kcov.h>
#define CREATE_TRACE_POINTS
#include <trace/events/io_uring.h>
@@ -293,6 +294,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
io_napi_init(ctx);
mutex_init(&ctx->mmap_lock);
+ ctx->kcov_handle = kcov_common_handle();
return ctx;
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 46c12afec73e..aafb640d3b2f 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -13,6 +13,7 @@
#include <linux/cpuset.h>
#include <linux/sched/cputime.h>
#include <linux/io_uring.h>
+#include <linux/kcov.h>
#include <uapi/linux/io_uring.h>
@@ -342,10 +343,12 @@ static int io_sq_thread(void *data)
cap_entries = !list_is_singular(&sqd->ctx_list);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
+ kcov_remote_start_common(ctx->kcov_handle);
int ret = __io_sq_thread(ctx, sqd, cap_entries, &ist);
if (!sqt_spin && (ret > 0 || !list_empty(&ctx->iopoll_list)))
sqt_spin = true;
+ kcov_remote_stop();
}
if (io_sq_tw(&retry_list, IORING_TW_CAP_ENTRIES_VALUE))
sqt_spin = true;
--
2.54.0
^ permalink raw reply related
* [PATCH v5] io_uring/register: add IORING_REGISTER_CLONE_FILES opcode
From: Harshal Chavan @ 2026-06-24 7:39 UTC (permalink / raw)
To: io-uring
Cc: axboe, krisman, gregkh, gustavoars, harshal24.chavan, kees,
linux-hardening, linux-kernel
Currently, if an application wants to duplicate registered file
descriptors from one io_uring instance to another, it must manually
unregister and re-register them, incurring unnecessary overhead.
Add IORING_REGISTER_CLONE_FILES to allow direct cloning of the file
table from a source ring to a destination ring. This implementation
strictly mirrors the io_clone_buffers UAPI, supporting partial offsets
and the IORING_REGISTER_DST_REPLACE flag.
To ensure lock synchronization safety, destination nodes are strictly
allocated as new, private io_rsrc_nodes rather than sharing references
across rings.
Signed-off-by: Harshal Chavan <harshal24.chavan@gmail.com>
---
v5:
- Added missing spacing in comment (Gabriel).
- Removed ctx->user and mm_account checks (Gabriel).
- Used !! for boolean conversion (Gabriel).
- Moved mutex_unlock unconditionally above the out label (Gabriel).
- liburing implementation and tests: https://github.com/axboe/liburing/pull/1606
v4:
- Updated Signed-off-by to use real name and moved above the scissors line (Greg KH).
v3:
- Rewrote the cloning loop to allocate private destination nodes via io_rsrc_node_alloc to fix non-atomic ref lock synchronization (Jens).
- Maintained partial offset/copy support to mirror io_clone_buffers UAPI (Jens).
- Gated the replacement free check on ctx->file_table.data.nr (Gabriel).
- Prevented self-cloning by checking ctx == src_ctx (Gabriel).
- Removed submitter_task check to allow cross-thread pooling setups (Gabriel).
v2:
- Dropped unrelated whitespace formatting changes from v1
---
include/uapi/linux/io_uring.h | 12 +++
io_uring/register.c | 6 ++
io_uring/rsrc.c | 145 ++++++++++++++++++++++++++++++++++
io_uring/rsrc.h | 1 +
4 files changed, 164 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 909fb7aea638..67fcc40f8dfc 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -723,6 +723,9 @@ enum io_uring_register_op {
/* register bpf filtering programs */
IORING_REGISTER_BPF_FILTER = 37,
+ /* clone file descriptors from another ring */
+ IORING_REGISTER_CLONE_FILES = 38,
+
/* this goes last */
IORING_REGISTER_LAST,
@@ -854,6 +857,15 @@ struct io_uring_clone_buffers {
__u32 pad[3];
};
+struct io_uring_clone_files {
+ __u32 src_fd;
+ __u32 flags;
+ __u32 src_off;
+ __u32 dst_off;
+ __u32 nr;
+ __u32 pad[3];
+};
+
struct io_uring_buf {
__u64 addr;
__u32 len;
diff --git a/io_uring/register.c b/io_uring/register.c
index dce5e2f9cf77..bbc8c506ea2d 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -924,6 +924,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
ret = io_register_clone_buffers(ctx, arg);
break;
+ case IORING_REGISTER_CLONE_FILES:
+ ret = -EINVAL;
+ if (!arg || nr_args != 1)
+ break;
+ ret = io_register_clone_files(ctx, arg);
+ break;
case IORING_REGISTER_ZCRX_IFQ:
ret = -EINVAL;
if (!arg || nr_args != 1)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 650303626be6..5ddd715e2a63 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1303,6 +1303,151 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
return ret;
}
+static int io_clone_file_node(struct io_ring_ctx *ctx,
+ struct io_rsrc_node *src_node,
+ int dst_index,
+ struct io_file_table *new_table)
+{
+ struct io_rsrc_node *dst_node;
+ struct file *file;
+
+ dst_node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
+ if (!dst_node)
+ return -ENOMEM;
+
+ file = io_slot_file(src_node);
+ get_file(file);
+ io_fixed_file_set(dst_node, file);
+
+ new_table->data.nodes[dst_index] = dst_node;
+ io_file_bitmap_set(new_table, dst_index);
+
+ return 0;
+}
+
+static int io_clone_files(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
+ struct io_uring_clone_files *arg)
+{
+ struct io_file_table new_file_table;
+ unsigned int dst_nr = ctx->file_table.data.nr;
+ unsigned int src_nr = src_ctx->file_table.data.nr;
+ unsigned int new_nr, i;
+
+ lockdep_assert_held(&ctx->uring_lock);
+ lockdep_assert_held(&src_ctx->uring_lock);
+
+ if (dst_nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
+ return -EBUSY;
+
+ if (!src_nr)
+ return -ENXIO;
+
+ if (!arg->nr)
+ arg->nr = src_nr;
+ else if (arg->nr > src_nr)
+ return -EINVAL;
+
+ if (check_add_overflow(arg->src_off, arg->nr, &i) || i > src_nr)
+ return -EINVAL;
+ if (check_add_overflow(arg->dst_off, arg->nr, &i))
+ return -EINVAL;
+
+ new_nr = max(dst_nr, arg->dst_off + arg->nr);
+ if (new_nr > IORING_MAX_FIXED_FILES)
+ return -EINVAL;
+
+ memset(&new_file_table, 0, sizeof(new_file_table));
+ if (!io_alloc_file_tables(ctx, &new_file_table, new_nr))
+ return -ENOMEM;
+
+ /* Copy original nodes from before the cloned range */
+ for (i = 0; i < min(arg->dst_off, dst_nr); i++) {
+ struct io_rsrc_node *src_node = io_rsrc_node_lookup(&ctx->file_table.data, i);
+
+ if (!src_node)
+ continue;
+ if (io_clone_file_node(ctx, src_node, i, &new_file_table))
+ goto out;
+ }
+
+ /* Copy the actual cloned range from the source ring */
+ for (i = 0; i < arg->nr; i++) {
+ struct io_rsrc_node *src_node = io_rsrc_node_lookup(&src_ctx->file_table.data,
+ arg->src_off + i);
+
+ if (!src_node)
+ continue;
+ if (io_clone_file_node(ctx, src_node, arg->dst_off + i, &new_file_table))
+ goto out;
+ }
+
+ /* Copy original nodes from after the cloned range */
+ for (i = arg->dst_off + arg->nr; i < dst_nr; i++) {
+ struct io_rsrc_node *src_node = io_rsrc_node_lookup(&ctx->file_table.data, i);
+
+ if (!src_node)
+ continue;
+ if (io_clone_file_node(ctx, src_node, i, &new_file_table))
+ goto out;
+ }
+
+ /* free the old file table if there is any data present */
+ if (dst_nr)
+ io_free_file_tables(ctx, &ctx->file_table);
+
+ WARN_ON_ONCE(ctx->file_table.data.nr);
+ ctx->file_table = new_file_table;
+ io_file_table_set_alloc_range(ctx, 0, ctx->file_table.data.nr);
+ return 0;
+
+out:
+ /* Error Path: Safely destroy whatever we partially built */
+ io_free_file_tables(ctx, &new_file_table);
+ return -ENOMEM;
+}
+
+int io_register_clone_files(struct io_ring_ctx *ctx, void __user *arg)
+{
+ struct io_uring_clone_files clone_arg;
+ struct io_ring_ctx *src_ctx;
+ bool registered_src;
+ struct file *file;
+ int ret;
+
+ if (copy_from_user(&clone_arg, arg, sizeof(clone_arg)))
+ return -EFAULT;
+ if (clone_arg.flags &
+ ~(IORING_REGISTER_SRC_REGISTERED | IORING_REGISTER_DST_REPLACE))
+ return -EINVAL;
+
+ if (memchr_inv(clone_arg.pad, 0, sizeof(clone_arg.pad)))
+ return -EINVAL;
+
+ registered_src = !!(clone_arg.flags & IORING_REGISTER_SRC_REGISTERED);
+ file = io_uring_ctx_get_file(clone_arg.src_fd, registered_src);
+ if (IS_ERR(file))
+ return PTR_ERR(file);
+
+ src_ctx = file->private_data;
+ /* Same ring clone is not allowed */
+ if (src_ctx == ctx) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ mutex_unlock(&ctx->uring_lock);
+ lock_two_rings(ctx, src_ctx);
+
+ ret = io_clone_files(ctx, src_ctx, &clone_arg);
+
+ mutex_unlock(&src_ctx->uring_lock);
+
+out:
+ if (!registered_src)
+ fput(file);
+ return ret;
+}
+
void io_vec_free(struct iou_vec *iv)
{
if (!iv->iovec)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 44e3386f7c1c..32f5c47c46af 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -75,6 +75,7 @@ int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
const struct iovec __user *uvec, size_t uvec_segs);
int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
+int io_register_clone_files(struct io_ring_ctx *ctx, void __user *arg);
int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
unsigned int nr_args, u64 __user *tags);
--
2.54.0
^ permalink raw reply related
* Re: [PATCH v4] io_uring/register: add IORING_REGISTER_CLONE_FILES opcode
From: Harshal Chavan @ 2026-06-23 16:48 UTC (permalink / raw)
To: krisman
Cc: axboe, gregkh, gustavoars, harshal24.chavan, io-uring, kees,
linux-hardening, linux-kernel
In-Reply-To: <871pdyxrxw.fsf@mailhost.krisman.be>
Gabriel Krisman Bertazi @ 2026-06-22 20:04 UTC writes:
>Hello,
>
>Do you have the liburing side and test cases?
>
>A few comments inline.
Hello,
Yes, I will update the liburing side with a helper function
and add appropriate test cases.
>> + /* clone file descriptors from another ring*/
> ^ spacing
Fixed in v5
>> + if (ctx->user != src_ctx->user || ctx->mm_account != src_ctx->mm_account)
>> + return -EINVAL;
>
>I don't think it makes sense to check ->user here. But is mm_account
>necessary either? How could you get the src_ctx from another process?
Yes, keeping this check would unnecessarily break valid use cases, such as
a root user passing FDs to a guest user.
Removed it completely in v5, thanks for catching this!
>> + registered_src = (clone_arg.flags & IORING_REGISTER_SRC_REGISTERED) != 0;
>
>This is better written as
>
>registered_src = !!(clone_arg.flags & IORING_REGISTER_SRC_REGISTERED);
Understood, updated this in v5
>> +out:
>> + if (src_ctx != ctx)
>> + mutex_unlock(&src_ctx->uring_lock);
>
>Make the mutex_unlock unconditional and move it above the out label.
>src_ctx is never locked in the error context.
Yes, moved the unlock statement before out without any conditions.
Thank you for the review.
Regards,
Harshal Chavan
^ permalink raw reply
* Re: [PATCH v3] io_uring: annotate remote tasks for kcoverage
From: Jann Horn @ 2026-06-23 16:46 UTC (permalink / raw)
To: robert; +Cc: io-uring, Jens Axboe, Dmitry Vyukov, Andrey Konovalov, kasan-dev
In-Reply-To: <CAG48ez02Sio8ZENVK3gUWM+8j6NgG9LxtnDV=v+FSqsqs_KfnA@mail.gmail.com>
On Tue, Jun 23, 2026 at 6:37 PM Jann Horn <jannh@google.com> wrote:
> On Tue, May 26, 2026 at 6:49 PM Robert Femmer <robert@fmmr.tech> wrote:
> > Fuzzers use coverage information to guide generation of test cases
> > towards new or interesting code paths. Syzkaller, specifically, makes
> > use of kcoverage (CONFIG_KCOV). Coverage information is not collected for
> > kernel tasks unless annotated by kcov_remote_start and kcov_remote_stop.
> > This patch annotates io-uring's work queue and sqpoll tasks.
>
> I think this is a useful change overall.
>
> @maintainers: For context, this should have no impact on normal builds
> - "struct kcov_common_handle_id" is zero-sized in normal builds, and
> all the helpers used here are empty inline functions.
(That was supposed to be "are empty inline functions in normal
builds". I should've re-read this before hitting send...)
> > Depends-on: 20260430-kcov-refactor-common-handle-v1-1-23a0c7a0ba38@google.com
(This landed in mainline in the current merge window.)
^ permalink raw reply
* Re: [PATCH v3] io_uring: annotate remote tasks for kcoverage
From: Jann Horn @ 2026-06-23 16:37 UTC (permalink / raw)
To: robert; +Cc: io-uring, Jens Axboe, Dmitry Vyukov, Andrey Konovalov, kasan-dev
In-Reply-To: <20260526164948.831543-2-robert@fmmr.tech>
On Tue, May 26, 2026 at 6:49 PM Robert Femmer <robert@fmmr.tech> wrote:
> Fuzzers use coverage information to guide generation of test cases
> towards new or interesting code paths. Syzkaller, specifically, makes
> use of kcoverage (CONFIG_KCOV). Coverage information is not collected for
> kernel tasks unless annotated by kcov_remote_start and kcov_remote_stop.
> This patch annotates io-uring's work queue and sqpoll tasks.
I think this is a useful change overall.
@maintainers: For context, this should have no impact on normal builds
- "struct kcov_common_handle_id" is zero-sized in normal builds, and
all the helpers used here are empty inline functions.
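A minimal userspace sketch of the stub pattern described above, for illustration only. All names here (DEMO_COVERAGE, demo_handle_id, demo_remote_start, demo_remote_stop, demo_ctx) are hypothetical and are not the kcov API; the empty struct relies on a GNU C extension, under which GCC and Clang give it size 0.

```c
/* Define DEMO_COVERAGE to build the hypothetical "instrumented" variant. */
#ifdef DEMO_COVERAGE
struct demo_handle_id { unsigned long id; };
static unsigned long demo_active_handle;

static inline void demo_remote_start(struct demo_handle_id h)
{
	demo_active_handle = h.id;
}

static inline void demo_remote_stop(void)
{
	demo_active_handle = 0;
}
#else
/* Empty struct: GNU C extension, sizeof is 0 with GCC and Clang. */
struct demo_handle_id { };

static inline void demo_remote_start(struct demo_handle_id h)
{
	(void)h;	/* nothing to record in a normal build */
}

static inline void demo_remote_stop(void)
{
}
#endif

/* Stand-in for a context structure that embeds the handle. */
struct demo_ctx {
	int other_state;
	struct demo_handle_id handle;
};

/* The call sites look the same in both build variants. */
static int demo_do_work(struct demo_ctx *ctx)
{
	demo_remote_start(ctx->handle);
	ctx->other_state++;
	demo_remote_stop();
	return ctx->other_state;
}
```

With DEMO_COVERAGE undefined, the embedded handle adds no bytes to demo_ctx and both helpers compile to nothing, which is the property the note to maintainers relies on.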
> Depends-on: 20260430-kcov-refactor-common-handle-v1-1-23a0c7a0ba38@google.com
> Signed-off-by: Robert Femmer <robert@fmmr.tech>
> ---
> include/linux/io_uring_types.h | 2 ++
> io_uring/io-wq.c | 4 ++++
> io_uring/io_uring.c | 1 +
> io_uring/io_uring.h | 2 ++
> io_uring/sqpoll.c | 4 ++++
> 5 files changed, 13 insertions(+)
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 244392026c6d..b6590b2b350c 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -504,6 +504,8 @@ struct io_ring_ctx {
> struct io_mapped_region ring_region;
> /* used for optimised request parameter and wait argument passing */
> struct io_mapped_region param_region;
> +
> + struct kcov_common_handle_id kcov_handle;
> };
>
> /*
> diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
> index 8cc7b47d3089..9ade4c4f4983 100644
> --- a/io_uring/io-wq.c
> +++ b/io_uring/io-wq.c
> @@ -639,6 +639,7 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
> /* handle a whole dependent link */
> do {
> struct io_wq_work *next_hashed, *linked;
> + struct io_kiocb *req;
> unsigned int work_flags = atomic_read(&work->flags);
> unsigned int hash = __io_wq_is_hashed(work_flags)
> ? __io_get_work_hash(work_flags)
> @@ -649,7 +650,10 @@ static void io_worker_handle_work(struct io_wq_acct *acct,
> if (do_kill &&
> (work_flags & IO_WQ_WORK_UNBOUND))
> atomic_or(IO_WQ_WORK_CANCEL, &work->flags);
> + req = container_of(work, struct io_kiocb, work);
> + kcov_remote_start_common(req->ctx->kcov_handle);
> io_wq_submit_work(work);
> + kcov_remote_stop();
> io_assign_current_work(worker, NULL);
>
> linked = io_wq_free_work(work);
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 103b6c88f252..89cb649944d9 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -293,6 +293,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
> INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
> io_napi_init(ctx);
> mutex_init(&ctx->mmap_lock);
> + ctx->kcov_handle = kcov_common_handle();
>
> return ctx;
>
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index e612a66ee80e..7226fbbbf9f0 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -7,6 +7,7 @@
> #include <linux/resume_user_mode.h>
> #include <linux/poll.h>
> #include <linux/io_uring_types.h>
> +#include <linux/kcov.h>
I think instead of this, normal kernel coding style is to use includes
directly in the files where they are needed.
https://docs.kernel.org/process/submit-checklist.html says:
"If you use a facility then #include the file that defines/declares
that facility. Don’t depend on other header files pulling in ones that
you use."
> #include <uapi/linux/eventpoll.h>
> #include "alloc_cache.h"
> #include "io-wq.h"
> @@ -581,4 +582,5 @@ static inline bool io_has_work(struct io_ring_ctx *ctx)
> return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
> io_local_work_pending(ctx);
> }
> +
> #endif
This looks like an accidental whitespace change.
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index 46c12afec73e..c7b78ea98587 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -342,19 +342,23 @@ static int io_sq_thread(void *data)
>
> cap_entries = !list_is_singular(&sqd->ctx_list);
> list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> + kcov_remote_start_common(ctx->kcov_handle);
> int ret = __io_sq_thread(ctx, sqd, cap_entries, &ist);
>
> if (!sqt_spin && (ret > 0 || !list_empty(&ctx->iopoll_list)))
> sqt_spin = true;
> + kcov_remote_stop();
> }
> if (io_sq_tw(&retry_list, IORING_TW_CAP_ENTRIES_VALUE))
> sqt_spin = true;
>
> list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> + kcov_remote_start_common(ctx->kcov_handle);
> if (io_napi(ctx)) {
> io_sq_start_worktime(&ist);
> io_napi_sqpoll_busy_poll(ctx);
> }
> + kcov_remote_stop();
Someone who knows more about networking than me might know this area
better, but I think we probably don't want to have KCOV coverage
around the call to io_napi_sqpoll_busy_poll() for two reasons:
1. This is NAPI busypolling code, designed to busy-loop until network
packets arrive - meaning the limited KCOV coverage buffer will quickly
fill up even if no data is actually being processed.
2. As far as I know, io_napi_sqpoll_busy_poll() doesn't really process
data related to the uring instance - it (more or less) merely
busy-polls network interfaces specified by the user. Received packets
are not necessarily actually related to this uring instance.
> }
>
> io_sq_update_worktime(sqd, &ist);
> --
> 2.54.0
>
^ permalink raw reply
* [PATCH v2] setup: dynamically detect default huge page size
From: Prateek @ 2026-06-23 15:43 UTC (permalink / raw)
To: gabriel; +Cc: io-uring, kprateek283
Replaces the hardcoded 2MB huge page size with dynamic detection by
parsing /proc/meminfo. This fixes no-mmap allocation failures on
architectures with different default huge page sizes (like ARM64
which often uses 512MB) or x86 systems configured for 1GB pages.
- Safely parses /proc/meminfo without allocating memory.
- Adds a __uring_memcmp shim for CONFIG_NOLIBC builds, allowing
setup.c to use standard memcmp for the Hugepagesize: match.
- Drops the MAP_HUGE_2MB mmap flag to allow the kernel to correctly
apply the system's default huge page size.
- Falls back safely to 2MB if /proc/meminfo is unreadable.
Signed-off-by: Prateek <kprateek283@gmail.com>
---
Changes in v2:
- Initialized hps explicitly to 0.
- Replaced the char-by-char Hugepagesize comparison with a new __uring_memcmp helper.
- Removed the redundant ret variable and simplified the fallback assignment using a ternary operator.
src/lib.h | 2 ++
src/nolibc.c | 19 +++++++++++++
src/setup.c | 75 +++++++++++++++++++++++++++++++++++++++++-----------
3 files changed, 80 insertions(+), 16 deletions(-)
diff --git a/src/lib.h b/src/lib.h
index 4d32d3e1..463dd4b5 100644
--- a/src/lib.h
+++ b/src/lib.h
@@ -41,10 +41,12 @@
void *__uring_memset(void *s, int c, size_t n);
void *__uring_malloc(size_t len);
void __uring_free(void *p);
+int __uring_memcmp(const void *s1, const void *s2, size_t n);
#define malloc(LEN) __uring_malloc(LEN)
#define free(PTR) __uring_free(PTR)
#define memset(PTR, C, LEN) __uring_memset(PTR, C, LEN)
+#define memcmp(S1, S2, LEN) __uring_memcmp(S1, S2, LEN)
#endif
#endif /* #ifndef LIBURING_LIB_H */
diff --git a/src/nolibc.c b/src/nolibc.c
index 88b1494a..14ede500 100644
--- a/src/nolibc.c
+++ b/src/nolibc.c
@@ -25,6 +25,25 @@ void *__uring_memset(void *s, int c, size_t n)
return s;
}
+int __uring_memcmp(const void *s1, const void *s2, size_t n)
+{
+ size_t i;
+ const unsigned char *p1 = s1, *p2 = s2;
+
+ for (i = 0; i < n; i++) {
+ if (p1[i] != p2[i])
+ return p1[i] - p2[i];
+
+ /*
+ * An empty inline ASM to avoid auto-vectorization
+ * because it's too bloated for liburing.
+ */
+ __asm__ volatile ("");
+ }
+
+ return 0;
+}
+
struct uring_heap {
size_t len;
char user_p[] __attribute__((__aligned__));
diff --git a/src/setup.c b/src/setup.c
index ea6f11fd..88f86784 100644
--- a/src/setup.c
+++ b/src/setup.c
@@ -220,15 +220,58 @@ __cold int io_uring_ring_dontfork(struct io_uring *ring)
return 0;
}
-#ifndef MAP_HUGE_SHIFT
-#define MAP_HUGE_SHIFT 26
-#endif
-#ifndef MAP_HUGE_2MB
-#define MAP_HUGE_2MB (21U << MAP_HUGE_SHIFT)
-#endif
-/* FIXME */
-static size_t huge_page_size = 2 * 1024 * 1024;
+static size_t get_huge_page_size(void)
+{
+ static size_t hps = 0;
+ char buf[4096];
+ char *p, *end;
+ unsigned long val = 0;
+ ssize_t n;
+ int fd;
+
+ if (hps)
+ return hps;
+
+ fd = __sys_open("/proc/meminfo", O_RDONLY, 0);
+ if (fd < 0)
+ goto out;
+
+ n = __sys_read(fd, buf, sizeof(buf) - 1);
+ __sys_close(fd);
+ if (n <= 0)
+ goto out;
+ buf[n] = '\0';
+
+ /*
+ * Scan line-by-line for "Hugepagesize:".
+ */
+ p = buf;
+ end = buf + n;
+ while (p < end) {
+ /* Check if this line starts with "Hugepagesize:" (13 chars) */
+ if (p + 13 <= end && !memcmp(p, "Hugepagesize:", 13)) {
+ p += 13;
+ while (p < end && (*p == ' ' || *p == '\t'))
+ p++;
+ val = 0;
+ while (p < end && *p >= '0' && *p <= '9') {
+ val = val * 10 + (*p - '0');
+ p++;
+ }
+ break;
+ }
+ /* Advance to next line */
+ while (p < end && *p != '\n')
+ p++;
+ if (p < end)
+ p++;
+ }
+out:
+ hps = val ? val * 1024 : 2 * 1024 * 1024;
+ return hps;
+}
+
#define KRING_SIZE 64
@@ -261,13 +304,13 @@ static int io_uring_alloc_huge(unsigned entries, struct io_uring_params *p,
mem_used = (mem_used + page_size - 1) & ~(page_size - 1);
/*
- * A maxed-out number of CQ entries with IORING_SETUP_CQE32 fills a 2MB
- * huge page by itself, so the SQ entries won't fit in the same huge
- * page. For SQEs, that shouldn't be possible given KERN_MAX_ENTRIES,
+ * A maxed-out number of CQ entries with IORING_SETUP_CQE32 can fill a
+ * single huge page by itself, so the SQ entries won't fit in the same
+ * huge page. For SQEs, that shouldn't be possible given KERN_MAX_ENTRIES,
* but check that too to future-proof (e.g. against different huge page
* sizes). Bail out early so we don't overrun.
*/
- if (!buf && (sqes_mem > huge_page_size || ring_mem > huge_page_size))
+ if (!buf && (sqes_mem > get_huge_page_size() || ring_mem > get_huge_page_size()))
return -ENOMEM;
if (buf) {
@@ -279,8 +322,8 @@ static int io_uring_alloc_huge(unsigned entries, struct io_uring_params *p,
if (sqes_mem <= page_size)
buf_size = page_size;
else {
- buf_size = huge_page_size;
- map_hugetlb = MAP_HUGETLB | MAP_HUGE_2MB;
+ buf_size = get_huge_page_size();
+ map_hugetlb = MAP_HUGETLB;
}
sqes_size = buf_size;
ptr = __sys_mmap(NULL, sqes_size, PROT_READ|PROT_WRITE,
@@ -302,8 +345,8 @@ static int io_uring_alloc_huge(unsigned entries, struct io_uring_params *p,
if (ring_mem <= page_size)
buf_size = page_size;
else {
- buf_size = huge_page_size;
- map_hugetlb = MAP_HUGETLB | MAP_HUGE_2MB;
+ buf_size = get_huge_page_size();
+ map_hugetlb = MAP_HUGETLB;
}
ptr = __sys_mmap(NULL, buf_size, PROT_READ|PROT_WRITE,
MAP_SHARED|MAP_ANONYMOUS|map_hugetlb,
--
2.43.0
^ permalink raw reply related
* Re: [PATCH] setup: dynamically detect default huge page size
From: Gabriel Krisman Bertazi @ 2026-06-23 15:11 UTC (permalink / raw)
To: Prateek; +Cc: io-uring, kprateek283
In-Reply-To: <20260623110930.910263-1-kprateek283@gmail.com>
Prateek <kprateek283@gmail.com> writes:
> Hi Gabriel,
>
> Thanks for the review.
>
> On Mon, Jun 22, 2026 at 16:49 Gabriel Krisman Bertazi wrote:
>> > +static size_t get_huge_page_size(void)
>> > +{
>> > + static size_t hps;
>>
>> Please initialize your static variables to make it readable, i.e. it
>> should be initialized to 2MB.
>
> hps is left at 0 on purpose as a "not computed yet" flag -- same thing
> get_page_size() does in arch/aarch64/lib.h with cache_val. If I set
> hps = 2MB upfront, the first call just returns 2MB without ever
> reading /proc/meminfo, which defeats the point.
Ah, of course. Back to the original point, please initialize hps
explicitly (to 0). Yeah, I know the compiler should do that for you in
C99. Still, make it explicit.
>
>> > + size_t ret = 2 * 1024 * 1024; /* fallback: 2MB */
>>
>> ret redundant with hps, could go away.
>
> The local ret is there so I only write to hps once at the end. If two
> threads race into this function, neither one sees a half-baked
> fallback value in hps. The race itself is harmless since both threads
> would compute the same result anyway.
No, it is redundant. You don't need to have "half-baked" values in hps
either, as you already use val to build your hugepage size. ret is just an
extra step that will vanish in compilation.
There are many ways around it. For instance:
unsigned long val = 0;
...
out:
hps = (val)?: 2*1024*1024; /* fallback to 2 MB pages */
return hps;
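
One detail worth keeping in mind with a single-assignment fallback like this: /proc/meminfo reports Hugepagesize in kB, and the first version of the patch multiplied the parsed value by 1024 before storing it. A small sketch of the same idea with the conversion kept, using the hypothetical names demo_hps_from_kb() and DEMO_FALLBACK_HPS and the standard three-operand conditional rather than the GNU "?:" shorthand:

```c
#include <stddef.h>

#define DEMO_FALLBACK_HPS (2UL * 1024 * 1024)	/* 2 MB fallback */

/* val_kb is the parsed "Hugepagesize:" value in kB; 0 means "not found". */
static size_t demo_hps_from_kb(unsigned long val_kb)
{
	return val_kb ? (size_t)val_kb * 1024 : DEMO_FALLBACK_HPS;
}
```

A parsed value of 2048 and a missing value both yield 2 MB, while 524288 (a 512 MB default) yields 536870912 bytes.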
--
Gabriel Krisman Bertazi
^ permalink raw reply
* Re: [PATCH] setup: dynamically detect default huge page size
From: Prateek @ 2026-06-23 11:09 UTC (permalink / raw)
To: gabriel; +Cc: io-uring, kprateek283
In-Reply-To: <87qzlyy0zd.fsf@mailhost.krisman.be>
Hi Gabriel,
Thanks for the review.
On Mon, Jun 22, 2026 at 16:49 Gabriel Krisman Bertazi wrote:
> > +static size_t get_huge_page_size(void)
> > +{
> > + static size_t hps;
>
> Please initialize your static variables to make it readable, i.e. it
> should be initialized to 2MB.
hps is left at 0 on purpose as a "not computed yet" flag -- same thing
get_page_size() does in arch/aarch64/lib.h with cache_val. If I set
hps = 2MB upfront, the first call just returns 2MB without ever
reading /proc/meminfo, which defeats the point.
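
The zero-as-sentinel caching described above can be sketched in isolation. This is only an illustration: demo_probe() and demo_get_cached_size() are hypothetical names, and the probe returns a fixed value instead of reading /proc/meminfo.

```c
#include <stddef.h>

/* Counts how often the expensive probe actually runs. */
static unsigned int demo_probe_calls;

/* Hypothetical stand-in for the /proc/meminfo lookup. */
static size_t demo_probe(void)
{
	demo_probe_calls++;
	return 512UL * 1024 * 1024;
}

static size_t demo_get_cached_size(void)
{
	/* 0 means "not computed yet"; any valid size is non-zero. */
	static size_t cached = 0;

	if (cached)
		return cached;

	cached = demo_probe();
	return cached;
}
```

Had cached started out as the 2MB fallback, the early return would trigger on the first call and demo_probe() would never run.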
> > + size_t ret = 2 * 1024 * 1024; /* fallback: 2MB */
>
> ret redundant with hps, could go away.
The local ret is there so I only write to hps once at the end. If two
threads race into this function, neither one sees a half-baked
fallback value in hps. The race itself is harmless since both threads
would compute the same result anyway.
> > + if (p + 13 <= end &&
> > + p[0] == 'H' && p[1] == 'u' && p[2] == 'g' &&
> > + p[3] == 'e' && p[4] == 'p' && p[5] == 'a' &&
> > + p[6] == 'g' && p[7] == 'e' && p[8] == 's' &&
> > + p[9] == 'i' && p[10] == 'z' && p[11] == 'e' &&
> > + p[12] == ':') {
>
> This is unreadable. It would be much better as a two-line loop
> iterating over two strings... But then, why not create a couple-line
> implementation of memcmp and atoi in arch/generic/lib.h instead?
Yeah, the char-by-char match is ugly, agreed. For v2 I'll add a
__uring_memcmp in nolibc.c and shim it in lib.h behind #ifdef
CONFIG_NOLIBC, same way memset/malloc/free are done today.
arch/generic/lib.h only gets included on archs without nolibc support,
so putting memcmp there wouldn't help x86/aarch64/riscv64 nolibc builds.
nolibc.c + lib.h shim covers all configs. Then setup.c just calls
memcmp(p, "Hugepagesize:", 13) -- normal builds use libc's memcmp,
nolibc builds use the shim. I'll keep the digit parsing loop as-is
since it's simple enough and pulling in atoi feels like overkill.
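
A rough standalone sketch of that plan, operating on a buffer instead of the real file so it can be tried outside liburing. demo_memcmp() and demo_parse_hugepagesize_kb() are hypothetical names used for illustration; they are not the functions that will land in nolibc.c or setup.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for the proposed byte-wise memcmp shim. */
static int demo_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p1 = s1, *p2 = s2;
	size_t i;

	for (i = 0; i < n; i++) {
		if (p1[i] != p2[i])
			return p1[i] - p2[i];
	}
	return 0;
}

/*
 * Return the "Hugepagesize:" value in kB from a meminfo-style buffer,
 * or 0 if there is no such line or it carries no digits.
 */
static unsigned long demo_parse_hugepagesize_kb(const char *buf, size_t len)
{
	static const char key[] = "Hugepagesize:";
	const size_t key_len = sizeof(key) - 1;
	const char *p = buf, *end = buf + len;
	unsigned long val = 0;

	while (p < end) {
		/* Only compare when a full key still fits before the end. */
		if ((size_t)(end - p) >= key_len &&
		    !demo_memcmp(p, key, key_len)) {
			p += key_len;
			while (p < end && (*p == ' ' || *p == '\t'))
				p++;
			while (p < end && *p >= '0' && *p <= '9') {
				val = val * 10 + (unsigned long)(*p - '0');
				p++;
			}
			break;
		}
		/* Skip to the start of the next line. */
		while (p < end && *p != '\n')
			p++;
		if (p < end)
			p++;
	}
	return val;
}
```

The comparison is case-sensitive and anchored at the start of a line, so a "HugePages_Total:" line does not match the "Hugepagesize:" key.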
> This function should go in arch/generic/lib.h too. A hint is that
> get_page_size is already there.
get_huge_page_size() only lives in setup.c and uses the __sys* wrappers
from syscall.h, which work in all build configs. Unlike get_page_size()
which is needed across multiple files, there's no reason to put this in
the arch headers and duplicate it four times.
> That said, we should be looking into something like the kernel's nolibc
> instead of reinventing libc.
Agreed, worth looking into separately. This patch just fixes the immediate hugepage issue.
Will send a v2 with the memcmp approach.
Thanks,
Prateek
^ permalink raw reply
* Re: [PATCH] io_uring/memmap: bound io_pin_pages() by page array byte size
From: Jens Axboe @ 2026-06-22 21:14 UTC (permalink / raw)
To: Deepanshu Kartikey; +Cc: io-uring, linux-kernel, syzbot+f99b00a963915b6b52c6
In-Reply-To: <20260621012933.50571-1-kartikey406@gmail.com>
On Sun, 21 Jun 2026 06:59:33 +0530, Deepanshu Kartikey wrote:
> io_pin_pages() checks that nr_pages does not exceed INT_MAX, then
> allocates a struct page * array of nr_pages entries. kvmalloc() limits
> allocations to INT_MAX bytes, but the check counts pages, not bytes.
> On 64-bit each entry is 8 bytes, so the array hits the INT_MAX byte
> limit at INT_MAX / sizeof(struct page *) pages, well before the page
> count check fires.
>
> [...]
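
The arithmetic in the quoted description can be checked with a small userspace sketch. It assumes 8-byte page pointers and 4 KiB pages (so a buffer near 1TB spans about 2^28 pages); the demo_* helpers are hypothetical and only mirror the reasoning, not the actual kernel change.

```c
#include <limits.h>
#include <stddef.h>

/* Largest entry count whose array still fits in INT_MAX bytes. */
static size_t demo_max_entries(size_t entry_size)
{
	return (size_t)INT_MAX / entry_size;
}

/* Count-only check: looks at the number of pages, not the array size. */
static int demo_passes_page_count_check(size_t nr_pages)
{
	return nr_pages <= (size_t)INT_MAX;
}

/* Byte-aware check: bounds the size of the pointer array itself. */
static int demo_passes_byte_size_check(size_t nr_pages, size_t entry_size)
{
	return nr_pages <= demo_max_entries(entry_size);
}
```

With 8-byte entries the byte-aware limit is 268435455 entries, so 2^28 pages passes the count-only check but fails the byte-aware one.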
Applied, thanks!
[1/1] io_uring/memmap: bound io_pin_pages() by page array byte size
commit: 3996771b8f759729cba0a28007438c085f814d61
Best regards,
--
Jens Axboe
^ permalink raw reply
* Re: [PATCH v4] io_uring/register: add IORING_REGISTER_CLONE_FILES opcode
From: Gabriel Krisman Bertazi @ 2026-06-22 20:04 UTC (permalink / raw)
To: Harshal Chavan, io-uring, axboe
Cc: gregkh, kees, gustavoars, linux-kernel, linux-hardening,
Harshal Chavan
In-Reply-To: <20260619093641.25339-1-harshal24.chavan@gmail.com>
Harshal Chavan <harshal24.chavan@gmail.com> writes:
> Currently, if an application wants to duplicate registered file
> descriptors from one io_uring instance to another, it must manually
> unregister and re-register them, incurring unnecessary overhead.
>
> Add IORING_REGISTER_CLONE_FILES to allow direct cloning of the file
> table from a source ring to a destination ring. This implementation
> strictly mirrors the io_clone_buffers UAPI, supporting partial offsets
> and the IORING_REGISTER_DST_REPLACE flag.
>
> To ensure lock synchronization safety, destination nodes are strictly
> allocated as new, private io_rsrc_nodes rather than sharing references
> across rings.
>
> Signed-off-by: Harshal Chavan <harshal24.chavan@gmail.com>
Hello,
Do you have the liburing side and test cases?
A few comments inline.
> ---
> include/uapi/linux/io_uring.h | 12 +++
> io_uring/register.c | 6 ++
> io_uring/rsrc.c | 149 ++++++++++++++++++++++++++++++++++
> io_uring/rsrc.h | 1 +
> 4 files changed, 168 insertions(+)
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 909fb7aea638..0727602ce12f 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -723,6 +723,9 @@ enum io_uring_register_op {
> /* register bpf filtering programs */
> IORING_REGISTER_BPF_FILTER = 37,
>
> + /* clone file descriptors from another ring*/
^ spacing
> + IORING_REGISTER_CLONE_FILES = 38,
> +
> /* this goes last */
> IORING_REGISTER_LAST,
>
> @@ -854,6 +857,15 @@ struct io_uring_clone_buffers {
> __u32 pad[3];
> };
>
> +struct io_uring_clone_files {
> + __u32 src_fd;
> + __u32 flags;
> + __u32 src_off;
> + __u32 dst_off;
> + __u32 nr;
> + __u32 pad[3];
> +};
> +
> struct io_uring_buf {
> __u64 addr;
> __u32 len;
> diff --git a/io_uring/register.c b/io_uring/register.c
> index dce5e2f9cf77..bbc8c506ea2d 100644
> --- a/io_uring/register.c
> +++ b/io_uring/register.c
> @@ -924,6 +924,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
> break;
> ret = io_register_clone_buffers(ctx, arg);
> break;
> + case IORING_REGISTER_CLONE_FILES:
> + ret = -EINVAL;
> + if (!arg || nr_args != 1)
> + break;
> + ret = io_register_clone_files(ctx, arg);
> + break;
> case IORING_REGISTER_ZCRX_IFQ:
> ret = -EINVAL;
> if (!arg || nr_args != 1)
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 650303626be6..a598e5af4c0a 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -1303,6 +1303,155 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
> return ret;
> }
>
> +static int io_clone_file_node(struct io_ring_ctx *ctx,
> + struct io_rsrc_node *src_node,
> + int dst_index,
> + struct io_file_table *new_table)
> +{
> + struct io_rsrc_node *dst_node;
> + struct file *file;
> +
> + dst_node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
> + if (!dst_node)
> + return -ENOMEM;
> +
> + file = io_slot_file(src_node);
> + get_file(file);
> + io_fixed_file_set(dst_node, file);
> +
> + new_table->data.nodes[dst_index] = dst_node;
> + io_file_bitmap_set(new_table, dst_index);
> +
> + return 0;
> +}
> +
> +static int io_clone_files(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
> + struct io_uring_clone_files *arg)
> +{
> + struct io_file_table new_file_table;
> + unsigned int dst_nr = ctx->file_table.data.nr;
> + unsigned int src_nr = src_ctx->file_table.data.nr;
> + unsigned int new_nr, i;
> +
> + lockdep_assert_held(&ctx->uring_lock);
> + lockdep_assert_held(&src_ctx->uring_lock);
> +
> + if (ctx->user != src_ctx->user || ctx->mm_account != src_ctx->mm_account)
> + return -EINVAL;
I don't think it makes sense to check ->user here. But is mm_account
necessary either? How could you get the src_ctx from another process?
> +
> + if (dst_nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
> + return -EBUSY;
> +
> + if (!src_nr)
> + return -ENXIO;
> +
> + if (!arg->nr)
> + arg->nr = src_nr;
> + else if (arg->nr > src_nr)
> + return -EINVAL;
> +
> + if (check_add_overflow(arg->src_off, arg->nr, &i) || i > src_nr)
> + return -EINVAL;
> + if (check_add_overflow(arg->dst_off, arg->nr, &i))
> + return -EINVAL;
> +
> + new_nr = max(dst_nr, arg->dst_off + arg->nr);
> + if (new_nr > IORING_MAX_FIXED_FILES)
> + return -EINVAL;
> +
> + memset(&new_file_table, 0, sizeof(new_file_table));
> + if (!io_alloc_file_tables(ctx, &new_file_table, new_nr))
> + return -ENOMEM;
> +
> + /* Copy original nodes from before the cloned range */
> + for (i = 0; i < min(arg->dst_off, dst_nr); i++) {
> + struct io_rsrc_node *src_node = io_rsrc_node_lookup(&ctx->file_table.data, i);
> +
> + if (!src_node)
> + continue;
> + if (io_clone_file_node(ctx, src_node, i, &new_file_table))
> + goto out;
> + }
> +
> + /* Copy the actual cloned range from the source ring */
> + for (i = 0; i < arg->nr; i++) {
> + struct io_rsrc_node *src_node = io_rsrc_node_lookup(&src_ctx->file_table.data,
> + arg->src_off + i);
> +
> + if (!src_node)
> + continue;
> + if (io_clone_file_node(ctx, src_node, arg->dst_off + i, &new_file_table))
> + goto out;
> + }
> +
> + /* Copy original nodes from after the cloned range */
> + for (i = arg->dst_off + arg->nr; i < dst_nr; i++) {
> + struct io_rsrc_node *src_node = io_rsrc_node_lookup(&ctx->file_table.data, i);
> +
> + if (!src_node)
> + continue;
> + if (io_clone_file_node(ctx, src_node, i, &new_file_table))
> + goto out;
> + }
> +
> + /* free the old file table if there is any data present */
> + if (dst_nr)
> + io_free_file_tables(ctx, &ctx->file_table);
> +
> + WARN_ON_ONCE(ctx->file_table.data.nr);
> + ctx->file_table = new_file_table;
> + io_file_table_set_alloc_range(ctx, 0, ctx->file_table.data.nr);
> + return 0;
> +
> +out:
> + /* Error Path: Safely destroy whatever we partially built */
> + io_free_file_tables(ctx, &new_file_table);
> + return -ENOMEM;
> +}
> +
> +int io_register_clone_files(struct io_ring_ctx *ctx, void __user *arg)
> +{
> + struct io_uring_clone_files clone_arg;
> + struct io_ring_ctx *src_ctx;
> + bool registered_src;
> + struct file *file;
> + int ret;
> +
> + if (copy_from_user(&clone_arg, arg, sizeof(clone_arg)))
> + return -EFAULT;
> + if (clone_arg.flags &
> + ~(IORING_REGISTER_SRC_REGISTERED | IORING_REGISTER_DST_REPLACE))
> + return -EINVAL;
> +
> + if (memchr_inv(clone_arg.pad, 0, sizeof(clone_arg.pad)))
> + return -EINVAL;
> +
> + registered_src = (clone_arg.flags & IORING_REGISTER_SRC_REGISTERED) != 0;
This is better written as
registered_src = !!(clone_arg.flags & IORING_REGISTER_SRC_REGISTERED);
> + file = io_uring_ctx_get_file(clone_arg.src_fd, registered_src);
> + if (IS_ERR(file))
> + return PTR_ERR(file);
> +
> + src_ctx = file->private_data;
> + /* Same ring clone is not allowed */
> + if (src_ctx == ctx) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + mutex_unlock(&ctx->uring_lock);
> + lock_two_rings(ctx, src_ctx);
> +
> + ret = io_clone_files(ctx, src_ctx, &clone_arg);
> +
> +out:
> + if (src_ctx != ctx)
> + mutex_unlock(&src_ctx->uring_lock);
Make the mutex_unlock unconditional and move it above the out label.
src_ctx is never locked in the error context.
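
A control-flow-only sketch of that shape, in plain userspace C. The names demo_ring, demo_lock(), demo_unlock() and demo_clone() are hypothetical, and the "lock" is just a depth counter so the example stays self-contained:

```c
#include <errno.h>

/* Hypothetical stand-in for a ring; "held" tracks lock depth. */
struct demo_ring {
	int held;
};

static void demo_lock(struct demo_ring *r)
{
	r->held++;
}

static void demo_unlock(struct demo_ring *r)
{
	r->held--;
}

/*
 * The same-ring check bails out before src is ever locked, so the
 * unlock sits above the label and the label only handles the cleanup
 * that both paths share.
 */
static int demo_clone(struct demo_ring *dst, struct demo_ring *src,
		      int *cleanups)
{
	int ret;

	if (src == dst) {
		ret = -EINVAL;
		goto out;
	}

	demo_lock(src);
	ret = 0;		/* the real cloning work would go here */
	demo_unlock(src);
out:
	(*cleanups)++;		/* stands in for the shared cleanup step */
	return ret;
}
```

Every path that reaches the label arrives with src unlocked, so no condition is needed around the unlock.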
--
Gabriel Krisman Bertazi
^ permalink raw reply
* Re: [PATCH] setup: dynamically detect default huge page size
From: Gabriel Krisman Bertazi @ 2026-06-22 16:49 UTC (permalink / raw)
To: Prateek, io-uring; +Cc: Prateek
In-Reply-To: <20260620113609.123575-1-kprateek283@gmail.com>
Prateek <kprateek283@gmail.com> writes:
> Replaces the hardcoded 2MB huge page size with dynamic detection by
> parsing /proc/meminfo. This fixes no-mmap allocation failures on
> architectures with different default huge page sizes (like ARM64
> which often uses 512MB) or x86 systems configured for 1GB pages.
>
> - Safely parses /proc/meminfo without allocating memory.
> - Uses raw syscalls and manual byte-by-byte matching to maintain
> strict compatibility with CONFIG_NOLIBC builds (avoiding strstr).
> - Drops the MAP_HUGE_2MB mmap flag to allow the kernel to correctly
> apply the system's default huge page size.
> - Falls back safely to 2MB if /proc/meminfo is unreadable.
>
> Signed-off-by: Prateek <kprateek283@gmail.com>
> ---
> src/setup.c | 84 +++++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 68 insertions(+), 16 deletions(-)
>
> diff --git a/src/setup.c b/src/setup.c
> index ea6f11fd..46e20e0b 100644
> --- a/src/setup.c
> +++ b/src/setup.c
> @@ -220,15 +220,67 @@ __cold int io_uring_ring_dontfork(struct io_uring *ring)
> return 0;
> }
>
> -#ifndef MAP_HUGE_SHIFT
> -#define MAP_HUGE_SHIFT 26
> -#endif
> -#ifndef MAP_HUGE_2MB
> -#define MAP_HUGE_2MB (21U << MAP_HUGE_SHIFT)
> -#endif
>
> -/* FIXME */
> -static size_t huge_page_size = 2 * 1024 * 1024;
> +static size_t get_huge_page_size(void)
> +{
> + static size_t hps;
Please initialize your static variables to make it readable, i.e. it
should be initialized to 2MB.
> + size_t ret = 2 * 1024 * 1024; /* fallback: 2MB */
ret redundant with hps, could go away.
> + char buf[4096];
> + char *p, *end;
> + unsigned long val;
> + ssize_t n;
> + int fd;
> +
> + if (hps)
> + return hps;
> +
> + fd = __sys_open("/proc/meminfo", O_RDONLY, 0);
> + if (fd < 0)
> + goto out;
> +
> + n = __sys_read(fd, buf, sizeof(buf) - 1);
> + __sys_close(fd);
> + if (n <= 0)
> + goto out;
> + buf[n] = '\0';
> +
> + /*
> + * Scan line-by-line for "Hugepagesize:". We avoid strstr() and
> + * memcmp() because they are not available in CONFIG_NOLIBC builds.
> + */
> + p = buf;
> + end = buf + n;
> + while (p < end) {
> + /* Check if this line starts with "Hugepagesize:" (13 chars) */
> + if (p + 13 <= end &&
> + p[0] == 'H' && p[1] == 'u' && p[2] == 'g' &&
> + p[3] == 'e' && p[4] == 'p' && p[5] == 'a' &&
> + p[6] == 'g' && p[7] == 'e' && p[8] == 's' &&
> + p[9] == 'i' && p[10] == 'z' && p[11] == 'e' &&
> + p[12] == ':') {
This is unreadable. It would be much better as a two-line loop
iterating over two strings... But then, why not create a couple-line
implementation of memcmp and atoi in arch/generic/lib.h instead?
> + p += 13;
> + while (p < end && (*p == ' ' || *p == '\t'))
> + p++;
> + val = 0;
> + while (p < end && *p >= '0' && *p <= '9') {
> + val = val * 10 + (*p - '0');
> + p++;
> + }
> + if (val)
> + ret = val * 1024; /* kB -> bytes */
> + break;
> + }
> + /* Advance to next line */
> + while (p < end && *p != '\n')
> + p++;
> + if (p < end)
> + p++;
> + }
> +out:
> + hps = ret;
> + return hps;
> +}
This function should go in arch/generic/lib.h too. A hint is that
get_page_size is already there.
That said, we should be looking into something like the kernel's nolibc
instead of reinventing libc.
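For illustration, a minimal sketch of the shape this could take. It is
plain userspace C rather than liburing code, and has_prefix(),
parse_ulong() and parse_huge_page_size() are hypothetical names, not
existing helpers in arch/generic/lib.h:

```c
#include <stddef.h>

/* Hypothetical helper: 1 if the text at p (bounded by end) starts with prefix. */
static int has_prefix(const char *p, const char *end, const char *prefix)
{
	while (*prefix) {
		if (p >= end || *p != *prefix)
			return 0;
		p++;
		prefix++;
	}
	return 1;
}

/* Hypothetical helper: skip blanks, then parse an unsigned decimal number. */
static unsigned long parse_ulong(const char *p, const char *end)
{
	unsigned long val = 0;

	while (p < end && (*p == ' ' || *p == '\t'))
		p++;
	while (p < end && *p >= '0' && *p <= '9') {
		val = val * 10 + (unsigned long)(*p - '0');
		p++;
	}
	return val;
}

/*
 * Scan meminfo-style text line by line for "Hugepagesize:" and return
 * the value converted from kB to bytes, or fallback if it is missing
 * or zero.
 */
static size_t parse_huge_page_size(const char *buf, size_t len, size_t fallback)
{
	static const char key[] = "Hugepagesize:";
	const char *p = buf, *end = buf + len;

	while (p < end) {
		if (has_prefix(p, end, key)) {
			unsigned long kb = parse_ulong(p + sizeof(key) - 1, end);

			return kb ? (size_t)kb * 1024 : fallback;
		}
		/* Advance to the start of the next line. */
		while (p < end && *p != '\n')
			p++;
		if (p < end)
			p++;
	}
	return fallback;
}
```

The caller would only need to read /proc/meminfo into a buffer and pass
2MB as the fallback, which keeps the matching and number parsing
reusable.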
--
Gabriel Krisman Bertazi
* Re: [PATCH] io_uring/memmap: bound io_pin_pages() by page array byte size
From: Gabriel Krisman Bertazi @ 2026-06-22 14:11 UTC (permalink / raw)
To: Deepanshu Kartikey, axboe
Cc: io-uring, linux-kernel, Deepanshu Kartikey,
syzbot+f99b00a963915b6b52c6
In-Reply-To: <20260621012933.50571-1-kartikey406@gmail.com>
Deepanshu Kartikey <kartikey406@gmail.com> writes:
> io_pin_pages() checks that nr_pages does not exceed INT_MAX, then
> allocates a struct page * array of nr_pages entries. kvmalloc() limits
> allocations to INT_MAX bytes, but the check counts pages, not bytes.
> On 64-bit each entry is 8 bytes, so the array hits the INT_MAX byte
> limit at INT_MAX / sizeof(struct page *) pages, well before the page
> count check fires.
>
> Since commit b4e41050b212 ("io_uring/rsrc: raise registered buffer 1GB
> limit") raised the per-buffer cap to 1TB, a buffer near that cap maps
> ~2^28 pages, making the array allocation exceed INT_MAX bytes. This
> passes the page count check, reaches kvmalloc(), and triggers the
> WARN_ON_ONCE() for oversized allocations in __kvmalloc_node_noprof().
>
> Check nr_pages against INT_MAX / sizeof(struct page *) so the buffer is
> rejected with -EOVERFLOW before the allocation is attempted.
>
> Reported-by: syzbot+f99b00a963915b6b52c6@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f99b00a963915b6b52c6
> Fixes: b4e41050b212 ("io_uring/rsrc: raise registered buffer 1GB limit")
> Tested-by: syzbot+f99b00a963915b6b52c6@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Looks good, feel free to add:
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
> ---
> io_uring/memmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/io_uring/memmap.c b/io_uring/memmap.c
> index 4f9b439319c4..da1f6c5d07f8 100644
> --- a/io_uring/memmap.c
> +++ b/io_uring/memmap.c
> @@ -53,7 +53,7 @@ struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages)
> nr_pages = end - start;
> if (WARN_ON_ONCE(!nr_pages))
> return ERR_PTR(-EINVAL);
> - if (WARN_ON_ONCE(nr_pages > INT_MAX))
> + if (nr_pages > INT_MAX / sizeof(struct page *))
> return ERR_PTR(-EOVERFLOW);
>
> pages = kvmalloc_objs(struct page *, nr_pages, GFP_KERNEL_ACCOUNT);
> --
> 2.43.0
>
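To make the arithmetic concrete, a small userspace sketch, assuming a
64-bit build with 8-byte pointers and 4 KiB pages.
page_array_too_big() and old_check_rejects() are illustrative names,
not kernel functions:

```c
#include <limits.h>
#include <stddef.h>

/* New-style check: would an array of nr_pages pointers exceed INT_MAX bytes? */
static int page_array_too_big(unsigned long nr_pages)
{
	return nr_pages > INT_MAX / sizeof(void *);
}

/* Old-style check: only compares the page count against INT_MAX. */
static int old_check_rejects(unsigned long nr_pages)
{
	return nr_pages > INT_MAX;
}
```

Under those assumptions a 1 TiB buffer spans 2^28 = 268435456 pages,
one more than INT_MAX / 8 = 268435455, so the byte-based check rejects
it while the count-based check lets it through to the allocation.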
--
Gabriel Krisman Bertazi
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: David Hildenbrand (Arm) @ 2026-06-22 11:27 UTC (permalink / raw)
To: Alexei Starovoitov, Kaitao Cheng
Cc: Andrew Morton, Jens Axboe, Tejun Heo, Alexander Viro,
Christian Brauner, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Juri Lelli, Vincent Guittot, Paul Moore, Andy Shevchenko,
Paul E. McKenney, Shakeel Butt, Christian König,
David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, LKML,
open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
io-uring, audit, bpf, Network Development, dri-devel,
linux-perf-use., linux-trace-kernel, kexec, live-patching,
linux-modules, Linux Crypto Mailing List, Linux Power Management,
rcu, sched-ext, linux-mm, virtualization, damon,
clang-built-linux, chengkaitao
In-Reply-To: <CAADnVQJmPWFT01b7DuLdtafv=8FyB84GYHNZ8zSTck+9Aw0JpA@mail.gmail.com>
On 6/22/26 07:28, Alexei Starovoitov wrote:
> On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
>>
>> From: chengkaitao <chengkaitao@kylinos.cn>
>>
>> The list_for_each*_safe() helpers are used when the loop body may remove
>> the current entry. Their current interface, however, forces every caller
>> to define a temporary cursor outside the macro and pass it in, even when
>> the caller never uses that cursor directly. For most call sites this
>> extra cursor is just boilerplate required by the macro implementation.
>>
>> This is awkward because the saved next pointer is an internal detail of
>> the iteration. Callers that only remove or move the current entry do not
>> need to spell it out.
>>
>> The _safe() suffix has also caused confusion. Christian Koenig pointed
>> out that the name is easy to read as a thread-safe variant, especially
>> for beginners, even though it only means that the iterator keeps enough
>> state to tolerate removal of the current entry. He suggested _mutable()
>> as a clearer description of what the loop permits.
>>
>> Add *_mutable() iterator variants for list, hlist and llist. The new
>> helpers are variadic and support both forms. In the common case, the
>> caller omits the temporary cursor and the macro creates a unique internal
>> cursor with typeof(pos) and __UNIQUE_ID(). If a loop really needs an
>> explicit temporary cursor, the caller can still pass it and the helper
>> keeps the existing *_safe() behaviour.
>>
>> For example, a call site may use the shorter form:
>>
>> list_for_each_entry_mutable(pos, head, member)
>>
>> or keep the explicit temporary cursor form:
>>
>> list_for_each_entry_mutable(pos, tmp, head, member)
>>
>> The existing *_safe() helpers remain available for compatibility. This
>> series only converts users in mm, block, kernel, init and io_uring. If
>> this approach looks acceptable, the remaining users can be converted in
>> follow-up series.
>>
>> Changes in v3 (Christian König, Andy Shevchenko):
>> - Convert safe list walks to mutable iterators
>>
>> Changes in v2 (Muchun Song, Andy Shevchenko):
>> - Drop the list_for_each_entry_mutable*() helpers from v1 and make the
>> cursor change directly in the existing list_for_each_entry*() helpers.
>> - Open-code special list walks that rely on updating the loop cursor in
>> the body, preserving their existing traversal semantics.
>>
>> Link to v2:
>> https://lore.kernel.org/all/20260609061347.93688-1-kaitao.cheng@linux.dev/
>>
>> Link to v1:
>> https://lore.kernel.org/all/20260529082149.76764-1-kaitao.cheng@linux.dev/
>>
>> Kaitao Cheng (7):
>> list: Add mutable iterator variants
>> llist: Add mutable iterator variants
>> mm: Use mutable list iterators
>> block: Use mutable list iterators
>> kernel: Use mutable list iterators
>> initramfs: Use mutable list iterator
>> io_uring: Use mutable list iterators
>>
>> block/bfq-iosched.c | 17 +-
>> block/blk-cgroup.c | 12 +-
>> block/blk-flush.c | 4 +-
>> block/blk-iocost.c | 18 +-
>> block/blk-mq.c | 8 +-
>> block/blk-throttle.c | 4 +-
>> block/kyber-iosched.c | 4 +-
>> block/partitions/ldm.c | 8 +-
>> block/sed-opal.c | 4 +-
>> include/linux/list.h | 269 ++++++++++++++++++++++++----
>> include/linux/llist.h | 81 +++++++--
>> init/initramfs.c | 5 +-
>> io_uring/cancel.c | 6 +-
>> io_uring/poll.c | 3 +-
>> io_uring/rw.c | 4 +-
>> io_uring/timeout.c | 8 +-
>> io_uring/uring_cmd.c | 3 +-
>> kernel/audit_tree.c | 4 +-
>> kernel/audit_watch.c | 16 +-
>> kernel/auditfilter.c | 4 +-
>> kernel/auditsc.c | 4 +-
>> kernel/bpf/arena.c | 10 +-
>> kernel/bpf/arraymap.c | 8 +-
>> kernel/bpf/bpf_local_storage.c | 3 +-
>> kernel/bpf/bpf_lru_list.c | 25 ++-
>> kernel/bpf/btf.c | 18 +-
>> kernel/bpf/cgroup.c | 7 +-
>> kernel/bpf/cpumap.c | 4 +-
>> kernel/bpf/devmap.c | 10 +-
>> kernel/bpf/helpers.c | 8 +-
>> kernel/bpf/local_storage.c | 4 +-
>> kernel/bpf/memalloc.c | 16 +-
>> kernel/bpf/offload.c | 8 +-
>> kernel/bpf/states.c | 4 +-
>> kernel/bpf/stream.c | 4 +-
>> kernel/bpf/verifier.c | 6 +-
>> kernel/cgroup/cgroup-v1.c | 4 +-
>> kernel/cgroup/cgroup.c | 54 +++---
>> kernel/cgroup/dmem.c | 12 +-
>> kernel/cgroup/rdma.c | 8 +-
>> kernel/events/core.c | 44 +++--
>> kernel/events/uprobes.c | 12 +-
>> kernel/exit.c | 8 +-
>> kernel/fail_function.c | 4 +-
>> kernel/gcov/clang.c | 4 +-
>> kernel/irq_work.c | 4 +-
>> kernel/kexec_core.c | 4 +-
>> kernel/kprobes.c | 16 +-
>> kernel/livepatch/core.c | 4 +-
>> kernel/livepatch/core.h | 4 +-
>> kernel/liveupdate/kho_block.c | 4 +-
>> kernel/liveupdate/luo_flb.c | 4 +-
>> kernel/locking/rwsem.c | 2 +-
>> kernel/locking/test-ww_mutex.c | 2 +-
>> kernel/module/main.c | 11 +-
>> kernel/padata.c | 4 +-
>> kernel/power/snapshot.c | 8 +-
>> kernel/power/wakelock.c | 4 +-
>> kernel/printk/printk.c | 11 +-
>> kernel/ptrace.c | 4 +-
>> kernel/rcu/rcutorture.c | 3 +-
>> kernel/rcu/tasks.h | 9 +-
>> kernel/rcu/tree.c | 6 +-
>> kernel/resource.c | 4 +-
>> kernel/sched/core.c | 4 +-
>> kernel/sched/ext.c | 22 +--
>> kernel/sched/fair.c | 28 +--
>> kernel/sched/topology.c | 4 +-
>> kernel/sched/wait.c | 4 +-
>> kernel/seccomp.c | 4 +-
>> kernel/signal.c | 11 +-
>> kernel/smp.c | 4 +-
>> kernel/taskstats.c | 8 +-
>> kernel/time/clockevents.c | 6 +-
>> kernel/time/clocksource.c | 4 +-
>> kernel/time/posix-cpu-timers.c | 4 +-
>> kernel/time/posix-timers.c | 3 +-
>> kernel/torture.c | 3 +-
>> kernel/trace/bpf_trace.c | 4 +-
>> kernel/trace/ftrace.c | 49 +++--
>> kernel/trace/ring_buffer.c | 25 ++-
>> kernel/trace/trace.c | 12 +-
>> kernel/trace/trace_dynevent.c | 6 +-
>> kernel/trace/trace_dynevent.h | 5 +-
>> kernel/trace/trace_events.c | 35 ++--
>> kernel/trace/trace_events_filter.c | 4 +-
>> kernel/trace/trace_events_hist.c | 8 +-
>> kernel/trace/trace_events_trigger.c | 17 +-
>> kernel/trace/trace_events_user.c | 16 +-
>> kernel/trace/trace_stat.c | 4 +-
>> kernel/user-return-notifier.c | 3 +-
>> kernel/workqueue.c | 16 +-
>> mm/backing-dev.c | 8 +-
>> mm/balloon.c | 8 +-
>> mm/cma.c | 4 +-
>> mm/compaction.c | 4 +-
>> mm/damon/core.c | 4 +-
>> mm/damon/sysfs-schemes.c | 4 +-
>> mm/dmapool.c | 4 +-
>> mm/huge_memory.c | 8 +-
>> mm/hugetlb.c | 56 +++---
>> mm/hugetlb_vmemmap.c | 16 +-
>> mm/khugepaged.c | 14 +-
>> mm/kmemleak.c | 7 +-
>> mm/ksm.c | 25 +--
>> mm/list_lru.c | 4 +-
>> mm/memcontrol-v1.c | 8 +-
>> mm/memory-failure.c | 12 +-
>> mm/memory-tiers.c | 4 +-
>> mm/migrate.c | 23 ++-
>> mm/mmu_notifier.c | 9 +-
>> mm/page_alloc.c | 8 +-
>> mm/page_reporting.c | 2 +-
>> mm/percpu.c | 11 +-
>> mm/pgtable-generic.c | 4 +-
>> mm/rmap.c | 10 +-
>> mm/shmem.c | 9 +-
>> mm/slab_common.c | 14 +-
>> mm/slub.c | 33 ++--
>> mm/swapfile.c | 4 +-
>> mm/userfaultfd.c | 12 +-
>> mm/vmalloc.c | 24 +--
>> mm/vmscan.c | 7 +-
>> mm/zsmalloc.c | 4 +-
>> 124 files changed, 875 insertions(+), 681 deletions(-)
>
> Not sure what you were thinking, but this diff stat
> is not landable.
Agreed. If we decide we want this, I guess we should target per-subsystem
conversions.
If this goes through the MM tree, I would even appreciate doing this on a per-MM
component granularity.
(unless we have some magic "Linus converts all of them" script, which I doubt we
will have)
Is there a way forward to replace list_for_each_*_safe entirely, possibly just
reusing the old name but simply dropping the parameter?
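As a rough illustration of how one name can accept both argument
shapes during a transition, here is a self-contained userspace sketch.
The SK_* and sk_* names are made up for this example, a singly linked
struct node stands in for the kernel list types, and a fixed internal
cursor name is used where the series relies on __UNIQUE_ID():

```c
#include <stddef.h>

struct node {
	struct node *next;
	int val;
};

/* Count 1..3 variadic arguments and paste the count onto a macro name. */
#define SK_NARGS_(_1, _2, _3, n, ...) n
#define SK_NARGS(...) SK_NARGS_(__VA_ARGS__, 3, 2, 1, 0)
#define SK_CAT_(a, b) a##b
#define SK_CAT(a, b) SK_CAT_(a, b)

/* One extra argument (first): the next pointer is internal to the loop. */
#define sk_for_each_safe1(pos, first) \
	for (struct node *sk_next_ = ((pos) = (first)) ? (pos)->next : NULL; \
	     (pos); \
	     (pos) = sk_next_, sk_next_ = (pos) ? (pos)->next : NULL)

/* Two extra arguments (tmp, first): the caller supplies the cursor. */
#define sk_for_each_safe2(pos, tmp, first) \
	for ((pos) = (first); (pos) && ((tmp) = (pos)->next, 1); (pos) = (tmp))

/* Same name for both forms, selected by argument count. */
#define sk_for_each_safe(pos, ...) \
	SK_CAT(sk_for_each_safe, SK_NARGS(__VA_ARGS__))(pos, __VA_ARGS__)

/* Sum the values while unlinking each visited node (short form). */
static int sum_implicit(struct node *first)
{
	struct node *pos;
	int sum = 0;

	sk_for_each_safe(pos, first) {
		sum += pos->val;
		pos->next = NULL;	/* simulate removal of the current entry */
	}
	return sum;
}

/* Same walk using the explicit temporary cursor form. */
static int sum_explicit(struct node *first)
{
	struct node *pos, *tmp;
	int sum = 0;

	sk_for_each_safe(pos, tmp, first) {
		sum += pos->val;
		pos->next = NULL;
	}
	return sum;
}
```

Whether keeping both forms under the old name is desirable is a
separate question; the sketch only shows that the preprocessor side is
mechanically possible.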
--
Cheers,
David
* Re: [PATCH v3 0/7] Prepare mutable list iterators to cache cursor state
From: Andy Shevchenko @ 2026-06-22 10:46 UTC (permalink / raw)
To: Kaitao Cheng
Cc: Alexei Starovoitov, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Paul E. McKenney, Shakeel Butt, Christian König,
David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, LKML,
open list:CONTROL GROUP (CGROUP), linux-ntfs-dev, Linux-Fsdevel,
io-uring, audit, bpf, Network Development, dri-devel,
linux-perf-use., linux-trace-kernel, kexec, live-patching,
linux-modules, Linux Crypto Mailing List, Linux Power Management,
rcu, sched-ext, linux-mm, virtualization, damon,
clang-built-linux, chengkaitao, Muchun Song
In-Reply-To: <8c8f1849-86d3-4c69-be27-30bbdffdf616@linux.dev>
On Mon, Jun 22, 2026 at 02:15:01PM +0800, Kaitao Cheng wrote:
> On 2026/6/22 13:28, Alexei Starovoitov wrote:
> > On Sun, Jun 21, 2026 at 9:06 PM Kaitao Cheng <kaitao.cheng@linux.dev> wrote:
...
> >> block/bfq-iosched.c | 17 +-
> >> block/blk-cgroup.c | 12 +-
> >> block/blk-flush.c | 4 +-
> >> block/blk-iocost.c | 18 +-
> >> block/blk-mq.c | 8 +-
> >> block/blk-throttle.c | 4 +-
> >> block/kyber-iosched.c | 4 +-
> >> block/partitions/ldm.c | 8 +-
> >> block/sed-opal.c | 4 +-
> >> include/linux/list.h | 269 ++++++++++++++++++++++++----
> >> include/linux/llist.h | 81 +++++++--
> >> init/initramfs.c | 5 +-
> >> io_uring/cancel.c | 6 +-
> >> io_uring/poll.c | 3 +-
> >> io_uring/rw.c | 4 +-
> >> io_uring/timeout.c | 8 +-
> >> io_uring/uring_cmd.c | 3 +-
> >> kernel/audit_tree.c | 4 +-
> >> kernel/audit_watch.c | 16 +-
> >> kernel/auditfilter.c | 4 +-
> >> kernel/auditsc.c | 4 +-
> >> kernel/bpf/arena.c | 10 +-
> >> kernel/bpf/arraymap.c | 8 +-
> >> kernel/bpf/bpf_local_storage.c | 3 +-
> >> kernel/bpf/bpf_lru_list.c | 25 ++-
> >> kernel/bpf/btf.c | 18 +-
> >> kernel/bpf/cgroup.c | 7 +-
> >> kernel/bpf/cpumap.c | 4 +-
> >> kernel/bpf/devmap.c | 10 +-
> >> kernel/bpf/helpers.c | 8 +-
> >> kernel/bpf/local_storage.c | 4 +-
> >> kernel/bpf/memalloc.c | 16 +-
> >> kernel/bpf/offload.c | 8 +-
> >> kernel/bpf/states.c | 4 +-
> >> kernel/bpf/stream.c | 4 +-
> >> kernel/bpf/verifier.c | 6 +-
> >> kernel/cgroup/cgroup-v1.c | 4 +-
> >> kernel/cgroup/cgroup.c | 54 +++---
> >> kernel/cgroup/dmem.c | 12 +-
> >> kernel/cgroup/rdma.c | 8 +-
> >> kernel/events/core.c | 44 +++--
> >> kernel/events/uprobes.c | 12 +-
> >> kernel/exit.c | 8 +-
> >> kernel/fail_function.c | 4 +-
> >> kernel/gcov/clang.c | 4 +-
> >> kernel/irq_work.c | 4 +-
> >> kernel/kexec_core.c | 4 +-
> >> kernel/kprobes.c | 16 +-
> >> kernel/livepatch/core.c | 4 +-
> >> kernel/livepatch/core.h | 4 +-
> >> kernel/liveupdate/kho_block.c | 4 +-
> >> kernel/liveupdate/luo_flb.c | 4 +-
> >> kernel/locking/rwsem.c | 2 +-
> >> kernel/locking/test-ww_mutex.c | 2 +-
> >> kernel/module/main.c | 11 +-
> >> kernel/padata.c | 4 +-
> >> kernel/power/snapshot.c | 8 +-
> >> kernel/power/wakelock.c | 4 +-
> >> kernel/printk/printk.c | 11 +-
> >> kernel/ptrace.c | 4 +-
> >> kernel/rcu/rcutorture.c | 3 +-
> >> kernel/rcu/tasks.h | 9 +-
> >> kernel/rcu/tree.c | 6 +-
> >> kernel/resource.c | 4 +-
> >> kernel/sched/core.c | 4 +-
> >> kernel/sched/ext.c | 22 +--
> >> kernel/sched/fair.c | 28 +--
> >> kernel/sched/topology.c | 4 +-
> >> kernel/sched/wait.c | 4 +-
> >> kernel/seccomp.c | 4 +-
> >> kernel/signal.c | 11 +-
> >> kernel/smp.c | 4 +-
> >> kernel/taskstats.c | 8 +-
> >> kernel/time/clockevents.c | 6 +-
> >> kernel/time/clocksource.c | 4 +-
> >> kernel/time/posix-cpu-timers.c | 4 +-
> >> kernel/time/posix-timers.c | 3 +-
> >> kernel/torture.c | 3 +-
> >> kernel/trace/bpf_trace.c | 4 +-
> >> kernel/trace/ftrace.c | 49 +++--
> >> kernel/trace/ring_buffer.c | 25 ++-
> >> kernel/trace/trace.c | 12 +-
> >> kernel/trace/trace_dynevent.c | 6 +-
> >> kernel/trace/trace_dynevent.h | 5 +-
> >> kernel/trace/trace_events.c | 35 ++--
> >> kernel/trace/trace_events_filter.c | 4 +-
> >> kernel/trace/trace_events_hist.c | 8 +-
> >> kernel/trace/trace_events_trigger.c | 17 +-
> >> kernel/trace/trace_events_user.c | 16 +-
> >> kernel/trace/trace_stat.c | 4 +-
> >> kernel/user-return-notifier.c | 3 +-
> >> kernel/workqueue.c | 16 +-
> >> mm/backing-dev.c | 8 +-
> >> mm/balloon.c | 8 +-
> >> mm/cma.c | 4 +-
> >> mm/compaction.c | 4 +-
> >> mm/damon/core.c | 4 +-
> >> mm/damon/sysfs-schemes.c | 4 +-
> >> mm/dmapool.c | 4 +-
> >> mm/huge_memory.c | 8 +-
> >> mm/hugetlb.c | 56 +++---
> >> mm/hugetlb_vmemmap.c | 16 +-
> >> mm/khugepaged.c | 14 +-
> >> mm/kmemleak.c | 7 +-
> >> mm/ksm.c | 25 +--
> >> mm/list_lru.c | 4 +-
> >> mm/memcontrol-v1.c | 8 +-
> >> mm/memory-failure.c | 12 +-
> >> mm/memory-tiers.c | 4 +-
> >> mm/migrate.c | 23 ++-
> >> mm/mmu_notifier.c | 9 +-
> >> mm/page_alloc.c | 8 +-
> >> mm/page_reporting.c | 2 +-
> >> mm/percpu.c | 11 +-
> >> mm/pgtable-generic.c | 4 +-
> >> mm/rmap.c | 10 +-
> >> mm/shmem.c | 9 +-
> >> mm/slab_common.c | 14 +-
> >> mm/slub.c | 33 ++--
> >> mm/swapfile.c | 4 +-
> >> mm/userfaultfd.c | 12 +-
> >> mm/vmalloc.c | 24 +--
> >> mm/vmscan.c | 7 +-
> >> mm/zsmalloc.c | 4 +-
> >> 124 files changed, 875 insertions(+), 681 deletions(-)
> >
> > Not sure what you were thinking, but this diff stat
> > is not landable.
>
> [PATCH v3 1/7] and [PATCH v3 2/7] contain the main logic and can
> be merged directly. They are also compatible with the old API.
> [PATCH v3 3/7] through [PATCH v3 7/7] are just simple interface
> replacements and do not change any functional logic. They can be
> left unmerged for now; individual modules can pick them up later
> if needed.
>
> In v2, Andy Shevchenko mentioned: "If it's done by Linus himself
> during the day when he prepares -rc1, it's fine."
Yes, but you need to get his blessing first to go with this.
Have you communicated with him on this?
> Even so, the
> changes in this patch series are indeed quite large and touch
> almost every subsystem. I have only converted part of them for
> now, so I wanted to send this out first and see what people think.
That's why it's better to provide a script to convert (e.g., coccinelle)
instead of tons of patches.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 1/7] list: Add mutable iterator variants
From: Christian König @ 2026-06-22 8:51 UTC (permalink / raw)
To: Kaitao Cheng, Andrew Morton, David Hildenbrand, Jens Axboe,
Tejun Heo, Alexander Viro, Christian Brauner, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Johannes Weiner, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Thomas Gleixner, Juri Lelli, Vincent Guittot, Paul Moore,
Andy Shevchenko, Paul E. McKenney, Shakeel Butt
Cc: David Howells, Simona Vetter, Randy Dunlap, Luca Ceresoli,
Philipp Stanner, linux-block, linux-kernel, cgroups,
linux-ntfs-dev, linux-fsdevel, io-uring, audit, bpf, netdev,
dri-devel, linux-perf-users, linux-trace-kernel, kexec,
live-patching, linux-modules, linux-crypto, linux-pm, rcu,
sched-ext, linux-mm, virtualization, damon, llvm, Kaitao Cheng
In-Reply-To: <20260622040533.29824-2-kaitao.cheng@linux.dev>
On 6/22/26 06:05, Kaitao Cheng wrote:
> From: Kaitao Cheng <chengkaitao@kylinos.cn>
>
> The list_for_each*_safe() helpers are used when the loop body may
> remove the current entry. Their API exposes the temporary cursor at
> every call site, even though most users only need it for the iterator
> implementation and never reference it in the loop body.
>
> Add *_mutable() variants for list and hlist iteration. The new helpers
> support both forms: callers may keep passing an explicit temporary cursor
> when they need to inspect or reset it, or omit it and let the helper use
> a unique internal cursor.
That sounds like a bad idea to me. The macro should really be doing one job, and doing it as well as it can.
> This makes call sites that only mutate the list through the current entry
> less noisy, while keeping the existing *_safe() helpers available for
> compatibility.
The *_safe() helpers can be used perfectly well by code that really needs the separate variable for the next entry.
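A minimal userspace sketch of what a single-purpose variant might look
like, where the next pointer is always internal and there is no
optional argument. The sk_* names and the fixed cursor name sk_tmp_
are assumptions for this example; a real helper would need a unique
identifier per expansion:

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void sk_list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void sk_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void sk_list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = NULL;
}

/* One job only: walk the list while tolerating removal of the current entry. */
#define sk_list_for_each_mutable(pos, head) \
	for (struct list_head *sk_tmp_ = ((pos) = (head)->next)->next; \
	     (pos) != (head); \
	     (pos) = sk_tmp_, sk_tmp_ = (pos)->next)

/* Remove every entry from the list and return how many were removed. */
static int sk_drain(struct list_head *head)
{
	struct list_head *pos;
	int removed = 0;

	sk_list_for_each_mutable(pos, head) {
		sk_list_del(pos);
		removed++;
	}
	return removed;
}
```

Call sites that need to inspect or reset the saved next pointer would
then stay on the existing *_safe() helpers.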
Regards,
Christian.
>
> Signed-off-by: Kaitao Cheng <chengkaitao@kylinos.cn>
> ---
> include/linux/list.h | 269 +++++++++++++++++++++++++++++++++++++------
> 1 file changed, 231 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/list.h b/include/linux/list.h
> index 09d979976b3b..1081def7cea9 100644
> --- a/include/linux/list.h
> +++ b/include/linux/list.h
> @@ -7,6 +7,7 @@
> #include <linux/stddef.h>
> #include <linux/poison.h>
> #include <linux/const.h>
> +#include <linux/args.h>
>
> #include <asm/barrier.h>
>
> @@ -763,28 +764,72 @@ static inline void list_splice_tail_init(struct list_head *list,
> #define list_for_each_prev(pos, head) \
> for (pos = (head)->prev; !list_is_head(pos, (head)); pos = pos->prev)
>
> -/**
> - * list_for_each_safe - iterate over a list safe against removal of list entry
> - * @pos: the &struct list_head to use as a loop cursor.
> - * @n: another &struct list_head to use as temporary storage
> - * @head: the head for your list.
> +/*
> + * list_for_each_safe is an old interface, use list_for_each_mutable instead.
> */
> #define list_for_each_safe(pos, n, head) \
> for (pos = (head)->next, n = pos->next; \
> !list_is_head(pos, (head)); \
> pos = n, n = pos->next)
>
> +#define __list_for_each_mutable_internal(pos, tmp, head) \
> + for (typeof(pos) tmp = (pos = (head)->next)->next; \
> + !list_is_head(pos, (head)); \
> + pos = tmp, tmp = pos->next)
> +
> +#define __list_for_each_mutable1(pos, head) \
> + __list_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
> +
> +#define __list_for_each_mutable2(pos, next, head) \
> + list_for_each_safe(pos, next, head)
> +
> /**
> - * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
> + * list_for_each_mutable - iterate over a list safe against entry removal
> * @pos: the &struct list_head to use as a loop cursor.
> - * @n: another &struct list_head to use as temporary storage
> - * @head: the head for your list.
> + * @...: either (head) or (next, head)
> + *
> + * next: another &struct list_head to use as optional temporary storage.
> + * The temporary cursor is internal unless explicitly supplied by
> + * the caller.
> + * head: the head for your list.
> + */
> +#define list_for_each_mutable(pos, ...) \
> + CONCATENATE(__list_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_prev_safe is an old interface, use list_for_each_prev_mutable instead.
> */
> #define list_for_each_prev_safe(pos, n, head) \
> for (pos = (head)->prev, n = pos->prev; \
> !list_is_head(pos, (head)); \
> pos = n, n = pos->prev)
>
> +#define __list_for_each_prev_mutable_internal(pos, tmp, head) \
> + for (typeof(pos) tmp = (pos = (head)->prev)->prev; \
> + !list_is_head(pos, (head)); \
> + pos = tmp, tmp = pos->prev)
> +
> +#define __list_for_each_prev_mutable1(pos, head) \
> + __list_for_each_prev_mutable_internal(pos, __UNIQUE_ID(prev), head)
> +
> +#define __list_for_each_prev_mutable2(pos, prev, head) \
> + list_for_each_prev_safe(pos, prev, head)
> +
> +/**
> + * list_for_each_prev_mutable - iterate over a list backwards safe against entry removal
> + * @pos: the &struct list_head to use as a loop cursor.
> + * @...: either (head) or (prev, head)
> + *
> + * prev: another &struct list_head to use as optional temporary storage.
> + * The temporary cursor is internal unless explicitly supplied by
> + * the caller.
> + * head: the head for your list.
> + */
> +#define list_for_each_prev_mutable(pos, ...) \
> + CONCATENATE(__list_for_each_prev_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
> /**
> * list_count_nodes - count nodes in the list
> * @head: the head for your list.
> @@ -895,12 +940,8 @@ static inline size_t list_count_nodes(struct list_head *head)
> for (; !list_entry_is_head(pos, head, member); \
> pos = list_prev_entry(pos, member))
>
> -/**
> - * list_for_each_entry_safe - iterate over list of given type safe against removal of list entry
> - * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> +/*
> + * list_for_each_entry_safe is an old interface, use list_for_each_entry_mutable instead.
> */
> #define list_for_each_entry_safe(pos, n, head, member) \
> for (pos = list_first_entry(head, typeof(*pos), member), \
> @@ -908,15 +949,36 @@ static inline size_t list_count_nodes(struct list_head *head)
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_next_entry(n, member))
>
> +#define __list_for_each_entry_mutable_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_next_entry(pos = \
> + list_first_entry(head, typeof(*pos), member), member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_next_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable2(pos, head, member) \
> + __list_for_each_entry_mutable_internal(pos, __UNIQUE_ID(next), head, member)
> +
> +#define __list_for_each_entry_mutable3(pos, next, head, member) \
> + list_for_each_entry_safe(pos, next, head, member)
> +
> /**
> - * list_for_each_entry_safe_continue - continue list iteration safe against removal
> + * list_for_each_entry_mutable - iterate over a list safe against entry removal
> * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> + * @...: either (head, member) or (next, head, member)
> *
> - * Iterate over list of given type, continuing after current point,
> - * safe against removal of list entry.
> + * next: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + */
> +#define list_for_each_entry_mutable(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_entry_safe_continue is an old interface,
> + * use list_for_each_entry_mutable_continue instead.
> */
> #define list_for_each_entry_safe_continue(pos, n, head, member) \
> for (pos = list_next_entry(pos, member), \
> @@ -924,30 +986,79 @@ static inline size_t list_count_nodes(struct list_head *head)
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_next_entry(n, member))
>
> +#define __list_for_each_entry_mutable_continue_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_next_entry(pos = \
> + list_next_entry(pos, member), member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_next_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable_continue2(pos, head, member) \
> + __list_for_each_entry_mutable_continue_internal(pos, \
> + __UNIQUE_ID(next), head, member)
> +
> +#define __list_for_each_entry_mutable_continue3(pos, next, head, member) \
> + list_for_each_entry_safe_continue(pos, next, head, member)
> +
> /**
> - * list_for_each_entry_safe_from - iterate over list from current point safe against removal
> + * list_for_each_entry_mutable_continue - continue list iteration safe against removal
> * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> + * @...: either (head, member) or (next, head, member)
> *
> - * Iterate over list of given type from current point, safe against
> - * removal of list entry.
> + * next: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + *
> + * Iterate over list of given type, continuing after current point,
> + * safe against removal of list entry.
> + */
> +#define list_for_each_entry_mutable_continue(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable_continue, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_entry_safe_from is an old interface,
> + * use list_for_each_entry_mutable_from instead.
> */
> #define list_for_each_entry_safe_from(pos, n, head, member) \
> for (n = list_next_entry(pos, member); \
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_next_entry(n, member))
>
> +#define __list_for_each_entry_mutable_from_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_next_entry(pos, member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_next_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable_from2(pos, head, member) \
> + __list_for_each_entry_mutable_from_internal(pos, \
> + __UNIQUE_ID(next), head, member)
> +
> +#define __list_for_each_entry_mutable_from3(pos, next, head, member) \
> + list_for_each_entry_safe_from(pos, next, head, member)
> +
> /**
> - * list_for_each_entry_safe_reverse - iterate backwards over list safe against removal
> + * list_for_each_entry_mutable_from - iterate over list from current point safe against removal
> * @pos: the type * to use as a loop cursor.
> - * @n: another type * to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the list_head within the struct.
> + * @...: either (head, member) or (next, head, member)
> *
> - * Iterate backwards over list of given type, safe against removal
> - * of list entry.
> + * next: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + *
> + * Iterate over list of given type from current point, safe against
> + * removal of list entry.
> + */
> +#define list_for_each_entry_mutable_from(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable_from, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> +/*
> + * list_for_each_entry_safe_reverse is an old interface,
> + * use list_for_each_entry_mutable_reverse instead.
> */
> #define list_for_each_entry_safe_reverse(pos, n, head, member) \
> for (pos = list_last_entry(head, typeof(*pos), member), \
> @@ -955,6 +1066,37 @@ static inline size_t list_count_nodes(struct list_head *head)
> !list_entry_is_head(pos, head, member); \
> pos = n, n = list_prev_entry(n, member))
>
> +#define __list_for_each_entry_mutable_reverse_internal(pos, tmp, head, member) \
> + for (typeof(pos) tmp = list_prev_entry(pos = \
> + list_last_entry(head, typeof(*pos), member), member); \
> + !list_entry_is_head(pos, head, member); \
> + pos = tmp, tmp = list_prev_entry(tmp, member))
> +
> +#define __list_for_each_entry_mutable_reverse2(pos, head, member) \
> + __list_for_each_entry_mutable_reverse_internal(pos, \
> + __UNIQUE_ID(prev), head, member)
> +
> +#define __list_for_each_entry_mutable_reverse3(pos, prev, head, member) \
> + list_for_each_entry_safe_reverse(pos, prev, head, member)
> +
> +/**
> + * list_for_each_entry_mutable_reverse - iterate backwards over list safe against removal
> + * @pos: the type * to use as a loop cursor.
> + * @...: either (head, member) or (prev, head, member)
> + *
> + * prev: another type * to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your list.
> + * member: the name of the list_head within the struct.
> + *
> + * Iterate backwards over list of given type, safe against removal
> + * of list entry.
> + */
> +#define list_for_each_entry_mutable_reverse(pos, ...) \
> + CONCATENATE(__list_for_each_entry_mutable_reverse, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> /**
> * list_safe_reset_next - reset a stale list_for_each_entry_safe loop
> * @pos: the loop cursor used in the list_for_each_entry_safe loop
> @@ -1189,6 +1331,31 @@ static inline void hlist_splice_init(struct hlist_head *from,
> for (pos = (head)->first; pos && ({ n = pos->next; 1; }); \
> pos = n)
>
> +#define __hlist_for_each_mutable_internal(pos, tmp, head) \
> + for (typeof(pos) tmp = (pos = (head)->first) ? pos->next : NULL; \
> + pos; \
> + pos = tmp, tmp = pos ? pos->next : NULL)
> +
> +#define __hlist_for_each_mutable1(pos, head) \
> + __hlist_for_each_mutable_internal(pos, __UNIQUE_ID(next), head)
> +
> +#define __hlist_for_each_mutable2(pos, next, head) \
> + hlist_for_each_safe(pos, next, head)
> +
> +/**
> + * hlist_for_each_mutable - iterate over a hlist safe against entry removal
> + * @pos: the &struct hlist_node to use as a loop cursor.
> + * @...: either (head) or (next, head)
> + *
> + * next: another &struct hlist_node to use as optional temporary storage.
> + * The temporary cursor is internal unless explicitly supplied by
> + * the caller.
> + * head: the head for your hlist.
> + */
> +#define hlist_for_each_mutable(pos, ...) \
> + CONCATENATE(__hlist_for_each_mutable, COUNT_ARGS(__VA_ARGS__)) \
> + (pos, __VA_ARGS__)
> +
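
The hlist variant differs from the list one in that an hlist is NULL-terminated, so the saved next pointer has to be guarded on every step. A small userspace sketch of that pattern follows. It is an illustration under assumptions rather than the kernel code: struct hlist_node and the add/del helpers are reduced stand-ins, and hlist_for_each_mutable_sketch and count_and_unlink_all() are hypothetical names. It needs GCC or Clang for __typeof__.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/*
 * tmp is read before the body runs and again only after pos has
 * advanced, so the body may unlink pos. Both reads check for NULL
 * because the chain has no sentinel node.
 */
#define hlist_for_each_mutable_sketch(pos, tmp, head) \
	for (__typeof__(pos) tmp = (pos = (head)->first) ? pos->next : NULL; \
	     pos; \
	     pos = tmp, tmp = pos ? pos->next : NULL)

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;		/* a naive loop reading pos->next would stop here */
	n->pprev = NULL;
}

/* Unlink every node while iterating; returns the count, or -1 if any remain. */
static int count_and_unlink_all(void)
{
	struct hlist_node nodes[3], *pos;
	struct hlist_head head = { .first = NULL };
	int count = 0;

	for (int i = 0; i < 3; i++)
		hlist_add_head(&nodes[i], &head);

	hlist_for_each_mutable_sketch(pos, next_, &head) {
		hlist_del(pos);
		count++;
	}

	return head.first ? -1 : count;
}
```

Since hlist_del() here clears pos->next, a loop that advanced with pos = pos->next would stop after the first node. The saved cursor is what lets all three nodes be visited.
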
> #define hlist_entry_safe(ptr, type, member) \
> ({ typeof(ptr) ____ptr = (ptr); \
> ____ptr ? hlist_entry(____ptr, type, member) : NULL; \
> @@ -1224,18 +1391,44 @@ static inline void hlist_splice_init(struct hlist_head *from,
> for (; pos; \
> pos = hlist_entry_safe((pos)->member.next, typeof(*(pos)), member))
>
> -/**
> - * hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry
> - * @pos: the type * to use as a loop cursor.
> - * @n: a &struct hlist_node to use as temporary storage
> - * @head: the head for your list.
> - * @member: the name of the hlist_node within the struct.
> +/*
> + * hlist_for_each_entry_safe is an old interface, use hlist_for_each_entry_mutable instead.
> */
> #define hlist_for_each_entry_safe(pos, n, head, member) \
> for (pos = hlist_entry_safe((head)->first, typeof(*pos), member);\
> pos && ({ n = pos->member.next; 1; }); \
> pos = hlist_entry_safe(n, typeof(*pos), member))
>
> +#define __hlist_for_each_entry_mutable_internal(pos, tmp, head, member) \
> + for (struct hlist_node *tmp = (pos = \
> + hlist_entry_safe((head)->first, typeof(*pos), member)) ? \
> + pos->member.next : NULL; \
> + pos; \
> + pos = hlist_entry_safe((tmp), typeof(*pos), member), \
> + tmp = pos ? pos->member.next : NULL)
> +
> +#define __hlist_for_each_entry_mutable2(pos, head, member) \
> + __hlist_for_each_entry_mutable_internal(pos, \
> + __UNIQUE_ID(next), head, member)
> +
> +#define __hlist_for_each_entry_mutable3(pos, next, head, member) \
> + hlist_for_each_entry_safe(pos, next, head, member)
> +
> +/**
> + * hlist_for_each_entry_mutable - iterate over hlist safe against entry removal
> + * @pos: the type * to use as a loop cursor.
> + * @...: either (head, member) or (next, head, member)
> + *
> + * next: a &struct hlist_node to use as optional temporary storage. The
> + * temporary cursor is internal unless explicitly supplied by the
> + * caller.
> + * head: the head for your hlist.
> + * member: the name of the hlist_node within the struct.
> + */
> +#define hlist_for_each_entry_mutable(pos, ...) \
> + CONCATENATE(__hlist_for_each_entry_mutable, \
> + COUNT_ARGS(__VA_ARGS__))(pos, __VA_ARGS__)
> +
> /**
> * hlist_count_nodes - count nodes in the hlist
> * @head: the head for your hlist.
^ permalink raw reply