* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Chris Li @ 2025-09-03 12:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pasha Tatashin, pratyush, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <20250902134156.GM186519@nvidia.com>
On Tue, Sep 2, 2025 at 6:42 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Fri, Aug 29, 2025 at 12:18:43PM -0700, Chris Li wrote:
>
> > Another idea is that having a middle layer manages the life cycle of
> > the reserved memory for you. Kind of like a slab allocator for the
> > preserved memory.
>
> If you want a slab allocator then I think you should make slab
> preservable.. Don't need more allocators :\
Sure, we can reuse the slab allocator and add the KHO functionality to
it. I consider that an implementation detail, and I haven't even
started on it yet. I just want to point out that we might want a
high-level library to take care of the life cycle of the preserved
memory, so there is less boilerplate code for the caller.
> > Question: Do we have a matching FDT node to match the memfd C
> > structure hierarchy? Otherwise all the C struct will lump into one FDT
> > node. Maybe one FDT node for all C struct is fine. Then there is a
> > risk of overflowing the 4K buffer limit on the FDT node.
>
> I thought you were getting rid of FDT? My suggestion was to be taken
> as a FDT replacement..
Thanks for the clarification. Yes, I do want to get rid of FDT, very much so.
If we are not using FDT, adding an object might change the underlying
C structure layout, causing a chain reaction of C struct changes back to
the root. That is why I assumed you might still be using FDT. I see
your later comments address that with a list of objects. I will
discuss it there.
> You need some kind of hierarchy of identifiers, things like memfd
> should chain off some higher level luo object for a file descriptor.
Ack.
>
> PCI should be the same, but not fd based.
Ack.
> It may be that luo maintains some flat dictionary of
> string -> [object type, version, u64 ptr]*
I see, got it. That answers my question of how to add a new object
without changing the C structure layout. You are using a list of the
same C structure. When adding more objects, you just add more items
to the list. This part of the boilerplate detail was not mentioned in
your original suggestion. I understand your proposal better now.
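
For illustration only, the "string -> [object type, version, u64 ptr]*"
idea can be sketched as a counted list of identical entries. All names
below (demo_dict, demo_dict_add, demo_dict_find) are hypothetical and
are not LUO or KHO interfaces; the point is only that adding an object
appends an entry and leaves every struct layout unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry: one preserved object, keyed by a string. */
struct demo_dict_entry {
	const char *name;	/* lookup key */
	uint32_t type;		/* object type tag */
	uint32_t version;	/* version of the serialized payload */
	uint64_t ptr;		/* address (or handle) of the payload */
};

/* Hypothetical dictionary: caller-provided storage plus a fill count. */
struct demo_dict {
	struct demo_dict_entry *entries;
	size_t count;
	size_t capacity;
};

/* Append one entry. Returns 0 on success, -1 when the storage is full. */
int demo_dict_add(struct demo_dict *d, const char *name,
		  uint32_t type, uint32_t version, uint64_t ptr)
{
	assert(d && name);
	if (d->count == d->capacity)
		return -1;
	d->entries[d->count++] =
		(struct demo_dict_entry){ name, type, version, ptr };
	return 0;
}

/* Linear search; returns the first entry with a matching name, or NULL. */
const struct demo_dict_entry *demo_dict_find(const struct demo_dict *d,
					     const char *name)
{
	for (size_t i = 0; i < d->count; i++)
		if (strcmp(d->entries[i].name, name) == 0)
			return &d->entries[i];
	return NULL;
}
```

Several entries may share one name in this sketch, which is one way to
read the trailing "*" in the notation above.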
> And if you want to serialize that the optimal path would be to have a
> vmalloc of all the strings and a vmalloc of the [] data, sort of like
> the kho array idea.
Is the KHO array idea already implemented in the existing KHO code, or
is that something new you want to propose?
Then we will have to know the combined size of the strings up front,
similar to the FDT story. Ideally the list can have items added to it
incrementally. Maybe it is stored first as a list of raw pointers
without vmalloc, and then a final pass does the vmalloc and serializes
the strings and data.
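
A minimal userspace sketch of that two-pass approach, under stated
assumptions: malloc() stands in for vmalloc(), and every name here
(demo_pending, demo_record, demo_image, demo_serialize) is made up for
illustration. Pass one is a plain linked list that grows one item at a
time; pass two sizes everything, allocates two flat buffers, and copies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Pass 1: items are linked together as they are added. */
struct demo_pending {
	struct demo_pending *next;
	const char *name;
	uint64_t ptr;
};

/* Serialized record: the name becomes an offset into the string blob. */
struct demo_record {
	uint32_t name_off;
	uint64_t ptr;
};

struct demo_image {
	char *strings;			/* names, NUL-terminated, back to back */
	size_t strings_len;
	struct demo_record *records;
	size_t nr_records;
};

/* Pass 2: measure, allocate once, then copy. Returns 0 or -1. */
int demo_serialize(const struct demo_pending *head, struct demo_image *out)
{
	const struct demo_pending *p;
	size_t n = 0, len = 0, off = 0, i = 0;

	assert(out);
	for (p = head; p; p = p->next) {
		n++;
		len += strlen(p->name) + 1;
	}

	/* Allocate at least one byte/record so an empty list still works. */
	out->strings = malloc(len ? len : 1);
	out->records = malloc((n ? n : 1) * sizeof(*out->records));
	if (!out->strings || !out->records) {
		free(out->strings);
		free(out->records);
		return -1;
	}

	for (p = head; p; p = p->next) {
		size_t l = strlen(p->name) + 1;

		memcpy(out->strings + off, p->name, l);
		out->records[i].name_off = (uint32_t)off;
		out->records[i].ptr = p->ptr;
		off += l;
		i++;
	}
	out->strings_len = len;
	out->nr_records = n;
	return 0;
}
```

The combined string size is only needed in the final pass, so callers
never have to know it up front.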
With the additional detail above, I would like to point out something
I observed earlier: even though the core idea of the native C
struct is simple and intuitive, the end-to-end implementation is not.
When we compare the C struct implementation, we need to include all
those additional boilerplate details as a whole, otherwise it is not
an apples-to-apples comparison.
> > At this stage, do you see that exploring such a machine idea can be
> > beneficial or harmful to the project? If such an idea is considered
> > harmful, we should stop discussing such an idea at all. Go back to
> > building more batches of hand crafted screws, which are waiting by the
> > next critical component.
>
> I haven't heard a compelling idea that will obviously make things
> better.. Adding more layers and complexity is not better.
Yes, I completely understand how you reason it, and I agree with your
assessment.
I would like to add that you have been heavily discounting the
boilerplate in the C struct solution. Here is where our viewpoints
might differ:
If the "more layers" have a counterpart in the C struct solution as
well, then it is not "more", it is a necessary evil. We need to
compare apples to apples.
> Your BTF proposal doesn't seem to benifit memfd at all, it was focused
> on extracting data directly from an existing struct which I feel very
> strongly we should never do.
From a data flow point of view, the data is read from a C struct and
eventually stored into a C struct. There is no way around that. It is
the necessary evil if you automate this process. Hey, there is also no
rule saying that you can't use a bounce buffer or some kind of manual
control in between.
It is just a way to automate things to reduce the boilerplate. We can
put different labels on it and argue that one label or concept is bad,
but your C struct approach does the exact same thing: it pulls data
from a C struct and stores it into a C struct. It is just the label we
are arguing about: this label is good and that label is bad.
Underneath, both share the same necessary evil.
> The above dictionary, I also don't see how BTF helps. It is such a
> special encoding. Yes you could make some elaborate serialization
> infrastructure, like FDT, but we have all been saying FDT is too hard
> to use and too much code. I'm not sure I'm convinced there is really a
Are you ready to be convinced? If you treat this as a religion, you can
never be convinced.
FDT is too hard to use for other reasons. FDT is designed to be
constructed by offline tools; in the kernel it is mostly read-only. We
are using FDT outside of its original design parameters. That does not
mean that something (the machine) specially designed for this purpose
can't be built and be easier to use.
> better middle ground :\
With due respect, it sounds like you risk judging something you
haven't fully understood. I feel that a baby, my baby, has been thrown
out with the bathwater.
As a test of the above statement, can you describe my idea as well as
or better than I do, so that I would say: "yes, this is exactly what I
am trying to build"?
That is the communication barrier I am talking about. I estimate that
at this rate it will take us about 15 email exchanges to get to the
core stuff. It might be much quicker to lock you and me in a room and
only release us when you and I can describe each other's viewpoint at
a mutually satisfactory level. I understand your time is precious, and
I don't want to waste it. I fully respect and will comply with your
decision. If you want me to stop now, I can stop. No questions asked.
That gets back to my original question: do we already have a ruling
that even the discussion of "the machine" idea is forbidden?
> IMHO if there is some way to improve this it still yet to be found,
In my mind, I have found it. I have to get over the communication
barrier to plead my case to you. You can issue a preliminary ruling to
dismiss my case. I just wish you fully understood the case facts
before you make such a ruling.
> and I think we don't well understand what we need to serialize just
> yet.
That may be true, we don't have 100% understanding of what needs to be
serialized. On the other hand, it is not 0% either. Based on what we
understand, we can already use "the machine" to help us do what we
know much more effectively. Of course, there is a trade-off in
developing "the machine". It takes extra time, plus the complexity of
maintaining such a machine. I fully understand that.
> Smaller ideas like preserve the vmalloc will make big improvement
> already.
Yes, I totally agree. It is a local optimization we can do, but it
might not be the global optimum. "The machine" might not use
vmalloc at all, and all these small incremental changes will be thrown
away once we have "the machine".
I put this situation in the airplane story: yes, we can build
diamond-plated files to produce the hand-crafted screws faster. The
missed opportunity is that, if we had "the machine" earlier, we could
pump out machined screws much faster at scale. Even minus the time to
build the machine, it might still be an overall win. We don't need
diamond-plated files if we have the machine.
> Lets not race ahead until we understand the actual problem properly.
Is that the final ruling? It feels that way. Just clarifying what I am receiving.
I feel a much stronger sense of urgency than you do, though. The stakes
are high: there are already four departments that could use this
common serialization library right now:
1) PCI
2) VFIO
3) IOMMU
4) Memfd.
We are getting into the more complex data structures. If we merge this
into the mainline, it is much harder to pull them out later.
Basically, this is a done deal. That is why I am putting my reputation
and my job on the line to pitch "the machine" idea. It is a very risky
move, I fully understand that.
Chris
* Re: [PATCH v4] linux: Add openat2 (BZ 31664)
From: Arjun Shankar @ 2025-09-03 12:09 UTC (permalink / raw)
To: enh
Cc: Paul Eggert, Aleksa Sarai, Adhemerval Zanella Netto, libc-alpha,
linux-api
In-Reply-To: <CAJgzZooK+w7NTjsFV_0c=SmPSnsSMiWXFgnvcw=w3msj7NBY9A@mail.gmail.com>
Hello!
> > Earlier on in this thread, Aleksa mentioned sched_setattr as
> > establishing precedent for the kernel modifying non-const objects. It
> > looks like glibc actually does provide a sched_setattr wrapper since
> > 2.41. The relevant argument hasn't been marked as const and the kernel
> > does modify the contents, and glibc's syscall wrapper simply passes it
> > through. So we already do this.
>
> given that
>
> SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
> unsigned int, flags)
>
> calls sched_setattr(), which is defined thus:
>
> int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
> {
> return __sched_setscheduler(p, attr, true, true);
> }
>
> i think that's just a copy & paste mistake in the kernel -- carefully
> preserved in glibc and bionic -- no?
>
> (i only see the kernel updating its own _copy_ of the passed-in struct.)
Based on my understanding, it all happens before the call to the
const-marked sched_setattr. Starting from line 986 (as of today) in the
same syscalls.c file [1]:
        retval = sched_copy_attr(uattr, &attr);
        if (retval)
                return retval;
Which inside sched_copy_attr does:
        ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
        if (ret) {
                if (ret == -E2BIG)
                        goto err_size;
And that leads to:
err_size:
        put_user(sizeof(*attr), &uattr->size);
        return -E2BIG;
Which writes to userspace.
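
The write-back can be probed from userspace. The sketch below is only
a probe under assumptions: it uses a local stand-in struct
(demo_sched_attr, a hypothetical name) rather than the real header
definition, calls the raw syscall instead of the glibc wrapper, and
sets the size field to 1 so that the kernel's size check rejects it.
On a kernel matching the source quoted above, the call should fail
with E2BIG and leave a different value in the size field; sandboxes
that filter the syscall will report some other error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local stand-in mirroring the first fields of struct sched_attr. */
struct demo_sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/*
 * Call sched_setattr(2) on the current task with a deliberately
 * undersized 'size' field. Returns the raw syscall result, stores
 * errno in *err and the value left in attr.size in *size_after.
 */
long demo_probe_sched_setattr(int *err, uint32_t *size_after)
{
	struct demo_sched_attr attr;
	long rc;

	assert(err && size_after);
	memset(&attr, 0, sizeof(attr));
	attr.size = 1;		/* smaller than any size the kernel accepts */

	errno = 0;
	rc = syscall(SYS_sched_setattr, 0, &attr, 0U);
	*err = errno;
	*size_after = attr.size;
	return rc;
}
```

If *err is E2BIG and *size_after is no longer 1, the kernel has
written into the caller's struct, which is why the argument cannot be
const.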
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/syscalls.c?id=e6b9dce0aeeb91dfc0974ab87f02454e24566182#n986
--
Arjun Shankar
he/him/his
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Pratyush Yadav @ 2025-09-03 14:10 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pratyush Yadav, Pasha Tatashin, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <20250902134846.GN186519@nvidia.com>
Hi Jason,
On Tue, Sep 02 2025, Jason Gunthorpe wrote:
> On Mon, Sep 01, 2025 at 07:10:53PM +0200, Pratyush Yadav wrote:
>> Building kvalloc on top of this becomes trivial.
>>
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/pratyush/linux.git/commit/?h=kho-array&id=cf4c04c1e9ac854e3297018ad6dada17c54a59af
>
> This isn't really an array, it is a non-seekable serialization of
> key/values with some optimization for consecutive keys. IMHO it is
Sure, an array is not the best name for the thing. Call it whatever,
maybe a "sparse collection of pointers". But I hope you get the idea.
> most useful if you don't know the size of the thing you want to
> serialize in advance since it has a nice dynamic append.
>
> But if you do know the size, I think it makes more sense just to do a
> preserving vmalloc and write out a linear array..
I think there are two separate parts here. One is the data format and
the other is the data builder.
The format itself is quite simple. It is a linked list of discontiguous
pages that holds a set of pointers. We use that idea already for the
preserved pages bitmap. Mike's vmalloc preservation patches also use the
same idea, just with a small variation.
The builder part (ka_iter in my patches) is an abstraction on top to
build the data structure. I designed it with the nice dynamic append
property since it seemed like a nice and convenient design, but we can
have it define the size statically as well. The underlying data format
won't change.
>
> So, it could be useful, but I wouldn't use it for memfd, the vmalloc
> approach is better and we shouldn't optimize for sparsness which
> should never happen.
I disagree. I think we are re-inventing the same data format with minor
variations. I think we should define extensible fundamental data formats
first, and then use those as the building blocks for the rest of our
serialization logic.
I think KHO array does exactly that. It provides the fundamental
serialization for a collection of pointers, and other serialization use
cases can then build on top of it. For example, the preservation bitmaps
can get rid of their linked list logic and just use KHO array to hold
and retrieve its bitmaps. It will make the serialization simpler.
Similar argument for vmalloc preservation.
I also don't get why you think sparseness "should never happen". For
memfd for example, you say in one of your other emails that "And again
in real systems we expect memfd to be fully populated too." Which
systems and use cases do you have in mind? Why do you think people won't
want a sparse memfd?
And finally, from a data format perspective, the sparseness only adds a
small bit of complexity (the startpos for each kho_array_page).
Everything else is practically the same as a continuous array.
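
To make the format concrete, here is a rough userspace sketch of a
linked list of pages holding values, with a per-page startpos. It is
not the code from the kho-array branch: the demo_ names, the tiny
DEMO_SLOTS capacity, the use of calloc(), and the rule that positions
must be appended in increasing order are all assumptions made for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_SLOTS 4	/* tiny on purpose; a real page holds far more */

/* One list node: a small header followed by consecutive slots. */
struct demo_ka_page {
	struct demo_ka_page *next;
	uint64_t startpos;	/* index of slots[0]; this allows gaps */
	uint32_t count;		/* number of used slots */
	uint64_t slots[DEMO_SLOTS];
};

struct demo_ka {
	struct demo_ka_page *head, *tail;
};

/* Store 'val' at index 'pos'. Indices must be strictly increasing. */
int demo_ka_append(struct demo_ka *ka, uint64_t pos, uint64_t val)
{
	struct demo_ka_page *pg = ka->tail;

	/* New page when there is none, it is full, or 'pos' leaves a gap. */
	if (!pg || pg->count == DEMO_SLOTS ||
	    pos != pg->startpos + pg->count) {
		if (pg && pos < pg->startpos + pg->count)
			return -1;	/* out of order */
		pg = calloc(1, sizeof(*pg));
		if (!pg)
			return -1;
		pg->startpos = pos;
		if (ka->tail)
			ka->tail->next = pg;
		else
			ka->head = pg;
		ka->tail = pg;
	}
	pg->slots[pg->count++] = val;
	return 0;
}

/* Walk the list; returns 0 and fills *val if 'pos' is populated. */
int demo_ka_lookup(const struct demo_ka *ka, uint64_t pos, uint64_t *val)
{
	const struct demo_ka_page *pg;

	for (pg = ka->head; pg; pg = pg->next) {
		if (pos >= pg->startpos && pos - pg->startpos < pg->count) {
			*val = pg->slots[pos - pg->startpos];
			return 0;
		}
	}
	return -1;
}

void demo_ka_free(struct demo_ka *ka)
{
	struct demo_ka_page *pg = ka->head;

	while (pg) {
		struct demo_ka_page *next = pg->next;

		free(pg);
		pg = next;
	}
	ka->head = ka->tail = NULL;
}
```

Without the startpos field and the gap check, the same structure is a
plain contiguous array split across pages, which is the sense in which
sparseness adds only a small amount of complexity.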
All in all, I think KHO array is going to prove useful and will make
serialization for subsystems easier. I think sparseness will also prove
useful but it is not a hill I want to die on. I am fine with starting
with a non-sparse array if people really insist. But I do think we
should go with KHO array as a base instead of re-inventing the linked
list of pages again and again.
>
>> > The versioning should be first class, not hidden away as some emergent
>> > property of registering multiple serializers or something like that.
>>
>> That makes sense. How about some simple changes to the LUO interfaces to
>> make the version more prominent:
>>
>> int (*prepare)(struct liveupdate_file_handler *handler,
>> struct file *file, u64 *data, char **compatible);
>
> Yeah, something more integrated with the ops is better.
>
> You could list the supported versions in the ops itself
>
> const char **supported_deserialize_versions;
>
> And let the luo framework find the right versions.
>
> But for prepare I would expect an inbetween object:
>
> int (*prepare)(struct liveupdate_file_handler *handler,
> struct luo_object *obj, struct file *file);
>
> And then you'd do function calls on 'obj' to store 'data' per version.
What do you mean by "data per version"? I think there should be only one
version of the serialized object. Multiple versions of the same thing
will get ugly real quick.
Other than that, I think this could work well. I am guessing luo_object
stores the version and gives us a way to query it on the other side. I
think if we are letting LUO manage supported versions, it should be
richer than just a list of strings. I think it should include an ops
structure for deserializing each version. That would encapsulate the
versioning more cleanly.
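
One possible shape for such a table, as a sketch only: each supported
version carries its own restore hook, and the framework picks the
entry whose compatible string matches what was serialized. None of
these names exist in LUO; demo_version_ops, demo_file_handler, the
"memfd-v1"/"memfd-v2" strings, and the meaning given to 'data' in each
version are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical per-version entry: compatible string plus restore hook. */
struct demo_version_ops {
	const char *compatible;
	int (*restore)(uint64_t data, uint64_t *nr_pages);
};

/* Hypothetical handler: the list of versions it can deserialize. */
struct demo_file_handler {
	const struct demo_version_ops *versions;
	size_t nr_versions;
};

/* Invented v1 layout: 'data' is a size in bytes. */
int demo_restore_v1(uint64_t data, uint64_t *nr_pages)
{
	*nr_pages = data / DEMO_PAGE_SIZE;
	return 0;
}

/* Invented v2 layout: 'data' is already a page count. */
int demo_restore_v2(uint64_t data, uint64_t *nr_pages)
{
	*nr_pages = data;
	return 0;
}

const struct demo_version_ops demo_memfd_versions[] = {
	{ "memfd-v1", demo_restore_v1 },
	{ "memfd-v2", demo_restore_v2 },
};

const struct demo_file_handler demo_memfd_handler = {
	demo_memfd_versions,
	sizeof(demo_memfd_versions) / sizeof(demo_memfd_versions[0]),
};

/* Framework side: dispatch to the ops matching the stored version. */
int demo_restore(const struct demo_file_handler *h, const char *compatible,
		 uint64_t data, uint64_t *nr_pages)
{
	assert(h && compatible && nr_pages);
	for (size_t i = 0; i < h->nr_versions; i++)
		if (strcmp(h->versions[i].compatible, compatible) == 0)
			return h->versions[i].restore(data, nr_pages);
	return -1;	/* unknown version */
}
```

With this layout only one version is ever written out, while the
reading side can keep hooks for as many older versions as it wants to
support.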
--
Regards,
Pratyush Yadav
* Re: [PATCH v2] uapi/fcntl: define RENAME_* and AT_RENAME_* macros
From: Amir Goldstein @ 2025-09-03 14:14 UTC (permalink / raw)
To: Randy Dunlap
Cc: linux-fsdevel, patches, Jeff Layton, Chuck Lever, Alexander Aring,
Josef Bacik, Aleksa Sarai, Jan Kara, Christian Brauner,
Matthew Wilcox, David Howells, linux-api
In-Reply-To: <a6246609-3ec0-4e38-8733-b2cf3b8fbd9a@infradead.org>
On Wed, Sep 3, 2025 at 2:46 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
>
>
> On 9/2/25 2:31 PM, Randy Dunlap wrote:
> > Hi,
> >
> > On 9/1/25 11:58 PM, Amir Goldstein wrote:
> >> On Tue, Sep 2, 2025 at 1:14 AM Randy Dunlap <rdunlap@infradead.org> wrote:
> >>>
> >>> Define the RENAME_* and AT_RENAME_* macros exactly the same as in
> >>> recent glibc <stdio.h> so that duplicate definition build errors in
> >>> both samples/watch_queue/watch_test.c and samples/vfs/test-statx.c
> >>> no longer happen. When they defined in exactly the same way in
> >>> multiple places, the build errors are prevented.
> >>>
> >>> Defining only the AT_RENAME_* macros is not sufficient since they
> >>> depend on the RENAME_* macros, which may not be defined when the
> >>> AT_RENAME_* macros are used.
> >>>
> >>> Build errors being fixed:
> >>>
> >>> for samples/vfs/test-statx.c:
> >>>
> >>> In file included from ../samples/vfs/test-statx.c:23:
> >>> usr/include/linux/fcntl.h:159:9: warning: ‘AT_RENAME_NOREPLACE’ redefined
> >>> 159 | #define AT_RENAME_NOREPLACE 0x0001
> >>> In file included from ../samples/vfs/test-statx.c:13:
> >>> /usr/include/stdio.h:171:10: note: this is the location of the previous definition
> >>> 171 | # define AT_RENAME_NOREPLACE RENAME_NOREPLACE
> >>> usr/include/linux/fcntl.h:160:9: warning: ‘AT_RENAME_EXCHANGE’ redefined
> >>> 160 | #define AT_RENAME_EXCHANGE 0x0002
> >>> /usr/include/stdio.h:173:10: note: this is the location of the previous definition
> >>> 173 | # define AT_RENAME_EXCHANGE RENAME_EXCHANGE
> >>> usr/include/linux/fcntl.h:161:9: warning: ‘AT_RENAME_WHITEOUT’ redefined
> >>> 161 | #define AT_RENAME_WHITEOUT 0x0004
> >>> /usr/include/stdio.h:175:10: note: this is the location of the previous definition
> >>> 175 | # define AT_RENAME_WHITEOUT RENAME_WHITEOUT
> >>>
> >>> for samples/watch_queue/watch_test.c:
> >>>
> >>> In file included from usr/include/linux/watch_queue.h:6,
> >>> from ../samples/watch_queue/watch_test.c:19:
> >>> usr/include/linux/fcntl.h:159:9: warning: ‘AT_RENAME_NOREPLACE’ redefined
> >>> 159 | #define AT_RENAME_NOREPLACE 0x0001
> >>> In file included from ../samples/watch_queue/watch_test.c:11:
> >>> /usr/include/stdio.h:171:10: note: this is the location of the previous definition
> >>> 171 | # define AT_RENAME_NOREPLACE RENAME_NOREPLACE
> >>> usr/include/linux/fcntl.h:160:9: warning: ‘AT_RENAME_EXCHANGE’ redefined
> >>> 160 | #define AT_RENAME_EXCHANGE 0x0002
> >>> /usr/include/stdio.h:173:10: note: this is the location of the previous definition
> >>> 173 | # define AT_RENAME_EXCHANGE RENAME_EXCHANGE
> >>> usr/include/linux/fcntl.h:161:9: warning: ‘AT_RENAME_WHITEOUT’ redefined
> >>> 161 | #define AT_RENAME_WHITEOUT 0x0004
> >>> /usr/include/stdio.h:175:10: note: this is the location of the previous definition
> >>> 175 | # define AT_RENAME_WHITEOUT RENAME_WHITEOUT
> >>>
> >>> Fixes: b4fef22c2fb9 ("uapi: explain how per-syscall AT_* flags should be allocated")
> >>> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> >>> ---
> >>> Cc: Amir Goldstein <amir73il@gmail.com>
> >>> Cc: Jeff Layton <jlayton@kernel.org>
> >>> Cc: Chuck Lever <chuck.lever@oracle.com>
> >>> Cc: Alexander Aring <alex.aring@gmail.com>
> >>> Cc: Josef Bacik <josef@toxicpanda.com>
> >>> Cc: Aleksa Sarai <cyphar@cyphar.com>
> >>> Cc: Jan Kara <jack@suse.cz>
> >>> Cc: Christian Brauner <brauner@kernel.org>
> >>> Cc: Matthew Wilcox <willy@infradead.org>
> >>> Cc: David Howells <dhowells@redhat.com>
> >>> CC: linux-api@vger.kernel.org
> >>> To: linux-fsdevel@vger.kernel.org
> >>>
> >>> include/uapi/linux/fcntl.h | 9 ++++++---
> >>> 1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> --- linux-next-20250819.orig/include/uapi/linux/fcntl.h
> >>> +++ linux-next-20250819/include/uapi/linux/fcntl.h
> >>> @@ -156,9 +156,12 @@
> >>> */
> >>>
> >>> /* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> >>> -#define AT_RENAME_NOREPLACE 0x0001
> >>> -#define AT_RENAME_EXCHANGE 0x0002
> >>> -#define AT_RENAME_WHITEOUT 0x0004
> >>> +# define RENAME_NOREPLACE (1 << 0)
> >>> +# define AT_RENAME_NOREPLACE RENAME_NOREPLACE
> >>> +# define RENAME_EXCHANGE (1 << 1)
> >>> +# define AT_RENAME_EXCHANGE RENAME_EXCHANGE
> >>> +# define RENAME_WHITEOUT (1 << 2)
> >>> +# define AT_RENAME_WHITEOUT RENAME_WHITEOUT
> >>>
> >>
> >> This solution, apart from being terribly wrong (adjust the source to match
> >> to value of its downstream copy), does not address the issue that Mathew
> >> pointed out on v1 discussion [1]:
> >
> > I didn't forget or ignore this.
> > If the macros have the same values (well, not just values but also the
> > same text), then I don't see why it matters whether they are in some older
> > version of glibc.
> >
> >> $ grep -r AT_RENAME_NOREPLACE /usr/include
> >> /usr/include/linux/fcntl.h:#define AT_RENAME_NOREPLACE 0x0001
> >>
> >> It's not in stdio.h at all. This is with libc6 2.41-10
> >>
> >> [1] https://lore.kernel.org/linux-fsdevel/aKxfGix_o4glz8-Z@casper.infradead.org/
> >>
> >> I don't know how to resolve the mess that glibc has created.
> >
> > Yeah, I guess I don't either.
> >
> >> Perhaps like this:
> >>
> >> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> >> index f291ab4f94ebc..dde14fa3c2007 100644
> >> --- a/include/uapi/linux/fcntl.h
> >> +++ b/include/uapi/linux/fcntl.h
> >> @@ -155,10 +155,16 @@
> >> * as possible, so we can use them for generic bits in the future if necessary.
> >> */
> >>
> >> -/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> >> -#define AT_RENAME_NOREPLACE 0x0001
> >> -#define AT_RENAME_EXCHANGE 0x0002
> >> -#define AT_RENAME_WHITEOUT 0x0004
> >> +/*
> >> + * The legacy renameat2(2) RENAME_* flags are conceptually also
> >> syscall-specific
> >> + * flags, so it could makes sense to create the AT_RENAME_* aliases
> >> for them and
> >> + * maybe later add support for generic AT_* flags to this syscall.
> >> + * However, following a mismatch of definitions in glibc and since no
> >> kernel code
> >> + * currently uses the AT_RENAME_* aliases, we leave them undefined here.
> >> +#define AT_RENAME_NOREPLACE RENAME_NOREPLACE
> >> +#define AT_RENAME_EXCHANGE RENAME_EXCHANGE
> >> +#define AT_RENAME_WHITEOUT RENAME_WHITEOUT
> >> +*/
> >
> > Well, we do have samples/ code that uses fcntl.h (indirectly; maybe
> > that can be fixed).
> > See the build errors in the patch description.
> >
> >
> >> /* Flag for faccessat(2). */
> >> #define AT_EACCESS 0x200 /* Test access permitted for
> >
> > With this patch (your suggestion above):
> >
> > IF a userspace program in samples/ uses <uapi/linux/fcntl.h> without
> > using <stdio.h>, [yes, I created one to test this] and without using
> > <uapi/linux/fs.h> then the build fails with similar build errors:
> >
> > ../samples/watch_queue/watch_nostdio.c: In function ‘consumer’:
> > ../samples/watch_queue/watch_nostdio.c:33:32: error: ‘RENAME_NOREPLACE’ undeclared (first use in this function)
> > 33 | return RENAME_NOREPLACE;
> > ../samples/watch_queue/watch_nostdio.c:33:32: note: each undeclared identifier is reported only once for each function it appears in
> > ../samples/watch_queue/watch_nostdio.c:37:32: error: ‘RENAME_EXCHANGE’ undeclared (first use in this function)
> > 37 | return RENAME_EXCHANGE;
> > ../samples/watch_queue/watch_nostdio.c:41:32: error: ‘RENAME_WHITEOUT’ undeclared (first use in this function)
> > 41 | return RENAME_WHITEOUT;
> >
> > This build succeeds with my version 1 patch (full defining of both
> > RENAME_* and AT_RENAME_* macros). It fails with the patch that you suggested
> > above.
> >
> > OK, here's what I propose.
> >
> > a. remove the unused and (sort of) recently added AT_RENAME_* macros
> > in include/uapi/linux/fcntl.h. Nothing in the kernel tree uses them.
> > This is:
> >
> > commit b4fef22c2fb9
> > Author: Aleksa Sarai <cyphar@cyphar.com>
> > Date: Wed Aug 28 20:19:42 2024 +1000
> > uapi: explain how per-syscall AT_* flags should be allocated
> >
> > These macros should have never been added here IMO.
> > Just putting them somewhere as examples (in comments) would be OK.
> >
I agree.
I did not get this patch from Aleksa,
but I proposed something similar above.
> > This alone fixes all of the build errors in samples/ that I originally
> > reported.
> >
> > b. if a userspace program wants to use the RENAME_* macros, it should
> > #include <linux/fs.h> instead of <linux/fcntl.h>.
> >
> > This fixes the "contrived" build error that I manufactured.
> >
> > Note that some programs in tools/ do use AT_RENAME_* (all 3 macros)
> > but they define those macros locally.
> >
>
> And after more testing, this is what I think works:
>
> a. remove all of the AT_RENAME-* macros from <uapi/linux/fcntl.h>
> (as above)
ok.
>
> b. put the AT_RENAME_* macros into <uapi/linux/fs.h> like so:
>
> +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> +# define AT_RENAME_NOREPLACE RENAME_NOREPLACE
> +# define AT_RENAME_EXCHANGE RENAME_EXCHANGE
> +# define AT_RENAME_WHITEOUT RENAME_WHITEOUT
>
> so that they match what is in upstream glibc stdio.h, hence not
> causing duplicate definition errors.
Disagree.
We do not need to define them at all.
The *only* reason we defined them in fcntl.h is so the
definition will be together with the rest of the AT_ flags.
Now we change that to a comment, but there is no reason to
define them in fs.h. Why would we need to do that?
Thanks,
Amir.
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Pratyush Yadav @ 2025-09-03 14:17 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, jasonmiu, graf,
changyuanl, dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen,
kanie, ojeda, aliceryhl, masahiroy, akpm, tj, yoann.congal,
mmaurer, roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <aLbYk30V2EEJJtAf@kernel.org>
Hi Mike,
On Tue, Sep 02 2025, Mike Rapoport wrote:
> Hi Pratyush,
>
> On Mon, Sep 01, 2025 at 07:01:38PM +0200, Pratyush Yadav wrote:
>> Hi Mike,
>>
>> On Mon, Sep 01 2025, Mike Rapoport wrote:
>>
>> > On Tue, Aug 26, 2025 at 01:20:19PM -0300, Jason Gunthorpe wrote:
>> >> On Thu, Aug 07, 2025 at 01:44:35AM +0000, Pasha Tatashin wrote:
>> >>
>> >> > + /*
>> >> > + * Most of the space should be taken by preserved folios. So take its
>> >> > + * size, plus a page for other properties.
>> >> > + */
>> >> > + fdt = memfd_luo_create_fdt(PAGE_ALIGN(preserved_size) + PAGE_SIZE);
>> >> > + if (!fdt) {
>> >> > + err = -ENOMEM;
>> >> > + goto err_unpin;
>> >> > + }
>> >>
>> >> This doesn't seem to have any versioning scheme, it really should..
>> >>
>> >> > + err = fdt_property_placeholder(fdt, "folios", preserved_size,
>> >> > + (void **)&preserved_folios);
>> >> > + if (err) {
>> >> > + pr_err("Failed to reserve folios property in FDT: %s\n",
>> >> > + fdt_strerror(err));
>> >> > + err = -ENOMEM;
>> >> > + goto err_free_fdt;
>> >> > + }
>> >>
>> >> Yuk.
>> >>
>> >> This really wants some luo helper
>> >>
>> >> 'luo alloc array'
>> >> 'luo restore array'
>> >> 'luo free array'
>> >
>> > We can just add kho_{preserve,restore}_vmalloc(). I've drafted it here:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=kho/vmalloc/v1
>> >
>> > Will wait for kbuild and then send proper patches.
>>
>> I have been working on something similar, but in a more generic way.
>>
>> I have implemented a sparse KHO-preservable array (called kho_array)
>> with xarray like properties. It can take in 4-byte aligned pointers and
>> supports saving non-pointer values similar to xa_mk_value(). For now it
>> doesn't support multi-index entries, but if needed the data format can
>> be extended to support it as well.
>>
>> The structure is very similar to what you have implemented. It uses a
>> linked list of pages with some metadata at the head of each page.
>>
>> I have used it for memfd preservation, and I think it is quite
>> versatile. For example, your kho_preserve_vmalloc() can be very easily
>> built on top of this kho_array by simply saving each physical page
>> address at consecutive indices in the array.
>
> I've started to work on something similar to your kho_array for memfd case
> and then I thought that since we know the size of the array we can simply
> vmalloc it and preserve vmalloc, and that lead me to implementing
> preservation of vmalloc :)
>
> I like the idea to have kho_array for cases when we don't know the amount
> of data to preserve in advance, but for memfd as it's currently
> implemented I think that allocating and preserving vmalloc is simpler.
>
> As for porting kho_preserve_vmalloc() to kho_array, I also feel that it
> would just make kho_preserve_vmalloc() more complex and I'd rather simplify
> it even more, e.g. with preallocating all the pages that preserve indices
> in advance.
I think there are two parts here. One is the data format of the KHO
array and the other is the way to build it. I think the format is quite
simple and versatile, and we can have many strategies of building it.
For example, if you are only concerned with pre-allocating data, I can
very well add a way to initialize the KHO array with a fixed size
up front.
Beyond that, I think KHO array will actually make kho_preserve_vmalloc()
simpler since it won't have to deal with the linked list traversal
logic. It can just do ka_for_each() and just get all the pages. We can
also convert the preservation bitmaps to use it so the linked list logic
is in one place, and others just build on top of it.
>
>> The code is still WIP and currently a bit hacky, but I will clean it up
>> in a couple days and I think it should be ready for posting. You can
>> find the current version at [0][1]. Would be good to hear your thoughts,
>> and if you agree with the approach, I can also port
>> kho_preserve_vmalloc() to work on top of kho_array as well.
>>
>> [0] https://git.kernel.org/pub/scm/linux/kernel/git/pratyush/linux.git/commit/?h=kho-array&id=cf4c04c1e9ac854e3297018ad6dada17c54a59af
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/pratyush/linux.git/commit/?h=kho-array&id=5eb0d7316274a9c87acaeedd86941979fc4baf96
>>
>> --
>> Regards,
>> Pratyush Yadav
--
Regards,
Pratyush Yadav
* Re: [PATCH v4] linux: Add openat2 (BZ 31664)
From: enh @ 2025-09-03 14:33 UTC (permalink / raw)
To: Arjun Shankar
Cc: Paul Eggert, Aleksa Sarai, Adhemerval Zanella Netto, libc-alpha,
linux-api
In-Reply-To: <CAG_osabF4nynNNFc=CP_ZFqZ_iJr47VXTJpsN75CzX+Pi+CgEQ@mail.gmail.com>
On Wed, Sep 3, 2025 at 8:10 AM Arjun Shankar <arjun@redhat.com> wrote:
>
> Hello!
>
> > > Earlier on in this thread, Aleksa mentioned sched_setattr as
> > > establishing precedent for the kernel modifying non-const objects. It
> > > looks like glibc actually does provide a sched_setattr wrapper since
> > > 2.41. The relevant argument hasn't been marked as const and the kernel
> > > does modify the contents, and glibc's syscall wrapper simply passes it
> > > through. So we already do this.
> >
> > given that
> >
> > SYSCALL_DEFINE3(sched_setattr, pid_t, pid, struct sched_attr __user *, uattr,
> > unsigned int, flags)
> >
> > calls sched_setattr(), which is defined thus:
> >
> > int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
> > {
> > return __sched_setscheduler(p, attr, true, true);
> > }
> >
> > i think that's just a copy & paste mistake in the kernel -- carefully
> > preserved in glibc and bionic -- no?
> >
> > (i only see the kernel updating its own _copy_ of the passed-in struct.)
>
> Based on my understanding, it all happens before the call to the const
> marked sched_setattr. Starting from line 986 (as of today) on the same
> syscalls.c file [1]:
>
> retval = sched_copy_attr(uattr, &attr);
> if (retval)
> return retval;
>
> Which inside sched_copy_attr does:
>
> ret = copy_struct_from_user(attr, sizeof(*attr), uattr, size);
> if (ret) {
> if (ret == -E2BIG)
> goto err_size;
>
> And that leads to:
>
> err_size:
> put_user(sizeof(*attr), &uattr->size);
> return -E2BIG;
oh, wow. it didn't even occur to me to look inside a function called
sched_copy_attr(), though the fact that it wasn't a direct call to
copy_struct_from_user() should perhaps have been a clue :-(
> Which writes to userspace.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/syscalls.c?id=e6b9dce0aeeb91dfc0974ab87f02454e24566182#n986
>
> --
> Arjun Shankar
> he/him/his
>
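
The write-back described in the quoted text can be sketched in
userspace. This is a simplified model, not the kernel code:
struct demo_attr, struct demo_attr_v2 and demo_copy_attr() are
hypothetical, and plain memcpy() stands in for the user-copy helpers.
It only shows why the argument cannot be const: when the caller's
struct is larger than the callee understands and the extra bytes are
non-zero, the callee stores the size it supports back into the caller's
struct before failing with -E2BIG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct the callee knows about; not the real sched_attr. */
struct demo_attr {
	uint32_t size;		/* caller sets this to its own struct size */
	uint32_t policy;
};

/* Hypothetical larger layout used by a newer caller. */
struct demo_attr_v2 {
	uint32_t size;
	uint32_t policy;
	uint32_t extra;		/* field the callee does not understand */
};

/*
 * Copy usize bytes of caller data into *dst. Trailing bytes beyond
 * sizeof(*dst) are accepted only if they are all zero; otherwise the
 * supported size is written back through src and -E2BIG is returned.
 */
static int demo_copy_attr(struct demo_attr *dst, void *src, size_t usize)
{
	const unsigned char *bytes = src;
	const size_t ksize = sizeof(*dst);
	size_t i;

	for (i = ksize; i < usize; i++) {
		if (bytes[i]) {
			/* The write to the caller's memory happens here. */
			uint32_t supported = (uint32_t)ksize;

			memcpy(src, &supported, sizeof(supported));
			return -E2BIG;
		}
	}

	memset(dst, 0, ksize);
	memcpy(dst, src, usize < ksize ? usize : ksize);
	return 0;
}
```

After a failed call the caller can read the size field, shrink its
request, and retry.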
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Jason Gunthorpe @ 2025-09-03 15:01 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Pasha Tatashin, jasonmiu, graf, changyuanl, rppt, dmatlack,
rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <mafs0v7lzvd7m.fsf@kernel.org>
On Wed, Sep 03, 2025 at 04:10:37PM +0200, Pratyush Yadav wrote:
> > So, it could be useful, but I wouldn't use it for memfd, the vmalloc
> > approach is better and we shouldn't optimize for sparseness which
> > should never happen.
>
> I disagree. I think we are re-inventing the same data format with minor
> variations. I think we should define extensible fundamental data formats
> first, and then use those as the building blocks for the rest of our
> serialization logic.
page, vmalloc, slab seem to me to be the fundamental units of memory
management in linux, so they should get KHO support.
If you want to preserve a known-sized array you use vmalloc and then
write out the per-list items. If it is a dictionary/sparse array then
you write an index with each item too. This is all trivial and doesn't
really need more abstraction in and of itself, IMHO.
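
A minimal sketch of the two open-coded cases, assuming 64-bit items and
assuming that a zero item means "empty slot" (both are illustrative
choices, as are the names struct indexed_item, serialize_dense() and
serialize_sparse()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sparse case: each stored item carries its own index. */
struct indexed_item {
	uint64_t index;
	uint64_t value;
};

/* Known-sized array: position in out[] is the index, nothing else needed. */
static size_t serialize_dense(const uint64_t *src, size_t n, uint64_t *out)
{
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = src[i];
	return n;
}

/* Sparse array: skip empty (zero) slots and record the index per item. */
static size_t serialize_sparse(const uint64_t *src, size_t n,
			       struct indexed_item *out)
{
	size_t i, used = 0;

	for (i = 0; i < n; i++) {
		if (!src[i])
			continue;
		out[used].index = i;
		out[used].value = src[i];
		used++;
	}
	return used;
}

/* Rebuild the sparse array; entries with an out-of-range index are ignored. */
static void deserialize_sparse(const struct indexed_item *in, size_t used,
			       uint64_t *dst, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = 0;
	for (i = 0; i < used; i++)
		if (in[i].index < n)
			dst[in[i].index] = in[i].value;
}
```

The sparse form costs one extra word per item, so it only pays off when
most slots are empty.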
> cases can then build on top of it. For example, the preservation bitmaps
> can get rid of their linked list logic and just use KHO array to hold
> and retrieve its bitmaps. It will make the serialization simpler.
I don't think the bitmaps should, the serialization here is very
special because it is not actually preserved, it just exists for the
time while the new kernel runs in scratch and is insta freed once the
allocators start up.
> I also don't get why you think sparseness "should never happen". For
> memfd for example, you say in one of your other emails that "And again
> in real systems we expect memfd to be fully populated too." Which
> systems and use cases do you have in mind? Why do you think people won't
> want a sparse memfd?
memfd should principally be used to back VM memory, and I expect VM
memory to be fully populated. Why would it be sparse?
> All in all, I think KHO array is going to prove useful and will make
> serialization for subsystems easier. I think sparseness will also prove
> useful but it is not a hill I want to die on. I am fine with starting
> with a non-sparse array if people really insist. But I do think we
> should go with KHO array as a base instead of re-inventing the linked
> list of pages again and again.
The two main advantages I see to the kho array design vs vmalloc are
that it should be a bit faster as it doesn't establish a vmap, and it
handles unknown size lists much better.
Are these important considerations? IDK.
As I said to Chris, I think we should see more examples of what we
actually need before assuming any certain datastructure is the best
choice.
So I'd stick to simpler open coded things and go back and improve them
than start out building the wrong shared data structure.
How about having at least three luo clients that show meaningful benefit
before proposing something beyond the fundamental page, vmalloc, slab
things?
> What do you mean by "data per version"? I think there should be only one
> version of the serialized object. Multiple versions of the same thing
> will get ugly real quick.
If you want to support backwards/forwards compatibility then you
probably should support multiple versions as well. Otherwise it
could become quite hard to make downgrades..
Ideally I'd want to remove the upstream code for obsolete versions
fairly quickly so I'd imagine kernels will want to generate both
versions during the transition period and then eventually newer
kernels will only accept the new version.
I've argued before that the extended matrix of any kernel version to
any other kernel version should lie with the distro/CSP making the
kernel fork. They know what their upgrade sequence will be so they can
manage any missing versions to make it work.
Upstream should do like v6.1 to v6.2 only or something similarly well
constrained. I think this is a reasonable trade off to get subsystem
maintainers to even accept this stuff at all.
> Other than that, I think this could work well. I am guessing luo_object
> stores the version and gives us a way to query it on the other side. I
> think if we are letting LUO manage supported versions, it should be
> richer than just a list of strings. I think it should include a ops
> structure for deserializing each version. That would encapsulate the
> versioning more cleanly.
Yeah, sounds about right
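
As a rough illustration of the "ops structure per version" idea, and
not a proposal for the actual LUO interface, a version table could look
like the sketch below. Everything in it is hypothetical: the
"demo-v1"/"demo-v2" strings, struct demo_version_ops, and the int-array
blob format are stand-ins chosen to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory state rebuilt from a serialized blob. */
struct demo_state {
	int pages;
	int flags;
};

/* One entry per supported serialization version. */
struct demo_version_ops {
	const char *compatible;
	int (*restore)(const int *blob, size_t len, struct demo_state *out);
};

/* v1 carried only the page count. */
static int restore_v1(const int *blob, size_t len, struct demo_state *out)
{
	if (len < 1)
		return -1;
	out->pages = blob[0];
	out->flags = 0;
	return 0;
}

/* v2 added a flags word after the page count. */
static int restore_v2(const int *blob, size_t len, struct demo_state *out)
{
	if (len < 2)
		return -1;
	out->pages = blob[0];
	out->flags = blob[1];
	return 0;
}

/* During a transition period both versions stay in the table. */
static const struct demo_version_ops demo_versions[] = {
	{ "demo-v2", restore_v2 },
	{ "demo-v1", restore_v1 },
};

/* Dispatch on the version string; unknown versions are refused. */
static int demo_restore(const char *compatible, const int *blob, size_t len,
			struct demo_state *out)
{
	size_t i;

	for (i = 0; i < sizeof(demo_versions) / sizeof(demo_versions[0]); i++)
		if (!strcmp(compatible, demo_versions[i].compatible))
			return demo_versions[i].restore(blob, len, out);
	return -1;
}
```

Dropping an obsolete version then amounts to deleting its table entry
and its restore function.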
Jason
^ permalink raw reply
* Re: [PATCH v20 4/8] fork: Add shadow stack support to clone3()
From: Catalin Marinas @ 2025-09-03 15:34 UTC (permalink / raw)
To: Mark Brown
Cc: Rick P. Edgecombe, Deepak Gupta, Szabolcs Nagy, H.J. Lu,
Florian Weimer, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, Peter Zijlstra, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Christian Brauner, Shuah Khan,
linux-kernel, Will Deacon, jannh, Andrew Morton, Yury Khrustalev,
Wilco Dijkstra, linux-kselftest, linux-api, Kees Cook
In-Reply-To: <734e4c2c-a478-4019-86f7-4965c2b042e1@sirena.org.uk>
On Wed, Sep 03, 2025 at 11:01:05AM +0100, Mark Brown wrote:
> On Tue, Sep 02, 2025 at 10:02:07PM +0100, Catalin Marinas wrote:
> > On Tue, Sep 02, 2025 at 11:21:48AM +0100, Mark Brown wrote:
> > > + mmap_read_lock(mm);
> > > +
> > > + addr = untagged_addr_remote(mm, args->shadow_stack_token);
> > > + page = get_user_page_vma_remote(mm, addr, FOLL_FORCE | FOLL_WRITE,
> > > + &vma);
>
> > However, I wonder whether it makes sense to use the remote mm access
> > here at all. Does this code ever run without CLONE_VM? If not, this is
> > all done within the current mm context.
>
> Yes, userspace can select if it wants CLONE_VM or not so we should
> handle that case. We discussed this on prior versions and we felt that
> while we couldn't immediately see the use case for !CLONE_VM there
> wasn't a good reason to restrict the creativity of userspace developers,
> and given that you can specify the regular stack in these cases it seems
> logical that you'd also be able to specify the shadow stack.
Yeah. Not sure it makes much sense in practice but if we allow a new
stack without CLONE_VM, we should also allow a shadow stack. Thanks for
the clarification.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Pasha Tatashin @ 2025-09-03 15:59 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pratyush Yadav, Mike Rapoport, jasonmiu, graf, changyuanl,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <20250902113857.GB186519@nvidia.com>
> > > > The patch looks okay to me, but it doesn't support holes in vmap
> > > > areas. While that is likely acceptable for vmalloc, it could be a
> > > > problem if we want to preserve memfd with holes and using vmap
> > > > preservation as a method, which would require a different approach.
> > > > Still, this would help with preserving memfd.
> > >
> > > I agree. I think we should do it the other way round. Build a sparse
> > > array first, and then use that to build vmap preservation. Our emails
> >
> > Yes, sparse array support would help both: vmalloc and memfd preservation.
>
> > Why? vmalloc is always fully populated, no sparseness..
vmalloc is always fully populated, but if we add support for
preserving an area with holes, it can also be used for preserving
vmalloc. By the way, I don't like calling it *vmalloc* preservation
because we aren't preserving the original virtual addresses; we are
preserving a list of pages that are reassembled into a virtually
contiguous area. Maybe kho map, or kho page map, not sure, but vmalloc
does not sound right to me.
> And again in real systems we expect memfd to be fully populated too.
I thought so too, but we already have a use case for slightly sparse
memfd, unfortunately, that becomes *very* inefficient when fully
populated.
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Jason Gunthorpe @ 2025-09-03 16:40 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Pratyush Yadav, Mike Rapoport, jasonmiu, graf, changyuanl,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <CA+CK2bB-CaEdvzxt9=c1SZwXBfy-nE202Q2mfHL_2K7spjf8rw@mail.gmail.com>
On Wed, Sep 03, 2025 at 03:59:40PM +0000, Pasha Tatashin wrote:
> vmalloc is always fully populated, but if we add support for
> preserving an area with holes, it can also be used for preserving
> vmalloc.
Why? If you can't create it with vmap what is the point?
> By the way, I don't like calling it *vmalloc* preservation
> because we aren't preserving the original virtual addresses; we are
> preserving a list of pages that are reassembled into a virtually
> contiguous area. Maybe kho map, or kho page map, not sure, but vmalloc
> does not sound right to me.
No preservation retains the virtual address, that is pretty much
universal.
It is vmalloc preservation because the flow is
x = vmalloc()
kho_preserve_vmalloc(x, &preserved)
[..]
x = kho_restore_vmalloc(preserved)
vfree(x)
It is the same naming as folio preservation. Upon restore you get a
vmalloc() back.
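
The same flow as a userspace analogy, only to make the "new address,
same contents" property concrete. None of these names are kernel APIs:
demo_preserve(), demo_restore() and struct demo_preserved are
hypothetical, malloc()/free() stand in for vmalloc()/vfree(), and the
per-page copies stand in for pages that would really survive in place
across kexec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE 4096

/* What gets handed over: a page list, not a virtual address. */
struct demo_preserved {
	size_t nr_pages;
	void **pages;
};

/* Drop all recorded pages and reset the descriptor. */
static void demo_release(struct demo_preserved *p)
{
	size_t i;

	for (i = 0; i < p->nr_pages; i++)
		free(p->pages[i]);
	free(p->pages);
	p->pages = NULL;
	p->nr_pages = 0;
}

/* Record the buffer page by page; the address of buf is not kept. */
static int demo_preserve(const void *buf, size_t nr_pages,
			 struct demo_preserved *p)
{
	size_t i;

	p->nr_pages = 0;
	p->pages = calloc(nr_pages, sizeof(*p->pages));
	if (!p->pages)
		return -1;

	for (i = 0; i < nr_pages; i++) {
		p->pages[i] = malloc(DEMO_PAGE);
		if (!p->pages[i]) {
			demo_release(p);
			return -1;
		}
		memcpy(p->pages[i], (const char *)buf + i * DEMO_PAGE,
		       DEMO_PAGE);
		p->nr_pages++;
	}
	return 0;
}

/* Reassemble the pages into one contiguous buffer at a new address. */
static void *demo_restore(struct demo_preserved *p)
{
	char *buf = malloc(p->nr_pages * DEMO_PAGE);
	size_t i;

	if (!buf)
		return NULL;
	for (i = 0; i < p->nr_pages; i++)
		memcpy(buf + i * DEMO_PAGE, p->pages[i], DEMO_PAGE);
	demo_release(p);
	return buf;
}
```

The caller frees the original buffer after preserving it and later gets
back an ordinary allocation that it can use and free as usual.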
> > And again in real systems we expect memfd to be fully populated too.
>
> I thought so too, but we already have a use case for slightly sparse
> memfd, unfortunately, that becomes *very* inefficient when fully
> populated.
Really? Why not use multiple memfds :(
So maybe you need to do optimized sparseness in memfd :(
Jason
^ permalink raw reply
* Re: [PATCH v2] uapi/fcntl: define RENAME_* and AT_RENAME_* macros
From: Randy Dunlap @ 2025-09-03 18:10 UTC (permalink / raw)
To: Amir Goldstein
Cc: linux-fsdevel, patches, Jeff Layton, Chuck Lever, Alexander Aring,
Josef Bacik, Aleksa Sarai, Jan Kara, Christian Brauner,
Matthew Wilcox, David Howells, linux-api
In-Reply-To: <CAOQ4uxhN2kPLguMN+VR8qu4AzBzLziFADqJg_dvOOO_gw=GpTw@mail.gmail.com>
On 9/3/25 7:14 AM, Amir Goldstein wrote:
> On Wed, Sep 3, 2025 at 2:46 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>>
>>
>>
>> On 9/2/25 2:31 PM, Randy Dunlap wrote:
>>> Hi,
>>>
>>> On 9/1/25 11:58 PM, Amir Goldstein wrote:
>>>> On Tue, Sep 2, 2025 at 1:14 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>>>>>
>>>>> Define the RENAME_* and AT_RENAME_* macros exactly the same as in
>>>>> recent glibc <stdio.h> so that duplicate definition build errors in
>>>>> both samples/watch_queue/watch_test.c and samples/vfs/test-statx.c
>>>>> no longer happen. When they defined in exactly the same way in
>>>>> multiple places, the build errors are prevented.
>>>>>
>>>>> Defining only the AT_RENAME_* macros is not sufficient since they
>>>>> depend on the RENAME_* macros, which may not be defined when the
>>>>> AT_RENAME_* macros are used.
>>>>>
>>>>> Build errors being fixed:
>>>>>
>>>>> for samples/vfs/test-statx.c:
>>>>>
>>>>> In file included from ../samples/vfs/test-statx.c:23:
>>>>> usr/include/linux/fcntl.h:159:9: warning: ‘AT_RENAME_NOREPLACE’ redefined
>>>>> 159 | #define AT_RENAME_NOREPLACE 0x0001
>>>>> In file included from ../samples/vfs/test-statx.c:13:
>>>>> /usr/include/stdio.h:171:10: note: this is the location of the previous definition
>>>>> 171 | # define AT_RENAME_NOREPLACE RENAME_NOREPLACE
>>>>> usr/include/linux/fcntl.h:160:9: warning: ‘AT_RENAME_EXCHANGE’ redefined
>>>>> 160 | #define AT_RENAME_EXCHANGE 0x0002
>>>>> /usr/include/stdio.h:173:10: note: this is the location of the previous definition
>>>>> 173 | # define AT_RENAME_EXCHANGE RENAME_EXCHANGE
>>>>> usr/include/linux/fcntl.h:161:9: warning: ‘AT_RENAME_WHITEOUT’ redefined
>>>>> 161 | #define AT_RENAME_WHITEOUT 0x0004
>>>>> /usr/include/stdio.h:175:10: note: this is the location of the previous definition
>>>>> 175 | # define AT_RENAME_WHITEOUT RENAME_WHITEOUT
>>>>>
>>>>> for samples/watch_queue/watch_test.c:
>>>>>
>>>>> In file included from usr/include/linux/watch_queue.h:6,
>>>>> from ../samples/watch_queue/watch_test.c:19:
>>>>> usr/include/linux/fcntl.h:159:9: warning: ‘AT_RENAME_NOREPLACE’ redefined
>>>>> 159 | #define AT_RENAME_NOREPLACE 0x0001
>>>>> In file included from ../samples/watch_queue/watch_test.c:11:
>>>>> /usr/include/stdio.h:171:10: note: this is the location of the previous definition
>>>>> 171 | # define AT_RENAME_NOREPLACE RENAME_NOREPLACE
>>>>> usr/include/linux/fcntl.h:160:9: warning: ‘AT_RENAME_EXCHANGE’ redefined
>>>>> 160 | #define AT_RENAME_EXCHANGE 0x0002
>>>>> /usr/include/stdio.h:173:10: note: this is the location of the previous definition
>>>>> 173 | # define AT_RENAME_EXCHANGE RENAME_EXCHANGE
>>>>> usr/include/linux/fcntl.h:161:9: warning: ‘AT_RENAME_WHITEOUT’ redefined
>>>>> 161 | #define AT_RENAME_WHITEOUT 0x0004
>>>>> /usr/include/stdio.h:175:10: note: this is the location of the previous definition
>>>>> 175 | # define AT_RENAME_WHITEOUT RENAME_WHITEOUT
>>>>>
>>>>> Fixes: b4fef22c2fb9 ("uapi: explain how per-syscall AT_* flags should be allocated")
>>>>> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
>>>>> ---
>>>>> Cc: Amir Goldstein <amir73il@gmail.com>
>>>>> Cc: Jeff Layton <jlayton@kernel.org>
>>>>> Cc: Chuck Lever <chuck.lever@oracle.com>
>>>>> Cc: Alexander Aring <alex.aring@gmail.com>
>>>>> Cc: Josef Bacik <josef@toxicpanda.com>
>>>>> Cc: Aleksa Sarai <cyphar@cyphar.com>
>>>>> Cc: Jan Kara <jack@suse.cz>
>>>>> Cc: Christian Brauner <brauner@kernel.org>
>>>>> Cc: Matthew Wilcox <willy@infradead.org>
>>>>> Cc: David Howells <dhowells@redhat.com>
>>>>> CC: linux-api@vger.kernel.org
>>>>> To: linux-fsdevel@vger.kernel.org
>>>>>
>>>>> include/uapi/linux/fcntl.h | 9 ++++++---
>>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>
>>>>> --- linux-next-20250819.orig/include/uapi/linux/fcntl.h
>>>>> +++ linux-next-20250819/include/uapi/linux/fcntl.h
>>>>> @@ -156,9 +156,12 @@
>>>>> */
>>>>>
>>>>> /* Flags for renameat2(2) (must match legacy RENAME_* flags). */
>>>>> -#define AT_RENAME_NOREPLACE 0x0001
>>>>> -#define AT_RENAME_EXCHANGE 0x0002
>>>>> -#define AT_RENAME_WHITEOUT 0x0004
>>>>> +# define RENAME_NOREPLACE (1 << 0)
>>>>> +# define AT_RENAME_NOREPLACE RENAME_NOREPLACE
>>>>> +# define RENAME_EXCHANGE (1 << 1)
>>>>> +# define AT_RENAME_EXCHANGE RENAME_EXCHANGE
>>>>> +# define RENAME_WHITEOUT (1 << 2)
>>>>> +# define AT_RENAME_WHITEOUT RENAME_WHITEOUT
>>>>>
>>>>
>>>> This solution, apart from being terribly wrong (adjust the source to match
>>>> to value of its downstream copy), does not address the issue that Mathew
>>>> pointed out on v1 discussion [1]:
>>>
>>> I didn't forget or ignore this.
>>> If the macros have the same values (well, not just values but also the
>>> same text), then I don't see why it matters whether they are in some older
>>> version of glibc.
>>>
>>>> $ grep -r AT_RENAME_NOREPLACE /usr/include
>>>> /usr/include/linux/fcntl.h:#define AT_RENAME_NOREPLACE 0x0001
>>>>
>>>> It's not in stdio.h at all. This is with libc6 2.41-10
>>>>
>>>> [1] https://lore.kernel.org/linux-fsdevel/aKxfGix_o4glz8-Z@casper.infradead.org/
>>>>
>>>> I don't know how to resolve the mess that glibc has created.
>>>
>>> Yeah, I guess I don't either.
>>>
>>>> Perhaps like this:
>>>>
>>>> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
>>>> index f291ab4f94ebc..dde14fa3c2007 100644
>>>> --- a/include/uapi/linux/fcntl.h
>>>> +++ b/include/uapi/linux/fcntl.h
>>>> @@ -155,10 +155,16 @@
>>>> * as possible, so we can use them for generic bits in the future if necessary.
>>>> */
>>>>
>>>> -/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
>>>> -#define AT_RENAME_NOREPLACE 0x0001
>>>> -#define AT_RENAME_EXCHANGE 0x0002
>>>> -#define AT_RENAME_WHITEOUT 0x0004
>>>> +/*
>>>> + * The legacy renameat2(2) RENAME_* flags are conceptually also
>>>> syscall-specific
>>>> + * flags, so it could makes sense to create the AT_RENAME_* aliases
>>>> for them and
>>>> + * maybe later add support for generic AT_* flags to this syscall.
>>>> + * However, following a mismatch of definitions in glibc and since no
>>>> kernel code
>>>> + * currently uses the AT_RENAME_* aliases, we leave them undefined here.
>>>> +#define AT_RENAME_NOREPLACE RENAME_NOREPLACE
>>>> +#define AT_RENAME_EXCHANGE RENAME_EXCHANGE
>>>> +#define AT_RENAME_WHITEOUT RENAME_WHITEOUT
>>>> +*/
>>>
>>> Well, we do have samples/ code that uses fcntl.h (indirectly; maybe
>>> that can be fixed).
>>> See the build errors in the patch description.
>>>
>>>
>>>> /* Flag for faccessat(2). */
>>>> #define AT_EACCESS 0x200 /* Test access permitted for
>>>
>>> With this patch (your suggestion above):
>>>
>>> IF a userspace program in samples/ uses <uapi/linux/fcntl.h> without
>>> using <stdio.h>, [yes, I created one to test this] and without using
>>> <uapi/linux/fs.h> then the build fails with similar build errors:
>>>
>>> ../samples/watch_queue/watch_nostdio.c: In function ‘consumer’:
>>> ../samples/watch_queue/watch_nostdio.c:33:32: error: ‘RENAME_NOREPLACE’ undeclared (first use in this function)
>>> 33 | return RENAME_NOREPLACE;
>>> ../samples/watch_queue/watch_nostdio.c:33:32: note: each undeclared identifier is reported only once for each function it appears in
>>> ../samples/watch_queue/watch_nostdio.c:37:32: error: ‘RENAME_EXCHANGE’ undeclared (first use in this function)
>>> 37 | return RENAME_EXCHANGE;
>>> ../samples/watch_queue/watch_nostdio.c:41:32: error: ‘RENAME_WHITEOUT’ undeclared (first use in this function)
>>> 41 | return RENAME_WHITEOUT;
>>>
>>> This build succeeds with my version 1 patch (full defining of both
>>> RENAME_* and AT_RENAME_* macros). It fails with the patch that you suggested
>>> above.
>>>
>>> OK, here's what I propose.
>>>
>>> a. remove the unused and (sort of) recently added AT_RENAME_* macros
>>> in include/uapi/linux/fcntl.h. Nothing in the kernel tree uses them.
>>> This is:
>>>
>>> commit b4fef22c2fb9
>>> Author: Aleksa Sarai <cyphar@cyphar.com>
>>> Date: Wed Aug 28 20:19:42 2024 +1000
>>> uapi: explain how per-syscall AT_* flags should be allocated
>>>
>>> These macros should have never been added here IMO.
>>> Just putting them somewhere as examples (in comments) would be OK.
>>>
>
> I agree.
> I did not get this patch from Aleksa,
> but I proposed something similar above.
>
>>> This alone fixes all of the build errors in samples/ that I originally
>>> reported.
>>>
>>> b. if a userspace program wants to use the RENAME_* macros, it should
>>> #include <linux/fs.h> instead of <linux/fcntl.h>.
>>>
>>> This fixes the "contrived" build error that I manufactured.
>>>
>>> Note that some programs in tools/ do use AT_RENAME_* (all 3 macros)
>>> but they define those macros locally.
>>>
>>
>> And after more testing, this is what I think works:
>>
>> a. remove all of the AT_RENAME_* macros from <uapi/linux/fcntl.h>
>> (as above)
>
> ok.
>
>>
>> b. put the AT_RENAME_* macros into <uapi/linux/fs.h> like so:
>>
>> +/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
>> +# define AT_RENAME_NOREPLACE RENAME_NOREPLACE
>> +# define AT_RENAME_EXCHANGE RENAME_EXCHANGE
>> +# define AT_RENAME_WHITEOUT RENAME_WHITEOUT
>>
>> so that they match what is in upstream glibc stdio.h, hence not
>> causing duplicate definition errors.
>
> Disagree.
> We do not need to define them at all.
>
> The *only* reason we defined them in fcntl.h is so the
> definition will be together with the rest of the AT_ flags.
> Now we change that to a comment, but there is no reason to
> define them at fs.h. Why would we need to do that?
OK, that works. I'll make a v3 like that.
Thanks.
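
For reference, the preprocessor rule behind the warnings discussed
above: C permits redefining an object-like macro only when the
replacement text is token-for-token identical, so two headers that
agree on the value but spell it differently still produce a
"redefined" diagnostic. A small sketch, using hypothetical DEMO_*
names so that it cannot collide with whatever the system headers
define:

```c
#include <assert.h>

/* First definition, written in terms of another macro. */
#define DEMO_RENAME_NOREPLACE (1 << 0)
#define DEMO_AT_RENAME_NOREPLACE DEMO_RENAME_NOREPLACE

/* Token-identical redefinition: allowed, no diagnostic. */
#define DEMO_AT_RENAME_NOREPLACE DEMO_RENAME_NOREPLACE

/*
 * A redefinition with different tokens, for example
 *     #define DEMO_AT_RENAME_NOREPLACE 0x0001
 * would be diagnosed as a redefinition even though the value is equal.
 */

/* The expanded value is still the same as the literal spelling. */
_Static_assert(DEMO_AT_RENAME_NOREPLACE == 0x0001,
	       "both spellings expand to the same value");

/* Hypothetical helper so the value can be checked at run time as well. */
static int demo_at_rename_noreplace(void)
{
	return DEMO_AT_RENAME_NOREPLACE;
}
```

This is why matching the other header's macro text exactly, or not
defining the macros at all, both make the warnings go away.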
--
~Randy
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Mike Rapoport @ 2025-09-03 19:29 UTC (permalink / raw)
To: Pasha Tatashin
Cc: Jason Gunthorpe, Pratyush Yadav, jasonmiu, graf, changyuanl,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <CA+CK2bB-CaEdvzxt9=c1SZwXBfy-nE202Q2mfHL_2K7spjf8rw@mail.gmail.com>
On Wed, Sep 03, 2025 at 03:59:40PM +0000, Pasha Tatashin wrote:
> >
> > And again in real systems we expect memfd to be fully populated too.
>
> I thought so too, but we already have a use case for slightly sparse
> memfd, unfortunately, that becomes *very* inefficient when fully
> populated.
Wait, regardless of how sparse memfd is, once you memfd_pin_folios() the
number of folios to preserve is known and the metadata to preserve is a
fully populated array.
--
Sincerely yours,
Mike.
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Mike Rapoport @ 2025-09-03 19:39 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Jason Gunthorpe, Pasha Tatashin, jasonmiu, graf, changyuanl,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <mafs0qzwnvcwk.fsf@kernel.org>
Hi Pratyush,
On Wed, Sep 03, 2025 at 04:17:15PM +0200, Pratyush Yadav wrote:
> On Tue, Sep 02 2025, Mike Rapoport wrote:
> >
> > As for porting kho_preserve_vmalloc() to kho_array, I also feel that it
> > would just make kho_preserve_vmalloc() more complex and I'd rather simplify
> > it even more, e.g. with preallocating all the pages that preserve indices
> > in advance.
>
> I think there are two parts here. One is the data format of the KHO
> array and the other is the way to build it. I think the format is quite
> simple and versatile, and we can have many strategies of building it.
>
> For example, if you are only concerned with pre-allocating data, I can
> very well add a way to initialize the KHO array with a fixed size
> up front.
I wasn't concerned with preallocation vs allocating a page at a time, I
thought with preallocation the vmalloc code would become even simpler, but
it's not :)
> Beyond that, I think KHO array will actually make kho_preserve_vmalloc()
> simpler since it won't have to deal with the linked list traversal
> logic. It can just do ka_for_each() and just get all the pages.
>
> We can also convert the preservation bitmaps to use it so the linked list
> logic is in one place, and others just build on top of it.
I disagree. The boilerplate to initialize and iterate the kho_array will
not make either vmalloc or bitmap preservation simpler IMO.
And for bitmaps Pasha and Jason M. are anyway working on a different data
structure already, so if their proposal moves forward converting bitmap
preservation to anything would be a wasted effort.
> --
> Regards,
> Pratyush Yadav
--
Sincerely yours,
Mike.
^ permalink raw reply
* [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Randy Dunlap @ 2025-09-04 6:22 UTC (permalink / raw)
To: linux-fsdevel
Cc: patches, Randy Dunlap, Amir Goldstein, Jeff Layton, Chuck Lever,
Alexander Aring, Josef Bacik, Aleksa Sarai, Jan Kara,
Christian Brauner, Matthew Wilcox, David Howells, linux-api
Don't define the AT_RENAME_* macros at all since the kernel does not
use them nor does the kernel need to provide them for userspace.
Leave them as comments in <uapi/linux/fcntl.h> only as an example.
The AT_RENAME_* macros have recently been added to glibc's <stdio.h>.
For a kernel allmodconfig build, this made the macros be defined
differently in 2 places (same values but different macro text),
causing build errors/warnings (duplicate definitions) in both
samples/watch_queue/watch_test.c and samples/vfs/test-statx.c.
(<linux/fcntl.h> is included indirectly in both programs above.)
Fixes: b4fef22c2fb9 ("uapi: explain how per-syscall AT_* flags should be allocated")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Howells <dhowells@redhat.com>
CC: linux-api@vger.kernel.org
To: linux-fsdevel@vger.kernel.org
---
include/uapi/linux/fcntl.h | 6 ++++++
1 file changed, 6 insertions(+)
--- linux-next-20250819.orig/include/uapi/linux/fcntl.h
+++ linux-next-20250819/include/uapi/linux/fcntl.h
@@ -155,10 +155,16 @@
* as possible, so we can use them for generic bits in the future if necessary.
*/
+/*
+ * Note: This is an example of how the AT_RENAME_* flags could be defined,
+ * but the kernel has no need to define them, so leave them as comments.
+ */
/* Flags for renameat2(2) (must match legacy RENAME_* flags). */
+/*
#define AT_RENAME_NOREPLACE 0x0001
#define AT_RENAME_EXCHANGE 0x0002
#define AT_RENAME_WHITEOUT 0x0004
+*/
/* Flag for faccessat(2). */
#define AT_EACCESS 0x200 /* Test access permitted for
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Pratyush Yadav @ 2025-09-04 12:39 UTC (permalink / raw)
To: Mike Rapoport
Cc: Pratyush Yadav, Jason Gunthorpe, Pasha Tatashin, jasonmiu, graf,
changyuanl, dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen,
kanie, ojeda, aliceryhl, masahiroy, akpm, tj, yoann.congal,
mmaurer, roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <aLiZbb_F5R2x9-y2@kernel.org>
Hi Mike,
On Wed, Sep 03 2025, Mike Rapoport wrote:
> On Wed, Sep 03, 2025 at 04:17:15PM +0200, Pratyush Yadav wrote:
>> On Tue, Sep 02 2025, Mike Rapoport wrote:
>> >
>> > As for porting kho_preserve_vmalloc() to kho_array, I also feel that it
>> > would just make kho_preserve_vmalloc() more complex and I'd rather simplify
>> > it even more, e.g. with preallocating all the pages that preserve indices
>> > in advance.
[...]
>
>> Beyond that, I think KHO array will actually make kho_preserve_vmalloc()
>> simpler since it won't have to deal with the linked list traversal
>> logic. It can just do ka_for_each() and just get all the pages.
>>
>> We can also convert the preservation bitmaps to use it so the linked list
>> logic is in one place, and others just build on top of it.
>
> I disagree. The boilerplate to initialize and iterate the kho_array will
> not make either vmalloc or bitmap preservation simpler IMO.
I have done 80% of the work on this already, so let's do this: I will do
the rest of the 20% and publish the patches. Then you and Jason can have
a look and if you still think it's not worth it, I am fine shelving it
for now and revisiting later when there might be a stronger case.
>
> And for bitmaps Pasha and Jason M. are anyway working on a different data
> structure already, so if their proposal moves forward converting bitmap
> preservation to anything would be a wasted effort.
--
Regards,
Pratyush Yadav
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Pratyush Yadav @ 2025-09-04 12:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Pratyush Yadav, Pasha Tatashin, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <20250903150157.GH470103@nvidia.com>
Hi Jason,
On Wed, Sep 03 2025, Jason Gunthorpe wrote:
> On Wed, Sep 03, 2025 at 04:10:37PM +0200, Pratyush Yadav wrote:
>
>> > So, it could be useful, but I wouldn't use it for memfd, the vmalloc
>> > approach is better and we shouldn't optimize for sparseness which
>> > should never happen.
>>
>> I disagree. I think we are re-inventing the same data format with minor
>> variations. I think we should define extensible fundamental data formats
>> first, and then use those as the building blocks for the rest of our
>> serialization logic.
>
> page, vmalloc, slab seem to me to be the fundamental units of memory
> management in linux, so they should get KHO support.
>
> If you want to preserve a known-sized array you use vmalloc and then
> write out the per-list items. If it is a dictionary/sparse array then
> you write an index with each item too. This is all trivial and doesn't
> really need more abstraction in and of itself, IMHO.
We will use up double the space for tracking metadata, but maybe that is
fine until we start seeing bigger memfds in real workloads.
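To make the space trade-off concrete, here is a minimal sketch of the two layouts being compared. The struct and function names are hypothetical and not taken from any KHO patch, and treating a PFN of 0 as "hole" is an assumption made only for illustration. A dense array stores one 8-byte PFN per slot, while the dictionary form stores an explicit index next to each PFN, which is where the doubling comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry for a sparse (dictionary-style) array:
 * every item carries its own index. */
struct sparse_entry {
	uint64_t index;	/* page offset within the file */
	uint64_t pfn;	/* page frame number backing that offset */
};

/* Dense layout: slot i holds the PFN for offset i, no index stored. */
size_t dense_bytes(size_t nr_pages)
{
	return nr_pages * sizeof(uint64_t);
}

/* Sparse layout: one (index, pfn) pair per populated page. */
size_t sparse_bytes(size_t nr_populated)
{
	return nr_populated * sizeof(struct sparse_entry);
}

/* Copy only the populated slots (pfn != 0, an assumption) of a dense
 * table into sparse entries; returns the number of entries written. */
size_t sparse_serialize(const uint64_t *pfns, size_t nr_pages,
			struct sparse_entry *out, size_t out_cap)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_pages; i++) {
		if (!pfns[i])
			continue;
		assert(n < out_cap);	/* caller must size the output */
		out[n].index = i;
		out[n].pfn = pfns[i];
		n++;
	}
	return n;
}
```

With these sizes, a fully populated file costs twice as much in the indexed form, and the indexed form only comes out ahead when fewer than half of the slots are populated.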
>
>> cases can then build on top of it. For example, the preservation bitmaps
>> can get rid of their linked list logic and just use KHO array to hold
>> and retrieve its bitmaps. It will make the serialization simpler.
>
> I don't think the bitmaps should, the serialization here is very
> special because it is not actually preserved, it just exists for the
> time while the new kernel runs in scratch and is insta freed once the
> allocators start up.
I don't think it matters if they are preserved or not. The serialization
and deserialization is independent of that. You can very well create a
KHO array that you don't KHO-preserve. On next boot, you can still use
it, you just have to be careful of doing it while scratch-only. Same as
we do now.
>
>> I also don't get why you think sparseness "should never happen". For
>> memfd for example, you say in one of your other emails that "And again
>> in real systems we expect memfd to be fully populated too." Which
>> systems and use cases do you have in mind? Why do you think people won't
>> want a sparse memfd?
>
> memfd should principally be used to back VM memory, and I expect VM
> memory to be fully populated. Why would it be sparse?
For the _hypervisor_ live update case, sure. Though even there, I have a
feeling we will start seeing userspace components on the hypervisor use
memfd for stashing some of their state. Pasha has already mentioned they
have a use case for a memfd that is not VM memory.
But hypervisor live update isn't the only use case for LUO. We are
looking at enabling state preservation for "normal" userspace
applications. Think big storage nodes with memory in order of TiB. Those
can use a memfd to back their caches so on a kernel upgrade the caches
don't have to be re-fetched. Sparseness is to be expected for such use
cases.
>
>> All in all, I think KHO array is going to prove useful and will make
>> serialization for subsystems easier. I think sparseness will also prove
>> useful but it is not a hill I want to die on. I am fine with starting
>> with a non-sparse array if people really insist. But I do think we
>> should go with KHO array as a base instead of re-inventing the linked
>> list of pages again and again.
>
> The two main advantages I see to the kho array design vs vmalloc is
> that it should be a bit faster as it doesn't establish a vmap, and it
> handles unknown size lists much better.
>
> Are these important considerations? IDK.
>
> As I said to Chris, I think we should see more examples of what we
> actually need before assuming any certain datastructure is the best
> choice.
>
> So I'd stick to simpler open coded things and go back and improve them
> than start out building the wrong shared data structure.
>
> How about have at least three luo clients that show meaningful benefit
> before proposing something beyond the fundamental page, vmalloc, slab
> things?
I think the fundamentals themselves get some benefit. But since
I have done most of the work on this feature already, I will do the rest
and send the patches out. Then you can have a look and if you're still
not convinced, I am fine shelving it for now to revisit later when a
stronger case can be made.
>
>> What do you mean by "data per version"? I think there should be only one
>> version of the serialized object. Multiple versions of the same thing
>> will get ugly real quick.
>
> If you want to support backwards/forwards compatibility then you
> probably should support multiple versions as well. Otherwise it
> could become quite hard to make downgrades..
Hmm, forward can work regardless since a newer kernel should speak older
formats too, but for backwards it makes sense to have an older version.
But perhaps it might be a better idea to come up with a mechanism for
the kernel to discover which formats the "next" kernel speaks so it can
for one decide whether it can do the live update at all, and for another
which formats it should use. Maybe we give a way for luod to choose
formats, and give it the responsibility for doing these checks?
>
> Ideally I'd want to remove the upstream code for obsolete versions
> fairly quickly so I'd imagine kernels will want to generate both
> versions during the transition period and then eventually newer
> kernels will only accept the new version.
>
> I've argued before that the extended matrix of any kernel version to
> any other kernel version should lie with the distro/CSP making the
> kernel fork. They know what their upgrade sequence will be so they can
> manage any missing versions to make it work.
>
> Upstream should do like v6.1 to v6.2 only or something similarly well
> constrained. I think this is a reasonable trade off to get subsystem
> maintainers to even accept this stuff at all.
[...]
--
Regards,
Pratyush Yadav
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Jason Gunthorpe @ 2025-09-04 14:42 UTC (permalink / raw)
To: Pratyush Yadav
Cc: Pasha Tatashin, jasonmiu, graf, changyuanl, rppt, dmatlack,
rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda, aliceryhl,
masahiroy, akpm, tj, yoann.congal, mmaurer, roman.gushchin,
chenridong, axboe, mark.rutland, jannh, vincent.guittot, hannes,
dan.j.williams, david, joel.granados, rostedt, anna.schumaker,
song, zhangguopeng, linux, linux-kernel, linux-doc, linux-mm,
gregkh, tglx, mingo, bp, dave.hansen, x86, hpa, rafael, dakr,
bartosz.golaszewski, cw00.choi, myungjoo.ham, yesanishhere,
Jonathan.Cameron, quic_zijuhu, aleksander.lobakin, ira.weiny,
andriy.shevchenko, leon, lukas, bhelgaas, wagi, djeffery,
stuart.w.hayes, lennart, brauner, linux-api, linux-fsdevel,
saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <mafs0a53av0hs.fsf@kernel.org>
On Thu, Sep 04, 2025 at 02:57:35PM +0200, Pratyush Yadav wrote:
> I don't think it matters if they are preserved or not. The serialization
> and deserialization is independent of that. You can very well create a
> KHO array that you don't KHO-preserve. On next boot, you can still use
> it, you just have to be careful of doing it while scratch-only. Same as
> we do now.
The KHO array machinery itself can't preserve its own memory
either.
> For the _hypervisor_ live update case, sure. Though even there, I have a
> feeling we will start seeing userspace components on the hypervisor use
> memfd for stashing some of their state.
Sure, but don't make excessively sparse memfds for kexec use, why
should that be hard?
> applications. Think big storage nodes with memory in order of TiB. Those
> can use a memfd to back their caches so on a kernel upgrade the caches
> don't have to be re-fetched. Sparseness is to be expected for such use
> cases.
Oh? I'm surprised you'd have sparseness there. Sparseness seems like
such a weird feature to want to rely on :\
> But perhaps it might be a better idea to come up with a mechanism for
> the kernel to discover which formats the "next" kernel speaks so it can
> for one decide whether it can do the live update at all, and for another
> which formats it should use. Maybe we give a way for luod to choose
> formats, and give it the responsibility for doing these checks?
I have felt that we should catalog the formats and versions the kernel can
read/write in some way during kbuild.
Maybe this turns into a sysfs directory of all the data with an
'enable_write' flag that luod could set to 0 to optimize.
And maybe this could be a kbuild report that luod could parse to do
this optimization.
And maybe distro/csps use this information mechanically to check if
version pairs are kexec compatible.
Which reinforces my feeling that the formats/versions should be first
class concepts, every version should be registered and luo should
sequence calling the code for the right version at the right time.
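A minimal sketch of what such a catalog check could look like, under stated assumptions: `struct fmt_entry`, its fields, and `fmt_pick_version()` are hypothetical names that do not exist in any posted patch, and version numbers are assumed to start at 1 so that 0 can mean "no match".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: one per (format, version) a kernel knows. */
struct fmt_entry {
	const char *name;	/* e.g. "memfd" */
	unsigned int version;	/* assumed to start at 1 */
	bool can_read;
	bool enable_write;	/* userspace could clear this to skip writing */
};

/*
 * Pick the highest version of @name that the old kernel is willing to
 * write and the new kernel can read. Returns 0 if there is none, in
 * which case the pair is not compatible for this format.
 */
unsigned int fmt_pick_version(const char *name,
			      const struct fmt_entry *old_k, size_t n_old,
			      const struct fmt_entry *new_k, size_t n_new)
{
	unsigned int best = 0;

	assert(name);
	for (size_t i = 0; i < n_old; i++) {
		if (!old_k[i].enable_write || strcmp(old_k[i].name, name))
			continue;
		for (size_t j = 0; j < n_new; j++) {
			if (new_k[j].can_read &&
			    new_k[j].version == old_k[i].version &&
			    !strcmp(new_k[j].name, name) &&
			    old_k[i].version > best)
				best = old_k[i].version;
		}
	}
	return best;
}
```

The same comparison could be run offline against two kbuild reports to decide whether a version pair is kexec compatible, without booting either kernel.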
Jason
^ permalink raw reply
* Re: [PATCH v3 29/30] luo: allow preserving memfd
From: Jason Gunthorpe @ 2025-09-04 17:34 UTC (permalink / raw)
To: Chris Li
Cc: Pasha Tatashin, pratyush, jasonmiu, graf, changyuanl, rppt,
dmatlack, rientjes, corbet, rdunlap, ilpo.jarvinen, kanie, ojeda,
aliceryhl, masahiroy, akpm, tj, yoann.congal, mmaurer,
roman.gushchin, chenridong, axboe, mark.rutland, jannh,
vincent.guittot, hannes, dan.j.williams, david, joel.granados,
rostedt, anna.schumaker, song, zhangguopeng, linux, linux-kernel,
linux-doc, linux-mm, gregkh, tglx, mingo, bp, dave.hansen, x86,
hpa, rafael, dakr, bartosz.golaszewski, cw00.choi, myungjoo.ham,
yesanishhere, Jonathan.Cameron, quic_zijuhu, aleksander.lobakin,
ira.weiny, andriy.shevchenko, leon, lukas, bhelgaas, wagi,
djeffery, stuart.w.hayes, ptyadav, lennart, brauner, linux-api,
linux-fsdevel, saeedm, ajayachandra, parav, leonro, witu
In-Reply-To: <CACePvbWGR+XPfTub41=Ekj3aSMjzyO+FyJmzMy5HEQKq0-wqag@mail.gmail.com>
On Wed, Sep 03, 2025 at 05:01:15AM -0700, Chris Li wrote:
> > And if you want to serialize that the optimal path would be to have a
> > vmalloc of all the strings and a vmalloc of the [] data, sort of like
> > the kho array idea.
>
> The KHO array idea is already implemented in the existing KHO code or
> that is something new you want to propose?
Pratyush has proposed it
> Then we will have to know the combined size of the string up front,
> similar to the FDT story. Ideally the list can incrementally add items
> to it. Maybe stored as a list of raw pointers without vmalloc
> first, then have a final pass vmalloc and serialize the string and
> data.
There are many options, and the dynamic extendability from the KHO
array might be a good fit here. But you can also just store the
serializations in a linked list and then write them out.
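The linked-list variant can be sketched as two passes. This is an illustration only: the `ser_*` names are hypothetical, and plain `malloc()` stands in for `vmalloc()` so the example builds in userspace.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pending item: one serialized blob queued for output. */
struct ser_item {
	struct ser_item *next;
	size_t len;
	char data[];		/* len bytes of payload */
};

struct ser_list {
	struct ser_item *head, **tail;
	size_t total;		/* running sum of all payload lengths */
};

void ser_list_init(struct ser_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->total = 0;
}

/* Pass 1: append items one at a time; no total size is needed up front. */
int ser_list_add(struct ser_list *l, const void *buf, size_t len)
{
	struct ser_item *it = malloc(sizeof(*it) + len);

	if (!it)
		return -1;
	it->next = NULL;
	it->len = len;
	memcpy(it->data, buf, len);
	*l->tail = it;
	l->tail = &it->next;
	l->total += len;
	return 0;
}

/*
 * Pass 2: allocate one contiguous buffer of the now-known size, copy the
 * items in order, and free the list. malloc() stands in for vmalloc().
 */
void *ser_list_flatten(struct ser_list *l, size_t *out_len)
{
	char *buf = malloc(l->total ? l->total : 1);
	size_t off = 0;

	if (!buf)
		return NULL;
	while (l->head) {
		struct ser_item *it = l->head;

		memcpy(buf + off, it->data, it->len);
		off += it->len;
		l->head = it->next;
		free(it);
	}
	assert(off == l->total);
	*out_len = off;
	ser_list_init(l);
	return buf;
}
```

The cost of this approach is one extra copy of every item, which a dynamically extendable array would avoid.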
> With the additional detail above, I would like to point out something
> I have observed earlier: even though the core idea of the native C
> struct is simple and intuitive, the end-to-end implementation is not.
> When we compare C struct implementation, we need to include all those
> additional boilerplate details as a whole, otherwise it is not an
> apples-to-apples comparison.
You need all of this anyhow, BTF doesn't create version metadata,
evaluate which versions are suitable, or de-serialize complex rbtree or
linked list structures.
> > Your BTF proposal doesn't seem to benefit memfd at all, it was focused
> > on extracting data directly from an existing struct which I feel very
> > strongly we should never do.
>
> From a data flow point of view, the data is read from a C struct and
> eventually stored into a C struct. There is no way around that. That is
> the necessary evil if you automate this process. Hey, there is also no
> rule saying that you can't use a bounce buffer of some kind of manual
> control in between.
Yeah but if I already wrote the code to make the required C struct
the only difference is 'memcpy c struct' vs 'serialize with btf c
struct' and that isn't meaningful.
If the boilerplate is around arrays of C structs and things then the
KHO array proposal is a good direction to de-duplicate code.
> It is just a way to automate stuff to reduce the boilerplate.
You haven't clearly spelled out what the boilerplate even is, this was
my feedback to you to be very clear on what is being improved.
> I feel a much stronger sense of urgency than you though. The stakes
> are high, currently you already have four departments can use this
> common serialization library right now:
> 1) PCI
> 2) VFIO
> 3) IOMMU
> 4) Memfd.
We don't know what they actually need to write out, we haven't seen
any patches.
Let's start with simple patches and deal with the fundamental problems
like versioning, then you can come with ideas to optimize if it turns
out there is something to improve here.
I'm not convinced PCI (a few bits per struct pci_device to start),
memfd (xarray) and IOMMU (dictionaries of HW physical pointers) share
a significant overlap of serialization requirements beyond luo level
managing the objects and versioning.
Jason
^ permalink raw reply
* Re: [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Amir Goldstein @ 2025-09-04 18:17 UTC (permalink / raw)
To: Randy Dunlap
Cc: linux-fsdevel, patches, Jeff Layton, Chuck Lever, Alexander Aring,
Josef Bacik, Aleksa Sarai, Jan Kara, Christian Brauner,
Matthew Wilcox, David Howells, linux-api
In-Reply-To: <20250904062215.2362311-1-rdunlap@infradead.org>
On Thu, Sep 4, 2025 at 8:22 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> Don't define the AT_RENAME_* macros at all since the kernel does not
> use them nor does the kernel need to provide them for userspace.
> Leave them as comments in <uapi/linux/fcntl.h> only as an example.
>
> The AT_RENAME_* macros have recently been added to glibc's <stdio.h>.
> For a kernel allmodconfig build, this made the macros be defined
> differently in 2 places (same values but different macro text),
> causing build errors/warnings (duplicate definitions) in both
> samples/watch_queue/watch_test.c and samples/vfs/test-statx.c.
> (<linux/fcntl.h> is included indirectly in both programs above.)
>
> Fixes: b4fef22c2fb9 ("uapi: explain how per-syscall AT_* flags should be allocated")
> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> ---
> Cc: Amir Goldstein <amir73il@gmail.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Alexander Aring <alex.aring@gmail.com>
> Cc: Josef Bacik <josef@toxicpanda.com>
> Cc: Aleksa Sarai <cyphar@cyphar.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Howells <dhowells@redhat.com>
> CC: linux-api@vger.kernel.org
> To: linux-fsdevel@vger.kernel.org
> ---
> include/uapi/linux/fcntl.h | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> --- linux-next-20250819.orig/include/uapi/linux/fcntl.h
> +++ linux-next-20250819/include/uapi/linux/fcntl.h
> @@ -155,10 +155,16 @@
> * as possible, so we can use them for generic bits in the future if necessary.
> */
>
> +/*
> + * Note: This is an example of how the AT_RENAME_* flags could be defined,
> + * but the kernel has no need to define them, so leave them as comments.
> + */
> /* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> +/*
> #define AT_RENAME_NOREPLACE 0x0001
> #define AT_RENAME_EXCHANGE 0x0002
> #define AT_RENAME_WHITEOUT 0x0004
> +*/
>
I find this end result a bit odd, but I don't want to suggest another variant
I already proposed one in v2 review [1] that maybe you did not like.
It's fine.
I'll let Aleksa and Christian chime in to decide on if and how they want this
comment to look or if we should just delete these definitions and be done with
this episode.
Thanks,
Amir.
[1] https://lore.kernel.org/r/CAOQ4uxjXvYBsW1Nb2HKaoUg1qi8Pkq1XKtQEbnAvMUGcp7LrZA@mail.gmail.com/
^ permalink raw reply
* Re: [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Florian Weimer @ 2025-09-04 18:49 UTC (permalink / raw)
To: Amir Goldstein
Cc: Randy Dunlap, linux-fsdevel, patches, Jeff Layton, Chuck Lever,
Alexander Aring, Josef Bacik, Aleksa Sarai, Jan Kara,
Christian Brauner, Matthew Wilcox, David Howells, linux-api
In-Reply-To: <CAOQ4uxiJibbq_MX3HkNaFb3GXGsZ0nNehk+MNODxXxy_khSwEQ@mail.gmail.com>
* Amir Goldstein:
> I find this end result a bit odd, but I don't want to suggest another variant
> I already proposed one in v2 review [1] that maybe you did not like.
> It's fine.
> I'll let Aleksa and Christian chime in to decide on if and how they want this
> comment to look or if we should just delete these definitions and be done with
> this episode.
We should fix the definition in glibc to be identical token-wise to the
kernel's.
Thanks,
Florian
^ permalink raw reply
* Re: [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Randy Dunlap @ 2025-09-04 21:51 UTC (permalink / raw)
To: Amir Goldstein
Cc: linux-fsdevel, patches, Jeff Layton, Chuck Lever, Alexander Aring,
Josef Bacik, Aleksa Sarai, Jan Kara, Christian Brauner,
Matthew Wilcox, David Howells, linux-api
In-Reply-To: <CAOQ4uxiJibbq_MX3HkNaFb3GXGsZ0nNehk+MNODxXxy_khSwEQ@mail.gmail.com>
On 9/4/25 11:17 AM, Amir Goldstein wrote:
> On Thu, Sep 4, 2025 at 8:22 AM Randy Dunlap <rdunlap@infradead.org> wrote:
>>
>> Don't define the AT_RENAME_* macros at all since the kernel does not
>> use them nor does the kernel need to provide them for userspace.
>> Leave them as comments in <uapi/linux/fcntl.h> only as an example.
>>
>> The AT_RENAME_* macros have recently been added to glibc's <stdio.h>.
>> For a kernel allmodconfig build, this made the macros be defined
>> differently in 2 places (same values but different macro text),
>> causing build errors/warnings (duplicate definitions) in both
>> samples/watch_queue/watch_test.c and samples/vfs/test-statx.c.
>> (<linux/fcntl.h> is included indirectly in both programs above.)
>>
>> Fixes: b4fef22c2fb9 ("uapi: explain how per-syscall AT_* flags should be allocated")
>> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
>> ---
>> Cc: Amir Goldstein <amir73il@gmail.com>
>> Cc: Jeff Layton <jlayton@kernel.org>
>> Cc: Chuck Lever <chuck.lever@oracle.com>
>> Cc: Alexander Aring <alex.aring@gmail.com>
>> Cc: Josef Bacik <josef@toxicpanda.com>
>> Cc: Aleksa Sarai <cyphar@cyphar.com>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Christian Brauner <brauner@kernel.org>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: David Howells <dhowells@redhat.com>
>> CC: linux-api@vger.kernel.org
>> To: linux-fsdevel@vger.kernel.org
>> ---
>> include/uapi/linux/fcntl.h | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> --- linux-next-20250819.orig/include/uapi/linux/fcntl.h
>> +++ linux-next-20250819/include/uapi/linux/fcntl.h
>> @@ -155,10 +155,16 @@
>> * as possible, so we can use them for generic bits in the future if necessary.
>> */
>>
>> +/*
>> + * Note: This is an example of how the AT_RENAME_* flags could be defined,
>> + * but the kernel has no need to define them, so leave them as comments.
>> + */
>> /* Flags for renameat2(2) (must match legacy RENAME_* flags). */
>> +/*
>> #define AT_RENAME_NOREPLACE 0x0001
>> #define AT_RENAME_EXCHANGE 0x0002
>> #define AT_RENAME_WHITEOUT 0x0004
>> +*/
>>
>
> I find this end result a bit odd, but I don't want to suggest another variant
> I already proposed one in v2 review [1] that maybe you did not like.
> It's fine.
Yes, I replied to that with another problem.
> I'll let Aleksa and Christian chime in to decide on if and how they want this
> comment to look or if we should just delete these definitions and be done with
> this episode.
Sure, I'm ready to just throw my hands up (give up).
>
> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/r/CAOQ4uxjXvYBsW1Nb2HKaoUg1qi8Pkq1XKtQEbnAvMUGcp7LrZA@mail.gmail.com/
thanks.
--
~Randy
^ permalink raw reply
* Re: [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Randy Dunlap @ 2025-09-04 21:52 UTC (permalink / raw)
To: Florian Weimer, Amir Goldstein
Cc: linux-fsdevel, patches, Jeff Layton, Chuck Lever, Alexander Aring,
Josef Bacik, Aleksa Sarai, Jan Kara, Christian Brauner,
Matthew Wilcox, David Howells, linux-api
In-Reply-To: <lhua53auk7q.fsf@oldenburg.str.redhat.com>
On 9/4/25 11:49 AM, Florian Weimer wrote:
> * Amir Goldstein:
>
>> I find this end result a bit odd, but I don't want to suggest another variant
>> I already proposed one in v2 review [1] that maybe you did not like.
>> It's fine.
>> I'll let Aleksa and Christian chime in to decide on if and how they want this
>> comment to look or if we should just delete these definitions and be done with
>> this episode.
>
> We should fix the definition in glibc to be identical token-wise to the
> kernel's.
That's probably a good suggestion...
while I tried the reverse of that and Amir opposed.
Now I find that I don't care enough to sustain this.
Thanks.
--
~Randy
^ permalink raw reply
* Re: [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Aleksa Sarai @ 2025-09-05 5:11 UTC (permalink / raw)
To: Amir Goldstein
Cc: Randy Dunlap, linux-fsdevel, patches, Jeff Layton, Chuck Lever,
Alexander Aring, Josef Bacik, Jan Kara, Christian Brauner,
Matthew Wilcox, David Howells, linux-api
In-Reply-To: <CAOQ4uxiJibbq_MX3HkNaFb3GXGsZ0nNehk+MNODxXxy_khSwEQ@mail.gmail.com>
On 2025-09-04, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Sep 4, 2025 at 8:22 AM Randy Dunlap <rdunlap@infradead.org> wrote:
> >
> > Don't define the AT_RENAME_* macros at all since the kernel does not
> > use them nor does the kernel need to provide them for userspace.
> > Leave them as comments in <uapi/linux/fcntl.h> only as an example.
> >
> > The AT_RENAME_* macros have recently been added to glibc's <stdio.h>.
> > For a kernel allmodconfig build, this made the macros be defined
> > differently in 2 places (same values but different macro text),
> > causing build errors/warnings (duplicate definitions) in both
> > samples/watch_queue/watch_test.c and samples/vfs/test-statx.c.
> > (<linux/fcntl.h> is included indirectly in both programs above.)
> >
> > Fixes: b4fef22c2fb9 ("uapi: explain how per-syscall AT_* flags should be allocated")
> > Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> > ---
> > Cc: Amir Goldstein <amir73il@gmail.com>
> > Cc: Jeff Layton <jlayton@kernel.org>
> > Cc: Chuck Lever <chuck.lever@oracle.com>
> > Cc: Alexander Aring <alex.aring@gmail.com>
> > Cc: Josef Bacik <josef@toxicpanda.com>
> > Cc: Aleksa Sarai <cyphar@cyphar.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: David Howells <dhowells@redhat.com>
> > CC: linux-api@vger.kernel.org
> > To: linux-fsdevel@vger.kernel.org
> > ---
> > include/uapi/linux/fcntl.h | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > --- linux-next-20250819.orig/include/uapi/linux/fcntl.h
> > +++ linux-next-20250819/include/uapi/linux/fcntl.h
> > @@ -155,10 +155,16 @@
> > * as possible, so we can use them for generic bits in the future if necessary.
> > */
> >
> > +/*
> > + * Note: This is an example of how the AT_RENAME_* flags could be defined,
> > + * but the kernel has no need to define them, so leave them as comments.
> > + */
> > /* Flags for renameat2(2) (must match legacy RENAME_* flags). */
> > +/*
> > #define AT_RENAME_NOREPLACE 0x0001
> > #define AT_RENAME_EXCHANGE 0x0002
> > #define AT_RENAME_WHITEOUT 0x0004
> > +*/
> >
>
> I find this end result a bit odd, but I don't want to suggest another variant
> I already proposed one in v2 review [1] that maybe you did not like.
> It's fine.
> I'll let Aleksa and Christian chime in to decide on if and how they want this
> comment to look or if we should just delete these definitions and be done with
> this episode.
For my part, I'm fine with these becoming comments or even removing them
outright. I think that defining them as AT_* flags would've been useful
examples of how these flags should be used, but it is what it is.
Then again, AT_EXECVE_CHECK went in and used a higher-level bit despite
the comments describing that this was unfavourable and what should be
done instead, so maybe attempting to avoid conflicts is an exercise in
futility...
If it's too much effort to synchronise them between glibc and the kernel then it's
better to just close the book on this whole chapter (even though my
impression is that glibc made a mistake or two when adding the
definitions).
In either case, feel free to take my
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
^ permalink raw reply
* Re: [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Florian Weimer @ 2025-09-05 7:19 UTC (permalink / raw)
To: Randy Dunlap
Cc: Amir Goldstein, linux-fsdevel, patches, Jeff Layton, Chuck Lever,
Alexander Aring, Josef Bacik, Aleksa Sarai, Jan Kara,
Christian Brauner, Matthew Wilcox, David Howells, linux-api
In-Reply-To: <b35f0ff7-8ffb-400f-b537-d15e83319808@infradead.org>
* Randy Dunlap:
> On 9/4/25 11:49 AM, Florian Weimer wrote:
>> * Amir Goldstein:
>>
>>> I find this end result a bit odd, but I don't want to suggest another variant
>>> I already proposed one in v2 review [1] that maybe you did not like.
>>> It's fine.
>>> I'll let Aleksa and Christian chime in to decide on if and how they want this
>>> comment to look or if we should just delete these definitions and be done with
>>> this episode.
>>
>> We should fix the definition in glibc to be identical token-wise to the
>> kernel's.
>
> That's probably a good suggestion...
> while I tried the reverse of that and Amir opposed.
It's certainly odd that the kernel uses different token sequences for
defining AT_RENAME_* and RENAME_*. But it's probably too late to fix
that.
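For reference, a small sketch of why the token sequence matters and not just the value. C (C11 6.10.3p2) permits redefining an object-like macro only when the replacement lists are identical. The `(1 << 0)` spelling below is an assumption used for illustration, not a quote of either header.

```c
#include <assert.h>

/* Stand-in for one header's definition (token sequence: 0x0001). */
#define AT_RENAME_NOREPLACE 0x0001

/* Identical token sequence: a valid, silent redefinition. */
#define AT_RENAME_NOREPLACE 0x0001

/*
 * Same value, different tokens: this is what draws a "redefined"
 * diagnostic. Build with -DSHOW_CONFLICT to see it.
 */
#ifdef SHOW_CONFLICT
#define AT_RENAME_NOREPLACE (1 << 0)
#endif

/* The value is the same either way; only the spelling differs. */
static_assert(AT_RENAME_NOREPLACE == (1 << 0), "same value, different tokens");

int at_rename_noreplace_value(void)
{
	return AT_RENAME_NOREPLACE;
}
```

This is why making the two definitions identical token-wise is enough to silence the build warnings without either side dropping the macro.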
Here's the glibc patch:
[PATCH] libio: Define AT_RENAME_* with the same tokens as Linux
<https://inbox.sourceware.org/libc-alpha/lhubjnpv03o.fsf@oldenburg.str.redhat.com/T/#u>
Thanks,
Florian
^ permalink raw reply
* Re: [PATCH v3] uapi/linux/fcntl: remove AT_RENAME* macros
From: Randy Dunlap @ 2025-09-05 7:36 UTC (permalink / raw)
To: Florian Weimer
Cc: Amir Goldstein, linux-fsdevel, patches, Jeff Layton, Chuck Lever,
Alexander Aring, Josef Bacik, Aleksa Sarai, Jan Kara,
Christian Brauner, Matthew Wilcox, David Howells, linux-api
In-Reply-To: <lhu7bydv01m.fsf@oldenburg.str.redhat.com>
Hi,
On 9/5/25 12:19 AM, Florian Weimer wrote:
> * Randy Dunlap:
>
>> On 9/4/25 11:49 AM, Florian Weimer wrote:
>>> * Amir Goldstein:
>>>
>>>> I find this end result a bit odd, but I don't want to suggest another variant
>>>> I already proposed one in v2 review [1] that maybe you did not like.
>>>> It's fine.
>>>> I'll let Aleksa and Christian chime in to decide on if and how they want this
>>>> comment to look or if we should just delete these definitions and be done with
>>>> this episode.
>>>
>>> We should fix the definition in glibc to be identical token-wise to the
>>> kernel's.
>>
>> That's probably a good suggestion...
>> while I tried the reverse of that and Amir opposed.
>
> It's certainly odd that the kernel uses different token sequences for
> defining AT_RENAME_* and RENAME_*. But it's probably too late to fix
> that.
>
> Here's the glibc patch:
>
> [PATCH] libio: Define AT_RENAME_* with the same tokens as Linux
> <https://inbox.sourceware.org/libc-alpha/lhubjnpv03o.fsf@oldenburg.str.redhat.com/T/#u>
Thanks!
--
~Randy
^ permalink raw reply