From: Jason Gunthorpe <jgg@nvidia.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
Eric Biederman <ebiederm@xmission.com>,
Arnd Bergmann <arnd@arndb.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Hugh Dickins <hughd@google.com>, Alexander Graf <graf@amazon.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Woodhouse <dwmw2@infradead.org>,
James Gowans <jgowans@amazon.com>,
Mike Rapoport <rppt@kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Pasha Tatashin <tatashin@google.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Dave Hansen <dave.hansen@intel.com>,
David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Wei Yang <richard.weiyang@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, kexec@lists.infradead.org
Subject: Re: [RFC PATCH 1/5] misc: introduce FDBox
Date: Mon, 17 Mar 2025 13:46:00 -0300
Message-ID: <20250317164600.GM9311@nvidia.com>
In-Reply-To: <20250308-wutanfall-ersetzbar-2aedc820d80d@brauner>
On Sat, Mar 08, 2025 at 12:09:53PM +0100, Christian Brauner wrote:
> On Fri, Mar 07, 2025 at 11:14:17AM -0400, Jason Gunthorpe wrote:
> > On Fri, Mar 07, 2025 at 10:31:39AM +0100, Christian Brauner wrote:
> > > On Fri, Mar 07, 2025 at 12:57:35AM +0000, Pratyush Yadav wrote:
> > > > The File Descriptor Box (FDBox) is a mechanism for userspace to name
> > > > file descriptors and give them over to the kernel to hold. They can
> > > > later be retrieved by passing in the same name.
> > > >
> > > > The primary purpose of FDBox is to be used with Kexec Handover (KHO).
> > > > There are many kinds of anonymous file descriptors in the kernel like
> > > > memfd, guest_memfd, iommufd, etc. that would be useful to be preserved
> > > > using KHO. To be able to do that, there needs to be a mechanism to label
> > > > FDs that allows userspace to set the label before doing KHO and to use
> > > > the label to map them back after KHO. FDBox achieves that purpose by
> > > > exposing a miscdevice which exposes ioctls to label and transfer FDs
> > > > between the kernel and userspace. FDBox is not intended to work with any
> > > > generic file descriptor. Support for each kind of FD must be explicitly
> > > > enabled.
> > >
> > > This makes no sense as a generic concept. If you want to restore shmem
> > > and possibly anonymous inodes files via KHO then tailor the solution to
> > > shmem and anon inodes but don't make this generic infrastructure. This
> > > has zero chances to cover generic files.
> >
> > We need it to cover a range of FD types in the kernel like iommufd and
>
> anonymous inode
>
> > vfio.
>
> anonymous inode

Yes, I think Pratyush did not really capture that point, that it is
really only for very limited FD types. Realistically probably only
anonymous-inode-like things.

> > It is not "generic" in the sense every FD in the kernel magically works
> > with fdbox, but that any driver/subsystem providing a FD could be
> > enlightened to support it.
> >
> > Very much do not want the infrastructure tied to just shmem and memfd.
>
> Anything you can reasonably want will either be an internal shmem mount,
> devtmpfs, or anonymous inodes. Anything else isn't going to work.

Yes.

> I'm not yet sold that this needs to be a character device. Because
> that's fundamentally limiting in how useful this can be.

It is part of KHO, and I think KHO wants a character device for other
reasons anyhow.

The whole concept is tied to KHO intrinsically because this new
file_operations callback is going to be calling KHO-related functions
to register the information contained in the FD with KHO.

Also, I kind of expect it to be semi-destructive to the FDs in
some way, especially for VFIO and iommufd. The FD will have to be
prepared to go into the KHO first.

> It might be way more useful if this ended up being a separate tiny
> filesystem where such preserved files are simply shown as named entries
> that you can open instead of ioctl()ing your way through character
> devices. But I need to think about that.

It could be possible, but I think this is more complex, and not really
too useful. How do you store an iommufd anonymous inode in a new
special filesystem? What permissions does it have after kexec? How
does open work? What if you open the same path multiple times? What
about the single-open rules of VFIO? How do you "open" co-linked FDs
like VFIO & iommufd?

A char device can give pretty reasonable answers to these questions
when we don't have to pretend to be a filesystem.

Jason