From: Jason Gunthorpe <jgg@nvidia.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: iommu@lists.linux.dev, Kevin Tian <kevin.tian@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>
Subject: Re: [RFC V1 0/4] iommufd live update
Date: Mon, 19 Aug 2024 11:59:08 -0300
Message-ID: <20240819145908.GH2032816@nvidia.com>
In-Reply-To: <53e7ab6b-9419-4808-b429-a88faeb3f6a7@oracle.com>

On Mon, Aug 12, 2024 at 01:41:46PM -0400, Steven Sistare wrote:
> On 8/8/2024 3:52 PM, Jason Gunthorpe wrote:
> > On Thu, Aug 08, 2024 at 03:15:02PM -0400, Steven Sistare wrote:
> > > On 8/6/2024 8:56 AM, Jason Gunthorpe wrote:
> > > > On Mon, Aug 05, 2024 at 03:03:30PM -0400, Steven Sistare wrote:
> > > > > On 7/22/2024 11:55 AM, Jason Gunthorpe wrote:
> > > > > > On Sat, Jul 20, 2024 at 11:56:40AM -0700, Steve Sistare wrote:
> > > > > > > Live update is a technique wherein an application saves its state, launches
> > > > > > > an updated version of itself, and restores its state.  Clients of the
> > > > > > > application experience a brief suspension of service, on the order of
> > > > > > > 100's of milliseconds, but are otherwise unaffected.
> > > > > > > 
> > > > > > > Define the IOMMU_IOAS_CHANGE_PROCESS ioctl to allow management and use
> > > > > > > of an iommufd device to be transferred from one process to another.  The
> > > > > > > application is responsible for transferring the device descriptor to the new
> > > > > > > process, eg either by preservation across fork and exec or via SCM_RIGHTS.
> > > > > > 
> > > > > > It seems Ok to me, I'm glad it worked out for you
> > > > > > 
> > > > > > But have you considered using something like the new
> > > > > > memfd_pin_folios() system so that iommufd is bound to the FDs backing
> > > > > > the memory instead of VMAs?
> > > > > > 
> > > > > > https://lore.kernel.org/all/20240624063952.1572359-1-vivek.kasireddy@intel.com/
> > > > > > 
> > > > > > I've been expecting to add support for that, but does it help this scenario?
> > > > > 
> > > > > Thanks for the pointer, I had not seen it.
> > > > > AFAICT it does not affect live update.  The memfd is passed to new qemu, and
> > > > > the manner in which its pages were pinned does not matter, as long as the effect
> > > > > on the mm fields that we manipulate is the same.
> > > > 
> > > > I mean instead of using mmap()s and telling iommufd to take the pages
> > > > from a VMA you'd use a memfd and tell iommufd to take the pages from
> > > > the memfd directly.
> > > > 
> > > > Since the memfd is not part of a process or mm_struct it is not
> > > > affected by live update's exec() and none of these gyrations are
> > > > necessary.
> > > 
> > > The problem is that kernel clients (eg mdevs) use userland VA to identify
> > > memory when calling iommufd, so we must update the VA's after exec.
> > 
> > Technically no, they use IOVA too and iommufd translates IOVA into a
> > VMA and what not.
> > 
> > So if we teach iommufd how to do memfd it would also learn how to
> > adapt it to mdevs as well.
> > 
> > > vdpa does the same, if/when it converts to iommufd.  I cannot see us
> > > changing vaddr to (file, offset) everywhere in iommufd and its clients,
> > > up through the mdev code stack, can you?
> > 
> > That is exactly what I imagine, because it isn't vaddr already, it is
> > IOVA and IOVA always already translates to an area which gets you the
> > vaddr.
> > 
> > It is why this series can remap the vaddrs on the fly without reaching
> > outside the area struct.
> 
> OK, that looks tractable.  There are not too many instances of
> struct iopt_pages uptr to fiddle with, adding support for
> file+offset.  We must of course keep uptr to continue to support
> anonymous memory for iommufd, but such memory will not be supported
> for live update.
> 
> Do you envision a new userland interface variant of IOMMU_IOAS_MAP
> that takes fd and offset?

Yes

> Or have userland pass user_va as usual, but have the kernel check if it maps to a file,
> and save the file?  The latter is more work in the kernel but requires no change in
> applications.

Maybe this is possible too..

> Do you plan to work on this any time soon?  Do you want me to?

I wasn't at the point of this yet, if you are interested I suggest
taking a stab. Now that the infrastructure is in the mm it should
mostly just be changing pin_user_pages() to the other one. It might be
quite short.

> We still need IOMMU_IOAS_CHANGE_PROCESS to handle
> IOMMU_OPTION_RLIMIT_MODE, and to handle a changed uid.

Yes that makes sense

Jason
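
To illustrate the file+offset idea discussed above, here is a minimal
userspace sketch. It does not call iommufd or memfd_pin_folios() at all;
it only shows that memory named by (fd, offset) keeps its contents when
the mapping that wrote it is torn down and a new one is created at a
possibly different virtual address, which is roughly what happens to a
process's mappings across exec(). The struct file_range type and all
function names are hypothetical and exist only for this example.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor (not an iommufd structure): names memory by
 * file and offset instead of by a process virtual address. */
struct file_range {
	int fd;
	off_t offset;	/* must be page aligned for mmap() */
	size_t length;
};

/* Create a memfd of the given size. MFD_CLOEXEC is deliberately not set,
 * so the descriptor would stay open across exec(). Returns -1 on error. */
static int make_memfd(size_t length)
{
	int fd = memfd_create("guest-ram", 0);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)length) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Map the range into the calling process; NULL on failure. */
static void *map_range(const struct file_range *r)
{
	void *p = mmap(NULL, r->length, PROT_READ | PROT_WRITE, MAP_SHARED,
		       r->fd, r->offset);

	return p == MAP_FAILED ? NULL : p;
}

/* Write through one mapping, unmap it (standing in for the address space
 * teardown done by exec()), map the same (fd, offset) again and compare.
 * Returns 1 if the contents survived, 0 if not, -1 on error. */
static int range_survives_remap(size_t length)
{
	static const char msg[] = "live update";
	struct file_range r = { .fd = -1, .offset = 0, .length = length };
	char *old_va, *new_va;
	int ok;

	if (length < sizeof(msg))
		return -1;
	r.fd = make_memfd(length);
	if (r.fd < 0)
		return -1;

	old_va = map_range(&r);
	if (!old_va) {
		close(r.fd);
		return -1;
	}
	memcpy(old_va, msg, sizeof(msg));
	munmap(old_va, r.length);	/* the old VA is gone */

	new_va = map_range(&r);		/* may land at a different VA */
	if (!new_va) {
		close(r.fd);
		return -1;
	}
	ok = memcmp(new_va, msg, sizeof(msg)) == 0;

	munmap(new_va, r.length);
	close(r.fd);
	return ok;
}
```

Calling range_survives_remap(4096) is expected to return 1 on a Linux
system with memfd support. In a real live update the second map_range()
would run in the new process after the fd was inherited or passed over
SCM_RIGHTS; the single-process version here is only a simplification.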
