From: David Gibson <david@gibson.dropbear.id.au>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	linuxppc-dev@lists.ozlabs.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	kvm@vger.kernel.org
Subject: Re: [PATCH kernel v3 3/4] vfio/spapr: Cache mm in tce_container
Date: Fri, 21 Oct 2016 11:21:34 +1100
Message-ID: <20161021002134.GS11140@umbus.fritz.box>
In-Reply-To: <20161020183121.073f01ac@roar.ozlabs.ibm.com>


On Thu, Oct 20, 2016 at 06:31:21PM +1100, Nicholas Piggin wrote:
> On Thu, 20 Oct 2016 14:03:49 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > In some situations the userspace memory context may live longer than
> > the userspace process itself so if we need to do proper memory context
> > cleanup, we better cache @mm and use it later when the process is gone
> > (@current or @current->mm is NULL).
> > 
> > This references mm and stores the pointer in the container; this is done
> > when a container is just created so checking for !current->mm in other
> > places becomes pointless.
> > 
> > This replaces current->mm with container->mm everywhere except debug
> > prints.
> > 
> > This adds a check that current->mm is the same as the one stored in
> > the container to prevent userspace from registering memory in other
> > processes.
> > 
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> >  drivers/vfio/vfio_iommu_spapr_tce.c | 127 ++++++++++++++++++++----------------
> >  1 file changed, 71 insertions(+), 56 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> > index d0c38b2..6b0b121 100644
> > --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> > @@ -31,49 +31,46 @@
> 
> Does it make sense to move the rest of these hunks into patch 2?
> I think they're similarly just moving the mm reference into callers.
> 
> 
> >  static void tce_iommu_detach_group(void *iommu_data,
> >  		struct iommu_group *iommu_group);
> >  
> > -static long try_increment_locked_vm(long npages)
> > +static long try_increment_locked_vm(struct mm_struct *mm, long npages)
> >  {
> >  	long ret = 0, locked, lock_limit;
> >  
> > -	if (!current || !current->mm)
> > -		return -ESRCH; /* process exited */
> > -
> >  	if (!npages)
> >  		return 0;
> >  
> > -	down_write(&current->mm->mmap_sem);
> > -	locked = current->mm->locked_vm + npages;
> > +	down_write(&mm->mmap_sem);
> > +	locked = mm->locked_vm + npages;
> >  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> >  	if (locked > lock_limit && !capable(CAP_IPC_LOCK))
> >  		ret = -ENOMEM;
> >  	else
> > -		current->mm->locked_vm += npages;
> > +		mm->locked_vm += npages;
> >  
> >  	pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
> >  			npages << PAGE_SHIFT,
> > -			current->mm->locked_vm << PAGE_SHIFT,
> > +			mm->locked_vm << PAGE_SHIFT,
> >  			rlimit(RLIMIT_MEMLOCK),
> >  			ret ? " - exceeded" : "");
> >  
> > -	up_write(&current->mm->mmap_sem);
> > +	up_write(&mm->mmap_sem);
> >  
> >  	return ret;
> >  }
> >  
> > -static void decrement_locked_vm(long npages)
> > +static void decrement_locked_vm(struct mm_struct *mm, long npages)
> >  {
> > -	if (!current || !current->mm || !npages)
> > +	if (!mm || !npages)
> >  		return; /* process exited */
> 
> I know you're trying to be defensive and change as little logic as possible,
> but some cases should be an error, and I think some of the "process exited"
> comments were wrong anyway.
> 
> Maybe pull the !mm test into the caller and make it WARN_ON?
> 
> 
> > @@ -317,6 +311,9 @@ static void *tce_iommu_open(unsigned long arg)
> >  		return ERR_PTR(-EINVAL);
> >  	}
> >  
> > +	if (!current->mm)
> > +		return ERR_PTR(-ESRCH); /* process exited */
> 
> A userspace thread in the kernel can't have its mm disappear, unless you
> are actually in the exit code. !current->mm is more like a test for a kernel
> thread.
> 
> 
> > +
> >  	container = kzalloc(sizeof(*container), GFP_KERNEL);
> >  	if (!container)
> >  		return ERR_PTR(-ENOMEM);
> > @@ -326,13 +323,17 @@ static void *tce_iommu_open(unsigned long arg)
> >  
> >  	container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
> >  
> > +	container->mm = current->mm;
> > +	atomic_inc(&container->mm->mm_count);
> > +
> >  	return container;
> 
> It's a nitpick if you respin the patch, but I guess it would better be
> described as a reference than a cache of the object. "have tce_container
> take a reference to mm_struct".
> 
> 
> > @@ -515,13 +526,16 @@ static long tce_iommu_build_v2(struct tce_container *container,
> >  	unsigned long hpa;
> >  	enum dma_data_direction dirtmp;
> >  
> > +	if (container->mm != current->mm)
> > +		return -ESRCH;
> 
> Good, is this condition now enforced on all entrypoints that use
> container->mm (except the final teardown)? (The mlock/rlimit stuff,
> as we talked about before, doesn't make sense if not).

Right.  I don't know that it's actually dangerous, but I think it
would be needlessly weird for one process to be able to manipulate
another process's mm via the container fd.  So all the entry points
that are directly called from userspace (basically, the ioctl()s)
should verify that current->mm matches container->mm (except the one
which initializes container->mm, obviously).
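
To make the shape of that check concrete, here is a minimal userspace
sketch.  It is for illustration only: the struct definitions are
stand-ins for the kernel types, tce_container_check_mm() and
tce_ioctl_model() are hypothetical names that do not appear in the
patch, and caller_mm plays the role of current->mm.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; not the real definitions. */
struct mm_struct {
	int mm_count;
};

struct tce_container {
	struct mm_struct *mm;	/* mm recorded when the container was opened */
};

/*
 * Hypothetical helper: returns 0 if the caller's mm is the one recorded
 * in the container, -ESRCH otherwise.  In the kernel, caller_mm would be
 * current->mm.
 */
static long tce_container_check_mm(const struct tce_container *container,
				   const struct mm_struct *caller_mm)
{
	if (!container->mm || container->mm != caller_mm)
		return -ESRCH;
	return 0;
}

/* Model of an ioctl entry point: check ownership before touching the mm. */
static long tce_ioctl_model(struct tce_container *container,
			    struct mm_struct *caller_mm)
{
	long ret = tce_container_check_mm(container, caller_mm);

	if (ret)
		return ret;

	/* ... locked_vm accounting against container->mm would go here ... */
	return 0;
}
```

Putting the comparison in one helper would let every userspace-facing
entry point share it, rather than open-coding the test in each one.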

One other concern.  If I follow the logic correctly, if a process
created a container, passed the fd to another process then exited, the
container fd held by the other process would keep the original
process's mm alive indefinitely.  I'm not sure if that's a problem.
Nick?
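
The scenario can be modelled with a plain reference count.  This is a
userspace sketch under stated assumptions, not kernel code: it models
only the mm_count reference that the patch takes in tce_iommu_open(),
and all the *_model names and the freed flag are made up for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in: only the reference count the patch touches. */
struct mm_struct {
	int mm_count;
	bool freed;	/* illustration only: set when the last ref is dropped */
};

struct tce_container {
	struct mm_struct *mm;
};

static void mm_get_model(struct mm_struct *mm)
{
	mm->mm_count++;
}

static void mm_put_model(struct mm_struct *mm)
{
	if (--mm->mm_count == 0)
		mm->freed = true;
}

/* Mirrors the open path in the patch: record the mm and take a reference. */
static void container_open_model(struct tce_container *c,
				 struct mm_struct *creator_mm)
{
	c->mm = creator_mm;
	mm_get_model(c->mm);
}

/* Final teardown: the container's reference is the last thing to go. */
static void container_release_model(struct tce_container *c)
{
	mm_put_model(c->mm);
	c->mm = NULL;
}
```

Starting from an mm with one reference (the creating process),
container_open_model() raises the count to two.  The creator exiting is
one mm_put_model(), which leaves the count at one, so the mm is only
freed once whoever holds the fd triggers container_release_model().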

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

