From: Nick Piggin <npiggin@suse.de>
To: apw@shadowen.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
shaggy@austin.ibm.com, axboe@oracle.com,
torvalds@linux-foundation.org, linux-mm@kvack.org,
linux-arch@vger.kernel.org
Subject: Re: [patch 2/2] mm: lockless get_user_pages
Date: Sat, 24 May 2008 01:44:32 +0200 [thread overview]
Message-ID: <20080523234432.GD3144@wotan.suse.de> (raw)
In-Reply-To: <20080523123112.GA9357@shadowen.org>
On Fri, May 23, 2008 at 01:31:12PM +0100, apw@shadowen.org wrote:
> On Fri, May 23, 2008 at 04:27:33AM +0200, Nick Piggin wrote:
> [...]
> > > Why is this optimisation taken out as part of this patch. From what I
> > > an see the caller below does not move to fast_gup so it appears
> > > (naively) that this would still be applicable, though the locking may be
> > > hard.
> >
> > The get_iovec_page_array caller AFAIKS does move to fast_gup.
> >
> > That funny optimisation is an attempt to avoid taking and dropping the
> > mmap_sem for each page if we can perform the copy without taking a
> > fault. If the same conditions exist for fast_gup, then it can perform
> > the get_user_pages operation without taking mmap_sem at all.
>
> Yep see below though, the call site for this optimisation seems to go
> to copy_from_user ...
>
> > > > -/*
> > > > * Map an iov into an array of pages and offset/length tupples. With the
> > > > * partial_page structure, we can map several non-contiguous ranges into
> > > > * our ones pages[] map instead of splitting that operation into pieces.
> > > > @@ -1189,8 +1159,6 @@ static int get_iovec_page_array(const st
> > > > {
> > > > int buffers = 0, error = 0;
> > > >
> > > > - down_read(¤t->mm->mmap_sem);
> > > > -
> > > > while (nr_vecs) {
> > > > unsigned long off, npages;
> > > > struct iovec entry;
> > > > @@ -1199,7 +1167,7 @@ static int get_iovec_page_array(const st
> > > > int i;
> > > >
> > > > error = -EFAULT;
> > > > - if (copy_from_user_mmap_sem(&entry, iov, sizeof(entry)))
> > > > + if (copy_from_user(&entry, iov, sizeof(entry)))
> > > > break;
> > > >
> > > > base = entry.iov_base;
> > > > @@ -1233,9 +1201,8 @@ static int get_iovec_page_array(const st
> > > > if (npages > PIPE_BUFFERS - buffers)
> > > > npages = PIPE_BUFFERS - buffers;
> > > >
> > > > - error = get_user_pages(current, current->mm,
> > > > - (unsigned long) base, npages, 0, 0,
> > > > - &pages[buffers], NULL);
> > > > + error = fast_gup((unsigned long)base, npages,
> > > > + 0, &pages[buffers]);
> > > >
> > > > if (unlikely(error <= 0))
> > > > break;
>
> Its not that clear from these two deltas that it does move to fast_gup,
> yes this second one does but the first which used the old optimisation
> goes straight to copy_from_user.
Oh, the copy_from_user is OK, it is just copying the iovec out from
userspace.
> > > > + pte_unmap(ptep - 1);
> > >
> > > Is this pte_unmap right. I thought you had to unmap the same address
> > > that was returned from pte_offset_map, not an incremented version?
> >
> > So long as it is within the same pte page, it's OK. Check some of
> > the similar loops in mm/memory.c
>
> Bah, yes I missed this in kunmap_atomic()... I hate code mixed into the
> declarations:
>
> unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
>
> so yes its safe.
Yeah, the convention is a bit ugly, but that's what we've got :P
> [...]
> > > > +static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
> > > > + unsigned long end, int write, struct page **pages, int *nr)
> > > > +{
> > > > + unsigned long mask;
> > > > + pte_t pte = *(pte_t *)&pmd;
> > > > + struct page *head, *page;
> > > > + int refs;
> > > > +
> > > > + mask = _PAGE_PRESENT|_PAGE_USER;
> > > > + if (write)
> > > > + mask |= _PAGE_RW;
> > > > + if ((pte_val(pte) & mask) != mask)
> > > > + return 0;
> > > > + /* hugepages are never "special" */
> > > > + VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
> > >
> > > As we am not expecting them to be special could we not check for that anyhow as you do in the normal case. Safe against anyone changing hugepages.
> >
> > Couldn't quite parse this ;) Do you mean to put the same VM_BUG_ON in the
> > follow_hugetlb_page path? Sure we could do that, although in that case
> > we have everything locked down and have looked up the vma etc. so I was
> > just being paranoid here really.
>
> I was more saying, as we know that _PAGE_SPECIAL isn't currently used and
> we can only handle the !_PAGE_SPECIAL pages in these paths it might be
> prudent to check for the absence of _PAGE_SPECIAL here _anyhow_ exactly
> as you did for small pages. That prevents this code false triggering
> should someone later add _PAGE_SPECIAL for hugepages, as they would not
> be naturally drawn here to fix it.
I see. Yes that might be a good idea.
> [...]
> > > I did wonder if we could also check _PAGE_BIT_USER bit as by my reading
> > > that would only ever be set on user pages, and by rejecting pages without
> > > that set we could prevent any kernel pages being returned basically
> > > for free.
> >
> > I still do want the access_ok check to avoid any possible issues with
> > kernel page table modifications... but checking for the user bit would
> > be another good sanity check, good idea.
>
> Definatly not advocating removing any checks at all. Just thinking the
> addition is one more safety net should any one of the checks be flawed.
> Security being a pig to prove at the best of times.
It isn't a bad idea at all. I'll see what I can do.
Thanks
Nick
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-05-23 23:44 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-21 11:59 [patch 1/2] x86: implement pte_special Nick Piggin
2008-05-21 12:11 ` [patch 2/2] mm: lockless get_user_pages Nick Piggin
2008-05-22 10:28 ` Andy Whitcroft
2008-05-23 2:27 ` Nick Piggin
2008-05-23 12:31 ` apw
2008-05-23 23:44 ` Nick Piggin [this message]
2008-05-24 0:04 ` Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080523234432.GD3144@wotan.suse.de \
--to=npiggin@suse.de \
--cc=akpm@linux-foundation.org \
--cc=apw@shadowen.org \
--cc=axboe@oracle.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=shaggy@austin.ibm.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).