linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check
       [not found]       ` <20130315171000.GA2342@Krystal>
@ 2013-03-15 17:21         ` Linus Torvalds
  2013-03-15 17:57           ` Mathieu Desnoyers
                             ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Linus Torvalds @ 2013-03-15 17:21 UTC (permalink / raw)
  To: Mathieu Desnoyers, linux-arch@vger.kernel.org
  Cc: Jens Axboe, security@kernel.org, Greg Kroah-Hartman, Al Viro,
	Nick Piggin

Adding linux-arch. Guys, can you check your architectures?

Also, make sure to check huge-pages if they are separate. Basically,
if you have code like this:

                if (!pte_present(pte) ||
                    pte_special(pte) || (write && !pte_write(pte))) {
                        pte_unmap(ptep);
                        return 0;
                }

it's probably buggy. It's not sufficient to just check write
permissions, you do need to check user permissions too.

Powerpc,x86 and sh seem to get it right by virtue of checking rthe
user bit. s390 checks against TASK_SIZE.

MIPS does seem buggy. Sparc I don't know the meaning of the bits for.
And powerpc does have several variants, so while the main one looks
fine, I didn't look at the other ones.

                     Linus

On Fri, Mar 15, 2013 at 10:10 AM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
> * Linus Torvalds (torvalds@linux-foundation.org) wrote:
>>
>> It's a bit subtle, but at least the x86 get-user-pages does actually
>> check access_ok() implicitly. It's just that it does so using the bits
>> in the page table, and does the page table lookup as a "user access".
>> So it checks the page tables themselves, not the user limit.
>>
>> Which is fine, because that's what the *hardware* does. So if the page
>> tables make something readable to users, then they are readable by
>> definition.
>>
>> So get_user_pages_fast() doesn't need access_ok() before it, and the
>> naming isn't actually confusing. And I'm sure we knew this at some
>> point.
>
> Ah, I see! so my guess is that it is expected that "gup_*" functions
> implicitly check that they are getting user pages. If we look at this
> through fresh eyes, across all architectures:
>
> * x86: looks OK: gup* checks with _PAGE_USER flag. The slow path that
>   goes through __get_user_pages() seem to rely on follow_page_mask() and
>   then pgd_bad() as well as pud_bad() to check the _PAGE_USER flag.
>
> * mips: access_ok missing in get_user_pages_fast,
>   -> I don't see any explicit mention of "USER" pages flags within the
>      gup functions.
>
> * powerpc: access_ok is there, everything is fine,
>
> * s390: access_ok missing in both __get_user_pages_fast and
>   get_user_pages_fast.
>   -> I don't see clear indication of USER pages being flagged.
>
> * sh: access_ok missing in get_user_pages_fast,
>   -> OK, gup_* functions are checking a _PAGE_USER flag.
>
> * sparc: access_ok missing in get_user_pages_fast,
>   -> no indication of any _PAGE_USER flag.
>
> * generic: mm/util.c:get_user_pages_fast() ends up calling
>   mm/memory.c:get_user_pages() and then __get_user_pages(), which are
>   also used as slow-path for all architectures above:
>
>   -> from my understanding, through follow_page_mask() pgd_bad() and
>      pud_bad() are checking _PAGE_USER flags (when they exist).
>      Unfortunately, the following grep is slightly worrying:

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check
  2013-03-15 17:21         ` [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check Linus Torvalds
@ 2013-03-15 17:57           ` Mathieu Desnoyers
  2013-03-15 18:01             ` Linus Torvalds
  2013-03-18  6:51           ` Benjamin Herrenschmidt
  2013-03-21 21:33           ` David Miller
  2 siblings, 1 reply; 7+ messages in thread
From: Mathieu Desnoyers @ 2013-03-15 17:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch@vger.kernel.org, Jens Axboe, security@kernel.org,
	Greg Kroah-Hartman, Al Viro, Nick Piggin

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> Adding linux-arch. Guys, can you check your architectures?
> 
> Also, make sure to check huge-pages if they are separate. Basically,
> if you have code like this:
> 
>                 if (!pte_present(pte) ||
>                     pte_special(pte) || (write && !pte_write(pte))) {
>                         pte_unmap(ptep);
>                         return 0;
>                 }
> 
> it's probably buggy. It's not sufficient to just check write
> permissions, you do need to check user permissions too.
> 
> Powerpc,x86 and sh seem to get it right by virtue of checking rthe
> user bit. s390 checks against TASK_SIZE.
> 
> MIPS does seem buggy. Sparc I don't know the meaning of the bits for.
> And powerpc does have several variants, so while the main one looks
> fine, I didn't look at the other ones.

In addition to get_user_pages_fast() issues, I see that there are many
direct callers of get_user_pages() that seem to assume that access
checks are performed within this function.  AFAIU, on architectures that
have a _PAGE_USER flag, this check is performed internally by pgd_bad()
and pud_bad(), but what happens to all the others ?

One possible way to fix this without adding unwelcomed performance
impact might be to add an access_ok check in __get_user_pages() that is
entirely skipped by architectures that define a non-nopped-out
pgd_bad()/pud_bad().

Thoughts ? 

Thanks,

Mathieu

> 
>                      Linus
> 
> On Fri, Mar 15, 2013 at 10:10 AM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> > * Linus Torvalds (torvalds@linux-foundation.org) wrote:
> >>
> >> It's a bit subtle, but at least the x86 get-user-pages does actually
> >> check access_ok() implicitly. It's just that it does so using the bits
> >> in the page table, and does the page table lookup as a "user access".
> >> So it checks the page tables themselves, not the user limit.
> >>
> >> Which is fine, because that's what the *hardware* does. So if the page
> >> tables make something readable to users, then they are readable by
> >> definition.
> >>
> >> So get_user_pages_fast() doesn't need access_ok() before it, and the
> >> naming isn't actually confusing. And I'm sure we knew this at some
> >> point.
> >
> > Ah, I see! so my guess is that it is expected that "gup_*" functions
> > implicitly check that they are getting user pages. If we look at this
> > through fresh eyes, across all architectures:
> >
> > * x86: looks OK: gup* checks with _PAGE_USER flag. The slow path that
> >   goes through __get_user_pages() seem to rely on follow_page_mask() and
> >   then pgd_bad() as well as pud_bad() to check the _PAGE_USER flag.
> >
> > * mips: access_ok missing in get_user_pages_fast,
> >   -> I don't see any explicit mention of "USER" pages flags within the
> >      gup functions.
> >
> > * powerpc: access_ok is there, everything is fine,
> >
> > * s390: access_ok missing in both __get_user_pages_fast and
> >   get_user_pages_fast.
> >   -> I don't see clear indication of USER pages being flagged.
> >
> > * sh: access_ok missing in get_user_pages_fast,
> >   -> OK, gup_* functions are checking a _PAGE_USER flag.
> >
> > * sparc: access_ok missing in get_user_pages_fast,
> >   -> no indication of any _PAGE_USER flag.
> >
> > * generic: mm/util.c:get_user_pages_fast() ends up calling
> >   mm/memory.c:get_user_pages() and then __get_user_pages(), which are
> >   also used as slow-path for all architectures above:
> >
> >   -> from my understanding, through follow_page_mask() pgd_bad() and
> >      pud_bad() are checking _PAGE_USER flags (when they exist).
> >      Unfortunately, the following grep is slightly worrying:

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check
  2013-03-15 17:57           ` Mathieu Desnoyers
@ 2013-03-15 18:01             ` Linus Torvalds
  2013-03-15 18:04               ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2013-03-15 18:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-arch@vger.kernel.org, Jens Axboe, security@kernel.org,
	Greg Kroah-Hartman, Al Viro, Nick Piggin

On Fri, Mar 15, 2013 at 10:57 AM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
>
> In addition to get_user_pages_fast() issues, I see that there are many
> direct callers of get_user_pages() that seem to assume that access
> checks are performed within this function.

get_user_pages() does check permissions. It looks up the vma and
checks them there, which is much more than access_ok() ever does.

>  AFAIU, on architectures that
> have a _PAGE_USER flag, this check is performed internally by pgd_bad()
> and pud_bad(), but what happens to all the others ?

Irrelevant. See above.

              Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check
  2013-03-15 18:01             ` Linus Torvalds
@ 2013-03-15 18:04               ` Linus Torvalds
  2013-03-15 18:07                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2013-03-15 18:04 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-arch@vger.kernel.org, Jens Axboe, security@kernel.org,
	Greg Kroah-Hartman, Al Viro, Nick Piggin

On Fri, Mar 15, 2013 at 11:01 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> get_user_pages() does check permissions. It looks up the vma and
> checks them there, which is much more than access_ok() ever does.

To clarify: the fact that a vma exists at all already means "it's a
user mapping" (with the special gate area being a secondary user
mapping). So you don't need any kind of explicit user check.

              Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check
  2013-03-15 18:04               ` Linus Torvalds
@ 2013-03-15 18:07                 ` Mathieu Desnoyers
  0 siblings, 0 replies; 7+ messages in thread
From: Mathieu Desnoyers @ 2013-03-15 18:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch@vger.kernel.org, Jens Axboe, security@kernel.org,
	Greg Kroah-Hartman, Al Viro, Nick Piggin

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> On Fri, Mar 15, 2013 at 11:01 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > get_user_pages() does check permissions. It looks up the vma and
> > checks them there, which is much more than access_ok() ever does.
> 
> To clarify: the fact that a vma exists at all already means "it's a
> user mapping" (with the special gate area being a secondary user
> mapping). So you don't need any kind of explicit user check.

Ah! That's the bit I was missing.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check
  2013-03-15 17:21         ` [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check Linus Torvalds
  2013-03-15 17:57           ` Mathieu Desnoyers
@ 2013-03-18  6:51           ` Benjamin Herrenschmidt
  2013-03-21 21:33           ` David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2013-03-18  6:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mathieu Desnoyers, linux-arch@vger.kernel.org, Jens Axboe,
	security@kernel.org, Greg Kroah-Hartman, Al Viro, Nick Piggin

On Fri, 2013-03-15 at 10:21 -0700, Linus Torvalds wrote:
> Adding linux-arch. Guys, can you check your architectures?
> 
> Also, make sure to check huge-pages if they are separate. Basically,
> if you have code like this:
> 
>                 if (!pte_present(pte) ||
>                     pte_special(pte) || (write && !pte_write(pte))) {
>                         pte_unmap(ptep);
>                         return 0;
>                 }
> 
> it's probably buggy. It's not sufficient to just check write
> permissions, you do need to check user permissions too.
> 
> Powerpc,x86 and sh seem to get it right by virtue of checking rthe
> user bit. s390 checks against TASK_SIZE.
> 
> MIPS does seem buggy. Sparc I don't know the meaning of the bits for.
> And powerpc does have several variants, so while the main one looks
> fine, I didn't look at the other ones.

Took the train half way through... I assume we are talking gup_fast
here ? So we have an access_ok() accross the range, which should make
us safe. Additionally on ppc64 we have a different pgd for user and
kernel pages anyway.

We do check for huge pages at every level as far as I can tell (and
those are user only) and finally we check for _PAGE_USER.

The only "subtlety" I can think of is that PROT_NONE has no _PAGE_USER
for us and thus will fail a gup but that's expected right ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check
  2013-03-15 17:21         ` [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check Linus Torvalds
  2013-03-15 17:57           ` Mathieu Desnoyers
  2013-03-18  6:51           ` Benjamin Herrenschmidt
@ 2013-03-21 21:33           ` David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2013-03-21 21:33 UTC (permalink / raw)
  To: torvalds
  Cc: mathieu.desnoyers, linux-arch, axboe, security, gregkh, viro,
	npiggin

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 15 Mar 2013 10:21:41 -0700

> Adding linux-arch. Guys, can you check your architectures?
 ...
>> * sparc: access_ok missing in get_user_pages_fast,
>>   -> no indication of any _PAGE_USER flag.

Well, access_ok() returns 1 unconditionally on sparc64.  I can add a
call there if you want.  :-)

And that's because the user virtual address space is %100 segregated
from the privileged one.  There are no "user address ranges"
vs. kernel ones.

For example, we can and do give 64-bit processes the full 64-bit
virtual address space for user mappings.

The page tables, outside of swapper_pg_dir, only map user pages.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-03-21 21:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20130315133748.GA31887@Krystal>
     [not found] ` <20130315152326.GM31875@kernel.dk>
     [not found]   ` <20130315155808.GB1659@Krystal>
     [not found]     ` <CA+55aFxW0vkpgJCpJVJVqDmDG61P_AOoVMFVhfqVxM45Mj-LNA@mail.gmail.com>
     [not found]       ` <20130315171000.GA2342@Krystal>
2013-03-15 17:21         ` [RFC PATCH (resend)] block layer zero-copy: missing access_ok() check Linus Torvalds
2013-03-15 17:57           ` Mathieu Desnoyers
2013-03-15 18:01             ` Linus Torvalds
2013-03-15 18:04               ` Linus Torvalds
2013-03-15 18:07                 ` Mathieu Desnoyers
2013-03-18  6:51           ` Benjamin Herrenschmidt
2013-03-21 21:33           ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).