linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* copy_from_user_*() and buffer zeroing
@ 2013-11-26 21:07 H. Peter Anvin
  2013-11-26 21:54 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2013-11-26 21:07 UTC (permalink / raw)
  To: Ingo Molnar, Al Viro, Thomas Gleixner, Andrew Morton,
	Linux Kernel Mailing List
  Cc: Vitaly Mayatskikh

I just started looking into the horribly confused state of buffer
zeroing for the various copy_from_user variants.  This came up after we
did some minor tuning last week.

copy_from_user_inatomic() seems to be documented to not zero the buffer.
 This is definitely *NOT* true on x86-64, although it does seem to be
true on i386 -- on x86-64, we carry along a "zerorest" flag but in all
possible codepaths it will be set to true unless the remaining byte
count is zero anyway.

Furthermore, on at least x86-64, if we do an early bailout, we don't
zero the entire buffer in the case of a hard-coded 10- or 16-byte buffer
(why only those sizes is anybody's guess.)  See lines 71-88 of uaccess_64.h.

I'd like to figure out what is the required and what is the desirable
behavior here, and then fix the code accordingly.

	-hpa

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: copy_from_user_*() and buffer zeroing
  2013-11-26 21:07 copy_from_user_*() and buffer zeroing H. Peter Anvin
@ 2013-11-26 21:54 ` Andrew Morton
  2013-11-26 22:28   ` H. Peter Anvin
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2013-11-26 21:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Ingo Molnar, Al Viro, Thomas Gleixner, Linux Kernel Mailing List,
	Vitaly Mayatskikh

On Tue, 26 Nov 2013 13:07:07 -0800 "H. Peter Anvin" <hpa@zytor.com> wrote:

> I just started looking into the horribly confused state of buffer
> zeroing for the various copy_from_user variants.  This came up after we
> did some minor tuning last week.
> 
> copy_from_user_inatomic() seems to be documented to not zero the buffer.
>  This is definitely *NOT* true on x86-64, although it does seem to be
> true on i386 -- on x86-64, we carry along a "zerorest" flag but in all
> possible codepaths it will be set to true unless the remaining byte
> count is zero anyway.
> 
> Furthermore, on at least x86-64, if we do an early bailout, we don't
> zero the entire buffer in the case of a hard-coded 10- or 16-byte buffer
> (why only those sizes is anybody's guess.)  See lines 71-88 of uaccess_64.h.
> 
> I'd like to figure out what is the required and what is the desirable
> behavior here, and then fix the code accordingly.
> 

Nine years ago:

commit 7079f897164cb14f616c785d3d01629fd6a97719
Author: mingo <mingo>
Date:   Fri Aug 27 17:33:18 2004 +0000

    [PATCH] Add a few might_sleep() checks
    
    Add a whole bunch more might_sleep() checks.  We also enable might_sleep()
    checking in copy_*_user().  This was non-trivial because of the "copy_*_user()
    in atomic regions" trick would generate false positives.  Fix that up by
    adding a new __copy_*_user_inatomic(), which avoids the might_sleep() check.
    
    Only i386 is supported in this patch.


I can't think of any reason why __copy_from_user_inatomic() should be
non-zeroing.  But maybe I'm missing something - this would pretty
easily permit uninitialised data to appear in pagecache and someone
surely would have noticed..


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: copy_from_user_*() and buffer zeroing
  2013-11-26 21:54 ` Andrew Morton
@ 2013-11-26 22:28   ` H. Peter Anvin
  2013-11-26 23:04     ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: H. Peter Anvin @ 2013-11-26 22:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Al Viro, Thomas Gleixner, Linux Kernel Mailing List,
	Vitaly Mayatskikh, Murty, Ravi, neilb

On 11/26/2013 01:54 PM, Andrew Morton wrote:
> 
> Nine years ago:
> 
> commit 7079f897164cb14f616c785d3d01629fd6a97719
> Author: mingo <mingo>
> Date:   Fri Aug 27 17:33:18 2004 +0000
> 
>     [PATCH] Add a few might_sleep() checks
>     
>     Add a whole bunch more might_sleep() checks.  We also enable might_sleep()
>     checking in copy_*_user().  This was non-trivial because of the "copy_*_user()
>     in atomic regions" trick would generate false positives.  Fix that up by
>     adding a new __copy_*_user_inatomic(), which avoids the might_sleep() check.
>     
>     Only i386 is supported in this patch.
> 
> 
> I can't think of any reason why __copy_from_user_inatomic() should be
> non-zeroing.  But maybe I'm missing something - this would pretty
> easily permit uninitialised data to appear in pagecache and someone
> surely would have noticed..
> 

Yes, and the might_sleep() check is indeed bypassed.

However, the non-zeroing bit is currently motivated by the following
comment:

/**
 * __copy_from_user: - Copy a block of data from user space, with less
checking.
 * @to:   Destination address, in kernel space.
 * @from: Source address, in user space.
 * @n:    Number of bytes to copy.
 *
 * Context: User context only.  This function may sleep.
 *
 * Copy data from user space to kernel space.  Caller must check
 * the specified block with access_ok() before calling this function.
 *
 * Returns number of bytes that could not be copied.
 * On success, this will be zero.
 *
 * If some data could not be copied, this function will pad the copied
 * data to the requested size using zero bytes.
 *
 * An alternate version - __copy_from_user_inatomic() - may be called from
 * atomic context and will fail rather than sleep.  In this case the
 * uncopied bytes will *NOT* be padded with zeros.  See fs/filemap.h
 * for explanation of why this is needed.
 */

This comment is only present in the 32-bit code.  fs/filemap.h of course
no longer exists, however, the original commit seems to be
01408c4939479ec46c15aa7ef6e2406be50eeeca which puts a comment in the
(now defunct mm/filemap.h).

I have to say I don't follow the explanation in that patch.  It seems
like if you're concurrently reading a buffer being written you should
expect to get any kind of mismash...

Neil, is this still an issue?

	-hpa





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: copy_from_user_*() and buffer zeroing
  2013-11-26 22:28   ` H. Peter Anvin
@ 2013-11-26 23:04     ` NeilBrown
  2013-11-26 23:08       ` H. Peter Anvin
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2013-11-26 23:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Ingo Molnar, Al Viro, Thomas Gleixner,
	Linux Kernel Mailing List, Vitaly Mayatskikh, Murty, Ravi

[-- Attachment #1: Type: text/plain, Size: 4422 bytes --]

On Tue, 26 Nov 2013 14:28:59 -0800 "H. Peter Anvin" <hpa@zytor.com> wrote:

> On 11/26/2013 01:54 PM, Andrew Morton wrote:
> > 
> > Nine years ago:
> > 
> > commit 7079f897164cb14f616c785d3d01629fd6a97719
> > Author: mingo <mingo>
> > Date:   Fri Aug 27 17:33:18 2004 +0000
> > 
> >     [PATCH] Add a few might_sleep() checks
> >     
> >     Add a whole bunch more might_sleep() checks.  We also enable might_sleep()
> >     checking in copy_*_user().  This was non-trivial because of the "copy_*_user()
> >     in atomic regions" trick would generate false positives.  Fix that up by
> >     adding a new __copy_*_user_inatomic(), which avoids the might_sleep() check.
> >     
> >     Only i386 is supported in this patch.
> > 
> > 
> > I can't think of any reason why __copy_from_user_inatomic() should be
> > non-zeroing.  But maybe I'm missing something - this would pretty
> > easily permit uninitialised data to appear in pagecache and someone
> > surely would have noticed..
> > 
> 
> Yes, and the might_sleep() check is indeed bypassed.
> 
> However, the non-zeroing bit is currently motivated by the following
> comment:
> 
> /**
>  * __copy_from_user: - Copy a block of data from user space, with less
> checking.
>  * @to:   Destination address, in kernel space.
>  * @from: Source address, in user space.
>  * @n:    Number of bytes to copy.
>  *
>  * Context: User context only.  This function may sleep.
>  *
>  * Copy data from user space to kernel space.  Caller must check
>  * the specified block with access_ok() before calling this function.
>  *
>  * Returns number of bytes that could not be copied.
>  * On success, this will be zero.
>  *
>  * If some data could not be copied, this function will pad the copied
>  * data to the requested size using zero bytes.
>  *
>  * An alternate version - __copy_from_user_inatomic() - may be called from
>  * atomic context and will fail rather than sleep.  In this case the
>  * uncopied bytes will *NOT* be padded with zeros.  See fs/filemap.h
>  * for explanation of why this is needed.
>  */
> 
> This comment is only present in the 32-bit code.  fs/filemap.h of course
> no longer exists, however, the original commit seems to be
> 01408c4939479ec46c15aa7ef6e2406be50eeeca which puts a comment in the
> (now defunct mm/filemap.h).
> 
> I have to say I don't follow the explanation in that patch.  It seems
> like if you're concurrently reading a buffer being written you should
> expect to get any kind of mismash...
> 
> Neil, is this still an issue?
> 

I can't be certain if this is "still" and issue as many things could have
changed and I haven't been following them.
I can try to explain the original issue though.

If a process tries to read a file while another process is writing to the
same page of the same file, then it is quite reasonable for the reader to see
almost any combination of the old and the new data.  However it is wrong for
it so see something else.  In particular if the file actually contains no
nuls, and the writer doesn't write any nuls, then the read should not see any
nuls.

At the time of this patch, that could happen.
If the page contains valid data it will not be locked, and a read can succeed
at any time without further locking.
When writing to a page, filemap_copy_from_user would first try an atomic copy
and if that failed, it could write zeros into the page, which would then be
over-written by a subsequent non-atomic copy.
This leaves a small window where zeros can be seen in the page by a read (or
a memory-mapping).

A quick look at the code history shows that Nick Piggin removed the comment
from mm/filemap.h in commit  4a9e5ef1f4f15205e477817a5 and it looks like the
code was changed so it doesn't "try one way, then try another".  So it could
well be that the failure mode that caused the problem before is no longer a
possible failure mode.
And if that failure mode is no longer possible, then maybe copy_from_user
will never fail and so never has a need to fill with zeros??

The reason only i386 was changed it that it was the only arch were
copy_from_user_atomic might ever zero a tail.  Most arch just used memcpy or
similar.  powerpc is the only other arch that defined a non-trivial
copy_from_user_atomic and I confirmed at the time that it would never (need
to) zero a tail.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: copy_from_user_*() and buffer zeroing
  2013-11-26 23:04     ` NeilBrown
@ 2013-11-26 23:08       ` H. Peter Anvin
  0 siblings, 0 replies; 5+ messages in thread
From: H. Peter Anvin @ 2013-11-26 23:08 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, Ingo Molnar, Al Viro, Thomas Gleixner,
	Linux Kernel Mailing List, Vitaly Mayatskikh, Murty, Ravi,
	Nick Piggin

On 11/26/2013 03:04 PM, NeilBrown wrote:
> On Tue, 26 Nov 2013 14:28:59 -0800 "H. Peter Anvin" <hpa@zytor.com>
> wrote:
> 
>> On 11/26/2013 01:54 PM, Andrew Morton wrote:
>>> 
>>> Nine years ago:
>>> 
>>> commit 7079f897164cb14f616c785d3d01629fd6a97719 Author: mingo
>>> <mingo> Date:   Fri Aug 27 17:33:18 2004 +0000
>>> 
>>> [PATCH] Add a few might_sleep() checks
>>> 
>>> Add a whole bunch more might_sleep() checks.  We also enable
>>> might_sleep() checking in copy_*_user().  This was non-trivial
>>> because of the "copy_*_user() in atomic regions" trick would
>>> generate false positives.  Fix that up by adding a new
>>> __copy_*_user_inatomic(), which avoids the might_sleep()
>>> check.
>>> 
>>> Only i386 is supported in this patch.
>>> 
>>> 
>>> I can't think of any reason why __copy_from_user_inatomic()
>>> should be non-zeroing.  But maybe I'm missing something - this
>>> would pretty easily permit uninitialised data to appear in
>>> pagecache and someone surely would have noticed..
>>> 
>> 
>> Yes, and the might_sleep() check is indeed bypassed.
>> 
>> However, the non-zeroing bit is currently motivated by the
>> following comment:
>> 
>> /** * __copy_from_user: - Copy a block of data from user space,
>> with less checking. * @to:   Destination address, in kernel
>> space. * @from: Source address, in user space. * @n:    Number of
>> bytes to copy. * * Context: User context only.  This function may
>> sleep. * * Copy data from user space to kernel space.  Caller
>> must check * the specified block with access_ok() before calling
>> this function. * * Returns number of bytes that could not be
>> copied. * On success, this will be zero. * * If some data could
>> not be copied, this function will pad the copied * data to the
>> requested size using zero bytes. * * An alternate version -
>> __copy_from_user_inatomic() - may be called from * atomic context
>> and will fail rather than sleep.  In this case the * uncopied
>> bytes will *NOT* be padded with zeros.  See fs/filemap.h * for
>> explanation of why this is needed. */
>> 
>> This comment is only present in the 32-bit code.  fs/filemap.h of
>> course no longer exists, however, the original commit seems to
>> be 01408c4939479ec46c15aa7ef6e2406be50eeeca which puts a comment
>> in the (now defunct mm/filemap.h).
>> 
>> I have to say I don't follow the explanation in that patch.  It
>> seems like if you're concurrently reading a buffer being written
>> you should expect to get any kind of mismash...
>> 
>> Neil, is this still an issue?
>> 
> 
> I can't be certain if this is "still" and issue as many things
> could have changed and I haven't been following them. I can try to
> explain the original issue though.
> 
> If a process tries to read a file while another process is writing
> to the same page of the same file, then it is quite reasonable for
> the reader to see almost any combination of the old and the new
> data.  However it is wrong for it so see something else.  In
> particular if the file actually contains no nuls, and the writer
> doesn't write any nuls, then the read should not see any nuls.
> 
> At the time of this patch, that could happen. If the page contains
> valid data it will not be locked, and a read can succeed at any
> time without further locking. When writing to a page,
> filemap_copy_from_user would first try an atomic copy and if that
> failed, it could write zeros into the page, which would then be 
> over-written by a subsequent non-atomic copy. This leaves a small
> window where zeros can be seen in the page by a read (or a
> memory-mapping).
> 
> A quick look at the code history shows that Nick Piggin removed the
> comment from mm/filemap.h in commit  4a9e5ef1f4f15205e477817a5 and
> it looks like the code was changed so it doesn't "try one way, then
> try another".  So it could well be that the failure mode that
> caused the problem before is no longer a possible failure mode. And
> if that failure mode is no longer possible, then maybe
> copy_from_user will never fail and so never has a need to fill with
> zeros??
> 

Nick, could you perhaps comment on this?

> The reason only i386 was changed it that it was the only arch were 
> copy_from_user_atomic might ever zero a tail.  Most arch just used
> memcpy or similar.  powerpc is the only other arch that defined a
> non-trivial copy_from_user_atomic and I confirmed at the time that
> it would never (need to) zero a tail.

Well, there are several that do now...

	-hpa





^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-11-26 23:08 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-26 21:07 copy_from_user_*() and buffer zeroing H. Peter Anvin
2013-11-26 21:54 ` Andrew Morton
2013-11-26 22:28   ` H. Peter Anvin
2013-11-26 23:04     ` NeilBrown
2013-11-26 23:08       ` H. Peter Anvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).