Re: file offset corruption on 32-bit machines?

linux-fsdevel.vger.kernel.org archive mirror

* Re: file offset corruption on 32-bit machines?
       [not found] <Pine.SOC.4.64.0804081101430.28938@math.ut.ee>
@ 2008-04-10 13:55 ` Michal Hocko
  2008-04-10 14:01   ` Jiri Kosina
                     ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Michal Hocko @ 2008-04-10 13:55 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Linux Kernel list, linux-fsdevel

[Adding fsdevel list]

On Tuesday 08 April 2008 10:05:47 am Meelis Roos wrote:
> Jeff Robertson analyzes the behaviour of different operating systems'
> 64-bit file offset implementation and concludes that on 32-bit
> machines, Linux and Solaris lack any locking to keep the two 32-bit
> halves in sync and this could cause rare file offset corruption.
>
> http://jeffr-tech.livejournal.com/21014.html

AFAICS, this race is theoretically possible, but it is very hard (almost 
impossible) to trigger with a sane file usage pattern. 
Note that you have to access shared struct file (same file descriptor) in 
different threads which should be synchronized by caller anyway (*).

I also don't see any security implications from this race, but maybe someone 
with more knowledge about fs can see (f_pos is used at many places in the
kernel code).

I think that it is better to live with tiny-race-on-broken-patterns rather 
than paying for synchronization which is not needed for correct paths. 

[*] file_pos_{read,write} (fs/read_write.c) are not called under lock (in 
sys_read, sys_write, ...), so even if f_pos is written atomically, you will 
be able to get races when accessing shared descriptor from different threads.
I think that POSIX states, that behavior is undefined under these conditions.

Best regards
-- 
Michal Hocko
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-10 13:55 ` Michal Hocko
@ 2008-04-10 14:01   ` Jiri Kosina
  2008-04-10 14:27     ` Jan Kara
  2008-04-10 14:31     ` Michal Hocko
  2008-04-10 14:11   ` Martin Mares
  2008-04-10 15:33   ` Andi Kleen
  2 siblings, 2 replies; 53+ messages in thread
From: Jiri Kosina @ 2008-04-10 14:01 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Meelis Roos, Linux Kernel list, linux-fsdevel

On Thu, 10 Apr 2008, Michal Hocko wrote:

> > Jeff Robertson analyzes the behaviour of different operating systems'
> > 64-bit file offset implementation and concludes that on 32-bit
> > machines, Linux and Solaris lack any locking to keep the two 32-bit
> > halves in sync and this could cause rare file offset corruption.
> > http://jeffr-tech.livejournal.com/21014.html
> AFAICS, this race is theoretically possible, but it is very hard (almost 
> impossible) to trigger with a sane file usage pattern. Note that you 
> have to access shared struct file (same file descriptor) in different 
> threads which should be synchronized by caller anyway (*).

... but not in cases the caller is an intentionally evil code, right? :)

> I also don't see any security implications from this race, but maybe 
> someone with more knowledge about fs can see (f_pos is used at many
> places in the kernel code).

The f_pos races are in fact exploitable, we've already been there. See 
for example http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt

-- 
Jiri Kosina
SUSE Labs



* Re: file offset corruption on 32-bit machines?
  2008-04-10 13:55 ` Michal Hocko
  2008-04-10 14:01   ` Jiri Kosina
@ 2008-04-10 14:11   ` Martin Mares
  2008-04-10 15:12     ` Jan Kara
  2008-04-10 15:14     ` Jamie Lokier
  2008-04-10 15:33   ` Andi Kleen
  2 siblings, 2 replies; 53+ messages in thread
From: Martin Mares @ 2008-04-10 14:11 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Meelis Roos, Linux Kernel list, linux-fsdevel

Hello!

> [*] file_pos_{read,write} (fs/read_write.c) are not called under lock (in 
> sys_read, sys_write, ...), so even if f_pos is written atomically, you will 
> be able to get races when accessing shared descriptor from different threads.

There are however cases when such behavior is perfectly valid: For example
you can have a file of records of a fixed size, whose order does not matter.
Then multiple processes can produce the records in parallel, sharing
a single fd.

> I think that POSIX states, that behavior is undefined under these conditions.

Do you have a pointer to that?

				Have a nice fortnight
-- 
Martin `MJ' Mares                          <mj@ucw.cz>   http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"Mr. Worf, scan that ship."  "Aye, Captain... 600 DPI?"


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:01   ` Jiri Kosina
@ 2008-04-10 14:27     ` Jan Kara
  2008-04-10 14:31       ` Jiri Kosina
  2008-04-11 19:26       ` Pavel Machek
  2008-04-10 14:31     ` Michal Hocko
  1 sibling, 2 replies; 53+ messages in thread
From: Jan Kara @ 2008-04-10 14:27 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Michal Hocko, Meelis Roos, Linux Kernel list, linux-fsdevel

> On Thu, 10 Apr 2008, Michal Hocko wrote:
> 
> > > Jeff Robertson analyzes the behaviour of different operating systems'
> > > 64-bit file offset implementation and concludes that on 32-bit
> > > machines, Linux and Solaris lack any locking to keep the two 32-bit
> > > halves in sync and this could cause rare file offset corruption.
> > > http://jeffr-tech.livejournal.com/21014.html
> > AFAICS, this race is theoretically possible, but it is very hard (almost 
> > impossible) to trigger with a sane file usage pattern. Note that you 
> > have to access shared struct file (same file descriptor) in different 
> > threads which should be synchronized by caller anyway (*).
> 
> ... but not in cases the caller is an intentionally evil code, right? :)
  Yes.

> > I also don't see any security implications from this race, but maybe 
> > someone with more knowledge about fs can see (f_pos is used at many
> > places in the kernel code).
> 
> The f_pos races are in fact exploitable, we've already been there. See 
> for example http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt
  Well, this race is more subtle - the window is just one instruction
wide (stores to f_pos from CPU2 must come between the store of lower and
upper 32-bits of f_pos on CPU1). And the only result is that f_pos has
32-bits from one file pointer and 32-bits from the other one. So I can
hardly imagine this would be exploitable...
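  To make the interleaving concrete, here is a minimal userspace sketch (not
kernel code) that replays it deterministically. The names split_pos,
store_low, store_high, load_pos and torn_store are made up for illustration,
and the low-half-first store order is an assumption about the generated code:

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit offset kept as two 32-bit words, as a 32-bit CPU stores it. */
struct split_pos {
    uint32_t lo;
    uint32_t hi;
};

static_assert(sizeof(struct split_pos) == sizeof(uint64_t),
              "two 32-bit halves make up one 64-bit offset");

void store_low(struct split_pos *p, uint64_t v)
{
    p->lo = (uint32_t)v;
}

void store_high(struct split_pos *p, uint64_t v)
{
    p->hi = (uint32_t)(v >> 32);
}

uint64_t load_pos(const struct split_pos *p)
{
    return ((uint64_t)p->hi << 32) | p->lo;
}

/*
 * Replay the race by hand: CPU1 stores the low half of a, CPU2 then stores
 * all of b, and finally CPU1 stores the high half of a.
 */
uint64_t torn_store(uint64_t a, uint64_t b)
{
    struct split_pos pos = { 0, 0 };

    store_low(&pos, a);     /* CPU1: first store instruction */
    store_low(&pos, b);     /* CPU2: complete store ... */
    store_high(&pos, b);    /* ... of both halves */
    store_high(&pos, a);    /* CPU1: second store instruction */

    return load_pos(&pos);  /* high half of a, low half of b */
}
```

With a = 0x1FFFFFFFF and b = 0x200000000 the result is 0x100000000, a value
that neither CPU ever stored.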

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:27     ` Jan Kara
@ 2008-04-10 14:31       ` Jiri Kosina
  2008-04-10 14:48         ` Matthew Wilcox
                           ` (2 more replies)
  2008-04-11 19:26       ` Pavel Machek
  1 sibling, 3 replies; 53+ messages in thread
From: Jiri Kosina @ 2008-04-10 14:31 UTC (permalink / raw)
  To: Jan Kara; +Cc: Michal Hocko, Meelis Roos, Linux Kernel list, linux-fsdevel

On Thu, 10 Apr 2008, Jan Kara wrote:

> > The f_pos races are in fact exploitable, we've already been there. See 
> > for example http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt
>   Well, this race is more subtle - the window is just one instruction
> wide (stores to f_pos from CPU2 must come between the store of lower and
> upper 32-bits of f_pos on CPU1). And the only result is that f_pos has
> 32-bits from one file pointer and 32-bits from the other one. So I can
> hardly imagine this would be exploitable...

Supposing you are not holding any spinlock and are running with 
preemptible kernel (pretty common scenario nowadays), there is nothing 
that would prevent kernel from rescheduling between the two instructions, 
enlarging the race window to be more comfortable for attacker, right?

I think this is worth fixing.

-- 
Jiri Kosina
SUSE Labs


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:01   ` Jiri Kosina
  2008-04-10 14:27     ` Jan Kara
@ 2008-04-10 14:31     ` Michal Hocko
  2008-04-10 14:35       ` Jiri Kosina
  1 sibling, 1 reply; 53+ messages in thread
From: Michal Hocko @ 2008-04-10 14:31 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Meelis Roos, Linux Kernel list, linux-fsdevel

On Thursday 10 April 2008 04:01:27 pm Jiri Kosina wrote:
> On Thu, 10 Apr 2008, Michal Hocko wrote:
> > > Jeff Robertson analyzes the behaviour of different operating systems'
> > > 64-bit file offset implementation and concludes that on 32-bit
> > > machines, Linux and Solaris lack any locking to keep the two 32-bit
> > > halves in sync and this could cause rare file offset corruption.
> > > http://jeffr-tech.livejournal.com/21014.html
> >
> > AFAICS, this race is theoretically possible, but it is very hard (almost
> > impossible) to trigger with a sane file usage pattern. Note that you
> > have to access shared struct file (same file descriptor) in different
> > threads which should be synchronized by caller anyway (*).
>
> ... but not in cases the caller is an intentionally evil code, right? :)

Ok, but evil code needs to have access to your struct file, and in such a case
it can do worse things ;)
Or do you have some concrete (innocent-looking) example?

>
> > I also don't see any security implications from this race, but maybe
> > someone with more knowledge about fs can see (f_pos is used at many
> > places in the kernel code).
>
> The f_pos races are in fact exploitable, we've already been there. See
> for example http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt

This is a different race on the file position, IMO. If I understand the report
correctly, the problem was a sleeping copy_to_user during which f_pos
changed.

Best regards
-- 
Michal Hocko
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic 


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:31     ` Michal Hocko
@ 2008-04-10 14:35       ` Jiri Kosina
  0 siblings, 0 replies; 53+ messages in thread
From: Jiri Kosina @ 2008-04-10 14:35 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Meelis Roos, Linux Kernel list, linux-fsdevel

On Thu, 10 Apr 2008, Michal Hocko wrote:

> This is a different race on the file position, IMO. If I understand the
> report correctly, the problem was a sleeping copy_to_user during which
> f_pos changed.

Is this really in principle different from being rescheduled between
the two mov instructions?

-- 
Jiri Kosina
SUSE Labs


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:31       ` Jiri Kosina
@ 2008-04-10 14:48         ` Matthew Wilcox
  2008-04-10 15:22           ` Jan Kara
  2008-04-10 15:19         ` Jan Kara
  2008-04-10 16:03         ` Diego Calleja
  2 siblings, 1 reply; 53+ messages in thread
From: Matthew Wilcox @ 2008-04-10 14:48 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jan Kara, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Thu, Apr 10, 2008 at 04:31:09PM +0200, Jiri Kosina wrote:
> >   Well, this race is more subtle - the window is just one instruction
> > wide (stores to f_pos from CPU2 must come between the store of lower and
> > upper 32-bits of f_pos on CPU1). And the only result is that f_pos has
> > 32-bits from one file pointer and 32-bits from the other one. So I can
> > hardly imagine this would be exploitable...
> 
> Supposing you are not holding any spinlock and are running with 
> preemptible kernel (pretty common scenario nowadays), there is nothing 
> that would prevent kernel from rescheduling between the two instructions, 
> enlarging the race window to be more comfortable for attacker, right?
> 
> I think this is worth fixing.

Seems a lot like reading jiffies to me.  Is the seqlock the right
solution to use for fixing this?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:11   ` Martin Mares
@ 2008-04-10 15:12     ` Jan Kara
  2008-04-10 15:14     ` Jamie Lokier
  1 sibling, 0 replies; 53+ messages in thread
From: Jan Kara @ 2008-04-10 15:12 UTC (permalink / raw)
  To: Martin Mares; +Cc: Michal Hocko, Meelis Roos, Linux Kernel list, linux-fsdevel

> Hello!
> 
> > [*] file_pos_{read,write} (fs/read_write.c) are not called under lock (in 
> > sys_read, sys_write, ...), so even if f_pos is written atomically, you will 
> > be able to get races when accessing shared descriptor from different threads.
> 
> There are however cases when such behavior is perfectly valid: For example
> you can have a file of records of a fixed size, whose order does not matter.
> Then multiple processes can produce the records in parallel, sharing
> a single fd.
  Well, but no one guarantees that both processes don't read the same
data.

> > I think that POSIX states, that behavior is undefined under these conditions.
> 
> Do you have a pointer to that?
  SUSv3 says:
On files that support seeking (for example, a regular file), the read()
shall start at a position in the file given by the file offset
associated with fildes. The file offset shall be incremented by the
number of bytes actually read.

  But it is nowhere specified when this happens, so the OS is perfectly free to
advance f_pos after the read finishes, when a read from the other process is
already running. And Linux does exactly that - actually, we do:
  pos = f_pos
  do reading which advances pos
  f_pos = pos

  So in theory it can even happen that one thread reads entries 1,2,3,2
because the other thread has in the meantime finished reading entry 1...
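  The sequence can be replayed in a small userspace simulation. This is only
a sketch: sim_file, read_begin, read_finish and replay are hypothetical names,
every record is one unit long, and the two "threads" are interleaved by hand
rather than actually run in parallel:

```c
#include <assert.h>

/* Hypothetical stand-in for struct file; only the offset matters here. */
struct sim_file {
    long f_pos;
};

/* First half of a read: pos = f_pos. */
long read_begin(const struct sim_file *f)
{
    return f->f_pos;
}

/*
 * Second half of a read: consume one record at pos, then f_pos = pos.
 * Returns the 1-based number of the record that was read.
 */
long read_finish(struct sim_file *f, long pos)
{
    assert(pos >= 0);
    f->f_pos = pos + 1;
    return pos + 1;
}

/*
 * Thread A performs four complete reads. Thread B's single read starts
 * before A's first read and finishes after A's third one.
 * The records seen by thread A are stored in seen[0..3].
 */
void replay(long seen[4])
{
    struct sim_file f = { 0 };
    long b_pos = read_begin(&f);          /* B snapshots f_pos == 0 */
    int i;

    for (i = 0; i < 3; i++)               /* A reads records 1, 2, 3 */
        seen[i] = read_finish(&f, read_begin(&f));

    (void)read_finish(&f, b_pos);         /* B finishes: f_pos drops to 1 */

    seen[3] = read_finish(&f, read_begin(&f));   /* A reads record 2 again */
}
```

Note that nothing here depends on the two 32-bit halves; the late write-back
alone is enough to move f_pos backwards.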

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:11   ` Martin Mares
  2008-04-10 15:12     ` Jan Kara
@ 2008-04-10 15:14     ` Jamie Lokier
  2008-04-10 15:21       ` Matthew Wilcox
  2008-04-10 15:28       ` Jan Kara
  1 sibling, 2 replies; 53+ messages in thread
From: Jamie Lokier @ 2008-04-10 15:14 UTC (permalink / raw)
  To: Martin Mares; +Cc: Michal Hocko, Meelis Roos, Linux Kernel list, linux-fsdevel

Martin Mares wrote:
> > [*] file_pos_{read,write} (fs/read_write.c) are not called under
> > lock (in sys_read, sys_write, ...), so even if f_pos is written
> > atomically, you will be able to get races when accessing shared
> > descriptor from different threads.
> 
> There are however cases when such behavior is perfectly valid: For example
> you can have a file of records of a fixed size, whose order does not matter.
> Then multiple processes can produce the records in parallel, sharing
> a single fd.

A rather more common thing:

Does this problem apply when appending lines or records to a log file,
with or without O_APPEND?

Also, can this problem affect programs doing concurrent reads/writes
using pread/pwrite (or the AIO equivalents)?

-- Jamie


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:31       ` Jiri Kosina
  2008-04-10 14:48         ` Matthew Wilcox
@ 2008-04-10 15:19         ` Jan Kara
  2008-04-10 15:37           ` Michal Hocko
  2008-04-10 16:03         ` Diego Calleja
  2 siblings, 1 reply; 53+ messages in thread
From: Jan Kara @ 2008-04-10 15:19 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Michal Hocko, Meelis Roos, Linux Kernel list, linux-fsdevel

> On Thu, 10 Apr 2008, Jan Kara wrote:
> 
> > > The f_pos races are in fact exploitable, we've already been there. See 
> > > for example http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt
> >   Well, this race is more subtle - the window is just one instruction
> > wide (stores to f_pos from CPU2 must come between the store of lower and
> > upper 32-bits of f_pos on CPU1). And the only result is that f_pos has
> > 32-bits from one file pointer and 32-bits from the other one. So I can
> > hardly imagine this would be exploitable...
> 
> Supposing you are not holding any spinlock and are running with 
> preemptible kernel (pretty common scenario nowadays), there is nothing 
> that would prevent kernel from rescheduling between the two instructions, 
> enlarging the race window to be more comfortable for attacker, right?
  Yes, this is theoretically possible.

> I think this is worth fixing.
  Hmm, maybe it is, although I still don't see how to exploit it :).

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs


* Re: file offset corruption on 32-bit machines?
  2008-04-10 15:14     ` Jamie Lokier
@ 2008-04-10 15:21       ` Matthew Wilcox
  2008-04-10 15:28       ` Jan Kara
  1 sibling, 0 replies; 53+ messages in thread
From: Matthew Wilcox @ 2008-04-10 15:21 UTC (permalink / raw)
  To: Martin Mares, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Thu, Apr 10, 2008 at 04:14:06PM +0100, Jamie Lokier wrote:
> Also, can this problem affect programs doing concurrent reads/writes
> using pread/pwrite (or the AIO equivalents)?

pread/pwrite specify an explicit offset and do not change the file
offset, so there's no way they can be affected.  See the manpage.
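A minimal sketch of that behaviour, assuming a POSIX system with a writable
/tmp; offset_after_pread is a made-up helper name:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write ten bytes to a temporary file, move the file offset to 2, pread()
 * four bytes at offset 5, and return the file offset seen afterwards
 * (or -1 on any error).
 */
off_t offset_after_pread(void)
{
    char path[] = "/tmp/fposXXXXXX";
    char buf[4];
    off_t pos = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                               /* file lives until close() */

    if (write(fd, "0123456789", 10) == 10 &&    /* offset is now 10 */
        lseek(fd, 2, SEEK_SET) == 2 &&          /* move it to 2 */
        pread(fd, buf, sizeof(buf), 5) == 4) {  /* explicit offset 5 */
        assert(memcmp(buf, "5678", 4) == 0);    /* data came from offset 5 */
        pos = lseek(fd, 0, SEEK_CUR);           /* query the file offset */
    }

    close(fd);
    return pos;
}
```

The function should return 2: the pread() at offset 5 leaves the descriptor's
offset where lseek() put it.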

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:48         ` Matthew Wilcox
@ 2008-04-10 15:22           ` Jan Kara
  2008-04-10 15:30             ` Matthew Wilcox
  0 siblings, 1 reply; 53+ messages in thread
From: Jan Kara @ 2008-04-10 15:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

> On Thu, Apr 10, 2008 at 04:31:09PM +0200, Jiri Kosina wrote:
> > >   Well, this race is more subtle - the window is just one instruction
> > > wide (stores to f_pos from CPU2 must come between the store of lower and
> > > upper 32-bits of f_pos on CPU1). And the only result is that f_pos has
> > > 32-bits from one file pointer and 32-bits from the other one. So I can
> > > hardly imagine this would be exploitable...
> > 
> > Supposing you are not holding any spinlock and are running with 
> > preemptible kernel (pretty common scenario nowadays), there is nothing 
> > that would prevent kernel from rescheduling between the two instructions, 
> > enlarging the race window to be more comfortable for attacker, right?
> > 
> > I think this is worth fixing.
> 
> Seems a lot like reading jiffies to me.  Is the seqlock the right
> solution to use for fixing this?
  You can get your inspiration in the implementation of i_size_read()
and i_size_write() functions :). They deal with exactly the same problem.
But in the case of f_pos, the number of readers and writers is balanced so
maybe a spinlock would be fine as well...

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs


* Re: file offset corruption on 32-bit machines?
  2008-04-10 15:14     ` Jamie Lokier
  2008-04-10 15:21       ` Matthew Wilcox
@ 2008-04-10 15:28       ` Jan Kara
  1 sibling, 0 replies; 53+ messages in thread
From: Jan Kara @ 2008-04-10 15:28 UTC (permalink / raw)
  To: Martin Mares, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

> Martin Mares wrote:
> > > [*] file_pos_{read,write} (fs/read_write.c) are not called under
> > > lock (in sys_read, sys_write, ...), so even if f_pos is written
> > > atomically, you will be able to get races when accessing shared
> > > descriptor from different threads.
> > 
> > There are however cases when such behavior is perfectly valid: For example
> > you can have a file of records of a fixed size, whose order does not matter.
> > Then multiple processes can produce the records in parallel, sharing
> > a single fd.
> 
> A rather more common thing:
> 
> Does this problem apply when appending lines or records to a log file,
> with or without O_APPEND?
  O_APPEND works correctly in all cases (it ignores f_pos in the
descriptor). Without O_APPEND you can hit the race (but I'd like to see
a sensible use case of this ;).
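  The O_APPEND half of this can be checked from userspace. A minimal sketch,
assuming a POSIX system with a writable /tmp (append_ignores_offset is a
made-up helper name):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if a write through an O_APPEND descriptor lands at end of file
 * even though the offset was just rewound to 0, and -1 on any error
 * (including the data not ending up where expected).
 */
int append_ignores_offset(void)
{
    char path[] = "/tmp/fappXXXXXX";
    char buf[6];
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "aaaa", 4) != 4) {            /* initial file contents */
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    fd = open(path, O_RDWR | O_APPEND);
    if (fd >= 0) {
        if (lseek(fd, 0, SEEK_SET) == 0 &&      /* rewind the offset */
            write(fd, "bb", 2) == 2 &&          /* appended regardless */
            pread(fd, buf, sizeof(buf), 0) == 6 &&
            memcmp(buf, "aaaabb", 6) == 0)
            ok = 1;
        close(fd);
    }

    unlink(path);
    return ok;
}
```

Without O_APPEND the second write would overwrite the start of the file
instead, which is the case where the shared offset matters.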

> Also, can this problem affect programs doing concurrent reads/writes
> using pread/pwrite (or the AIO equivalents)?
  As Matthew said, pread/pwrite are safe, parallel read can hit the race,
write was described above...

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs


* Re: file offset corruption on 32-bit machines?
  2008-04-10 15:22           ` Jan Kara
@ 2008-04-10 15:30             ` Matthew Wilcox
  0 siblings, 0 replies; 53+ messages in thread
From: Matthew Wilcox @ 2008-04-10 15:30 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Thu, Apr 10, 2008 at 05:22:12PM +0200, Jan Kara wrote:
>   You can get your inspiration in the implementation of i_size_read()
> and i_size_write() functions :). They deal with exactly the same problem.
> But in the case of f_pos, the number of readers and writers is balanced so
> maybe a spinlock would be fine as well...

It's not quite balanced -- see sys_getdents() for a counterexample.

i_size_read/write use a seqcount rather than a seqlock, but the
principle is the same.
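For reference, the retry scheme looks roughly like this in userspace C11.
This is only a sketch of the principle, not the kernel's
i_size_read()/i_size_write(): seq_pos, pos_write and pos_read are made-up
names, sequentially consistent atomics stand in for the kernel's barriers,
and writers are assumed to be serialized by some other means:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct seq_pos {
    atomic_uint seq;        /* odd while a write is in progress */
    _Atomic uint32_t lo;    /* low half of the offset */
    _Atomic uint32_t hi;    /* high half of the offset */
};

/* Writer side; assumes no other writer runs at the same time. */
void pos_write(struct seq_pos *p, uint64_t v)
{
    unsigned int s = atomic_load(&p->seq);

    assert((s & 1u) == 0);                  /* no write already in progress */
    atomic_store(&p->seq, s + 1);           /* mark the value as unstable */
    atomic_store(&p->lo, (uint32_t)v);
    atomic_store(&p->hi, (uint32_t)(v >> 32));
    atomic_store(&p->seq, s + 2);           /* stable again */
}

/* Reader side; retries until it sees the same even count before and after. */
uint64_t pos_read(struct seq_pos *p)
{
    unsigned int before, after;
    uint32_t lo, hi;

    do {
        before = atomic_load(&p->seq);
        lo = atomic_load(&p->lo);
        hi = atomic_load(&p->hi);
        after = atomic_load(&p->seq);
    } while ((before & 1u) || before != after);

    return ((uint64_t)hi << 32) | lo;
}
```

A reader that overlaps a write sees an odd or changed sequence number and
simply retries, so it never returns a mix of two offsets. The cost is on the
read side only when a write actually interferes.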

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: file offset corruption on 32-bit machines?
  2008-04-10 13:55 ` Michal Hocko
  2008-04-10 14:01   ` Jiri Kosina
  2008-04-10 14:11   ` Martin Mares
@ 2008-04-10 15:33   ` Andi Kleen
  2 siblings, 0 replies; 53+ messages in thread
From: Andi Kleen @ 2008-04-10 15:33 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Meelis Roos, Linux Kernel list, linux-fsdevel

Michal Hocko <mhocko@suse.cz> writes:

> [Adding fsdevel list]
>
> On Tuesday 08 April 2008 10:05:47 am Meelis Roos wrote:
>> Jeff Robertson analyzes the behaviour of different operating systems'
>> 64-bit file offset implementation and concludes that on 32-bit
>> machines, Linux and Solaris lack any locking to keep the two 32-bit
>> halves in sync and this could cause rare file offset corruption.
>>
>> http://jeffr-tech.livejournal.com/21014.html
>
> AFAICS, this race is theoretically possible, but it is very hard (almost 
> impossible) to trigger with a sane file usage pattern. 

We discussed this extensively some time ago in 

http://thread.gmane.org/gmane.linux.file-systems/20712/focus=20771

No solution so far

-Andi


* Re: file offset corruption on 32-bit machines?
  2008-04-10 15:19         ` Jan Kara
@ 2008-04-10 15:37           ` Michal Hocko
  2008-04-10 15:56             ` Jan Kara
  0 siblings, 1 reply; 53+ messages in thread
From: Michal Hocko @ 2008-04-10 15:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jiri Kosina, Meelis Roos, Linux Kernel list, linux-fsdevel

On Thursday 10 April 2008 05:19:45 pm Jan Kara wrote:
> > On Thu, 10 Apr 2008, Jan Kara wrote:
> > > > The f_pos races are in fact exploitable, we've already been there.
> > > > See for example
> > > > http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt
> > >
> > >   Well, this race is more subtle - the window is just one instruction
> > > wide (stores to f_pos from CPU2 must come between the store of lower
> > > and upper 32-bits of f_pos on CPU1). And the only result is that f_pos
> > > has 32-bits from one file pointer and 32-bits from the other one. So I
> > > can hardly imagine this would be exploitable...
> >
> > Supposing you are not holding any spinlock and are running with
> > preemptible kernel (pretty common scenario nowadays), there is nothing
> > that would prevent kernel from rescheduling between the two instructions,
> > enlarging the race window to be more comfortable for attacker, right?
>
>   Yes, this is theoretically possible.
>
> > I think this is worth fixing.
>
>   Hmm, maybe it is, although I still don't see how to exploit it :).

Maybe (just a guess) some high-priority malicious process could try to preempt
the reading thread always at the bad moment (when half of f_pos is
written), thus forcing it to read bad data (you usually don't check that the
file position grows after each read; you only wait for the end of the
file).
But I do agree, I still don't see anything with real security implications
(privileged processes usually don't work with such big files).

>
> 									Honza


Best regards
-- 
Michal Hocko
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic 


* Re: file offset corruption on 32-bit machines?
  2008-04-10 15:37           ` Michal Hocko
@ 2008-04-10 15:56             ` Jan Kara
  0 siblings, 0 replies; 53+ messages in thread
From: Jan Kara @ 2008-04-10 15:56 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Jiri Kosina, Meelis Roos, Linux Kernel list, linux-fsdevel

On Thu 10-04-08 17:37:16, Michal Hocko wrote:
> On Thursday 10 April 2008 05:19:45 pm Jan Kara wrote:
> > > On Thu, 10 Apr 2008, Jan Kara wrote:
> > > > > The f_pos races are in fact exploitable, we've already been there.
> > > > > See for example
> > > > > http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt
> > > >
> > > >   Well, this race is more subtle - the window is just one instruction
> > > > wide (stores to f_pos from CPU2 must come between the store of lower
> > > > and upper 32-bits of f_pos on CPU1). And the only result is that f_pos
> > > > has 32-bits from one file pointer and 32-bits from the other one. So I
> > > > can hardly imagine this would be exploitable...
> > >
> > > Supposing you are not holding any spinlock and are running with
> > > preemptible kernel (pretty common scenario nowadays), there is nothing
> > > that would prevent kernel from rescheduling between the two instructions,
> > > enlarging the race window to be more comfortable for attacker, right?
> >
> >   Yes, this is theoretically possible.
> >
> > > I think this is worth fixing.
> >
> >   Hmm, maybe it is, although I still don't see how to exploit it :).
> 
> Maybe (just a guess) some high-priority malicious process could try to preempt
> the reading thread always at the bad moment (when half of f_pos is
> written), thus forcing it to read bad data (you usually don't check that the
> file position grows after each read; you only wait for the end of the
> file).
> But I do agree, I still don't see anything with real security implications
> (privileged processes usually don't work with such big files).
  Well, but for this to work the process you try to attack must access the
file from several threads in parallel without any locking... And I'm not
aware of anybody really doing this.
  Really the only attack vector I could imagine is that you create several
malicious processes which will try to corrupt f_pos and then use it (like
if you could make it negative, I could imagine this could trigger some bug
somewhere). But since possible corruptions are quite limited, I don't see
how to corrupt f_pos to something at least remotely "useful".

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:31       ` Jiri Kosina
  2008-04-10 14:48         ` Matthew Wilcox
  2008-04-10 15:19         ` Jan Kara
@ 2008-04-10 16:03         ` Diego Calleja
  2008-04-10 16:15           ` Jan Kara
  2 siblings, 1 reply; 53+ messages in thread
From: Diego Calleja @ 2008-04-10 16:03 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jan Kara, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

El Thu, 10 Apr 2008 16:31:09 +0200 (CEST), Jiri Kosina <jkosina@suse.cz> escribió:

> I think this is worth fixing.

This question comes up very often, and Linus even wrote a patch
(http://lkml.org/lkml/2006/4/13/124 , http://lkml.org/lkml/2006/4/13/130)

But apparently there's not much interest in fixing it, because it would
slow down some workloads...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: file offset corruption on 32-bit machines?
  2008-04-10 16:03         ` Diego Calleja
@ 2008-04-10 16:15           ` Jan Kara
  0 siblings, 0 replies; 53+ messages in thread
From: Jan Kara @ 2008-04-10 16:15 UTC (permalink / raw)
  To: Diego Calleja
  Cc: Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Thu 10-04-08 18:03:35, Diego Calleja wrote:
> El Thu, 10 Apr 2008 16:31:09 +0200 (CEST), Jiri Kosina <jkosina@suse.cz> escribió:
> 
> > I think this is worth fixing.
> 
> This question comes up very often, and Linus even wrote a patch
> (http://lkml.org/lkml/2006/4/13/124 , http://lkml.org/lkml/2006/4/13/130)
> 
> But apparently there's not much interest in fixing it, because it would
> slow down some workloads...
  Well, what Linus writes about is a different issue (and with a more
costly solution). Here we are concerned just with the problem that
  file->f_pos = pos;
isn't atomic on some archs.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: file offset corruption on 32-bit machines?
       [not found]         ` <ah7vN-7Wz-9@gated-at.bofh.it>
@ 2008-04-11 12:24           ` Bodo Eggert
  2008-04-11 13:55             ` Lennart Sorensen
  0 siblings, 1 reply; 53+ messages in thread
From: Bodo Eggert @ 2008-04-11 12:24 UTC (permalink / raw)
  To: Diego Calleja, Jiri Kosina, Jan Kara, Michal Hocko, Meelis Roos,
	Linux Kernel list <linux-kernel

Diego Calleja <diegocg@gmail.com> wrote:

> El Thu, 10 Apr 2008 16:31:09 +0200 (CEST), Jiri Kosina <jkosina@suse.cz>
> escribió:
> 
>> I think this is worth fixing.
> 
> This question comes up very often, and Linus even wrote a patch
> (http://lkml.org/lkml/2006/4/13/124 , http://lkml.org/lkml/2006/4/13/130)
> 
> But apparently there's not much interest in fixing it, because it would
> slow down some workloads...

As far as I understand, the race is e.g.:

fpos := A:a, we want to make process/thread a read A:b or B:a without it
being a correct value in fpos. a!=b!=c, A!=B, A!=C.

a: read fpos.high (A:?)
b: write fpos (B:b)
a: read fpos.low (A:b)


If you change this to 

a: read fpos.high
a: read fpos.low
a: read fpos.high
a: read fpos.low

and compare the results, you need to

a: read fpos.high (A:?)
b: write fpos (B:b)
a: read fpos.low (A:b)
b: write fpos (A:c)
a: read fpos.high (A:b),(A:?)
b: write fpos (C:b)
a: read fpos.low (A:b),(A:b)

That would be winning three races in order to hit the bug. 


OTOH, writers MUST NOT be interrupted, because:

b: write fpos.high (B:a)
a: read fpos.high (B:?)
a: read fpos.low (B:a)
a: read fpos.high (B:a),(B:?)
a: read fpos.low (B:a),(B:a)
b: write fpos.low (B:b)
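
A small userspace sketch of this last case, with the stalled writer replayed
by hand (split_pos, read_pos, read_twice and stalled_writer are names
invented here, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

struct split_pos {
    uint32_t lo;
    uint32_t hi;
};

/* One plain read: high half first, then low half. */
uint64_t read_pos(const struct split_pos *p)
{
    uint32_t hi = p->hi;    /* read fpos.high */
    uint32_t lo = p->lo;    /* read fpos.low */

    return ((uint64_t)hi << 32) | lo;
}

/* Read twice and compare; returns 1 and fills *out if both reads agree. */
int read_twice(const struct split_pos *p, uint64_t *out)
{
    uint64_t first = read_pos(p);
    uint64_t second = read_pos(p);

    if (first != second)
        return 0;
    *out = second;
    return 1;
}

/*
 * Writer b is interrupted after storing only the high half of new_pos.
 * Reader a then does its double read. Returns the value the reader accepts.
 */
uint64_t stalled_writer(uint64_t old_pos, uint64_t new_pos)
{
    struct split_pos pos = { (uint32_t)old_pos, (uint32_t)(old_pos >> 32) };
    uint64_t seen = 0;
    int stable;

    pos.hi = (uint32_t)(new_pos >> 32);   /* b: write fpos.high, then stall */
    stable = read_twice(&pos, &seen);     /* a: both reads return B:a */
    assert(stable);
    (void)stable;
    pos.lo = (uint32_t)new_pos;           /* b: write fpos.low, too late */

    return seen;
}
```

With old_pos = 0x1FFFFFFFF and new_pos = 0x200000000 the reader accepts
0x2FFFFFFFF, which is neither the old nor the new offset, although both of
its reads agreed.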



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-11 12:24           ` file offset corruption on 32-bit machines? Bodo Eggert
@ 2008-04-11 13:55             ` Lennart Sorensen
  2008-04-11 16:59               ` Bryan Henderson
  2008-04-14 16:20               ` Jan Kara
  0 siblings, 2 replies; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-11 13:55 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: Diego Calleja, Jiri Kosina, Jan Kara, Michal Hocko, Meelis Roos,
	Linux Kernel list, linux-fsdevel

On Fri, Apr 11, 2008 at 02:24:34PM +0200, Bodo Eggert wrote:
> AS far as I understand, the race is e.g.:
> 
> fpos := A:a, we want to make process/thread a read A:b or B:a without it
> being a correct value in fpos. a!=b!=c, A!=B, A!=C.
> 
> a: read fpos.high (A:?)
> b: write fpos (B:b)
> a: read fpos.low (A:b)
> 
> 
> If you change this to 
> 
> a: read fpos.high
> a: read fpos.low
> a: read fpos.high
> a: read fpos.low
> 
> and compare the results, you need to
> 
> a: read fpos.high (A:?)
> b: write fpos (B:b)
> a: read fpos.low (A:b)
> b: write fpos (A:c)
> a: read fpos.high (A:b),(A:?)
> b: write fpos (C:b)
> a: read fpos.low (A:b),(A:b)
> 
> That would be winning three races in order to hit the bug. 
> 
> 
> OTOH, writers MUST NOT be interrupted, because:
> 
> b: write fpos.high (B:a)
> a: read fpos.high (B:?)
> a: read fpos.low (B:a)
> a: read fpos.high (B:a),(B:?)
> a: read fpos.low (B:a),(B:a)
> b: write fpos.low (B:b)

So if you write multithreaded code and don't understand what locking
around shared resources is for, then your application might break.  Can
you give an example where locking is being used correctly where this can
possibly fail?  The kernel can't prevent idiots from writing bad code
that breaks.

I just don't get this "problem".

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-11 13:55             ` Lennart Sorensen
@ 2008-04-11 16:59               ` Bryan Henderson
  2008-04-11 17:15                 ` Lennart Sorensen
  2008-04-14 16:20               ` Jan Kara
  1 sibling, 1 reply; 53+ messages in thread
From: Bryan Henderson @ 2008-04-11 16:59 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bodo Eggert, Diego Calleja, Jan Kara, Jiri Kosina, linux-fsdevel,
	Linux Kernel list, Michal Hocko, Meelis Roos

>So if you write multithreaded code and don't understand what locking
>around shared resources is for, then your application might break.

I think I know what locking around shared resources is for, which is why 
I'm surprised the kernel doesn't do it.

Is it normal for a kernel resource not to be thread-safe (i.e. you don't 
get advertised/sensible results if two threads access it at the same 
time)?

>Can you give an example where locking is being used correctly where this
>can possibly fail?

I could accept (though I haven't thought about it) that there aren't any 
real-world applications that do simultaneous reads and writes through the 
same file pointer.  I might even accept that there can be no useful 
application that does.  But can you say such an application is incorrect?

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-11 16:59               ` Bryan Henderson
@ 2008-04-11 17:15                 ` Lennart Sorensen
  2008-04-11 21:29                   ` Bryan Henderson
  2008-04-12  8:48                   ` Pavel Machek
  0 siblings, 2 replies; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-11 17:15 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Bodo Eggert, Diego Calleja, Jan Kara, Jiri Kosina, linux-fsdevel,
	Linux Kernel list, Michal Hocko, Meelis Roos

On Fri, Apr 11, 2008 at 09:59:45AM -0700, Bryan Henderson wrote:
> >So if you write multithreaded code and don't understand what locking
> >around shared resources is for, then your application might break.
> 
> I think I know what locking around shared resources is for, which is why 
> I'm surprised the kernel doesn't do it.
> 
> Is it normal for a kernel resource not to be thread-safe (i.e. you don't 
> get advertised/sensible results if two threads access it at the same 
> time)?

If two threads are changing one filehandle at the same time, then the
program is broken.  I can't see how the kernel making updates to 64bit
filehandles "atomic" helps.  You could still seek in one thread, then
seek in another and then start the write in the first and get a wrong
result.  Changes to a shared filehandle of any kind require locking to
work reliably, so additional slow downs and locking in the kernel won't
fix anything.

> I could accept (though I haven't thought about it) that there aren't any 
> real-world applications that do simultaneous reads and writes through the 
> same file pointer.  I might even accept that there can be no useful 
> application that does.  But can you say such an application is incorrect?

Unless the application has its own locking to ensure multiple threads
don't screw up each other's fileposition, it simply wouldn't work.

What is the difference between doing:

threadA: seek(positionA)
threadB: seek(positionB)
threadA: write
threadB: write

versus

threadA: seek(positionA) but only set half the 64bits
threadB: seek(positionB) set all 64bits
threadA: complete seek operation setting the other half of the bits
threadA: write
threadB: write

Either way you end up writing to the wrong file location, even though in
the first case the kernel made the setting of the fileposition atomic and
in the second case it didn't.

The application has to do:
threadA: lock access to filehandle
threadA: seek(positionA)
threadB: try to get lock and wait
threadA: write
threadA: unlock
threadB: get lock finally
threadB: seek(positionB)
threadB: write
threadB: unlock

Once the application does locking, it doesn't matter if the setting of
the fileposition is atomic or not since no other thread can touch the
filehandle anyhow.

Doesn't matter if you read or write.  If it's a shared filehandle you
have only one current position so to share it you have to lock access
while doing a seek + read or write operation if you want predictable
results.
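
A minimal user-space sketch of the lock / seek / write / unlock sequence
above, assuming POSIX threads. The struct shared_file wrapper and the
function names are hypothetical; the point is only that the seek and the
write sit inside one critical section.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: one descriptor shared by threads, plus its lock. */
struct shared_file {
    int fd;
    pthread_mutex_t lock;
};

/*
 * Seek and write as one critical section, so no other thread going
 * through the same wrapper can move the file position in between.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t locked_write_at(struct shared_file *sf, off_t pos,
                               const void *buf, size_t len)
{
    ssize_t written = -1;

    assert(sf != NULL);
    pthread_mutex_lock(&sf->lock);
    if (lseek(sf->fd, pos, SEEK_SET) != (off_t)-1)
        written = write(sf->fd, buf, len);
    pthread_mutex_unlock(&sf->lock);

    return written;
}

/* Usage sketch: two writes at different offsets, then read one back. */
static int locked_write_demo(void)
{
    char path[] = "/tmp/fpos-demo-XXXXXX";
    char back[4] = { 0 };
    struct shared_file sf;
    int ok;

    sf.fd = mkstemp(path);
    if (sf.fd < 0)
        return -1;
    unlink(path);
    pthread_mutex_init(&sf.lock, NULL);

    ok = locked_write_at(&sf, 4, "BBBB", 4) == 4 &&
         locked_write_at(&sf, 0, "AAAA", 4) == 4 &&
         pread(sf.fd, back, sizeof back, 4) == 4 &&
         memcmp(back, "BBBB", 4) == 0;

    pthread_mutex_destroy(&sf.lock);
    close(sf.fd);
    return ok ? 0 : -1;
}
```

This only protects callers that go through the wrapper; a thread that calls
lseek() on the raw descriptor bypasses the lock.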

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-10 14:27     ` Jan Kara
  2008-04-10 14:31       ` Jiri Kosina
@ 2008-04-11 19:26       ` Pavel Machek
  2008-04-14 16:25         ` Jan Kara
  1 sibling, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2008-04-11 19:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Thu 2008-04-10 16:27:00, Jan Kara wrote:
> > On Thu, 10 Apr 2008, Michal Hocko wrote:
> > 
> > > > Jeff Robertson analyzes the behaviour of different operating systems'
> > > > 64-bit file offset implementation and concludes that on 32-bit
> > > > machines, Linux and Solaris lack any locking to keep the two 32-bit
> > > > halves in sync and this could cause rare file offset corruption.
> > > > http://jeffr-tech.livejournal.com/21014.html
> > > AFAICS, this race is theoretically possible, but it is very hard (almost 
> > > impossible) to trigger with a sane file usage pattern. Note that you 
> > > have to access shared struct file (same file descriptor) in different 
> > > threads which should be synchronized by caller anyway (*).
> > 
> > ... but not in cases the caller is an intentionally evil code, right? :)
>   Yes.
> 
> > > I also don't see any security implications from this race, but maybe 
> > > someone with more knowledge about fs can see (f_pos is used at many 
> > > places in the kernel code).
> > 
> > The f_pos races are in fact exploitable, we've already been there. See 
> > for example http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt
>   Well, this race is more subtle - the window is just one instruction
> wide (stores to f_pos from CPU2 must come between the store of lower and
> upper 32-bits of f_pos on CPU1). And the only result is that f_pos has
> 32-bits from one file pointer and 32-bits from the other one. So I can
> hardly imagine this would be exploitable...

Don't we have rlimit on max file size? I'd guess this could work
around it?
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-11 17:15                 ` Lennart Sorensen
@ 2008-04-11 21:29                   ` Bryan Henderson
  2008-04-12  8:48                   ` Pavel Machek
  1 sibling, 0 replies; 53+ messages in thread
From: Bryan Henderson @ 2008-04-11 21:29 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bodo Eggert, Diego Calleja, Jan Kara, Jiri Kosina, linux-fsdevel,
	Linux Kernel list, Michal Hocko, Meelis Roos

>What is the difference between doing:
>
>threadA: seek(positionA)
>threadB: seek(positionB)
>threadA: write
>threadB: write
>
>versus
>
>threadA: seek(positionA) but only set half the 64bits
>threadB: seek(positionB) set all 64bits
>threadA: complete seek operation setting the other half of the bits
>threadA: write
>threadB: write
>
>either way you end up writing to the wrong file location

Only if you make an assumption about what this program considers the right 
location.  One difference is that in the first case, data gets written 
only at a place to which the program seeked, while in the second, it gets 
written to a totally illogical place.  Another is that in the first, the 
data gets written as specified in standards and in the second, it doesn't.
I can imagine a program that would be satisfied with the first and not
the second, and for such a program, I cannot use the word "incorrect" or 
"broken" or say the programmer doesn't understand shared resources.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-11 17:15                 ` Lennart Sorensen
  2008-04-11 21:29                   ` Bryan Henderson
@ 2008-04-12  8:48                   ` Pavel Machek
  1 sibling, 0 replies; 53+ messages in thread
From: Pavel Machek @ 2008-04-12  8:48 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bryan Henderson, Bodo Eggert, Diego Calleja, Jan Kara,
	Jiri Kosina, linux-fsdevel, Linux Kernel list, Michal Hocko,
	Meelis Roos

Hi!

> > >So if you write multithreaded code and don't understand what locking
> > >around shared resources is for, then your application might break.
> > 
> > I think I know what locking around shared resources is for, which is why 
> > I'm surprised the kernel doesn't do it.
> > 
> > Is it normal for a kernel resource not to be thread-safe (i.e. you don't 
> > get advertised/sensible results if two threads access it at the same 
> > time)?
> 
> If two threads are changing one filehandle at the same time, then the
> program is broken.  I can't see how the kernel making updates to 64bit
> filehandles "atomic" helps.  You could still seek in one thread, then
> seek in another and then start the write in the first and get a wrong
> result.  Changes to a shared filehandle of any kind requires locking to
> work reliably, so additional slow downs and locking in the kernel won't
> fix anything.

Well, the app may be broken, or it may be trying to confuse you.

If you were stracing an app and it seeked to 1GB and to 7GB, then did a
read(), you'd certainly be very surprised if it read secret data
at 3GB, right?

And ptrace monitors do exist.
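
The 3GB figure can be checked with a little arithmetic. A small sketch,
assuming "GB" here means 2^30 bytes; mix_halves() and GIB are names made up
for this example:

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Combine the high 32 bits of one offset with the low 32 bits of another. */
static uint64_t mix_halves(uint64_t high_from, uint64_t low_from)
{
    return (high_from & UINT64_C(0xFFFFFFFF00000000)) |
           (low_from & UINT64_C(0x00000000FFFFFFFF));
}

/* 1GB is 0x0:40000000 and 7GB is 0x1:C0000000 as high:low halves. */
static void check_torn_offsets(void)
{
    /* high half of the 1GB seek, low half of the 7GB seek */
    assert(mix_halves(1 * GIB, 7 * GIB) == 3 * GIB);
    /* high half of the 7GB seek, low half of the 1GB seek */
    assert(mix_halves(7 * GIB, 1 * GIB) == 5 * GIB);
}
```

So the two possible torn results of those seeks are 3GB and 5GB, neither of
which the program ever asked for.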
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-11 13:55             ` Lennart Sorensen
  2008-04-11 16:59               ` Bryan Henderson
@ 2008-04-14 16:20               ` Jan Kara
  2008-04-14 16:22                 ` Lennart Sorensen
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Kara @ 2008-04-14 16:20 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Jan Kara, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Fri 11-04-08 09:55:44, Lennart Sorensen wrote:
> On Fri, Apr 11, 2008 at 02:24:34PM +0200, Bodo Eggert wrote:
> > AS far as I understand, the race is e.g.:
> > 
> > fpos := A:a, we want to make process/thread a read A:b or B:a without it
> > being a correct value in fpos. a!=b!=c, A!=B, A!=C.
> > 
> > a: read fpos.high (A:?)
> > b: write fpos (B:b)
> > a: read fpos.low (A:b)
> > 
> > 
> > If you change this to 
> > 
> > a: read fpos.high
> > a: read fpos.low
> > a: read fpos.high
> > a: read fpos.low
> > 
> > and compare the results, you need to
> > 
> > a: read fpos.high (A:?)
> > b: write fpos (B:b)
> > a: read fpos.low (A:b)
> > b: write fpos (A:c)
> > a: read fpos.high (A:b),(A:?)
> > b: write fpos (C:b)
> > a: read fpos.low (A:b),(A:b)
> > 
> > That would be winning three races in order to hit the bug. 
> > 
> > 
> > OTOH, writers MUST NOT be interrupted, because:
> > 
> > b: write fpos.high (B:a)
> > a: read fpos.high (B:?)
> > a: read fpos.low (B:a)
> > a: read fpos.high (B:a),(B:?)
> > a: read fpos.low (B:a),(B:a)
> > b: write fpos.low (B:b)
> 
> So if you write multithreaded code and don't understand what locking
> around shared resources is for, then your application might break.  Can
> you give an example where locking is being used correctly where this can
> possibly fail?  The kernel can't prevent idiots from writing bad code
> that breaks.
> 
> I just don't get this "problem".
  Well, as Jiri Kosina wrote, this isn't a problem unless someone finds
a way to use this race for some attack (and, for example, making f_pos
negative compromises security, so it is not as far-fetched as it would
seem). So proactively fixing this makes some sense.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 16:20               ` Jan Kara
@ 2008-04-14 16:22                 ` Lennart Sorensen
  2008-04-14 16:53                   ` Jan Kara
  0 siblings, 1 reply; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-14 16:22 UTC (permalink / raw)
  To: Jan Kara
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Mon, Apr 14, 2008 at 06:20:31PM +0200, Jan Kara wrote:
>   Well, as Jiri Kosina wrote, this isn't a problem unless someone finds
> a way how to use this race for some attack (and for example making f_pos
> negative compromises security so it is not so far-fetched as it would
> seem). So proactively fixing this makes some sense.

But you would have to be part of that process to affect the filehandle
wouldn't you?  If you are part of the process already wouldn't it be
easier to manipulate things directly rather than playing with the
filehandle position?

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-11 19:26       ` Pavel Machek
@ 2008-04-14 16:25         ` Jan Kara
  0 siblings, 0 replies; 53+ messages in thread
From: Jan Kara @ 2008-04-14 16:25 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Fri 11-04-08 21:26:56, Pavel Machek wrote:
> On Thu 2008-04-10 16:27:00, Jan Kara wrote:
> > > On Thu, 10 Apr 2008, Michal Hocko wrote:
> > > 
> > > > > Jeff Robertson analyzes the behaviour of different operating systems'
> > > > > 64-bit file offset implementation and concludes that on 32-bit
> > > > > machines, Linux and Solaris lack any locking to keep the two 32-bit
> > > > > halves in sync and this could cause rare file offset corruption.
> > > > > http://jeffr-tech.livejournal.com/21014.html
> > > > AFAICS, this race is theoretically possible, but it is very hard (almost 
> > > > impossible) to trigger with a sane file usage pattern. Note that you 
> > > > have to access shared struct file (same file descriptor) in different 
> > > > threads which should be synchronized by caller anyway (*).
> > > 
> > > ... but not in cases the caller is an intentionally evil code, right? :)
> >   Yes.
> > 
> > > > I also don't see any security implications from this race, but maybe 
> > > > someone with more knowledge about fs can see (f_pos is used at many 
> > > > places in the kernel code).
> > > 
> > > The f_pos races are in fact exploitable, we've already been there. See 
> > > for example http://www.isec.pl/vulnerabilities/isec-0016-procleaks.txt
> >   Well, this race is more subtle - the window is just one instruction
> > wide (stores to f_pos from CPU2 must come between the store of lower and
> > upper 32-bits of f_pos on CPU1). And the only result is that f_pos has
> > 32-bits from one file pointer and 32-bits from the other one. So I can
> > hardly imagine this would be exploitable...
> 
> Don't we have rlimit on max file size? I'd guess this could work
> around it?
  There is such a limit, but AFAIK it limits the maximum size of a file
you're able to create. And write/truncate already check their local
variable, i.e. the real value used later.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 16:22                 ` Lennart Sorensen
@ 2008-04-14 16:53                   ` Jan Kara
  2008-04-14 16:54                     ` Alan Cox
  2008-04-14 17:06                     ` Lennart Sorensen
  0 siblings, 2 replies; 53+ messages in thread
From: Jan Kara @ 2008-04-14 16:53 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Mon 14-04-08 12:22:02, Lennart Sorensen wrote:
> On Mon, Apr 14, 2008 at 06:20:31PM +0200, Jan Kara wrote:
> >   Well, as Jiri Kosina wrote, this isn't a problem unless someone finds
> > a way how to use this race for some attack (and for example making f_pos
> > negative compromises security so it is not so far-fetched as it would
> > seem). So proactively fixing this makes some sense.
> 
> But you would have to be part of that process to affect the filehandle
> wouldn't you?  If you are part of the process already wouldn't it be
> easier to manipulate things directly rather than playing with the
> filehandle position?
  Well, but imagine you have a file /proc/my_secret_file from which you
are able to read from position A:a and B:b but not from position
A:b. Conceivably, checks for the file position could be bypassed because of
this race... I know this is kind of a dumb example, but I can imagine someone
eventually finding something like this. So I guess one spin lock/unlock
pair is a price worth paying in a call path which is quite long anyway.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 16:53                   ` Jan Kara
@ 2008-04-14 16:54                     ` Alan Cox
  2008-04-14 18:34                       ` Alexey Dobriyan
  2008-04-14 17:06                     ` Lennart Sorensen
  1 sibling, 1 reply; 53+ messages in thread
From: Alan Cox @ 2008-04-14 16:54 UTC (permalink / raw)
  To: Jan Kara
  Cc: Lennart Sorensen, Bodo Eggert, Diego Calleja, Jiri Kosina,
	Michal Hocko, Meelis Roos, Linux Kernel list, linux-fsdevel

>   Well, but imagine you have a file /proc/my_secret_file from which you
> are able to read from position A:a and B:b but not from position
> A:b. Conceivably, checks for the file position could be bypassed because of
> this race... I know this is kind of dumb example but I can imagine someone

Unlikely as the ppos passed to the driver is a private copy and the user
could equally use pread/pwrite to specify that offset.
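
A minimal user-space sketch of the pread() point, on an ordinary temporary
file rather than a /proc file: the caller names the offset directly, and the
descriptor's shared position is neither consulted nor changed. The function
name and file contents are made up for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 if pread() read from the requested offset, -1 otherwise. */
static int pread_demo(void)
{
    char path[] = "/tmp/pread-demo-XXXXXX";
    char buf[3];
    int fd = mkstemp(path);
    off_t before, after;
    ssize_t got;

    if (fd < 0)
        return -1;
    unlink(path);

    if (write(fd, "abcdef", 6) != 6) {
        close(fd);
        return -1;
    }

    before = lseek(fd, 0, SEEK_CUR);      /* shared position after write */
    got = pread(fd, buf, sizeof buf, 2);  /* explicit offset 2 */
    after = lseek(fd, 0, SEEK_CUR);
    close(fd);

    assert(before == after);  /* pread() left the shared position alone */
    return (got == 3 && memcmp(buf, "cde", 3) == 0) ? 0 : -1;
}
```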

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 16:53                   ` Jan Kara
  2008-04-14 16:54                     ` Alan Cox
@ 2008-04-14 17:06                     ` Lennart Sorensen
  2008-04-14 19:03                       ` Jan Kara
  1 sibling, 1 reply; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-14 17:06 UTC (permalink / raw)
  To: Jan Kara
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Mon, Apr 14, 2008 at 06:53:54PM +0200, Jan Kara wrote:
>   Well, but imagine you have a file /proc/my_secret_file from which you
> are able to read from position A:a and B:b but not from position
> A:b. Conceivably, checks for the file position could be bypassed because of
> this race... I know this is kind of dumb example but I can imagine someone
> can eventually find something like this. So I guess one spin lock/unlock
> pair is a price worth paying in the callpath which is quite long anyway.

But only two threads within the process can read from the filehandle, and
hence the process would be doing locking.  An external attacker can't
break the internal locking of the process between the threads, and even
if you do open the file in /proc that the process is using, being an
external process you would have your own file handle and hence your own
file position since you aren't part of that process.

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 16:54                     ` Alan Cox
@ 2008-04-14 18:34                       ` Alexey Dobriyan
  0 siblings, 0 replies; 53+ messages in thread
From: Alexey Dobriyan @ 2008-04-14 18:34 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jan Kara, Lennart Sorensen, Bodo Eggert, Diego Calleja,
	Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Mon, Apr 14, 2008 at 05:54:52PM +0100, Alan Cox wrote:
> >   Well, but imagine you have a file /proc/my_secret_file from which you
> > are able to read from position A:a and B:b but not from position
> > A:b. Conceivably, checks for the file position could be bypassed because of
> > this race... I know this is kind of dumb example but I can imagine someone
> 
> Unlikely as the ppos passed to the driver is a private copy and the user
> could equally use pread/pwrite to specify that offset.

pread is banned on proc files implemented via seq_files.
And in the non-seq_file case, there are MAX_NON_LFS checks, and MAX_NON_LFS
fits into 32 bits.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 17:06                     ` Lennart Sorensen
@ 2008-04-14 19:03                       ` Jan Kara
  2008-04-14 19:29                         ` Lennart Sorensen
  0 siblings, 1 reply; 53+ messages in thread
From: Jan Kara @ 2008-04-14 19:03 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Mon 14-04-08 13:06:13, Lennart Sorensen wrote:
> On Mon, Apr 14, 2008 at 06:53:54PM +0200, Jan Kara wrote:
> >   Well, but imagine you have a file /proc/my_secret_file from which you
> > are able to read from position A:a and B:b but not from position
> > A:b. Conceivably, checks for the file position could be bypassed because of
> > this race... I know this is kind of dumb example but I can imagine someone
> > can eventually find something like this. So I guess one spin lock/unlock
> > pair is a price worth paying in the callpath which is quite long anyway.
> 
> But only two threads within the process can read from the filehandle and
> hence the process would be doing locking.  An external attacker can't
  Why would it be doing locking? If some nasty user runs the process, he
*wants* his two threads to race as much as possible and trigger the race.
And then use the corrupted f_pos.

> break the internal locking of the process between the threads, and even
> if you do open the file in /proc that the process is using, being an
> external process you would have your own file handle and hence your own
> file position since you aren't part of that process.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 19:03                       ` Jan Kara
@ 2008-04-14 19:29                         ` Lennart Sorensen
  2008-04-14 19:42                           ` Jan Kara
  2008-04-15  8:57                           ` Pavel Machek
  0 siblings, 2 replies; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-14 19:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Mon, Apr 14, 2008 at 09:03:09PM +0200, Jan Kara wrote:
>   Why would it be doing locking? If some nasty user runs the process, he
> *wants* his two threads to race as much as possible and trigger the race.
> And then use corrupted f_pos.

Why would you want to?  You can already set the filepointer explicitly
to any value you want if you have the filehandle.

If you had a file with some security checks for whether the user could
read from it implemented based on locations then you would check it when
you read/write not when you seek, since after all you could just keep
reading until you get to the desired position.

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 19:29                         ` Lennart Sorensen
@ 2008-04-14 19:42                           ` Jan Kara
  2008-04-14 19:45                             ` Lennart Sorensen
  2008-04-15  8:57                           ` Pavel Machek
  1 sibling, 1 reply; 53+ messages in thread
From: Jan Kara @ 2008-04-14 19:42 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Mon 14-04-08 15:29:28, Lennart Sorensen wrote:
> On Mon, Apr 14, 2008 at 09:03:09PM +0200, Jan Kara wrote:
> >   Why would it be doing locking? If some nasty user runs the process, he
> > *wants* his two threads to race as much as possible and trigger the race.
> > And then use corrupted f_pos.
> 
> Why would you want to?  You can already set the filepointer explicitly
> to any value you want if you have the filehandle.
> 
> If you had a file with some security checks for whether the user could
> read from it implemented based on locations then you would check it when
> you read/write not when you seek, since after all you could just keep
> reading until you get to the desired position.
  Yes and no - for example if you manage to corrupt f_pos so that it
becomes negative, you have won because it is checked only in seek, pread,
pwrite, but not in read or write which rely on the check in seek...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 19:42                           ` Jan Kara
@ 2008-04-14 19:45                             ` Lennart Sorensen
  0 siblings, 0 replies; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-14 19:45 UTC (permalink / raw)
  To: Jan Kara
  Cc: Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Mon, Apr 14, 2008 at 09:42:46PM +0200, Jan Kara wrote:
> > Why would you want to?  You can already set the filepointer explicitly
> > to any value you want if you have the filehandle.
> > 
> > If you had a file with some security checks for whether the user could
> > read from it implemented based on locations then you would check it when
> > you read/write not when you seek, since after all you could just keep
> > reading until you get to the desired position.
>   Yes and no - for example if you manage to corrupt f_pos so that it
> becomes negative, you have won because it is checked only in seek, pread,
> pwrite, but not in read or write which rely on the check in seek...

The only file that could possibly implement any such silly security
based on position would be in /proc or /sys or similar, in which case
whatever driver implements it can check the position during any
read/write operation, and it would have to if it wants to implement such
a silly security system.

Any sane system would put the secured data in a separate file from the
unsecured data, obviously.

Trying to read from a negative position on a normal file should clearly
fail, and if it doesn't then that is a separate issue to fix and has
nothing to do with the file position being set atomically.

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-14 19:29                         ` Lennart Sorensen
  2008-04-14 19:42                           ` Jan Kara
@ 2008-04-15  8:57                           ` Pavel Machek
  2008-04-15 15:32                             ` Lennart Sorensen
  1 sibling, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2008-04-15  8:57 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

> On Mon, Apr 14, 2008 at 09:03:09PM +0200, Jan Kara wrote:
> >   Why would it be doing locking? If some nasty user runs the process, he
> > *wants* his two threads to race as much as possible and trigger the race.
> > And then use corrupted f_pos.
> 
> Why would you want to?  You can already set the filepointer explicitly
> to any value you want if you have the filehandle.
> 
> If you had a file with some security checks for whether the user could
> read from it implemented based on locations then you would check it when
> you read/write not when you seek, since after all you could just keep
> reading until you get to the desired position.

Not if you tried to do the checking from a ptrace monitor.

And heck, yes, it is very confusing to see

seek(somewhere)
write()

on ptrace and the write going somewhere else.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: file offset corruption on 32-bit machines?
  2008-04-15  8:57                           ` Pavel Machek
@ 2008-04-15 15:32                             ` Lennart Sorensen
  2008-04-15 17:34                               ` Pavel Machek
  0 siblings, 1 reply; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-15 15:32 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Tue, Apr 15, 2008 at 10:57:41AM +0200, Pavel Machek wrote:
> Not if you tried to do checking from ptrace monitor.
> 
> And heck, yes, it is very confusing to see
> 
> seek(somewhere)
> write()
> 
> on ptrace and write going somewhere else.

Yes, bugs are confusing.  An application can't do this on demand, so you
can't write code that relies on the effect between threads.  So it would
only be a bug, not a bizarre feature (that wouldn't even work on 64bit
machines).

-- 
Len Sorensen


* Re: file offset corruption on 32-bit machines?
  2008-04-15 15:32                             ` Lennart Sorensen
@ 2008-04-15 17:34                               ` Pavel Machek
  2008-04-15 18:24                                 ` Lennart Sorensen
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2008-04-15 17:34 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

Hi!

> > Not if you tried to do checking from ptrace monitor.
> > 
> > And heck, yes, it is very confusing to see
> > 
> > seek(somewhere)
> > write()
> > 
> > under ptrace and the write going somewhere else.
> 
> Yes bugs are confusing.  An application can't do this on demand so you
> can't write code that relies on the effect between threads.  So it would
> only be a bug, not a bizarre feature (that wouldn't even work on 64bit
> machines).

Yes, kernel bugs are confusing ;-).

The "application" could be malware trying to confuse debugger, for
example.

The "application" could be something you are trying to debug.

I did a brief reading of the lseek man pages, and they do not mention
"kernel may seek to random place if you attempt to seek from two
threads at the same time". So this is a kernel or manpages bug.

Maybe you can take a look at POSIX if it permits this behaviour? 
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: file offset corruption on 32-bit machines?
  2008-04-15 17:34                               ` Pavel Machek
@ 2008-04-15 18:24                                 ` Lennart Sorensen
  2008-04-15 19:12                                   ` Pavel Machek
  0 siblings, 1 reply; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-15 18:24 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Tue, Apr 15, 2008 at 07:34:30PM +0200, Pavel Machek wrote:
> Yes, kernel bugs are confusing ;-).

I only see an application bug so far.

> The "application" could be malware trying to confuse debugger, for
> example.

If you can't do it on demand (which I can't see any way to do) then I
don't think malware can take advantage of it.

> The "application" could be something you are trying to debug.

True, but even without this behaviour, doing seeks and reads/writes from
multiple threads without locking will already show plenty of problems.
To hit this particular issue, you also have to have threads writing to
different 4GB aligned chunks of the file, since otherwise they would all
be setting the top bits the same.  I would hope anyone doing
multithreaded work on a file that big would avoid the locking issue by
using pread and pwrite instead, in which case there is no problem
either.
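
As a minimal sketch of the pread/pwrite approach (the helper names
write_at and read_at are hypothetical, made up for illustration): each
call passes its offset explicitly, so the shared file position is never
read or updated and the race on it cannot come into play.

```c
#define _FILE_OFFSET_BITS 64    /* ask for a 64-bit off_t on 32-bit targets */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of buf at absolute offset off.
 * pwrite() takes the offset as an argument and does not touch the
 * descriptor's shared file position. Returns 0 on success, -1 on error. */
int write_at(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0)
            return -1;          /* errno is set by pwrite() */
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: read exactly len bytes from absolute offset off.
 * Returns 0 on success, -1 on error or if EOF is hit early. */
int read_at(int fd, void *buf, size_t len, off_t off)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = pread(fd, p, len, off);
        if (n <= 0)
            return -1;          /* error, or end of file before len bytes */
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Threads that each own a distinct region of the file can call these
concurrently on the same descriptor without any lseek() at all.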

> I did brief reading on lseek man pages, and it does not mention
> "kernel may seek to random place if you attempt to seek  from two
> threads at the same time". So this is a kernel or manpages bug.

Does it say anything about what happens if you try to seek from two
places at once?

> Maybe you can take a look at POSIX if it permits this behaviour? 

The parts I can find for posix don't say one way or the other.

-- 
Len Sorensen


* Re: file offset corruption on 32-bit machines?
  2008-04-15 18:24                                 ` Lennart Sorensen
@ 2008-04-15 19:12                                   ` Pavel Machek
  2008-04-15 19:49                                     ` Lennart Sorensen
  0 siblings, 1 reply; 53+ messages in thread
From: Pavel Machek @ 2008-04-15 19:12 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

Hi!

> > Yes, kernel bugs are confusing ;-).
> 
> I only see an application bug so far.

Really?

>       The lseek() function repositions the offset of the open file
>       associated with the file descriptor fildes to the argument
>       offset according to the directive whence as follows:

It does not say "repositions the offset to the random number" nor
"under certain conditions repositions the offsets" nor "it repositions
the offset unless you are unlucky and hit kernel race". More
seriously, it does not contain note "not safe from multithreaded
programs" nor "multithreaded behaviour is undefined".

So this pretty clearly is application bug.

> > The "application" could be malware trying to confuse debugger, for
> > example.
> 
> If you can't do it on demand (which I can't see any way to do) then I
> don't think malware can take advantage of it.

Really? I see an application: detecting whether I'm being debugged. Try
to hit the race 1000 times; if you hit it, you are probably not being
debugged (because a debugger would very likely make that race hard
to hit). Will only work on multicores, but...

[Plus, there's "strace saw it writing to either offset A or offset B,
but I see the data at offset C, WTF?"]

> > The "application" could be something you are trying to debug.
> 
> True, but even without this behaviour doing seeks and read/writes from
> multiple threads without locking will already show plenty of problems
> even if you somehow manage to hit this issue, and not only that you have
> to have threads writing to different 4GB aligned chunks of the file to
> cause a problem, since otherwise they would all be setting the top bits
> the same.  I would hope anyone doing multithreaded work on a file that
> big would like to avoid the locking issue by using pread and pwrite
> instead in which case there is no problem either.

I'm not saying this kernel bug is likely to hit in practice. It is
still a kernel bug.

Is the slowdown of lseek worth getting rid of this minor bug? Not
sure, probably yes.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: file offset corruption on 32-bit machines?
  2008-04-15 19:12                                   ` Pavel Machek
@ 2008-04-15 19:49                                     ` Lennart Sorensen
  2008-04-15 20:06                                       ` Pavel Machek
  0 siblings, 1 reply; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-15 19:49 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Tue, Apr 15, 2008 at 09:12:38PM +0200, Pavel Machek wrote:
> It does not say "repositions the offset to the random number" nor
> "under certain conditions repositions the offsets" nor "it repositions
> the offset unless you are unlucky and hit kernel race". More
> seriously, it does not contain note "not safe from multithreaded
> programs" nor "multithreaded behaviour is undefined".

And if you debug it on a 64bit system then it won't be able to do that.
So not exactly a useful thing to try, and even trying 1000 times you are
unlikely to hit it, so you can't know for sure unless you happen to be
lucky and hit it.

> So this pretty clearly is application bug.

> Really? I see an application: detecting whether I'm being debugged. Try
> to hit the race 1000 times; if you hit it, you are probably not being
> debugged (because a debugger would very likely make that race hard
> to hit). Will only work on multicores, but...

If lseek not being atomic breaks your application, then your application
would be broken already.  Any weird debug detection you might be able to
do using the fact that it isn't atomic could I suppose be considered a
kernel bug if you think being able to do such detection is a bug.  Nothing
prevents the debugger from preloading an override for lseek that uses its
own locks to make the call atomic and hence prevent such use.
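
To illustrate only the "own locks" part of that idea, here is a rough
sketch.  It is not an actual preload shim; the function locked_write_at
and the lock pos_lock are hypothetical names, and it assumes every
thread in the process goes through this one helper.

```c
#define _FILE_OFFSET_BITS 64    /* ask for a 64-bit off_t on 32-bit targets */
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical process-wide lock guarding the shared file position. */
static pthread_mutex_t pos_lock = PTHREAD_MUTEX_INITIALIZER;

/* Seek and write as one unit with respect to other callers of this
 * helper, so no other seek can slip in between the lseek() and the
 * write().  Returns the byte count from write(), or -1 on error. */
ssize_t locked_write_at(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n = -1;

    pthread_mutex_lock(&pos_lock);
    if (lseek(fd, off, SEEK_SET) != (off_t)-1)
        n = write(fd, buf, len);
    pthread_mutex_unlock(&pos_lock);

    return n;
}
```

This only helps against callers that take the same lock; a thread that
calls lseek() directly on the same descriptor can still race with it.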

So other than that, is there any case in which lseek being not atomic
can cause an application to break if it wasn't already broken (due to
having a race condition by trying to do 2 or more seeks on the same file
handle at the same time)?  If not, I think adding any kind of locking to
seek in the kernel (which would I think have to cause a slight slow
down) is a bad move.  But hey that's just my opinion. :)  I won't be
upset either way.

> [Plus, there's "strace saw it writing to either offset A or offset B,
> but I see the data at offset C, WTF?"]

Most likely it would also be a program where you see it randomly seek to
A and write or seek to A then B then write depending on how it happens
to get scheduled when you run it.  Already the program is clearly doing
something unreliable.  And C only happens to vary from B if A and B
differ in the upper 32 bits of the file position.

> I'm not saying this kernel bug is likely to hit in practice. It is
> still a kernel bug.
> 
> Is the slowdown of lseek worth getting rid of this minor bug? Not
> sure, probably yes.

I think a slow down is the worse choice.  I would rather add a note to the
documentation saying that "By the way, on 32bit systems the seek call is
not atomic for 64bit file offsets, so if you happen to issue two at the
same time to the same file pointer to offsets that differ in the upper
32bits, then the result of the seek might not be either of A or B but
will contain the upper 32bits of either A or B and the lower 32bits of
either A or B.  You should of course use locking for your file access to
ensure you know where your threads end up writing so this should be a
non issue."
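
The mixing described in that note can be worked through in a few lines.
This is only a model of the arithmetic, not kernel code, and the
function names are hypothetical.  It also shows that a result different
from both A and B needs the lower halves to differ as well as the upper
ones.

```c
#include <stdint.h>

/* Build a 64-bit offset from the upper 32 bits of hi_src and the lower
 * 32 bits of lo_src, as if the two halves were stored separately. */
uint64_t combine_halves(uint64_t hi_src, uint64_t lo_src)
{
    return (hi_src & 0xFFFFFFFF00000000ULL) |
           (lo_src & 0x00000000FFFFFFFFULL);
}

/* Fill out[] with every value the offset could end up holding after two
 * racing half-by-half stores of a and b: a, b, and the two mixtures. */
void possible_offsets(uint64_t a, uint64_t b, uint64_t out[4])
{
    out[0] = combine_halves(a, a);  /* a intact */
    out[1] = combine_halves(b, b);  /* b intact */
    out[2] = combine_halves(a, b);  /* upper half of a, lower half of b */
    out[3] = combine_halves(b, a);  /* upper half of b, lower half of a */
}

/* Nonzero if a mixture can differ from both a and b.  That requires the
 * upper halves to differ and the lower halves to differ. */
int can_tear(uint64_t a, uint64_t b)
{
    return (a >> 32) != (b >> 32) && (uint32_t)a != (uint32_t)b;
}
```

For example, with A = 0x0000000100000010 and B = 0x0000000200000020 the
two mixtures are 0x0000000100000020 and 0x0000000200000010, while for
two offsets below 4GB every combination is just A or B again.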

-- 
Len Sorensen


* Re: file offset corruption on 32-bit machines?
  2008-04-15 19:49                                     ` Lennart Sorensen
@ 2008-04-15 20:06                                       ` Pavel Machek
  2008-04-15 20:28                                         ` Peter Zijlstra
  2008-04-15 20:29                                         ` Lennart Sorensen
  0 siblings, 2 replies; 53+ messages in thread
From: Pavel Machek @ 2008-04-15 20:06 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

Hi!

> So other than that, is there any case in which lseek being not atomic
> can cause an application to break if it wasn't already broken (due to
> having a race condition by trying to do 2 or more seeks on the same file
> handle at the same time)?  If not, I think adding any kind of locking to
> seek in the kernel (which would I think have to cause a slight slow
> down) is a bad move.  But hey that's just my opinion. :)  I won't be
> upset either way.

Of course I can write an application that will be broken by this, and
was not broken before. It will be slightly nasty code. Come on, you
can do this too ;-).


> > I'm not saying this kernel bug is likely to hit in practice. It is
> > still a kernel bug.
> > 
> > Is the slowdown of lseek worth getting rid of this minor bug? Not
> > sure, probably yes.
> 
> I think a slow down is the worse choice.  Adding a note to the
> documentation saying that "By the way, on 32bit systems the seek call is
> not atomic for 64bit file offsets, so if you happen to issue two at

That would be very wrong addition to documentation. If you really
wanted to do something like this, you would probably want to say
something like

"Doing concurrent seeks on one file is undefined. Kernel may end up
with seeking to some other place."

Unfortunately, you'd have to get this addition into POSIX standard...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: file offset corruption on 32-bit machines?
  2008-04-15 20:06                                       ` Pavel Machek
@ 2008-04-15 20:28                                         ` Peter Zijlstra
  2008-04-16  8:15                                           ` Pavel Machek
  2008-04-15 20:29                                         ` Lennart Sorensen
  1 sibling, 1 reply; 53+ messages in thread
From: Peter Zijlstra @ 2008-04-15 20:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Lennart Sorensen, Jan Kara, Bodo Eggert, Diego Calleja,
	Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Tue, 2008-04-15 at 22:06 +0200, Pavel Machek wrote:

> > > I'm not saying this kernel bug is likely to hit in practice. It is
> > > still a kernel bug.
> > > 
> > > Is the slowdown of lseek worth getting rid of this minor bug? Not
> > > sure, probably yes.
> > 
> > I think a slow down is the worse choice.  Adding a note to the
> > documentation saying that "By the way, on 32bit systems the seek call is
> > not atomic for 64bit file offsets, so if you happen to issue two at
> 
> That would be very wrong addition to documentation. If you really
> wanted to do something like this, you would probably want to say
> something like
> 
> "Doing concurrent seeks on one file is undefined. Kernel may end up
> with seeking to some other place."
> 
> Unfortunately, you'd have to get this addition into POSIX standard...

Isn't not addressing the point similar to leaving it undefined? And
undefined semantics cover pretty much anything, including the current
behaviour.

FWIW I really think this issue is a non-issue; one cannot expect sane
behaviour of unsynchronized usage of a shared resource.


* Re: file offset corruption on 32-bit machines?
  2008-04-15 20:06                                       ` Pavel Machek
  2008-04-15 20:28                                         ` Peter Zijlstra
@ 2008-04-15 20:29                                         ` Lennart Sorensen
  2008-04-15 22:11                                           ` Bryan Henderson
  1 sibling, 1 reply; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-15 20:29 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina, Michal Hocko,
	Meelis Roos, Linux Kernel list, linux-fsdevel

On Tue, Apr 15, 2008 at 10:06:47PM +0200, Pavel Machek wrote:
> Of course I can write an application that will be broken by this, and
> was not broken before. It will be slightly nasty code. Come on, you
> can do this too ;-).

Well it would take seriously hard work to make a program that would work
correctly if it was atomic and would break if it isn't.  Certainly a
normal program that just tries to seek and read/write should never have
any issue.

> That would be very wrong addition to documentation. If you really
> wanted to do something like this, you would probably want to say
> something like
> 
> "Doing concurrent seeks on one file is undefined. Kernel may end up
> with seeking to some other place."

Well perhaps that is a lot simpler.

> Unfortunately, you'd have to get this addition into POSIX standard...

Well I do see something in a PDF on posix I found that says all posix
functions (at least in POSIX.1 which I think might be an old name for
it) are thread safe unless stated otherwise, so since lseek doesn't
state otherwise I suppose it had better be completely thread safe in all
cases.  It seems a bit stupid given any program that wants to work
reliably has to do its own locking already, so why waste time on it in
the kernel.  Is there any way the kernel could know how many copies of
the filehandle exist (yeah right, of course not) to ensure that it only
has to lock if there are multiple accesses going on?  Darn.

Those stupid standards documents. :)

-- 
Len Sorensen


* Re: file offset corruption on 32-bit machines?
  2008-04-15 20:29                                         ` Lennart Sorensen
@ 2008-04-15 22:11                                           ` Bryan Henderson
  2008-04-16  9:40                                             ` Jamie Lokier
  0 siblings, 1 reply; 53+ messages in thread
From: Bryan Henderson @ 2008-04-15 22:11 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Bodo Eggert, Diego Calleja, Jan Kara, Jiri Kosina, linux-fsdevel,
	Linux Kernel list, Michal Hocko, Meelis Roos, Pavel Machek

> Well it would take seriously hard work to make a program that would work
> correctly if it was atomic and would break if it isn't.  Certainly a
> normal program that just tries to seek and read/write should never have
> any issue.

I can easily imagine such a program.  I think you aren't exercising enough 
imagination about the kinds of requirements a program might be 
implementing.

That lack of imagination (in all of us) is the reason we shouldn't 
tolerate something working not as designed or not as expected just because 
we went through every possible use scenario and it didn't matter in any of 
them.  Just focus on the layer in question.

The easiest way to imagine a program not doing locking and being useful 
anyway (as long as the kernel is thread-safe) is to use the same arguments 
you use for the kernel doing it: there's a higher level user responsible 
for locking.  The code in question doesn't guarantee that user writes all 
its stuff to the right place, but at least it guarantees that that user's 
lack of locking doesn't screw some other user of the file.  It does that 
by ensuring it never seeks to a place the user doesn't own and that no two 
separate users ever access the file at the same time.

I'd even like to accommodate the poor user trying to debug the broken 
locking in his application.  He sees the file getting corrupted and 
immediately thinks, "what if my thread serialization isn't working right?" 
 But he notices that the corruption isn't consistent with that hypothesis. 
 He knows he was working with only the beginning and the end of the file 
and the corruption happened in the middle.  So he wastes a week 
considering other hypotheses, including a kernel bug, until someone points 
out a paragraph in the lseek() man page that says contrary to all Unix 
convention, that particular function and system call is not thread-safe, 
and it doesn't necessarily seek to the place mentioned in its argument.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems


* Re: file offset corruption on 32-bit machines?
  2008-04-15 20:28                                         ` Peter Zijlstra
@ 2008-04-16  8:15                                           ` Pavel Machek
  2008-04-16  8:20                                             ` Peter Zijlstra
                                                               ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Pavel Machek @ 2008-04-16  8:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lennart Sorensen, Jan Kara, Bodo Eggert, Diego Calleja,
	Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Tue 2008-04-15 22:28:55, Peter Zijlstra wrote:
> On Tue, 2008-04-15 at 22:06 +0200, Pavel Machek wrote:
> 
> > > > I'm not saying this kernel bug is likely to hit in practice. It is
> > > > still a kernel bug.
> > > > 
> > > > Is the slowdown of lseek worth getting rid of this minor bug? Not
> > > > sure, probably yes.
> > > 
> > > I think a slow down is the worse choice.  Adding a note to the
> > > documentation saying that "By the way, on 32bit systems the seek call is
> > > not atomic for 64bit file offsets, so if you happen to issue two at
> > 
> > That would be very wrong addition to documentation. If you really
> > wanted to do something like this, you would probably want to say
> > something like
> > 
> > "Doing concurrent seeks on one file is undefined. Kernel may end up
> > with seeking to some other place."
> > 
> > Unfortunately, you'd have to get this addition into POSIX standard...
> 
> Isn't not addressing the point similar to leaving it undefined? And
> undefined semantics cover pretty much anything, including the current
> behaviour.
> 
> FWIW I really think this issue is a non-issue; one cannot expect sane
> behaviour of unsynchronized usage of a shared resource.

Why not? Kernel syscalls are traditionally atomic, and Lennart seems
to have found a sentence in POSIX that says so.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: file offset corruption on 32-bit machines?
  2008-04-16  8:15                                           ` Pavel Machek
@ 2008-04-16  8:20                                             ` Peter Zijlstra
  2008-04-16 10:54                                             ` Alan Cox
  2008-04-16 13:57                                             ` Lennart Sorensen
  2 siblings, 0 replies; 53+ messages in thread
From: Peter Zijlstra @ 2008-04-16  8:20 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Lennart Sorensen, Jan Kara, Bodo Eggert, Diego Calleja,
	Jiri Kosina, Michal Hocko, Meelis Roos, Linux Kernel list,
	linux-fsdevel

On Wed, 2008-04-16 at 10:15 +0200, Pavel Machek wrote:
> On Tue 2008-04-15 22:28:55, Peter Zijlstra wrote:
> > On Tue, 2008-04-15 at 22:06 +0200, Pavel Machek wrote:
> > 
> > > > > I'm not saying this kernel bug is likely to hit in practice. It is
> > > > > still a kernel bug.
> > > > > 
> > > > > Is the slowdown of lseek worth getting rid of this minor bug? Not
> > > > > sure, probably yes.
> > > > 
> > > > I think a slow down is the worse choice.  Adding a note to the
> > > > documentation saying that "By the way, on 32bit systems the seek call is
> > > > not atomic for 64bit file offsets, so if you happen to issue two at
> > > 
> > > That would be very wrong addition to documentation. If you really
> > > wanted to do something like this, you would probably want to say
> > > something like
> > > 
> > > "Doing concurrent seeks on one file is undefined. Kernel may end up
> > > with seeking to some other place."
> > > 
> > > Unfortunately, you'd have to get this addition into POSIX standard...
> > 
> > Isn't not addressing the point similar to leaving it undefined? And
> > undefined semantics cover pretty much anything, including the current
> > behaviour.
> > 
> > FWIW I really think this issue is a non-issue; one cannot expect sane
> > behaviour of unsynchronized usage of a shared resource.
> 
> Why not? Kernel syscalls are traditionally atomic, and Lennart seems
> to have found a sentence in POSIX that says so.

Ah, ok missed that part.


* Re: file offset corruption on 32-bit machines?
  2008-04-15 22:11                                           ` Bryan Henderson
@ 2008-04-16  9:40                                             ` Jamie Lokier
  0 siblings, 0 replies; 53+ messages in thread
From: Jamie Lokier @ 2008-04-16  9:40 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Lennart Sorensen, Bodo Eggert, Diego Calleja, Jan Kara,
	Jiri Kosina, linux-fsdevel, Linux Kernel list, Michal Hocko,
	Meelis Roos, Pavel Machek

Bryan Henderson wrote:
> The easiest way to imagine a program not doing locking and being useful 
> anyway (as long as the kernel is thread-safe) is to use the same arguments 
> you use for the kernel doing it: there's a higher level user responsible 
> for locking.  The code in question doesn't guarantee that user writes all 
> its stuff to the right place, but at least it guarantees that that user's 
> lack of locking doesn't screw some other user of the file.  It does that 
> by ensuring it never seeks to a place the user doesn't own and that no two 
> separate users ever access the file at the same time.
> 
> I'd even like to accommodate the poor user trying to debug the broken 
> locking in his application.  He sees the file getting corrupted and 
> immediately thinks, "what if my thread serialization isn't working right?" 
>  But he notices that the corruption isn't consistent with that hypothesis. 
>  He knows he was working with only the beginning and the end of the file 
> and the corruption happened in the middle.  So he wastes a week 
> considering other hypotheses, including a kernel bug, until someone points 
> out a paragraph in the lseek() man page that says contrary to all Unix 
> convention, that particular function and system call is not thread-safe, 
> and it doesn't necessarily seek to the place mentioned in its argument.

I think that argument is the strongest yet.  Wasted debugging time due
to totally surprising and hardly justifiable kernel behaviour.  Strace
/ GDB on the application shows a trace which doesn't relate at all to
the unexpected file changes.

There is also the POSIX specification:

  http://www.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_09.html

  "All functions defined by this volume of IEEE Std 1003.1-2001 shall
be thread-safe, except that the following functions need not be
thread-safe."

  [List which does not include lseek(), therefore lseek() shall be
  thread-safe.  Same for read() and write().]

Docs for HP-UX and AIX say the same as POSIX about thread-safety.

-- Jamie


* Re: file offset corruption on 32-bit machines?
  2008-04-16  8:15                                           ` Pavel Machek
  2008-04-16  8:20                                             ` Peter Zijlstra
@ 2008-04-16 10:54                                             ` Alan Cox
  2008-04-16 13:57                                             ` Lennart Sorensen
  2 siblings, 0 replies; 53+ messages in thread
From: Alan Cox @ 2008-04-16 10:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Peter Zijlstra, Lennart Sorensen, Jan Kara, Bodo Eggert,
	Diego Calleja, Jiri Kosina, Michal Hocko, Meelis Roos,
	Linux Kernel list, linux-fsdevel

> Why not? Kernel syscalls are traditionally atomic, and Lennart seems
> to have found a sentence in POSIX that says so.

Almost no call is atomic or has atomicity guarantees. There are specific
rules for certain disk access and pipe queueing but almost nothing else.
The same is just as true (often more so) for all Unix systems.

Alan


* Re: file offset corruption on 32-bit machines?
  2008-04-16  8:15                                           ` Pavel Machek
  2008-04-16  8:20                                             ` Peter Zijlstra
  2008-04-16 10:54                                             ` Alan Cox
@ 2008-04-16 13:57                                             ` Lennart Sorensen
  2 siblings, 0 replies; 53+ messages in thread
From: Lennart Sorensen @ 2008-04-16 13:57 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Peter Zijlstra, Jan Kara, Bodo Eggert, Diego Calleja, Jiri Kosina,
	Michal Hocko, Meelis Roos, Linux Kernel list, linux-fsdevel

On Wed, Apr 16, 2008 at 10:15:23AM +0200, Pavel Machek wrote:
> Why not? Kernel syscalls are traditionally atomic, and Lennart seems
> to have found a sentence in POSIX that says so.

Well it didn't say atomic, but it did say "thread safe" which I suppose
comes down to about the same thing.

-- 
Len Sorensen


Thread overview: 53+ messages
     [not found] <agh4d-6yc-35@gated-at.bofh.it>
     [not found] ` <ah5tY-3lR-7@gated-at.bofh.it>
     [not found]   ` <ah5DA-3X9-9@gated-at.bofh.it>
     [not found]     ` <ah5X5-4tl-13@gated-at.bofh.it>
     [not found]       ` <ah66A-4Nk-7@gated-at.bofh.it>
     [not found]         ` <ah7vN-7Wz-9@gated-at.bofh.it>
2008-04-11 12:24           ` file offset corruption on 32-bit machines? Bodo Eggert
2008-04-11 13:55             ` Lennart Sorensen
2008-04-11 16:59               ` Bryan Henderson
2008-04-11 17:15                 ` Lennart Sorensen
2008-04-11 21:29                   ` Bryan Henderson
2008-04-12  8:48                   ` Pavel Machek
2008-04-14 16:20               ` Jan Kara
2008-04-14 16:22                 ` Lennart Sorensen
2008-04-14 16:53                   ` Jan Kara
2008-04-14 16:54                     ` Alan Cox
2008-04-14 18:34                       ` Alexey Dobriyan
2008-04-14 17:06                     ` Lennart Sorensen
2008-04-14 19:03                       ` Jan Kara
2008-04-14 19:29                         ` Lennart Sorensen
2008-04-14 19:42                           ` Jan Kara
2008-04-14 19:45                             ` Lennart Sorensen
2008-04-15  8:57                           ` Pavel Machek
2008-04-15 15:32                             ` Lennart Sorensen
2008-04-15 17:34                               ` Pavel Machek
2008-04-15 18:24                                 ` Lennart Sorensen
2008-04-15 19:12                                   ` Pavel Machek
2008-04-15 19:49                                     ` Lennart Sorensen
2008-04-15 20:06                                       ` Pavel Machek
2008-04-15 20:28                                         ` Peter Zijlstra
2008-04-16  8:15                                           ` Pavel Machek
2008-04-16  8:20                                             ` Peter Zijlstra
2008-04-16 10:54                                             ` Alan Cox
2008-04-16 13:57                                             ` Lennart Sorensen
2008-04-15 20:29                                         ` Lennart Sorensen
2008-04-15 22:11                                           ` Bryan Henderson
2008-04-16  9:40                                             ` Jamie Lokier
     [not found] <Pine.SOC.4.64.0804081101430.28938@math.ut.ee>
2008-04-10 13:55 ` Michal Hocko
2008-04-10 14:01   ` Jiri Kosina
2008-04-10 14:27     ` Jan Kara
2008-04-10 14:31       ` Jiri Kosina
2008-04-10 14:48         ` Matthew Wilcox
2008-04-10 15:22           ` Jan Kara
2008-04-10 15:30             ` Matthew Wilcox
2008-04-10 15:19         ` Jan Kara
2008-04-10 15:37           ` Michal Hocko
2008-04-10 15:56             ` Jan Kara
2008-04-10 16:03         ` Diego Calleja
2008-04-10 16:15           ` Jan Kara
2008-04-11 19:26       ` Pavel Machek
2008-04-14 16:25         ` Jan Kara
2008-04-10 14:31     ` Michal Hocko
2008-04-10 14:35       ` Jiri Kosina
2008-04-10 14:11   ` Martin Mares
2008-04-10 15:12     ` Jan Kara
2008-04-10 15:14     ` Jamie Lokier
2008-04-10 15:21       ` Matthew Wilcox
2008-04-10 15:28       ` Jan Kara
2008-04-10 15:33   ` Andi Kleen
