linux-fsdevel.vger.kernel.org archive mirror
* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
       [not found] ` <20121126163328.ACEB011FE9C@bugzilla.kernel.org>
@ 2012-11-26 16:45   ` Theodore Ts'o
  2012-11-26 18:59     ` Hiro Lalwani
  2012-11-26 20:05     ` Hugh Dickins
  0 siblings, 2 replies; 12+ messages in thread
From: Theodore Ts'o @ 2012-11-26 16:45 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: meetmehiro, linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 04:33:28PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=50981
>
> Since this works properly with XFS, ext4/ext3 etc. shouldn't require
> synchronization at the application level either; the FS should take care of
> locking. Can we expect a fix for this?

Meetmehiro,

At this point, there seems to be consensus that the kernel should take
care of the locking, and that this is not something that needs to be a
worry for the application.  Whether this should be done in the file
system layer or in the mm layer is the current question at hand ---
since this is a bug that also affects btrfs and other non-XFS file
systems.

So the question is whether every file system which supports AIO should
add its own locking, or whether it should be done at the mm layer, in
which case the lock in the XFS layer could be removed as no longer
necessary.

I've added linux-mm and linux-fsdevel to make sure all of the relevant
kernel developers are aware of this question/issue.
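For anyone joining from the lists without reading the bugzilla entry, the race at issue is easy to picture from userspace. The following is a hedged sketch (in Python for brevity; the file name, thread structure and iteration counts are invented for illustration, not taken from the bug report): one thread keeps rewriting the same 4096-byte region with a uniform byte pattern while another reads it back, and without read/write serialisation a read can return a mix of old and new bytes.

```python
"""Sketch of the read-vs-write race discussed in this bug: a writer thread
repeatedly rewrites the same 4096-byte region with a uniform pattern while
the main thread reads it back.  Without kernel-enforced read/write
atomicity, a read may observe a mix of old and new bytes (a "torn" read)."""
import os
import threading

PAGE = 4096

def is_uniform(buf):
    """A read is torn if the buffer mixes bytes from two different writes."""
    return len(set(buf)) <= 1

def writer(fd, stop, iterations=20000):
    patterns = [b"A" * PAGE, b"B" * PAGE]
    for i in range(iterations):
        if stop.is_set():
            break
        os.pwrite(fd, patterns[i & 1], 0)   # overwrite the same page range

def count_torn_reads(path="racefile.tmp", reads=20000):
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC)
    os.pwrite(fd, b"A" * PAGE, 0)           # start from a uniform state
    stop = threading.Event()
    t = threading.Thread(target=writer, args=(fd, stop))
    t.start()
    torn = sum(1 for _ in range(reads)
               if not is_uniform(os.pread(fd, PAGE, 0)))
    stop.set()
    t.join()
    os.close(fd)
    os.remove(path)
    return torn

if __name__ == "__main__":
    print("torn reads observed:", count_torn_reads())
```

On a filesystem without read-side locking (ext4 at the time of this thread), the torn-read counter may come out non-zero; on XFS the exclusive write lock is expected to keep it at zero.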

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page range
  2012-11-26 16:45   ` [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page range Theodore Ts'o
@ 2012-11-26 18:59     ` Hiro Lalwani
  2012-11-26 20:05     ` Hugh Dickins
  1 sibling, 0 replies; 12+ messages in thread
From: Hiro Lalwani @ 2012-11-26 18:59 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: bugzilla-daemon, linux-mm, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]

 Thanks a lot, Theodore Ts'o.

 Thanks to all the developers (kernel, ext4, etc.) for the quick debugging
and response.


On Mon, Nov 26, 2012 at 10:15 PM, Theodore Ts'o <tytso@mit.edu> wrote:

> On Mon, Nov 26, 2012 at 04:33:28PM +0000,
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=50981
> >
> > Since this works properly with XFS, ext4/ext3 etc. shouldn't require
> > synchronization at the application level either; the FS should take
> > care of locking. Can we expect a fix for this?
>
> Meetmehiro,
>
> At this point, there seems to be consensus that the kernel should take
> care of the locking, and that this is not something that needs to be a
> worry for the application.  Whether this should be done in the file
> system layer or in the mm layer is the current question at hand ---
> since this is a bug that also affects btrfs and other non-XFS file
> systems.
>
> So the question is whether every file system which supports AIO should
> add its own locking, or whether it should be done at the mm layer, in
> which case the lock in the XFS layer could be removed as no longer
> necessary.
>
> I've added linux-mm and linux-fsdevel to make sure all of the relevant
> kernel developers are aware of this question/issue.
>
> Regards,
>
>                                                 - Ted
>



-- 
thanks & regards
Hiro Lalwani

[-- Attachment #2: Type: text/html, Size: 1988 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 16:45   ` [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page range Theodore Ts'o
  2012-11-26 18:59     ` Hiro Lalwani
@ 2012-11-26 20:05     ` Hugh Dickins
  2012-11-26 20:13       ` Christoph Hellwig
  2012-11-26 20:15       ` Zach Brown
  1 sibling, 2 replies; 12+ messages in thread
From: Hugh Dickins @ 2012-11-26 20:05 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Andrew Morton, Al Viro, bugzilla-daemon, meetmehiro, linux-mm,
	linux-fsdevel

On Mon, 26 Nov 2012, Theodore Ts'o wrote:
> On Mon, Nov 26, 2012 at 04:33:28PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=50981
> >
> > Since this works properly with XFS, ext4/ext3 etc. shouldn't require
> > synchronization at the application level either; the FS should take care of
> > locking. Can we expect a fix for this?
> 
> Meetmehiro,
> 
> At this point, there seems to be consensus that the kernel should take
> care of the locking, and that this is not something that needs to be a
> worry for the application.

Gosh, that's a very sudden new consensus.  The consensus over the past
ten or twenty years has been that the Linux kernel enforce locking for
consistent atomic writes, but skip that overhead on reads - hasn't it?

> Whether this should be done in the file
> system layer or in the mm layer is the current question at hand ---
> since this is a bug that also affects btrfs and other non-XFS file
> systems.
> 
> So the question is whether every file system which supports AIO should
> add its own locking, or whether it should be done at the mm layer, in
> which case the lock in the XFS layer could be removed as no longer
> necessary.
> 
> I've added linux-mm and linux-fsdevel to make sure all of the relevant
> kernel developers are aware of this question/issue.

Thanks, that's helpful; but I think linux-mm people would want to defer
to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
but a fundamental change to VFS locking philosophy is not mm's call.

I don't see that page locking would have anything to do with it: if we
are going to start guaranteeing reads atomic against concurrent writes,
then surely it's the size requested by the user to be guaranteed,
spanning however many pages and fs-blocks: i_mutex, or a more
efficiently crafted alternative.

Hugh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 20:05     ` Hugh Dickins
@ 2012-11-26 20:13       ` Christoph Hellwig
  2012-11-26 21:28         ` Dave Chinner
  2012-11-26 21:49         ` Theodore Ts'o
  2012-11-26 20:15       ` Zach Brown
  1 sibling, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2012-11-26 20:13 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Theodore Ts'o, Andrew Morton, Al Viro, bugzilla-daemon,
	meetmehiro, linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> Gosh, that's a very sudden new consensus.  The consensus over the past
> ten or twenty years has been that the Linux kernel enforce locking for
> consistent atomic writes, but skip that overhead on reads - hasn't it?

I'm not sure there was much of a consensus ever.  We XFS people always
tried to push everyone toward the strict rule, but there was enough
pushback that it didn't actually happen.

> Thanks, that's helpful; but I think linux-mm people would want to defer
> to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
> but a fundamental change to VFS locking philosophy is not mm's call.
> 
> I don't see that page locking would have anything to do with it: if we
> are going to start guaranteeing reads atomic against concurrent writes,
> then surely it's the size requested by the user to be guaranteed,
> spanning however many pages and fs-blocks: i_mutex, or a more
> efficiently crafted alternative.

What XFS does is simply replace (or rather, currently, augment) i_mutex
with a rw_semaphore (i_iolock in XFS) which is used the following way:

exclusive:
 - buffered writes
 - pagecache flushing before direct I/O (then downgraded)
 - appending direct I/O writes
 - less than blocksize granularity direct I/O

shared:
 - everything else (buffered reads, "normal" direct I/O)

Doing this in the highest levels of the generic_file_ code would be
trivial, and would allow us to get rid of a fair chunk of wrappers in
XFS.
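A rough userspace illustration of the scheme above (this is an analogy, not the XFS code: the names mirror the kernel's rw_semaphore API, but the implementation here is a minimal Python sketch with a condition variable that deliberately ignores fairness and writer starvation):

```python
"""Userspace analogy of the i_iolock scheme: an rw_semaphore taken shared
for buffered reads and "normal" direct I/O, exclusive for buffered writes
and the other exclusive cases listed above."""
import threading

class RwSem:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # threads currently holding it shared
        self._writer = False     # one thread holding it exclusive

    def down_read(self):         # shared: buffered reads, normal direct I/O
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def up_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def down_write(self):        # exclusive: buffered writes, appending
        with self._cond:         # direct I/O, sub-blocksize direct I/O
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def up_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

# With every write held exclusive and every read held shared, a reader
# can never observe a half-completed write:
data = bytearray(b"A" * 8)
iolock = RwSem()

def buffered_write(pattern):
    iolock.down_write()
    for i in range(len(data)):   # multi-step write, atomic under the lock
        data[i] = pattern
    iolock.up_write()

def buffered_read():
    iolock.down_read()
    snapshot = bytes(data)
    iolock.up_read()
    return snapshot
```

Running readers and writers concurrently against this pair, every snapshot returned by buffered_read() is uniform, which is exactly the whole-write atomicity the i_iolock provides.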

Note that we've been thinking about replacing this lock with a range
lock, but this will require more research.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 20:05     ` Hugh Dickins
  2012-11-26 20:13       ` Christoph Hellwig
@ 2012-11-26 20:15       ` Zach Brown
  1 sibling, 0 replies; 12+ messages in thread
From: Zach Brown @ 2012-11-26 20:15 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Theodore Ts'o, Andrew Morton, Al Viro, bugzilla-daemon,
	meetmehiro, linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> On Mon, 26 Nov 2012, Theodore Ts'o wrote:
> > On Mon, Nov 26, 2012 at 04:33:28PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=50981
> > >
> > > Since this works properly with XFS, ext4/ext3 etc. shouldn't require
> > > synchronization at the application level either; the FS should take care
> > > of locking. Can we expect a fix for this?
> > 
> > Meetmehiro,
> > 
> > At this point, there seems to be consensus that the kernel should take
> > care of the locking, and that this is not something that needs to be a
> > worry for the application.
> 
> Gosh, that's a very sudden new consensus.  The consensus over the past
> ten or twenty years has been that the Linux kernel enforce locking for
> consistent atomic writes, but skip that overhead on reads - hasn't it?

I was wondering exactly the same thing.

> > So the question is whether every file system which supports AIO should
> > add its own locking, or whether it should be done at the mm layer, in
> > which case the lock in the XFS layer could be removed as no longer
> > necessary.

(This has nothing to do with AIO.  Buffered reads have been copied from
unlocked pages... basically forever, right?)

> Thanks, that's helpful; but I think linux-mm people would want to defer
> to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
> but a fundamental change to VFS locking philosophy is not mm's call.
> 
> I don't see that page locking would have anything to do with it: if we
> are going to start guaranteeing reads atomic against concurrent writes,
> then surely it's the size requested by the user to be guaranteed,
> spanning however many pages and fs-blocks: i_mutex, or a more
> efficiently crafted alternative.

Agreed.  While this little racing test might be fixed, those baked-in
page_size == 4k == atomic granularity assumptions are pretty sketchy.

So we're talking about holding multiple page locks?  Or i_mutex?  Or
some fancy range locking?

There's consensus on serializing overlapping buffered reads and writes? 

- z
*readying the read(, mmap(), ) fault deadlock toy*


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 20:13       ` Christoph Hellwig
@ 2012-11-26 21:28         ` Dave Chinner
  2012-11-26 21:39           ` Christoph Hellwig
  2012-11-26 21:49         ` Theodore Ts'o
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2012-11-26 21:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hugh Dickins, Theodore Ts'o, Andrew Morton, Al Viro,
	bugzilla-daemon, meetmehiro, linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 03:13:08PM -0500, Christoph Hellwig wrote:
> On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> > Gosh, that's a very sudden new consensus.  The consensus over the past
> > ten or twenty years has been that the Linux kernel enforce locking for
> > consistent atomic writes, but skip that overhead on reads - hasn't it?
> 
> I'm not sure there was much of a consensus ever.  We XFS people always
> tried to push everyone toward the strict rule, but there was enough
> pushback that it didn't actually happen.
> 
> > Thanks, that's helpful; but I think linux-mm people would want to defer
> > to linux-fsdevel maintainers on this: mm/filemap.c happens to be in mm/,
> > but a fundamental change to VFS locking philosophy is not mm's call.
> > 
> > I don't see that page locking would have anything to do with it: if we
> > are going to start guaranteeing reads atomic against concurrent writes,
> > then surely it's the size requested by the user to be guaranteed,
> > spanning however many pages and fs-blocks: i_mutex, or a more
> > efficiently crafted alternative.
> 
> What XFS does is simply replace (or rather, currently, augment) i_mutex
> with a rw_semaphore (i_iolock in XFS) which is used the following way:
> 
> exclusive:
>  - buffered writes
>  - pagecache flushing before direct I/O (then downgraded)
>  - appending direct I/O writes
>  - less than blocksize granularity direct I/O
   - splice write

Also, direct extent manipulations that are outside the IO path such
as:
   - truncate
   - preallocation
   - hole punching

use the XFS_IOLOCK_EXCL to provide exclusion against new IO starting
while such an operation is in progress.

> shared:
>  - everything else (buffered reads, "normal" direct I/O)
> 
> Doing this in the highest levels of the generic_file_ code would be
> trivial, and would allow us to get rid of a fair chunk of wrappers in
> XFS.

We still need the iolock deep in the guts of the filesystem, though.

I suspect that if we are going to change the VFS locking, then we
should seriously consider allowing the filesystem to provide its
own locking implementation and the VFS just pass the type of lock
required. Otherwise we are still going to need all the locking
within the filesystem to serialise all the core pieces that the VFS
locking doesn't serialise (e.g. EOF truncation on close/evict,
extent swaps for online defrag, etc).

> Note that we've been thinking about replacing this lock with a range
> lock, but this will require more research.

I'd say we need a working implementation in a filesystem before even
considering a VFS implementation...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 21:28         ` Dave Chinner
@ 2012-11-26 21:39           ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2012-11-26 21:39 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Hugh Dickins, Theodore Ts'o, Andrew Morton,
	Al Viro, bugzilla-daemon, meetmehiro, linux-mm, linux-fsdevel

On Tue, Nov 27, 2012 at 08:28:45AM +1100, Dave Chinner wrote:
> We still need the iolock deep in the guts of the filesystem, though.

I don't think we do.  The only thing that comes close to it is
xfs_swap_extents passing the XFS_IOLOCK_EXCL to xfs_trans_ijoin so
that the transaction commit automatically unlocks it, but that can
be trivially replaced with a manual unlock.

> I suspect that if we are going to change the VFS locking, then we
> should seriously consider allowing the filesystem to provide its
> own locking implementation and the VFS just pass the type of lock
> required. Otherwise we are still going to need all the locking
> within the filesystem to serialise all the core pieces that the VFS
> locking doesn't serialise (e.g. EOF truncation on close/evict,
> extent swaps for online defrag, etc).

The VFS currently doesn't hardcode i_mutex for any data-plane
operations; only a few generic helpers do, most notably
generic_file_aio_write (which can be bypassed by using a slightly
lower level variant) and __blockdev_direct_IO when used in DIO_LOCKING
mode.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 20:13       ` Christoph Hellwig
  2012-11-26 21:28         ` Dave Chinner
@ 2012-11-26 21:49         ` Theodore Ts'o
  2012-11-26 22:09           ` Christoph Hellwig
  2012-11-26 22:17           ` Dave Chinner
  1 sibling, 2 replies; 12+ messages in thread
From: Theodore Ts'o @ 2012-11-26 21:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hugh Dickins, Andrew Morton, Al Viro, bugzilla-daemon, meetmehiro,
	linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 03:13:08PM -0500, Christoph Hellwig wrote:
> On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> > Gosh, that's a very sudden new consensus.  The consensus over the past
> > ten or twenty years has been that the Linux kernel enforce locking for
> > consistent atomic writes, but skip that overhead on reads - hasn't it?
> 
> I'm not sure there was much of a consensus ever.  We XFS people always
> tried to push everyone toward the strict rule, but there was enough
> pushback that it didn't actually happen.

Christoph, can you give some kind of estimate for the overhead that
adding this locking in XFS actually costs in practice?  And does XFS
provide any kind of consistency guarantees if the reads/write overlap
spans multiple pages?  I assume the answer to that is no, correct?

Thanks,

                                              - Ted 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 21:49         ` Theodore Ts'o
@ 2012-11-26 22:09           ` Christoph Hellwig
  2012-11-27  1:32             ` Theodore Ts'o
  2012-11-26 22:17           ` Dave Chinner
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2012-11-26 22:09 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Hugh Dickins, Andrew Morton, Al Viro,
	bugzilla-daemon, meetmehiro, linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 04:49:37PM -0500, Theodore Ts'o wrote:
> Christoph, can you give some kind of estimate for the overhead that
> adding this locking in XFS actually costs in practice?

I don't know of any real-life measurements, but in terms of implementation
the overhead is:

 a) taking the rw_semaphore in shared mode for every buffered read
 b) taking the slightly slower exclusive rw_semaphore for buffered writes
    instead of the plain mutex

On the other hand it significantly simplifies the locking for direct
I/O and allows parallel direct I/O writers.

> And does XFS
> provide any kind of consistency guarantees if the reads/write overlap
> spans multiple pages?  I assume the answer to that is no, correct?

The answer is yes, as the lock is taken globally on the inode.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 21:49         ` Theodore Ts'o
  2012-11-26 22:09           ` Christoph Hellwig
@ 2012-11-26 22:17           ` Dave Chinner
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2012-11-26 22:17 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Hugh Dickins, Andrew Morton, Al Viro,
	bugzilla-daemon, meetmehiro, linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 04:49:37PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 26, 2012 at 03:13:08PM -0500, Christoph Hellwig wrote:
> > On Mon, Nov 26, 2012 at 12:05:57PM -0800, Hugh Dickins wrote:
> > > Gosh, that's a very sudden new consensus.  The consensus over the past
> > > ten or twenty years has been that the Linux kernel enforce locking for
> > > consistent atomic writes, but skip that overhead on reads - hasn't it?
> > 
> > I'm not sure there was much of a consensus ever.  We XFS people always
> > tried to push everyone toward the strict rule, but there was enough
> > pushback that it didn't actually happen.
> 
> Christoph, can you give some kind of estimate for the overhead that
> adding this locking in XFS actually costs in practice?

It doesn't show up in any significant numbers in profiles, if that is
what you are asking.

I've tested random 4k reads and writes at over 2 million IOPS
to a single file using concurrent direct IO, so the non-exclusive
locking overhead is pretty minimal. If the workload is modified
slightly to use buffered writes instead of direct IO writes, thereby
triggering shared/exclusive lock contention, then the same workload
tends to get limited to around 250,000 IOPS per file. That's a
direct result of the exclusive locking limiting the workload to what
a single CPU can sustain (i.e. the difference between 8p @ 250-300k iops
vs 1p @ 250k iops on the exclusive locking workload).

> And does XFS
> provide any kind of consistency guarantees if the reads/write overlap
> spans multiple pages?  I assume the answer to that is no, correct?

A buffered write is locked exclusive for the entirety of the write.
That includes multiple-page writes, as the locking is outside of
the begin_write/end_write per-page iteration. Hence the atomicity of
the entire buffered write against both buffered reads and direct IO
is guaranteed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-26 22:09           ` Christoph Hellwig
@ 2012-11-27  1:32             ` Theodore Ts'o
  2012-11-27  4:27               ` Dave Chinner
  0 siblings, 1 reply; 12+ messages in thread
From: Theodore Ts'o @ 2012-11-27  1:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hugh Dickins, Andrew Morton, Al Viro, bugzilla-daemon, meetmehiro,
	linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 05:09:08PM -0500, Christoph Hellwig wrote:
> On Mon, Nov 26, 2012 at 04:49:37PM -0500, Theodore Ts'o wrote:
> > Christoph, can you give some kind of estimate for the overhead that
> > adding this locking in XFS actually costs in practice?
> 
> I don't know of any real-life measurements, but in terms of implementation
> the overhead is:
> 
>  a) taking the rw_semaphore in shared mode for every buffered read
>  b) taking the slightly slower exclusive rw_semaphore for buffered writes
>     instead of the plain mutex
> 
> On the other hand it significantly simplifies the locking for direct
> I/O and allows parallel direct I/O writers.

I should probably just look at the XFS code, but.... if you're taking
an exclusive lock for buffered writes, won't this impact the
performance of buffered writes happening in parallel on different
CPU's?

						- Ted


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page  range
  2012-11-27  1:32             ` Theodore Ts'o
@ 2012-11-27  4:27               ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2012-11-27  4:27 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Hugh Dickins, Andrew Morton, Al Viro,
	bugzilla-daemon, meetmehiro, linux-mm, linux-fsdevel

On Mon, Nov 26, 2012 at 08:32:54PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 26, 2012 at 05:09:08PM -0500, Christoph Hellwig wrote:
> > On Mon, Nov 26, 2012 at 04:49:37PM -0500, Theodore Ts'o wrote:
> > > Christoph, can you give some kind of estimate for the overhead that
> > > adding this locking in XFS actually costs in practice?
> > 
> > I don't know of any real-life measurements, but in terms of implementation
> > the overhead is:
> > 
> >  a) taking the rw_semaphore in shared mode for every buffered read
> >  b) taking the slightly slower exclusive rw_semaphore for buffered writes
> >     instead of the plain mutex
> > 
> > On the other hand it significantly simplifies the locking for direct
> > I/O and allows parallel direct I/O writers.
> 
> I should probably just look at the XFS code, but.... if you're taking
> an exclusive lock for buffered writes, won't this impact the
> performance of buffered writes happening in parallel on different
> CPU's?

Indeed it does - see my previous email. But it's no worse than
generic_file_aio_write(), which takes i_mutex across buffered writes
and is what most filesystems currently use. And FWIW, we also take
the i_mutex outside the i_iolock for the buffered write case, because
generic_file_buffered_write() is documented to require it to be held.
See xfs_rw_ilock() and friends for locking order semantics...

FWIW, this buffered write exclusion is why we have been considering
replacing the rwsem with a shared/exclusive range lock - so we can
do concurrent non-overlapping reads and writes (for both direct IO and
buffered IO) without compromising the POSIX atomic write guarantee
(i.e. that a read will see the entire write or none of it). Range
locking will allow us to do that for both buffered and direct IO...
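The conflict rule behind such a shared/exclusive range lock can be stated compactly. The sketch below is hypothetical (no such interface was merged at the time of this thread); it shows only the decision of when two byte-range I/O requests must serialise, which is the property that lets non-overlapping reads and writes proceed concurrently:

```python
"""Conflict rule for a shared/exclusive byte-range lock: two requests
must serialise only if their ranges overlap and at least one of them
is exclusive (a write)."""

SHARED, EXCL = "shared", "excl"    # read vs write lock modes

def overlaps(a_lo, a_len, b_lo, b_len):
    # Half-open ranges [lo, lo+len) intersect iff each starts before
    # the other ends.
    return a_lo < b_lo + b_len and b_lo < a_lo + a_len

def conflicts(a, b):
    """a, b are (start, length, mode) lock requests."""
    a_lo, a_len, a_mode = a
    b_lo, b_len, b_mode = b
    if not overlaps(a_lo, a_len, b_lo, b_len):
        return False               # disjoint ranges always run in parallel
    return a_mode == EXCL or b_mode == EXCL

# Under a whole-file rwsem, these two 4 KiB buffered writes would
# serialise; under a range lock they proceed concurrently:
w1 = (0, 4096, EXCL)
w2 = (4096, 4096, EXCL)
r1 = (0, 4096, SHARED)
```

An overlapping read and write still conflict, which is what preserves the POSIX see-all-or-none-of-the-write guarantee while removing the single-CPU buffered-write bottleneck described earlier in the thread.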

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-11-27  4:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-50981-5823@https.bugzilla.kernel.org/>
     [not found] ` <20121126163328.ACEB011FE9C@bugzilla.kernel.org>
2012-11-26 16:45   ` [Bug 50981] generic_file_aio_read ?: No locking means DATA CORRUPTION read and write on same 4096 page range Theodore Ts'o
2012-11-26 18:59     ` Hiro Lalwani
2012-11-26 20:05     ` Hugh Dickins
2012-11-26 20:13       ` Christoph Hellwig
2012-11-26 21:28         ` Dave Chinner
2012-11-26 21:39           ` Christoph Hellwig
2012-11-26 21:49         ` Theodore Ts'o
2012-11-26 22:09           ` Christoph Hellwig
2012-11-27  1:32             ` Theodore Ts'o
2012-11-27  4:27               ` Dave Chinner
2012-11-26 22:17           ` Dave Chinner
2012-11-26 20:15       ` Zach Brown
