* Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-10  9:22 UTC (permalink / raw)
  To: linux-fsdevel

I've got an odd performance issue.  It seems that when fsync() is
called on a file, other processes block when they try to access it.
This is not merely due to I/O contention on the underlying block
device, it seems.

Oracle reported a similar performance issue in the Berkeley DB JE
changelog.  Is this really true?  Are there any workarounds?  (I'm
mainly interested in the situation on ext[34] and XFS.)

-- 
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Does fsync() block read and write ops on the same file?
From: Dave Chinner @ 2009-12-11  3:55 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-fsdevel

On Thu, Dec 10, 2009 at 09:22:35AM +0000, Florian Weimer wrote:
> I've got an odd performance issue.  It seems that when fsync() is
> called on a file, other processes block when they try to access it.
> This is not merely due to I/O contention on the underlying block
> device, it seems.

The inode mutex is held across the ->fsync() method. If that takes a
long time to run, then other processes will block trying to take the
inode mutex. i.e. part of fsync serialises access to the inode.
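
One way to tell whether the stall is in fsync() itself or in another process waiting behind it is to time each call separately. A minimal sketch, assuming Linux with CLOCK_MONOTONIC; timed_fsync() and timed_pread() are hypothetical helper names, not anything from this thread:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
static long long diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
           + (b->tv_nsec - a->tv_nsec);
}

/* Time one fsync() on fd; returns elapsed ns, or -1 if fsync() fails. */
long long timed_fsync(int fd)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    if (fsync(fd) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return diff_ns(&start, &end);
}

/* Time one pread() of len bytes at off; returns elapsed ns, or -1 on error.
 * Intended to be run from a second process while the first is in fsync(). */
long long timed_pread(int fd, void *buf, size_t len, off_t off)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pread(fd, buf, len, off) < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return diff_ns(&start, &end);
}
```

If timed_pread() latencies spike only while another process sits in timed_fsync() on the same file, that points at serialisation on the file rather than plain I/O contention, which is the distinction Florian is drawing.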

> Oracle reported a similar performance issue in the Berkeley DB JE
> changelog.  Is this really true?  Are there any workarounds?  (I'm
> mainly interested in the situation on ext[34] and XFS.)

For XFS, the ->fsync method blocks for as long as it takes to write
a synchronous transaction (1 IO).  ext4 looks like it writes the
inode rather than doing a journal commit, so it should only need a
single IO with the inode mutex held, too. I don't think these can be
optimised any further. You can use an external log with XFS on
separate spindles to the data volume to minimise the transaction
latency, but that's about it AFAIK.

For ext3, ordered mode can result in long (multi-second) fsync
latencies on busy filesystems because of the journal commit
involved.  Using writeback mode will avoid the long latencies
and make it operate close to ext4/XFS speeds.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-11  8:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel

* Dave Chinner:

> On Thu, Dec 10, 2009 at 09:22:35AM +0000, Florian Weimer wrote:
>> I've got an odd performance issue.  It seems that when fsync() is
>> called on a file, other processes block when they try to access it.
>> This is not merely due to I/O contention on the underlying block
>> device, it seems.
>
> The inode mutex is held across the ->fsync() method. If that takes a
> long time to run, then other processes will block trying to take the
> inode mutex. i.e. part of fsync serialises access to the inode.

Is an inode lock required to read from the file?

>> Oracle reported a similar performance issue in the Berkeley DB JE
>> changelog.  Is this really true?  Are there any workarounds?  (I'm
>> mainly interested in the situation on ext[34] and XFS.)
>
> For XFS, the ->fsync method blocks for as long as it takes to write
> a synchronous transaction (1 IO).  ext4 looks like it writes the
> inode rather than doing a journal commit, so it should only need a
> single IO with the inode mutex held, too. I don't think these can be
> optimised any further.

I'm not concerned with fsync latency per se.  It's going to take a
while to write a few GBs scattered across the file.  However, it's
annoying that read operations on the same file (which can't even see
the effect of the fsync operation) are blocked, sometimes for more
than two minutes.

* Re: Does fsync() block read and write ops on the same file?
From: Dave Chinner @ 2009-12-11 12:42 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-fsdevel

On Fri, Dec 11, 2009 at 08:53:37AM +0000, Florian Weimer wrote:
> * Dave Chinner:
> 
> > On Thu, Dec 10, 2009 at 09:22:35AM +0000, Florian Weimer wrote:
> >> I've got an odd performance issue.  It seems that when fsync() is
> >> called on a file, other processes block when they try to access it.
> >> This is not merely due to I/O contention on the underlying block
> >> device, it seems.
> >
> > The inode mutex is held across the ->fsync() method. If that takes a
> > long time to run, then other processes will block trying to take the
> > inode mutex. i.e. part of fsync serialises access to the inode.
> 
> Is an inode lock required to read from the file?

Not usually; normally only for data writes and metadata
modifications. However, some filesystems dirty objects even on read
(e.g. changing atime) and so can serialise on other filesystem locks
(e.g. the ext3 journal lock) that are being held by the fsync.
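
Where atime updates are the trigger, the usual fix is the noatime mount option. Linux also lets a single open file skip atime updates with O_NOATIME (available since 2.6.8, and only to the file's owner or a process with CAP_FOWNER). A minimal sketch; open_noatime() is a hypothetical helper that falls back to a normal open when the flag is refused:

```c
#define _GNU_SOURCE          /* exposes O_NOATIME in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Open path read-only without updating its access time on read.
 * O_NOATIME is only permitted for the file's owner (or with CAP_FOWNER);
 * otherwise open() fails with EPERM, so fall back to a plain open.
 */
int open_noatime(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd < 0 && errno == EPERM)
        fd = open(path, O_RDONLY);
    return fd;
}
```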

> >> Oracle reported a similar performance issue in the Berkeley DB JE
> >> changelog.  Is this really true?  Are there any workarounds?  (I'm
> >> mainly interested in the situation on ext[34] and XFS.)
> >
> > For XFS, the ->fsync method blocks for as long as it takes to write
> > a synchronous transaction (1 IO).  ext4 looks like it writes the
> > inode rather than doing a journal commit, so it should only need a
> > single IO with the inode mutex held, too. I don't think these can be
> > optimised any further.
> 
> I'm not concerned with fsync latency per se.  It's going to take a
> while to write a few GBs scattered across the file.  However, it's
> annoying that read operations on the same file (which can't even see
> the effect of the fsync operation) are blocked, sometimes for more
> than two minutes.

If they are blocking for that long, then sysrq-w during that period
will tell us exactly where, and in which filesystem, they are blocking....
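
SysRq-w dumps the kernel stacks of tasks in uninterruptible (D) sleep and needs root (it is triggered by writing 'w' to /proc/sysrq-trigger). As an unprivileged first step, a minimal sketch that only finds which processes are in D state by reading /proc/<pid>/stat; task_state() and list_d_state_tasks() are hypothetical helpers, and the stacks themselves still have to come from sysrq-w:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Return the scheduler state letter of a task from /proc/<pid>/stat
 * ('R' running, 'S' sleeping, 'D' uninterruptible sleep, ...), or '?'
 * if it cannot be read. The comm field may contain spaces or ')', so
 * the state is taken from just after the last ')'.
 */
char task_state(pid_t pid)
{
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (!ok)
        return '?';
    char *p = strrchr(line, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* Print every process currently in 'D' state; returns how many were found,
 * or -1 if /proc cannot be opened. Only thread-group leaders are checked;
 * individual threads live under /proc/<pid>/task. */
int list_d_state_tasks(void)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;
        pid_t pid = (pid_t)atoi(e->d_name);
        if (task_state(pid) == 'D') {
            printf("pid %d is in uninterruptible sleep\n", (int)pid);
            n++;
        }
    }
    closedir(d);
    return n;
}
```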

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-11 12:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel

* Dave Chinner:

>> I'm not concerned with fsync latency per se.  It's going to take a
>> while to write a few GBs scattered across the file.  However, it's
>> annoying that read operations on the same file (which can't even see
>> the effect of the fsync operation) are blocked, sometimes for more
>> than two minutes.
>
> If they are blocking for that long, then sysrq-w during that period
> will tell us exactly where, and in which filesystem, they are blocking....

Interesting.  Is it possible to trigger this from the hung task timer?
From that, I've got two traces:

[307370.450502] Call Trace:
[307370.450555]  [<ffffffff802aa9c8>] dput+0x1c/0xdd
[307370.450590]  [<ffffffff8042a2e9>] __down_read+0x87/0xa1
[307370.450641]  [<ffffffffa0276cd8>] :xfs:xfs_ilock+0x31/0x60
[307370.450684]  [<ffffffffa029a208>] :xfs:xfs_read+0x147/0x21a
[307370.450718]  [<ffffffff8029ae23>] do_sync_read+0xc9/0x10c
[307370.450750]  [<ffffffff80246201>] autoremove_wake_function+0x0/0x2e
[307370.450787]  [<ffffffff8029b614>] vfs_read+0xaa/0x152
[307370.450815]  [<ffffffff8029b9f5>] sys_read+0x45/0x6e
[307370.450844]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

[307396.186071] Call Trace:
[307396.186128]  [<ffffffff8042963d>] __mutex_lock_slowpath+0x64/0x9b
[307396.186160]  [<ffffffff804294a2>] mutex_lock+0xa/0xb
[307396.186190]  [<ffffffff8029b749>] generic_file_llseek+0x2a/0x8b
[307396.186219]  [<ffffffff8029b8f8>] sys_lseek+0x40/0x60
[307396.186248]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
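
The second trace shows lseek() itself stuck in mutex_lock() under generic_file_llseek(), which in kernels of this era took the inode mutex. If the application seeks before each read, a minimal sketch of the alternative is to pass the offset to pread(), which needs no lseek() call at all. That is an assumption about the access pattern, and it would not help the first trace, where the read path itself waits in xfs_ilock(); read_page_at() is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read exactly len bytes at offset off without touching the file
 * position: pread() takes the offset as an argument, so no separate
 * lseek() is made. Returns 0 on success, -1 on error or short file.
 */
int read_page_at(int fd, void *buf, size_t len, off_t off)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = pread(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* end of file before len bytes */
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```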

* Re: Does fsync() block read and write ops on the same file?
From: Christoph Hellwig @ 2009-12-11 13:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Florian Weimer, linux-fsdevel

On Fri, Dec 11, 2009 at 11:42:24PM +1100, Dave Chinner wrote:
> Not usually; normally only for data writes and metadata
> modifications. However, some filesystems dirty objects even on read
> (e.g. changing atime) and so can serialise on other filesystem locks
> (e.g. the ext3 journal lock) that are being held by the fsync.

Actually, we also take the XFS ilock in shared mode on read, and XFS
takes it in exclusive mode if it has to update filesystem attributes
like the atime.  This might be what Florian is seeing.
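
The point here is that readers only get in each other's way once one of them needs exclusive mode. A toy model of the shared/exclusive rule, as an illustration rather than the XFS implementation; rw_model and its functions are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of a shared/exclusive (reader/writer) lock. Not kernel code:
 * it only tracks who holds the lock and does not model waiters.
 */
struct rw_model {
    int  shared;     /* number of shared holders */
    bool exclusive;  /* true while an exclusive holder exists */
};

/* Shared mode is granted unless someone holds the lock exclusively. */
bool try_shared(struct rw_model *l)
{
    if (l->exclusive)
        return false;
    l->shared++;
    return true;
}

/* Exclusive mode is granted only when nobody holds the lock at all. */
bool try_exclusive(struct rw_model *l)
{
    if (l->exclusive || l->shared > 0)
        return false;
    l->exclusive = true;
    return true;
}

void release_shared(struct rw_model *l)    { l->shared--; }
void release_exclusive(struct rw_model *l) { l->exclusive = false; }
```

Two plain reads coexist, but a read that needs to update the atime must wait for all other readers, and while it holds the lock every new reader waits. The model omits queueing; Linux rw_semaphores generally also make new readers wait behind a queued writer, which can lengthen such a stall.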

* Re: Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-11 13:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dave Chinner, linux-fsdevel

* Christoph Hellwig:

> On Fri, Dec 11, 2009 at 11:42:24PM +1100, Dave Chinner wrote:
>> Not usually; normally only for data writes and metadata
>> modifications. However, some filesystems dirty objects even on read
>> (e.g. changing atime) and so can serialise on other filesystem locks
>> (e.g. the ext3 journal lock) that are being held by the fsync.
>
> Actually, we also take the XFS ilock in shared mode on read, and XFS
> takes it in exclusive mode if it has to update filesystem attributes
> like the atime.  This might be what Florian is seeing.

The file system is mounted noatime.  But the file in question is
heavily fragmented due to the way it is created: database pages are
written in more-or-less random order, creating holes which are later
filled.
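
The holes come from writing pages out of order into a file that grows as it goes. If the final size is known in advance, one common mitigation (not something proposed in this thread) is to reserve the space once with posix_fallocate() when the file is created. A minimal sketch; PAGE_SZ, preallocate_pages() and write_page() are hypothetical, and how much this reduces fragmentation depends on the filesystem:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define PAGE_SZ 4096  /* hypothetical database page size */

/*
 * Reserve space for the whole file once, at creation time, so that pages
 * written later in random order go into allocated space instead of
 * leaving holes. Returns 0 or an errno value: posix_fallocate() reports
 * errors through its return value, not through errno.
 */
int preallocate_pages(int fd, long npages)
{
    return posix_fallocate(fd, 0, (off_t)npages * PAGE_SZ);
}

/* Write one page into its slot; pwrite() does not move the file offset. */
int write_page(int fd, long pageno, const char *page)
{
    ssize_t n = pwrite(fd, page, PAGE_SZ, (off_t)pageno * PAGE_SZ);
    return n == PAGE_SZ ? 0 : -1;
}
```

XFS supports fallocate natively; on filesystems without it (ext3, for instance) glibc emulates posix_fallocate() by writing to each block, which is slow for large files. Doing it once at creation also keeps the preallocation out of the way of later readers.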

* Re: Does fsync() block read and write ops on the same file?
From: Dave Chinner @ 2009-12-12 23:05 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-fsdevel

On Fri, Dec 11, 2009 at 12:53:11PM +0000, Florian Weimer wrote:
> * Dave Chinner:
> 
> >> I'm not concerned with fsync latency per se.  It's going to take a
> >> while to write a few GBs scattered across the file.  However, it's
> >> annoying that read operations on the same file (which can't even see
> >> the effect of the fsync operation) are blocked, sometimes for more
> >> than two minutes.
> >
> > If they are blocking for that long, then sysrq-w during that period
> > will tell us exactly where, and in which filesystem, they are blocking....
> 
> Interesting.  Is it possible to trigger this from the hung task timer?
> From that, I've got two traces:
> 
> [307370.450502] Call Trace:
> [307370.450555]  [<ffffffff802aa9c8>] dput+0x1c/0xdd
> [307370.450590]  [<ffffffff8042a2e9>] __down_read+0x87/0xa1
> [307370.450641]  [<ffffffffa0276cd8>] :xfs:xfs_ilock+0x31/0x60
> [307370.450684]  [<ffffffffa029a208>] :xfs:xfs_read+0x147/0x21a
> [307370.450718]  [<ffffffff8029ae23>] do_sync_read+0xc9/0x10c
> [307370.450750]  [<ffffffff80246201>] autoremove_wake_function+0x0/0x2e
> [307370.450787]  [<ffffffff8029b614>] vfs_read+0xaa/0x152
> [307370.450815]  [<ffffffff8029b9f5>] sys_read+0x45/0x6e
> [307370.450844]  [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

That is xfs_ilock(inode, XFS_IOLOCK_SHARED), which means it is
blocked on either a concurrent write, truncate or preallocation
occurring to the same file. ->fsync does not take the IOLOCK at all
(it takes the ILOCK, which protects non-IO-related inode attributes),
so that is not causing your pauses here....
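
The IOLOCK/ILOCK distinction can be written as a small lock-compatibility table. A toy model, not XFS source: the enums, table and helper names are hypothetical, and the ILOCK mode listed for fsync is an assumption (it does not affect the read case, because reads take the other lock):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the two XFS per-inode locks described above. Not XFS code. */
enum lock_model { MODEL_IOLOCK, MODEL_ILOCK };
enum mode_model { MODE_SHARED, MODE_EXCL };

struct op_lock {
    const char     *op;
    enum lock_model lock;
    enum mode_model mode;
};

/* Which lock each operation takes, simplified from the message above. */
static const struct op_lock ops[] = {
    { "read",        MODEL_IOLOCK, MODE_SHARED },
    { "write",       MODEL_IOLOCK, MODE_EXCL   },
    { "truncate",    MODEL_IOLOCK, MODE_EXCL   },
    { "preallocate", MODEL_IOLOCK, MODE_EXCL   },
    /* Mode assumed; irrelevant to reads, which use the IOLOCK. */
    { "fsync",       MODEL_ILOCK,  MODE_EXCL   },
};

/* Look up an operation by name; NULL if the model does not know it. */
const struct op_lock *find_op(const char *name)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (strcmp(ops[i].op, name) == 0)
            return &ops[i];
    return NULL;
}

/* Two operations conflict only if they need the same lock and at least
 * one of them needs it in exclusive mode. */
bool conflicts(const struct op_lock *a, const struct op_lock *b)
{
    return a->lock == b->lock &&
           (a->mode == MODE_EXCL || b->mode == MODE_EXCL);
}
```

In this model only write, truncate and preallocate can hold off a read, and fsync never can, which matches the reading of the first trace given above.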

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
