* Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-10 9:22 UTC
To: linux-fsdevel

I've got an odd performance issue. It seems that when fsync() is
called on a file, other processes block when they try to access it.
This is not merely due to I/O contention on the underlying block
device, it seems.

Oracle reported a similar performance issue in the Berkeley DB JE
changelog. Is this really true? Are there any workarounds? (I'm
mainly interested in the situation on ext[34] and XFS.)

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: Does fsync() block read and write ops on the same file?
From: Dave Chinner @ 2009-12-11 3:55 UTC
To: Florian Weimer; +Cc: linux-fsdevel

On Thu, Dec 10, 2009 at 09:22:35AM +0000, Florian Weimer wrote:
> I've got an odd performance issue. It seems that when fsync() is
> called on a file, other processes block when they try to access it.
> This is not merely due to I/O contention on the underlying block
> device, it seems.

The inode mutex is held across the ->fsync() method. If that takes a
long time to run, then other processes will block trying to take the
inode mutex. i.e. part of fsync serialises access to the inode.

> Oracle reported a similar performance issue in the Berkeley DB JE
> changelog. Is this really true? Are there any workarounds? (I'm
> mainly interested in the situation on ext[34] and XFS.)

For XFS, the ->fsync method blocks for as long as it takes to write
a synchronous transaction (1 IO). ext4 looks like it writes the
inode rather than doing a journal commit, so it should only need a
single IO with the inode mutex held, too. I don't think these can be
optimised any further.

You can use an external log with XFS on separate spindles to the
data volume to minimise the transaction latency, but that's about it
AFAIK.

For ext3, ordered mode can result in long (multi-second) fsync
latencies on busy filesystems because of the journal commit
involved. Using writeback mode will avoid the long latencies and
make it operate close to ext4/XFS speeds.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

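A minimal user-space sketch for checking whether reads on a file stall while an fsync() of the same file is in flight, as reported above. It assumes a Unix system with Python 3. The function name `time_reads_during_fsync` and its parameters are made up for this illustration, and the numbers it returns depend entirely on the filesystem, kernel version, and storage underneath.

```python
import os
import threading
import time


def time_reads_during_fsync(path, dirty_bytes=1 << 20, read_size=4096):
    """Return (fsync_seconds, worst_read_seconds) for one fsync() of `path`.

    Hypothetical helper: dirties `dirty_bytes` of the file, starts fsync()
    in a second thread, and times pread() calls until the fsync finishes.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Dirty some page cache so that fsync() has real work to do.
        os.pwrite(fd, os.urandom(dirty_bytes), 0)

        result = {}

        def do_fsync():
            start = time.monotonic()
            os.fsync(fd)
            result["fsync"] = time.monotonic() - start

        syncer = threading.Thread(target=do_fsync)
        syncer.start()

        worst_read = 0.0
        while True:
            start = time.monotonic()
            os.pread(fd, read_size, 0)  # positional read, no lseek() involved
            worst_read = max(worst_read, time.monotonic() - start)
            if not syncer.is_alive():
                break
        syncer.join()
        return result["fsync"], worst_read
    finally:
        os.close(fd)
```

Calling it on a file that lives on the filesystem under test and comparing the worst read time with the fsync time gives a first hint. If the worst read approaches the fsync duration, the read path is probably waiting on a lock rather than only on the disk. Threads are used instead of separate processes because the locks discussed here are per inode, not per process.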
* Re: Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-11 8:53 UTC
To: Dave Chinner; +Cc: linux-fsdevel

* Dave Chinner:

> On Thu, Dec 10, 2009 at 09:22:35AM +0000, Florian Weimer wrote:
>> I've got an odd performance issue. It seems that when fsync() is
>> called on a file, other processes block when they try to access it.
>> This is not merely due to I/O contention on the underlying block
>> device, it seems.
>
> The inode mutex is held across the ->fsync() method. If that takes a
> long time to run, then other processes will block trying to take the
> inode mutex. i.e. part of fsync serialises access to the inode.

Is an inode lock required to read from the file?

>> Oracle reported a similar performance issue in the Berkeley DB JE
>> changelog. Is this really true? Are there any workarounds? (I'm
>> mainly interested in the situation on ext[34] and XFS.)
>
> For XFS, the ->fsync method blocks for as long as it takes to write
> a synchronous transaction (1 IO). ext4 looks like it writes the
> inode rather than doing a journal commit, so it should only need a
> single IO with the inode mutex held, too. I don't think these can be
> optimised any further.

I'm not concerned with fsync latency per se. It's going to take a
while to write a few GBs scattered across the file. However, it's
annoying that read operations on the same file (which can't even see
the effect of the fsync operation) are blocked, sometimes for more
than two minutes.

--
Florian Weimer <fweimer@bfk.de>

* Re: Does fsync() block read and write ops on the same file?
From: Dave Chinner @ 2009-12-11 12:42 UTC
To: Florian Weimer; +Cc: linux-fsdevel

On Fri, Dec 11, 2009 at 08:53:37AM +0000, Florian Weimer wrote:
> * Dave Chinner:
>
> > On Thu, Dec 10, 2009 at 09:22:35AM +0000, Florian Weimer wrote:
> >> I've got an odd performance issue. It seems that when fsync() is
> >> called on a file, other processes block when they try to access it.
> >> This is not merely due to I/O contention on the underlying block
> >> device, it seems.
> >
> > The inode mutex is held across the ->fsync() method. If that takes a
> > long time to run, then other processes will block trying to take the
> > inode mutex. i.e. part of fsync serialises access to the inode.
>
> Is an inode lock required to read from the file?

Not usually - normally only for data writes and metadata
modifications. However, some filesystems dirty objects even on read
(e.g. changing atime) and so can serialise on other filesystem locks
(e.g. the ext3 journal lock) that are being held by the fsync.

> >> Oracle reported a similar performance issue in the Berkeley DB JE
> >> changelog. Is this really true? Are there any workarounds? (I'm
> >> mainly interested in the situation on ext[34] and XFS.)
> >
> > For XFS, the ->fsync method blocks for as long as it takes to write
> > a synchronous transaction (1 IO). ext4 looks like it writes the
> > inode rather than doing a journal commit, so it should only need a
> > single IO with the inode mutex held, too. I don't think these can be
> > optimised any further.
>
> I'm not concerned with fsync latency per se. It's going to take a
> while to write a few GBs scattered across the file. However, it's
> annoying that read operations on the same file (which can't even see
> the effect of the fsync operation) are blocked, sometimes for more
> than two minutes.

If they are blocking for that long then sysrq-w during that period
will tell us exactly where, and in which filesystem, they are
blocking....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

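The sysrq-w dump mentioned above can also be requested without a keyboard by writing the character `w` to /proc/sysrq-trigger, which makes the kernel log the stacks of blocked (uninterruptible) tasks. This needs root and a kernel built with magic SysRq support. The sketch below wraps that in a simple timer so that the dump is requested only when an operation takes too long. The helper names `dump_blocked_tasks` and `run_with_watchdog` are hypothetical and exist only for this illustration.

```python
import threading

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def dump_blocked_tasks(trigger_path=SYSRQ_TRIGGER):
    """Ask the kernel to log blocked tasks (same effect as sysrq-w)."""
    # Unbuffered so the single byte reaches the kernel immediately.
    with open(trigger_path, "wb", buffering=0) as trigger:
        trigger.write(b"w")


def run_with_watchdog(operation, timeout_seconds, on_timeout=dump_blocked_tasks):
    """Run `operation()`; if it is still running after `timeout_seconds`,
    call `on_timeout()` once from a timer thread. Returns operation's result."""
    timer = threading.Timer(timeout_seconds, on_timeout)
    timer.start()
    try:
        return operation()
    finally:
        # Harmless if the timer has already fired.
        timer.cancel()
```

For example, `run_with_watchdog(lambda: os.pread(fd, 4096, 0), 10.0)` would request a dump if a single read takes longer than ten seconds, where `fd` is an already open file descriptor. The traces end up in the kernel log (dmesg), not in the calling program.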
* Re: Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-11 12:53 UTC
To: Dave Chinner; +Cc: linux-fsdevel

* Dave Chinner:

>> I'm not concerned with fsync latency per se. It's going to take a
>> while to write a few GBs scattered across the file. However, it's
>> annoying that read operations on the same file (which can't even see
>> the effect of the fsync operation) are blocked, some times for more
>> than two minutes.
>
> If they are blocking for that long then sysrq-w during that period
> will tell us exactly where in what filesystem they are blocking on....

Interesting. Is it possible to trigger this from the hang timer?
From that, I've got two traces:

[307370.450502] Call Trace:
[307370.450555] [<ffffffff802aa9c8>] dput+0x1c/0xdd
[307370.450590] [<ffffffff8042a2e9>] __down_read+0x87/0xa1
[307370.450641] [<ffffffffa0276cd8>] :xfs:xfs_ilock+0x31/0x60
[307370.450684] [<ffffffffa029a208>] :xfs:xfs_read+0x147/0x21a
[307370.450718] [<ffffffff8029ae23>] do_sync_read+0xc9/0x10c
[307370.450750] [<ffffffff80246201>] autoremove_wake_function+0x0/0x2e
[307370.450787] [<ffffffff8029b614>] vfs_read+0xaa/0x152
[307370.450815] [<ffffffff8029b9f5>] sys_read+0x45/0x6e
[307370.450844] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

[307396.186071] Call Trace:
[307396.186128] [<ffffffff8042963d>] __mutex_lock_slowpath+0x64/0x9b
[307396.186160] [<ffffffff804294a2>] mutex_lock+0xa/0xb
[307396.186190] [<ffffffff8029b749>] generic_file_llseek+0x2a/0x8b
[307396.186219] [<ffffffff8029b8f8>] sys_lseek+0x40/0x60
[307396.186248] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

--
Florian Weimer <fweimer@bfk.de>

* Re: Does fsync() block read and write ops on the same file?
From: Dave Chinner @ 2009-12-12 23:05 UTC
To: Florian Weimer; +Cc: linux-fsdevel

On Fri, Dec 11, 2009 at 12:53:11PM +0000, Florian Weimer wrote:
> * Dave Chinner:
>
> >> I'm not concerned with fsync latency per se. It's going to take a
> >> while to write a few GBs scattered across the file. However, it's
> >> annoying that read operations on the same file (which can't even see
> >> the effect of the fsync operation) are blocked, some times for more
> >> than two minutes.
> >
> > If they are blocking for that long then sysrq-w during that period
> > will tell us exactly where in what filesystem they are blocking on....
>
> Interesting. Is it possible to trigger this from the hang timer?
> From that, I've got two traces:
>
> [307370.450502] Call Trace:
> [307370.450555] [<ffffffff802aa9c8>] dput+0x1c/0xdd
> [307370.450590] [<ffffffff8042a2e9>] __down_read+0x87/0xa1
> [307370.450641] [<ffffffffa0276cd8>] :xfs:xfs_ilock+0x31/0x60
> [307370.450684] [<ffffffffa029a208>] :xfs:xfs_read+0x147/0x21a
> [307370.450718] [<ffffffff8029ae23>] do_sync_read+0xc9/0x10c
> [307370.450750] [<ffffffff80246201>] autoremove_wake_function+0x0/0x2e
> [307370.450787] [<ffffffff8029b614>] vfs_read+0xaa/0x152
> [307370.450815] [<ffffffff8029b9f5>] sys_read+0x45/0x6e
> [307370.450844] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f

That is xfs_ilock(inode, XFS_IOLOCK_SHARED), which means it is
blocked on either a concurrent write, truncate or preallocation
occurring on the same file. ->fsync does not take the IOLOCK at all
(it takes the ILOCK, which protects non-IO related inode attributes),
so that is not causing your pauses here....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

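To illustrate why a reader stuck acquiring a lock in shared mode points at an exclusive holder rather than at other readers, here is a toy shared/exclusive lock. It is not XFS code and does not model the fairness rules of the kernel's locks. The class `SharedExclusiveLock` and the function `reader_wait_time` are hypothetical names used only for this sketch.

```python
import threading
import time


class SharedExclusiveLock:
    """Toy lock: any number of shared holders, or exactly one exclusive holder.

    Simplification: new shared holders are admitted even while an
    exclusive acquirer is waiting, so writers can starve.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive_held = False

    def acquire_shared(self):
        with self._cond:
            while self._exclusive_held:
                self._cond.wait()
            self._shared_holders += 1

    def release_shared(self):
        with self._cond:
            self._shared_holders -= 1
            if self._shared_holders == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._exclusive_held or self._shared_holders:
                self._cond.wait()
            self._exclusive_held = True

    def release_exclusive(self):
        with self._cond:
            self._exclusive_held = False
            self._cond.notify_all()


def reader_wait_time(hold_seconds):
    """Seconds a shared acquire waits while the lock is held exclusively."""
    lock = SharedExclusiveLock()
    lock.acquire_exclusive()  # stands in for a write, truncate or preallocation
    releaser = threading.Timer(hold_seconds, lock.release_exclusive)
    releaser.start()
    start = time.monotonic()
    lock.acquire_shared()     # stands in for the read path
    waited = time.monotonic() - start
    lock.release_shared()
    releaser.join()
    return waited
```

In this model two readers never delay each other, while `reader_wait_time(2.0)` waits for roughly as long as the exclusive holder keeps the lock. That mirrors the reasoning above: a read blocked in a shared acquisition implies that some other operation holds the same lock exclusively.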
* Re: Does fsync() block read and write ops on the same file?
From: Christoph Hellwig @ 2009-12-11 13:21 UTC
To: Dave Chinner; +Cc: Florian Weimer, linux-fsdevel

On Fri, Dec 11, 2009 at 11:42:24PM +1100, Dave Chinner wrote:
> Not usually - normally only for data writes and metadata
> modifications. However, some filesystems
> dirty objects even on read (e.g. changing atime) and so can
> serialise on other filesystem locks (e.g. ext3 journal lock) that
> is being held by the fsync.

Actually we also take the XFS ilock in shared mode in read, and XFS
takes it in exclusive mode if it has to update filesystem attributes
like the atime. This might be what Florian is seeing.

* Re: Does fsync() block read and write ops on the same file?
From: Florian Weimer @ 2009-12-11 13:35 UTC
To: Christoph Hellwig; +Cc: Dave Chinner, linux-fsdevel

* Christoph Hellwig:

> On Fri, Dec 11, 2009 at 11:42:24PM +1100, Dave Chinner wrote:
>> No usually - normally only for data writes and metadata
>> modifications. However, some filesystems
>> dirty objects even on read (e.g. changing atime) and so can
>> serialise on other filesystem locks (e.g. ext3 journal lock) that
>> is being held by the fsync.
>
> Actually we also take the XFS ilock in shared mode in read, and XFS
> takes it in exclusive mode if it has to update filesystem attributes
> like the atime. This might be what Florian is seeing.

The file system is mounted noatime. But the file in question is
heavily fragmented due to the way it is created: database pages are
written in more-or-less random order, creating holes which are later
filled.

--
Florian Weimer <fweimer@bfk.de>