From: Theodore Ts'o <tytso@mit.edu>
To: David Muchene <david.muchene@avid.com>
Cc: "linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: Fsync Performance
Date: Tue, 4 Aug 2015 21:59:42 -0400
Message-ID: <20150805015942.GA3432@thunk.org>
In-Reply-To: <CD10AD6FBDAE574A973D585BD95C239A4769527A@WAL-MBS02.global.avidww.com>

On Tue, Aug 04, 2015 at 07:06:13PM +0000, David Muchene wrote:
>
> I'm not sure if this is the place to ask this, if it isn't I
> apologize. We are occasionally seeing fsync take a very long time
> (sometimes upwards of 3s). We decided to run some fio tests and use
> systemtap to determine if the disks were the cause of the
> problem. One of the results from the tests is that there is
> occasionally a significant difference between the time spent
> doing io, and the total time to complete the fsync. Is there an
> explanation for this difference, or is the systemtap script bogus? If
> it is in fact the driver/disks that is taking a long time, does
> anyone have any suggestions as to how I'd debug that? I appreciate
> any help you can provide (even if it's pointing me to the relevant
> documents).

You haven't specified which functions you are counting as "time
spent doing I/O", but I suspect what you are seeing is the difference
between the time to send the data blocks to the disk and the time
needed for (a) the journal commit to complete and (b) the SSD to
confirm that the data and metadata blocks sent to the device have
been written to stable storage (so they will survive a power
failure)[1].

[1] Note that not all SSDs, especially non-enterprise SSDs, are
rated to be safe against power failures.
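
As a rough illustration (not taken from your measurements), timing
along the following lines in user space can show how much of the
latency sits in the fsync call itself rather than in handing the data
to the kernel. This is a minimal Python sketch; the function name and
the 1 MiB default size are made up for the example. It cannot tell
the journal commit apart from the device cache flush; for that you
would still need tracing such as systemtap or blktrace.

```python
import os
import time

def time_write_and_fsync(path, size=1 << 20):
    """Return (write_seconds, fsync_seconds) for `size` bytes of buffered writes.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        remaining = memoryview(b"\0" * size)
        t0 = time.perf_counter()
        while remaining:
            # Buffered write: normally this only copies into the page cache.
            n = os.write(fd, remaining)
            remaining = remaining[n:]
        t1 = time.perf_counter()
        # fsync waits for data writeback, any needed journal commit,
        # and the device's acknowledgement that its cache was flushed.
        os.fsync(fd)
        t2 = time.perf_counter()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1
```

If the second number dominates while the block-level I/O time stays
small, the extra time is likely going to the commit or the flush.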

You may be able to avoid the need to complete the journal commit if
all of the writes to the file are non-allocating writes (i.e., the
blocks were already allocated and, if they were allocated using
fallocate, also initialized by prewriting them), and you use
fdatasync(2) instead of fsync(2). (If there is no need to update the
file system metadata blocks in order to guarantee that the blocks can
be read after a power failure, fdatasync will skip writing the inode
mtime/ctime fields to the device.)
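
The pattern above can be sketched roughly as follows. This is only an
illustration in Python, assuming Linux; the helper names
(preallocate, overwrite_durably) and the chunk size are hypothetical,
and whether the journal commit is actually skipped depends on the
file system and kernel in use.

```python
import os

def preallocate(path, size, chunk=1 << 20):
    """Allocate `size` bytes and prewrite them with zeros.

    Hypothetical helper: after this, overwrites within [0, size) should
    not need new block allocations.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)   # reserve the blocks up front
        zeros = b"\0" * chunk
        offset = 0
        while offset < size:
            n = min(chunk, size - offset)
            # Prewrite so the allocated blocks are also initialized.
            offset += os.pwrite(fd, zeros[:n], offset)
        os.fsync(fd)   # one full fsync to make the allocation and size durable
    finally:
        os.close(fd)

def overwrite_durably(path, offset, data):
    """Overwrite already-allocated bytes in place, then fdatasync."""
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(data)
        while view:
            n = os.pwrite(fd, view, offset)
            offset += n
            view = view[n:]
        # fdatasync flushes the data but may skip metadata (such as
        # mtime/ctime) that is not needed to read the data back.
        os.fdatasync(fd)
    finally:
        os.close(fd)
```

For example, preallocate("data.bin", 64 << 20) once at setup time,
then overwrite_durably("data.bin", 4096, record) for each update that
stays inside the preallocated range.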

					- Ted