From: Christoph Hellwig <hch@infradead.org>
To: Chinmay V S <cvs268@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Stefan Priebe - Profihost AG <s.priebe@profihost.ag>,
	linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	LKML <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <matthew@wil.cx>
Subject: Re: Why is O_DSYNC on linux so slow / what's wrong with my SSD?
Date: Thu, 21 Nov 2013 02:11:01 -0800
Message-ID: <20131121101101.GA18404@infradead.org>
In-Reply-To: <CAK-9PRDNxHAX70cN88kRt03FkYbDB_x1cFQQYmVzqiCX=aZD6w@mail.gmail.com>

> 
> 1. Most drives do NOT respond to CMD_FLUSH immediately i.e. they wait
> until the data is actually moved to the non-volatile media (which is
> the right behaviour) i.e. performance drops.

Which is what the specification says they must do.

> 2. Some drives may implement CMD_FLUSH to return immediately i.e. no
> guarantee the data is actually on disk.

In which case they aren't spec compliant.  While I've seen countless
data integrity bugs on lower end ATA SSDs, I've not seen one that simply
ignores flush.  If you wanted to cheat that bluntly you'd be better
off just claiming not to have a writeback cache.

> 3. Anyway, CMD_FLUSH does NOT guarantee atomicity. (Consider power
> failure in the middle of an ongoing CMD_FLUSH on non battery-backed
> disks).

It does not guarantee atomicity by itself, but it's the only low-level
primitive a filesystem or database can use to build atomic transactions
at a higher level on an ATA disk with the writeback cache enabled.
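To illustrate how a flush serves as that building block, here is a
minimal C sketch of a two-step commit.  It assumes a hypothetical
on-disk layout (payload at offset 0, an 8-byte commit marker at the
hypothetical COMMIT_OFFSET); commit_record() and record_is_committed()
are illustrative names, not taken from any real filesystem or database.
On a disk with a volatile write cache, each fdatasync() is what ends up
as a cache flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: payload at offset 0, commit marker at COMMIT_OFFSET. */
#define COMMIT_OFFSET 4096
static const char COMMIT_MAGIC[8] = "COMMIT01"; /* 8 bytes, no NUL needed */

/* Write exactly len bytes at off, retrying on short writes and EINTR. */
static int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/* Payload must be durable before the marker is written, and the marker
 * must be durable before success is reported to the caller. */
int commit_record(int fd, const void *payload, size_t len)
{
    if (len > COMMIT_OFFSET)
        return -1;
    if (pwrite_all(fd, payload, len, 0) < 0)
        return -1;
    if (fdatasync(fd) < 0)          /* flush #1: payload on stable media */
        return -1;
    if (pwrite_all(fd, COMMIT_MAGIC, sizeof COMMIT_MAGIC, COMMIT_OFFSET) < 0)
        return -1;
    if (fdatasync(fd) < 0)          /* flush #2: marker on stable media */
        return -1;
    return 0;
}

/* Recovery check: a record counts only if the complete marker is present,
 * so a torn or missing marker reads as "not committed". */
int record_is_committed(int fd)
{
    char buf[sizeof COMMIT_MAGIC];
    ssize_t n = pread(fd, buf, sizeof buf, COMMIT_OFFSET);
    return n == (ssize_t)sizeof buf &&
           memcmp(buf, COMMIT_MAGIC, sizeof buf) == 0;
}
```

If either flush is silently dropped, whether by the disk or by a patched
driver, the drive is free to write the cached marker to media before the
payload, and after a power failure recovery would accept a record that
was never fully written.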

> In case the application cannot be modified to perform ASYNC IO, there
> exists a way to disable the behaviour of issuing a CMD_FLUSH for each
> sync() within the block device driver for SATA/SCSI disks. This is
> what is described by
> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba

Which is utterly broken, and your insistence on pushing it shows you
do not understand the problem space.

You solve your performance problem by completely disabling any chance
of having data integrity guarantees, and do so in a way that is not
detectable for applications or users.

If you have a workload with lots of small synchronous writes disabling
the writeback cache on the disk does indeed often help, especially with
the non-queueable FLUSH on all but the most recent ATA devices.
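A quick way to see this on a given disk is to time a run of small
O_DSYNC writes, once with the writeback cache enabled and once with it
disabled.  The following is a minimal C sketch; dsync_write_rate() is a
hypothetical helper, and it deliberately omits O_DIRECT to avoid buffer
alignment requirements, so numbers will differ somewhat from a raw block
device test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Issue `count` synchronous writes of `block` bytes each and return the
 * achieved writes per second, or -1.0 on error. */
double dsync_write_rate(const char *path, size_t block, int count)
{
    char buf[65536];
    if (block == 0 || block > sizeof buf || count <= 0)
        return -1.0;
    memset(buf, 0xab, block);

    /* With O_DSYNC each write() returns only once the data is on stable
     * storage; with a volatile disk cache that typically means a cache
     * flush (or a FUA write) per call. */
    int fd = open(path, O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (fd < 0)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < count; i++) {
        ssize_t n = pwrite(fd, buf, block, (off_t)i * (off_t)block);
        if (n != (ssize_t)block) {  /* treat short writes as errors */
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0 ? count / secs : -1.0;
}
```

Point it at a file on the filesystem under test, for example
dsync_write_rate("/mnt/test/f", 4096, 1000), and compare the writes per
second with the disk cache on and off.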

If you do have workloads where you do lots of synchronous writes

> Just to be clear, I am NOT recommending that this change be mainlined;
> rather it is a reference to improve performance in the rare cases (like
> in the OP Stefan's case) where both the app performing DIRECT SYNC
> block IO and the disk firmware implementing CMD_FLUSH can NOT be
> modified. In which case the standard block driver behaviour of issuing
> a CMD_FLUSH with each write is too restrictive and thus modified using
> the patch.

Again, what your patch does is explicitly ignore the data integrity
request from the application.  While this will usually be way faster,
it will also cause data loss.  Simply disabling the writeback cache
feature of the disk using hdparm will give you much better performance
than issuing all the FLUSH commands, especially if they are non-queued,
but without breaking the guarantee to the application.
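On kernels newer than this thread, the block layer exposes its own view
of the disk cache in /sys/block/<dev>/queue/write_cache.  The C sketch
below only reads that attribute; parse_write_cache() and
read_write_cache() are hypothetical helpers.  As I understand the
kernel's queue-sysfs documentation, writing "write through" there
changes only the kernel's view, not the drive, so the kernel stops
issuing flushes while the drive keeps caching: exactly the silent loss
of integrity described above.  Turning the cache off on the device
itself, e.g. with hdparm -W 0, is the safe route.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the contents of /sys/block/<dev>/queue/write_cache.
 * Returns 1 for "write back" (kernel issues flushes),
 * 0 for "write through" (kernel issues none), -1 if unrecognised. */
int parse_write_cache(const char *s)
{
    if (strncmp(s, "write back", 10) == 0)
        return 1;
    if (strncmp(s, "write through", 13) == 0)
        return 0;
    return -1;
}

/* Read the attribute for a block device name such as "sda".
 * Returns -1 if the file is missing (e.g. older kernels) or unreadable. */
int read_write_cache(const char *dev)
{
    char path[256], line[64];
    int len = snprintf(path, sizeof path,
                       "/sys/block/%s/queue/write_cache", dev);
    if (len < 0 || len >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ret = fgets(line, sizeof line, f) ? parse_write_cache(line) : -1;
    fclose(f);
    return ret;
}
```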

