linux-xfs.vger.kernel.org archive mirror
* Periodic fstrim job vs mounting with discard
@ 2016-10-20 22:32 Jared D. Cottrell
  2016-10-21  1:48 ` Dave Chinner
  0 siblings, 1 reply; 3+ messages in thread
From: Jared D. Cottrell @ 2016-10-20 22:32 UTC
  To: linux-xfs

We've been running our Ubuntu 14.04-based, SSD-backed databases with a
weekly fstrim cron job, but have been finding more and more clusters
that are locking all IO for a couple minutes as a result of the job.
In theory, mounting with discard could be appropriate for our use case
as file deletes are infrequent and handled in background threads.
However, we read some dire warnings about using discard on this list
(http://oss.sgi.com/archives/xfs/2014-08/msg00465.html) that make us
want to avoid it.

Is discard still to be avoided at all costs? Are the corruption and
bricking problems mentioned still something to be expected even with
the protection of Linux's built-in blacklist of broken SSD hardware?
We happen to be using Amazon's in-chassis SSDs. I'm sure they use
multiple vendors but I can't imagine they're taking short-cuts with
cheap hardware.

If discard is still strongly discouraged, perhaps we can approach the
problem from the other side: does the slow fstrim we're seeing sound
like a known issue? After a bunch of testing and research, we've
determined the following:

Essentially, XFS looks to be iterating over every allocation group and
issuing TRIMs for all free extents every time the FITRIM ioctl is called.
This, coupled with the facts that Linux’s interface to the TRIM
command is both synchronous and does not support a vectorized list of
ranges (see: https://github.com/torvalds/linux/blob/3fc9d690936fb2e20e180710965ba2cc3a0881f8/block/blk-lib.c#L112),
is leading to a large number of extraneous TRIM commands (each of
which has been observed to be slow, see:
http://oss.sgi.com/archives/xfs/2011-12/msg00311.html) being issued to
the disk for ranges that both the filesystem and the disk know to be
free. In practice, we have seen IO disruptions of up to 2 minutes. I
realize that the duration of these disruptions may be controller
dependent. Unfortunately, when running on a platform like AWS, one
does not have the luxury of choosing specific hardware.

EXT4, on the other hand, tracks blocks that have been deleted since
the previous FITRIM ioctl and targets subsequent TRIMs to the
appropriate block ranges (see:
http://blog.taz.net.au/2012/01/07/fstrim-and-xfs/). In real-world
tests this significantly reduces the impact of fstrim to the point
that it is unnoticeable to the database / application.
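
The difference between the two behaviours described above can be shown
with a toy model. This is plain Python bookkeeping, not kernel code: the
class names are hypothetical, extents are just (start, length) tuples,
and the per-extent tracking is chosen for illustration only and makes no
claim about the granularity at which EXT4 actually keeps this state.

```python
class FullScanTrimmer:
    """Toy model: every trim pass discards every free extent."""

    def __init__(self, free_extents):
        self.free = set(free_extents)

    def free_extent(self, extent):
        """Record an extent that has just been freed by a delete."""
        self.free.add(extent)

    def trim(self):
        """Return the extents a trim pass would discard."""
        return sorted(self.free)


class IncrementalTrimmer(FullScanTrimmer):
    """Toy model: only extents freed since the last pass are discarded."""

    def __init__(self, free_extents):
        super().__init__(free_extents)
        # Nothing has been trimmed yet, so everything free is pending.
        self.pending = set(self.free)

    def free_extent(self, extent):
        super().free_extent(extent)
        self.pending.add(extent)

    def trim(self):
        issued = sorted(self.pending)
        self.pending.clear()
        return issued
```

With two free extents at the start, both models discard two extents on
the first pass. If one more extent is then freed, the next pass of the
full-scan model discards all three again, while the incremental model
discards only the newly freed one.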

For a bit more context, here's a write-up of the same issue we did for
the MongoDB community:

https://groups.google.com/forum/#!topic/mongodb-user/Mj0x6m-02Ms

Regards,


Jared


end of thread, other threads:[~2016-11-02 13:51 UTC | newest]

Thread overview: 3+ messages
2016-10-20 22:32 Periodic fstrim job vs mounting with discard Jared D. Cottrell
2016-10-21  1:48 ` Dave Chinner
2016-11-02 13:50   ` Jared D. Cottrell
