From: Dave Chinner <david@fromorbit.com>
To: Spelic <spelic@shiftmail.org>
Cc: xfs@oss.sgi.com, linux-ext4@vger.kernel.org,
device-mapper development <dm-devel@redhat.com>
Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
Date: Thu, 21 Jun 2012 08:53:27 +1000
Message-ID: <20120620225327.GL30705@dastard>
In-Reply-To: <4FE1BDF3.4080702@shiftmail.org>
On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> Ok guys, I think I found the bug. One or more bugs.
>
>
> Pool has chunksize 1MB.
> In sysfs the thin volume has: queue/discard_max_bytes and
> queue/discard_granularity are 1048576 .
> And it has discard_alignment = 0, which based on sysfs-block
> documentation is correct (a less misleading name would have been
> discard_offset imho).
> Here is the blktrace from ext4 fstrim:
> ...
> 252,9 17 498 0.030466556 841 Q D 19898368 + 2048 [fstrim]
> 252,9 17 499 0.030467501 841 Q D 19900416 + 2048 [fstrim]
> 252,9 17 500 0.030468359 841 Q D 19902464 + 2048 [fstrim]
> 252,9 17 501 0.030469313 841 Q D 19904512 + 2048 [fstrim]
> 252,9 17 502 0.030470144 841 Q D 19906560 + 2048 [fstrim]
> 252,9 17 503 0.030471381 841 Q D 19908608 + 2048 [fstrim]
> 252,9 17 504 0.030472473 841 Q D 19910656 + 2048 [fstrim]
> 252,9 17 505 0.030473504 841 Q D 19912704 + 2048 [fstrim]
> 252,9 17 506 0.030474561 841 Q D 19914752 + 2048 [fstrim]
> 252,9 17 507 0.030475571 841 Q D 19916800 + 2048 [fstrim]
> 252,9 17 508 0.030476423 841 Q D 19918848 + 2048 [fstrim]
> 252,9 17 509 0.030477341 841 Q D 19920896 + 2048 [fstrim]
> 252,9 17 510 0.034299630 841 Q D 19922944 + 2048 [fstrim]
> 252,9 17 511 0.034306880 841 Q D 19924992 + 2048 [fstrim]
> 252,9 17 512 0.034307955 841 Q D 19927040 + 2048 [fstrim]
> 252,9 17 513 0.034308928 841 Q D 19929088 + 2048 [fstrim]
> 252,9 17 514 0.034309945 841 Q D 19931136 + 2048 [fstrim]
> 252,9 17 515 0.034311007 841 Q D 19933184 + 2048 [fstrim]
> 252,9 17 516 0.034312008 841 Q D 19935232 + 2048 [fstrim]
> 252,9 17 517 0.034313122 841 Q D 19937280 + 2048 [fstrim]
> 252,9 17 518 0.034314013 841 Q D 19939328 + 2048 [fstrim]
> 252,9 17 519 0.034314940 841 Q D 19941376 + 2048 [fstrim]
> 252,9 17 520 0.034315835 841 Q D 19943424 + 2048 [fstrim]
> 252,9 17 521 0.034316662 841 Q D 19945472 + 2048 [fstrim]
> 252,9 17 522 0.034317547 841 Q D 19947520 + 2048 [fstrim]
> ...
>
> Here is the blktrace from xfs fstrim:
> 252,12 16 1 0.000000000 554 Q D 96 + 2048 [fstrim]
> 252,12 16 2 0.000010149 554 Q D 2144 + 2048 [fstrim]
> 252,12 16 3 0.000011349 554 Q D 4192 + 2048 [fstrim]
> 252,12 16 4 0.000012584 554 Q D 6240 + 2048 [fstrim]
> 252,12 16 5 0.000013685 554 Q D 8288 + 2048 [fstrim]
> 252,12 16 6 0.000014660 554 Q D 10336 + 2048 [fstrim]
> 252,12 16 7 0.000015707 554 Q D 12384 + 2048 [fstrim]
> 252,12 16 8 0.000016692 554 Q D 14432 + 2048 [fstrim]
> 252,12 16 9 0.000017594 554 Q D 16480 + 2048 [fstrim]
> 252,12 16 10 0.000018539 554 Q D 18528 + 2048 [fstrim]
> 252,12 16 11 0.000019434 554 Q D 20576 + 2048 [fstrim]
> 252,12 16 12 0.000020879 554 Q D 22624 + 2048 [fstrim]
> 252,12 16 13 0.000021856 554 Q D 24672 + 2048 [fstrim]
> 252,12 16 14 0.000022786 554 Q D 26720 + 2048 [fstrim]
> 252,12 16 15 0.000023699 554 Q D 28768 + 2048 [fstrim]
> 252,12 16 16 0.000024672 554 Q D 30816 + 2048 [fstrim]
> 252,12 16 17 0.000025467 554 Q D 32864 + 2048 [fstrim]
> 252,12 16 18 0.000026374 554 Q D 34912 + 2048 [fstrim]
> 252,12 16 19 0.000027194 554 Q D 36960 + 2048 [fstrim]
> 252,12 16 20 0.000028137 554 Q D 39008 + 2048 [fstrim]
> 252,12 16 21 0.000029524 554 Q D 41056 + 2048 [fstrim]
> 252,12 16 22 0.000030479 554 Q D 43104 + 2048 [fstrim]
> 252,12 16 23 0.000031306 554 Q D 45152 + 2048 [fstrim]
> 252,12 16 24 0.000032134 554 Q D 47200 + 2048 [fstrim]
> 252,12 16 25 0.000032964 554 Q D 49248 + 2048 [fstrim]
> 252,12 16 26 0.000033794 554 Q D 51296 + 2048 [fstrim]
>
>
> As you can see, while ext4 correctly aligns the discards to 1MB, xfs
> does not.
XFS just sends a large extent to blkdev_issue_discard(), and cares
nothing about discard alignment or granularity.
> It looks like an fstrim or xfs bug: they don't look at
> discard_alignment (=0 ... a less misleading name would be
> discard_offset imho) + discard_granularity (=1MB) and they don't
> base alignments on those.
It looks like blkdev_issue_discard() has reduced each discard to
bios of a single "granule" (1MB), and not aligned them, hence they
are ignored by dm-thinp.
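As a rough check of the numbers in the traces above, here is a minimal
userspace sketch in C (not kernel code; the function names are made up
for illustration). It assumes 512-byte sectors, so a 1MB thinp block is
2048 sectors, and counts how many whole blocks a single discard covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a discard starting at sector `start` begins on a granule boundary. */
bool discard_is_aligned(uint64_t start, uint64_t granule)
{
	assert(granule > 0);
	return start % granule == 0;
}

/*
 * Number of whole granules fully covered by the sector range
 * [start, start + len). Partially covered granules are not counted.
 */
uint64_t whole_granules_covered(uint64_t start, uint64_t len, uint64_t granule)
{
	assert(granule > 0);
	uint64_t first = (start + granule - 1) / granule; /* round start up */
	uint64_t last = (start + len) / granule;          /* round end down */

	return last > first ? last - first : 0;
}
```

With the traced values, whole_granules_covered(19898368, 2048, 2048) is 1
for the first ext4 discard, while whole_granules_covered(96, 2048, 2048)
is 0 for the first XFS discard: it touches parts of two blocks and fully
covers neither.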
What are the discard parameters exposed by dm-thinp in
/sys/block/<thinp-blkdev>/queue/discard*?
It looks to me that dm-thinp might be setting discard_max_bytes to
1MB rather than discard_granularity. Looking at dm-thin.c:
static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
{
	/*
	 * FIXME: these limits may be incompatible with the pool's data device
	 */
	limits->max_discard_sectors = pool->sectors_per_block;
	/*
	 * This is just a hint, and not enforced. We have to cope with
	 * bios that overlap 2 blocks.
	 */
	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
	limits->discard_zeroes_data = pool->pf.zero_new_blocks;
}
Yes - discard_max_bytes == discard_granularity, and so
blkdev_issue_discard() fails to align the request properly. As it is,
setting discard_max_bytes to the thinp block size is silly - it
means you'll never get range requests, and we send a discard for
every single block in a range rather than having the thinp code
iterate over a range itself.
i.e. this is not a filesystem bug that is causing the problem....
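To make the effect of the limit concrete, here is a small userspace
model in C. It is only a sketch under stated assumptions: it is not the
real blkdev_issue_discard() implementation, the function name
blocks_discardable() is hypothetical, and it assumes a naive splitter
that cuts the range into pieces of at most max_sectors without aligning
the first piece.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split the sector range [start, start + len) into pieces of at most
 * max_sectors each, front to back, with no alignment of the first piece.
 * Return how many granule-sized blocks are fully covered by some single
 * piece, i.e. how many blocks a target that ignores partial-block
 * discards could actually free.
 */
uint64_t blocks_discardable(uint64_t start, uint64_t len,
			    uint64_t max_sectors, uint64_t granule)
{
	assert(max_sectors > 0 && granule > 0);
	uint64_t total = 0;

	while (len > 0) {
		uint64_t piece = len < max_sectors ? len : max_sectors;
		uint64_t first = (start + granule - 1) / granule; /* round up */
		uint64_t last = (start + piece) / granule;        /* round down */

		if (last > first)
			total += last - first;
		start += piece;
		len -= piece;
	}
	return total;
}
```

For the XFS trace (start sector 96, 26 discards of 2048 sectors, so
53248 sectors in total), blocks_discardable(96, 53248, 2048, 2048) is 0,
whereas passing the same range down as one request,
blocks_discardable(96, 53248, 53248, 2048), covers 25 whole blocks.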
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com