From: Dave Chinner <david@fromorbit.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: axboe@kernel.dk, xfs@oss.sgi.com, hch@infradead.org,
device-mapper development <dm-devel@redhat.com>,
Spelic <spelic@shiftmail.org>,
Paolo Bonzini <pbonzini@redhat.com>,
linux-ext4@vger.kernel.org
Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
Date: Fri, 22 Jun 2012 09:29:57 +1000 [thread overview]
Message-ID: <20120621232957.GE10673@dastard> (raw)
In-Reply-To: <20120621174742.GA27837@redhat.com>
On Thu, Jun 21, 2012 at 01:47:43PM -0400, Mike Snitzer wrote:
> On Wed, Jun 20 2012 at 6:53pm -0400,
> Dave Chinner <david@fromorbit.com> wrote:
>
> > On Wed, Jun 20, 2012 at 02:11:31PM +0200, Spelic wrote:
> > > Ok guys, I think I found the bug. One or more bugs.
> > >
> > >
> > > Pool has chunksize 1MB.
> > > In sysfs the thin volume has: queue/discard_max_bytes and
> > > queue/discard_granularity are 1048576 .
> > > And it has discard_alignment = 0, which based on sysfs-block
> > > documentation is correct (a less misleading name would have been
> > > discard_offset imho).
> > > Here is the blktrace from ext4 fstrim:
> > > ...
> > > 252,9 17 498 0.030466556 841 Q D 19898368 + 2048 [fstrim]
> > > 252,9 17 499 0.030467501 841 Q D 19900416 + 2048 [fstrim]
> > > 252,9 17 500 0.030468359 841 Q D 19902464 + 2048 [fstrim]
....
> > > Here is the blktrace from xfs fstrim:
> > > 252,12 16 1 0.000000000 554 Q D 96 + 2048 [fstrim]
> > > 252,12 16 2 0.000010149 554 Q D 2144 + 2048 [fstrim]
> > > 252,12 16 3 0.000011349 554 Q D 4192 + 2048 [fstrim]
.....
> > It looks like blkdev_issue_discard() has reduced each discard to
> > bios of a single "granule" (1MB), and not aligned them, hence they
> > are ignore by dm-thinp.
> >
> > what are the discard parameters exposed by dm-thinp in
> > /sys/block/<thinp-blkdev>/queue/discard*
> >
> > It looks to me that dmthinp might be setting discard_max_bytes to
> > 1MB rather than discard_granularity. Looking at dm-thin.c:
> >
> > static void set_discard_limits(struct pool *pool, struct queue_limits *limits)
> > {
> > /*
> > * FIXME: these limits may be incompatible with the pool's data device
> > */
> > limits->max_discard_sectors = pool->sectors_per_block;
> >
> > /*
> > * This is just a hint, and not enforced. We have to cope with
> > * bios that overlap 2 blocks.
> > */
> > limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
> > limits->discard_zeroes_data = pool->pf.zero_new_blocks;
> > }
> >
> >
> > Yes - discard_max_bytes == discard_granularity, and so
> > blkdev_issue_discard fails to align the request properly. As it is,
> > setting discard_max_bytes to the thinp block size is silly - it
> > means you'll never get range requests, and we sent a discard for
> > every single block in a range rather than having the thinp code
> > iterate over a range itself.
>
> So 2 different issues:
> 1) blkdev_issue_discard isn't properly aligning
> 2) thinp should accept larger discards (up to the stacked
> discard_max_bytes rather than setting an override)
Yes, in effect, but there's no real reason I can see why thinp can't
accept large discard requests than the underlying stack and break
them up appropriately itself....
> > i.e. this is not a filesystem bug that is causing the problem....
>
> Paolo Bonzini fixed blkdev_issue_discard to properly align some time
> ago; unfortunately the patches slipped through the cracks (cc'ing Paolo,
> Jens, and Christoph).
>
> Here are references to Paolo's patches:
> 0/2 https://lkml.org/lkml/2012/3/14/323
> 1/2 https://lkml.org/lkml/2012/3/14/324
> 2/2 https://lkml.org/lkml/2012/3/14/325
>
> Patch 2/2 specifically addresses the case where:
> discard_max_bytes == discard_granularity
>
> Paolo, any chance you could resend to Jens (maybe with hch's comments on
> patch#2 accounted for)? Also, please add hch's Reviewed-by when
> reposting.
>
> (would love to see this fixed for 3.5-rcX but if not 3.6 it is?)
That would be good...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2012-06-21 23:30 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-18 21:33 Ext4 and xfs problems in dm-thin on allocation and discard Spelic
2012-06-19 1:57 ` Dave Chinner
2012-06-19 3:12 ` Mike Snitzer
2012-06-19 6:32 ` Lukáš Czerner
2012-06-19 11:29 ` Spelic
2012-06-19 12:20 ` Lukáš Czerner
2012-06-19 13:34 ` Mike Snitzer
2012-06-19 13:16 ` Mike Snitzer
2012-06-19 13:25 ` Lukáš Czerner
2012-06-19 13:30 ` Mike Snitzer
2012-06-19 13:52 ` Spelic
2012-06-19 14:05 ` Eric Sandeen
2012-06-19 14:44 ` Mike Snitzer
2012-06-19 18:48 ` Mike Snitzer
2012-06-19 20:06 ` Dave Chinner
2012-06-19 20:21 ` Ted Ts'o
2012-06-19 20:39 ` Dave Chinner
2012-06-20 9:01 ` Christoph Hellwig
2012-06-19 21:37 ` Spelic
2012-06-19 23:12 ` Dave Chinner
2012-06-20 12:11 ` Spelic
2012-06-20 22:53 ` Dave Chinner
2012-06-21 17:47 ` Mike Snitzer
2012-06-21 23:29 ` Dave Chinner [this message]
2012-07-01 14:53 ` Paolo Bonzini
2012-07-02 13:00 ` Mike Snitzer
2012-07-02 13:15 ` Paolo Bonzini
2012-06-19 14:09 ` Lukáš Czerner
2012-06-19 14:19 ` Ted Ts'o
2012-06-19 14:23 ` Eric Sandeen
2012-06-19 14:37 ` Lukáš Czerner
2012-06-19 14:43 ` [dm-devel] " Alasdair G Kergon
2012-06-19 15:28 ` Mike Snitzer
2012-06-19 16:03 ` [dm-devel] " Alasdair G Kergon
2012-06-19 19:58 ` Ted Ts'o
2012-06-19 20:44 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120621232957.GE10673@dastard \
--to=david@fromorbit.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=hch@infradead.org \
--cc=linux-ext4@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=snitzer@redhat.com \
--cc=spelic@shiftmail.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox