From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 7/8] xfs: fix fstrim offset calculations
Date: Wed, 28 Mar 2012 08:42:15 +1100 [thread overview]
Message-ID: <20120327214215.GE5091@dastard> (raw)
In-Reply-To: <20120327204825.GB7762@sgi.com>
On Tue, Mar 27, 2012 at 03:48:25PM -0500, Ben Myers wrote:
> On Thu, Mar 22, 2012 at 04:15:12PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_ioc_fstrim() doesn't treat the incoming offset and length
> > correctly. It treats them as a filesystem block address, rather than
> > a disk address. This is wrong because the range passed in is a
> > linear representation , while the filesystem block address notiation
> > is a sparse representation. Hence we cannot convert the range direct
> > to filesystem block units and then use that for calculating the
> > range to trim.
> >
> > While this sounds dangerous, the problem is limited to calculting
> > what AGs need to be trimmed. The code that calcuates the actual
> > ranges to trim gets the right result (i.e. only ever discards free
> > space), even though it uses the wrong ranges to limit what is
> > trimmed. Hence this is not a bug that endangers user data.
>
> Yep, I can see that the calculation of what we pass to blkdev_issue_discard()
> is correct and always a free extent. I am having a hard time seeing the
> problem related to calculating which AGs to trim. Can you give an example?
I don't have the debug traces anymore, but the problem is this
sort of thing. Take a 80MB filesystem with 4 AGs, each AG is 20MB,
which is ~5000 filesystem blocks. That means we need 13 bits to
store the block count per AG. i.e. agblklogi = 13. Now, the FSB
addressing format is sparse, and the calculation is this:
FSBNO = (AGNO << agblklog) | AGBNO
Note the terminology? FSBNO != FSB. FSB is just a range converted to
filesystem block units. FSBNO is the filesystem block number, an
address.
offset offset + length
+-------------------------------------------+
range: 0 80MB
daddr: 0 160k
FSB: 0 20k
AG: +----------+----------+----------+----------+
0 1 2 3
AGBNO: 0 5k
0 5k
0 5k
0 5k
FSBNO: 0 5k
8k 13k
16k 21k
24k 29k
IOWs, the FSBNO range looks like this:
+----------+ +----------+ +----------+ +----------+
0 5k 8k 13k 16k 21k 24k 29k
And there are regions that are simple invalid (the empty, sparse
bits). This is done to make all the mathematics easy within each AG
as you can convert from the FSBNO straight to the AGBNO (and vice
versa) without needing to know the address of the first block of the
AG. It means it is easy for AGs to manage their own space without
needing to care about where they exist in the larger disk address
space - that is complete abstracted away from the internal freespace
and inode management as they all use AGBNO notation ot reference
blocks within the AG.
As a result, the FSBNO range of the filesystem is quite a bit larger
than the FSB range of the filesystem. So, if we trim a byte range of
0 to 80MB, but treat that as a FSBNO and then convert it to an AGNO,
80MB = 20k FSBs = AG 2.
Hence rather than trimming the entire range of AGs (0-3), we trim
0-2. Hence we need to convert the byte range to a daddr range, and
from there extract the AGNO according to FSBNO encoding.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2012-03-27 21:42 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-22 5:15 [PATH 0/8] xfs: outstanding patches for 3.4 merge window Dave Chinner
2012-03-22 5:15 ` [PATCH 1/8] xfs: Fix open flag handling in open_by_handle code Dave Chinner
2012-03-22 5:15 ` [PATCH 2/8] xfs: introduce an allocation workqueue Dave Chinner
2012-03-22 5:15 ` [PATCH 3/8] xfs: initialise xfssync work before running quotachecks Dave Chinner
2012-03-22 15:15 ` Ben Myers
2012-03-22 21:07 ` Dave Chinner
2012-03-23 13:34 ` Mark Tinguely
2012-03-25 23:22 ` Dave Chinner
2012-03-26 15:10 ` Mark Tinguely
2012-03-26 21:57 ` Dave Chinner
2012-03-28 19:40 ` Ben Myers
2012-03-29 0:29 ` Dave Chinner
2012-03-29 6:30 ` [PATCH v2] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
2012-03-28 17:38 ` [PATCH 3/8] xfs: initialise xfssync work before running quotachecks Ben Myers
2012-03-28 18:02 ` Christoph Hellwig
2012-03-28 18:43 ` Ben Myers
2012-03-22 5:15 ` [PATCH 4/8] xfs: remove MS_ACTIVE guard from inode reclaim work Dave Chinner
2012-03-22 5:15 ` [PATCH 5/8] xfs: don't cache inodes read through bulkstat Dave Chinner
2012-03-22 5:15 ` [PATCH 6/8] xfs: Account log unmount transaction correctly Dave Chinner
2012-03-24 8:27 ` Christoph Hellwig
2012-03-24 22:49 ` Dave Chinner
2012-03-26 22:28 ` Ben Myers
2012-03-22 5:15 ` [PATCH 7/8] xfs: fix fstrim offset calculations Dave Chinner
2012-03-24 13:36 ` Christoph Hellwig
2012-03-27 20:48 ` Ben Myers
2012-03-27 21:42 ` Dave Chinner [this message]
2012-03-22 5:15 ` [PATCH 8/8] xfs: add lots of attribute trace points Dave Chinner
2012-03-24 13:42 ` Christoph Hellwig
2012-03-27 21:18 ` Ben Myers
2012-03-27 21:45 ` Dave Chinner
2012-03-27 22:01 ` Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120327214215.GE5091@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox