public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 7/8] xfs: fix fstrim offset calculations
Date: Wed, 28 Mar 2012 08:42:15 +1100	[thread overview]
Message-ID: <20120327214215.GE5091@dastard> (raw)
In-Reply-To: <20120327204825.GB7762@sgi.com>

On Tue, Mar 27, 2012 at 03:48:25PM -0500, Ben Myers wrote:
> On Thu, Mar 22, 2012 at 04:15:12PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs_ioc_fstrim() doesn't treat the incoming offset and length
> > correctly. It treats them as a filesystem block address, rather than
> > a disk address. This is wrong because the range passed in is a
> > linear representation , while the filesystem block address notiation
> > is a sparse representation. Hence we cannot convert the range direct
> > to filesystem block units and then use that for calculating the
> > range to trim.
> > 
> > While this sounds dangerous, the problem is limited to calculting
> > what AGs need to be trimmed. The code that calcuates the actual
> > ranges to trim gets the right result (i.e. only ever discards free
> > space), even though it uses the wrong ranges to limit what is
> > trimmed. Hence this is not a bug that endangers user data.
> 
> Yep, I can see that the calculation of what we pass to blkdev_issue_discard()
> is correct and always a free extent.  I am having a hard time seeing the
> problem related to calculating which AGs to trim.  Can you give an example?

I don't have the debug traces anymore, but the problem is this
sort of thing. Take a 80MB filesystem with 4 AGs, each AG is 20MB,
which is ~5000 filesystem blocks. That means we need 13 bits to
store the block count per AG. i.e. agblklogi = 13. Now, the FSB
addressing format is sparse, and the calculation is this:

  FSBNO = (AGNO << agblklog) | AGBNO

Note the terminology? FSBNO != FSB. FSB is just a range converted to
filesystem block units. FSBNO is the filesystem block number, an
address.

         offset                                  offset + length
            +-------------------------------------------+
  range:    0                                           80MB
  daddr:    0                                          160k
  FSB:      0                                           20k

  AG:       +----------+----------+----------+----------+
                 0          1          2          3
  AGBNO:    0         5k
                       0         5k
		                  0         5k
				             0         5k
  FSBNO:   0          5k
                       8k       13k
                                  16k      21k
				             24k       29k

IOWs, the FSBNO range looks like this:

            +----------+   +----------+   +----------+   +----------+
            0         5k   8k       13k   16k      21k   24k      29k

And there are regions that are simple invalid (the empty, sparse
bits). This is done to make all the mathematics easy within each AG
as you can convert from the FSBNO straight to the AGBNO (and vice
versa) without needing to know the address of the first block of the
AG. It means it is easy for AGs to manage their own space without
needing to care about where they exist in the larger disk address
space - that is complete abstracted away from the internal freespace
and inode management as they all use AGBNO notation ot reference
blocks within the AG.

As a result, the FSBNO range of the filesystem is quite a bit larger
than the FSB range of the filesystem. So, if we trim a byte range of
0 to 80MB, but treat that as a FSBNO and then convert it to an AGNO,
80MB = 20k FSBs = AG 2.

Hence rather than trimming the entire range of AGs (0-3), we trim
0-2. Hence we need to convert the byte range to a daddr range, and
from there extract the AGNO according to FSBNO encoding.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2012-03-27 21:42 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-22  5:15 [PATH 0/8] xfs: outstanding patches for 3.4 merge window Dave Chinner
2012-03-22  5:15 ` [PATCH 1/8] xfs: Fix open flag handling in open_by_handle code Dave Chinner
2012-03-22  5:15 ` [PATCH 2/8] xfs: introduce an allocation workqueue Dave Chinner
2012-03-22  5:15 ` [PATCH 3/8] xfs: initialise xfssync work before running quotachecks Dave Chinner
2012-03-22 15:15   ` Ben Myers
2012-03-22 21:07     ` Dave Chinner
2012-03-23 13:34       ` Mark Tinguely
2012-03-25 23:22         ` Dave Chinner
2012-03-26 15:10           ` Mark Tinguely
2012-03-26 21:57             ` Dave Chinner
2012-03-28 19:40               ` Ben Myers
2012-03-29  0:29                 ` Dave Chinner
2012-03-29  6:30                   ` [PATCH v2] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
2012-03-28 17:38   ` [PATCH 3/8] xfs: initialise xfssync work before running quotachecks Ben Myers
2012-03-28 18:02     ` Christoph Hellwig
2012-03-28 18:43       ` Ben Myers
2012-03-22  5:15 ` [PATCH 4/8] xfs: remove MS_ACTIVE guard from inode reclaim work Dave Chinner
2012-03-22  5:15 ` [PATCH 5/8] xfs: don't cache inodes read through bulkstat Dave Chinner
2012-03-22  5:15 ` [PATCH 6/8] xfs: Account log unmount transaction correctly Dave Chinner
2012-03-24  8:27   ` Christoph Hellwig
2012-03-24 22:49     ` Dave Chinner
2012-03-26 22:28   ` Ben Myers
2012-03-22  5:15 ` [PATCH 7/8] xfs: fix fstrim offset calculations Dave Chinner
2012-03-24 13:36   ` Christoph Hellwig
2012-03-27 20:48   ` Ben Myers
2012-03-27 21:42     ` Dave Chinner [this message]
2012-03-22  5:15 ` [PATCH 8/8] xfs: add lots of attribute trace points Dave Chinner
2012-03-24 13:42   ` Christoph Hellwig
2012-03-27 21:18   ` Ben Myers
2012-03-27 21:45     ` Dave Chinner
2012-03-27 22:01       ` Ben Myers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120327214215.GE5091@dastard \
    --to=david@fromorbit.com \
    --cc=bpm@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox