public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Mark Tinguely <tinguely@sgi.com>
To: stable@vger.kernel.org
Cc: xfs@oss.sgi.com
Subject: [3.0-stable PATCH 11/36] xfs: fix fstrim offset calculations
Date: Mon, 03 Dec 2012 17:42:19 -0600	[thread overview]
Message-ID: <20121203144309.828704878@sgi.com> (raw)
In-Reply-To: 20121203144208.143464631@sgi.com

[-- Attachment #1: 064 --]
[-- Type: text/plain, Size: 7632 bytes --]

From: Dave Chinner <dchinner@redhat.com>

Upstream commit: a66d636385d621e98a915233250356c394a437de

xfs_ioc_fstrim() doesn't treat the incoming offset and length
correctly. It treats them as a filesystem block address, rather than
a disk address. This is wrong because the range passed in is a
linear representation, while the filesystem block address notation
is a sparse representation. Hence we cannot convert the range direct
to filesystem block units and then use that for calculating the
range to trim.

While this sounds dangerous, the problem is limited to calculating
what AGs need to be trimmed. The code that calcuates the actual
ranges to trim gets the right result (i.e. only ever discards free
space), even though it uses the wrong ranges to limit what is
trimmed. Hence this is not a bug that endangers user data.

Fix this by treating the range as a disk address range and use the
appropriate functions to convert the range into the desired formats
for calculations.

Further, fix the first free extent lookup (the longest) to actually
find the largest free extent. Currently this lookup uses a <=
lookup, which results in finding the extent to the left of the
largest because we can never get an exact match on the largest
extent. This is due to the fact that while we know it's size, we
don't know it's location and so the exact match fails and we move
one record to the left to get the next largest extent. Instead, use
a >= search so that the lookup returns the largest extent regardless
of the fact we don't get an exact match on it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
---
 fs/xfs/linux-2.6/xfs_discard.c |   61 +++++++++++++++++++++++++----------------
 fs/xfs/xfs_alloc.c             |    2 -
 fs/xfs/xfs_alloc.h             |    7 ++++
 3 files changed, 46 insertions(+), 24 deletions(-)

Index: b/fs/xfs/linux-2.6/xfs_discard.c
===================================================================
--- a/fs/xfs/linux-2.6/xfs_discard.c
+++ b/fs/xfs/linux-2.6/xfs_discard.c
@@ -37,9 +37,9 @@ STATIC int
 xfs_trim_extents(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
-	xfs_fsblock_t		start,
-	xfs_fsblock_t		end,
-	xfs_fsblock_t		minlen,
+	xfs_daddr_t		start,
+	xfs_daddr_t		end,
+	xfs_daddr_t		minlen,
 	__uint64_t		*blocks_trimmed)
 {
 	struct block_device	*bdev = mp->m_ddev_targp->bt_bdev;
@@ -67,7 +67,7 @@ xfs_trim_extents(
 	/*
 	 * Look up the longest btree in the AGF and start with it.
 	 */
-	error = xfs_alloc_lookup_le(cur, 0,
+	error = xfs_alloc_lookup_ge(cur, 0,
 			    be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_longest), &i);
 	if (error)
 		goto out_del_cursor;
@@ -77,8 +77,10 @@ xfs_trim_extents(
 	 * enough to be worth discarding.
 	 */
 	while (i) {
-		xfs_agblock_t fbno;
-		xfs_extlen_t flen;
+		xfs_agblock_t	fbno;
+		xfs_extlen_t	flen;
+		xfs_daddr_t	dbno;
+		xfs_extlen_t	dlen;
 
 		error = xfs_alloc_get_rec(cur, &fbno, &flen, &i);
 		if (error)
@@ -87,9 +89,17 @@ xfs_trim_extents(
 		ASSERT(flen <= be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_longest));
 
 		/*
+		 * use daddr format for all range/len calculations as that is
+		 * the format the range/len variables are supplied in by
+		 * userspace.
+		 */
+		dbno = XFS_AGB_TO_DADDR(mp, agno, fbno);
+		dlen = XFS_FSB_TO_BB(mp, flen);
+
+		/*
 		 * Too small?  Give up.
 		 */
-		if (flen < minlen) {
+		if (dlen < minlen) {
 			trace_xfs_discard_toosmall(mp, agno, fbno, flen);
 			goto out_del_cursor;
 		}
@@ -99,8 +109,7 @@ xfs_trim_extents(
 		 * supposed to discard skip it.  Do not bother to trim
 		 * down partially overlapping ranges for now.
 		 */
-		if (XFS_AGB_TO_FSB(mp, agno, fbno) + flen < start ||
-		    XFS_AGB_TO_FSB(mp, agno, fbno) > end) {
+		if (dbno + dlen < start || dbno > end) {
 			trace_xfs_discard_exclude(mp, agno, fbno, flen);
 			goto next_extent;
 		}
@@ -115,10 +124,7 @@ xfs_trim_extents(
 		}
 
 		trace_xfs_discard_extent(mp, agno, fbno, flen);
-		error = -blkdev_issue_discard(bdev,
-				XFS_AGB_TO_DADDR(mp, agno, fbno),
-				XFS_FSB_TO_BB(mp, flen),
-				GFP_NOFS, 0);
+		error = -blkdev_issue_discard(bdev, dbno, dlen, GFP_NOFS, 0);
 		if (error)
 			goto out_del_cursor;
 		*blocks_trimmed += flen;
@@ -137,6 +143,15 @@ out_put_perag:
 	return error;
 }
 
+/*
+ * trim a range of the filesystem.
+ *
+ * Note: the parameters passed from userspace are byte ranges into the
+ * filesystem which does not match to the format we use for filesystem block
+ * addressing. FSB addressing is sparse (AGNO|AGBNO), while the incoming format
+ * is a linear address range. Hence we need to use DADDR based conversions and
+ * comparisons for determining the correct offset and regions to trim.
+ */
 int
 xfs_ioc_trim(
 	struct xfs_mount		*mp,
@@ -145,7 +160,7 @@ xfs_ioc_trim(
 	struct request_queue	*q = mp->m_ddev_targp->bt_bdev->bd_disk->queue;
 	unsigned int		granularity = q->limits.discard_granularity;
 	struct fstrim_range	range;
-	xfs_fsblock_t		start, end, minlen;
+	xfs_daddr_t		start, end, minlen;
 	xfs_agnumber_t		start_agno, end_agno, agno;
 	__uint64_t		blocks_trimmed = 0;
 	int			error, last_error = 0;
@@ -159,22 +174,22 @@ xfs_ioc_trim(
 
 	/*
 	 * Truncating down the len isn't actually quite correct, but using
-	 * XFS_B_TO_FSB would mean we trivially get overflows for values
+	 * BBTOB would mean we trivially get overflows for values
 	 * of ULLONG_MAX or slightly lower.  And ULLONG_MAX is the default
 	 * used by the fstrim application.  In the end it really doesn't
 	 * matter as trimming blocks is an advisory interface.
 	 */
-	start = XFS_B_TO_FSBT(mp, range.start);
-	end = start + XFS_B_TO_FSBT(mp, range.len) - 1;
-	minlen = XFS_B_TO_FSB(mp, max_t(u64, granularity, range.minlen));
+	start = BTOBB(range.start);
+	end = start + BTOBBT(range.len) - 1;
+	minlen = BTOBB(max_t(u64, granularity, range.minlen));
 
-	if (start >= mp->m_sb.sb_dblocks)
+	if (XFS_BB_TO_FSB(mp, start) >= mp->m_sb.sb_dblocks)
 		return -XFS_ERROR(EINVAL);
-	if (end > mp->m_sb.sb_dblocks - 1)
-		end = mp->m_sb.sb_dblocks - 1;
+	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
+		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)- 1;
 
-	start_agno = XFS_FSB_TO_AGNO(mp, start);
-	end_agno = XFS_FSB_TO_AGNO(mp, end);
+	start_agno = xfs_daddr_to_agno(mp, start);
+	end_agno = xfs_daddr_to_agno(mp, end);
 
 	for (agno = start_agno; agno <= end_agno; agno++) {
 		error = -xfs_trim_extents(mp, agno, start, end, minlen,
Index: b/fs/xfs/xfs_alloc.c
===================================================================
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -68,7 +68,7 @@ xfs_alloc_lookup_eq(
  * Lookup the first record greater than or equal to [bno, len]
  * in the btree given by cur.
  */
-STATIC int				/* error */
+int				/* error */
 xfs_alloc_lookup_ge(
 	struct xfs_btree_cur	*cur,	/* btree cursor */
 	xfs_agblock_t		bno,	/* starting block of extent */
Index: b/fs/xfs/xfs_alloc.h
===================================================================
--- a/fs/xfs/xfs_alloc.h
+++ b/fs/xfs/xfs_alloc.h
@@ -243,6 +243,13 @@ xfs_alloc_lookup_le(
 	xfs_extlen_t		len,	/* length of extent */
 	int			*stat);	/* success/failure */
 
+int				/* error */
+xfs_alloc_lookup_ge(
+	struct xfs_btree_cur	*cur,	/* btree cursor */
+	xfs_agblock_t		bno,	/* starting block of extent */
+	xfs_extlen_t		len,	/* length of extent */
+	int			*stat);	/* success/failure */
+
 int					/* error */
 xfs_alloc_get_rec(
 	struct xfs_btree_cur	*cur,	/* btree cursor */


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2012-12-03 23:40 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-03 23:42 [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 01/36] xfs: fix possible overflow in xfs_ioc_trim() Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 02/36] xfs: fix allocation length overflow in xfs_bmapi_write() Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 03/36] xfs: mark the xfssyncd workqueue as non-reentrant Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 04/36] xfs: xfs_trans_add_item() - dont assign in ASSERT() when compare is intended Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 05/36] xfs: only take the ILOCK in xfs_reclaim_inode() Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 06/36] xfs: fallback to vmalloc for large buffers in xfs_attrmulti_attr_get Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 07/36] xfs: fallback to vmalloc for large buffers in xfs_getbmap Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 08/36] xfs: fix deadlock in xfs_rtfree_extent Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 09/36] xfs: Fix open flag handling in open_by_handle code Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 10/36] xfs: Account log unmount transaction correctly Mark Tinguely
2012-12-03 23:42 ` Mark Tinguely [this message]
2012-12-03 23:42 ` [3.0-stable PATCH 12/36] xfs: dont fill statvfs with project quota for a directory Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 13/36] xfs: Ensure inode reclaim can run during quotacheck Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 14/36] xfs: use shared ilock mode for direct IO writes by default Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 15/36] xfs: punch all delalloc blocks beyond EOF on write failure Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 16/36] xfs: page type check in writeback only checks last buffer Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 17/36] xfs: punch new delalloc blocks out of failed writes inside EOF Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 18/36] xfs: dont assert on delalloc regions beyond EOF Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 19/36] xfs: limit specualtive delalloc to maxioffset Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 20/36] xfs: Use preallocation for inodes with extsz hints Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 21/36] xfs: Dont allocate new buffers on every call to _xfs_buf_find Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 22/36] xfs: clean up buffer allocation Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 23/36] xfs: fix buffer lookup race on allocation failure Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 24/36] xfs: use iolock on XFS_IOC_ALLOCSP calls Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 25/36] xfs: Properly exclude IO type flags from buffer flags Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 26/36] xfs: flush outstanding buffers on log mount failure Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 27/36] xfs: protect xfs_sync_worker with s_umount semaphore Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 28/36] xfs: fix memory reclaim deadlock on agi buffer Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 29/36] xfs: xfs_vm_writepage clear iomap_valid when Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 30/36] xfs: fix allocbt cursor leak in xfs_alloc_ag_vextent_near Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 31/36] xfs: shutdown xfs_sync_worker before the log Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 32/36] xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 33/36] xfs: check for stale inode before acquiring iflock on push Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 34/36] xfs: stop the sync worker before xfs_unmountfs Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 35/36] xfs: zero allocation_args on the kernel stack Mark Tinguely
2012-12-03 23:42 ` [3.0-stable PATCH 36/36] xfs: only update the last_sync_lsn when a transaction completes Mark Tinguely
2012-12-04 21:44 ` [3.0-stable PATCH 00/36] Proposed 3.0-stable bug patches Ben Myers
2012-12-05 21:45 ` Dave Chinner
2012-12-06 17:27   ` Mark Tinguely
2012-12-07 10:06     ` Dave Chinner
2012-12-07 21:15       ` Ben Myers
2012-12-08 12:06         ` Christoph Hellwig
2012-12-08 19:12         ` Greg KH
2012-12-10  0:24         ` Dave Chinner
2012-12-10 22:03           ` Ben Myers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121203144309.828704878@sgi.com \
    --to=tinguely@sgi.com \
    --cc=stable@vger.kernel.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox