* [PATCH v3 00/11] xfsprogs fuzzing fixes
@ 2015-08-26 0:32 Darrick J. Wong
2015-08-26 0:32 ` [PATCH 01/11] xfs_repair: set args.geo in dir2_kill_block Darrick J. Wong
` (10 more replies)
0 siblings, 11 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:32 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
Hi all,
This is a rollup of various fuzzing fixes for xfsprogs 4.2.0-rc3.
The first patch fixes a crash in xfs_repair where args.geo wasn't
getting initialized when killing a directory block. Previously
sent by sandeen, but seemed to have fallen off everyone's radars?
Patch 2, a port of a kernel patch, ensures that the dir/attr verifier
marks the buffer corrupt if the magic number isn't recognized. This
helps us to fail faster in the event of magic number corruption.
Patch 3 fixes libxfs' WANT_CORRUPTED macros to return negative error
codes like the rest of libxfs does.
Patch 4 amends libxfs_getbuf*() to clear the buffer state if the
buffer isn't dirty. This prevents repair from throwing CRC errors if
a block is prefetched, freed without being examined (the exam clears
the UNCHECKED flag), reallocated to some other data structure, and
then read for a subsequent operation.
Patch 5 fixes a bug in xfs_repair wherein if xfs_repair fixes a broken
xattr block and later decides to unmap the block, the "repaired" flag
inadvertently prohibits the unmapping of that block.
Patch 6 forces repair's xattr block checker to take a look at the
header for incorrect owner data. If the header info looks bad, we'll
discard the xattr block.
Patch 7 forces prefetch to mark corrupt bmbt blocks UNCHECKED so that
the regular bmbt examination will fix the bad CRC if it doesn't take
any other action against the block. Without this, a corruption in the
unused area will trigger a kernel error yet never get fixed by repair.
Patch 8 implements a 'reflink' and 'dedupe' command in xfs_io. This
will be used in future xfstests to test reflink and dedupe features of
btrfs and xfs filesystems.
Patch 9 fixes xfs_db/blocktrash to not fail write verification when
corrupting a block and allows trashing of log and symlink blocks.
Patch 10 enhances the blocktrash command with a '-z' option that
trashes the block at the top of the cursor stack and doesn't require
blockget to have been run.
Patch 11 implements blockget for v5 filesystems. This is a second try
at a previous patch which didn't quite catch all the new magic numbers
and had some problems iterating directory index data.
I've tested these xfsprogs changes against the for-next branch as of
8/24. The rmap/reflink patches will be dealt with separately.
Scary rewound github repo with everything attached:
https://github.com/djwong/xfsprogs
Fuzz tests and more are at:
https://github.com/djwong/xfstests
Comments and questions are, as always, welcome.
--D
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 01/11] xfs_repair: set args.geo in dir2_kill_block
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
@ 2015-08-26 0:32 ` Darrick J. Wong
2015-08-26 0:32 ` [PATCH 02/11] libxfs: verifier should set buffer error when da block has a bad magic number Darrick J. Wong
` (9 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:32 UTC (permalink / raw)
To: david, darrick.wong; +Cc: Eric Sandeen, xfs
Frøm: Eric Sandeen <sandeen@sandeen.net>
This path in xfs_repair:
dir2_kill_block
libxfs_da_shrink_inode
xfs_dir2_shrink_inode
xfs_dir2_db_to_da
segfaults, because dir2_kill_block() does not initialize
args.geo, and a null geometry winds up in xfs_dir2_db_to_da(),
which dereferences it.
Fix that.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
repair/phase6.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/repair/phase6.c b/repair/phase6.c
index 04638c2..7e275cd 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1444,6 +1444,7 @@ dir2_kill_block(
args.firstblock = &firstblock;
args.flist = &flist;
args.whichfork = XFS_DATA_FORK;
+ args.geo = mp->m_dir_geo;
if (da_bno >= mp->m_dir_geo->leafblk && da_bno < mp->m_dir_geo->freeblk)
error = -libxfs_da_shrink_inode(&args, da_bno, bp);
else
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 02/11] libxfs: verifier should set buffer error when da block has a bad magic number
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
2015-08-26 0:32 ` [PATCH 01/11] xfs_repair: set args.geo in dir2_kill_block Darrick J. Wong
@ 2015-08-26 0:32 ` Darrick J. Wong
2015-08-26 0:32 ` [PATCH 03/11] libxfs: fix XFS_WANT_CORRUPTED_* macros to return negative error codes Darrick J. Wong
` (8 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:32 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
If xfs_da3_node_read_verify() doesn't recognize the magic number of a
buffer it's just read, set the buffer error to -EFSCORRUPTED so that
the error can be sent up to userspace. Without this patch we'll
notice the bad magic eventually while trying to traverse or change
the block, but we really ought to fail early in the verifier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
libxfs/xfs_da_btree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
index c874164..289dc1e 100644
--- a/libxfs/xfs_da_btree.c
+++ b/libxfs/xfs_da_btree.c
@@ -229,6 +229,7 @@ xfs_da3_node_read_verify(
bp->b_ops->verify_read(bp);
return;
default:
+ xfs_buf_ioerror(bp, -EFSCORRUPTED);
break;
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 03/11] libxfs: fix XFS_WANT_CORRUPTED_* macros to return negative error codes
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
2015-08-26 0:32 ` [PATCH 01/11] xfs_repair: set args.geo in dir2_kill_block Darrick J. Wong
2015-08-26 0:32 ` [PATCH 02/11] libxfs: verifier should set buffer error when da block has a bad magic number Darrick J. Wong
@ 2015-08-26 0:32 ` Darrick J. Wong
2015-08-26 0:32 ` [PATCH 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants Darrick J. Wong
` (7 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:32 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
Since the rest of libxfs returns negative error codes, these two sanity
checking macros ought to have the same applied. While we're at it,
fix a couple more sign errors in the same file.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
libxfs/libxfs_priv.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 2a8b850..22f2d53 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -148,9 +148,9 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
#define XFS_TRANS_UNRESERVE_QUOTA_NBLKS(mp,tp,ip,nblks,ninos,fl) 0
#define XFS_TEST_ERROR(expr,a,b,c) ( expr )
#define XFS_WANT_CORRUPTED_GOTO(mp, expr, l) \
- { (mp) = (mp); if (!(expr)) { error = EFSCORRUPTED; goto l; } }
+ { (mp) = (mp); if (!(expr)) { error = -EFSCORRUPTED; goto l; } }
#define XFS_WANT_CORRUPTED_RETURN(mp, expr) \
- { (mp) = (mp); if (!(expr)) { return EFSCORRUPTED; } }
+ { (mp) = (mp); if (!(expr)) { return -EFSCORRUPTED; } }
#ifdef __GNUC__
#define __return_address __builtin_return_address(0)
@@ -417,8 +417,7 @@ do { \
})
#define xfs_rotorstep 1
-#define xfs_bmap_rtalloc(a) (ENOSYS)
-#define xfs_rtpick_extent(mp,tp,len,p) (ENOSYS)
+#define xfs_bmap_rtalloc(a) (-ENOSYS)
#define xfs_get_extsz_hint(ip) (0)
#define xfs_inode_is_filestream(ip) (0)
#define xfs_filestream_lookup_ag(ip) (0)
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (2 preceding siblings ...)
2015-08-26 0:32 ` [PATCH 03/11] libxfs: fix XFS_WANT_CORRUPTED_* macros to return negative error codes Darrick J. Wong
@ 2015-08-26 0:32 ` Darrick J. Wong
2015-08-26 1:02 ` Dave Chinner
2015-08-26 0:32 ` [PATCH 05/11] xfs_repair: ignore "repaired" flag after we decide to clear xattr block Darrick J. Wong
` (6 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:32 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
When we're running xfs_repair with prefetch enabled, it's possible
that repair will decide to clear an inode without examining all
metadata blocks owned by that inode. This leaves the unreferenced
prefetched buffers marked UNCHECKED, which will cause a subsequent CRC
error if the block is reallocated to a different structure and read
more than once. Typically this happens when a large directory is
corrupted and lost+found has to grow to accomodate all the
disconnected inodes.
In libxfs_getbuf*(), we're supposed to return an unused buffer which
has a clean state. Unfortunately, things like UNCHECKED can hang
around to cause incorrect verifier errors later, so change those
functions to launder the state bits clean.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
libxfs/rdwr.c | 47 +++++++++++++++++++++++++++++++++++++++++------
1 file changed, 41 insertions(+), 6 deletions(-)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 4f8212f..d28cea8 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -631,15 +631,39 @@ libxfs_getbuf_flags(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len,
return __cache_lookup(&key, flags);
}
+/*
+ * Clean the buffer flags for libxfs_getbuf*(), which wants to return
+ * an unused buffer with clean state. This prevents CRC errors on a
+ * re-read of a corrupt block that was prefetched and freed. This
+ * can happen with a massively corrupt directory that is discarded,
+ * but whose blocks are then recycled into expanding lost+found.
+ *
+ * Note however that if the buffer's dirty (prefetch calls getbuf)
+ * we'll leave the state alone because we don't want to discard blocks
+ * that have been fixed.
+ */
+static void
+try_clean_buf(
+ struct xfs_buf *bp)
+{
+ if (bp && !(bp->b_flags & LIBXFS_B_DIRTY))
+ bp->b_flags &= ~(LIBXFS_B_UNCHECKED | LIBXFS_B_STALE |
+ LIBXFS_B_UPTODATE);
+}
+
struct xfs_buf *
libxfs_getbuf(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len)
{
- return libxfs_getbuf_flags(btp, blkno, len, 0);
+ struct xfs_buf *bp;
+
+ bp = libxfs_getbuf_flags(btp, blkno, len, 0);
+ try_clean_buf(bp);
+ return bp;
}
-struct xfs_buf *
-libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
- int nmaps, int flags)
+static struct xfs_buf *
+__libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
+ int nmaps, int flags)
{
struct xfs_bufkey key = {0};
int i;
@@ -659,6 +683,17 @@ libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
return __cache_lookup(&key, flags);
}
+struct xfs_buf *
+libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
+ int nmaps, int flags)
+{
+ struct xfs_buf *bp;
+
+ bp = __libxfs_getbuf_map(btp, map, nmaps, flags);
+ try_clean_buf(bp);
+ return bp;
+}
+
void
libxfs_putbuf(xfs_buf_t *bp)
{
@@ -779,7 +814,7 @@ libxfs_readbuf(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len, int flags,
xfs_buf_t *bp;
int error;
- bp = libxfs_getbuf(btp, blkno, len);
+ bp = libxfs_getbuf_flags(btp, blkno, len, 0);
if (!bp)
return NULL;
@@ -860,7 +895,7 @@ libxfs_readbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps,
return libxfs_readbuf(btp, map[0].bm_bn, map[0].bm_len,
flags, ops);
- bp = libxfs_getbuf_map(btp, map, nmaps, 0);
+ bp = __libxfs_getbuf_map(btp, map, nmaps, 0);
if (!bp)
return NULL;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 05/11] xfs_repair: ignore "repaired" flag after we decide to clear xattr block
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (3 preceding siblings ...)
2015-08-26 0:32 ` [PATCH 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants Darrick J. Wong
@ 2015-08-26 0:32 ` Darrick J. Wong
2015-08-26 0:32 ` [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity Darrick J. Wong
` (5 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:32 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
If in the course of examining extended attribute block contents we
first decide to repair an entry (*repair = 1) but secondly decide to
clear the whole block, set *repair = 0 because the clearing action
only happens if *repair == 0. Put another way, if we're nuking a
block, don't pretend like we've fixed it too.
v2: fix all the paths to clear the attr block if the processing
functions error out.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
repair/attr_repair.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index 83a07a8..c2b7c3a 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -1311,6 +1311,13 @@ process_leaf_attr_block(
* we can add it then.
*/
}
+ /*
+ * If we're just going to zap the block, don't pretend like we
+ * repaired it, because repairing the block stops the clear
+ * operation.
+ */
+ if (clearit)
+ *repair = 0;
if (*repair)
xfs_attr3_leaf_hdr_to_disk(mp->m_attr_geo, leaf, &leafhdr);
@@ -1524,6 +1531,7 @@ process_longform_attr(
xfs_dahash_t next_hashval;
int repairlinks = 0;
struct xfs_attr3_icleaf_hdr leafhdr;
+ int error;
*repair = 0;
@@ -1589,6 +1597,7 @@ process_longform_attr(
case XFS_ATTR3_LEAF_MAGIC:
if (process_leaf_attr_block(mp, leaf, 0, ino, blkmap,
0, &next_hashval, repair)) {
+ *repair = 0;
/* the block is bad. lose the attribute fork. */
libxfs_putbuf(bp);
return(1);
@@ -1604,12 +1613,16 @@ process_longform_attr(
libxfs_writebuf(bp, 0);
} else
libxfs_putbuf(bp);
- return (process_node_attr(mp, ino, dip, blkmap)); /* + repair */
+ error = process_node_attr(mp, ino, dip, blkmap); /* + repair */
+ if (error)
+ *repair = 0;
+ return error;
default:
do_warn(
_("bad attribute leaf magic # %#x for dir ino %" PRIu64 "\n"),
be16_to_cpu(leaf->hdr.info.magic), ino);
libxfs_putbuf(bp);
+ *repair = 0;
return(1);
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (4 preceding siblings ...)
2015-08-26 0:32 ` [PATCH 05/11] xfs_repair: ignore "repaired" flag after we decide to clear xattr block Darrick J. Wong
@ 2015-08-26 0:32 ` Darrick J. Wong
2015-08-26 0:45 ` Dave Chinner
2015-08-26 0:33 ` [PATCH 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
` (4 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:32 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
Check the v5 fields (uuid, blocknr, owner) of attribute blocks for
obvious errors while scanning xattr blocks. If the ownership info
is incorrect, kill the block.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
repair/attr_repair.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index c2b7c3a..8d03161 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -1508,6 +1508,44 @@ process_node_attr(
return (process_leaf_attr_level(mp, &da_cursor));
}
+/* check v5 metadata */
+static int
+__check_attr_header(
+ struct xfs_mount *mp,
+ struct xfs_buf *bp,
+ xfs_ino_t ino)
+{
+ struct xfs_da3_blkinfo *info = bp->b_addr;
+
+ if (info->hdr.magic != cpu_to_be16(XFS_ATTR3_LEAF_MAGIC) &&
+ info->hdr.magic != cpu_to_be16(XFS_DA3_NODE_MAGIC))
+ return 0;
+
+ /* verify owner */
+ if (be64_to_cpu(info->owner) != ino) {
+ do_warn(
+_("expected owner inode %" PRIu64 ", got %llu, attr block %" PRIu64 "\n"),
+ ino, be64_to_cpu(info->owner), bp->b_bn);
+ return 1;
+ }
+ /* verify block number */
+ if (be64_to_cpu(info->blkno) != bp->b_bn) {
+ do_warn(
+_("expected block %" PRIu64 ", got %llu, inode %" PRIu64 "attr block\n"),
+ bp->b_bn, be64_to_cpu(info->blkno), ino);
+ return 1;
+ }
+ /* verify uuid */
+ if (platform_uuid_compare(&info->uuid, &mp->m_sb.sb_meta_uuid) != 0) {
+ do_warn(
+_("wrong FS UUID, inode %" PRIu64 " attr block %" PRIu64 "\n"),
+ ino, bp->b_bn);
+ return 1;
+ }
+
+ return 0;
+}
+
/*
* Start processing for a leaf or fuller btree.
* A leaf directory is one where the attribute fork is too big for
@@ -1564,6 +1602,13 @@ process_longform_attr(
if (bp->b_error == -EFSBADCRC)
(*repair)++;
+ /* is this block sane? */
+ if (__check_attr_header(mp, bp, ino)) {
+ *repair = 0;
+ libxfs_putbuf(bp);
+ return 1;
+ }
+
/* verify leaf block */
leaf = (xfs_attr_leafblock_t *)XFS_BUF_PTR(bp);
xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (5 preceding siblings ...)
2015-08-26 0:32 ` [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity Darrick J. Wong
@ 2015-08-26 0:33 ` Darrick J. Wong
2015-08-26 0:54 ` Dave Chinner
2015-08-26 0:33 ` [PATCH 08/11] xfs_io: support reflinking and deduping file ranges Darrick J. Wong
` (3 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:33 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
If during prefetch we encounter a bmbt block that fails the CRC check
due to corruption in the unused part of the block, force the buffer
back through the non-prefetch verifiers later so that the CRC is
updated. Otherwise, the bad checksum goes unfixed and the kernel will
still flag the bmbt block as invalid.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
repair/prefetch.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/repair/prefetch.c b/repair/prefetch.c
index 1de3ec0..77d29c8 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -276,6 +276,14 @@ pf_scan_lbtree(
XFS_BUF_SET_PRIORITY(bp, isadir ? B_DIR_BMAP : B_BMAP);
+ /*
+ * Make this bmbt buffer go back through the verifiers later so that
+ * we correct checksum errors stemming from bitflips in the unused
+ * parts of the bmbt block.
+ */
+ if (bp->b_error == -EFSBADCRC)
+ bp->b_flags |= LIBXFS_B_UNCHECKED;
+
rc = (*func)(XFS_BUF_TO_BLOCK(bp), level - 1, isadir, args);
libxfs_putbuf(bp);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 08/11] xfs_io: support reflinking and deduping file ranges
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (6 preceding siblings ...)
2015-08-26 0:33 ` [PATCH 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
@ 2015-08-26 0:33 ` Darrick J. Wong
2015-09-22 2:17 ` Dave Chinner
2015-08-26 0:33 ` [PATCH 09/11] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
` (2 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:33 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
Wire up xfs_io to use the XFS range clone and dedupe ioctls to make
files share data blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
io/Makefile | 2 -
io/dedupe.c | 190 +++++++++++++++++++++++++++++++++++++++++++++++++++++
io/init.c | 2 +
io/io.h | 3 +
io/reflink.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++++
libxfs/xfs_fs.h | 36 ++++++++++
man/man8/xfs_io.8 | 67 +++++++++++++++++++
7 files changed, 479 insertions(+), 1 deletion(-)
create mode 100644 io/dedupe.c
create mode 100644 io/reflink.c
diff --git a/io/Makefile b/io/Makefile
index a08a782..6c4810e 100644
--- a/io/Makefile
+++ b/io/Makefile
@@ -11,7 +11,7 @@ HFILES = init.h io.h
CFILES = init.c \
attr.c bmap.c file.c freeze.c fsync.c getrusage.c imap.c link.c \
mmap.c open.c parent.c pread.c prealloc.c pwrite.c seek.c shutdown.c \
- sync.c truncate.c
+ sync.c truncate.c reflink.c dedupe.c
LLDLIBS = $(LIBXCMD) $(LIBHANDLE)
LTDEPENDENCIES = $(LIBXCMD) $(LIBHANDLE)
diff --git a/io/dedupe.c b/io/dedupe.c
new file mode 100644
index 0000000..59c3d0f
--- /dev/null
+++ b/io/dedupe.c
@@ -0,0 +1,190 @@
+/*
+ * Copyright (c) 2015 Oracle, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <sys/uio.h>
+#include <xfs/xfs.h>
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "io.h"
+
+static cmdinfo_t dedupe_cmd;
+
+static void
+dedupe_help(void)
+{
+ printf(_(
+"\n"
+" Links a range of bytes (in block size increments) from a file into a range \n"
+" of bytes in the open file. The contents of both file ranges must match.\n"
+"\n"
+" Example:\n"
+" 'dedupe some_file 0 4096 32768' - links 32768 bytes from some_file at \n"
+" offset 0 to into the open file at \n"
+" position 4096\n"
+"\n"
+" Reflink a range of blocks from a given input file to the open file. Both\n"
+" files share the same range of physical disk blocks; a write to the shared\n"
+" range of either file should result in the write landing in a new block and\n"
+" that range of the file being remapped (i.e. copy-on-write). Both files\n"
+" must reside on the same filesystem, and the contents of both ranges must\n"
+" match.\n"
+" -w -- call fdatasync(2) at the end (included in timing results)\n"
+" -W -- call fsync(2) at the end (included in timing results)\n"
+"\n"));
+}
+
+static int
+dedupe_f(
+ int argc,
+ char **argv)
+{
+ off64_t soffset, doffset;
+ long long count, total;
+ char s1[64], s2[64], ts[64];
+ char *infile;
+ int Cflag, qflag, wflag, Wflag;
+ struct xfs_ioctl_file_extent_same_args *args = NULL;
+ struct xfs_ioctl_file_extent_same_info *info;
+ size_t fsblocksize, fssectsize;
+ struct timeval t1, t2;
+ int c, fd = -1;
+
+ Cflag = qflag = wflag = Wflag = 0;
+ init_cvtnum(&fsblocksize, &fssectsize);
+
+ while ((c = getopt(argc, argv, "CqwW")) != EOF) {
+ switch (c) {
+ case 'C':
+ Cflag = 1;
+ break;
+ case 'q':
+ qflag = 1;
+ break;
+ case 'w':
+ wflag = 1;
+ break;
+ case 'W':
+ Wflag = 1;
+ break;
+ default:
+ return command_usage(&dedupe_cmd);
+ }
+ }
+ if (optind != argc - 4)
+ return command_usage(&dedupe_cmd);
+ infile = argv[optind];
+ optind++;
+ soffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+ if (soffset < 0) {
+ printf(_("non-numeric src offset argument -- %s\n"), argv[optind]);
+ return 0;
+ }
+ optind++;
+ doffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+ if (doffset < 0) {
+ printf(_("non-numeric dest offset argument -- %s\n"), argv[optind]);
+ return 0;
+ }
+ optind++;
+ count = cvtnum(fsblocksize, fssectsize, argv[optind]);
+ if (count < 1) {
+ printf(_("non-positive length argument -- %s\n"), argv[optind]);
+ return 0;
+ }
+
+ c = IO_READONLY;
+ fd = openfile(infile, NULL, c, 0);
+ if (fd < 0)
+ return 0;
+
+ gettimeofday(&t1, NULL);
+ args = calloc(1, sizeof(struct xfs_ioctl_file_extent_same_args) +
+ sizeof(struct xfs_ioctl_file_extent_same_info));
+ if (!args)
+ goto done;
+ info = (struct xfs_ioctl_file_extent_same_info *)(args + 1);
+ args->logical_offset = soffset;
+ args->length = count;
+ args->dest_count = 1;
+ info->fd = file->fd;
+ info->logical_offset = doffset;
+ do {
+ c = ioctl(fd, XFS_IOC_FILE_EXTENT_SAME, args);
+ if (c)
+ break;
+ args->logical_offset += info->bytes_deduped;
+ info->logical_offset += info->bytes_deduped;
+ args->length -= info->bytes_deduped;
+ } while (c == 0 && info->status == 0 && info->bytes_deduped > 0);
+ if (c)
+ perror(_("dedupe ioctl"));
+ if (info->status < 0)
+ printf("dedupe: %s\n", _(strerror(-info->status)));
+ if (info->status == XFS_SAME_DATA_DIFFERS)
+ printf(_("Extents did not match.\n"));
+ if (c != 0 || info->status != 0)
+ goto done;
+ total = info->bytes_deduped;
+ c = 1;
+ if (Wflag)
+ fsync(file->fd);
+ if (wflag)
+ fdatasync(file->fd);
+ if (qflag)
+ goto done;
+ gettimeofday(&t2, NULL);
+ t2 = tsub(t2, t1);
+
+ /* Finally, report back -- -C gives a parsable format */
+ timestr(&t2, ts, sizeof(ts), Cflag ? VERBOSE_FIXED_TIME : 0);
+ if (!Cflag) {
+ cvtstr((double)total, s1, sizeof(s1));
+ cvtstr(tdiv((double)total, t2), s2, sizeof(s2));
+ printf(_("linked %lld/%lld bytes at offset %lld\n"),
+ total, count, (long long)doffset);
+ printf(_("%s, %d ops; %s (%s/sec and %.4f ops/sec)\n"),
+ s1, c, ts, s2, tdiv((double)c, t2));
+ } else {/* bytes,ops,time,bytes/sec,ops/sec */
+ printf("%lld,%d,%s,%.3f,%.3f\n",
+ total, c, ts,
+ tdiv((double)total, t2), tdiv((double)c, t2));
+ }
+done:
+ free(args);
+ close(fd);
+ return 0;
+}
+
+void
+dedupe_init(void)
+{
+ dedupe_cmd.name = "dedupe";
+ dedupe_cmd.altname = "dd";
+ dedupe_cmd.cfunc = dedupe_f;
+ dedupe_cmd.argmin = 4;
+ dedupe_cmd.argmax = -1;
+ dedupe_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
+ dedupe_cmd.args =
+_("infile src_off dst_off len");
+ dedupe_cmd.oneline =
+ _("dedupes a number of bytes at a specified offset");
+ dedupe_cmd.help = dedupe_help;
+
+ add_command(&dedupe_cmd);
+}
diff --git a/io/init.c b/io/init.c
index 13f35c4..739371e 100644
--- a/io/init.c
+++ b/io/init.c
@@ -83,6 +83,8 @@ init_commands(void)
sync_init();
sync_range_init();
truncate_init();
+ reflink_init();
+ dedupe_init();
}
static int
diff --git a/io/io.h b/io/io.h
index b115e4a..ec8a530 100644
--- a/io/io.h
+++ b/io/io.h
@@ -161,3 +161,6 @@ extern void readdir_init(void);
#else
#define readdir_init() do { } while (0)
#endif
+
+extern void reflink_init(void);
+extern void dedupe_init(void);
diff --git a/io/reflink.c b/io/reflink.c
new file mode 100644
index 0000000..fc2d2b9
--- /dev/null
+++ b/io/reflink.c
@@ -0,0 +1,180 @@
+/*
+ * Copyright (c) 2015 Oracle, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include <sys/uio.h>
+#include <xfs/xfs.h>
+#include "command.h"
+#include "input.h"
+#include "init.h"
+#include "io.h"
+
+static cmdinfo_t reflink_cmd;
+
+static void
+reflink_help(void)
+{
+ printf(_(
+"\n"
+" Links a range of bytes (in block size increments) from a file into a range \n"
+" of bytes in the open file. The two extent ranges need not contain identical\n"
+" data. \n"
+"\n"
+" Example:\n"
+" 'reflink some_file 0 4096 32768' - links 32768 bytes from some_file at \n"
+" offset 0 to into the open file at \n"
+" position 4096\n"
+" 'reflink some_file' - links all bytes from some_file into the open file\n"
+" at position 0\n"
+"\n"
+" Reflink a range of blocks from a given input file to the open file. Both\n"
+" files share the same range of physical disk blocks; a write to the shared\n"
+" range of either file should result in the write landing in a new block and\n"
+" that range of the file being remapped (i.e. copy-on-write). Both files\n"
+" must reside on the same filesystem.\n"
+" -w -- call fdatasync(2) at the end (included in timing results)\n"
+" -W -- call fsync(2) at the end (included in timing results)\n"
+"\n"));
+}
+
+static int
+reflink_f(
+ int argc,
+ char **argv)
+{
+ off64_t soffset, doffset;
+ long long count = 0, total;
+ char s1[64], s2[64], ts[64];
+ char *infile = NULL;
+ int Cflag, qflag, wflag, Wflag;
+ struct xfs_ioctl_clone_range_args args;
+ size_t fsblocksize, fssectsize;
+ struct timeval t1, t2;
+ int c, fd = -1;
+
+ Cflag = qflag = wflag = Wflag = 0;
+ init_cvtnum(&fsblocksize, &fssectsize);
+
+ while ((c = getopt(argc, argv, "CqwW")) != EOF) {
+ switch (c) {
+ case 'C':
+ Cflag = 1;
+ break;
+ case 'q':
+ qflag = 1;
+ break;
+ case 'w':
+ wflag = 1;
+ break;
+ case 'W':
+ Wflag = 1;
+ break;
+ default:
+ return command_usage(&reflink_cmd);
+ }
+ }
+ if (optind != argc - 4 && optind != argc - 1)
+ return command_usage(&reflink_cmd);
+ infile = argv[optind];
+ optind++;
+ if (optind == argc)
+ goto clone_all;
+ soffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+ if (soffset < 0) {
+ printf(_("non-numeric src offset argument -- %s\n"), argv[optind]);
+ return 0;
+ }
+ optind++;
+ doffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
+ if (doffset < 0) {
+ printf(_("non-numeric dest offset argument -- %s\n"), argv[optind]);
+ return 0;
+ }
+ optind++;
+ count = cvtnum(fsblocksize, fssectsize, argv[optind]);
+ if (count < 1) {
+ printf(_("non-positive length argument -- %s\n"), argv[optind]);
+ return 0;
+ }
+
+clone_all:
+ c = IO_READONLY;
+ fd = openfile(infile, NULL, c, 0);
+ if (fd < 0)
+ return 0;
+
+ gettimeofday(&t1, NULL);
+ if (count) {
+ args.src_fd = fd;
+ args.src_offset = soffset;
+ args.src_length = count;
+ args.dest_offset = doffset;
+ c = ioctl(file->fd, XFS_IOC_CLONE_RANGE, &args);
+ } else {
+ c = ioctl(file->fd, XFS_IOC_CLONE, fd);
+ }
+ if (c < 0) {
+ perror(_("reflink"));
+ goto done;
+ }
+ total = count;
+ c = 1;
+ if (Wflag)
+ fsync(file->fd);
+ if (wflag)
+ fdatasync(file->fd);
+ if (qflag)
+ goto done;
+ gettimeofday(&t2, NULL);
+ t2 = tsub(t2, t1);
+
+ /* Finally, report back -- -C gives a parsable format */
+ timestr(&t2, ts, sizeof(ts), Cflag ? VERBOSE_FIXED_TIME : 0);
+ if (!Cflag) {
+ cvtstr((double)total, s1, sizeof(s1));
+ cvtstr(tdiv((double)total, t2), s2, sizeof(s2));
+ printf(_("linked %lld/%lld bytes at offset %lld\n"),
+ total, count, (long long)doffset);
+ printf(_("%s, %d ops; %s (%s/sec and %.4f ops/sec)\n"),
+ s1, c, ts, s2, tdiv((double)c, t2));
+ } else {/* bytes,ops,time,bytes/sec,ops/sec */
+ printf("%lld,%d,%s,%.3f,%.3f\n",
+ total, c, ts,
+ tdiv((double)total, t2), tdiv((double)c, t2));
+ }
+done:
+ close(fd);
+ return 0;
+}
+
+void
+reflink_init(void)
+{
+ reflink_cmd.name = "reflink";
+ reflink_cmd.altname = "rl";
+ reflink_cmd.cfunc = reflink_f;
+ reflink_cmd.argmin = 4;
+ reflink_cmd.argmax = -1;
+ reflink_cmd.flags = CMD_NOMAP_OK | CMD_FOREIGN_OK;
+ reflink_cmd.args =
+_("infile src_off dst_off len");
+ reflink_cmd.oneline =
+ _("reflinks a number of bytes at a specified offset");
+ reflink_cmd.help = reflink_help;
+
+ add_command(&reflink_cmd);
+}
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 89689c6..0c922ad 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -559,6 +559,42 @@ typedef struct xfs_swapext
#define XFS_IOC_GOINGDOWN _IOR ('X', 125, __uint32_t)
/* XFS_IOC_GETFSUUID ---------- deprecated 140 */
+/* reflink ioctls; these MUST match the btrfs ioctl definitions */
+struct xfs_ioctl_clone_range_args {
+ __s64 src_fd;
+ __u64 src_offset;
+ __u64 src_length;
+ __u64 dest_offset;
+};
+
+#define XFS_SAME_DATA_DIFFERS 1
+/* For extent-same ioctl */
+struct xfs_ioctl_file_extent_same_info {
+ __s64 fd; /* in - destination file */
+ __u64 logical_offset; /* in - start of extent in destination */
+ __u64 bytes_deduped; /* out - total # of bytes we were able
+ * to dedupe from this file */
+ /* status of this dedupe operation:
+ * 0 if dedup succeeds
+ * < 0 for error
+ * == XFS_SAME_DATA_DIFFERS if data differs
+ */
+ __s32 status; /* out - see above description */
+ __u32 reserved;
+};
+
+struct xfs_ioctl_file_extent_same_args {
+ __u64 logical_offset; /* in - start of extent in source */
+ __u64 length; /* in - length of extent */
+ __u16 dest_count; /* in - total elements in info array */
+ __u16 reserved1;
+ __u32 reserved2;
+ struct xfs_ioctl_file_extent_same_info info[0];
+};
+
+#define XFS_IOC_CLONE _IOW (0x94, 9, int)
+#define XFS_IOC_CLONE_RANGE _IOW (0x94, 13, struct xfs_ioctl_clone_range_args)
+#define XFS_IOC_FILE_EXTENT_SAME _IOWR(0x94, 54, struct xfs_ioctl_file_extent_same_args)
#ifndef HAVE_BBMACROS
/*
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index 416206f..305335c 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -490,6 +490,73 @@ Recursively display all the specified segments starting at the specified
.B \-s
Display the starting lseek(2) offset. This offset will be a calculated value when
both data and holes are displayed together or performing a recusively display.
+.RE
+.PD
+.TP
+.TP
+.BI "reflink [ \-w ] [ \-W ] src_file [src_offset dst_offset length]"
+On filesystems that support the
+.B XFS_IOC_CLONE_RANGE
+or
+.B BTRFS_IOC_CLONE_RANGE
+ioctls, map
+.I length
+bytes at offset
+.I dst_offset
+in the open file to the same physical blocks that are mapped at offset
+.I src_offset
+in the file
+.I src_file
+, replacing any contents that may already have been there. If a program
+writes into a reflinked block range of either file, the dirty blocks will be
+cloned, written to, and remapped ("copy on write") in the affected file,
+leaving the other file(s) unchanged. If src_offset, dst_offset, and length
+are omitted, all contents of src_file will be reflinked into the open file.
+.RS 1.0i
+.PD 0
+.TP 0.4i
+.B \-w
+Call
+.BR fdatasync (2)
+after executing the ioctl.
+.TP
+.B \-W
+Call
+.BR fsync (2)
+after executing the command.
+.RE
+.PD
+.TP
+.TP
+.BI "dedupe [ \-w ] [ \-W ] src_file src_offset dst_offset length"
+On filesystems that support the
+.B XFS_IOC_FILE_EXTENT_SAME
+or
+.B BTRFS_IOC_FILE_EXTENT_SAME
+ioctls, map
+.I length
+bytes at offset
+.I dst_offset
+in the open file to the same physical blocks that are mapped at offset
+.I src_offset
+in the file
+.I src_file
+, but only if the contents of both ranges are identical. This is known as
+block-based deduplication. If a program writes into a reflinked block range of
+either file, the dirty blocks will be cloned, written to, and remapped ("copy
+on write") in the affected file, leaving the other file(s) unchanged.
+.RS 1.0i
+.PD 0
+.TP 0.4i
+.B \-w
+Call
+.BR fdatasync (2)
+after executing the ioctl.
+.TP
+.B \-W
+Call
+.BR fsync (2)
+after executing the command.
.TP
.SH MEMORY MAPPED I/O COMMANDS
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 09/11] xfs_db: enable blocktrash for checksummed filesystems
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (7 preceding siblings ...)
2015-08-26 0:33 ` [PATCH 08/11] xfs_io: support reflinking and deduping file ranges Darrick J. Wong
@ 2015-08-26 0:33 ` Darrick J. Wong
2015-08-26 0:33 ` [PATCH 10/11] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
2015-08-26 0:33 ` [PATCH 11/11] xfs_db: enable blockget for v5 filesystems Darrick J. Wong
10 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:33 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
Disable the write verifiers when we're trashing a block. With this
in place, create a xfs fuzzer script that formats, populates, corrupts,
tries to use, repairs, and tries again to use a crash test xfs image.
Hopefully this will shake out some v5 filesystem bugs.
v2: Drop xfsfuzz, don't assume every block is an AGF when blocktrashing.
Don't trash log blocks by default, because that skews the blocktrash
heavily towards damaging only log blocks.
v3: Fix changelog issues, allow trashing of log blocks and symlinks,
and require the caller to explicitly ask for trashing of log blocks
and super blocks. Allowing log blocks by default skews the trashing
heavily in favor of (probably unused) log blocks, which doesn't help
us with fuzzing. Furthermore, trashing the superblock results in a
time consuming sector by sector superblock hunt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
db/check.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/db/check.c b/db/check.c
index d28199d..8f3b5b6 100644
--- a/db/check.c
+++ b/db/check.c
@@ -944,6 +944,7 @@ blocktrash_b(
int mask;
int newbit;
int offset;
+ const struct xfs_buf_ops *stashed_ops;
static char *modestr[] = {
N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
};
@@ -952,8 +953,10 @@ blocktrash_b(
offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
newbit = 0;
push_cur();
- set_cur(&typtab[DBM_UNKNOWN],
+ set_cur(NULL,
XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);
+ stashed_ops = iocur_top->bp->b_ops;
+ iocur_top->bp->b_ops = NULL;
if ((buf = iocur_top->data) == NULL) {
dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
pop_cur();
@@ -984,6 +987,7 @@ blocktrash_b(
buf[byte] &= ~mask;
}
write_cur();
+ iocur_top->bp->b_ops = stashed_ops;
pop_cur();
printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
agno, agbno, typename[type], len, len == 1 ? "" : "s",
@@ -1040,9 +1044,11 @@ blocktrash_f(
(1 << DBM_BTINO) |
(1 << DBM_DIR) |
(1 << DBM_INODE) |
+ (1 << DBM_LOG) |
(1 << DBM_QUOTA) |
(1 << DBM_RTBITMAP) |
(1 << DBM_RTSUM) |
+ (1 << DBM_SYMLINK) |
(1 << DBM_SB);
while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
switch (c) {
@@ -1106,7 +1112,7 @@ blocktrash_f(
return 0;
}
if (tmask == 0)
- tmask = goodmask;
+ tmask = goodmask & ~((1 << DBM_LOG) | (1 << DBM_SB));
lentab = xmalloc(sizeof(ltab_t));
lentab->min = lentab->max = min;
lentablen = 1;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 10/11] xfs_db: trash the block at the top of the cursor stack
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (8 preceding siblings ...)
2015-08-26 0:33 ` [PATCH 09/11] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
@ 2015-08-26 0:33 ` Darrick J. Wong
2015-08-26 0:33 ` [PATCH 11/11] xfs_db: enable blockget for v5 filesystems Darrick J. Wong
10 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:33 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
Add a new -z option to blocktrash to make it trash the block that's at
the top of the stack, so that we can perform targeted fuzzing. While
we're at it, prevent fuzzing off the end of the buffer and add a -o
parameter so that we can specify an offset to start fuzzing from.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
db/check.c | 85 +++++++++++++++++++++++++++++++++++++++++------------
man/man8/xfs_db.8 | 15 +++++++++
2 files changed, 79 insertions(+), 21 deletions(-)
diff --git a/db/check.c b/db/check.c
index 8f3b5b6..5b32d07 100644
--- a/db/check.c
+++ b/db/check.c
@@ -930,8 +930,7 @@ typedef struct ltab {
static void
blocktrash_b(
- xfs_agnumber_t agno,
- xfs_agblock_t agbno,
+ int bit_offset,
dbm_t type,
ltab_t *ltabp,
int mode)
@@ -943,27 +942,40 @@ blocktrash_b(
int len;
int mask;
int newbit;
- int offset;
const struct xfs_buf_ops *stashed_ops;
static char *modestr[] = {
N_("zeroed"), N_("set"), N_("flipped"), N_("randomized")
};
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ agno = XFS_FSB_TO_AGNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
+ agbno = XFS_FSB_TO_AGBNO(mp, XFS_DADDR_TO_FSB(mp, iocur_top->bb));
+ if (iocur_top->len == 0) {
+ dbprintf(_("zero-length block %u/%u buffer to trash??\n"),
+ agno, agbno);
+ return;
+ }
len = (int)((random() % (ltabp->max - ltabp->min + 1)) + ltabp->min);
- offset = (int)(random() % (int)(mp->m_sb.sb_blocksize * NBBY));
+ /*
+ * bit_offset >= 0: start fuzzing at this exact bit_offset.
+ * bit_offset < 0: pick an offset at least as high at -(bit_offset + 1).
+ */
+ if (bit_offset < 0) {
+ bit_offset = -(bit_offset + 1);
+ bit_offset += (int)(random() % (int)((iocur_top->len - bit_offset) * NBBY));
+ }
+ if (bit_offset + len >= iocur_top->len * NBBY)
+ len = (iocur_top->len * NBBY) - bit_offset;
newbit = 0;
- push_cur();
- set_cur(NULL,
- XFS_AGB_TO_DADDR(mp, agno, agbno), blkbb, DB_RING_IGN, NULL);
stashed_ops = iocur_top->bp->b_ops;
iocur_top->bp->b_ops = NULL;
if ((buf = iocur_top->data) == NULL) {
dbprintf(_("can't read block %u/%u for trashing\n"), agno, agbno);
- pop_cur();
return;
}
for (bitno = 0; bitno < len; bitno++) {
- bit = (offset + bitno) % (mp->m_sb.sb_blocksize * NBBY);
+ bit = (bit_offset + bitno) % (mp->m_sb.sb_blocksize * NBBY);
byte = bit / NBBY;
bit %= NBBY;
mask = 1 << bit;
@@ -988,10 +1000,9 @@ blocktrash_b(
}
write_cur();
iocur_top->bp->b_ops = stashed_ops;
- pop_cur();
printf(_("blocktrash: %u/%u %s block %d bit%s starting %d:%d %s\n"),
agno, agbno, typename[type], len, len == 1 ? "" : "s",
- offset / NBBY, offset % NBBY, modestr[mode]);
+ bit_offset / NBBY, bit_offset % NBBY, modestr[mode]);
}
int
@@ -1019,11 +1030,9 @@ blocktrash_f(
uint seed;
int sopt;
int tmask;
+ bool this_block = false;
+ int bit_offset = -1;
- if (!dbmap) {
- dbprintf(_("must run blockget first\n"));
- return 0;
- }
optind = 0;
count = 1;
min = 1;
@@ -1050,7 +1059,7 @@ blocktrash_f(
(1 << DBM_RTSUM) |
(1 << DBM_SYMLINK) |
(1 << DBM_SB);
- while ((c = getopt(argc, argv, "0123n:s:t:x:y:")) != EOF) {
+ while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
switch (c) {
case '0':
mode = 0;
@@ -1071,6 +1080,21 @@ blocktrash_f(
return 0;
}
break;
+ case 'o': {
+ int relative = 0;
+ if (optarg[0] == '+') {
+ optarg++;
+ relative = 1;
+ }
+ bit_offset = (int)strtol(optarg, &p, 0);
+ if (*p != '\0' || bit_offset < 0) {
+ dbprintf(_("bad blocktrash offset %s\n"), optarg);
+ return 0;
+ }
+ if (relative)
+ bit_offset = -bit_offset - 1;
+ break;
+ }
case 's':
seed = (uint)strtoul(optarg, &p, 0);
sopt = 1;
@@ -1102,11 +1126,22 @@ blocktrash_f(
return 0;
}
break;
+ case 'z':
+ this_block = true;
+ break;
default:
dbprintf(_("bad option for blocktrash command\n"));
return 0;
}
}
+ if (!this_block && !dbmap) {
+ dbprintf(_("must run blockget first\n"));
+ return 0;
+ }
+ if (this_block && iocur_sp == 0) {
+ dbprintf(_("nothing on stack\n"));
+ return 0;
+ }
if (min > max) {
dbprintf(_("bad min/max for blocktrash command\n"));
return 0;
@@ -1125,6 +1160,14 @@ blocktrash_f(
} else
lentab[lentablen - 1].max = i;
}
+ if (!sopt)
+ dbprintf(_("blocktrash: seed %u\n"), seed);
+ srandom(seed);
+ if (this_block) {
+ blocktrash_b(bit_offset, DBM_UNKNOWN,
+ &lentab[random() % lentablen], mode);
+ goto out;
+ }
for (blocks = 0, agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
for (agbno = 0, p = dbmap[agno];
agbno < mp->m_sb.sb_agblocks;
@@ -1137,9 +1180,6 @@ blocktrash_f(
dbprintf(_("blocktrash: no matching blocks\n"));
goto out;
}
- if (!sopt)
- dbprintf(_("blocktrash: seed %u\n"), seed);
- srandom(seed);
for (i = 0; i < count; i++) {
randb = (xfs_rfsblock_t)((((__int64_t)random() << 32) |
random()) % blocks);
@@ -1153,8 +1193,13 @@ blocktrash_f(
continue;
if (bi++ < randb)
continue;
- blocktrash_b(agno, agbno, (dbm_t)*p,
+ push_cur();
+ set_cur(NULL,
+ XFS_AGB_TO_DADDR(mp, agno, agbno),
+ blkbb, DB_RING_IGN, NULL);
+ blocktrash_b(bit_offset, (dbm_t)*p,
&lentab[random() % lentablen], mode);
+ pop_cur();
done = 1;
break;
}
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index df54bb7..681efc4 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -232,7 +232,7 @@ enables verbose output. Messages will be printed for every block and
inode processed.
.RE
.TP
-.BI "blocktrash [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
+.BI "blocktrash [-z] [\-o " offset "] [\-n " count "] [\-x " min "] [\-y " max "] [\-s " seed "] [\-0|1|2|3] [\-t " type "] ..."
Trash randomly selected filesystem metadata blocks.
Trashing occurs to randomly selected bits in the chosen blocks.
This command is available only in debugging versions of
@@ -259,6 +259,13 @@ supplies the
.I count
of block-trashings to perform (default 1).
.TP
+.B \-o
+supplies the bit
+.I offset
+at which to start trashing the block. If the value is preceded by a '+', the
+trashing will start at a randomly chosen offset that is larger than the value
+supplied. The default is to randomly choose an offset anywhere in the block.
+.TP
.B \-s
supplies a
.I seed
@@ -282,6 +289,12 @@ size of bit range to be trashed. The default value is 1.
sets the
.I maximum
size of bit range to be trashed. The default value is 1024.
+.TP
+.B \-z
+trashes the block at the top of the stack. It is not necessary to
+run
+.BI blockget
+if this option is supplied.
.RE
.TP
.BI "blockuse [\-n] [\-c " count ]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 11/11] xfs_db: enable blockget for v5 filesystems
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
` (9 preceding siblings ...)
2015-08-26 0:33 ` [PATCH 10/11] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
@ 2015-08-26 0:33 ` Darrick J. Wong
10 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:33 UTC (permalink / raw)
To: david, darrick.wong; +Cc: xfs
Plumb in the necessary magic number checks and other fixups required
to handle v5 filesystems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
db/check.c | 238 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
db/type.c | 2 +
2 files changed, 214 insertions(+), 26 deletions(-)
diff --git a/db/check.c b/db/check.c
index 5b32d07..9c1541d 100644
--- a/db/check.c
+++ b/db/check.c
@@ -44,7 +44,7 @@ typedef enum {
DBM_FREE1, DBM_FREE2, DBM_FREELIST, DBM_INODE,
DBM_LOG, DBM_MISSING, DBM_QUOTA, DBM_RTBITMAP,
DBM_RTDATA, DBM_RTFREE, DBM_RTSUM, DBM_SB,
- DBM_SYMLINK,
+ DBM_SYMLINK, DBM_BTFINO,
DBM_NDBM
} dbm_t;
@@ -170,6 +170,7 @@ static const char *typename[] = {
"rtsum",
"sb",
"symlink",
+ "btfino",
NULL
};
static int verbose;
@@ -345,6 +346,9 @@ static void scanfunc_cnt(struct xfs_btree_block *block, int level,
static void scanfunc_ino(struct xfs_btree_block *block, int level,
xfs_agf_t *agf, xfs_agblock_t bno,
int isroot);
+static void scanfunc_fino(struct xfs_btree_block *block, int level,
+ struct xfs_agf *agf, xfs_agblock_t bno,
+ int isroot);
static void set_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno,
xfs_extlen_t len, dbm_t type,
xfs_agnumber_t c_agno, xfs_agblock_t c_agbno);
@@ -789,19 +793,6 @@ blockget_f(
return 0;
}
- /*
- * XXX: check does not support CRC enabled filesystems. Return
- * immediately, silently, with success but without doing anything here
- * initially so that xfstests can run without modification on metadata
- * enabled filesystems.
- *
- * XXX: ultimately we need to dump an error message here that xfstests
- * filters out, or we need to actually do the work to make check support
- * crc enabled filesystems.
- */
- if (xfs_sb_version_hascrc(&mp->m_sb))
- return 0;
-
if (!init(argc, argv)) {
if (serious_error)
exitcode = 3;
@@ -1058,6 +1049,7 @@ blocktrash_f(
(1 << DBM_RTBITMAP) |
(1 << DBM_RTSUM) |
(1 << DBM_SYMLINK) |
+ (1 << DBM_BTFINO) |
(1 << DBM_SB);
while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) {
switch (c) {
@@ -2267,7 +2259,9 @@ process_data_dir_v2(
data = iocur_top->data;
block = iocur_top->data;
if (be32_to_cpu(block->magic) != XFS_DIR2_BLOCK_MAGIC &&
- be32_to_cpu(data->magic) != XFS_DIR2_DATA_MAGIC) {
+ be32_to_cpu(data->magic) != XFS_DIR2_DATA_MAGIC &&
+ be32_to_cpu(block->magic) != XFS_DIR3_BLOCK_MAGIC &&
+ be32_to_cpu(data->magic) != XFS_DIR3_DATA_MAGIC) {
if (!sflag || v)
dbprintf(_("bad directory data magic # %#x for dir ino "
"%lld block %d\n"),
@@ -2278,7 +2272,8 @@ process_data_dir_v2(
db = xfs_dir2_da_to_db(mp->m_dir_geo, dabno);
bf = M_DIROPS(mp)->data_bestfree_p(data);
ptr = (char *)M_DIROPS(mp)->data_unused_p(data);
- if (be32_to_cpu(block->magic) == XFS_DIR2_BLOCK_MAGIC) {
+ if (be32_to_cpu(block->magic) == XFS_DIR2_BLOCK_MAGIC ||
+ be32_to_cpu(block->magic) == XFS_DIR3_BLOCK_MAGIC) {
btp = xfs_dir2_block_tail_p(mp->m_dir_geo, block);
lep = xfs_dir2_block_leaf_p(btp);
endptr = (char *)lep;
@@ -2424,7 +2419,8 @@ process_data_dir_v2(
(*dot)++;
}
}
- if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC) {
+ if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC ||
+ be32_to_cpu(data->magic) == XFS_DIR3_BLOCK_MAGIC) {
endptr = (char *)data + mp->m_dir_geo->blksize;
for (i = stale = 0; lep && i < be32_to_cpu(btp->count); i++) {
if ((char *)&lep[i] >= endptr) {
@@ -2456,7 +2452,8 @@ process_data_dir_v2(
id->ino, dabno);
error++;
}
- if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC &&
+ if ((be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC ||
+ be32_to_cpu(data->magic) == XFS_DIR3_BLOCK_MAGIC) &&
count != be32_to_cpu(btp->count) - be32_to_cpu(btp->stale)) {
if (!sflag || v)
dbprintf(_("dir %lld block %d bad block tail count %d "
@@ -2465,7 +2462,8 @@ process_data_dir_v2(
be32_to_cpu(btp->stale));
error++;
}
- if (be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC &&
+ if ((be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC ||
+ be32_to_cpu(data->magic) == XFS_DIR2_BLOCK_MAGIC) &&
stale != be32_to_cpu(btp->stale)) {
if (!sflag || v)
dbprintf(_("dir %lld block %d bad stale tail count %d\n"),
@@ -3051,6 +3049,73 @@ process_leaf_node_dir_v2(
}
static void
+process_leaf_node_dir_v3_free(
+ inodata_t *id,
+ int v,
+ xfs_dablk_t dabno,
+ freetab_t *freetab)
+{
+ xfs_dir2_data_off_t ent;
+ struct xfs_dir3_free *free;
+ int i;
+ int maxent;
+ int used;
+
+ free = iocur_top->data;
+ maxent = M_DIROPS(mp)->free_max_bests(mp->m_dir_geo);
+ if (be32_to_cpu(free->hdr.firstdb) != xfs_dir2_da_to_db(mp->m_dir_geo,
+ dabno - mp->m_dir_geo->freeblk) * maxent) {
+ if (!sflag || v)
+ dbprintf(_("bad free block firstdb %d for dir ino %lld "
+ "block %d\n"),
+ be32_to_cpu(free->hdr.firstdb), id->ino, dabno);
+ error++;
+ return;
+ }
+ if (be32_to_cpu(free->hdr.nvalid) > maxent ||
+ be32_to_cpu(free->hdr.nvalid) < 0 ||
+ be32_to_cpu(free->hdr.nused) > maxent ||
+ be32_to_cpu(free->hdr.nused) < 0 ||
+ be32_to_cpu(free->hdr.nused) >
+ be32_to_cpu(free->hdr.nvalid)) {
+ if (!sflag || v)
+ dbprintf(_("bad free block nvalid/nused %d/%d for dir "
+ "ino %lld block %d\n"),
+ be32_to_cpu(free->hdr.nvalid),
+ be32_to_cpu(free->hdr.nused), id->ino, dabno);
+ error++;
+ return;
+ }
+ for (used = i = 0; i < be32_to_cpu(free->hdr.nvalid); i++) {
+ if (freetab->nents <= be32_to_cpu(free->hdr.firstdb) + i)
+ ent = NULLDATAOFF;
+ else
+ ent = freetab->ents[be32_to_cpu(free->hdr.firstdb) + i];
+ if (ent != be16_to_cpu(free->bests[i])) {
+ if (!sflag || v)
+ dbprintf(_("bad free block ent %d is %d should "
+ "be %d for dir ino %lld block %d\n"),
+ i, be16_to_cpu(free->bests[i]), ent,
+ id->ino, dabno);
+ error++;
+ }
+ if (be16_to_cpu(free->bests[i]) != NULLDATAOFF)
+ used++;
+ if (ent != NULLDATAOFF)
+ freetab->ents[be32_to_cpu(free->hdr.firstdb) + i] =
+ NULLDATAOFF;
+ }
+ if (used != be32_to_cpu(free->hdr.nused)) {
+ if (!sflag || v)
+ dbprintf(_("bad free block nused %d should be %d for dir "
+ "ino %lld block %d\n"),
+ be32_to_cpu(free->hdr.nused), used, id->ino,
+ dabno);
+ error++;
+ }
+}
+
+static void
process_leaf_node_dir_v2_free(
inodata_t *id,
int v,
@@ -3064,7 +3129,8 @@ process_leaf_node_dir_v2_free(
int used;
free = iocur_top->data;
- if (be32_to_cpu(free->hdr.magic) != XFS_DIR2_FREE_MAGIC) {
+ if (be32_to_cpu(free->hdr.magic) != XFS_DIR2_FREE_MAGIC &&
+ be32_to_cpu(free->hdr.magic) != XFS_DIR3_FREE_MAGIC) {
if (!sflag || v)
dbprintf(_("bad free block magic # %#x for dir ino %lld "
"block %d\n"),
@@ -3072,6 +3138,10 @@ process_leaf_node_dir_v2_free(
error++;
return;
}
+ if (be32_to_cpu(free->hdr.magic) == XFS_DIR3_FREE_MAGIC) {
+ process_leaf_node_dir_v3_free(id, v, dabno, freetab);
+ return;
+ }
maxent = M_DIROPS(mp)->free_max_bests(mp->m_dir_geo);
if (be32_to_cpu(free->hdr.firstdb) != xfs_dir2_da_to_db(mp->m_dir_geo,
dabno - mp->m_dir_geo->freeblk) * maxent) {
@@ -3125,6 +3195,21 @@ process_leaf_node_dir_v2_free(
}
}
+/*
+ * Get address of the bestcount field in the single-leaf block.
+ */
+static inline int
+xfs_dir3_leaf_ents_count(struct xfs_dir2_leaf *lp)
+{
+ if (lp->hdr.info.magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
+ lp->hdr.info.magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC)) {
+ struct xfs_dir3_leaf *lp3 = (struct xfs_dir3_leaf *)lp;
+
+ return be16_to_cpu(lp3->hdr.count);
+ }
+ return be16_to_cpu(lp->hdr.count);
+}
+
static void
process_leaf_node_dir_v2_int(
inodata_t *id,
@@ -3135,6 +3220,7 @@ process_leaf_node_dir_v2_int(
int i;
__be16 *lbp;
xfs_dir2_leaf_t *leaf;
+ struct xfs_dir3_leaf *leaf3 = NULL;
xfs_dir2_leaf_entry_t *lep;
xfs_dir2_leaf_tail_t *ltp;
xfs_da_intnode_t *node;
@@ -3143,7 +3229,15 @@ process_leaf_node_dir_v2_int(
leaf = iocur_top->data;
switch (be16_to_cpu(leaf->hdr.info.magic)) {
+ case XFS_DIR3_LEAF1_MAGIC:
+ case XFS_DIR3_LEAFN_MAGIC:
+ case XFS_DA3_NODE_MAGIC:
+ leaf3 = iocur_top->data;
+ break;
+ }
+ switch (be16_to_cpu(leaf->hdr.info.magic)) {
case XFS_DIR2_LEAF1_MAGIC:
+ case XFS_DIR3_LEAF1_MAGIC:
if (be32_to_cpu(leaf->hdr.info.forw) ||
be32_to_cpu(leaf->hdr.info.back)) {
if (!sflag || v)
@@ -3183,10 +3277,12 @@ process_leaf_node_dir_v2_int(
}
break;
case XFS_DIR2_LEAFN_MAGIC:
+ case XFS_DIR3_LEAFN_MAGIC:
/* if it's at the root location then we can check the
* pointers are null XXX */
break;
case XFS_DA_NODE_MAGIC:
+ case XFS_DA3_NODE_MAGIC:
node = iocur_top->data;
M_DIROPS(mp)->node_hdr_from_disk(&nodehdr, node);
if (nodehdr.level < 1 || nodehdr.level > XFS_DA_NODE_MAXDEPTH) {
@@ -3208,7 +3304,7 @@ process_leaf_node_dir_v2_int(
return;
}
lep = M_DIROPS(mp)->leaf_ents_p(leaf);
- for (i = stale = 0; i < be16_to_cpu(leaf->hdr.count); i++) {
+ for (i = stale = 0; i < xfs_dir3_leaf_ents_count(leaf); i++) {
if (be32_to_cpu(lep[i].address) == XFS_DIR2_NULL_DATAPTR)
stale++;
else if (dir_hash_see(be32_to_cpu(lep[i].hashval),
@@ -3221,7 +3317,14 @@ process_leaf_node_dir_v2_int(
error++;
}
}
- if (stale != be16_to_cpu(leaf->hdr.stale)) {
+ if (leaf3 && stale != be16_to_cpu(leaf3->hdr.stale)) {
+ if (!sflag || v)
+ dbprintf(_("dir3 %lld block %d stale mismatch "
+ "%d/%d\n"),
+ id->ino, dabno, stale,
+ be16_to_cpu(leaf3->hdr.stale));
+ error++;
+ } else if (!leaf && stale != be16_to_cpu(leaf->hdr.stale)) {
if (!sflag || v)
dbprintf(_("dir %lld block %d stale mismatch "
"%d/%d\n"),
@@ -3808,6 +3911,12 @@ scan_ag(
be32_to_cpu(agi->agi_root),
be32_to_cpu(agi->agi_level),
1, scanfunc_ino, TYP_INOBT);
+ if (agi->agi_free_root) {
+ scan_sbtree(agf,
+ be32_to_cpu(agi->agi_free_root),
+ be32_to_cpu(agi->agi_free_level),
+ 1, scanfunc_fino, TYP_FINOBT);
+ }
if (be32_to_cpu(agf->agf_freeblks) != agffreeblks) {
if (!sflag)
dbprintf(_("agf_freeblks %u, counted %u in ag %u\n"),
@@ -4007,7 +4116,8 @@ scanfunc_bmap(
agno = XFS_FSB_TO_AGNO(mp, bno);
agbno = XFS_FSB_TO_AGBNO(mp, bno);
- if (be32_to_cpu(block->bb_magic) != XFS_BMAP_MAGIC) {
+ if (be32_to_cpu(block->bb_magic) != XFS_BMAP_MAGIC &&
+ be32_to_cpu(block->bb_magic) != XFS_BMAP_CRC_MAGIC) {
if (!sflag || id->ilist || CHECK_BLIST(bno))
dbprintf(_("bad magic # %#x in inode %lld bmbt block "
"%u/%u\n"),
@@ -4072,7 +4182,8 @@ scanfunc_bno(
xfs_agnumber_t seqno = be32_to_cpu(agf->agf_seqno);
xfs_agblock_t lastblock;
- if (be32_to_cpu(block->bb_magic) != XFS_ABTB_MAGIC) {
+ if (be32_to_cpu(block->bb_magic) != XFS_ABTB_MAGIC &&
+ be32_to_cpu(block->bb_magic) != XFS_ABTB_CRC_MAGIC) {
dbprintf(_("bad magic # %#x in btbno block %u/%u\n"),
be32_to_cpu(block->bb_magic), seqno, bno);
serious_error++;
@@ -4145,7 +4256,8 @@ scanfunc_cnt(
xfs_alloc_rec_t *rp;
xfs_extlen_t lastcount;
- if (be32_to_cpu(block->bb_magic) != XFS_ABTC_MAGIC) {
+ if (be32_to_cpu(block->bb_magic) != XFS_ABTC_MAGIC &&
+ be32_to_cpu(block->bb_magic) != XFS_ABTC_CRC_MAGIC) {
dbprintf(_("bad magic # %#x in btcnt block %u/%u\n"),
be32_to_cpu(block->bb_magic), seqno, bno);
serious_error++;
@@ -4225,7 +4337,8 @@ scanfunc_ino(
xfs_inobt_ptr_t *pp;
xfs_inobt_rec_t *rp;
- if (be32_to_cpu(block->bb_magic) != XFS_IBT_MAGIC) {
+ if (be32_to_cpu(block->bb_magic) != XFS_IBT_MAGIC &&
+ be32_to_cpu(block->bb_magic) != XFS_IBT_CRC_MAGIC) {
dbprintf(_("bad magic # %#x in inobt block %u/%u\n"),
be32_to_cpu(block->bb_magic), seqno, bno);
serious_error++;
@@ -4321,6 +4434,79 @@ scanfunc_ino(
}
static void
+scanfunc_fino(
+ struct xfs_btree_block *block,
+ int level,
+ struct xfs_agf *agf,
+ xfs_agblock_t bno,
+ int isroot)
+{
+ xfs_agino_t agino;
+ xfs_agnumber_t seqno = be32_to_cpu(agf->agf_seqno);
+ int i;
+ int off;
+ xfs_inobt_ptr_t *pp;
+ struct xfs_inobt_rec *rp;
+
+ if (be32_to_cpu(block->bb_magic) != XFS_FIBT_MAGIC &&
+ be32_to_cpu(block->bb_magic) != XFS_FIBT_CRC_MAGIC) {
+ dbprintf(_("bad magic # %#x in finobt block %u/%u\n"),
+ be32_to_cpu(block->bb_magic), seqno, bno);
+ serious_error++;
+ return;
+ }
+ if (be16_to_cpu(block->bb_level) != level) {
+ if (!sflag)
+ dbprintf(_("expected level %d got %d in finobt block "
+ "%u/%u\n"),
+ level, be16_to_cpu(block->bb_level), seqno, bno);
+ error++;
+ }
+ set_dbmap(seqno, bno, 1, DBM_BTFINO, seqno, bno);
+ if (level == 0) {
+ if (be16_to_cpu(block->bb_numrecs) > mp->m_inobt_mxr[0] ||
+ (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_inobt_mnr[0])) {
+ dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in "
+ "finobt block %u/%u\n"),
+ be16_to_cpu(block->bb_numrecs), mp->m_inobt_mnr[0],
+ mp->m_inobt_mxr[0], seqno, bno);
+ serious_error++;
+ return;
+ }
+ rp = XFS_INOBT_REC_ADDR(mp, block, 1);
+ for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) {
+ agino = be32_to_cpu(rp[i].ir_startino);
+ off = XFS_INO_TO_OFFSET(mp, agino);
+ if (off == 0) {
+ if ((sbversion & XFS_SB_VERSION_ALIGNBIT) &&
+ mp->m_sb.sb_inoalignmt &&
+ (XFS_INO_TO_AGBNO(mp, agino) %
+ mp->m_sb.sb_inoalignmt))
+ sbversion &= ~XFS_SB_VERSION_ALIGNBIT;
+ check_set_dbmap(seqno, XFS_AGINO_TO_AGBNO(mp, agino),
+ (xfs_extlen_t)MAX(1,
+ XFS_INODES_PER_CHUNK >>
+ mp->m_sb.sb_inopblog),
+ DBM_INODE, DBM_INODE, seqno, bno);
+ }
+ }
+ return;
+ }
+ if (be16_to_cpu(block->bb_numrecs) > mp->m_inobt_mxr[1] ||
+ (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_inobt_mnr[1])) {
+ dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in finobt block "
+ "%u/%u\n"),
+ be16_to_cpu(block->bb_numrecs), mp->m_inobt_mnr[1],
+ mp->m_inobt_mxr[1], seqno, bno);
+ serious_error++;
+ return;
+ }
+ pp = XFS_INOBT_PTR_ADDR(mp, block, 1, mp->m_inobt_mxr[1]);
+ for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++)
+ scan_sbtree(agf, be32_to_cpu(pp[i]), level, 0, scanfunc_fino, TYP_FINOBT);
+}
+
+static void
set_dbmap(
xfs_agnumber_t agno,
xfs_agblock_t agbno,
diff --git a/db/type.c b/db/type.c
index 5c60736..955986b 100644
--- a/db/type.c
+++ b/db/type.c
@@ -141,6 +141,8 @@ static const typ_t __typtab_spcrc[] = {
{ TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld,
&xfs_symlink_buf_ops },
{ TYP_TEXT, "text", handle_text, NULL, NULL },
+ { TYP_FINOBT, "finobt", handle_struct, inobt_crc_hfld,
+ &xfs_inobt_buf_ops },
{ TYP_NONE, NULL }
};
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity
2015-08-26 0:32 ` [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity Darrick J. Wong
@ 2015-08-26 0:45 ` Dave Chinner
2015-08-26 0:59 ` Darrick J. Wong
0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2015-08-26 0:45 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs
On Tue, Aug 25, 2015 at 05:32:59PM -0700, Darrick J. Wong wrote:
> Check the v5 fields (uuid, blocknr, owner) of attribute blocks for
> obvious errors while scanning xattr blocks. If the ownership info
> is incorrect, kill the block.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Why hasn't the buffer verifier done this validation?
> @@ -1564,6 +1602,13 @@ process_longform_attr(
> if (bp->b_error == -EFSBADCRC)
> (*repair)++;
>
> + /* is this block sane? */
> + if (__check_attr_header(mp, bp, ino)) {
> + *repair = 0;
> + libxfs_putbuf(bp);
> + return 1;
> + }
As you can see the above hunk has a bad CRC check from the verifier,
and if the attr header is wrong then the verifier should be setting
bp->b_error == -EFSCORRUPTED.
So shouldn't this simply be:
+ if (bp->b_error == -EFSCORRUPTED) {
+ *repair = 0;
+ libxfs_putbuf(bp);
+ return 1;
+ }
+
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier
2015-08-26 0:33 ` [PATCH 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
@ 2015-08-26 0:54 ` Dave Chinner
2015-08-26 3:59 ` Darrick J. Wong
2015-08-26 5:24 ` [PATCH v2 " Darrick J. Wong
0 siblings, 2 replies; 25+ messages in thread
From: Dave Chinner @ 2015-08-26 0:54 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs
On Tue, Aug 25, 2015 at 05:33:05PM -0700, Darrick J. Wong wrote:
> If during prefetch we encounter a bmbt block that fails the CRC check
> due to corruption in the unused part of the block, force the buffer
> back through the non-prefetch verifiers later so that the CRC is
> updated. Otherwise, the bad checksum goes unfixed and the kernel will
> still flag the bmbt block as invalid.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> repair/prefetch.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>
> diff --git a/repair/prefetch.c b/repair/prefetch.c
> index 1de3ec0..77d29c8 100644
> --- a/repair/prefetch.c
> +++ b/repair/prefetch.c
> @@ -276,6 +276,14 @@ pf_scan_lbtree(
>
> XFS_BUF_SET_PRIORITY(bp, isadir ? B_DIR_BMAP : B_BMAP);
>
> + /*
> + * Make this bmbt buffer go back through the verifiers later so that
> + * we correct checksum errors stemming from bitflips in the unused
> + * parts of the bmbt block.
> + */
> + if (bp->b_error == -EFSBADCRC)
> + bp->b_flags |= LIBXFS_B_UNCHECKED;
This is because the next read of the buffer clears bp->b_error,
right?
So, while I think this is necessary, I also think the prefetch on
this btree should stop as we can't trust the contents of the buffer
to be correct. Hence I'd suggest that:
/*
* If the verfier flagged a problem with the buffer, we
* can't trust it's contents for the purposes of readahead.
* Stop prefetching the tree, and mark this buffer as
* unchecked so that the next read of the buffer by the
* repair code will retain the error status and hence be
* acted on appropriately.
*/
if (bp->b_error) {
bp->b_flags |= LIBXFS_B_UNCHECKED;
libxfs_putbuf(bp);
return 0;
}
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity
2015-08-26 0:45 ` Dave Chinner
@ 2015-08-26 0:59 ` Darrick J. Wong
2015-08-26 1:15 ` Dave Chinner
0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 0:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Aug 26, 2015 at 10:45:02AM +1000, Dave Chinner wrote:
> On Tue, Aug 25, 2015 at 05:32:59PM -0700, Darrick J. Wong wrote:
> > Check the v5 fields (uuid, blocknr, owner) of attribute blocks for
> > obvious errors while scanning xattr blocks. If the ownership info
> > is incorrect, kill the block.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>
> Why hasn't the buffer verifier done this validation?
Maybe I'm confused here, so here's what I think is going on:
AFAICT most of the verifiers do things like this:
if (crcs_enabled && cksum_verification fails) {
xfs_buf_ioerror(bp, -EFSBADCRC);
} else if (header_is_insane) {
xfs_buf_ioerror(bp, -EFSCORRUPTED);
}
The fuzzer corrupts the UUID without updating the CRC. The verifier first
checks the CRC and it doesn't match, so it sets b_error to -EFSBADCRC and
doesn't get to the header check. Then the verifier returns, so we end up back
in process_longform_attr. Where do we set -EFSCORRUPTED when the CRC also
doesn't match?
--D
>
> > @@ -1564,6 +1602,13 @@ process_longform_attr(
> > if (bp->b_error == -EFSBADCRC)
> > (*repair)++;
> >
> > + /* is this block sane? */
> > + if (__check_attr_header(mp, bp, ino)) {
> > + *repair = 0;
> > + libxfs_putbuf(bp);
> > + return 1;
> > + }
>
> As you can see the above hunk has a bad CRC check from the verifier,
> and if the attr header is wrong then the verifier should be setting
> bp->b_error == -EFSCORRUPTED.
>
> So shouldn't this simply be:
>
> + if (bp->b_error == -EFSCORRUPTED) {
> + *repair = 0;
> + libxfs_putbuf(bp);
> + return 1;
> + }
> +
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants
2015-08-26 0:32 ` [PATCH 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants Darrick J. Wong
@ 2015-08-26 1:02 ` Dave Chinner
2015-08-26 4:05 ` Darrick J. Wong
2015-08-26 5:23 ` [PATCH v2 " Darrick J. Wong
0 siblings, 2 replies; 25+ messages in thread
From: Dave Chinner @ 2015-08-26 1:02 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs
On Tue, Aug 25, 2015 at 05:32:46PM -0700, Darrick J. Wong wrote:
> When we're running xfs_repair with prefetch enabled, it's possible
> that repair will decide to clear an inode without examining all
> metadata blocks owned by that inode. This leaves the unreferenced
> prefetched buffers marked UNCHECKED, which will cause a subsequent CRC
> error if the block is reallocated to a different structure and read
> more than once. Typically this happens when a large directory is
> corrupted and lost+found has to grow to accomodate all the
> disconnected inodes.
>
> In libxfs_getbuf*(), we're supposed to return an unused buffer which
> has a clean state. Unfortunately, things like UNCHECKED can hang
> around to cause incorrect verifier errors later, so change those
> functions to launder the state bits clean.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> libxfs/rdwr.c | 47 +++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 41 insertions(+), 6 deletions(-)
>
>
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index 4f8212f..d28cea8 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -631,15 +631,39 @@ libxfs_getbuf_flags(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len,
> return __cache_lookup(&key, flags);
> }
>
> +/*
> + * Clean the buffer flags for libxfs_getbuf*(), which wants to return
> + * an unused buffer with clean state. This prevents CRC errors on a
> + * re-read of a corrupt block that was prefetched and freed. This
> + * can happen with a massively corrupt directory that is discarded,
> + * but whose blocks are then recycled into expanding lost+found.
> + *
> + * Note however that if the buffer's dirty (prefetch calls getbuf)
> + * we'll leave the state alone because we don't want to discard blocks
> + * that have been fixed.
> + */
> +static void
> +try_clean_buf(
Only thing I don't like about this patch is the name of this
function. It's really a "reset buffer state" function, so I think
that calling it something like reset_buf_state() would be more
appropriate.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity
2015-08-26 0:59 ` Darrick J. Wong
@ 2015-08-26 1:15 ` Dave Chinner
2015-08-26 4:50 ` Darrick J. Wong
0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2015-08-26 1:15 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs
On Tue, Aug 25, 2015 at 05:59:11PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 26, 2015 at 10:45:02AM +1000, Dave Chinner wrote:
> > On Tue, Aug 25, 2015 at 05:32:59PM -0700, Darrick J. Wong wrote:
> > > Check the v5 fields (uuid, blocknr, owner) of attribute blocks for
> > > obvious errors while scanning xattr blocks. If the ownership info
> > > is incorrect, kill the block.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Why hasn't the buffer verifier done this validation?
>
> Maybe I'm confused here, so here's what I think is going on:
>
> AFAICT most of the verifiers do things like this:
>
> if (crcs_enabled && cksum_verification fails) {
> xfs_buf_ioerror(bp, -EFSBADCRC);
> } else if (header_is_insane) {
> xfs_buf_ioerror(bp, -EFSCORRUPTED);
> }
>
> The fuzzer corrupts the UUID without updating the CRC. The verifier first
> checks the CRC and it doesn't match, so it sets b_error to -EFSBADCRC and
> doesn't get to the header check.
Ok, that explains it - I didn't consider that case. This would seem
like a general problem for repair when CRC errors are detected? i.e.
we set the repair flag without doing the remaining verifier validity
checks?
As it is, I don't really like duplicating the verifier checks in
repair. ISTR I recently suggested that we need to factor all the
common verifier checks (magic, owner, uuid, blockno) into a single
function that all verifiers called to remove all the code
duplication. If we do this, then repair can also call the function
to verify headers after a CRC failure to determine if repair is
possible....
This is a bit more work, so I'll probably take this specific patch
for 4.2.0, but I'd like to see this all factored out so we aren't
duplicating code unnecessarily.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier
2015-08-26 0:54 ` Dave Chinner
@ 2015-08-26 3:59 ` Darrick J. Wong
2015-08-26 5:24 ` [PATCH v2 " Darrick J. Wong
1 sibling, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 3:59 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Aug 26, 2015 at 10:54:47AM +1000, Dave Chinner wrote:
> On Tue, Aug 25, 2015 at 05:33:05PM -0700, Darrick J. Wong wrote:
> > If during prefetch we encounter a bmbt block that fails the CRC check
> > due to corruption in the unused part of the block, force the buffer
> > back through the non-prefetch verifiers later so that the CRC is
> > updated. Otherwise, the bad checksum goes unfixed and the kernel will
> > still flag the bmbt block as invalid.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > repair/prefetch.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> >
> > diff --git a/repair/prefetch.c b/repair/prefetch.c
> > index 1de3ec0..77d29c8 100644
> > --- a/repair/prefetch.c
> > +++ b/repair/prefetch.c
> > @@ -276,6 +276,14 @@ pf_scan_lbtree(
> >
> > XFS_BUF_SET_PRIORITY(bp, isadir ? B_DIR_BMAP : B_BMAP);
> >
> > + /*
> > + * Make this bmbt buffer go back through the verifiers later so that
> > + * we correct checksum errors stemming from bitflips in the unused
> > + * parts of the bmbt block.
> > + */
> > + if (bp->b_error == -EFSBADCRC)
> > + bp->b_flags |= LIBXFS_B_UNCHECKED;
>
> This is because the next read of the buffer clears bp->b_error,
> right?
>
> So, while I think this is necessary, I also think the prefetch on
> this btree should stop as we can't trust the contents of the buffer
> to be correct. Hence I'd suggest that:
>
> /*
> * If the verfier flagged a problem with the buffer, we
> * can't trust it's contents for the purposes of readahead.
> * Stop prefetching the tree, and mark this buffer as
> * unchecked so that the next read of the buffer by the
> * repair code will retain the error status and hence be
> * acted on appropriately.
> */
> if (bp->b_error) {
> bp->b_flags |= LIBXFS_B_UNCHECKED;
> libxfs_putbuf(bp);
> return 0;
> }
Ok, I'll change it to stop prefetch on the bmapbt.
--D
>
> Cheers,
>
> Dave.
>
>
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants
2015-08-26 1:02 ` Dave Chinner
@ 2015-08-26 4:05 ` Darrick J. Wong
2015-08-26 5:23 ` [PATCH v2 " Darrick J. Wong
1 sibling, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 4:05 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Aug 26, 2015 at 11:02:32AM +1000, Dave Chinner wrote:
> On Tue, Aug 25, 2015 at 05:32:46PM -0700, Darrick J. Wong wrote:
> > When we're running xfs_repair with prefetch enabled, it's possible
> > that repair will decide to clear an inode without examining all
> > metadata blocks owned by that inode. This leaves the unreferenced
> > prefetched buffers marked UNCHECKED, which will cause a subsequent CRC
> > error if the block is reallocated to a different structure and read
> > more than once. Typically this happens when a large directory is
> > corrupted and lost+found has to grow to accomodate all the
> > disconnected inodes.
> >
> > In libxfs_getbuf*(), we're supposed to return an unused buffer which
> > has a clean state. Unfortunately, things like UNCHECKED can hang
> > around to cause incorrect verifier errors later, so change those
> > functions to launder the state bits clean.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > libxfs/rdwr.c | 47 +++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 41 insertions(+), 6 deletions(-)
> >
> >
> > diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> > index 4f8212f..d28cea8 100644
> > --- a/libxfs/rdwr.c
> > +++ b/libxfs/rdwr.c
> > @@ -631,15 +631,39 @@ libxfs_getbuf_flags(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len,
> > return __cache_lookup(&key, flags);
> > }
> >
> > +/*
> > + * Clean the buffer flags for libxfs_getbuf*(), which wants to return
> > + * an unused buffer with clean state. This prevents CRC errors on a
> > + * re-read of a corrupt block that was prefetched and freed. This
> > + * can happen with a massively corrupt directory that is discarded,
> > + * but whose blocks are then recycled into expanding lost+found.
> > + *
> > + * Note however that if the buffer's dirty (prefetch calls getbuf)
> > + * we'll leave the state alone because we don't want to discard blocks
> > + * that have been fixed.
> > + */
> > +static void
> > +try_clean_buf(
>
> Only thing I don't like about this patch is the name of this
> function. It's really a "reset buffer state" function, so I think
> that calling it something like reset_buf_state() would be more
> appropriate.
Done.
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity
2015-08-26 1:15 ` Dave Chinner
@ 2015-08-26 4:50 ` Darrick J. Wong
2015-08-26 6:20 ` Dave Chinner
0 siblings, 1 reply; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 4:50 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Aug 26, 2015 at 11:15:20AM +1000, Dave Chinner wrote:
> On Tue, Aug 25, 2015 at 05:59:11PM -0700, Darrick J. Wong wrote:
> > On Wed, Aug 26, 2015 at 10:45:02AM +1000, Dave Chinner wrote:
> > > On Tue, Aug 25, 2015 at 05:32:59PM -0700, Darrick J. Wong wrote:
> > > > Check the v5 fields (uuid, blocknr, owner) of attribute blocks for
> > > > obvious errors while scanning xattr blocks. If the ownership info
> > > > is incorrect, kill the block.
> > > >
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > Why hasn't the buffer verifier done this validation?
> >
> > Maybe I'm confused here, so here's what I think is going on:
> >
> > AFAICT most of the verifiers do things like this:
> >
> > if (crcs_enabled && cksum_verification fails) {
> > xfs_buf_ioerror(bp, -EFSBADCRC);
> > } else if (header_is_insane) {
> > xfs_buf_ioerror(bp, -EFSCORRUPTED);
> > }
> >
> > The fuzzer corrupts the UUID without updating the CRC. The verifier first
> > checks the CRC and it doesn't match, so it sets b_error to -EFSBADCRC and
> > doesn't get to the header check.
>
> Ok, that explains it - I didn't consider that case. This would seem
> like a general problem for repair when CRC errors are detected? i.e.
> we set the repair flag without doing the remaining verifier validity
> checks?
Heh, this seems like a moderately large refactoring project... :)
AFAICT, repair is at least checking some of that stuff for bmap,
directory, and symlink blocks, but as you point out it's scattered all
over the place.
It looks like the verifiers can easily check the magic, uuid, and
block fields; but how would we get the owner info to the verifier?
What if we add a "u64 owner" to xfs_buf and change all functions that
grab a buffer to take a u64 owner argument? That'd be rather a lot of
things to change between the kernel and xfsprogs, but otoh we'd gain
owner checking and centralize verification of the v5 fields. I think
we'd have to change xfs_trans_{get,read}_buf*() in the kernel and
libxfs_{get,read}buf*() in xfsprogs.
Then we'd rework the verifiers as such:
if (header_is_insane)
xfs_buf_ioerror(bp, -EFSCORRUPTED);
else if (crcs and crc_fails)
xfs_buf_ioerror(bp, -EFSBADCRC);
Since the header being garbage seems like a much more severe error
than only the checksum being wrong. Then we'd always know when the
v5 fields don't match our expectations.
Next we'd change repair to do something like this:
bp = libxfs_readbuf(...);
if (!bp) {
/* couldn't get a buffer, abort */
return;
} else if (bp->b_error == -EFSBADCRC) {
/* just a bad crc; see if the rest of the block is ok */
repair++;
} else if (bp->b_error) {
/* io error or corrupt header; toss out the owner */
toss_owner();
libxfs_putbuf(bp);
return;
}
/* more checks... */
if (repair)
libxfs_writebuf(bp);
How's that sound?
--D
>
> As it is, I don't really like duplicating the verifier checks in
> repair. ISTR I recently suggested that we need to factor all the
> common verifier checks (magic, owner, uuid, blockno) into a single
> function that all verifiers called to remove all the code
> duplication. If we do this, then repair can also call the function
> to verify headers after a CRC failure to determine if repair is
> possible....
>
> This is a bit more work, so I'll probably take this specific patch
> for 4.2.0, but I'd like to see this all factored out so we aren't
> duplicating code unnecessarily.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants
2015-08-26 1:02 ` Dave Chinner
2015-08-26 4:05 ` Darrick J. Wong
@ 2015-08-26 5:23 ` Darrick J. Wong
1 sibling, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 5:23 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
When we're running xfs_repair with prefetch enabled, it's possible
that repair will decide to clear an inode without examining all
metadata blocks owned by that inode. This leaves the unreferenced
prefetched buffers marked UNCHECKED, which will cause a subsequent CRC
error if the block is reallocated to a different structure and read
more than once. Typically this happens when a large directory is
corrupted and lost+found has to grow to accomodate all the
disconnected inodes.
In libxfs_getbuf*(), we're supposed to return an unused buffer which
has a clean state. Unfortunately, things like UNCHECKED can hang
around to cause incorrect verifier errors later, so change those
functions to launder the state bits clean.
v2: Change the function name to reset_buf_state() to reflect what
the function is trying to accomplish.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
libxfs/rdwr.c | 47 +++++++++++++++++++++++++++++++++++++++++------
1 file changed, 41 insertions(+), 6 deletions(-)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 4f8212f..bc77699 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -631,15 +631,39 @@ libxfs_getbuf_flags(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len,
return __cache_lookup(&key, flags);
}
+/*
+ * Clean the buffer flags for libxfs_getbuf*(), which wants to return
+ * an unused buffer with clean state. This prevents CRC errors on a
+ * re-read of a corrupt block that was prefetched and freed. This
+ * can happen with a massively corrupt directory that is discarded,
+ * but whose blocks are then recycled into expanding lost+found.
+ *
+ * Note however that if the buffer's dirty (prefetch calls getbuf)
+ * we'll leave the state alone because we don't want to discard blocks
+ * that have been fixed.
+ */
+static void
+reset_buf_state(
+ struct xfs_buf *bp)
+{
+ if (bp && !(bp->b_flags & LIBXFS_B_DIRTY))
+ bp->b_flags &= ~(LIBXFS_B_UNCHECKED | LIBXFS_B_STALE |
+ LIBXFS_B_UPTODATE);
+}
+
struct xfs_buf *
libxfs_getbuf(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len)
{
- return libxfs_getbuf_flags(btp, blkno, len, 0);
+ struct xfs_buf *bp;
+
+ bp = libxfs_getbuf_flags(btp, blkno, len, 0);
+ reset_buf_state(bp);
+ return bp;
}
-struct xfs_buf *
-libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
- int nmaps, int flags)
+static struct xfs_buf *
+__libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
+ int nmaps, int flags)
{
struct xfs_bufkey key = {0};
int i;
@@ -659,6 +683,17 @@ libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
return __cache_lookup(&key, flags);
}
+struct xfs_buf *
+libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
+ int nmaps, int flags)
+{
+ struct xfs_buf *bp;
+
+ bp = __libxfs_getbuf_map(btp, map, nmaps, flags);
+ reset_buf_state(bp);
+ return bp;
+}
+
void
libxfs_putbuf(xfs_buf_t *bp)
{
@@ -779,7 +814,7 @@ libxfs_readbuf(struct xfs_buftarg *btp, xfs_daddr_t blkno, int len, int flags,
xfs_buf_t *bp;
int error;
- bp = libxfs_getbuf(btp, blkno, len);
+ bp = libxfs_getbuf_flags(btp, blkno, len, 0);
if (!bp)
return NULL;
@@ -860,7 +895,7 @@ libxfs_readbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map, int nmaps,
return libxfs_readbuf(btp, map[0].bm_bn, map[0].bm_len,
flags, ops);
- bp = libxfs_getbuf_map(btp, map, nmaps, 0);
+ bp = __libxfs_getbuf_map(btp, map, nmaps, 0);
if (!bp)
return NULL;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier
2015-08-26 0:54 ` Dave Chinner
2015-08-26 3:59 ` Darrick J. Wong
@ 2015-08-26 5:24 ` Darrick J. Wong
1 sibling, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-08-26 5:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
If during prefetch we encounter a bmbt block that fails the CRC check
due to corruption in the unused part of the block, force the buffer
back through the non-prefetch verifiers later so that the CRC is
updated. Otherwise, the bad checksum goes unfixed and the kernel will
still flag the bmbt block as invalid.
v2: Halt all readahead on the bmapbt if any of its blocks produce an
error.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
repair/prefetch.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/repair/prefetch.c b/repair/prefetch.c
index 1de3ec0..32ec55e 100644
--- a/repair/prefetch.c
+++ b/repair/prefetch.c
@@ -276,6 +276,18 @@ pf_scan_lbtree(
XFS_BUF_SET_PRIORITY(bp, isadir ? B_DIR_BMAP : B_BMAP);
+ /*
+ * If the verifier flagged a problem with the buffer, we can't trust
+ * its contents for the purposes of reading ahead. Stop prefetching
+ * the tree and mark the buffer unchecked so that the next read of the
+ * buffer will retain the error status and be acted upon appropriately.
+ */
+ if (bp->b_error) {
+ bp->b_flags |= LIBXFS_B_UNCHECKED;
+ libxfs_putbuf(bp);
+ return 0;
+ }
+
rc = (*func)(XFS_BUF_TO_BLOCK(bp), level - 1, isadir, args);
libxfs_putbuf(bp);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity
2015-08-26 4:50 ` Darrick J. Wong
@ 2015-08-26 6:20 ` Dave Chinner
0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2015-08-26 6:20 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs
On Tue, Aug 25, 2015 at 09:50:43PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 26, 2015 at 11:15:20AM +1000, Dave Chinner wrote:
> > On Tue, Aug 25, 2015 at 05:59:11PM -0700, Darrick J. Wong wrote:
> > > On Wed, Aug 26, 2015 at 10:45:02AM +1000, Dave Chinner wrote:
> > > > On Tue, Aug 25, 2015 at 05:32:59PM -0700, Darrick J. Wong wrote:
> > > > > Check the v5 fields (uuid, blocknr, owner) of attribute blocks for
> > > > > obvious errors while scanning xattr blocks. If the ownership info
> > > > > is incorrect, kill the block.
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > >
> > > > Why hasn't the buffer verifier done this validation?
> > >
> > > Maybe I'm confused here, so here's what I think is going on:
> > >
> > > AFAICT most of the verifiers do things like this:
> > >
> > > if (crcs_enabled && cksum_verification fails) {
> > > xfs_buf_ioerror(bp, -EFSBADCRC);
> > > } else if (header_is_insane) {
> > > xfs_buf_ioerror(bp, -EFSCORRUPTED);
> > > }
> > >
> > > The fuzzer corrupts the UUID without updating the CRC. The verifier first
> > > checks the CRC and it doesn't match, so it sets b_error to -EFSBADCRC and
> > > doesn't get to the header check.
> >
> > Ok, that explains it - I didn't consider that case. This would seem
> > like a general problem for repair when CRC errors are detected? i.e.
> > we set the repair flag without doing the remaining verifier validity
> > checks?
>
> Heh, this seems like a moderately large refactoring project... :)
>
> AFAICT, repair is at least checking some of that stuff for bmap,
> directory, and symlink blocks, but as you point out it's scattered all
> over the place.
>
> It looks like the verifiers can easily check the magic, uuid, and
> block fields; but how would we get the owner info to the verifier?
Make it optional. i.e. if the owner passed in is 0, then don't check
it. That way we can pass in the owner if we have it available,
otherwise we don't check it.
> What if we add a "u64 owner" to xfs_buf and change all functions that
> grab a buffer to take a u64 owner argument? That'd be rather a lot of
> things to change between the kernel and xfsprogs, but otoh we'd gain
> owner checking and centralize verification of the v5 fields.
Maybe in the long term, but right now that seems like a lot of churn
for not much gain. I think solving the code duplication problem is
the immediate issue that we need to fix.
> I think
> we'd have to change xfs_trans_{get,read}_buf*() in the kernel and
> libxfs_{get,read}buf*() in xfsprogs.
>
> Then we'd rework the verifiers as such:
>
> if (header_is_insane)
> xfs_buf_ioerror(bp, -EFSCORRUPTED);
> else if (crcs and crc_fails)
> xfs_buf_ioerror(bp, -EFSBADCRC);
>
> Since the header being garbage seems like a much more severe error
> than only the checksum being wrong. Then we'd always know when the
> v5 fields don't match our expectations.
Except for the fact that the CRC failure tells us the data read from
disk is different to what we wrote, and that's a primary protection
against having to parse corrupt structures in the kernel.
The userspace code is a bit different - bad CRCs are just an
indication that there's a problem that needs fixing and a structure
that needs further validation.
Hence I'd much rather the header validation gets factored and then
run explicitly by the repair code if a bad CRC is detected. That way
the repair code can take action specific to the problem detected,
but we don't compromise the protection the CRCs provide the kernel
code...
> Next we'd change repair to do something like this:
>
> bp = libxfs_readbuf(...);
> if (!bp) {
> /* couldn't get a buffer, abort */
> return;
> } else if (bp->b_error == -EFSBADCRC) {
> /* just a bad crc; see if the rest of the block is ok */
> repair++;
> } else if (bp->b_error) {
> /* io error or corrupt header; toss out the owner */
> toss_owner();
> libxfs_putbuf(bp);
> return;
> }
>
Use a helper function specific to repair:
enum {
OK,
TOSS,
REPAIR,
ABORT,
};
int
repair_readbuf(... struct xfs_buf **bpp)
{
bp = libxfs_readbuf(...);
if (!bp) {
/* couldn't get a buffer, abort */
return ABORT;
}
if (!bp->error) {
*bpp = bp;
return OK;
}
if (bp->b_error == -EFSBADCRC) {
if (verify_header(....)) {
/* just a bad crc; see if the rest of the block is ok */
bp->b_error = 0;
return REPAIR;
}
}
/* io error or corrupt header; toss out the owner */
libxfs_putbuf(bp);
return TOSS;
}
Which tells repair exactly what to do ;)
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 08/11] xfs_io: support reflinking and deduping file ranges
2015-08-26 0:33 ` [PATCH 08/11] xfs_io: support reflinking and deduping file ranges Darrick J. Wong
@ 2015-09-22 2:17 ` Dave Chinner
2015-09-28 17:03 ` Darrick J. Wong
0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2015-09-22 2:17 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: xfs
On Tue, Aug 25, 2015 at 05:33:11PM -0700, Darrick J. Wong wrote:
> +static int
> +dedupe_f(
> + int argc,
> + char **argv)
> +{
> + off64_t soffset, doffset;
> + long long count, total;
> + char s1[64], s2[64], ts[64];
Magic!
> + char *infile;
> + int Cflag, qflag, wflag, Wflag;
Urk. Variables that differ only by case!
> + struct xfs_ioctl_file_extent_same_args *args = NULL;
> + struct xfs_ioctl_file_extent_same_info *info;
verbosity--;
> + size_t fsblocksize, fssectsize;
> + struct timeval t1, t2;
> + int c, fd = -1;
> +
> + Cflag = qflag = wflag = Wflag = 0;
> + init_cvtnum(&fsblocksize, &fssectsize);
> +
> + while ((c = getopt(argc, argv, "CqwW")) != EOF) {
> + switch (c) {
> + case 'C':
> + Cflag = 1;
> + break;
> + case 'q':
> + qflag = 1;
> + break;
> + case 'w':
> + wflag = 1;
> + break;
> + case 'W':
> + Wflag = 1;
> + break;
> + default:
> + return command_usage(&dedupe_cmd);
> + }
> + }
> + if (optind != argc - 4)
> + return command_usage(&dedupe_cmd);
> + infile = argv[optind];
> + optind++;
> + soffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
> + if (soffset < 0) {
> + printf(_("non-numeric src offset argument -- %s\n"), argv[optind]);
> + return 0;
> + }
> + optind++;
> + doffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
> + if (doffset < 0) {
> + printf(_("non-numeric dest offset argument -- %s\n"), argv[optind]);
> + return 0;
> + }
> + optind++;
> + count = cvtnum(fsblocksize, fssectsize, argv[optind]);
> + if (count < 1) {
> + printf(_("non-positive length argument -- %s\n"), argv[optind]);
> + return 0;
> + }
> +
> + c = IO_READONLY;
> + fd = openfile(infile, NULL, c, 0);
> + if (fd < 0)
> + return 0;
> +
> + gettimeofday(&t1, NULL);
> + args = calloc(1, sizeof(struct xfs_ioctl_file_extent_same_args) +
> + sizeof(struct xfs_ioctl_file_extent_same_info));
> + if (!args)
> + goto done;
> + info = (struct xfs_ioctl_file_extent_same_info *)(args + 1);
> + args->logical_offset = soffset;
> + args->length = count;
> + args->dest_count = 1;
> + info->fd = file->fd;
> + info->logical_offset = doffset;
> + do {
> + c = ioctl(fd, XFS_IOC_FILE_EXTENT_SAME, args);
> + if (c)
> + break;
> + args->logical_offset += info->bytes_deduped;
> + info->logical_offset += info->bytes_deduped;
> + args->length -= info->bytes_deduped;
> + } while (c == 0 && info->status == 0 && info->bytes_deduped > 0);
What's "c"? it's actually a return value, and it's already been
handled if it's non zero. and there's nothing preventing
args->length from going negative.
> + if (c)
> + perror(_("dedupe ioctl"));
> + if (info->status < 0)
> + printf("dedupe: %s\n", _(strerror(-info->status)));
> + if (info->status == XFS_SAME_DATA_DIFFERS)
"same data differs"? Really? :P
> + printf(_("Extents did not match.\n"));
And putting hte error messages outside the loop is going to be hard
to maintain. I'd much prefer to see this written as:
while (args->length > 0) {
result = ioctl(fd, XFS_IOC_FILE_EXTENT_SAME, args);
if (result) {
perror("XFS_IOC_FILE_EXTENT_SAME");
goto done;
}
if (info->status != 0) {
printf("dedupe: %s\n", _(strerror(-info->status)));
if (info->status == XFS_SAME_DATA_DIFFERS)
printf(_("Extents did not match.\n"));
goto done;
}
if (info->bytes_deduped == 0)
break;
args->logical_offset += info->bytes_deduped;
info->logical_offset += info->bytes_deduped;
args->length -= info->bytes_deduped;
}
Actually, I'd really lik eto see this bit factored out into a
separate function, so there's clear separation between arg parsing,
the operation and post-op cleanup.
> + total = info->bytes_deduped;
> + c = 1;
> + if (Wflag)
> + fsync(file->fd);
> + if (wflag)
> + fdatasync(file->fd);
So these flags are just for syncing. This does not need to be a part
of this function, because the user can simply do:
xfs_io -c "dedupe ...." -c "fsync" ...
if an fsync or fdatasync is required after dedupe. So kill those
options.
> + if (qflag)
> + goto done;
Urk.
> + gettimeofday(&t2, NULL);
> + t2 = tsub(t2, t1);
> +
> + /* Finally, report back -- -C gives a parsable format */
> + timestr(&t2, ts, sizeof(ts), Cflag ? VERBOSE_FIXED_TIME : 0);
> + if (!Cflag) {
> + cvtstr((double)total, s1, sizeof(s1));
> + cvtstr(tdiv((double)total, t2), s2, sizeof(s2));
> + printf(_("linked %lld/%lld bytes at offset %lld\n"),
> + total, count, (long long)doffset);
> + printf(_("%s, %d ops; %s (%s/sec and %.4f ops/sec)\n"),
> + s1, c, ts, s2, tdiv((double)c, t2));
> + } else {/* bytes,ops,time,bytes/sec,ops/sec */
> + printf("%lld,%d,%s,%.3f,%.3f\n",
> + total, c, ts,
> + tdiv((double)total, t2), tdiv((double)c, t2));
> + }
This must be common with other timing code. Factor it out?
> +static void
> +reflink_help(void)
> +{
> + printf(_(
> +"\n"
> +" Links a range of bytes (in block size increments) from a file into a range \n"
> +" of bytes in the open file. The two extent ranges need not contain identical\n"
> +" data. \n"
> +"\n"
> +" Example:\n"
> +" 'reflink some_file 0 4096 32768' - links 32768 bytes from some_file at \n"
> +" offset 0 to into the open file at \n"
> +" position 4096\n"
> +" 'reflink some_file' - links all bytes from some_file into the open file\n"
> +" at position 0\n"
> +"\n"
> +" Reflink a range of blocks from a given input file to the open file. Both\n"
> +" files share the same range of physical disk blocks; a write to the shared\n"
> +" range of either file should result in the write landing in a new block and\n"
> +" that range of the file being remapped (i.e. copy-on-write). Both files\n"
> +" must reside on the same filesystem.\n"
> +" -w -- call fdatasync(2) at the end (included in timing results)\n"
> +" -W -- call fsync(2) at the end (included in timing results)\n"
> +"\n"));
> +}
FWIW, could these just be a single string like:
"
Links a range of bytes (in block size increments) from a file into a range
of bytes in the open file. The two extent ranges need not contain identical
....
\n"));
So we don't have to mangle the entire layout if we modify the help
tet in future?
> +static int
> +reflink_f(
> + int argc,
> + char **argv)
> +{
Same comments as the dedupe_f() function about factoring and
variable names and reusing "c" for a bunch of unrelated values.
indeed, most of this is common with the dedupe_f function, so
perhaps it might be worth putting them in the same file and
factoring them appropriately to shared common elements?
> diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
> index 89689c6..0c922ad 100644
> --- a/libxfs/xfs_fs.h
> +++ b/libxfs/xfs_fs.h
> @@ -559,6 +559,42 @@ typedef struct xfs_swapext
> #define XFS_IOC_GOINGDOWN _IOR ('X', 125, __uint32_t)
> /* XFS_IOC_GETFSUUID ---------- deprecated 140 */
>
> +/* reflink ioctls; these MUST match the btrfs ioctl definitions */
> +struct xfs_ioctl_clone_range_args {
> + __s64 src_fd;
> + __u64 src_offset;
> + __u64 src_length;
> + __u64 dest_offset;
> +};
struct xfs_clone_args
> +
> +#define XFS_SAME_DATA_DIFFERS 1
#define XFS_EXTENT_DATA_SAME 0
#define XFS_EXTENT_DATA_DIFFERS 1
> +/* For extent-same ioctl */
> +struct xfs_ioctl_file_extent_same_info {
> + __s64 fd; /* in - destination file */
> + __u64 logical_offset; /* in - start of extent in destination */
> + __u64 bytes_deduped; /* out - total # of bytes we were able
> + * to dedupe from this file */
> + /* status of this dedupe operation:
> + * 0 if dedup succeeds
> + * < 0 for error
> + * == XFS_SAME_DATA_DIFFERS if data differs
> + */
> + __s32 status; /* out - see above description */
> + __u32 reserved;
> +};
struct xfs_extent_data_info
> +
> +struct xfs_ioctl_file_extent_same_args {
> + __u64 logical_offset; /* in - start of extent in source */
> + __u64 length; /* in - length of extent */
> + __u16 dest_count; /* in - total elements in info array */
> + __u16 reserved1;
> + __u32 reserved2;
> + struct xfs_ioctl_file_extent_same_info info[0];
> +};
struct xfs_extent_data
> +
> +#define XFS_IOC_CLONE _IOW (0x94, 9, int)
> +#define XFS_IOC_CLONE_RANGE _IOW (0x94, 13, struct xfs_ioctl_clone_range_args)
> +#define XFS_IOC_FILE_EXTENT_SAME _IOWR(0x94, 54, struct xfs_ioctl_file_extent_same_args)
FWIW, these structure and ioctl definitions really need to be in a
separate patch as they need to be shared with the kernel code.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 08/11] xfs_io: support reflinking and deduping file ranges
2015-09-22 2:17 ` Dave Chinner
@ 2015-09-28 17:03 ` Darrick J. Wong
0 siblings, 0 replies; 25+ messages in thread
From: Darrick J. Wong @ 2015-09-28 17:03 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Tue, Sep 22, 2015 at 12:17:55PM +1000, Dave Chinner wrote:
> On Tue, Aug 25, 2015 at 05:33:11PM -0700, Darrick J. Wong wrote:
> > +static int
> > +dedupe_f(
> > + int argc,
> > + char **argv)
> > +{
> > + off64_t soffset, doffset;
> > + long long count, total;
> > + char s1[64], s2[64], ts[64];
>
> Magic!
>
> > + char *infile;
> > + int Cflag, qflag, wflag, Wflag;
>
> Urk. Variables that differ only by case!
Yeah. I seriously hate those variable names, but they seemed to occur in
other parts of io/. I'll stop considering awkward naming to be precedent
and change these here to have sensible names.
(FWIW I was just following pwrite.c's lead...)
>
> > + struct xfs_ioctl_file_extent_same_args *args = NULL;
> > + struct xfs_ioctl_file_extent_same_info *info;
>
> verbosity--;
I was trying to keep the XFS names as close to the BTRFS names as possible
since they're the same feature, but since you later advocate for name changes
I'll gladly change 'em to be less annoying and more visually distinct. :)
>
> > + size_t fsblocksize, fssectsize;
> > + struct timeval t1, t2;
> > + int c, fd = -1;
> > +
> > + Cflag = qflag = wflag = Wflag = 0;
> > + init_cvtnum(&fsblocksize, &fssectsize);
> > +
> > + while ((c = getopt(argc, argv, "CqwW")) != EOF) {
> > + switch (c) {
> > + case 'C':
> > + Cflag = 1;
> > + break;
> > + case 'q':
> > + qflag = 1;
> > + break;
> > + case 'w':
> > + wflag = 1;
> > + break;
> > + case 'W':
> > + Wflag = 1;
> > + break;
> > + default:
> > + return command_usage(&dedupe_cmd);
> > + }
> > + }
> > + if (optind != argc - 4)
> > + return command_usage(&dedupe_cmd);
> > + infile = argv[optind];
> > + optind++;
> > + soffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
> > + if (soffset < 0) {
> > + printf(_("non-numeric src offset argument -- %s\n"), argv[optind]);
> > + return 0;
> > + }
> > + optind++;
> > + doffset = cvtnum(fsblocksize, fssectsize, argv[optind]);
> > + if (doffset < 0) {
> > + printf(_("non-numeric dest offset argument -- %s\n"), argv[optind]);
> > + return 0;
> > + }
> > + optind++;
> > + count = cvtnum(fsblocksize, fssectsize, argv[optind]);
> > + if (count < 1) {
> > + printf(_("non-positive length argument -- %s\n"), argv[optind]);
> > + return 0;
> > + }
> > +
> > + c = IO_READONLY;
> > + fd = openfile(infile, NULL, c, 0);
> > + if (fd < 0)
> > + return 0;
> > +
> > + gettimeofday(&t1, NULL);
> > + args = calloc(1, sizeof(struct xfs_ioctl_file_extent_same_args) +
> > + sizeof(struct xfs_ioctl_file_extent_same_info));
> > + if (!args)
> > + goto done;
> > + info = (struct xfs_ioctl_file_extent_same_info *)(args + 1);
> > + args->logical_offset = soffset;
> > + args->length = count;
> > + args->dest_count = 1;
> > + info->fd = file->fd;
> > + info->logical_offset = doffset;
> > + do {
> > + c = ioctl(fd, XFS_IOC_FILE_EXTENT_SAME, args);
> > + if (c)
> > + break;
> > + args->logical_offset += info->bytes_deduped;
> > + info->logical_offset += info->bytes_deduped;
> > + args->length -= info->bytes_deduped;
> > + } while (c == 0 && info->status == 0 && info->bytes_deduped > 0);
>
> What's "c"? it's actually a return value, and it's already been
> handled if it's non zero. and there's nothing preventing
> args->length from going negative.
args->length is u64, so I'll bail out of the loop if bytes_deduped > length;
that's probably a sign of badness going on.
>
> > + if (c)
> > + perror(_("dedupe ioctl"));
> > + if (info->status < 0)
> > + printf("dedupe: %s\n", _(strerror(-info->status)));
> > + if (info->status == XFS_SAME_DATA_DIFFERS)
>
> "same data differs"? Really? :P
<shrug> spoiled btrfs leftovers. :(
>
> > + printf(_("Extents did not match.\n"));
>
> And putting hte error messages outside the loop is going to be hard
> to maintain. I'd much prefer to see this written as:
>
> while (args->length > 0) {
> result = ioctl(fd, XFS_IOC_FILE_EXTENT_SAME, args);
> if (result) {
> perror("XFS_IOC_FILE_EXTENT_SAME");
> goto done;
> }
> if (info->status != 0) {
> printf("dedupe: %s\n", _(strerror(-info->status)));
> if (info->status == XFS_SAME_DATA_DIFFERS)
> printf(_("Extents did not match.\n"));
> goto done;
> }
> if (info->bytes_deduped == 0)
> break;
>
> args->logical_offset += info->bytes_deduped;
> info->logical_offset += info->bytes_deduped;
> args->length -= info->bytes_deduped;
> }
>
> Actually, I'd really lik eto see this bit factored out into a
> separate function, so there's clear separation between arg parsing,
> the operation and post-op cleanup.
>
> > + total = info->bytes_deduped;
> > + c = 1;
> > + if (Wflag)
> > + fsync(file->fd);
> > + if (wflag)
> > + fdatasync(file->fd);
>
> So these flags are just for syncing. This does not need to be a part
> of this function, because the user can simply do:
>
> xfs_io -c "dedupe ...." -c "fsync" ...
>
> if an fsync or fdatasync is required after dedupe. So kill those
> options.
Ok. I wonder why they're in pwrite.c...
>
>
> > + if (qflag)
> > + goto done;
>
> Urk.
>
> > + gettimeofday(&t2, NULL);
> > + t2 = tsub(t2, t1);
> > +
> > + /* Finally, report back -- -C gives a parsable format */
> > + timestr(&t2, ts, sizeof(ts), Cflag ? VERBOSE_FIXED_TIME : 0);
> > + if (!Cflag) {
> > + cvtstr((double)total, s1, sizeof(s1));
> > + cvtstr(tdiv((double)total, t2), s2, sizeof(s2));
> > + printf(_("linked %lld/%lld bytes at offset %lld\n"),
> > + total, count, (long long)doffset);
> > + printf(_("%s, %d ops; %s (%s/sec and %.4f ops/sec)\n"),
> > + s1, c, ts, s2, tdiv((double)c, t2));
> > + } else {/* bytes,ops,time,bytes/sec,ops/sec */
> > + printf("%lld,%d,%s,%.3f,%.3f\n",
> > + total, c, ts,
> > + tdiv((double)total, t2), tdiv((double)c, t2));
> > + }
>
> This must be common with other timing code. Factor it out?
Yep.
>
> > +static void
> > +reflink_help(void)
> > +{
> > + printf(_(
> > +"\n"
> > +" Links a range of bytes (in block size increments) from a file into a range \n"
> > +" of bytes in the open file. The two extent ranges need not contain identical\n"
> > +" data. \n"
> > +"\n"
> > +" Example:\n"
> > +" 'reflink some_file 0 4096 32768' - links 32768 bytes from some_file at \n"
> > +" offset 0 to into the open file at \n"
> > +" position 4096\n"
> > +" 'reflink some_file' - links all bytes from some_file into the open file\n"
> > +" at position 0\n"
> > +"\n"
> > +" Reflink a range of blocks from a given input file to the open file. Both\n"
> > +" files share the same range of physical disk blocks; a write to the shared\n"
> > +" range of either file should result in the write landing in a new block and\n"
> > +" that range of the file being remapped (i.e. copy-on-write). Both files\n"
> > +" must reside on the same filesystem.\n"
> > +" -w -- call fdatasync(2) at the end (included in timing results)\n"
> > +" -W -- call fsync(2) at the end (included in timing results)\n"
> > +"\n"));
> > +}
>
> FWIW, could these just be a single string like:
>
> "
> Links a range of bytes (in block size increments) from a file into a range
> of bytes in the open file. The two extent ranges need not contain identical
> ....
> \n"));
>
> So we don't have to mangle the entire layout if we modify the help
> tet in future?
Ok.
>
> > +static int
> > +reflink_f(
> > + int argc,
> > + char **argv)
> > +{
>
> Same comments as the dedupe_f() function about factoring and
> variable names and reusing "c" for a bunch of unrelated values.
>
> indeed, most of this is common with the dedupe_f function, so
> perhaps it might be worth putting them in the same file and
> factoring them appropriately to shared common elements?
Sounds good.
>
> > diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
> > index 89689c6..0c922ad 100644
> > --- a/libxfs/xfs_fs.h
> > +++ b/libxfs/xfs_fs.h
> > @@ -559,6 +559,42 @@ typedef struct xfs_swapext
> > #define XFS_IOC_GOINGDOWN _IOR ('X', 125, __uint32_t)
> > /* XFS_IOC_GETFSUUID ---------- deprecated 140 */
> >
> > +/* reflink ioctls; these MUST match the btrfs ioctl definitions */
> > +struct xfs_ioctl_clone_range_args {
> > + __s64 src_fd;
> > + __u64 src_offset;
> > + __u64 src_length;
> > + __u64 dest_offset;
> > +};
>
> struct xfs_clone_args
>
> > +
> > +#define XFS_SAME_DATA_DIFFERS 1
>
> #define XFS_EXTENT_DATA_SAME 0
> #define XFS_EXTENT_DATA_DIFFERS 1
>
> > +/* For extent-same ioctl */
> > +struct xfs_ioctl_file_extent_same_info {
> > + __s64 fd; /* in - destination file */
> > + __u64 logical_offset; /* in - start of extent in destination */
> > + __u64 bytes_deduped; /* out - total # of bytes we were able
> > + * to dedupe from this file */
> > + /* status of this dedupe operation:
> > + * 0 if dedup succeeds
> > + * < 0 for error
> > + * == XFS_SAME_DATA_DIFFERS if data differs
> > + */
> > + __s32 status; /* out - see above description */
> > + __u32 reserved;
> > +};
>
> struct xfs_extent_data_info
>
> > +
> > +struct xfs_ioctl_file_extent_same_args {
> > + __u64 logical_offset; /* in - start of extent in source */
> > + __u64 length; /* in - length of extent */
> > + __u16 dest_count; /* in - total elements in info array */
> > + __u16 reserved1;
> > + __u32 reserved2;
> > + struct xfs_ioctl_file_extent_same_info info[0];
> > +};
>
> struct xfs_extent_data
Yay, less ugly names!
>
> > +
> > +#define XFS_IOC_CLONE _IOW (0x94, 9, int)
> > +#define XFS_IOC_CLONE_RANGE _IOW (0x94, 13, struct xfs_ioctl_clone_range_args)
> > +#define XFS_IOC_FILE_EXTENT_SAME _IOWR(0x94, 54, struct xfs_ioctl_file_extent_same_args)
>
> FWIW, these structure and ioctl definitions really need to be in a
> separate patch as they need to be shared with the kernel code.
Ok. libxfs/ changes in separate patches, got it.
--D
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2015-09-28 17:04 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-26 0:32 [PATCH v3 00/11] xfsprogs fuzzing fixes Darrick J. Wong
2015-08-26 0:32 ` [PATCH 01/11] xfs_repair: set args.geo in dir2_kill_block Darrick J. Wong
2015-08-26 0:32 ` [PATCH 02/11] libxfs: verifier should set buffer error when da block has a bad magic number Darrick J. Wong
2015-08-26 0:32 ` [PATCH 03/11] libxfs: fix XFS_WANT_CORRUPTED_* macros to return negative error codes Darrick J. Wong
2015-08-26 0:32 ` [PATCH 04/11] libxfs: clear buffer state flags in libxfs_getbuf and variants Darrick J. Wong
2015-08-26 1:02 ` Dave Chinner
2015-08-26 4:05 ` Darrick J. Wong
2015-08-26 5:23 ` [PATCH v2 " Darrick J. Wong
2015-08-26 0:32 ` [PATCH 05/11] xfs_repair: ignore "repaired" flag after we decide to clear xattr block Darrick J. Wong
2015-08-26 0:32 ` [PATCH 06/11] xfs_repair: check v5 filesystem attr block header sanity Darrick J. Wong
2015-08-26 0:45 ` Dave Chinner
2015-08-26 0:59 ` Darrick J. Wong
2015-08-26 1:15 ` Dave Chinner
2015-08-26 4:50 ` Darrick J. Wong
2015-08-26 6:20 ` Dave Chinner
2015-08-26 0:33 ` [PATCH 07/11] xfs_repair: force not-so-bad bmbt blocks back through the verifier Darrick J. Wong
2015-08-26 0:54 ` Dave Chinner
2015-08-26 3:59 ` Darrick J. Wong
2015-08-26 5:24 ` [PATCH v2 " Darrick J. Wong
2015-08-26 0:33 ` [PATCH 08/11] xfs_io: support reflinking and deduping file ranges Darrick J. Wong
2015-09-22 2:17 ` Dave Chinner
2015-09-28 17:03 ` Darrick J. Wong
2015-08-26 0:33 ` [PATCH 09/11] xfs_db: enable blocktrash for checksummed filesystems Darrick J. Wong
2015-08-26 0:33 ` [PATCH 10/11] xfs_db: trash the block at the top of the cursor stack Darrick J. Wong
2015-08-26 0:33 ` [PATCH 11/11] xfs_db: enable blockget for v5 filesystems Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox