* [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support
@ 2015-10-07 4:54 Darrick J. Wong
2015-10-07 4:54 ` [PATCH 01/58] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
` (57 more replies)
0 siblings, 58 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:54 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Hi all,
This is the third revision of an RFC for adding to XFS kernel support
for tracking reverse-mappings of physical blocks to file and metadata;
and support for mapping multiple file logical blocks to the same
physical block, more commonly known as reflinking. Given the
significant amount of re-engineering required to make the initial rmap
implementation compatible with reflink, I decided to publish both
features as an integrated patchset off of upstream. This means that
rmap and reflink are now compatible with each other.
Dave Chinner's initial rmap implementation featured a simple b+tree
containing (_physical block_, blockcount, owner) records and enough
code to stuff the rmap btree (rmapbt) whenever a block was allocated
or freed. However, a generic reflink implementation requires the
ability to map a block to any logical block offset in any file.
Therefore it is necessary to expand the rmapbt record definition to be
(_physical block_, _owner_, _offset_, blockcount) to maintain uniquely
identifiable records. The upper two bits of the offset field are used
to flag attr fork records and bmbt block records, respectively. The
highest bit of the blockcount is used to indicate an unwritten extent.
It is intended that the rmapbt will eventually be used to reconstruct
a corrupt block map btree (bmbt).
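To make the expanded record concrete, here is a rough sketch of how the flag bits could be folded into the offset and blockcount fields. Everything here is an assumption for illustration: the struct and macro names, the 64-bit offset and 32-bit blockcount widths, and which of the two upper offset bits carries which flag. The real definitions are in the patches themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag layout: the bit assignments below are assumed. */
#define RMAP_OFF_ATTR_FORK	(1ULL << 63)	/* attr fork record */
#define RMAP_OFF_BMBT_BLOCK	(1ULL << 62)	/* bmbt block record */
#define RMAP_OFF_FLAGS		(RMAP_OFF_ATTR_FORK | RMAP_OFF_BMBT_BLOCK)
#define RMAP_LEN_UNWRITTEN	(1U << 31)	/* unwritten extent */

/* Illustrative in-memory form of (physical block, owner, offset, blockcount). */
struct rmap_rec {
	uint32_t	startblock;	/* physical block */
	uint32_t	blockcount;	/* length; top bit = unwritten */
	uint64_t	owner;		/* inode number or special owner */
	uint64_t	offset;		/* logical offset; top two bits = flags */
};

static struct rmap_rec
rmap_rec_pack(uint32_t startblock, uint32_t len, uint64_t owner,
	      uint64_t offset, uint64_t off_flags, bool unwritten)
{
	struct rmap_rec	rec;

	/* The values themselves must not collide with the flag bits. */
	assert((offset & RMAP_OFF_FLAGS) == 0);
	assert((off_flags & ~RMAP_OFF_FLAGS) == 0);
	assert((len & RMAP_LEN_UNWRITTEN) == 0);

	rec.startblock = startblock;
	rec.blockcount = len | (unwritten ? RMAP_LEN_UNWRITTEN : 0);
	rec.owner = owner;
	rec.offset = offset | off_flags;
	return rec;
}

/* Logical offset with the flag bits masked off. */
static uint64_t
rmap_rec_offset(const struct rmap_rec *rec)
{
	return rec->offset & ~RMAP_OFF_FLAGS;
}

/* Extent length with the unwritten bit masked off. */
static uint32_t
rmap_rec_len(const struct rmap_rec *rec)
{
	return rec->blockcount & ~RMAP_LEN_UNWRITTEN;
}

static bool
rmap_rec_unwritten(const struct rmap_rec *rec)
{
	return (rec->blockcount & RMAP_LEN_UNWRITTEN) != 0;
}
```

Stealing bits from existing fields keeps the record size fixed, at the cost of having to mask on every access.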
The reflink implementation features a simple b+tree containing
(_physical block_, blockcount, refcount) records to track the
reference counts of extents of physical blocks. There's also support
code to provide the desired copy-on-write behavior, userland
interfaces to reflink files and query their sharing status, and a new
fallocate mode to un-reflink parts of files.
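As a minimal sketch of how (physical block, blockcount, refcount) records can drive the copy-on-write decision, the lookup below searches a sorted array instead of a real b+tree. The names are hypothetical, and the convention that a block with no record has exactly one owner is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative refcount record, not the on-disk format. */
struct refc_rec {
	uint32_t	startblock;	/* first physical block of the extent */
	uint32_t	blockcount;	/* extent length in blocks */
	uint32_t	refcount;	/* number of owners */
};

/*
 * Return the reference count of physical block bno.  recs must be sorted
 * by startblock and must not overlap.  Assumption: a block that no record
 * covers has a single owner.
 */
static uint32_t
refc_lookup(const struct refc_rec *recs, size_t nrecs, uint32_t bno)
{
	size_t	lo = 0, hi = nrecs;

	assert(recs != NULL || nrecs == 0);

	while (lo < hi) {
		size_t			mid = lo + (hi - lo) / 2;
		const struct refc_rec	*r = &recs[mid];

		if (bno < r->startblock)
			hi = mid;
		else if (bno - r->startblock >= r->blockcount)
			lo = mid + 1;
		else
			return r->refcount;
	}
	return 1;
}

/* A write to a shared block has to be redirected to a new block first. */
static bool
refc_needs_cow(const struct refc_rec *recs, size_t nrecs, uint32_t bno)
{
	return refc_lookup(recs, nrecs, bno) > 1;
}
```

With records {10, 4, 2} and {20, 2, 3}, a write to block 12 would need CoW while a write to block 14 would not.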
For single-owner blocks (i.e. metadata) the rmapbt records are still
managed at alloc/free time. To enable reflink and rmap at the same
time, however, it becomes necessary to manage rmapbt records for file
extents at map/unmap time. In the current implementation, file extent
records exactly mirror bmbt contents. It should be easy to merge
file extent rmaps on non-reflink filesystems, but that is not yet
written. In theory merging can happen for file extent rmaps on
reflink filesystems too, but that could involve a lot of searching
through the tree since records are not indexed on the last physical
block of the extent.
The ioctl interface to XFS reflink looks surprisingly like the btrfs
ioctl interface -- you can reflink a file, reflink subranges
of a file, or dedupe subranges of files. To un-reflink a file, I'm
proposing a new fallocate flag which will (try to) fork all shared
blocks within a certain file range. xfs_fsr is a better candidate
for de-reflinking a file since it also defragments the file; the
extent swap ioctl has also been upgraded (crappily) to support
updating the rmapbt as needed.
The patch set is based on the current (4.3-rc4) upstream kernel.
There are plenty of bugs in this code; in particular the copy-on-write
code is still terrible and prone to all sorts of amusing crashes.
There are too many patches to discuss individually, but they are
grouped by subject area:
0. Cleanups
1. rmapbt support
2. Re-engineering rmapbt to support reflink
3. refcntbt support
4. Implement the data block sharing pieces of reflink
Issues:
* The toy CoW implementation exists as a single-threaded workqueue(!).
In talking with Dave Chinner, I get the sense that he sees CoW as a
natural extension of a reworked XFS write path that doesn't use buffer
heads. That work hasn't landed, so I've only put enough effort into
fixing the CoW so that it can (barely) pass the associated xfstests.
In the future, a CoW block being written would simply become a
delalloc extent and the process of allocating the delalloc extent
would merely have to know to unmap whatever's there first.
* The extent swapping ioctl now allocates a bigger fixed-size
transaction. That's most likely a stupid thing to do, so getting a
better grip on how the journalling code works and auditing all the new
transaction users will have to happen. Right now it mostly gets
lucky.
* Don't ENOSPC. This should get fixed up once we start using delalloc.
* We'll want to connect to copy_file_range once it appears in a
kernel release.
If you're going to start using this mess, you probably ought to just
pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
This is an extraordinary way to eat your data. Enjoy!
Comments and questions are, as always, welcome.
--D
[1] https://github.com/djwong/linux-xfs-dev/commits/master
[2] https://github.com/djwong/xfsprogs/commits/for-next
[3] https://github.com/djwong/xfstests/commits/master
^ permalink raw reply [flat|nested] 67+ messages in thread
* [PATCH 01/58] libxfs: make xfs_alloc_fix_freelist non-static
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
@ 2015-10-07 4:54 ` Darrick J. Wong
2015-10-07 4:54 ` [PATCH 02/58] xfs: fix log ticket type printing Darrick J. Wong
` (56 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:54 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Since xfs_repair wants to use xfs_alloc_fix_freelist, remove the
static designation. xfsprogs already has this; this simply brings
the kernel up to date.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_alloc.c | 2 +-
fs/xfs/libxfs/xfs_alloc.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index ffad7f2..cc07884 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1924,7 +1924,7 @@ xfs_alloc_space_available(
* Decide whether to use this allocation group for this allocation.
* If so, fix up the btree freelist's size.
*/
-STATIC int /* error */
+int /* error */
xfs_alloc_fix_freelist(
struct xfs_alloc_arg *args, /* allocation argument structure */
int flags) /* XFS_ALLOC_FLAG_... */
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index ca1c816..071b28b 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -233,5 +233,6 @@ xfs_alloc_get_rec(
int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
+int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
#endif /* __XFS_ALLOC_H__ */
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 02/58] xfs: fix log ticket type printing
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
2015-10-07 4:54 ` [PATCH 01/58] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
@ 2015-10-07 4:54 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 03/58] xfs: introduce rmap btree definitions Darrick J. Wong
` (55 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:54 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Update the log ticket reservation type printing code to reflect
all the types of log tickets, to avoid incorrect debug output and
avoid running off the end of the array.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_log.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index aaadee0..75734af 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2045,12 +2045,14 @@ xlog_print_tic_res(
"QM_DQCLUSTER",
"QM_QINOCREATE",
"QM_QUOTAOFF_END",
- "SB_UNIT",
"FSYNC_TS",
"GROWFSRT_ALLOC",
"GROWFSRT_ZERO",
"GROWFSRT_FREE",
- "SWAPEXT"
+ "SWAPEXT",
+ "CHECKPOINT",
+ "ICREATE",
+ "CREATE_TMPFILE"
};
xfs_warn(mp, "xlog_write: reservation summary:");
^ permalink raw reply related [flat|nested] 67+ messages in thread
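The bug fixed by patch 02 is the usual hazard of a string table that must track an enum by hand. One way to catch this class of problem, sketched here with made-up names and a shortened table (this is not what the patch does), is to tie the table length to a sentinel enumerator at compile time and bounds-check the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, shortened list of reservation types. */
enum res_type {
	RES_SWAPEXT,
	RES_CHECKPOINT,
	RES_ICREATE,
	RES_CREATE_TMPFILE,
	RES_TYPE_MAX		/* sentinel: number of real types */
};

static const char *const res_type_names[] = {
	"SWAPEXT",
	"CHECKPOINT",
	"ICREATE",
	"CREATE_TMPFILE",
};

/* Fails the build if a type is added without a matching name. */
static_assert(sizeof(res_type_names) / sizeof(res_type_names[0]) ==
	      RES_TYPE_MAX, "res_type_names out of sync with enum res_type");

/* Never index past the end of the table, even for a corrupt type value. */
static const char *
res_type_name(unsigned int type)
{
	if (type >= RES_TYPE_MAX)
		return "bad-rtype";
	return res_type_names[type];
}
```

The compile-time check catches a missing entry; it cannot catch entries listed in the wrong order, so the table still needs review when the enum changes.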
* [PATCH 03/58] xfs: introduce rmap btree definitions
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
2015-10-07 4:54 ` [PATCH 01/58] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
2015-10-07 4:54 ` [PATCH 02/58] xfs: fix log ticket type printing Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 04/58] xfs: add rmap btree stats infrastructure Darrick J. Wong
` (54 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_alloc.c | 6 ++++++
fs/xfs/libxfs/xfs_btree.c | 4 ++--
fs/xfs/libxfs/xfs_btree.h | 3 +++
fs/xfs/libxfs/xfs_format.h | 22 +++++++++++++++++-----
fs/xfs/libxfs/xfs_types.h | 4 ++--
5 files changed, 30 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index cc07884..e5bede9 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2275,6 +2275,10 @@ xfs_agf_verify(
be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) > XFS_BTREE_MAXLEVELS)
return false;
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS)
+ return false;
+
/*
* during growfs operations, the perag is not fully initialised,
* so we can't use it for any useful checking. growfs ensures we can't
@@ -2405,6 +2409,8 @@ xfs_alloc_read_agf(
be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
pag->pagf_levels[XFS_BTNUM_CNTi] =
be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+ pag->pagf_levels[XFS_BTNUM_RMAPi] =
+ be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
spin_lock_init(&pag->pagb_lock);
pag->pagb_count = 0;
pag->pagb_tree = RB_ROOT;
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index f7d7ee7..b642170 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -42,9 +42,9 @@ kmem_zone_t *xfs_btree_cur_zone;
* Btree magic numbers.
*/
static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
- { XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
+ { XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
XFS_FIBT_MAGIC },
- { XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC,
+ { XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
};
#define xfs_btree_magic(cur) \
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 8f18bab..ace1995 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -63,6 +63,7 @@ union xfs_btree_rec {
#define XFS_BTNUM_BMAP ((xfs_btnum_t)XFS_BTNUM_BMAPi)
#define XFS_BTNUM_INO ((xfs_btnum_t)XFS_BTNUM_INOi)
#define XFS_BTNUM_FINO ((xfs_btnum_t)XFS_BTNUM_FINOi)
+#define XFS_BTNUM_RMAP ((xfs_btnum_t)XFS_BTNUM_RMAPi)
/*
* For logging record fields.
@@ -94,6 +95,7 @@ do { \
case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(bmbt, stat); break; \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break; \
+ case XFS_BTNUM_RMAP: break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
@@ -108,6 +110,7 @@ do { \
case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_ADD(bmbt, stat, val); break; \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
+ case XFS_BTNUM_RMAP: break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 9590a06..3350013 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -447,6 +447,7 @@ xfs_sb_has_compat_feature(
}
#define XFS_SB_FEAT_RO_COMPAT_FINOBT (1 << 0) /* free inode btree */
+#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
@@ -518,6 +519,12 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
}
+static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
+{
+ return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+ (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
+}
+
/*
* XFS_SB_FEAT_INCOMPAT_META_UUID indicates that the metadata UUID
* is stored separately from the user-visible UUID; this allows the
@@ -590,10 +597,10 @@ xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
#define XFS_AGI_GOOD_VERSION(v) ((v) == XFS_AGI_VERSION)
/*
- * Btree number 0 is bno, 1 is cnt. This value gives the size of the
+ * Btree number 0 is bno, 1 is cnt, 2 is rmap. This value gives the size of the
* arrays below.
*/
-#define XFS_BTNUM_AGF ((int)XFS_BTNUM_CNTi + 1)
+#define XFS_BTNUM_AGF ((int)XFS_BTNUM_RMAPi + 1)
/*
* The second word of agf_levels in the first a.g. overlaps the EFS
@@ -610,12 +617,10 @@ typedef struct xfs_agf {
__be32 agf_seqno; /* sequence # starting from 0 */
__be32 agf_length; /* size in blocks of a.g. */
/*
- * Freespace information
+ * Freespace and rmap information
*/
__be32 agf_roots[XFS_BTNUM_AGF]; /* root blocks */
- __be32 agf_spare0; /* spare field */
__be32 agf_levels[XFS_BTNUM_AGF]; /* btree levels */
- __be32 agf_spare1; /* spare field */
__be32 agf_flfirst; /* first freelist block's index */
__be32 agf_fllast; /* last freelist block's index */
@@ -1293,6 +1298,13 @@ typedef __be32 xfs_inobt_ptr_t;
#define XFS_FIBT_BLOCK(mp) ((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
/*
+ * Reverse mapping btree format definitions
+ *
+ * There is a btree for the reverse map per allocation group
+ */
+#define XFS_RMAP_CRC_MAGIC 0x524d4233 /* 'RMB3' */
+
+/*
* The first data block of an AG depends on whether the filesystem was formatted
* with the finobt feature. If so, account for the finobt reserved root btree
* block.
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index b79dc66..3d50364 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -108,8 +108,8 @@ typedef enum {
} xfs_lookup_t;
typedef enum {
- XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi,
- XFS_BTNUM_FINOi, XFS_BTNUM_MAX
+ XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
+ XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
} xfs_btnum_t;
struct xfs_name {
^ permalink raw reply related [flat|nested] 67+ messages in thread
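The AGF hunk in patch 03 relies on the two spare words sitting exactly where a third array entry would go, so growing the arrays from two to three entries moves no existing field. The sketch below checks that claim on a cut-down copy of the affected fields only. The struct names are invented, and uint32_t stands in for __be32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Affected AGF fields before the patch (excerpt, hypothetical name). */
struct agf_excerpt_old {
	uint32_t	roots[2];	/* bno, cnt */
	uint32_t	spare0;
	uint32_t	levels[2];	/* bno, cnt */
	uint32_t	spare1;
};

/* The same fields after the patch. */
struct agf_excerpt_new {
	uint32_t	roots[3];	/* bno, cnt, rmap */
	uint32_t	levels[3];	/* bno, cnt, rmap */
};

/* Overall size is unchanged. */
static_assert(sizeof(struct agf_excerpt_old) ==
	      sizeof(struct agf_excerpt_new), "size changed");

/* roots[2] lands on the old spare0. */
static_assert(offsetof(struct agf_excerpt_old, spare0) ==
	      offsetof(struct agf_excerpt_new, roots) + 2 * sizeof(uint32_t),
	      "rmap root does not overlay spare0");

/* levels[] starts at the same offset as before. */
static_assert(offsetof(struct agf_excerpt_old, levels) ==
	      offsetof(struct agf_excerpt_new, levels), "levels moved");

/* levels[2] lands on the old spare1. */
static_assert(offsetof(struct agf_excerpt_old, spare1) ==
	      offsetof(struct agf_excerpt_new, levels) + 2 * sizeof(uint32_t),
	      "rmap level does not overlay spare1");
```

This is also why the commit message insists that the rmap btree number be contiguous with the free space btrees: the array index doubles as the on-disk slot.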
* [PATCH 04/58] xfs: add rmap btree stats infrastructure
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (2 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 03/58] xfs: introduce rmap btree definitions Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 05/58] xfs: rmap btree add more reserved blocks Darrick J. Wong
` (53 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, Dave Chinner, xfs
From: Dave Chinner <dchinner@redhat.com>
The rmap btree will require the same stats as all the other generic
btrees, so add all the code for that now.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_btree.h | 4 ++--
fs/xfs/xfs_stats.c | 1 +
fs/xfs/xfs_stats.h | 18 +++++++++++++++++-
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index ace1995..494ee0b 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -95,7 +95,7 @@ do { \
case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(bmbt, stat); break; \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break; \
- case XFS_BTNUM_RMAP: break; \
+ case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(rmap, stat); break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
@@ -110,7 +110,7 @@ do { \
case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_ADD(bmbt, stat, val); break; \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
- case XFS_BTNUM_RMAP: break; \
+ case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_ADD(rmap, stat, val); break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
index f224038..67bbfa2 100644
--- a/fs/xfs/xfs_stats.c
+++ b/fs/xfs/xfs_stats.c
@@ -60,6 +60,7 @@ static int xfs_stat_proc_show(struct seq_file *m, void *v)
{ "bmbt2", XFSSTAT_END_BMBT_V2 },
{ "ibt2", XFSSTAT_END_IBT_V2 },
{ "fibt2", XFSSTAT_END_FIBT_V2 },
+ { "rmapbt", XFSSTAT_END_RMAP_V2 },
/* we print both series of quota information together */
{ "qm", XFSSTAT_END_QM },
};
diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h
index c8f238b..8414db2 100644
--- a/fs/xfs/xfs_stats.h
+++ b/fs/xfs/xfs_stats.h
@@ -199,7 +199,23 @@ struct xfsstats {
__uint32_t xs_fibt_2_alloc;
__uint32_t xs_fibt_2_free;
__uint32_t xs_fibt_2_moves;
-#define XFSSTAT_END_XQMSTAT (XFSSTAT_END_FIBT_V2+6)
+#define XFSSTAT_END_RMAP_V2 (XFSSTAT_END_FIBT_V2+15)
+ __uint32_t xs_rmap_2_lookup;
+ __uint32_t xs_rmap_2_compare;
+ __uint32_t xs_rmap_2_insrec;
+ __uint32_t xs_rmap_2_delrec;
+ __uint32_t xs_rmap_2_newroot;
+ __uint32_t xs_rmap_2_killroot;
+ __uint32_t xs_rmap_2_increment;
+ __uint32_t xs_rmap_2_decrement;
+ __uint32_t xs_rmap_2_lshift;
+ __uint32_t xs_rmap_2_rshift;
+ __uint32_t xs_rmap_2_split;
+ __uint32_t xs_rmap_2_join;
+ __uint32_t xs_rmap_2_alloc;
+ __uint32_t xs_rmap_2_free;
+ __uint32_t xs_rmap_2_moves;
+#define XFSSTAT_END_XQMSTAT (XFSSTAT_END_RMAP_V2+6)
__uint32_t xs_qm_dqreclaims;
__uint32_t xs_qm_dqreclaim_misses;
__uint32_t xs_qm_dquot_dups;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 67+ messages in thread
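The stats code in patch 04 describes each group of counters only by the index where it ends, so a group's size is the difference between its end marker and the previous one. A small sketch of that arithmetic follows, using invented names and an arbitrary starting offset for the group before rmap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical end markers; only the +15 and +6 steps mirror the patch. */
enum {
	STAT_END_FIBT = 15,			/* assumed starting point */
	STAT_END_RMAP = STAT_END_FIBT + 15,	/* 15 rmap btree counters */
	STAT_END_QM   = STAT_END_RMAP + 6,	/* 6 quota counters */
};

struct stat_group {
	const char	*name;
	int		end;	/* index one past the group's last counter */
};

static const struct stat_group stat_groups[] = {
	{ "fibt2",	STAT_END_FIBT },
	{ "rmapbt",	STAT_END_RMAP },
	{ "qm",		STAT_END_QM },
};

#define NR_STAT_GROUPS	(sizeof(stat_groups) / sizeof(stat_groups[0]))

/* Index of the first counter in group i. */
static int
stat_group_first(size_t i)
{
	assert(i < NR_STAT_GROUPS);
	return i ? stat_groups[i - 1].end : 0;
}

/* Number of counters in group i. */
static int
stat_group_count(size_t i)
{
	assert(i < NR_STAT_GROUPS);
	return stat_groups[i].end - stat_group_first(i);
}
```

Because the markers are cumulative, inserting the rmap group means the following marker must be rebased onto it, which is what the XFSSTAT_END_XQMSTAT change in the hunk above does.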
* [PATCH 05/58] xfs: rmap btree add more reserved blocks
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (3 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 04/58] xfs: add rmap btree stats infrastructure Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 06/58] xfs: add owner field to extent allocation and freeing Darrick J. Wong
` (52 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmapbt).
Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
just to use the xfs_mount variable directly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_alloc.c | 11 +++++++++++
fs/xfs/libxfs/xfs_alloc.h | 2 ++
fs/xfs/libxfs/xfs_format.h | 9 +--------
fs/xfs/xfs_fsops.c | 6 +++---
fs/xfs/xfs_mount.c | 2 ++
fs/xfs/xfs_mount.h | 1 +
6 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index e5bede9..8bb2a32 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -49,6 +49,17 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
+xfs_extlen_t
+xfs_prealloc_blocks(
+ struct xfs_mount *mp)
+{
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return XFS_RMAP_BLOCK(mp) + 1;
+ if (xfs_sb_version_hasfinobt(&mp->m_sb))
+ return XFS_FIBT_BLOCK(mp) + 1;
+ return XFS_IBT_BLOCK(mp) + 1;
+}
+
/*
* Lookup the record equal to [bno, len] in the btree given by cur.
*/
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 071b28b..1993644 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -235,4 +235,6 @@ int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
+xfs_extlen_t xfs_prealloc_blocks(struct xfs_mount *mp);
+
#endif /* __XFS_ALLOC_H__ */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 3350013..a926134 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1304,18 +1304,11 @@ typedef __be32 xfs_inobt_ptr_t;
*/
#define XFS_RMAP_CRC_MAGIC 0x524d4233 /* 'RMB3' */
-/*
- * The first data block of an AG depends on whether the filesystem was formatted
- * with the finobt feature. If so, account for the finobt reserved root btree
- * block.
- */
-#define XFS_PREALLOC_BLOCKS(mp) \
+#define XFS_RMAP_BLOCK(mp) \
(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
XFS_FIBT_BLOCK(mp) + 1 : \
XFS_IBT_BLOCK(mp) + 1)
-
-
/*
* BMAP Btree format definitions
*
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index ee3aaa0a..32e24ec 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -246,7 +246,7 @@ xfs_growfs_data_private(
agf->agf_flfirst = 0;
agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
agf->agf_flcount = 0;
- tmpsize = agsize - XFS_PREALLOC_BLOCKS(mp);
+ tmpsize = agsize - mp->m_ag_prealloc_blocks;
agf->agf_freeblks = cpu_to_be32(tmpsize);
agf->agf_longest = cpu_to_be32(tmpsize);
if (xfs_sb_version_hascrc(&mp->m_sb))
@@ -343,7 +343,7 @@ xfs_growfs_data_private(
agno, 0);
arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
- arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+ arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
arec->ar_blockcount = cpu_to_be32(
agsize - be32_to_cpu(arec->ar_startblock));
@@ -372,7 +372,7 @@ xfs_growfs_data_private(
agno, 0);
arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
- arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+ arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
arec->ar_blockcount = cpu_to_be32(
agsize - be32_to_cpu(arec->ar_startblock));
nfree += be32_to_cpu(arec->ar_blockcount);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bf92e0c..1b2f72c 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -239,6 +239,8 @@ xfs_initialize_perag(
if (maxagi)
*maxagi = index;
+
+ mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp);
return 0;
out_unwind:
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7999e91..d9c9834 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -93,6 +93,7 @@ typedef struct xfs_mount {
uint m_ag_maxlevels; /* XFS_AG_MAXLEVELS */
uint m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
uint m_in_maxlevels; /* max inobt btree levels. */
+ xfs_extlen_t m_ag_prealloc_blocks; /* reserved ag blocks */
struct radix_tree_root m_perag_tree; /* per-ag accounting info */
spinlock_t m_perag_lock; /* lock for m_perag_tree */
struct mutex m_growlock; /* growfs mutex */
^ permalink raw reply related [flat|nested] 67+ messages in thread
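The calculation in patch 05 amounts to counting one reserved root block per optional btree after the inode btree root. Here is a standalone sketch of the same arithmetic; the names are hypothetical and the inobt root block number is passed in rather than derived from the superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock feature checks. */
struct ag_features {
	bool	finobt;		/* free inode btree present */
	bool	rmapbt;		/* reverse mapping btree present */
};

/*
 * Number of blocks at the start of each AG that are reserved for headers
 * and btree roots.  ibt_block is the block number of the inobt root.
 */
static uint32_t
ag_prealloc_blocks(uint32_t ibt_block, struct ag_features feat)
{
	uint32_t	last = ibt_block;

	if (feat.finobt)
		last++;		/* finobt root follows the inobt root */
	if (feat.rmapbt)
		last++;		/* rmapbt root follows whichever came last */
	return last + 1;	/* first block available for allocation */
}

/* Worked cases with an assumed inobt root at block 3. */
static void
ag_prealloc_examples(void)
{
	struct ag_features	none = { false, false };
	struct ag_features	both = { true, true };

	assert(ag_prealloc_blocks(3, none) == 4);
	assert(ag_prealloc_blocks(3, both) == 6);
}
```

Since the features cannot change while the filesystem is mounted, computing this once and caching it in the mount structure, as the patch does, is safe.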
* [PATCH 06/58] xfs: add owner field to extent allocation and freeing
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (4 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 05/58] xfs: rmap btree add more reserved blocks Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 07/58] xfs: add extended " Darrick J. Wong
` (51 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
For the rmap btree to work, we have to feed the extent owner
information to the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.
We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.
There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...
While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.
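A sketch of how the owner check at free time might treat the wildcard is given below. The owner values are copied from the xfs_format.h hunk of this patch; the two helper functions are hypothetical and exist only to illustrate the rules described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the xfs_format.h hunk of this patch. */
#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
#define XFS_RMAP_OWN_FS		(-3ULL)
#define XFS_RMAP_OWN_LOG	(-4ULL)
#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */

/* Negating an unsigned constant wraps, so these sit at the top of the range. */
static_assert(XFS_RMAP_OWN_AG == UINT64_MAX - 4, "unexpected owner encoding");

/* Hypothetical helper: special owners have the upper bit set. */
static bool
rmap_owner_is_special(uint64_t owner)
{
	return (owner >> 63) != 0;
}

/*
 * Hypothetical helper: does the owner recorded for an extent match what
 * the caller freeing it expects?  The wildcard disables the check.
 */
static bool
rmap_owner_matches(uint64_t rec_owner, uint64_t expected)
{
	if (expected == XFS_RMAP_OWN_UNKNOWN)
		return true;
	return rec_owner == expected;
}
```

Inode numbers never have the upper bit set, so the special values cannot collide with a real file owner.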
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_alloc.c | 11 ++++++++---
fs/xfs/libxfs/xfs_alloc.h | 4 +++-
fs/xfs/libxfs/xfs_bmap.c | 17 ++++++++++++-----
fs/xfs/libxfs/xfs_bmap.h | 5 +++--
fs/xfs/libxfs/xfs_bmap_btree.c | 3 ++-
fs/xfs/libxfs/xfs_format.h | 16 ++++++++++++++++
fs/xfs/libxfs/xfs_ialloc.c | 10 +++++-----
fs/xfs/libxfs/xfs_ialloc_btree.c | 3 ++-
fs/xfs/xfs_bmap_util.c | 3 ++-
fs/xfs/xfs_fsops.c | 13 +++++++++----
fs/xfs/xfs_log_recover.c | 3 ++-
fs/xfs/xfs_trans.h | 2 +-
fs/xfs/xfs_trans_extfree.c | 5 +++--
13 files changed, 68 insertions(+), 27 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 8bb2a32..af570ce 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1592,6 +1592,7 @@ xfs_free_ag_extent(
xfs_agnumber_t agno, /* allocation group number */
xfs_agblock_t bno, /* starting block number */
xfs_extlen_t len, /* length of extent */
+ uint64_t owner, /* extent owner */
int isfl) /* set if is freelist blocks - no sb acctg */
{
xfs_btree_cur_t *bno_cur; /* cursor for by-block btree */
@@ -2018,7 +2019,8 @@ xfs_alloc_fix_freelist(
error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
if (error)
goto out_agbp_relse;
- error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
+ error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
+ XFS_RMAP_OWN_AG, 1);
if (error)
goto out_agbp_relse;
bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2028,6 +2030,7 @@ xfs_alloc_fix_freelist(
memset(&targs, 0, sizeof(targs));
targs.tp = tp;
targs.mp = mp;
+ targs.owner = XFS_RMAP_OWN_AG;
targs.agbp = agbp;
targs.agno = args->agno;
targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
@@ -2668,7 +2671,8 @@ int /* error */
xfs_free_extent(
xfs_trans_t *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
- xfs_extlen_t len) /* length of extent */
+ xfs_extlen_t len, /* length of extent */
+ uint64_t owner) /* extent owner */
{
xfs_alloc_arg_t args;
int error;
@@ -2704,7 +2708,8 @@ xfs_free_extent(
goto error0;
}
- error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
+ error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno,
+ len, owner, 0);
if (!error)
xfs_extent_busy_insert(tp, args.agno, args.agbno, len, 0);
error0:
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 1993644..cff44e0 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -122,6 +122,7 @@ typedef struct xfs_alloc_arg {
char isfl; /* set if is freelist blocks - !acctg */
char userdata; /* set if this is user data */
xfs_fsblock_t firstblock; /* io first block allocated */
+ uint64_t owner; /* owner of blocks being allocated */
} xfs_alloc_arg_t;
/*
@@ -208,7 +209,8 @@ int /* error */
xfs_free_extent(
struct xfs_trans *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
- xfs_extlen_t len); /* length of extent */
+ xfs_extlen_t len, /* length of extent */
+ uint64_t owner); /* extent owner */
int /* error */
xfs_alloc_lookup_le(
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8e2010d..d4c2c4a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -567,10 +567,11 @@ xfs_bmap_validate_ret(
*/
void
xfs_bmap_add_free(
+ struct xfs_mount *mp, /* mount point structure */
+ struct xfs_bmap_free *flist, /* list of extents */
xfs_fsblock_t bno, /* fs block number of extent */
xfs_filblks_t len, /* length of extent */
- xfs_bmap_free_t *flist, /* list of extents */
- xfs_mount_t *mp) /* mount point structure */
+ uint64_t owner) /* extent owner */
{
xfs_bmap_free_item_t *cur; /* current (next) element */
xfs_bmap_free_item_t *new; /* new element */
@@ -591,9 +592,12 @@ xfs_bmap_add_free(
ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
#endif
ASSERT(xfs_bmap_free_item_zone != NULL);
+ ASSERT(owner);
+
new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
new->xbfi_startblock = bno;
new->xbfi_blockcount = (xfs_extlen_t)len;
+ new->xbfi_owner = owner;
for (prev = NULL, cur = flist->xbf_first;
cur != NULL;
prev = cur, cur = cur->xbfi_next) {
@@ -696,7 +700,7 @@ xfs_bmap_btree_to_extents(
cblock = XFS_BUF_TO_BLOCK(cbp);
if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
return error;
- xfs_bmap_add_free(cbno, 1, cur->bc_private.b.flist, mp);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, ip->i_ino);
ip->i_d.di_nblocks--;
xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
xfs_trans_binval(tp, cbp);
@@ -777,6 +781,7 @@ xfs_bmap_extents_to_btree(
memset(&args, 0, sizeof(args));
args.tp = tp;
args.mp = mp;
+ args.owner = ip->i_ino;
args.firstblock = *firstblock;
if (*firstblock == NULLFSBLOCK) {
args.type = XFS_ALLOCTYPE_START_BNO;
@@ -923,6 +928,7 @@ xfs_bmap_local_to_extents(
memset(&args, 0, sizeof(args));
args.tp = tp;
args.mp = ip->i_mount;
+ args.owner = ip->i_ino;
args.firstblock = *firstblock;
/*
* Allocate a block. We know we need only one, since the
@@ -3703,6 +3709,7 @@ xfs_bmap_btalloc(
memset(&args, 0, sizeof(args));
args.tp = ap->tp;
args.mp = mp;
+ args.owner = ap->ip->i_ino;
args.fsbno = ap->blkno;
/* Trim the allocation back to the maximum an AG can fit. */
@@ -4977,8 +4984,8 @@ xfs_bmap_del_extent(
* If we need to, add to list of extents to delete.
*/
if (do_fx)
- xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist,
- mp);
+ xfs_bmap_add_free(mp, flist, del->br_startblock,
+ del->br_blockcount, ip->i_ino);
/*
* Adjust inode # blocks in the file.
*/
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 6aaa0c1..674819f 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -66,6 +66,7 @@ typedef struct xfs_bmap_free_item
{
xfs_fsblock_t xbfi_startblock;/* starting fs block number */
xfs_extlen_t xbfi_blockcount;/* number of blocks in extent */
+ uint64_t xbfi_owner; /* extent owner */
struct xfs_bmap_free_item *xbfi_next; /* link to next entry */
} xfs_bmap_free_item_t;
@@ -182,8 +183,8 @@ void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
void xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
-void xfs_bmap_add_free(xfs_fsblock_t bno, xfs_filblks_t len,
- struct xfs_bmap_free *flist, struct xfs_mount *mp);
+void xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
+ xfs_fsblock_t bno, xfs_filblks_t len, uint64_t owner);
void xfs_bmap_cancel(struct xfs_bmap_free *flist);
int xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
int *committed);
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 6b0cf65..fcbbbef 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -446,6 +446,7 @@ xfs_bmbt_alloc_block(
args.mp = cur->bc_mp;
args.fsbno = cur->bc_private.b.firstblock;
args.firstblock = args.fsbno;
+ args.owner = cur->bc_private.b.ip->i_ino;
if (args.fsbno == NULLFSBLOCK) {
args.fsbno = be64_to_cpu(start->l);
@@ -526,7 +527,7 @@ xfs_bmbt_free_block(
struct xfs_trans *tp = cur->bc_tp;
xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
- xfs_bmap_add_free(fsbno, 1, cur->bc_private.b.flist, mp);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, ip->i_ino);
ip->i_d.di_nblocks--;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index a926134..b1e11ba 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1304,6 +1304,22 @@ typedef __be32 xfs_inobt_ptr_t;
*/
#define XFS_RMAP_CRC_MAGIC 0x524d4233 /* 'RMB3' */
+/*
+ * Special owner types.
+ *
+ * Seeing as we only support up to 8EB, we have the upper bit of the owner field
+ * to tell us we have a special owner value. We use these for static metadata
+ * allocated at mkfs/growfs time, as well as for freespace management metadata.
+ */
+#define XFS_RMAP_OWN_NULL (-1ULL) /* No owner, for growfs */
+#define XFS_RMAP_OWN_UNKNOWN (-2ULL) /* Unknown owner, for EFI recovery */
+#define XFS_RMAP_OWN_FS (-3ULL) /* static fs metadata */
+#define XFS_RMAP_OWN_LOG (-4ULL) /* journalling log */
+#define XFS_RMAP_OWN_AG (-5ULL) /* AG freespace btree blocks */
+#define XFS_RMAP_OWN_INOBT (-6ULL) /* Inode btree blocks */
+#define XFS_RMAP_OWN_INODES (-7ULL) /* Inode chunk */
+#define XFS_RMAP_OWN_MIN (-8ULL) /* guard */
+
#define XFS_RMAP_BLOCK(mp) \
(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
XFS_FIBT_BLOCK(mp) + 1 : \
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 54deb2d..ed91eb5 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -613,6 +613,7 @@ xfs_ialloc_ag_alloc(
args.tp = tp;
args.mp = tp->t_mountp;
args.fsbno = NULLFSBLOCK;
+ args.owner = XFS_RMAP_OWN_INODES;
#ifdef DEBUG
/* randomly do sparse inode allocations */
@@ -1827,9 +1828,8 @@ xfs_difree_inode_chunk(
if (!xfs_inobt_issparse(rec->ir_holemask)) {
/* not sparse, calculate extent info directly */
- xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno,
- XFS_AGINO_TO_AGBNO(mp, rec->ir_startino)),
- mp->m_ialloc_blks, flist, mp);
+ xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
+ mp->m_ialloc_blks, XFS_RMAP_OWN_INODES);
return;
}
@@ -1872,8 +1872,8 @@ xfs_difree_inode_chunk(
ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
- xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
- flist, mp);
+ xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, agbno),
+ contigblk, XFS_RMAP_OWN_INODES);
/* reset range to current bit and carry on... */
startidx = endidx = nextbit;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index f39b285..445245f 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -96,6 +96,7 @@ xfs_inobt_alloc_block(
memset(&args, 0, sizeof(args));
args.tp = cur->bc_tp;
args.mp = cur->bc_mp;
+ args.owner = XFS_RMAP_OWN_INOBT;
args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
args.minlen = 1;
args.maxlen = 1;
@@ -129,7 +130,7 @@ xfs_inobt_free_block(
int error;
fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
- error = xfs_free_extent(cur->bc_tp, fsbno, 1);
+ error = xfs_free_extent(cur->bc_tp, fsbno, 1, XFS_RMAP_OWN_INOBT);
if (error)
return error;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3bf4ad0..f1081de 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -122,7 +122,8 @@ xfs_bmap_finish(
next = free->xbfi_next;
error = xfs_trans_free_extent(*tp, efd, free->xbfi_startblock,
- free->xbfi_blockcount);
+ free->xbfi_blockcount,
+ free->xbfi_owner);
if (error)
return error;
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 32e24ec..5711700 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -466,14 +466,19 @@ xfs_growfs_data_private(
be32_to_cpu(agi->agi_length));
xfs_alloc_log_agf(tp, bp, XFS_AGF_LENGTH);
+
/*
* Free the new space.
+ *
+ * XFS_RMAP_OWN_NULL is used here to tell the rmap btree that
+ * this doesn't actually exist in the rmap btree.
*/
- error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno,
- be32_to_cpu(agf->agf_length) - new), new);
- if (error) {
+ error = xfs_free_extent(tp,
+ XFS_AGB_TO_FSB(mp, agno,
+ be32_to_cpu(agf->agf_length) - new),
+ new, XFS_RMAP_OWN_NULL);
+ if (error)
goto error0;
- }
}
/*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 512a094..242ed5d 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3821,7 +3821,8 @@ xlog_recover_process_efi(
for (i = 0; i < efip->efi_format.efi_nextents; i++) {
extp = &(efip->efi_format.efi_extents[i]);
error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
- extp->ext_len);
+ extp->ext_len,
+ XFS_RMAP_OWN_UNKNOWN);
if (error)
goto abort_error;
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 4643070..ee277fa 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -222,7 +222,7 @@ struct xfs_efd_log_item *xfs_trans_get_efd(xfs_trans_t *,
uint);
int xfs_trans_free_extent(struct xfs_trans *,
struct xfs_efd_log_item *, xfs_fsblock_t,
- xfs_extlen_t);
+ xfs_extlen_t, uint64_t);
int xfs_trans_commit(struct xfs_trans *);
int __xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *);
int xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index a96ae54..1b7d6d5 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -118,13 +118,14 @@ xfs_trans_free_extent(
struct xfs_trans *tp,
struct xfs_efd_log_item *efdp,
xfs_fsblock_t start_block,
- xfs_extlen_t ext_len)
+ xfs_extlen_t ext_len,
+ uint64_t owner)
{
uint next_extent;
struct xfs_extent *extp;
int error;
- error = xfs_free_extent(tp, start_block, ext_len);
+ error = xfs_free_extent(tp, start_block, ext_len, owner);
/*
* Mark the transaction dirty, even on error. This ensures the
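Note on the special owner values added to xfs_format.h in patch 06: the
comment there says the upper bit of the owner field marks a special
(non-inode) owner. The following is a minimal userspace sketch of that
idea, not code from the series. The XFS_RMAP_OWN_* values are copied from
the hunk; the three rmap_owner_*/rmap_special_* helpers are hypothetical
names invented for illustration, and the sketch assumes, as the comment
does, that real inode numbers never have the top bit set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the xfs_format.h hunk in patch 06. */
#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
#define XFS_RMAP_OWN_FS		(-3ULL)
#define XFS_RMAP_OWN_LOG	(-4ULL)
#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
#define XFS_RMAP_OWN_MIN	(-8ULL)	/* guard */

/* Hypothetical helper: true if the top bit flags a special owner. */
static inline bool
rmap_owner_is_special(uint64_t owner)
{
	return (owner & (1ULL << 63)) != 0;
}

/* Hypothetical helper: true only for the special values defined above. */
static inline bool
rmap_owner_is_known_special(uint64_t owner)
{
	return owner > XFS_RMAP_OWN_MIN;
}

/* Hypothetical helper: 1 for OWN_NULL, 2 for OWN_UNKNOWN, ... 7 for OWN_INODES. */
static inline unsigned int
rmap_special_index(uint64_t owner)
{
	assert(rmap_owner_is_known_special(owner));
	return (unsigned int)(0 - owner);
}
```

Because the special values count down from the top of the 64-bit range,
a single unsigned comparison against the XFS_RMAP_OWN_MIN guard is enough
to tell the defined special owners apart from everything else.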
* [PATCH 07/58] xfs: add extended owner field to extent allocation and freeing
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (5 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 06/58] xfs: add owner field to extent allocation and freeing Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 08/58] xfs: introduce rmap extent operation stubs Darrick J. Wong
` (50 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Extend the owner field from a bare uint64_t into a struct xfs_owner_info
that carries the owner, an offset within that owner, and flags marking
attr fork and bmbt block extents.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_alloc.c | 11 +++++----
fs/xfs/libxfs/xfs_alloc.h | 4 ++-
fs/xfs/libxfs/xfs_bmap.c | 20 +++++++++-------
fs/xfs/libxfs/xfs_bmap.h | 5 ++--
fs/xfs/libxfs/xfs_bmap_btree.c | 7 ++++-
fs/xfs/libxfs/xfs_format.h | 49 ++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_ialloc.c | 8 ++++--
fs/xfs/libxfs/xfs_ialloc_btree.c | 6 +++--
fs/xfs/xfs_bmap_util.c | 2 +-
fs/xfs/xfs_fsops.c | 5 +++-
fs/xfs/xfs_log_recover.c | 4 ++-
fs/xfs/xfs_trans.h | 2 +-
fs/xfs/xfs_trans_extfree.c | 4 ++-
13 files changed, 97 insertions(+), 30 deletions(-)
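Note on the owner info helpers this patch adds to xfs_format.h: the
sketch below reproduces struct xfs_owner_info and two of the initialisers
so they can be built and exercised in userspace. The typedefs and the
XFS_DATA_FORK/XFS_ATTR_FORK values are stand-ins assumed for this sketch,
and owner_info_is_bmbt() is a hypothetical helper that is not part of the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for kernel types and constants (assumed for this sketch). */
typedef uint64_t xfs_ino_t;
typedef uint64_t xfs_fileoff_t;
#define XFS_DATA_FORK	0
#define XFS_ATTR_FORK	1

/* Copied from the xfs_format.h hunk of this patch. */
#define XFS_RMAP_INO_ATTR_FORK	(1)
#define XFS_RMAP_BMBT_BLOCK	(2)
struct xfs_owner_info {
	uint64_t	oi_owner;
	xfs_fileoff_t	oi_offset;
	unsigned int	oi_flags;
};

/* Owner info for a block that holds part of an inode's bmap btree. */
static inline void
XFS_RMAP_INO_BMBT_OWNER(struct xfs_owner_info *oi, xfs_ino_t ino, int whichfork)
{
	oi->oi_owner = ino;
	oi->oi_offset = 0;
	oi->oi_flags = XFS_RMAP_BMBT_BLOCK;
	if (whichfork == XFS_ATTR_FORK)
		oi->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
}

/* Owner info for an extent mapped at a file offset in the given fork. */
static inline void
XFS_RMAP_INO_OWNER(struct xfs_owner_info *oi, xfs_ino_t ino, int whichfork,
		   xfs_fileoff_t offset)
{
	oi->oi_owner = ino;
	oi->oi_offset = offset;
	oi->oi_flags = 0;
	if (whichfork == XFS_ATTR_FORK)
		oi->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
}

/* Hypothetical helper (not in the patch): is this a bmbt block owner? */
static inline bool
owner_info_is_bmbt(const struct xfs_owner_info *oi)
{
	assert(oi != NULL);
	return (oi->oi_flags & XFS_RMAP_BMBT_BLOCK) != 0;
}
```

The two flags are independent bits, so a bmbt block in the attr fork
carries both, while a plain data fork extent carries neither.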
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index af570ce..44db7b1 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1592,7 +1592,7 @@ xfs_free_ag_extent(
xfs_agnumber_t agno, /* allocation group number */
xfs_agblock_t bno, /* starting block number */
xfs_extlen_t len, /* length of extent */
- uint64_t owner, /* extent owner */
+ struct xfs_owner_info *oinfo, /* extent owner */
int isfl) /* set if is freelist blocks - no sb acctg */
{
xfs_btree_cur_t *bno_cur; /* cursor for by-block btree */
@@ -2013,6 +2013,7 @@ xfs_alloc_fix_freelist(
* back on the free list? Maybe we should only do this when space is
* getting low or the AGFL is more than half full?
*/
+ XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
while (pag->pagf_flcount > need) {
struct xfs_buf *bp;
@@ -2020,7 +2021,7 @@ xfs_alloc_fix_freelist(
if (error)
goto out_agbp_relse;
error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
- XFS_RMAP_OWN_AG, 1);
+ &targs.oinfo, 1);
if (error)
goto out_agbp_relse;
bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2030,7 +2031,7 @@ xfs_alloc_fix_freelist(
memset(&targs, 0, sizeof(targs));
targs.tp = tp;
targs.mp = mp;
- targs.owner = XFS_RMAP_OWN_AG;
+ XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
targs.agbp = agbp;
targs.agno = args->agno;
targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
@@ -2672,7 +2673,7 @@ xfs_free_extent(
xfs_trans_t *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
xfs_extlen_t len, /* length of extent */
- uint64_t owner) /* extent owner */
+ struct xfs_owner_info *oinfo) /* extent owner */
{
xfs_alloc_arg_t args;
int error;
@@ -2709,7 +2710,7 @@ xfs_free_extent(
}
error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno,
- len, owner, 0);
+ len, oinfo, 0);
if (!error)
xfs_extent_busy_insert(tp, args.agno, args.agbno, len, 0);
error0:
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index cff44e0..95b161f 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -122,7 +122,7 @@ typedef struct xfs_alloc_arg {
char isfl; /* set if is freelist blocks - !acctg */
char userdata; /* set if this is user data */
xfs_fsblock_t firstblock; /* io first block allocated */
- uint64_t owner; /* owner of blocks being allocated */
+ struct xfs_owner_info oinfo; /* owner of blocks being allocated */
} xfs_alloc_arg_t;
/*
@@ -210,7 +210,7 @@ xfs_free_extent(
struct xfs_trans *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
xfs_extlen_t len, /* length of extent */
- uint64_t owner); /* extent owner */
+ struct xfs_owner_info *oinfo); /* extent owner */
int /* error */
xfs_alloc_lookup_le(
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index d4c2c4a..fef3767 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -571,7 +571,7 @@ xfs_bmap_add_free(
struct xfs_bmap_free *flist, /* list of extents */
xfs_fsblock_t bno, /* fs block number of extent */
xfs_filblks_t len, /* length of extent */
- uint64_t owner) /* extent owner */
+ struct xfs_owner_info *oinfo) /* extent owner */
{
xfs_bmap_free_item_t *cur; /* current (next) element */
xfs_bmap_free_item_t *new; /* new element */
@@ -592,12 +592,14 @@ xfs_bmap_add_free(
ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
#endif
ASSERT(xfs_bmap_free_item_zone != NULL);
- ASSERT(owner);
new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
new->xbfi_startblock = bno;
new->xbfi_blockcount = (xfs_extlen_t)len;
- new->xbfi_owner = owner;
+ if (oinfo)
+ memcpy(&new->xbfi_oinfo, oinfo, sizeof(struct xfs_owner_info));
+ else
+ memset(&new->xbfi_oinfo, 0, sizeof(struct xfs_owner_info));
for (prev = NULL, cur = flist->xbf_first;
cur != NULL;
prev = cur, cur = cur->xbfi_next) {
@@ -677,6 +679,7 @@ xfs_bmap_btree_to_extents(
xfs_mount_t *mp; /* mount point structure */
__be64 *pp; /* ptr to block address */
struct xfs_btree_block *rblock;/* root btree block */
+ struct xfs_owner_info oinfo;
mp = ip->i_mount;
ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -700,7 +703,8 @@ xfs_bmap_btree_to_extents(
cblock = XFS_BUF_TO_BLOCK(cbp);
if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
return error;
- xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, ip->i_ino);
+ XFS_RMAP_INO_BMBT_OWNER(&oinfo, ip->i_ino, whichfork);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, &oinfo);
ip->i_d.di_nblocks--;
xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
xfs_trans_binval(tp, cbp);
@@ -781,7 +785,7 @@ xfs_bmap_extents_to_btree(
memset(&args, 0, sizeof(args));
args.tp = tp;
args.mp = mp;
- args.owner = ip->i_ino;
+ XFS_RMAP_INO_BMBT_OWNER(&args.oinfo, ip->i_ino, whichfork);
args.firstblock = *firstblock;
if (*firstblock == NULLFSBLOCK) {
args.type = XFS_ALLOCTYPE_START_BNO;
@@ -928,7 +932,7 @@ xfs_bmap_local_to_extents(
memset(&args, 0, sizeof(args));
args.tp = tp;
args.mp = ip->i_mount;
- args.owner = ip->i_ino;
+ XFS_RMAP_INO_OWNER(&args.oinfo, ip->i_ino, whichfork, 0);
args.firstblock = *firstblock;
/*
* Allocate a block. We know we need only one, since the
@@ -3709,7 +3713,6 @@ xfs_bmap_btalloc(
memset(&args, 0, sizeof(args));
args.tp = ap->tp;
args.mp = mp;
- args.owner = ap->ip->i_ino;
args.fsbno = ap->blkno;
/* Trim the allocation back to the maximum an AG can fit. */
@@ -4799,6 +4802,7 @@ xfs_bmap_del_extent(
nblks = 0;
do_fx = 0;
}
+
/*
* Set flag value to use in switch statement.
* Left-contig is 2, right-contig is 1.
@@ -4985,7 +4989,7 @@ xfs_bmap_del_extent(
*/
if (do_fx)
xfs_bmap_add_free(mp, flist, del->br_startblock,
- del->br_blockcount, ip->i_ino);
+ del->br_blockcount, NULL);
/*
* Adjust inode # blocks in the file.
*/
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 674819f..89fa3dd 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -66,7 +66,7 @@ typedef struct xfs_bmap_free_item
{
xfs_fsblock_t xbfi_startblock;/* starting fs block number */
xfs_extlen_t xbfi_blockcount;/* number of blocks in extent */
- uint64_t xbfi_owner; /* extent owner */
+ struct xfs_owner_info xbfi_oinfo; /* extent owner */
struct xfs_bmap_free_item *xbfi_next; /* link to next entry */
} xfs_bmap_free_item_t;
@@ -184,7 +184,8 @@ void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
int xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
void xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
void xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
- xfs_fsblock_t bno, xfs_filblks_t len, uint64_t owner);
+ xfs_fsblock_t bno, xfs_filblks_t len,
+ struct xfs_owner_info *oinfo);
void xfs_bmap_cancel(struct xfs_bmap_free *flist);
int xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
int *committed);
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index fcbbbef..1035128 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -446,7 +446,8 @@ xfs_bmbt_alloc_block(
args.mp = cur->bc_mp;
args.fsbno = cur->bc_private.b.firstblock;
args.firstblock = args.fsbno;
- args.owner = cur->bc_private.b.ip->i_ino;
+ XFS_RMAP_INO_BMBT_OWNER(&args.oinfo, cur->bc_private.b.ip->i_ino,
+ cur->bc_private.b.whichfork);
if (args.fsbno == NULLFSBLOCK) {
args.fsbno = be64_to_cpu(start->l);
@@ -526,8 +527,10 @@ xfs_bmbt_free_block(
struct xfs_inode *ip = cur->bc_private.b.ip;
struct xfs_trans *tp = cur->bc_tp;
xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+ struct xfs_owner_info oinfo;
- xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, ip->i_ino);
+ XFS_RMAP_INO_BMBT_OWNER(&oinfo, ip->i_ino, cur->bc_private.b.whichfork);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, &oinfo);
ip->i_d.di_nblocks--;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index b1e11ba..0c89c87 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1305,6 +1305,55 @@ typedef __be32 xfs_inobt_ptr_t;
#define XFS_RMAP_CRC_MAGIC 0x524d4233 /* 'RMB3' */
/*
+ * Ownership info for an extent. This is used to create reverse-mapping
+ * entries.
+ */
+#define XFS_RMAP_INO_ATTR_FORK (1)
+#define XFS_RMAP_BMBT_BLOCK (2)
+struct xfs_owner_info {
+ uint64_t oi_owner;
+ xfs_fileoff_t oi_offset;
+ unsigned int oi_flags;
+};
+
+static inline void
+XFS_RMAP_AG_OWNER(
+ struct xfs_owner_info *oi,
+ uint64_t owner)
+{
+ oi->oi_owner = owner;
+ oi->oi_offset = 0;
+ oi->oi_flags = 0;
+}
+
+static inline void
+XFS_RMAP_INO_BMBT_OWNER(
+ struct xfs_owner_info *oi,
+ xfs_ino_t ino,
+ int whichfork)
+{
+ oi->oi_owner = ino;
+ oi->oi_offset = 0;
+ oi->oi_flags = XFS_RMAP_BMBT_BLOCK;
+ if (whichfork == XFS_ATTR_FORK)
+ oi->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
+}
+
+static inline void
+XFS_RMAP_INO_OWNER(
+ struct xfs_owner_info *oi,
+ xfs_ino_t ino,
+ int whichfork,
+ xfs_fileoff_t offset)
+{
+ oi->oi_owner = ino;
+ oi->oi_offset = offset;
+ oi->oi_flags = 0;
+ if (whichfork == XFS_ATTR_FORK)
+ oi->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
+}
+
+/*
* Special owner types.
*
* Seeing as we only support up to 8EB, we have the upper bit of the owner field
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index ed91eb5..8f28256 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -613,7 +613,7 @@ xfs_ialloc_ag_alloc(
args.tp = tp;
args.mp = tp->t_mountp;
args.fsbno = NULLFSBLOCK;
- args.owner = XFS_RMAP_OWN_INODES;
+ XFS_RMAP_AG_OWNER(&args.oinfo, XFS_RMAP_OWN_INODES);
#ifdef DEBUG
/* randomly do sparse inode allocations */
@@ -1824,12 +1824,14 @@ xfs_difree_inode_chunk(
int nextbit;
xfs_agblock_t agbno;
int contigblk;
+ struct xfs_owner_info oinfo;
DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS);
+ XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_INODES);
if (!xfs_inobt_issparse(rec->ir_holemask)) {
/* not sparse, calculate extent info directly */
xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
- mp->m_ialloc_blks, XFS_RMAP_OWN_INODES);
+ mp->m_ialloc_blks, &oinfo);
return;
}
@@ -1873,7 +1875,7 @@ xfs_difree_inode_chunk(
ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, agbno),
- contigblk, XFS_RMAP_OWN_INODES);
+ contigblk, &oinfo);
/* reset range to current bit and carry on... */
startidx = endidx = nextbit;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 445245f..bd8a1da 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -96,7 +96,7 @@ xfs_inobt_alloc_block(
memset(&args, 0, sizeof(args));
args.tp = cur->bc_tp;
args.mp = cur->bc_mp;
- args.owner = XFS_RMAP_OWN_INOBT;
+ XFS_RMAP_AG_OWNER(&args.oinfo, XFS_RMAP_OWN_INOBT);
args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
args.minlen = 1;
args.maxlen = 1;
@@ -128,9 +128,11 @@ xfs_inobt_free_block(
{
xfs_fsblock_t fsbno;
int error;
+ struct xfs_owner_info oinfo;
+ XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_INOBT);
fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
- error = xfs_free_extent(cur->bc_tp, fsbno, 1, XFS_RMAP_OWN_INOBT);
+ error = xfs_free_extent(cur->bc_tp, fsbno, 1, &oinfo);
if (error)
return error;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f1081de..8b2e505 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -123,7 +123,7 @@ xfs_bmap_finish(
error = xfs_trans_free_extent(*tp, efd, free->xbfi_startblock,
free->xbfi_blockcount,
- free->xbfi_owner);
+ &free->xbfi_oinfo);
if (error)
return error;
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 5711700..b3d1665 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -439,6 +439,8 @@ xfs_growfs_data_private(
* There are new blocks in the old last a.g.
*/
if (new) {
+ struct xfs_owner_info oinfo;
+
/*
* Change the agi length.
*/
@@ -473,10 +475,11 @@ xfs_growfs_data_private(
* XFS_RMAP_OWN_NULL is used here to tell the rmap btree that
* this doesn't actually exist in the rmap btree.
*/
+ XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_NULL);
error = xfs_free_extent(tp,
XFS_AGB_TO_FSB(mp, agno,
be32_to_cpu(agf->agf_length) - new),
- new, XFS_RMAP_OWN_NULL);
+ new, &oinfo);
if (error)
goto error0;
}
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 242ed5d..129b9a1 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3786,6 +3786,7 @@ xlog_recover_process_efi(
int error = 0;
xfs_extent_t *extp;
xfs_fsblock_t startblock_fsb;
+ struct xfs_owner_info oinfo;
ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags));
@@ -3818,11 +3819,12 @@ xlog_recover_process_efi(
goto abort_error;
efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
+ XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_UNKNOWN);
for (i = 0; i < efip->efi_format.efi_nextents; i++) {
extp = &(efip->efi_format.efi_extents[i]);
error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
extp->ext_len,
- XFS_RMAP_OWN_UNKNOWN);
+ &oinfo);
if (error)
goto abort_error;
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index ee277fa..50fe77e 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -222,7 +222,7 @@ struct xfs_efd_log_item *xfs_trans_get_efd(xfs_trans_t *,
uint);
int xfs_trans_free_extent(struct xfs_trans *,
struct xfs_efd_log_item *, xfs_fsblock_t,
- xfs_extlen_t, uint64_t);
+ xfs_extlen_t, struct xfs_owner_info *);
int xfs_trans_commit(struct xfs_trans *);
int __xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *);
int xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index 1b7d6d5..d1b8833 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -119,13 +119,13 @@ xfs_trans_free_extent(
struct xfs_efd_log_item *efdp,
xfs_fsblock_t start_block,
xfs_extlen_t ext_len,
- uint64_t owner)
+ struct xfs_owner_info *oinfo)
{
uint next_extent;
struct xfs_extent *extp;
int error;
- error = xfs_free_extent(tp, start_block, ext_len, owner);
+ error = xfs_free_extent(tp, start_block, ext_len, oinfo);
/*
* Mark the transaction dirty, even on error. This ensures the
* [PATCH 08/58] xfs: introduce rmap extent operation stubs
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (6 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 07/58] xfs: add extended " Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 09/58] xfs: extend rmap extent operation stubs to take full owner info Darrick J. Wong
` (49 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1
fs/xfs/libxfs/xfs_alloc.c | 11 +++++
fs/xfs/libxfs/xfs_rmap.c | 89 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rmap_btree.h | 30 +++++++++++++
fs/xfs/xfs_trace.h | 37 +++++++++++++++++
5 files changed, 168 insertions(+)
create mode 100644 fs/xfs/libxfs/xfs_rmap.c
create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.h
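Note on the stub control flow in xfs_rmap.c: a minimal userspace sketch
of the shape shared by xfs_rmap_free() and xfs_rmap_alloc() in this
patch. struct demo_mount, demo_trace() and demo_rmap_free() are
hypothetical stand-ins; the real stubs test xfs_sb_version_hasrmapbt()
and fire the trace_xfs_rmap_* tracepoints instead of bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mount: feature bit plus a trace counter. */
struct demo_mount {
	bool		has_rmapbt;	/* stands in for xfs_sb_version_hasrmapbt() */
	unsigned int	traces;		/* number of tracepoints hit so far */
};

/* Stand-in for a tracepoint: only counts that it fired. */
static void
demo_trace(struct demo_mount *mp)
{
	mp->traces++;
}

/* Same control flow as the xfs_rmap_free() stub in this patch. */
int
demo_rmap_free(struct demo_mount *mp, uint32_t bno, uint32_t len,
	       uint64_t owner)
{
	int	error = 0;

	assert(mp != NULL);
	(void)bno;
	(void)len;
	(void)owner;

	/* Without the rmap btree feature there is nothing to update. */
	if (!mp->has_rmapbt)
		return 0;

	demo_trace(mp);		/* ..._free_extent */
	if (1)			/* placeholder until the btree update exists */
		goto out_error;
	demo_trace(mp);		/* ..._free_extent_done, not reached yet */
	return 0;

out_error:
	demo_trace(mp);		/* ..._free_extent_error */
	return error;
}
```

As the stub is written, error is never set, so the call returns 0 even
though it leaves through the out_error label; on an rmapbt filesystem
this reading suggests the entry and _error tracepoints fire and the
_done tracepoint does not until the real btree update replaces the
placeholder.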
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index a096841..5e65a07 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,6 +51,7 @@ xfs-y += $(addprefix libxfs/, \
xfs_inode_fork.o \
xfs_inode_buf.o \
xfs_log_rlimit.o \
+ xfs_rmap.o \
xfs_sb.o \
xfs_symlink_remote.o \
xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 44db7b1..2722b44 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -26,6 +26,7 @@
#include "xfs_mount.h"
#include "xfs_inode.h"
#include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
#include "xfs_alloc_btree.h"
#include "xfs_alloc.h"
#include "xfs_extent_busy.h"
@@ -644,6 +645,12 @@ xfs_alloc_ag_vextent(
ASSERT(!args->wasfromfl || !args->isfl);
ASSERT(args->agbno % args->alignment == 0);
+ /* insert new block into the reverse map btree */
+ error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
+ args->agbno, args->len, args->owner);
+ if (error)
+ return error;
+
if (!args->wasfromfl) {
error = xfs_alloc_update_counters(args->tp, args->pag,
args->agbp,
@@ -1610,6 +1617,10 @@ xfs_free_ag_extent(
xfs_extlen_t nlen; /* new length of freespace */
xfs_perag_t *pag; /* per allocation group data */
+ error = xfs_rmap_free(tp, agbp, agno, bno, len, owner);
+ if (error)
+ goto error0;
+
mp = tp->t_mountp;
/*
* Allocate and initialize a cursor for the by-block btree.
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
new file mode 100644
index 0000000..3958cf8
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -0,0 +1,89 @@
+
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_btree.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_trace.h"
+#include "xfs_error.h"
+#include "xfs_extent_busy.h"
+
+int
+xfs_rmap_free(
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t bno,
+ xfs_extlen_t len,
+ uint64_t owner)
+{
+ struct xfs_mount *mp = tp->t_mountp;
+ int error = 0;
+
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return 0;
+
+ trace_xfs_rmap_free_extent(mp, agno, bno, len, owner);
+ if (1)
+ goto out_error;
+ trace_xfs_rmap_free_extent_done(mp, agno, bno, len, owner);
+ return 0;
+
+out_error:
+ trace_xfs_rmap_free_extent_error(mp, agno, bno, len, owner);
+ return error;
+}
+
+int
+xfs_rmap_alloc(
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t bno,
+ xfs_extlen_t len,
+ uint64_t owner)
+{
+ struct xfs_mount *mp = tp->t_mountp;
+ int error = 0;
+
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return 0;
+
+ trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
+ if (1)
+ goto out_error;
+ trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, owner);
+ return 0;
+
+out_error:
+ trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, owner);
+ return error;
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
new file mode 100644
index 0000000..f1caa40
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_RMAP_BTREE_H__
+#define __XFS_RMAP_BTREE_H__
+
+struct xfs_buf;
+
+int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
+ xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+ uint64_t owner);
+int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
+ xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+ uint64_t owner);
+
+#endif /* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 5ed36b1..7360593 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1674,6 +1674,43 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
+DECLARE_EVENT_CLASS(xfs_rmap_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner),
+ TP_ARGS(mp, agno, agbno, len, owner),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, agbno)
+ __field(xfs_extlen_t, len)
+ __field(uint64_t, owner)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->agbno = agbno;
+ __entry->len = len;
+ __entry->owner = owner;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->agbno,
+ __entry->len,
+ __entry->owner)
+);
+#define DEFINE_RMAP_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner), \
+ TP_ARGS(mp, agno, agbno, len, owner))
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent);
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent_done);
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent_error);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_done);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_error);
+
DECLARE_EVENT_CLASS(xfs_da_class,
TP_PROTO(struct xfs_da_args *args),
TP_ARGS(args),
* [PATCH 09/58] xfs: extend rmap extent operation stubs to take full owner info
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (7 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 08/58] xfs: introduce rmap extent operation stubs Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 10/58] xfs: define the on-disk rmap btree format Darrick J. Wong
` (48 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Extend the rmap alloc/free stubs to take a struct xfs_owner_info instead
of a bare uint64_t owner, and only call them when an owner is set.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_alloc.c | 23 ++++++++++++++---------
fs/xfs/libxfs/xfs_rmap.c | 16 ++++++++--------
fs/xfs/libxfs/xfs_rmap_btree.h | 4 ++--
fs/xfs/xfs_trace.h | 22 +++++++++++++++-------
4 files changed, 39 insertions(+), 26 deletions(-)
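Note on the "owner is set" checks in the xfs_alloc.c hunks: patch 07
makes xfs_bmap_add_free() store a zeroed owner info when the caller
passes NULL, and this patch skips the rmap stubs when oi_owner is zero.
The sketch below models how those two pieces appear to fit together.
demo_copy_oinfo() and demo_wants_rmap_update() are hypothetical helpers
written for illustration; the struct layout is copied from patch 07 with
uint64_t standing in for xfs_fileoff_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout from patch 07; uint64_t stands in for xfs_fileoff_t. */
struct xfs_owner_info {
	uint64_t	oi_owner;
	uint64_t	oi_offset;
	unsigned int	oi_flags;
};

/*
 * Hypothetical helper modelled on the new body of xfs_bmap_add_free():
 * copy the caller's owner info, or store all zeroes if none was given.
 */
static inline void
demo_copy_oinfo(struct xfs_owner_info *dst, const struct xfs_owner_info *src)
{
	assert(dst != NULL);
	if (src)
		memcpy(dst, src, sizeof(*dst));
	else
		memset(dst, 0, sizeof(*dst));
}

/*
 * Hypothetical predicate modelled on the "if (oinfo->oi_owner)" checks:
 * a zero owner means the rmap stub is not called from the allocator.
 */
static inline bool
demo_wants_rmap_update(const struct xfs_owner_info *oi)
{
	return oi->oi_owner != 0;
}
```

Under this reading, the NULL that xfs_bmap_del_extent() passes for file
data in patch 07 becomes a zero owner in the free list item, and
xfs_free_ag_extent() then leaves the rmap stub alone for that extent.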
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 2722b44..3c55fa7 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -645,11 +645,13 @@ xfs_alloc_ag_vextent(
ASSERT(!args->wasfromfl || !args->isfl);
ASSERT(args->agbno % args->alignment == 0);
- /* insert new block into the reverse map btree */
- error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
- args->agbno, args->len, args->owner);
- if (error)
- return error;
+ /* if not file data, insert new block into the reverse map btree */
+ if (args->oinfo.oi_owner) {
+ error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
+ args->agbno, args->len, &args->oinfo);
+ if (error)
+ return error;
+ }
if (!args->wasfromfl) {
error = xfs_alloc_update_counters(args->tp, args->pag,
@@ -1617,16 +1619,19 @@ xfs_free_ag_extent(
xfs_extlen_t nlen; /* new length of freespace */
xfs_perag_t *pag; /* per allocation group data */
- error = xfs_rmap_free(tp, agbp, agno, bno, len, owner);
- if (error)
- goto error0;
+ bno_cur = cnt_cur = NULL;
+
+ if (oinfo->oi_owner) {
+ error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo);
+ if (error)
+ goto error0;
+ }
mp = tp->t_mountp;
/*
* Allocate and initialize a cursor for the by-block btree.
*/
bno_cur = xfs_allocbt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_BNO);
- cnt_cur = NULL;
/*
* Look for a neighboring block on the left (lower block numbers)
* that is contiguous with this space.
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 3958cf8..3e17294 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -43,7 +43,7 @@ xfs_rmap_free(
xfs_agnumber_t agno,
xfs_agblock_t bno,
xfs_extlen_t len,
- uint64_t owner)
+ struct xfs_owner_info *oinfo)
{
struct xfs_mount *mp = tp->t_mountp;
int error = 0;
@@ -51,14 +51,14 @@ xfs_rmap_free(
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return 0;
- trace_xfs_rmap_free_extent(mp, agno, bno, len, owner);
+ trace_xfs_rmap_free_extent(mp, agno, bno, len, oinfo);
if (1)
goto out_error;
- trace_xfs_rmap_free_extent_done(mp, agno, bno, len, owner);
+ trace_xfs_rmap_free_extent_done(mp, agno, bno, len, oinfo);
return 0;
out_error:
- trace_xfs_rmap_free_extent_error(mp, agno, bno, len, owner);
+ trace_xfs_rmap_free_extent_error(mp, agno, bno, len, oinfo);
return error;
}
@@ -69,7 +69,7 @@ xfs_rmap_alloc(
xfs_agnumber_t agno,
xfs_agblock_t bno,
xfs_extlen_t len,
- uint64_t owner)
+ struct xfs_owner_info *oinfo)
{
struct xfs_mount *mp = tp->t_mountp;
int error = 0;
@@ -77,13 +77,13 @@ xfs_rmap_alloc(
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return 0;
- trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
+ trace_xfs_rmap_alloc_extent(mp, agno, bno, len, oinfo);
if (1)
goto out_error;
- trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, owner);
+ trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, oinfo);
return 0;
out_error:
- trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, owner);
+ trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, oinfo);
return error;
}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index f1caa40..a3b8f90 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -22,9 +22,9 @@ struct xfs_buf;
int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
- uint64_t owner);
+ struct xfs_owner_info *oinfo);
int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
- uint64_t owner);
+ struct xfs_owner_info *oinfo);
#endif /* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 7360593..4e7a889 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1676,34 +1676,42 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
DECLARE_EVENT_CLASS(xfs_rmap_class,
TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
- xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner),
- TP_ARGS(mp, agno, agbno, len, owner),
+ xfs_agblock_t agbno, xfs_extlen_t len,
+ struct xfs_owner_info *oinfo),
+ TP_ARGS(mp, agno, agbno, len, oinfo),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_agnumber_t, agno)
__field(xfs_agblock_t, agbno)
__field(xfs_extlen_t, len)
__field(uint64_t, owner)
+ __field(uint64_t, offset)
+ __field(unsigned long, flags)
),
TP_fast_assign(
__entry->dev = mp->m_super->s_dev;
__entry->agno = agno;
__entry->agbno = agbno;
__entry->len = len;
- __entry->owner = owner;
+ __entry->owner = oinfo->oi_owner;
+ __entry->offset = oinfo->oi_offset;
+ __entry->flags = oinfo->oi_flags;
),
- TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx",
+ TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx, offset %llu, flags 0x%lx",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->agno,
__entry->agbno,
__entry->len,
- __entry->owner)
+ __entry->owner,
+ __entry->offset,
+ __entry->flags)
);
#define DEFINE_RMAP_EVENT(name) \
DEFINE_EVENT(xfs_rmap_class, name, \
TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
- xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner), \
- TP_ARGS(mp, agno, agbno, len, owner))
+ xfs_agblock_t agbno, xfs_extlen_t len, \
+ struct xfs_owner_info *oinfo), \
+ TP_ARGS(mp, agno, agbno, len, oinfo))
DEFINE_RMAP_EVENT(xfs_rmap_free_extent);
DEFINE_RMAP_EVENT(xfs_rmap_free_extent_done);
DEFINE_RMAP_EVENT(xfs_rmap_free_extent_error);
* [PATCH 10/58] xfs: define the on-disk rmap btree format
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (8 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 09/58] xfs: extend rmap extent operation stubs to take full owner info Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:55 ` [PATCH 11/58] xfs: enhance " Darrick J. Wong
` (47 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Now that we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/Makefile | 1
fs/xfs/libxfs/xfs_btree.c | 3 +
fs/xfs/libxfs/xfs_btree.h | 18 ++--
fs/xfs/libxfs/xfs_format.h | 27 ++++++
fs/xfs/libxfs/xfs_rmap_btree.c | 187 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rmap_btree.h | 31 +++++++
fs/xfs/libxfs/xfs_sb.c | 6 +
fs/xfs/libxfs/xfs_shared.h | 2
fs/xfs/xfs_mount.h | 2
9 files changed, 269 insertions(+), 8 deletions(-)
create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.c
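
As a rough worked example of the block geometry this patch sets up, the sketch below redoes the xfs_rmapbt_maxrecs() arithmetic and the 1-based address macros in plain C. The sizes are assumptions: 56 bytes is what I believe the short-form CRC btree block header (XFS_BTREE_SBLOCK_CRC_LEN) comes to, and the record, key and pointer sizes are read off the structures added to xfs_format.h in this patch. The function names are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed on-disk sizes for this revision of the format. */
#define RMAP_BLOCK_HDR_LEN	56u	/* assumption: short-form CRC block header */
#define RMAP_REC_LEN		16u	/* be32 startblock + be32 blockcount + be64 owner */
#define RMAP_KEY_LEN		4u	/* be32 startblock */
#define RMAP_PTR_LEN		4u	/* be32 AG block number */

/* Same arithmetic as xfs_rmapbt_maxrecs(): leaves hold records, nodes hold key/ptr pairs. */
static inline unsigned int rmapbt_maxrecs(unsigned int blocklen, int leaf)
{
	blocklen -= RMAP_BLOCK_HDR_LEN;
	if (leaf)
		return blocklen / RMAP_REC_LEN;
	return blocklen / (RMAP_KEY_LEN + RMAP_PTR_LEN);
}

/* Byte offset of the 1-based record 'index' in a leaf block (cf. XFS_RMAP_REC_ADDR). */
static inline size_t rmap_rec_offset(unsigned int index)
{
	return RMAP_BLOCK_HDR_LEN + (size_t)(index - 1) * RMAP_REC_LEN;
}

/* Byte offset of the 1-based pointer 'index' in a node block (cf. XFS_RMAP_PTR_ADDR).
 * The pointer array starts after space for all 'maxrecs' keys. */
static inline size_t rmap_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	return RMAP_BLOCK_HDR_LEN + (size_t)maxrecs * RMAP_KEY_LEN +
	       (size_t)(index - 1) * RMAP_PTR_LEN;
}
```

Under those assumptions a 4096-byte block holds 252 records in a leaf and 505 key/pointer pairs in a node, so the m_rmap_mnr[] minimums set in xfs_sb_mount_common() would be 126 and 252.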
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 5e65a07..dc40462 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -52,6 +52,7 @@ xfs-y += $(addprefix libxfs/, \
xfs_inode_buf.o \
xfs_log_rlimit.o \
xfs_rmap.o \
+ xfs_rmap_btree.o \
xfs_sb.o \
xfs_symlink_remote.o \
xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index b642170..13971c6 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1116,6 +1116,9 @@ xfs_btree_set_refs(
case XFS_BTNUM_BMAP:
xfs_buf_set_ref(bp, XFS_BMAP_BTREE_REF);
break;
+ case XFS_BTNUM_RMAP:
+ xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
+ break;
default:
ASSERT(0);
}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 494ee0b..67e2bbd 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -38,17 +38,19 @@ union xfs_btree_ptr {
};
union xfs_btree_key {
- xfs_bmbt_key_t bmbt;
- xfs_bmdr_key_t bmbr; /* bmbt root block */
- xfs_alloc_key_t alloc;
- xfs_inobt_key_t inobt;
+ struct xfs_bmbt_key bmbt;
+ xfs_bmdr_key_t bmbr; /* bmbt root block */
+ xfs_alloc_key_t alloc;
+ struct xfs_inobt_key inobt;
+ struct xfs_rmap_key rmap;
};
union xfs_btree_rec {
- xfs_bmbt_rec_t bmbt;
- xfs_bmdr_rec_t bmbr; /* bmbt root block */
- xfs_alloc_rec_t alloc;
- xfs_inobt_rec_t inobt;
+ struct xfs_bmbt_rec bmbt;
+ xfs_bmdr_rec_t bmbr; /* bmbt root block */
+ struct xfs_alloc_rec alloc;
+ struct xfs_inobt_rec inobt;
+ struct xfs_rmap_rec rmap;
};
/*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 0c89c87..ff24083 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1369,6 +1369,33 @@ XFS_RMAP_INO_OWNER(
#define XFS_RMAP_OWN_INODES (-7ULL) /* Inode chunk */
#define XFS_RMAP_OWN_MIN (-8ULL) /* guard */
+/*
+ * Data record structure
+ */
+struct xfs_rmap_rec {
+ __be32 rm_startblock; /* extent start block */
+ __be32 rm_blockcount; /* extent length */
+ __be64 rm_owner; /* extent owner */
+};
+
+struct xfs_rmap_irec {
+ xfs_agblock_t rm_startblock; /* extent start block */
+ xfs_extlen_t rm_blockcount; /* extent length */
+ __uint64_t rm_owner; /* extent owner */
+};
+
+/*
+ * Key structure
+ *
+ * We don't use the length for lookups
+ */
+struct xfs_rmap_key {
+ __be32 rm_startblock; /* extent start block */
+};
+
+/* btree pointer type */
+typedef __be32 xfs_rmap_ptr_t;
+
#define XFS_RMAP_BLOCK(mp) \
(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
XFS_FIBT_BLOCK(mp) + 1 : \
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
new file mode 100644
index 0000000..9a02699
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_error.h"
+#include "xfs_extent_busy.h"
+
+static struct xfs_btree_cur *
+xfs_rmapbt_dup_cursor(
+ struct xfs_btree_cur *cur)
+{
+ return xfs_rmapbt_init_cursor(cur->bc_mp, cur->bc_tp,
+ cur->bc_private.a.agbp, cur->bc_private.a.agno);
+}
+
+static bool
+xfs_rmapbt_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
+ struct xfs_perag *pag = bp->b_pag;
+ unsigned int level;
+
+ /*
+ * magic number and level verification
+ *
+ * During growfs operations, we can't verify the exact level or owner as
+ * the perag is not fully initialised and hence not attached to the
+ * buffer. In this case, check against the maximum tree depth.
+ *
+ * Similarly, during log recovery we will have a perag structure
+ * attached, but the agf information will not yet have been initialised
+ * from the on disk AGF. Again, we can only check against maximum limits
+ * in this case.
+ */
+ if (block->bb_magic!= cpu_to_be32(XFS_RMAP_CRC_MAGIC))
+ return false;
+
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return false;
+ if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
+ return false;
+ if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
+ return false;
+ if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+ return false;
+
+ level = be16_to_cpu(block->bb_level);
+ if (pag && pag->pagf_init) {
+ if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
+ return false;
+ } else if (level >= mp->m_ag_maxlevels)
+ return false;
+
+ /* numrecs verification */
+ if (be16_to_cpu(block->bb_numrecs) > mp->m_rmap_mxr[level != 0])
+ return false;
+
+ /* sibling pointer verification */
+ if (!block->bb_u.s.bb_leftsib ||
+ (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
+ block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
+ return false;
+ if (!block->bb_u.s.bb_rightsib ||
+ (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
+ block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
+ return false;
+
+ return true;
+}
+
+static void
+xfs_rmapbt_read_verify(
+ struct xfs_buf *bp)
+{
+ if (!xfs_btree_sblock_verify_crc(bp))
+ xfs_buf_ioerror(bp, -EFSBADCRC);
+ else if (!xfs_rmapbt_verify(bp))
+ xfs_buf_ioerror(bp, -EFSCORRUPTED);
+
+ if (bp->b_error) {
+ trace_xfs_btree_corrupt(bp, _RET_IP_);
+ xfs_verifier_error(bp);
+ }
+}
+
+static void
+xfs_rmapbt_write_verify(
+ struct xfs_buf *bp)
+{
+ if (!xfs_rmapbt_verify(bp)) {
+ trace_xfs_btree_corrupt(bp, _RET_IP_);
+ xfs_buf_ioerror(bp, -EFSCORRUPTED);
+ xfs_verifier_error(bp);
+ return;
+ }
+ xfs_btree_sblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
+ .verify_read = xfs_rmapbt_read_verify,
+ .verify_write = xfs_rmapbt_write_verify,
+};
+
+static const struct xfs_btree_ops xfs_rmapbt_ops = {
+ .rec_len = sizeof(struct xfs_rmap_rec),
+ .key_len = sizeof(struct xfs_rmap_key),
+
+ .dup_cursor = xfs_rmapbt_dup_cursor,
+ .buf_ops = &xfs_rmapbt_buf_ops,
+};
+
+/*
+ * Allocate a new reverse mapping btree cursor.
+ */
+struct xfs_btree_cur *
+xfs_rmapbt_init_cursor(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ xfs_agnumber_t agno)
+{
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp);
+ struct xfs_btree_cur *cur;
+
+ cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
+ cur->bc_tp = tp;
+ cur->bc_mp = mp;
+ cur->bc_btnum = XFS_BTNUM_RMAP;
+ cur->bc_flags = XFS_BTREE_CRC_BLOCKS;
+ cur->bc_blocklog = mp->m_sb.sb_blocklog;
+ cur->bc_ops = &xfs_rmapbt_ops;
+ cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+
+ cur->bc_private.a.agbp = agbp;
+ cur->bc_private.a.agno = agno;
+
+ return cur;
+}
+
+/*
+ * Calculate number of records in an rmap btree block.
+ */
+int
+xfs_rmapbt_maxrecs(
+ struct xfs_mount *mp,
+ int blocklen,
+ int leaf)
+{
+ blocklen -= XFS_RMAP_BLOCK_LEN;
+
+ if (leaf)
+ return blocklen / sizeof(struct xfs_rmap_rec);
+ return blocklen /
+ (sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index a3b8f90..2e02362 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -19,6 +19,37 @@
#define __XFS_RMAP_BTREE_H__
struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+
+/* rmaps only exist on crc enabled filesystems */
+#define XFS_RMAP_BLOCK_LEN XFS_BTREE_SBLOCK_CRC_LEN
+
+/*
+ * Record, key, and pointer address macros for btree blocks.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+#define XFS_RMAP_REC_ADDR(block, index) \
+ ((struct xfs_rmap_rec *) \
+ ((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+ (((index) - 1) * sizeof(struct xfs_rmap_rec))))
+
+#define XFS_RMAP_KEY_ADDR(block, index) \
+ ((struct xfs_rmap_key *) \
+ ((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+ ((index) - 1) * sizeof(struct xfs_rmap_key)))
+
+#define XFS_RMAP_PTR_ADDR(block, index, maxrecs) \
+ ((xfs_rmap_ptr_t *) \
+ ((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+ (maxrecs) * sizeof(struct xfs_rmap_key) + \
+ ((index) - 1) * sizeof(xfs_rmap_ptr_t)))
+
+struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
+ struct xfs_trans *tp, struct xfs_buf *bp,
+ xfs_agnumber_t agno);
+int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 4742514..42b5696 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -35,6 +35,7 @@
#include "xfs_bmap_btree.h"
#include "xfs_alloc_btree.h"
#include "xfs_ialloc_btree.h"
+#include "xfs_rmap_btree.h"
/*
* Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -717,6 +718,11 @@ xfs_sb_mount_common(
mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2;
mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2;
+ mp->m_rmap_mxr[0] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 1);
+ mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 0);
+ mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
+ mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
+
mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
sbp->sb_inopblock);
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 5be5297..88efbb4 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -38,6 +38,7 @@ extern const struct xfs_buf_ops xfs_agi_buf_ops;
extern const struct xfs_buf_ops xfs_agf_buf_ops;
extern const struct xfs_buf_ops xfs_agfl_buf_ops;
extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
+extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
@@ -210,6 +211,7 @@ int xfs_log_calc_minimum_size(struct xfs_mount *);
#define XFS_INO_BTREE_REF 3
#define XFS_ALLOC_BTREE_REF 2
#define XFS_BMAP_BTREE_REF 2
+#define XFS_RMAP_BTREE_REF 2
#define XFS_DIR_BTREE_REF 2
#define XFS_INO_REF 2
#define XFS_ATTR_BTREE_REF 1
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d9c9834..8030627 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -90,6 +90,8 @@ typedef struct xfs_mount {
uint m_bmap_dmnr[2]; /* min bmap btree records */
uint m_inobt_mxr[2]; /* max inobt btree records */
uint m_inobt_mnr[2]; /* min inobt btree records */
+ uint m_rmap_mxr[2]; /* max rmap btree records */
+ uint m_rmap_mnr[2]; /* min rmap btree records */
uint m_ag_maxlevels; /* XFS_AG_MAXLEVELS */
uint m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
uint m_in_maxlevels; /* max inobt btree levels. */
* [PATCH 11/58] xfs: enhance the on-disk rmap btree format
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (9 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 10/58] xfs: define the on-disk rmap btree format Darrick J. Wong
@ 2015-10-07 4:55 ` Darrick J. Wong
2015-10-07 4:56 ` [PATCH 12/58] xfs: add rmap btree growfs support Darrick J. Wong
` (46 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:55 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Expand the rmap btree to record owner and offset info.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 68 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rmap_btree.c | 2 +
2 files changed, 69 insertions(+), 1 deletion(-)
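
The bit layout described in the new xfs_format.h comment can be exercised with a small standalone sketch. The masks below copy the bit positions from the patch (offset bits 63 and 62, blockcount bit 31); the OI_* flag values and the encode/decode helper names are hypothetical and exist only for this illustration. Unlike xfs_owner_info_unpack(), the encoder here masks the incoming offset first, which is a choice made for the sketch.

```c
#include <stdint.h>

#define RMAP_OFF_ATTR		(UINT64_C(1) << 63)	/* attribute fork flag */
#define RMAP_OFF_BMBT		(UINT64_C(1) << 62)	/* bmbt block flag */
#define RMAP_OFF_MASK		(~(RMAP_OFF_ATTR | RMAP_OFF_BMBT))
#define RMAP_LEN_UNWRITTEN	(1U << 31)		/* unwritten extent flag */
#define RMAP_LEN_MASK		(~RMAP_LEN_UNWRITTEN)

/* Hypothetical in-core flag bits; the real values are defined elsewhere in the series. */
#define OI_ATTR_FORK	0x1u
#define OI_BMBT_BLOCK	0x2u

/* Fold the in-core flags into the top two bits of the 64-bit offset field. */
static inline uint64_t rmap_encode_offset(uint64_t offset, unsigned int flags)
{
	uint64_t r = offset & RMAP_OFF_MASK;

	if (flags & OI_ATTR_FORK)
		r |= RMAP_OFF_ATTR;
	if (flags & OI_BMBT_BLOCK)
		r |= RMAP_OFF_BMBT;
	return r;
}

/* Split an encoded offset field back into the plain offset and the flags. */
static inline void rmap_decode_offset(uint64_t field, uint64_t *offset,
				      unsigned int *flags)
{
	*offset = field & RMAP_OFF_MASK;
	*flags = 0;
	if (field & RMAP_OFF_ATTR)
		*flags |= OI_ATTR_FORK;
	if (field & RMAP_OFF_BMBT)
		*flags |= OI_BMBT_BLOCK;
}

/* Extent length with the unwritten flag stripped (cf. XFS_RMAP_LEN). */
static inline uint32_t rmap_len(uint32_t blockcount)
{
	return blockcount & RMAP_LEN_MASK;
}
```

Keeping the flags inside the offset and blockcount fields means the record does not need a separate flags word, at the cost of limiting offsets to 62 bits and extent lengths to 31 bits.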
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index ff24083..f0cf383 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1369,6 +1369,8 @@ XFS_RMAP_INO_OWNER(
#define XFS_RMAP_OWN_INODES (-7ULL) /* Inode chunk */
#define XFS_RMAP_OWN_MIN (-8ULL) /* guard */
+#define XFS_RMAP_NON_INODE_OWNER(owner) (!!((owner) & (1ULL << 63)))
+
/*
* Data record structure
*/
@@ -1376,12 +1378,44 @@ struct xfs_rmap_rec {
__be32 rm_startblock; /* extent start block */
__be32 rm_blockcount; /* extent length */
__be64 rm_owner; /* extent owner */
+ __be64 rm_offset; /* offset within the owner */
};
+/*
+ * rmap btree record
+ * rm_blockcount:31 is the unwritten extent flag (same as l0:63 in bmbt)
+ * rm_blockcount:0-30 are the extent length
+ * rm_offset:63 is the attribute fork flag
+ * rm_offset:62 is the bmbt block flag
+ * rm_offset:0-61 is the block offset within the inode
+ */
+#define XFS_RMAP_OFF_ATTR ((__uint64_t)1ULL << 63)
+#define XFS_RMAP_OFF_BMBT ((__uint64_t)1ULL << 62)
+#define XFS_RMAP_LEN_UNWRITTEN ((xfs_extlen_t)1U << 31)
+
+#define XFS_RMAP_OFF_MASK ~(XFS_RMAP_OFF_ATTR | XFS_RMAP_OFF_BMBT)
+#define XFS_RMAP_LEN_MASK ~XFS_RMAP_LEN_UNWRITTEN
+
+#define XFS_RMAP_OFF(off) ((off) & XFS_RMAP_OFF_MASK)
+#define XFS_RMAP_LEN(len) ((len) & XFS_RMAP_LEN_MASK)
+
+#define XFS_RMAP_IS_BMBT(off) (!!((off) & XFS_RMAP_OFF_BMBT))
+#define XFS_RMAP_IS_ATTR_FORK(off) (!!((off) & XFS_RMAP_OFF_ATTR))
+#define XFS_RMAP_IS_UNWRITTEN(len) (!!((len) & XFS_RMAP_LEN_UNWRITTEN))
+
+#define RMAPBT_STARTBLOCK_BITLEN 32
+#define RMAPBT_EXNTFLAG_BITLEN 1
+#define RMAPBT_BLOCKCOUNT_BITLEN 31
+#define RMAPBT_OWNER_BITLEN 64
+#define RMAPBT_ATTRFLAG_BITLEN 1
+#define RMAPBT_BMBTFLAG_BITLEN 1
+#define RMAPBT_OFFSET_BITLEN 62
+
struct xfs_rmap_irec {
xfs_agblock_t rm_startblock; /* extent start block */
xfs_extlen_t rm_blockcount; /* extent length */
__uint64_t rm_owner; /* extent owner */
+ __uint64_t rm_offset; /* offset within the owner */
};
/*
@@ -1391,6 +1425,8 @@ struct xfs_rmap_irec {
*/
struct xfs_rmap_key {
__be32 rm_startblock; /* extent start block */
+ __be64 rm_owner; /* extent owner */
+ __be64 rm_offset; /* offset within the owner */
};
/* btree pointer type */
@@ -1401,6 +1437,38 @@ typedef __be32 xfs_rmap_ptr_t;
XFS_FIBT_BLOCK(mp) + 1 : \
XFS_IBT_BLOCK(mp) + 1)
+static inline void
+xfs_owner_info_unpack(
+ struct xfs_owner_info *oinfo,
+ uint64_t *owner,
+ uint64_t *offset)
+{
+ __uint64_t r;
+
+ *owner = oinfo->oi_owner;
+ r = oinfo->oi_offset;
+ if (oinfo->oi_flags & XFS_RMAP_INO_ATTR_FORK)
+ r |= XFS_RMAP_OFF_ATTR;
+ if (oinfo->oi_flags & XFS_RMAP_BMBT_BLOCK)
+ r |= XFS_RMAP_OFF_BMBT;
+ *offset = r;
+}
+
+static inline void
+xfs_owner_info_pack(
+ struct xfs_owner_info *oinfo,
+ uint64_t owner,
+ uint64_t offset)
+{
+ oinfo->oi_owner = owner;
+ oinfo->oi_offset = XFS_RMAP_OFF(offset);
+ oinfo->oi_flags = 0;
+ if (XFS_RMAP_IS_ATTR_FORK(offset))
+ oinfo->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
+ if (XFS_RMAP_IS_BMBT(offset))
+ oinfo->oi_flags |= XFS_RMAP_BMBT_BLOCK;
+}
+
/*
* BMAP Btree format definitions
*
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 9a02699..5671771 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -63,7 +63,7 @@ xfs_rmapbt_verify(
* from the on disk AGF. Again, we can only check against maximum limits
* in this case.
*/
- if (block->bb_magic!= cpu_to_be32(XFS_RMAP_CRC_MAGIC))
+ if (block->bb_magic != cpu_to_be32(XFS_RMAP_CRC_MAGIC))
return false;
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
* [PATCH 12/58] xfs: add rmap btree growfs support
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (10 preceding siblings ...)
2015-10-07 4:55 ` [PATCH 11/58] xfs: enhance " Darrick J. Wong
@ 2015-10-07 4:56 ` Darrick J. Wong
2015-10-07 4:56 ` [PATCH 13/58] xfs: enhance " Darrick J. Wong
` (45 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:56 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Now that we can read and write rmap btree blocks, we can add support to
the growfs code to initialise new rmap btree blocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_fsops.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
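
To make the initial contents of a new AG's rmap root block easier to picture, here is a minimal host-endian sketch of the four records the growfs code writes. It is not the kernel code: the struct, the enum and rmap_root_init() are hypothetical, the owners are symbolic placeholders rather than the on-disk XFS_RMAP_OWN_* values, and the caller supplies the block numbers that XFS_BNO_BLOCK(), XFS_IBT_BLOCK() and XFS_RMAP_BLOCK() would provide.

```c
#include <stdint.h>

/* Symbolic placeholders, not the on-disk XFS_RMAP_OWN_* values. */
enum rmap_owner { OWN_FS, OWN_AG, OWN_INOBT };

/* Host-endian stand-in for struct xfs_rmap_rec (no byte swapping here). */
struct rmap_rec_sketch {
	uint32_t	startblock;
	uint32_t	blockcount;
	enum rmap_owner	owner;
};

/*
 * Fill in the four records for a freshly grown AG and return how many
 * were written, in the same order as the patch.
 */
static inline unsigned int
rmap_root_init(struct rmap_rec_sketch recs[4], uint32_t bno_block,
	       uint32_t ibt_block, uint32_t rmap_block)
{
	unsigned int n = 0;

	/* AG headers: everything before the by-block freespace btree root */
	recs[n++] = (struct rmap_rec_sketch){ 0, bno_block, OWN_FS };
	/* the two freespace btree roots (bnobt and cntbt) */
	recs[n++] = (struct rmap_rec_sketch){ bno_block, 2, OWN_AG };
	/* inode btree root(s), up to but not including the rmap root */
	recs[n++] = (struct rmap_rec_sketch){ ibt_block,
					      rmap_block - ibt_block, OWN_INOBT };
	/* the rmap btree root itself */
	recs[n++] = (struct rmap_rec_sketch){ rmap_block, 1, OWN_AG };
	return n;
}
```

With illustrative values bno_block = 1, ibt_block = 3 and rmap_block = 5, the records cover blocks 0 through 5 with no gaps, which is the property the static metadata accounting relies on.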
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index b3d1665..38c78da 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -32,6 +32,7 @@
#include "xfs_btree.h"
#include "xfs_alloc_btree.h"
#include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
#include "xfs_ialloc.h"
#include "xfs_fsops.h"
#include "xfs_itable.h"
@@ -243,6 +244,12 @@ xfs_growfs_data_private(
agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+ agf->agf_roots[XFS_BTNUM_RMAPi] =
+ cpu_to_be32(XFS_RMAP_BLOCK(mp));
+ agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
+ }
+
agf->agf_flfirst = 0;
agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
agf->agf_flcount = 0;
@@ -382,6 +389,67 @@ xfs_growfs_data_private(
if (error)
goto error0;
+ /* RMAP btree root block */
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+ struct xfs_rmap_rec *rrec;
+ struct xfs_btree_block *block;
+ bp = xfs_growfs_get_hdr_buf(mp,
+ XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
+ BTOBB(mp->m_sb.sb_blocksize), 0,
+ &xfs_rmapbt_buf_ops);
+ if (!bp) {
+ error = -ENOMEM;
+ goto error0;
+ }
+
+ xfs_btree_init_block(mp, bp, XFS_RMAP_CRC_MAGIC, 0, 2,
+ agno, XFS_BTREE_CRC_BLOCKS);
+ block = XFS_BUF_TO_BLOCK(bp);
+
+
+ /*
+ * mark the AG header regions as static metadata. The BNO
+ * btree block is the first block after the headers, so
+ * its location defines the size of the region that the static
+ * metadata consumes.
+ *
+ * Note: unlike mkfs, we never have to account for log
+ * space when growing the data regions
+ */
+ rrec = XFS_RMAP_REC_ADDR(block, 1);
+ rrec->rm_startblock = 0;
+ rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
+ rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+ be16_add_cpu(&block->bb_numrecs, 1);
+
+ /* account freespace btree root blocks */
+ rrec = XFS_RMAP_REC_ADDR(block, 2);
+ rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
+ rrec->rm_blockcount = cpu_to_be32(2);
+ rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+ be16_add_cpu(&block->bb_numrecs, 1);
+
+ /* account inode btree root blocks */
+ rrec = XFS_RMAP_REC_ADDR(block, 3);
+ rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
+ rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
+ XFS_IBT_BLOCK(mp));
+ rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+ be16_add_cpu(&block->bb_numrecs, 1);
+
+ /* account for rmap btree root */
+ rrec = XFS_RMAP_REC_ADDR(block, 4);
+ rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
+ rrec->rm_blockcount = cpu_to_be32(1);
+ rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+ be16_add_cpu(&block->bb_numrecs, 1);
+
+ error = xfs_bwrite(bp);
+ xfs_buf_relse(bp);
+ if (error)
+ goto error0;
+ }
+
/*
* INO btree root block
*/
* [PATCH 13/58] xfs: enhance rmap btree growfs support
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (11 preceding siblings ...)
2015-10-07 4:56 ` [PATCH 12/58] xfs: add rmap btree growfs support Darrick J. Wong
@ 2015-10-07 4:56 ` Darrick J. Wong
2015-10-07 4:56 ` [PATCH 14/58] xfs: rmap btree transaction reservations Darrick J. Wong
` (44 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:56 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Fill out the rmap offset fields.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_fsops.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 38c78da..119be0a 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -393,6 +393,7 @@ xfs_growfs_data_private(
if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
struct xfs_rmap_rec *rrec;
struct xfs_btree_block *block;
+
bp = xfs_growfs_get_hdr_buf(mp,
XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
BTOBB(mp->m_sb.sb_blocksize), 0,
@@ -402,7 +403,7 @@ xfs_growfs_data_private(
goto error0;
}
- xfs_btree_init_block(mp, bp, XFS_RMAP_CRC_MAGIC, 0, 2,
+ xfs_btree_init_block(mp, bp, XFS_RMAP_CRC_MAGIC, 0, 0,
agno, XFS_BTREE_CRC_BLOCKS);
block = XFS_BUF_TO_BLOCK(bp);
@@ -420,6 +421,7 @@ xfs_growfs_data_private(
rrec->rm_startblock = 0;
rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+ rrec->rm_offset = 0;
be16_add_cpu(&block->bb_numrecs, 1);
/* account freespace btree root blocks */
@@ -427,6 +429,7 @@ xfs_growfs_data_private(
rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
rrec->rm_blockcount = cpu_to_be32(2);
rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+ rrec->rm_offset = 0;
be16_add_cpu(&block->bb_numrecs, 1);
/* account inode btree root blocks */
@@ -435,13 +438,15 @@ xfs_growfs_data_private(
rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
XFS_IBT_BLOCK(mp));
rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+ rrec->rm_offset = 0;
be16_add_cpu(&block->bb_numrecs, 1);
- /* account for rmap btree root */
+ /* account for rmap btree root */
rrec = XFS_RMAP_REC_ADDR(block, 4);
rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
rrec->rm_blockcount = cpu_to_be32(1);
rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+ rrec->rm_offset = 0;
be16_add_cpu(&block->bb_numrecs, 1);
error = xfs_bwrite(bp);
* [PATCH 14/58] xfs: rmap btree transaction reservations
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (12 preceding siblings ...)
2015-10-07 4:56 ` [PATCH 13/58] xfs: enhance " Darrick J. Wong
@ 2015-10-07 4:56 ` Darrick J. Wong
2015-10-07 4:56 ` [PATCH 15/58] xfs: rmap btree requires more reserved free space Darrick J. Wong
` (43 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:56 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
that this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.
Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_trans_resv.c | 56 ++++++++++++++++++++++++++++------------
fs/xfs/libxfs/xfs_trans_resv.h | 10 -------
2 files changed, 39 insertions(+), 27 deletions(-)
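
The reservation formula introduced here, num trees * ((2 blocks/level * max depth) - 1) per operation, is easy to check by hand. The sketch below restates xfs_allocfree_log_count() as a plain function; the name allocfree_log_count() and the explicit has_rmapbt parameter are stand-ins for the mount structure lookups in the real code.

```c
#include <stdbool.h>

/*
 * Number of btree blocks to reserve for 'num_ops' extent alloc/free
 * operations: two freespace btrees, plus the rmap btree when enabled,
 * each needing (2 * max depth - 1) blocks per operation.
 */
static inline unsigned int
allocfree_log_count(unsigned int ag_maxlevels, bool has_rmapbt,
		    unsigned int num_ops)
{
	unsigned int num_trees = has_rmapbt ? 3 : 2;

	return num_ops * num_trees * (2 * ag_maxlevels - 1);
}
```

For example, with a maximum AG btree depth of 5, one operation reserves 18 blocks without rmap and 27 with it, so enabling rmap raises every allocfree-based reservation by half.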
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 68cb1e7..d495f82 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -64,6 +64,28 @@ xfs_calc_buf_res(
}
/*
+ * Per-extent log reservation for the allocation btree changes
+ * involved in freeing or allocating an extent. When rmap is not enabled,
+ * there are only two trees that will be modified (free space trees), and when
+ * rmap is enabled there will be three (freespace + rmap trees). The number of
+ * blocks reserved is based on the formula:
+ *
+ * num trees * ((2 blocks/level * max depth) - 1)
+ */
+static uint
+xfs_allocfree_log_count(
+ struct xfs_mount *mp,
+ uint num_ops)
+{
+ uint num_trees = 2;
+
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+ num_trees++;
+
+ return num_ops * num_trees * (2 * mp->m_ag_maxlevels - 1);
+}
+
+/*
* Logging inodes is really tricksy. They are logged in memory format,
* which means that what we write into the log doesn't directly translate into
* the amount of space they use on disk.
@@ -126,7 +148,7 @@ xfs_calc_inode_res(
*/
STATIC uint
xfs_calc_finobt_res(
- struct xfs_mount *mp,
+ struct xfs_mount *mp,
int alloc,
int modify)
{
@@ -137,7 +159,7 @@ xfs_calc_finobt_res(
res = xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1));
if (alloc)
- res += xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ res += xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1));
if (modify)
res += (uint)XFS_FSB_TO_B(mp, 1);
@@ -188,10 +210,10 @@ xfs_calc_write_reservation(
xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
XFS_FSB_TO_B(mp, 1)) +
xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
XFS_FSB_TO_B(mp, 1))));
}
@@ -217,10 +239,10 @@ xfs_calc_itruncate_reservation(
xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1,
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
XFS_FSB_TO_B(mp, 1)) +
xfs_calc_buf_res(5, 0) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1)) +
xfs_calc_buf_res(2 + mp->m_ialloc_blks +
mp->m_in_maxlevels, 0)));
@@ -247,7 +269,7 @@ xfs_calc_rename_reservation(
xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 3),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
XFS_FSB_TO_B(mp, 1))));
}
@@ -286,7 +308,7 @@ xfs_calc_link_reservation(
xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1))));
}
@@ -324,7 +346,7 @@ xfs_calc_remove_reservation(
xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
XFS_FSB_TO_B(mp, 1))));
}
@@ -371,7 +393,7 @@ xfs_calc_create_resv_alloc(
mp->m_sb.sb_sectsize +
xfs_calc_buf_res(mp->m_ialloc_blks, XFS_FSB_TO_B(mp, 1)) +
xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1));
}
@@ -399,7 +421,7 @@ xfs_calc_icreate_resv_alloc(
return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
mp->m_sb.sb_sectsize +
xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1)) +
xfs_calc_finobt_res(mp, 0, 0);
}
@@ -483,7 +505,7 @@ xfs_calc_ifree_reservation(
xfs_calc_buf_res(1, 0) +
xfs_calc_buf_res(2 + mp->m_ialloc_blks +
mp->m_in_maxlevels, 0) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1)) +
xfs_calc_finobt_res(mp, 0, 1);
}
@@ -513,7 +535,7 @@ xfs_calc_growdata_reservation(
struct xfs_mount *mp)
{
return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1));
}
@@ -535,7 +557,7 @@ xfs_calc_growrtalloc_reservation(
xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
XFS_FSB_TO_B(mp, 1)) +
xfs_calc_inode_res(mp, 1) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1));
}
@@ -611,7 +633,7 @@ xfs_calc_addafork_reservation(
xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
XFS_FSB_TO_B(mp, 1)) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
XFS_FSB_TO_B(mp, 1));
}
@@ -634,7 +656,7 @@ xfs_calc_attrinval_reservation(
xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
XFS_FSB_TO_B(mp, 1))),
(xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
XFS_FSB_TO_B(mp, 1))));
}
@@ -701,7 +723,7 @@ xfs_calc_attrrm_reservation(
XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
(xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
- xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+ xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
XFS_FSB_TO_B(mp, 1))));
}
diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
index 7978150..0eb46ed 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.h
+++ b/fs/xfs/libxfs/xfs_trans_resv.h
@@ -68,16 +68,6 @@ struct xfs_trans_resv {
#define M_RES(mp) (&(mp)->m_resv)
/*
- * Per-extent log reservation for the allocation btree changes
- * involved in freeing or allocating an extent.
- * 2 trees * (2 blocks/level * max depth - 1) * block size
- */
-#define XFS_ALLOCFREE_LOG_RES(mp,nx) \
- ((nx) * (2 * XFS_FSB_TO_B((mp), 2 * (mp)->m_ag_maxlevels - 1)))
-#define XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
- ((nx) * (2 * (2 * (mp)->m_ag_maxlevels - 1)))
-
-/*
* Per-directory log reservation for any directory change.
* dir blocks: (1 btree block per level + data block + free block) * dblock size
* bmap btree: (levels + 2) * max depth * block size
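As a worked illustration of the reservation formula in this patch, here
is a standalone sketch that reproduces the block count outside the
kernel. The names demo_mount and demo_allocfree_log_count are
hypothetical stand-ins for struct xfs_mount and
xfs_allocfree_log_count(); only the arithmetic is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of mount state the formula reads. */
struct demo_mount {
	unsigned int	ag_maxlevels;	/* mirrors mp->m_ag_maxlevels */
	bool		has_rmapbt;	/* mirrors xfs_sb_version_hasrmapbt() */
};

/* num_ops * num_trees * ((2 blocks/level * max depth) - 1) */
static unsigned int
demo_allocfree_log_count(
	const struct demo_mount	*mp,
	unsigned int		num_ops)
{
	unsigned int		num_trees = 2;	/* by-block + by-size freespace */

	if (mp->has_rmapbt)
		num_trees++;			/* plus the rmap btree */

	return num_ops * num_trees * (2 * mp->ag_maxlevels - 1);
}
```

With ag_maxlevels = 5, a single extent operation works out to 2 * 9 = 18
blocks without rmap and 3 * 9 = 27 blocks with it.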
* [PATCH 15/58] xfs: rmap btree requires more reserved free space
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (13 preceding siblings ...)
2015-10-07 4:56 ` [PATCH 14/58] xfs: rmap btree transaction reservations Darrick J. Wong
@ 2015-10-07 4:56 ` Darrick J. Wong
2015-10-07 4:56 ` [PATCH 16/58] libxfs: fix min freelist length calculation Darrick J. Wong
` (42 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:56 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
(2 * max AG btree depth) - 1 blocks *in each AG* to be placed on the AGFL
at ENOSPC. Update the various space calculation functions to handle this.
Also, because the macros now execute conditional code and are called quite
frequently, convert them to functions that initialise variables in the struct
xfs_mount, use the new variables everywhere and document the calculations
better.
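To make the two calculations concrete, here is a minimal userspace
sketch. It assumes the per-AG rmap figure described above,
(2 * maxlevels) - 1, and a simplified geometry struct; demo_geom and the
demo_* functions are hypothetical and do not exist in the kernel. The
header rounding assumes the four sector-sized AG headers are rounded up
to whole filesystem blocks.

```c
#include <stdbool.h>

#define DEMO_AGFL_RESERVE	4	/* same value as XFS_ALLOC_AGFL_RESERVE */

/* Hypothetical, simplified view of the geometry the calculations need. */
struct demo_geom {
	unsigned int	agcount;
	unsigned int	agblocks;
	unsigned int	blocksize;
	unsigned int	sectsize;
	unsigned int	ag_maxlevels;
	bool		has_finobt;
	bool		has_rmapbt;
};

/* Blocks consumed by a full-height rmap btree split in one AG. */
static unsigned int
demo_rmap_split_blocks(const struct demo_geom *g)
{
	return 2 * g->ag_maxlevels - 1;
}

/* Filesystem-wide blocks withheld from delayed allocation. */
static unsigned int
demo_alloc_set_aside(const struct demo_geom *g)
{
	unsigned int	blocks = 4 + g->agcount * DEMO_AGFL_RESERVE;

	if (g->has_rmapbt)
		blocks += g->agcount * demo_rmap_split_blocks(g);
	return blocks;
}

/* Largest extent a single AG can hand out. */
static unsigned int
demo_ag_max_usable(const struct demo_geom *g)
{
	/* four sector-sized AG headers, rounded up to whole blocks */
	unsigned int	blocks =
		(4 * g->sectsize + g->blocksize - 1) / g->blocksize;

	blocks += DEMO_AGFL_RESERVE;
	blocks += 3;		/* bno, cnt and inode btree root blocks */
	if (g->has_finobt)
		blocks++;	/* finobt root block */
	if (g->has_rmapbt)
		blocks += 1 + demo_rmap_split_blocks(g); /* root + split */
	return g->agblocks - blocks;
}
```

For a filesystem without finobt or rmapbt and with 512 byte sectors in
4096 byte blocks, this reduces to agblocks - 8, which matches the old
XFS_ALLOC_AG_MAX_USABLE() macro (one header block plus 7).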
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_alloc.c | 69 +++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_alloc.h | 41 +++------------------------
fs/xfs/libxfs/xfs_bmap.c | 2 +
fs/xfs/libxfs/xfs_sb.c | 2 +
fs/xfs/xfs_discard.c | 2 +
fs/xfs/xfs_fsops.c | 4 +--
fs/xfs/xfs_mount.c | 2 +
fs/xfs/xfs_mount.h | 2 +
fs/xfs/xfs_super.c | 2 +
9 files changed, 84 insertions(+), 42 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3c55fa7..6fd9d3e 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -62,6 +62,72 @@ xfs_prealloc_blocks(
}
/*
+ * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
+ * AGF buffer (PV 947395), we place constraints on the relationship among actual
+ * allocations for data blocks, freelist blocks, and potential file data bmap
+ * btree blocks. However, these restrictions may result in no actual space
+ * allocated for a delayed extent, for example, a data block in a certain AG is
+ * allocated but there is no additional block for the additional bmap btree
+ * block due to a split of the bmap btree of the file. The result of this may
+ * lead to an infinite loop when the file gets flushed to disk and all delayed
+ * extents need to be actually allocated. To get around this, we explicitly set
+ * aside a few blocks which will not be reserved in delayed allocation.
+ *
+ * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
+ * not using rmap btrees, and a potential split of a file's bmap btree requires
+ * 1 fsb, so we set the number of set-aside blocks to 4 + 4*agcount when not
+ * using rmap btrees.
+ *
+ * When rmap btrees are active, we have to consider that using the last block in
+ * the AG can cause a full height rmap btree split and we need enough blocks on
+ * the AGFL to be able to handle this. That means we have, in addition to the
+ * above consideration, another (2 * mp->m_ag_maxlevels) - 1 blocks required to be
+ * available to the free list.
+ */
+unsigned int
+xfs_alloc_set_aside(
+ struct xfs_mount *mp)
+{
+ unsigned int blocks;
+
+ blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return blocks;
+ return blocks + (mp->m_sb.sb_agcount * ((2 * mp->m_ag_maxlevels) - 1));
+}
+
+/*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size of the AG. However, we cannot use all
+ * the blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ * - the AG superblock, AGF, AGI and AGFL
+ * - the AGF (bno and cnt) and AGI btree root blocks, and optionally
+ * the AGI free inode and rmap btree root blocks.
+ * - blocks on the AGFL according to xfs_alloc_set_aside() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+unsigned int
+xfs_alloc_ag_max_usable(struct xfs_mount *mp)
+{
+ unsigned int blocks;
+
+ blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
+ blocks += XFS_ALLOC_AGFL_RESERVE;
+ blocks += 3; /* AGF, AGI btree root blocks */
+ if (xfs_sb_version_hasfinobt(&mp->m_sb))
+ blocks++; /* finobt root block */
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+ /* rmap root block + full tree split on full AG */
+ blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
+ }
+
+ return mp->m_sb.sb_agblocks - blocks;
+}
+
+/*
* Lookup the record equal to [bno, len] in the btree given by cur.
*/
STATIC int /* error */
@@ -1911,6 +1977,9 @@ xfs_alloc_min_freelist(
/* space needed by-size freespace btree */
min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
mp->m_ag_maxlevels);
+ /* space needed reverse mapping used space btree */
+ min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+ mp->m_ag_maxlevels);
return min_free;
}
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 95b161f..67e564e 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -56,42 +56,6 @@ typedef unsigned int xfs_alloctype_t;
#define XFS_ALLOC_FLAG_FREEING 0x00000002 /* indicate caller is freeing extents*/
/*
- * In order to avoid ENOSPC-related deadlock caused by
- * out-of-order locking of AGF buffer (PV 947395), we place
- * constraints on the relationship among actual allocations for
- * data blocks, freelist blocks, and potential file data bmap
- * btree blocks. However, these restrictions may result in no
- * actual space allocated for a delayed extent, for example, a data
- * block in a certain AG is allocated but there is no additional
- * block for the additional bmap btree block due to a split of the
- * bmap btree of the file. The result of this may lead to an
- * infinite loop in xfssyncd when the file gets flushed to disk and
- * all delayed extents need to be actually allocated. To get around
- * this, we explicitly set aside a few blocks which will not be
- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks
- * to 4 + 4*agcount.
- */
-#define XFS_ALLOC_SET_ASIDE(mp) (4 + ((mp)->m_sb.sb_agcount * 4))
-
-/*
- * When deciding how much space to allocate out of an AG, we limit the
- * allocation maximum size to the size the AG. However, we cannot use all the
- * blocks in the AG - some are permanently used by metadata. These
- * blocks are generally:
- * - the AG superblock, AGF, AGI and AGFL
- * - the AGF (bno and cnt) and AGI btree root blocks
- * - 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
- *
- * The AG headers are sector sized, so the amount of space they take up is
- * dependent on filesystem geometry. The others are all single blocks.
- */
-#define XFS_ALLOC_AG_MAX_USABLE(mp) \
- ((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
-
-
-/*
* Argument structure for xfs_alloc routines.
* This is turned into a structure to avoid having 20 arguments passed
* down several levels of the stack.
@@ -131,6 +95,11 @@ typedef struct xfs_alloc_arg {
#define XFS_ALLOC_USERDATA 1 /* allocation is for user data*/
#define XFS_ALLOC_INITIAL_USER_DATA 2 /* special case start of file */
+/* freespace limit calculations */
+#define XFS_ALLOC_AGFL_RESERVE 4
+unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
+unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
+
xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
struct xfs_perag *pag, xfs_extlen_t need);
unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index fef3767..e740ef5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3716,7 +3716,7 @@ xfs_bmap_btalloc(
args.fsbno = ap->blkno;
/* Trim the allocation back to the maximum an AG can fit. */
- args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
+ args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
args.firstblock = *ap->firstblock;
blen = 0;
if (nullfb) {
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 42b5696..0103a75 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -732,6 +732,8 @@ xfs_sb_mount_common(
mp->m_ialloc_min_blks = sbp->sb_spino_align;
else
mp->m_ialloc_min_blks = mp->m_ialloc_blks;
+ mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+ mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
}
/*
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index e85a951..ec7bb8b 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -179,7 +179,7 @@ xfs_ioc_trim(
* matter as trimming blocks is an advisory interface.
*/
if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) ||
- range.minlen > XFS_FSB_TO_B(mp, XFS_ALLOC_AG_MAX_USABLE(mp)) ||
+ range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) ||
range.len < mp->m_sb.sb_blocksize)
return -EINVAL;
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 119be0a..031dd92 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -723,7 +723,7 @@ xfs_fs_counts(
cnt->allocino = percpu_counter_read_positive(&mp->m_icount);
cnt->freeino = percpu_counter_read_positive(&mp->m_ifree);
cnt->freedata = percpu_counter_read_positive(&mp->m_fdblocks) -
- XFS_ALLOC_SET_ASIDE(mp);
+ mp->m_alloc_set_aside;
spin_lock(&mp->m_sb_lock);
cnt->freertx = mp->m_sb.sb_frextents;
@@ -796,7 +796,7 @@ retry:
__int64_t free;
free = percpu_counter_sum(&mp->m_fdblocks) -
- XFS_ALLOC_SET_ASIDE(mp);
+ mp->m_alloc_set_aside;
if (!free)
goto out; /* ENOSPC and fdblks_delta = 0 */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 1b2f72c..88da908 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1196,7 +1196,7 @@ xfs_mod_fdblocks(
batch = XFS_FDBLOCKS_BATCH;
__percpu_counter_add(&mp->m_fdblocks, delta, batch);
- if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
+ if (__percpu_counter_compare(&mp->m_fdblocks, mp->m_alloc_set_aside,
XFS_FDBLOCKS_BATCH) >= 0) {
/* we had space! */
return 0;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 8030627..cdced0b 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -96,6 +96,8 @@ typedef struct xfs_mount {
uint m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
uint m_in_maxlevels; /* max inobt btree levels. */
xfs_extlen_t m_ag_prealloc_blocks; /* reserved ag blocks */
+ uint m_alloc_set_aside; /* space we can't use */
+ uint m_ag_max_usable; /* max space per AG */
struct radix_tree_root m_perag_tree; /* per-ag accounting info */
spinlock_t m_perag_lock; /* lock for m_perag_tree */
struct mutex m_growlock; /* growfs mutex */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 904f637..6dc5993 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1072,7 +1072,7 @@ xfs_fs_statfs(
statp->f_blocks = sbp->sb_dblocks - lsize;
spin_unlock(&mp->m_sb_lock);
- statp->f_bfree = fdblocks - XFS_ALLOC_SET_ASIDE(mp);
+ statp->f_bfree = fdblocks - mp->m_alloc_set_aside;
statp->f_bavail = statp->f_bfree;
fakeinos = statp->f_bfree << sbp->sb_inopblog;
* [PATCH 16/58] libxfs: fix min freelist length calculation
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (14 preceding siblings ...)
2015-10-07 4:56 ` [PATCH 15/58] xfs: rmap btree requires more reserved free space Darrick J. Wong
@ 2015-10-07 4:56 ` Darrick J. Wong
2015-10-07 4:56 ` [PATCH 17/58] xfs: add rmap btree operations Darrick J. Wong
` (41 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:56 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
for the rmapbt (due to the + 1); the entire clause needs to be gated
on the feature flag.
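The effect of the gate can be seen in a small standalone model of the
calculation. This is only a sketch: demo_min_freelist() is a
hypothetical function that takes the btree heights as plain arguments
instead of reading them from the perag structure, and it assumes the
by-block freespace btree is counted the same way as the by-size one.

```c
#include <stdbool.h>

static unsigned int
demo_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model of the minimum AGFL length: one block per level plus
 * a new root for each btree that takes its blocks from the free list,
 * capped at the maximum btree height.
 */
static unsigned int
demo_min_freelist(
	unsigned int	bno_levels,
	unsigned int	cnt_levels,
	unsigned int	rmap_levels,
	unsigned int	ag_maxlevels,
	bool		has_rmapbt)
{
	unsigned int	min_free;

	min_free = demo_min(bno_levels + 1, ag_maxlevels);
	min_free += demo_min(cnt_levels + 1, ag_maxlevels);
	/* Without this gate, rmap_levels == 0 would still add one block. */
	if (has_rmapbt)
		min_free += demo_min(rmap_levels + 1, ag_maxlevels);
	return min_free;
}
```

With single-level freespace btrees and no rmapbt, the gated version asks
for 4 blocks; the ungated clause would have asked for 5.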
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_alloc.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6fd9d3e..d7b9d43 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1978,8 +1978,10 @@ xfs_alloc_min_freelist(
min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
mp->m_ag_maxlevels);
/* space needed reverse mapping used space btree */
- min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
- mp->m_ag_maxlevels);
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+ min_free += min_t(unsigned int,
+ pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+ mp->m_ag_maxlevels);
return min_free;
}
* [PATCH 17/58] xfs: add rmap btree operations
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (15 preceding siblings ...)
2015-10-07 4:56 ` [PATCH 16/58] libxfs: fix min freelist length calculation Darrick J. Wong
@ 2015-10-07 4:56 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 18/58] xfs: enhance " Darrick J. Wong
` (40 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:56 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, Dave Chinner, xfs
From: Dave Chinner <dchinner@redhat.com>
Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_btree.h | 1
fs/xfs/libxfs/xfs_rmap.c | 58 +++++++++++
fs/xfs/libxfs/xfs_rmap_btree.c | 204 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 263 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 67e2bbd..48ab2b1 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -204,6 +204,7 @@ typedef struct xfs_btree_cur
xfs_alloc_rec_incore_t a;
xfs_bmbt_irec_t b;
xfs_inobt_rec_incore_t i;
+ struct xfs_rmap_irec r;
} bc_rec; /* current insert/search record value */
struct xfs_buf *bc_bufs[XFS_BTREE_MAXLEVELS]; /* buf ptr per level */
int bc_ptrs[XFS_BTREE_MAXLEVELS]; /* key/record # */
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 3e17294..64b2525 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -36,6 +36,64 @@
#include "xfs_error.h"
#include "xfs_extent_busy.h"
+/*
+ * Lookup the first record less than or equal to [bno, len]
+ * in the btree given by cur.
+ */
+STATIC int
+xfs_rmap_lookup_le(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t bno,
+ xfs_extlen_t len,
+ uint64_t owner,
+ int *stat)
+{
+ cur->bc_rec.r.rm_startblock = bno;
+ cur->bc_rec.r.rm_blockcount = len;
+ cur->bc_rec.r.rm_owner = owner;
+ return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+}
+
+/*
+ * Update the record referred to by cur to the value given
+ * by [bno, len, ref].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_rmap_update(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *irec)
+{
+ union xfs_btree_rec rec;
+
+ rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
+ rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
+ rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
+ return xfs_btree_update(cur, &rec);
+}
+
+/*
+ * Get the data from the pointed-to record.
+ */
+STATIC int
+xfs_rmap_get_rec(
+ struct xfs_btree_cur *cur,
+ struct xfs_rmap_irec *irec,
+ int *stat)
+{
+ union xfs_btree_rec *rec;
+ int error;
+
+ error = xfs_btree_get_rec(cur, &rec, stat);
+ if (error || !*stat)
+ return error;
+
+ irec->rm_startblock = be32_to_cpu(rec->rmap.rm_startblock);
+ irec->rm_blockcount = be32_to_cpu(rec->rmap.rm_blockcount);
+ irec->rm_owner = be64_to_cpu(rec->rmap.rm_owner);
+ return 0;
+}
+
int
xfs_rmap_free(
struct xfs_trans *tp,
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 5671771..58bdac3 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -34,6 +34,26 @@
#include "xfs_error.h"
#include "xfs_extent_busy.h"
+/*
+ * Reverse map btree.
+ *
+ * This is a per-ag tree used to track the owner of a given extent. Owner
+ * records are inserted when an extent is allocated, and removed when an extent
+ * is freed. There can only be one owner of an extent, usually an inode or some
+ * other metadata structure like a AG btree.
+ *
+ * The rmap btree is part of the free space management, so blocks for the tree
+ * are sourced from the agfl. Hence we need transaction reservation support for
+ * this tree so that the freelist is always large enough. This also impacts on
+ * the minimum space we need to leave free in the AG.
+ *
+ * The tree is ordered by block number - there's no need to order/search by
+ * extent size for online updating/management of the tree, and the reverse
+ * lookups are going to be "who owns this block" and so are by-block ordering is
+ * perfect for this.
+ *
+ */
+
static struct xfs_btree_cur *
xfs_rmapbt_dup_cursor(
struct xfs_btree_cur *cur)
@@ -42,6 +62,153 @@ xfs_rmapbt_dup_cursor(
cur->bc_private.a.agbp, cur->bc_private.a.agno);
}
+STATIC void
+xfs_rmapbt_set_root(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_ptr *ptr,
+ int inc)
+{
+ struct xfs_buf *agbp = cur->bc_private.a.agbp;
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp);
+ xfs_agnumber_t seqno = be32_to_cpu(agf->agf_seqno);
+ int btnum = cur->bc_btnum;
+ struct xfs_perag *pag = xfs_perag_get(cur->bc_mp, seqno);
+
+ ASSERT(ptr->s != 0);
+
+ agf->agf_roots[btnum] = ptr->s;
+ be32_add_cpu(&agf->agf_levels[btnum], inc);
+ pag->pagf_levels[btnum] += inc;
+ xfs_perag_put(pag);
+
+ xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+}
+
+STATIC int
+xfs_rmapbt_alloc_block(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_ptr *start,
+ union xfs_btree_ptr *new,
+ int *stat)
+{
+ int error;
+ xfs_agblock_t bno;
+
+ XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
+
+ /* Allocate the new block from the freelist. If we can't, give up. */
+ error = xfs_alloc_get_freelist(cur->bc_tp, cur->bc_private.a.agbp,
+ &bno, 1);
+ if (error) {
+ XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR);
+ return error;
+ }
+
+ if (bno == NULLAGBLOCK) {
+ XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+ *stat = 0;
+ return 0;
+ }
+
+ xfs_extent_busy_reuse(cur->bc_mp, cur->bc_private.a.agno, bno, 1, false);
+
+ xfs_trans_agbtree_delta(cur->bc_tp, 1);
+ new->s = cpu_to_be32(bno);
+
+ XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+ *stat = 1;
+ return 0;
+}
+
+STATIC int
+xfs_rmapbt_free_block(
+ struct xfs_btree_cur *cur,
+ struct xfs_buf *bp)
+{
+ struct xfs_buf *agbp = cur->bc_private.a.agbp;
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp);
+ xfs_agblock_t bno;
+ int error;
+
+ bno = xfs_daddr_to_agbno(cur->bc_mp, XFS_BUF_ADDR(bp));
+ error = xfs_alloc_put_freelist(cur->bc_tp, agbp, NULL, bno, 1);
+ if (error)
+ return error;
+
+ xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1,
+ XFS_EXTENT_BUSY_SKIP_DISCARD);
+ xfs_trans_agbtree_delta(cur->bc_tp, -1);
+
+ xfs_trans_binval(cur->bc_tp, bp);
+ return 0;
+}
+
+STATIC int
+xfs_rmapbt_get_minrecs(
+ struct xfs_btree_cur *cur,
+ int level)
+{
+ return cur->bc_mp->m_rmap_mnr[level != 0];
+}
+
+STATIC int
+xfs_rmapbt_get_maxrecs(
+ struct xfs_btree_cur *cur,
+ int level)
+{
+ return cur->bc_mp->m_rmap_mxr[level != 0];
+}
+
+STATIC void
+xfs_rmapbt_init_key_from_rec(
+ union xfs_btree_key *key,
+ union xfs_btree_rec *rec)
+{
+ key->rmap.rm_startblock = rec->rmap.rm_startblock;
+}
+
+STATIC void
+xfs_rmapbt_init_rec_from_key(
+ union xfs_btree_key *key,
+ union xfs_btree_rec *rec)
+{
+ rec->rmap.rm_startblock = key->rmap.rm_startblock;
+}
+
+STATIC void
+xfs_rmapbt_init_rec_from_cur(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_rec *rec)
+{
+ rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock);
+ rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount);
+ rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner);
+}
+
+STATIC void
+xfs_rmapbt_init_ptr_from_cur(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_ptr *ptr)
+{
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp);
+
+ ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno));
+ ASSERT(agf->agf_roots[cur->bc_btnum] != 0);
+
+ ptr->s = agf->agf_roots[cur->bc_btnum];
+}
+
+STATIC __int64_t
+xfs_rmapbt_key_diff(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_key *key)
+{
+ struct xfs_rmap_irec *rec = &cur->bc_rec.r;
+ struct xfs_rmap_key *kp = &key->rmap;
+
+ return (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+}
+
static bool
xfs_rmapbt_verify(
struct xfs_buf *bp)
@@ -133,12 +300,49 @@ const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
.verify_write = xfs_rmapbt_write_verify,
};
+#if defined(DEBUG) || defined(XFS_WARN)
+STATIC int
+xfs_rmapbt_keys_inorder(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_key *k1,
+ union xfs_btree_key *k2)
+{
+ return be32_to_cpu(k1->rmap.rm_startblock) <
+ be32_to_cpu(k2->rmap.rm_startblock);
+}
+
+STATIC int
+xfs_rmapbt_recs_inorder(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_rec *r1,
+ union xfs_btree_rec *r2)
+{
+ return be32_to_cpu(r1->rmap.rm_startblock) +
+ be32_to_cpu(r1->rmap.rm_blockcount) <=
+ be32_to_cpu(r2->rmap.rm_startblock);
+}
+#endif /* DEBUG */
+
static const struct xfs_btree_ops xfs_rmapbt_ops = {
.rec_len = sizeof(struct xfs_rmap_rec),
.key_len = sizeof(struct xfs_rmap_key),
.dup_cursor = xfs_rmapbt_dup_cursor,
+ .set_root = xfs_rmapbt_set_root,
+ .alloc_block = xfs_rmapbt_alloc_block,
+ .free_block = xfs_rmapbt_free_block,
+ .get_minrecs = xfs_rmapbt_get_minrecs,
+ .get_maxrecs = xfs_rmapbt_get_maxrecs,
+ .init_key_from_rec = xfs_rmapbt_init_key_from_rec,
+ .init_rec_from_key = xfs_rmapbt_init_rec_from_key,
+ .init_rec_from_cur = xfs_rmapbt_init_rec_from_cur,
+ .init_ptr_from_cur = xfs_rmapbt_init_ptr_from_cur,
+ .key_diff = xfs_rmapbt_key_diff,
.buf_ops = &xfs_rmapbt_buf_ops,
+#if defined(DEBUG) || defined(XFS_WARN)
+ .keys_inorder = xfs_rmapbt_keys_inorder,
+ .recs_inorder = xfs_rmapbt_recs_inorder,
+#endif
};
/*
* [PATCH 18/58] xfs: enhance rmap btree operations
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (16 preceding siblings ...)
2015-10-07 4:56 ` [PATCH 17/58] xfs: add rmap btree operations Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 19/58] xfs: add an extent to the rmap btree Darrick J. Wong
` (39 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being extended to [agblk, owner, offset].
The expansion of the primary key is crucial to allowing multiple owners
per extent. Unfortunately, doing so adds the requirement that all rmap
records for file extents (metadata always has one owner) correspond to
some bmbt entry somewhere.
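The ordering implied by the extended key can be sketched as a
three-level comparison. Note that this sketch deliberately uses explicit
comparisons rather than the subtraction used by xfs_rmapbt_key_diff() in
the patch, so that the result stays well-defined for any 64-bit owner or
offset. The struct and function names are hypothetical; the field names
follow the patch, and the values are assumed to be in CPU byte order
already.

```c
#include <stdint.h>

/* Hypothetical in-core key; field names follow the patch. */
struct demo_rmap_key {
	uint32_t	rm_startblock;
	uint64_t	rm_owner;
	uint64_t	rm_offset;
};

/* Three-way compare that cannot overflow. */
static int
demo_cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Returns <0, 0 or >0 as k1 sorts before, equal to, or after k2. */
static int
demo_rmap_key_cmp(
	const struct demo_rmap_key	*k1,
	const struct demo_rmap_key	*k2)
{
	int				d;

	/* Most significant: AG block number. */
	d = demo_cmp_u64(k1->rm_startblock, k2->rm_startblock);
	if (d)
		return d;
	/* Then the owner, so co-owners of one block sort together. */
	d = demo_cmp_u64(k1->rm_owner, k2->rm_owner);
	if (d)
		return d;
	/* Finally the logical offset within that owner. */
	return demo_cmp_u64(k1->rm_offset, k2->rm_offset);
}
```

Two files sharing AG block 100 therefore produce two distinct keys that
differ only in owner (or, for the same file mapped twice, only in
offset), which is what keeps the records uniquely identifiable.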
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_rmap.c | 32 +++++++++++++++++---
fs/xfs/libxfs/xfs_rmap_btree.c | 65 ++++++++++++++++++++++++++++++----------
fs/xfs/libxfs/xfs_rmap_btree.h | 7 ++++
3 files changed, 84 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 64b2525..f6fe742 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -37,26 +37,48 @@
#include "xfs_extent_busy.h"
/*
- * Lookup the first record less than or equal to [bno, len]
+ * Lookup the first record less than or equal to [bno, len, owner, offset]
* in the btree given by cur.
*/
-STATIC int
+int
xfs_rmap_lookup_le(
struct xfs_btree_cur *cur,
xfs_agblock_t bno,
xfs_extlen_t len,
uint64_t owner,
+ uint64_t offset,
int *stat)
{
cur->bc_rec.r.rm_startblock = bno;
cur->bc_rec.r.rm_blockcount = len;
cur->bc_rec.r.rm_owner = owner;
+ cur->bc_rec.r.rm_offset = offset;
return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
}
/*
+ * Lookup the record exactly matching [bno, len, owner, offset]
+ * in the btree given by cur.
+ */
+int
+xfs_rmap_lookup_eq(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t bno,
+ xfs_extlen_t len,
+ uint64_t owner,
+ uint64_t offset,
+ int *stat)
+{
+ cur->bc_rec.r.rm_startblock = bno;
+ cur->bc_rec.r.rm_blockcount = len;
+ cur->bc_rec.r.rm_owner = owner;
+ cur->bc_rec.r.rm_offset = offset;
+ return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+}
+
+/*
* Update the record referred to by cur to the value given
- * by [bno, len, ref].
+ * by [bno, len, owner, offset].
* This either works (return 0) or gets an EFSCORRUPTED error.
*/
STATIC int
@@ -69,13 +91,14 @@ xfs_rmap_update(
rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
+ rec.rmap.rm_offset = cpu_to_be64(irec->rm_offset);
return xfs_btree_update(cur, &rec);
}
/*
* Get the data from the pointed-to record.
*/
-STATIC int
+int
xfs_rmap_get_rec(
struct xfs_btree_cur *cur,
struct xfs_rmap_irec *irec,
@@ -91,6 +114,7 @@ xfs_rmap_get_rec(
irec->rm_startblock = be32_to_cpu(rec->rmap.rm_startblock);
irec->rm_blockcount = be32_to_cpu(rec->rmap.rm_blockcount);
irec->rm_owner = be64_to_cpu(rec->rmap.rm_owner);
+ irec->rm_offset = be64_to_cpu(rec->rmap.rm_offset);
return 0;
}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 58bdac3..5fe717b 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -37,21 +37,26 @@
/*
* Reverse map btree.
*
- * This is a per-ag tree used to track the owner of a given extent. Owner
- * records are inserted when an extent is allocated, and removed when an extent
- * is freed. There can only be one owner of an extent, usually an inode or some
- * other metadata structure like a AG btree.
+ * This is a per-ag tree used to track the owner(s) of a given extent. With
+ * reflink it is possible for there to be multiple owners, which is a departure
+ * from classic XFS. Owner records for data extents are inserted when the
+ * extent is mapped and removed when an extent is unmapped. Owner records for
+ * all other block types (i.e. metadata) are inserted when an extent is
+ * allocated and removed when an extent is freed. There can only be one owner
+ * of a metadata extent, usually an inode or some other metadata structure like
+ * an AG btree.
*
* The rmap btree is part of the free space management, so blocks for the tree
* are sourced from the agfl. Hence we need transaction reservation support for
* this tree so that the freelist is always large enough. This also impacts on
* the minimum space we need to leave free in the AG.
*
- * The tree is ordered by block number - there's no need to order/search by
- * extent size for online updating/management of the tree, and the reverse
- * lookups are going to be "who owns this block" and so are by-block ordering is
- * perfect for this.
- *
+ * The tree is ordered by [ag block, owner, offset]. This is a large key size,
+ * but it is the only way to enforce unique keys when a block can be owned by
+ * multiple files at any offset. There's no need to order/search by extent
+ * size for online updating/management of the tree. It is intended that most
+ * reverse lookups will be to find the owner(s) of a particular block, or to
+ * try to recover tree and file data from corrupt primary metadata.
*/
static struct xfs_btree_cur *
@@ -165,6 +170,8 @@ xfs_rmapbt_init_key_from_rec(
union xfs_btree_rec *rec)
{
key->rmap.rm_startblock = rec->rmap.rm_startblock;
+ key->rmap.rm_owner = rec->rmap.rm_owner;
+ key->rmap.rm_offset = rec->rmap.rm_offset;
}
STATIC void
@@ -173,6 +180,8 @@ xfs_rmapbt_init_rec_from_key(
union xfs_btree_rec *rec)
{
rec->rmap.rm_startblock = key->rmap.rm_startblock;
+ rec->rmap.rm_owner = key->rmap.rm_owner;
+ rec->rmap.rm_offset = key->rmap.rm_offset;
}
STATIC void
@@ -183,6 +192,7 @@ xfs_rmapbt_init_rec_from_cur(
rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock);
rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount);
rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner);
+ rec->rmap.rm_offset = cpu_to_be64(cur->bc_rec.r.rm_offset);
}
STATIC void
@@ -205,8 +215,16 @@ xfs_rmapbt_key_diff(
{
struct xfs_rmap_irec *rec = &cur->bc_rec.r;
struct xfs_rmap_key *kp = &key->rmap;
-
- return (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+ __int64_t d;
+
+ d = (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+ if (d)
+ return d;
+ d = (__int64_t)be64_to_cpu(kp->rm_owner) - rec->rm_owner;
+ if (d)
+ return d;
+ d = (__int64_t)be64_to_cpu(kp->rm_offset) - rec->rm_offset;
+ return d;
}
static bool
@@ -307,8 +325,16 @@ xfs_rmapbt_keys_inorder(
union xfs_btree_key *k1,
union xfs_btree_key *k2)
{
- return be32_to_cpu(k1->rmap.rm_startblock) <
- be32_to_cpu(k2->rmap.rm_startblock);
+ if (be32_to_cpu(k1->rmap.rm_startblock) <
+ be32_to_cpu(k2->rmap.rm_startblock))
+ return 1;
+ if (be64_to_cpu(k1->rmap.rm_owner) <
+ be64_to_cpu(k2->rmap.rm_owner))
+ return 1;
+ if (be64_to_cpu(k1->rmap.rm_offset) <=
+ be64_to_cpu(k2->rmap.rm_offset))
+ return 1;
+ return 0;
}
STATIC int
@@ -317,9 +343,16 @@ xfs_rmapbt_recs_inorder(
union xfs_btree_rec *r1,
union xfs_btree_rec *r2)
{
- return be32_to_cpu(r1->rmap.rm_startblock) +
- be32_to_cpu(r1->rmap.rm_blockcount) <=
- be32_to_cpu(r2->rmap.rm_startblock);
+ if (be32_to_cpu(r1->rmap.rm_startblock) <
+ be32_to_cpu(r2->rmap.rm_startblock))
+ return 1;
+ if (be64_to_cpu(r1->rmap.rm_offset) <
+ be64_to_cpu(r2->rmap.rm_offset))
+ return 1;
+ if (be64_to_cpu(r1->rmap.rm_owner) <=
+ be64_to_cpu(r2->rmap.rm_owner))
+ return 1;
+ return 0;
}
#endif /* DEBUG */
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 2e02362..a5c97f8 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -51,6 +51,13 @@ struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
xfs_agnumber_t agno);
int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
+int xfs_rmap_lookup_le(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+ xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
+int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t bno,
+ xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
+int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
+ int *stat);
+
int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
struct xfs_owner_info *oinfo);
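
For reference, the [ag block, owner, offset] ordering described in the comment above amounts to a lexicographic comparison: a later field is consulted only when every earlier field compares equal. The following is a minimal sketch under that assumption. The host-endian struct rmap_key and the helpers cmp_u64, rmap_key_cmp and rmap_keys_inorder are hypothetical names for illustration and are not part of the patch. Comparing rather than subtracting also sidesteps any wraparound from differencing unsigned 64-bit values.

```c
#include <stdint.h>

/* Hypothetical host-endian stand-in for an rmap key; not the on-disk format. */
struct rmap_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
};

/* Three-way compare that cannot overflow: returns -1, 0 or 1. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Lexicographic compare on (startblock, owner, offset). */
static int rmap_key_cmp(const struct rmap_key *a, const struct rmap_key *b)
{
	int d;

	d = cmp_u64(a->startblock, b->startblock);
	if (d)
		return d;
	d = cmp_u64(a->owner, b->owner);
	if (d)
		return d;
	return cmp_u64(a->offset, b->offset);
}

/* Strict ordering check: true only if a sorts before b. */
static int rmap_keys_inorder(const struct rmap_key *a, const struct rmap_key *b)
{
	return rmap_key_cmp(a, b) < 0;
}
```

With this shape, a key with a smaller start block always sorts first regardless of its owner, and two keys with the same start block and owner are ordered by offset alone.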
* [PATCH 19/58] xfs: add an extent to the rmap btree
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (17 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 18/58] xfs: enhance " Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 20/58] xfs: add tracepoints for the rmap-mirrors-bmbt functions Darrick J. Wong
` (38 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Now that all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings into the
rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_rmap.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 138 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index f6fe742..d1b6c82 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -144,6 +144,18 @@ out_error:
return error;
}
+/*
+ * When we allocate a new block, the first thing we do is add a reference to the
+ * extent in the rmap btree. This takes the form of a [agbno, length, owner]
+ * record. Newly inserted extents should never overlap with an existing extent
+ * in the rmap btree. Hence the insertion is a relatively trivial exercise,
+ * involving checking for adjacent records and merging if the new extent is
+ * contiguous and has the same owner.
+ *
+ * Note that we have no MAXEXTLEN limits here when merging as the length in the
+ * record has the full 32 bits available and hence a single record can track the
+ * entire space in the AG.
+ */
int
xfs_rmap_alloc(
struct xfs_trans *tp,
@@ -154,18 +166,143 @@ xfs_rmap_alloc(
struct xfs_owner_info *oinfo)
{
struct xfs_mount *mp = tp->t_mountp;
+ struct xfs_btree_cur *cur;
+ struct xfs_rmap_irec ltrec;
+ struct xfs_rmap_irec gtrec;
+ int have_gt;
int error = 0;
+ int i;
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return 0;
trace_xfs_rmap_alloc_extent(mp, agno, bno, len, oinfo);
- if (1)
+ cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+
+ /*
+ * For the initial lookup, look for an exact match or the left-adjacent
+ * record for our insertion point. This will also give us the record for
+ * start block contiguity tests.
+ */
+ error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+ error = xfs_rmap_get_rec(cur, &ltrec, &i);
+ if (error)
goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ //printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
+ // agno, bno, len, owner, ltrec.rm_startblock,
+ // ltrec.rm_blockcount, ltrec.rm_owner);
+
+ XFS_WANT_CORRUPTED_GOTO(mp,
+ ltrec.rm_startblock + ltrec.rm_blockcount <= bno, out_error);
+
+ /*
+ * Increment the cursor to see if we have a right-adjacent record to our
+ * insertion point. This will give us the record for end block
+ * contiguity tests.
+ */
+ error = xfs_btree_increment(cur, 0, &have_gt);
+ if (error)
+ goto out_error;
+ if (have_gt) {
+ error = xfs_rmap_get_rec(cur, &gtrec, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ //printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, gtrec 0x%x/0x%x/0x%llx\n",
+ // agno, bno, len, owner, gtrec.rm_startblock,
+ // gtrec.rm_blockcount, gtrec.rm_owner);
+ XFS_WANT_CORRUPTED_GOTO(mp, bno + len <= gtrec.rm_startblock,
+ out_error);
+ } else {
+ gtrec.rm_owner = XFS_RMAP_OWN_NULL;
+ }
+
+ /*
+ * Note: cursor currently points one record to the right of ltrec, even
+ * if there is no record in the tree to the right.
+ */
+ if (ltrec.rm_owner == owner &&
+ ltrec.rm_startblock + ltrec.rm_blockcount == bno) {
+ /*
+ * left edge contiguous, merge into left record.
+ *
+ * ltbno ltlen
+ * orig: |ooooooooo|
+ * adding: |aaaaaaaaa|
+ * result: |rrrrrrrrrrrrrrrrrrr|
+ * bno len
+ */
+ //printk("add left\n");
+ ltrec.rm_blockcount += len;
+ if (gtrec.rm_owner == owner &&
+ bno + len == gtrec.rm_startblock) {
+ //printk("add middle\n");
+ /*
+ * right edge also contiguous, delete right record
+ * and merge into left record.
+ *
+ * ltbno ltlen gtbno gtlen
+ * orig: |ooooooooo| |ooooooooo|
+ * adding: |aaaaaaaaa|
+ * result: |rrrrrrrrrrrrrrrrrrrrrrrrrrrrr|
+ */
+ ltrec.rm_blockcount += gtrec.rm_blockcount;
+ error = xfs_btree_delete(cur, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ }
+
+ /* point the cursor back to the left record and update */
+ error = xfs_btree_decrement(cur, 0, &have_gt);
+ if (error)
+ goto out_error;
+ error = xfs_rmap_update(cur, &ltrec);
+ if (error)
+ goto out_error;
+ } else if (gtrec.rm_owner == owner &&
+ bno + len == gtrec.rm_startblock) {
+ /*
+ * right edge contiguous, merge into right record.
+ *
+ * gtbno gtlen
+ * Orig: |ooooooooo|
+ * adding: |aaaaaaaaa|
+ * Result: |rrrrrrrrrrrrrrrrrrr|
+ * bno len
+ */
+ //printk("add right\n");
+ gtrec.rm_startblock = bno;
+ gtrec.rm_blockcount += len;
+ error = xfs_rmap_update(cur, &gtrec);
+ if (error)
+ goto out_error;
+ } else {
+ //printk("add no match\n");
+ /*
+ * no contiguous edge with identical owner, insert
+ * new record at current cursor position.
+ */
+ cur->bc_rec.r.rm_startblock = bno;
+ cur->bc_rec.r.rm_blockcount = len;
+ cur->bc_rec.r.rm_owner = owner;
+ error = xfs_btree_insert(cur, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ }
+
trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, oinfo);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
return 0;
out_error:
trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, oinfo);
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
return error;
}
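
The three merge cases in xfs_rmap_alloc (left, left plus right, right) and the plain insert can be modelled without the btree machinery. What follows is a rough sketch only, assuming a sorted, non-overlapping in-memory array in place of the rmap btree. The names struct ext and ext_add are hypothetical, and the caller is assumed to have room for one more record.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for an [agbno, length, owner] record. */
struct ext {
	uint32_t	start;
	uint32_t	len;
	uint64_t	owner;
};

/*
 * Add [bno, bno + len) owned by @owner to the sorted array recs[0..*n).
 * Returns the index of the record that now covers the new extent.
 */
static int ext_add(struct ext *recs, int *n, uint32_t bno, uint32_t len,
		   uint64_t owner)
{
	int gt = 0;
	int lt, merge_l, merge_r;

	/* Right neighbour: first record starting after bno. */
	while (gt < *n && recs[gt].start <= bno)
		gt++;
	lt = gt - 1;

	/* A newly added extent must not overlap either neighbour. */
	assert(lt < 0 || recs[lt].start + recs[lt].len <= bno);
	assert(gt >= *n || bno + len <= recs[gt].start);

	merge_l = lt >= 0 && recs[lt].owner == owner &&
		  recs[lt].start + recs[lt].len == bno;
	merge_r = gt < *n && recs[gt].owner == owner &&
		  bno + len == recs[gt].start;

	if (merge_l) {
		/* Left edge contiguous: grow the left record. */
		recs[lt].len += len;
		if (merge_r) {
			/* Right edge too: absorb and delete the right record. */
			recs[lt].len += recs[gt].len;
			memmove(&recs[gt], &recs[gt + 1],
				(size_t)(*n - gt - 1) * sizeof(*recs));
			(*n)--;
		}
		return lt;
	}
	if (merge_r) {
		/* Right edge contiguous: pull the right record's start back. */
		recs[gt].start = bno;
		recs[gt].len += len;
		return gt;
	}

	/* No contiguous neighbour with the same owner: insert a new record. */
	memmove(&recs[gt + 1], &recs[gt], (size_t)(*n - gt) * sizeof(*recs));
	recs[gt] = (struct ext){ bno, len, owner };
	(*n)++;
	return gt;
}
```

The linear scan stands in for the xfs_rmap_lookup_le and xfs_btree_increment pair; the merge decisions themselves follow the same owner and contiguity tests as the patch.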
* [PATCH 20/58] xfs: add tracepoints for the rmap-mirrors-bmbt functions
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (18 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 19/58] xfs: add an extent to the rmap btree Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 21/58] xfs: teach rmap_alloc how to deal with our larger rmap btree Darrick J. Wong
` (37 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_trace.h | 251 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 251 insertions(+)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4e7a889..f8ec848 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1719,6 +1719,257 @@ DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent);
DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_done);
DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_error);
+/* rmap-mirrors-bmbt traces */
+DECLARE_EVENT_CLASS(xfs_rmap_bmbt3_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *left,
+ struct xfs_bmbt_irec *prev,
+ struct xfs_bmbt_irec *right),
+ TP_ARGS(mp, agno, ino, whichfork, left, prev, right),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_ino_t, ino)
+ __field(int, whichfork)
+ __field(xfs_fileoff_t, l_loff)
+ __field(xfs_fsblock_t, l_poff)
+ __field(xfs_filblks_t, l_len)
+ __field(xfs_fileoff_t, p_loff)
+ __field(xfs_fsblock_t, p_poff)
+ __field(xfs_filblks_t, p_len)
+ __field(xfs_fileoff_t, r_loff)
+ __field(xfs_fsblock_t, r_poff)
+ __field(xfs_filblks_t, r_len)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->ino = ino;
+ __entry->whichfork = whichfork;
+ __entry->l_loff = left->br_startoff;
+ __entry->l_poff = left->br_startblock;
+ __entry->l_len = left->br_blockcount;
+ __entry->p_loff = prev->br_startoff;
+ __entry->p_poff = prev->br_startblock;
+ __entry->p_len = prev->br_blockcount;
+ __entry->r_loff = right->br_startoff;
+ __entry->r_poff = right->br_startblock;
+ __entry->r_len = right->br_blockcount;
+ ),
+ TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld):(%llu:%lld:%lld):(%llu:%lld:%lld)",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->ino,
+ __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+ __entry->l_poff,
+ __entry->l_len,
+ __entry->l_loff,
+ __entry->p_poff,
+ __entry->p_len,
+ __entry->p_loff,
+ __entry->r_poff,
+ __entry->r_len,
+ __entry->r_loff)
+);
+#define DEFINE_RMAP_BMBT3_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_bmbt3_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_ino_t ino, \
+ int whichfork, \
+ struct xfs_bmbt_irec *left, \
+ struct xfs_bmbt_irec *prev, \
+ struct xfs_bmbt_irec *right), \
+ TP_ARGS(mp, agno, ino, whichfork, left, prev, right))
+
+DECLARE_EVENT_CLASS(xfs_rmap_bmbt2_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *left,
+ struct xfs_bmbt_irec *prev),
+ TP_ARGS(mp, agno, ino, whichfork, left, prev),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_ino_t, ino)
+ __field(int, whichfork)
+ __field(xfs_fileoff_t, l_loff)
+ __field(xfs_fsblock_t, l_poff)
+ __field(xfs_filblks_t, l_len)
+ __field(xfs_fileoff_t, p_loff)
+ __field(xfs_fsblock_t, p_poff)
+ __field(xfs_filblks_t, p_len)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->ino = ino;
+ __entry->whichfork = whichfork;
+ __entry->l_loff = left->br_startoff;
+ __entry->l_poff = left->br_startblock;
+ __entry->l_len = left->br_blockcount;
+ __entry->p_loff = prev->br_startoff;
+ __entry->p_poff = prev->br_startblock;
+ __entry->p_len = prev->br_blockcount;
+ ),
+ TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld):(%llu:%lld:%lld)",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->ino,
+ __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+ __entry->l_poff,
+ __entry->l_len,
+ __entry->l_loff,
+ __entry->p_poff,
+ __entry->p_len,
+ __entry->p_loff)
+);
+#define DEFINE_RMAP_BMBT2_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_bmbt2_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_ino_t ino, \
+ int whichfork, \
+ struct xfs_bmbt_irec *left, \
+ struct xfs_bmbt_irec *prev), \
+ TP_ARGS(mp, agno, ino, whichfork, left, prev))
+
+DECLARE_EVENT_CLASS(xfs_rmap_bmbt1_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *left),
+ TP_ARGS(mp, agno, ino, whichfork, left),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_ino_t, ino)
+ __field(int, whichfork)
+ __field(xfs_fileoff_t, l_loff)
+ __field(xfs_fsblock_t, l_poff)
+ __field(xfs_filblks_t, l_len)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->ino = ino;
+ __entry->whichfork = whichfork;
+ __entry->l_loff = left->br_startoff;
+ __entry->l_poff = left->br_startblock;
+ __entry->l_len = left->br_blockcount;
+ ),
+ TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld)",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->ino,
+ __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+ __entry->l_poff,
+ __entry->l_len,
+ __entry->l_loff)
+);
+#define DEFINE_RMAP_BMBT1_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_bmbt1_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_ino_t ino, \
+ int whichfork, \
+ struct xfs_bmbt_irec *left), \
+ TP_ARGS(mp, agno, ino, whichfork, left))
+
+DECLARE_EVENT_CLASS(xfs_rmap_adjust_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *left,
+ long adj),
+ TP_ARGS(mp, agno, ino, whichfork, left, adj),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_ino_t, ino)
+ __field(int, whichfork)
+ __field(xfs_fileoff_t, l_loff)
+ __field(xfs_fsblock_t, l_poff)
+ __field(xfs_filblks_t, l_len)
+ __field(long, adj)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->ino = ino;
+ __entry->whichfork = whichfork;
+ __entry->l_loff = left->br_startoff;
+ __entry->l_poff = left->br_startblock;
+ __entry->l_len = left->br_blockcount;
+ __entry->adj = adj;
+ ),
+ TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld) adj %ld",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->ino,
+ __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+ __entry->l_poff,
+ __entry->l_len,
+ __entry->l_loff,
+ __entry->adj)
+);
+#define DEFINE_RMAP_ADJUST_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_adjust_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_ino_t ino, \
+ int whichfork, \
+ struct xfs_bmbt_irec *left, \
+ long adj), \
+ TP_ARGS(mp, agno, ino, whichfork, left, adj))
+
+DECLARE_EVENT_CLASS(xfs_rmapbt_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, xfs_extlen_t len,
+ uint64_t owner, uint64_t offset),
+ TP_ARGS(mp, agno, agbno, len, owner, offset),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, agbno)
+ __field(xfs_extlen_t, len)
+ __field(uint64_t, owner)
+ __field(uint64_t, offset)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->agbno = agbno;
+ __entry->len = len;
+ __entry->owner = owner;
+ __entry->offset = offset;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx, offset %llu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->agbno,
+ __entry->len,
+ __entry->owner,
+ __entry->offset)
+);
+#define DEFINE_RMAPBT_EVENT(name) \
+DEFINE_EVENT(xfs_rmapbt_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_agblock_t agbno, xfs_extlen_t len, \
+ uint64_t owner, uint64_t offset), \
+ TP_ARGS(mp, agno, agbno, len, owner, offset))
+
+DEFINE_RMAP_BMBT3_EVENT(xfs_rmap_combine);
+DEFINE_RMAP_BMBT2_EVENT(xfs_rmap_lcombine);
+DEFINE_RMAP_BMBT2_EVENT(xfs_rmap_rcombine);
+DEFINE_RMAP_BMBT1_EVENT(xfs_rmap_insert);
+DEFINE_RMAP_BMBT1_EVENT(xfs_rmap_delete);
+DEFINE_RMAP_ADJUST_EVENT(xfs_rmap_move);
+DEFINE_RMAP_ADJUST_EVENT(xfs_rmap_slide);
+DEFINE_RMAP_ADJUST_EVENT(xfs_rmap_resize);
+DEFINE_RMAPBT_EVENT(xfs_rmapbt_update);
+DEFINE_RMAPBT_EVENT(xfs_rmapbt_insert);
+DEFINE_RMAPBT_EVENT(xfs_rmapbt_delete);
+
DECLARE_EVENT_CLASS(xfs_da_class,
TP_PROTO(struct xfs_da_args *args),
TP_ARGS(args),
* [PATCH 21/58] xfs: teach rmap_alloc how to deal with our larger rmap btree
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (19 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 20/58] xfs: add tracepoints for the rmap-mirrors-bmbt functions Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 22/58] xfs: remove an extent from the " Darrick J. Wong
` (36 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Since reflink will require the rmapbt to mirror bmbt entries exactly, the
xfs_rmap_alloc function is only necessary to handle rmaps for bmbt blocks
and AG btrees. Therefore, enhancing the function (and its extent merging
code) to handle the larger rmap records can be done simply.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_rmap.c | 50 +++++++++++++++++++++++++++++++---------
fs/xfs/libxfs/xfs_rmap_btree.h | 1 +
2 files changed, 40 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index d1b6c82..bc25f9c 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -145,16 +145,34 @@ out_error:
}
/*
- * When we allocate a new block, the first thing we do is add a reference to the
- * extent in the rmap btree. This takes the form of a [agbno, length, owner]
- * record. Newly inserted extents should never overlap with an existing extent
- * in the rmap btree. Hence the insertion is a relatively trivial exercise,
- * involving checking for adjacent records and merging if the new extent is
- * contiguous and has the same owner.
- *
- * Note that we have no MAXEXTLEN limits here when merging as the length in the
- * record has the full 32 bits available and hence a single record can track the
- * entire space in the AG.
+ * A mergeable rmap should have the same owner, cannot be unwritten, and
+ * must be a bmbt rmap if we're asking about a bmbt rmap.
+ */
+static bool
+is_mergeable_rmap(
+ struct xfs_rmap_irec *irec,
+ uint64_t owner,
+ uint64_t offset)
+{
+ if (irec->rm_owner == XFS_RMAP_OWN_NULL)
+ return false;
+ if (irec->rm_owner != owner)
+ return false;
+ if (XFS_RMAP_IS_UNWRITTEN(irec->rm_blockcount))
+ return false;
+ if (XFS_RMAP_IS_ATTR_FORK(offset) ^
+ XFS_RMAP_IS_ATTR_FORK(irec->rm_offset))
+ return false;
+ if (XFS_RMAP_IS_BMBT(offset) ^ XFS_RMAP_IS_BMBT(irec->rm_offset))
+ return false;
+ return true;
+}
+
+/*
+ * When we allocate a new block, the first thing we do is add a reference to
+ * the extent in the rmap btree. This takes the form of a [agbno, length,
+ * owner, offset] record. Flags are encoded in the high bits of the offset
+ * field.
*/
int
xfs_rmap_alloc(
@@ -172,6 +190,8 @@ xfs_rmap_alloc(
int have_gt;
int error = 0;
int i;
+ uint64_t owner;
+ uint64_t offset;
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return 0;
@@ -179,12 +199,14 @@ xfs_rmap_alloc(
trace_xfs_rmap_alloc_extent(mp, agno, bno, len, oinfo);
cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+ xfs_owner_info_unpack(oinfo, &owner, &offset);
+ ASSERT(XFS_RMAP_NON_INODE_OWNER(owner) || XFS_RMAP_IS_BMBT(offset));
/*
* For the initial lookup, look for an exact match or the left-adjacent
* record for our insertion point. This will also give us the record for
* start block contiguity tests.
*/
- error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+ error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, &i);
if (error)
goto out_error;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
@@ -196,8 +218,11 @@ xfs_rmap_alloc(
//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
// agno, bno, len, owner, ltrec.rm_startblock,
// ltrec.rm_blockcount, ltrec.rm_owner);
+ if (!is_mergeable_rmap(&ltrec, owner, offset))
+ ltrec.rm_owner = XFS_RMAP_OWN_NULL;
XFS_WANT_CORRUPTED_GOTO(mp,
+ ltrec.rm_owner == XFS_RMAP_OWN_NULL ||
ltrec.rm_startblock + ltrec.rm_blockcount <= bno, out_error);
/*
@@ -221,6 +246,8 @@ xfs_rmap_alloc(
} else {
gtrec.rm_owner = XFS_RMAP_OWN_NULL;
}
+ if (!is_mergeable_rmap(&gtrec, owner, offset))
+ gtrec.rm_owner = XFS_RMAP_OWN_NULL;
/*
* Note: cursor currently points one record to the right of ltrec, even
@@ -291,6 +318,7 @@ xfs_rmap_alloc(
cur->bc_rec.r.rm_startblock = bno;
cur->bc_rec.r.rm_blockcount = len;
cur->bc_rec.r.rm_owner = owner;
+ cur->bc_rec.r.rm_offset = offset;
error = xfs_btree_insert(cur, &i);
if (error)
goto out_error;
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index a5c97f8..0dfc151 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -58,6 +58,7 @@ int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t bno,
int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
int *stat);
+/* functions for updating the rmapbt for bmbt blocks and AG btree blocks */
int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
struct xfs_owner_info *oinfo);
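
To make the flag handling in is_mergeable_rmap concrete, here is a small sketch of how flags packed into the high bits of the offset and blockcount fields might be tested. The bit positions and the sentinel value are assumptions based on the cover letter's description (upper two bits of the offset for attr fork and bmbt records, highest bit of the blockcount for unwritten extents). The RMAP_* macros, struct rmap_irec and rmap_is_mergeable below are hypothetical stand-ins, not the definitions used by the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag layout; the real XFS_RMAP_* macros live elsewhere in the series. */
#define RMAP_OFF_ATTR		(1ULL << 63)	/* record belongs to the attr fork */
#define RMAP_OFF_BMBT		(1ULL << 62)	/* record describes a bmbt block */
#define RMAP_LEN_UNWRITTEN	(1U << 31)	/* extent is unwritten */
#define RMAP_OWN_NULL		(~0ULL)		/* assumed "no owner" sentinel */

/* Hypothetical in-core record: [agbno, length, owner, offset]. */
struct rmap_irec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
	uint64_t	offset;
};

/*
 * A record can be merged with a new mapping only if it has the same
 * (non-null) owner, is not unwritten, and agrees on both offset flags.
 */
static bool rmap_is_mergeable(const struct rmap_irec *irec, uint64_t owner,
			      uint64_t offset)
{
	if (irec->owner == RMAP_OWN_NULL || irec->owner != owner)
		return false;
	if (irec->blockcount & RMAP_LEN_UNWRITTEN)
		return false;
	/* XOR leaves a bit set wherever the two flag sets disagree. */
	if ((offset ^ irec->offset) & (RMAP_OFF_ATTR | RMAP_OFF_BMBT))
		return false;
	return true;
}
```

Folding both flag checks into a single masked XOR is a stylistic choice for the sketch; it is equivalent to testing each flag separately.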
* [PATCH 22/58] xfs: remove an extent from the rmap btree
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (20 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 21/58] xfs: teach rmap_alloc how to deal with our larger rmap btree Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 23/58] xfs: enhanced " Darrick J. Wong
` (35 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_rmap.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 153 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index bc25f9c..92e417d 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -118,6 +118,31 @@ xfs_rmap_get_rec(
return 0;
}
+/*
+ * Find the extent in the rmap btree and remove it.
+ *
+ * The record we find should always span a range greater than or equal to the
+ * extent being freed. This makes the code simple as, in theory, we do not
+ * have to handle ranges that are split across multiple records as extents that
+ * result in bmap btree extent merges should also result in rmap btree extent
+ * merges. The owner field ensures we don't merge extents from different
+ * structures into the same record, hence this property should always hold true
+ * if we ensure that the rmap btree supports at least the same size maximum
+ * extent as the bmap btree (bmbt MAXEXTLEN is 2^21 blocks at present, rmap
+ * btree record can hold 2^32 blocks in a single extent).
+ *
+ * Special Case #1: when growing the filesystem, we "free" an extent when
+ * growing the last AG. This extent is new space and so it is not tracked as
+ * used space in the btree. The growfs code will pass in an owner of
+ * XFS_RMAP_OWN_NULL to indicate that it expects that there is no owner of this
+ * extent. We verify that the extent lookup results in a record that does not
+ * overlap.
+ *
+ * Special Case #2: EFIs do not record the owner of the extent, so when
+ * recovering EFIs from the log we pass in XFS_RMAP_OWN_UNKNOWN to tell the rmap
+ * btree to ignore the owner (i.e. wildcard match) so we don't trigger
+ * corruption checks during log recovery.
+ */
int
xfs_rmap_free(
struct xfs_trans *tp,
@@ -128,19 +153,146 @@ xfs_rmap_free(
struct xfs_owner_info *oinfo)
{
struct xfs_mount *mp = tp->t_mountp;
+ struct xfs_btree_cur *cur;
+ struct xfs_rmap_irec ltrec;
int error = 0;
+ int i;
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return 0;
trace_xfs_rmap_free_extent(mp, agno, bno, len, oinfo);
- if (1)
+ cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+
+ /*
+ * We should always have a left record because there's a static record
+ * for the AG headers at rm_startblock == 0 created by mkfs/growfs that
+ * will not ever be removed from the tree.
+ */
+ error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+ error = xfs_rmap_get_rec(cur, &ltrec, &i);
+ if (error)
goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+ /*
+ * For growfs, the incoming extent must be beyond the left record we
+ * just found as it is new space and won't be used by anyone. This is
+ * just a corruption check as we don't actually do anything with this
+ * extent.
+ */
+ if (owner == XFS_RMAP_OWN_NULL) {
+ XFS_WANT_CORRUPTED_GOTO(mp, bno > ltrec.rm_startblock +
+ ltrec.rm_blockcount, out_error);
+ goto out_done;
+ }
+
+/*
+ if (owner != ltrec.rm_owner ||
+ bno > ltrec.rm_startblock + ltrec.rm_blockcount)
+ */
+ //printk("rmfree ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
+ // agno, bno, len, owner, ltrec.rm_startblock,
+ // ltrec.rm_blockcount, ltrec.rm_owner);
+
+ /* make sure the extent we found covers the entire freeing range. */
+ XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno, out_error);
+ XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_blockcount >= len, out_error);
+ XFS_WANT_CORRUPTED_GOTO(mp,
+ bno <= ltrec.rm_startblock + ltrec.rm_blockcount, out_error);
+
+ /* make sure the owner matches what we expect to find in the tree */
+ XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
+ (owner < XFS_RMAP_OWN_NULL &&
+ owner >= XFS_RMAP_OWN_MIN), out_error);
+
+ if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
+ //printk("remove exact\n");
+ /* exact match, simply remove the record from rmap tree */
+ error = xfs_btree_delete(cur, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ } else if (ltrec.rm_startblock == bno) {
+ //printk("remove left\n");
+ /*
+ * overlap left hand side of extent: move the start, trim the
+ * length and update the current record.
+ *
+ * ltbno ltlen
+ * Orig: |oooooooooooooooooooo|
+ * Freeing: |fffffffff|
+ * Result: |rrrrrrrrrr|
+ * bno len
+ */
+ ltrec.rm_startblock += len;
+ ltrec.rm_blockcount -= len;
+ error = xfs_rmap_update(cur, &ltrec);
+ if (error)
+ goto out_error;
+ } else if (ltrec.rm_startblock + ltrec.rm_blockcount == bno + len) {
+ //printk("remove right\n");
+ /*
+ * overlap right hand side of extent: trim the length and update
+ * the current record.
+ *
+ * ltbno ltlen
+ * Orig: |oooooooooooooooooooo|
+ * Freeing: |fffffffff|
+ * Result: |rrrrrrrrrr|
+ * bno len
+ */
+ ltrec.rm_blockcount -= len;
+ error = xfs_rmap_update(cur, &ltrec);
+ if (error)
+ goto out_error;
+ } else {
+
+ /*
+ * overlap middle of extent: trim the length of the existing
+ * record to the length of the new left-extent size, increment
+ * the insertion position so we can insert a new record
+ * containing the remaining right-extent space.
+ *
+ * ltbno ltlen
+ * Orig: |oooooooooooooooooooo|
+ * Freeing: |fffffffff|
+ * Result: |rrrrr| |rrrr|
+ * bno len
+ */
+ xfs_extlen_t orig_len = ltrec.rm_blockcount;
+ //printk("remove middle\n");
+
+ ltrec.rm_blockcount = bno - ltrec.rm_startblock;
+ error = xfs_rmap_update(cur, &ltrec);
+ if (error)
+ goto out_error;
+
+ error = xfs_btree_increment(cur, 0, &i);
+ if (error)
+ goto out_error;
+
+ cur->bc_rec.r.rm_startblock = bno + len;
+ cur->bc_rec.r.rm_blockcount = orig_len - len -
+ ltrec.rm_blockcount;
+ cur->bc_rec.r.rm_owner = ltrec.rm_owner;
+ error = xfs_btree_insert(cur, &i);
+ if (error)
+ goto out_error;
+ }
+
+out_done:
trace_xfs_rmap_free_extent_done(mp, agno, bno, len, oinfo);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
return 0;
out_error:
trace_xfs_rmap_free_extent_error(mp, agno, bno, len, oinfo);
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
return error;
}
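
The four removal cases above (exact match, left overlap, right overlap, middle split) differ only in which remainders survive. As a hedged illustration, the sketch below computes the surviving pieces of a single covering record. The names struct ext and ext_remove are hypothetical, and the model works on one in-memory record instead of a btree cursor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for an [agbno, length, owner] record. */
struct ext {
	uint32_t	start;
	uint32_t	len;
	uint64_t	owner;
};

/*
 * Remove [bno, bno + len) from the record @rec, which must cover the whole
 * range. Surviving pieces are written to out[] in ascending order; the
 * return value is how many survive (0 = exact match, 2 = middle split).
 */
static int ext_remove(const struct ext *rec, uint32_t bno, uint32_t len,
		      struct ext out[2])
{
	uint32_t rec_end = rec->start + rec->len;
	uint32_t end = bno + len;
	int n = 0;

	/* The record found must span the entire range being freed. */
	assert(rec->start <= bno && end <= rec_end);

	/* Left remainder: everything before the freed range. */
	if (rec->start < bno)
		out[n++] = (struct ext){ rec->start, bno - rec->start,
					 rec->owner };

	/* Right remainder: everything after the freed range. */
	if (end < rec_end)
		out[n++] = (struct ext){ end, rec_end - end, rec->owner };

	return n;
}
```

In the middle-split case the right remainder length, rec_end - end, equals the original length minus the freed length minus the left remainder length, which is the same arithmetic the patch uses when it builds the new right-hand record.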
* [PATCH 23/58] xfs: enhanced remove an extent from the rmap btree
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (21 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 22/58] xfs: remove an extent from the " Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 24/58] xfs: add rmap btree insert and delete helpers Darrick J. Wong
` (34 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Simplify the rmap extent removal since we never merge extents or anything
fancy like that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_rmap.c | 49 ++++++++++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 17 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 92e417d..073fb96 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -121,15 +121,8 @@ xfs_rmap_get_rec(
/*
* Find the extent in the rmap btree and remove it.
*
- * The record we find should always span a range greater than or equal to the
- * the extent being freed. This makes the code simple as, in theory, we do not
- * have to handle ranges that are split across multiple records as extents that
- * result in bmap btree extent merges should also result in rmap btree extent
- * merges. The owner field ensures we don't merge extents from different
- * structures into the same record, hence this property should always hold true
- * if we ensure that the rmap btree supports at least the same size maximum
- * extent as the bmap btree (bmbt MAXEXTLEN is 2^21 blocks at present, rmap
- * btree record can hold 2^32 blocks in a single extent).
+ * The record we find should always be an exact match for the extent that we're
+ * looking for, since we insert them into the btree without modification.
*
* Special Case #1: when growing the filesystem, we "free" an extent when
* growing the last AG. This extent is new space and so it is not tracked as
@@ -155,8 +148,11 @@ xfs_rmap_free(
struct xfs_mount *mp = tp->t_mountp;
struct xfs_btree_cur *cur;
struct xfs_rmap_irec ltrec;
+ uint64_t ltoff;
int error = 0;
int i;
+ uint64_t owner;
+ uint64_t offset;
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return 0;
@@ -164,12 +160,15 @@ xfs_rmap_free(
trace_xfs_rmap_free_extent(mp, agno, bno, len, oinfo);
cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+ xfs_owner_info_unpack(oinfo, &owner, &offset);
+ ASSERT(XFS_RMAP_NON_INODE_OWNER(owner) || XFS_RMAP_IS_BMBT(offset));
+ ltoff = ltrec.rm_offset & ~XFS_RMAP_OFF_BMBT;
/*
* We should always have a left record because there's a static record
* for the AG headers at rm_startblock == 0 created by mkfs/growfs that
* will not ever be removed from the tree.
*/
- error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+ error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, &i);
if (error)
goto out_error;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
@@ -200,15 +199,30 @@ xfs_rmap_free(
// ltrec.rm_blockcount, ltrec.rm_owner);
/* make sure the extent we found covers the entire freeing range. */
- XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno, out_error);
- XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_blockcount >= len, out_error);
- XFS_WANT_CORRUPTED_GOTO(mp,
- bno <= ltrec.rm_startblock + ltrec.rm_blockcount, out_error);
+ XFS_WANT_CORRUPTED_GOTO(mp, !XFS_RMAP_IS_UNWRITTEN(ltrec.rm_blockcount),
+ out_error);
+ XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno &&
+ ltrec.rm_startblock + XFS_RMAP_LEN(ltrec.rm_blockcount) >=
+ bno + len, out_error);
/* make sure the owner matches what we expect to find in the tree */
XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
- (owner < XFS_RMAP_OWN_NULL &&
- owner >= XFS_RMAP_OWN_MIN), out_error);
+ XFS_RMAP_NON_INODE_OWNER(owner), out_error);
+
+ /* check the offset, if necessary */
+ if (!XFS_RMAP_NON_INODE_OWNER(owner)) {
+ if (XFS_RMAP_IS_BMBT(offset)) {
+ XFS_WANT_CORRUPTED_GOTO(mp,
+ XFS_RMAP_IS_BMBT(ltrec.rm_offset),
+ out_error);
+ } else {
+ XFS_WANT_CORRUPTED_GOTO(mp,
+ ltrec.rm_offset <= offset, out_error);
+ XFS_WANT_CORRUPTED_GOTO(mp,
+ offset <= ltoff + ltrec.rm_blockcount,
+ out_error);
+ }
+ }
if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
//printk("remove exact\n");
@@ -267,7 +281,7 @@ xfs_rmap_free(
xfs_extlen_t orig_len = ltrec.rm_blockcount;
//printk("remove middle\n");
- ltrec.rm_blockcount = bno - ltrec.rm_startblock;;
+ ltrec.rm_blockcount = bno - ltrec.rm_startblock;
error = xfs_rmap_update(cur, &ltrec);
if (error)
goto out_error;
@@ -280,6 +294,7 @@ xfs_rmap_free(
cur->bc_rec.r.rm_blockcount = orig_len - len -
ltrec.rm_blockcount;
cur->bc_rec.r.rm_owner = ltrec.rm_owner;
+ cur->bc_rec.r.rm_offset = offset;
error = xfs_btree_insert(cur, &i);
if (error)
goto out_error;
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 24/58] xfs: add rmap btree insert and delete helpers
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (22 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 23/58] xfs: enhanced " Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 25/58] xfs: bmap btree changes should update rmap btree Darrick J. Wong
` (33 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Add a couple of helper functions to encapsulate rmap btree insert and
delete operations. Add tracepoints to the update function.
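A minimal sketch of the pattern these helpers follow, assuming a small
in-memory array in place of the btree: look up the exact record first,
treat a surprising result as corruption, then modify. The names below
(struct store, store_insert, store_delete, ERR_CORRUPT) are hypothetical
and are not the kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECS	16
#define ERR_CORRUPT	(-1)	/* hypothetical stand-in for a corruption error */
#define ERR_NOSPC	(-2)	/* hypothetical: the toy store is full */

struct rec {
	uint32_t start;
	uint32_t len;
	uint64_t owner;
	uint64_t offset;
};

struct store {
	struct rec recs[MAX_RECS];
	int nr;
};

static int rec_eq(const struct rec *a, const struct rec *b)
{
	return a->start == b->start && a->len == b->len &&
	       a->owner == b->owner && a->offset == b->offset;
}

/* Return the index of an exact match, or -1 if there is none. */
static int store_lookup_eq(const struct store *s, const struct rec *key)
{
	for (int i = 0; i < s->nr; i++)
		if (rec_eq(&s->recs[i], key))
			return i;
	return -1;
}

/* Insert a record; finding it already present counts as corruption. */
static int store_insert(struct store *s, const struct rec *r)
{
	if (store_lookup_eq(s, r) >= 0)
		return ERR_CORRUPT;
	if (s->nr == MAX_RECS)
		return ERR_NOSPC;
	s->recs[s->nr++] = *r;
	return 0;
}

/* Delete a record; failing to find it counts as corruption. */
static int store_delete(struct store *s, const struct rec *r)
{
	int i = store_lookup_eq(s, r);

	if (i < 0)
		return ERR_CORRUPT;
	assert(s->nr > 0);
	memmove(&s->recs[i], &s->recs[i + 1],
		(size_t)(s->nr - i - 1) * sizeof(s->recs[0]));
	s->nr--;
	return 0;
}
```

The two checks correspond to the i == 0 expectation before an insert and
the i == 1 expectation before a delete in the helpers added by this patch.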
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_rmap.c | 62 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rmap_btree.h | 2 +
2 files changed, 64 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 073fb96..045f9a7 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -88,6 +88,10 @@ xfs_rmap_update(
{
union xfs_btree_rec rec;
+ trace_xfs_rmapbt_update(cur->bc_mp, cur->bc_private.a.agno,
+ irec->rm_startblock, irec->rm_blockcount,
+ irec->rm_owner, irec->rm_offset);
+
rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
@@ -95,6 +99,64 @@ xfs_rmap_update(
return xfs_btree_update(cur, &rec);
}
+int
+xfs_rmapbt_insert(
+ struct xfs_btree_cur *rcur,
+ xfs_agblock_t agbno,
+ xfs_extlen_t len,
+ uint64_t owner,
+ uint64_t offset)
+{
+ int i;
+ int error;
+
+ trace_xfs_rmapbt_insert(rcur->bc_mp, rcur->bc_private.a.agno, agbno,
+ len, owner, offset);
+
+ error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, &i);
+ if (error)
+ goto done;
+ XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 0, done);
+
+ rcur->bc_rec.r.rm_startblock = agbno;
+ rcur->bc_rec.r.rm_blockcount = len;
+ rcur->bc_rec.r.rm_owner = owner;
+ rcur->bc_rec.r.rm_offset = offset;
+ error = xfs_btree_insert(rcur, &i);
+ if (error)
+ goto done;
+ XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+done:
+ return error;
+}
+
+STATIC int
+xfs_rmapbt_delete(
+ struct xfs_btree_cur *rcur,
+ xfs_agblock_t agbno,
+ xfs_extlen_t len,
+ uint64_t owner,
+ uint64_t offset)
+{
+ int i;
+ int error;
+
+ trace_xfs_rmapbt_delete(rcur->bc_mp, rcur->bc_private.a.agno, agbno,
+ len, owner, offset);
+
+ error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, &i);
+ if (error)
+ goto done;
+ XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+
+ error = xfs_btree_delete(rcur, &i);
+ if (error)
+ goto done;
+ XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+done:
+ return error;
+}
+
/*
* Get the data from the pointed-to record.
*/
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 0dfc151..d7c9722 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -55,6 +55,8 @@ int xfs_rmap_lookup_le(struct xfs_btree_cur *cur, xfs_agblock_t bno,
xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t bno,
xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
+int xfs_rmapbt_insert(struct xfs_btree_cur *rcur, xfs_agblock_t agbno,
+ xfs_extlen_t len, uint64_t owner, uint64_t offset);
int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
int *stat);
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH 25/58] xfs: bmap btree changes should update rmap btree
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (23 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 24/58] xfs: add rmap btree insert and delete helpers Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-21 21:39 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 26/58] xfs: add rmap btree geometry feature flag Darrick J. Wong
` (32 subsequent siblings)
57 siblings, 1 reply; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Any update to a file's bmap should make the corresponding change to
the rmapbt. On a reflink filesystem, this is absolutely required
because a given (file data) physical block can have multiple owners
and the only sane way to find an rmap given a bmap is if there is a
1:1 correspondence.
(At some point we can optimize this for non-reflink filesystems
because regular merge still works there.)
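To make the 1:1 correspondence concrete, here is a rough standalone
sketch of how a bmap extent could be translated into rmap key fields and
how the "move" and "resize" adjustments change an extent. The flag
positions follow the cover letter (a top bit of the offset for the attr
fork, the top bit of the blockcount for unwritten extents), but the
constant names and values, struct bmap_rec, and all of the helper names
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag values; the real definitions live elsewhere in the series. */
#define RMAP_OFF_ATTR		(1ULL << 63)
#define RMAP_LEN_UNWRITTEN	(1U << 31)

/* Hypothetical, simplified view of a bmap extent. */
struct bmap_rec {
	uint64_t startoff;	/* logical offset in the file */
	uint64_t startblock;	/* physical block */
	uint32_t blockcount;	/* length in blocks */
	bool unwritten;
};

/* Encode the logical offset, flagging attr fork mappings. */
static uint64_t encode_off(bool attr_fork, uint64_t off)
{
	return attr_fork ? (off | RMAP_OFF_ATTR) : off;
}

/* Encode the length, flagging unwritten extents. */
static uint32_t encode_len(const struct bmap_rec *b)
{
	assert(b);
	return b->unwritten ? (b->blockcount | RMAP_LEN_UNWRITTEN)
			    : b->blockcount;
}

/* "Move": shift the start by adj blocks while keeping the end fixed. */
static struct bmap_rec bmap_move(struct bmap_rec b, int64_t adj)
{
	b.startblock += adj;
	b.startoff += adj;
	b.blockcount -= adj;
	return b;
}

/* "Resize": keep the start fixed and change the length by adj blocks. */
static struct bmap_rec bmap_resize(struct bmap_rec b, int64_t adj)
{
	b.blockcount += adj;
	return b;
}
```

In this model, moving an extent at offset 100, block 5000, length 20 by
+5 gives offset 105, block 5005, length 15. Because the start block and
offset are part of the lookup key, a move has to be expressed as a delete
followed by an insert, whereas a resize can update the record in place.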
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_bmap.c | 262 ++++++++++++++++++++++++++++++++++-
fs/xfs/libxfs/xfs_bmap.h | 1
fs/xfs/libxfs/xfs_rmap.c | 296 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rmap_btree.h | 20 +++
4 files changed, 568 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index e740ef5..81f0ae0 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -45,6 +45,7 @@
#include "xfs_symlink.h"
#include "xfs_attr_leaf.h"
#include "xfs_filestream.h"
+#include "xfs_rmap_btree.h"
kmem_zone_t *xfs_bmap_free_item_zone;
@@ -1861,6 +1862,10 @@ xfs_bmap_add_extent_delay_real(
if (error)
goto done;
}
+ error = xfs_rmap_combine(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, &LEFT, &RIGHT, &PREV);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
@@ -1893,6 +1898,10 @@ xfs_bmap_add_extent_delay_real(
if (error)
goto done;
}
+ error = xfs_rmap_lcombine(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, &LEFT, &PREV);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
@@ -1924,6 +1933,10 @@ xfs_bmap_add_extent_delay_real(
if (error)
goto done;
}
+ error = xfs_rmap_rcombine(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, &RIGHT, &PREV, new);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
@@ -1953,6 +1966,10 @@ xfs_bmap_add_extent_delay_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, new);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG:
@@ -1988,6 +2005,10 @@ xfs_bmap_add_extent_delay_real(
if (error)
goto done;
}
+ error = xfs_rmap_lcombine(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, &LEFT, new);
+ if (error)
+ goto done;
da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
startblockval(PREV.br_startblock));
xfs_bmbt_set_startblock(ep, nullstartblock(da_new));
@@ -2023,6 +2044,10 @@ xfs_bmap_add_extent_delay_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, new);
+ if (error)
+ goto done;
if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
@@ -2071,6 +2096,8 @@ xfs_bmap_add_extent_delay_real(
if (error)
goto done;
}
+ error = xfs_rmap_rcombine(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, &RIGHT, new, new);
da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
startblockval(PREV.br_startblock));
@@ -2107,6 +2134,10 @@ xfs_bmap_add_extent_delay_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, new);
+ if (error)
+ goto done;
if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
@@ -2176,6 +2207,10 @@ xfs_bmap_add_extent_delay_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
+ XFS_DATA_FORK, new);
+ if (error)
+ goto done;
if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
@@ -2271,7 +2306,8 @@ xfs_bmap_add_extent_unwritten_real(
xfs_bmbt_irec_t *new, /* new data to add to file extents */
xfs_fsblock_t *first, /* pointer to firstblock variable */
xfs_bmap_free_t *flist, /* list of extents to be freed */
- int *logflagsp) /* inode logging flags */
+ int *logflagsp, /* inode logging flags */
+ struct xfs_btree_cur *rcur)/* rmap btree pointer */
{
xfs_btree_cur_t *cur; /* btree cursor */
xfs_bmbt_rec_host_t *ep; /* extent entry for idx */
@@ -2417,6 +2453,10 @@ xfs_bmap_add_extent_unwritten_real(
RIGHT.br_blockcount, LEFT.br_state)))
goto done;
}
+ error = xfs_rmap_combine(rcur, ip->i_ino,
+ XFS_DATA_FORK, &LEFT, &RIGHT, &PREV);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
@@ -2454,6 +2494,10 @@ xfs_bmap_add_extent_unwritten_real(
LEFT.br_state)))
goto done;
}
+ error = xfs_rmap_lcombine(rcur, ip->i_ino,
+ XFS_DATA_FORK, &LEFT, &PREV);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
@@ -2489,6 +2533,10 @@ xfs_bmap_add_extent_unwritten_real(
newext)))
goto done;
}
+ error = xfs_rmap_rcombine(rcur, ip->i_ino,
+ XFS_DATA_FORK, &RIGHT, &PREV, new);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
@@ -2562,6 +2610,14 @@ xfs_bmap_add_extent_unwritten_real(
if (error)
goto done;
}
+ error = xfs_rmap_move(rcur, ip->i_ino,
+ XFS_DATA_FORK, &PREV, new->br_blockcount);
+ if (error)
+ goto done;
+ error = xfs_rmap_resize(rcur, ip->i_ino,
+ XFS_DATA_FORK, &LEFT, -new->br_blockcount);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING:
@@ -2600,6 +2656,14 @@ xfs_bmap_add_extent_unwritten_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_move(rcur, ip->i_ino,
+ XFS_DATA_FORK, &PREV, new->br_blockcount);
+ if (error)
+ goto done;
+ error = xfs_rmap_insert(rcur, ip->i_ino,
+ XFS_DATA_FORK, new);
+ if (error)
+ goto done;
break;
case BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
@@ -2642,6 +2706,14 @@ xfs_bmap_add_extent_unwritten_real(
newext)))
goto done;
}
+ error = xfs_rmap_resize(rcur, ip->i_ino,
+ XFS_DATA_FORK, &PREV, -new->br_blockcount);
+ if (error)
+ goto done;
+ error = xfs_rmap_move(rcur, ip->i_ino,
+ XFS_DATA_FORK, &RIGHT, -new->br_blockcount);
+ if (error)
+ goto done;
break;
case BMAP_RIGHT_FILLING:
@@ -2682,6 +2754,14 @@ xfs_bmap_add_extent_unwritten_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_resize(rcur, ip->i_ino,
+ XFS_DATA_FORK, &PREV, -new->br_blockcount);
+ if (error)
+ goto done;
+ error = xfs_rmap_insert(rcur, ip->i_ino,
+ XFS_DATA_FORK, new);
+ if (error)
+ goto done;
break;
case 0:
@@ -2743,6 +2823,17 @@ xfs_bmap_add_extent_unwritten_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_resize(rcur, ip->i_ino, XFS_DATA_FORK, &PREV,
+ new->br_startoff - PREV.br_startoff -
+ PREV.br_blockcount);
+ if (error)
+ goto done;
+ error = xfs_rmap_insert(rcur, ip->i_ino, XFS_DATA_FORK, new);
+ if (error)
+ goto done;
+ error = xfs_rmap_insert(rcur, ip->i_ino, XFS_DATA_FORK, &r[1]);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG:
@@ -2946,6 +3037,7 @@ xfs_bmap_add_extent_hole_real(
int rval=0; /* return value (logging flags) */
int state; /* state bits, accessed thru macros */
struct xfs_mount *mp;
+ struct xfs_bmbt_irec prev; /* fake previous extent entry */
mp = bma->tp ? bma->tp->t_mountp : NULL;
ifp = XFS_IFORK_PTR(bma->ip, whichfork);
@@ -3053,6 +3145,12 @@ xfs_bmap_add_extent_hole_real(
if (error)
goto done;
}
+ prev = *new;
+ prev.br_startblock = nullstartblock(0);
+ error = xfs_rmap_combine(bma->rcur, bma->ip->i_ino,
+ whichfork, &left, &right, &prev);
+ if (error)
+ goto done;
break;
case BMAP_LEFT_CONTIG:
@@ -3085,6 +3183,10 @@ xfs_bmap_add_extent_hole_real(
if (error)
goto done;
}
+ error = xfs_rmap_resize(bma->rcur, bma->ip->i_ino,
+ whichfork, &left, new->br_blockcount);
+ if (error)
+ goto done;
break;
case BMAP_RIGHT_CONTIG:
@@ -3119,6 +3221,10 @@ xfs_bmap_add_extent_hole_real(
if (error)
goto done;
}
+ error = xfs_rmap_move(bma->rcur, bma->ip->i_ino,
+ whichfork, &right, -new->br_blockcount);
+ if (error)
+ goto done;
break;
case 0:
@@ -3147,6 +3253,10 @@ xfs_bmap_add_extent_hole_real(
goto done;
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
}
+ error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
+ whichfork, new);
+ if (error)
+ goto done;
break;
}
@@ -4276,6 +4386,59 @@ xfs_bmapi_delay(
return 0;
}
+static int
+alloc_rcur(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_btree_cur **pcur,
+ xfs_fsblock_t fsblock)
+{
+ struct xfs_btree_cur *cur = *pcur;
+ struct xfs_buf *agbp;
+ int error;
+ xfs_agnumber_t agno;
+
+ agno = XFS_FSB_TO_AGNO(mp, fsblock);
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return 0;
+ if (cur && cur->bc_private.a.agno == agno)
+ return 0;
+ if (isnullstartblock(fsblock))
+ return 0;
+
+ error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+ if (error)
+ return error;
+
+ cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+ if (!cur) {
+ xfs_trans_brelse(tp, agbp);
+ return -ENOMEM;
+ }
+
+ *pcur = cur;
+ return 0;
+}
+
+static void
+free_rcur(
+ struct xfs_btree_cur **pcur,
+ int bt_error)
+{
+ struct xfs_btree_cur *cur = *pcur;
+ struct xfs_buf *agbp;
+ struct xfs_trans *tp;
+
+ if (cur == NULL)
+ return;
+
+ agbp = cur->bc_private.a.agbp;
+ tp = cur->bc_tp;
+ xfs_btree_del_cursor(cur, bt_error);
+ xfs_trans_brelse(tp, agbp);
+
+ *pcur = NULL;
+}
static int
xfs_bmapi_allocate(
@@ -4368,6 +4531,10 @@ xfs_bmapi_allocate(
xfs_sb_version_hasextflgbit(&mp->m_sb))
bma->got.br_state = XFS_EXT_UNWRITTEN;
+ error = alloc_rcur(mp, bma->tp, &bma->rcur, bma->got.br_startblock);
+ if (error)
+ return error;
+
if (bma->wasdel)
error = xfs_bmap_add_extent_delay_real(bma);
else
@@ -4429,9 +4596,14 @@ xfs_bmapi_convert_unwritten(
mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN)
? XFS_EXT_NORM : XFS_EXT_UNWRITTEN;
+ error = alloc_rcur(bma->ip->i_mount, bma->tp, &bma->rcur,
+ mval->br_startblock);
+ if (error)
+ return error;
+
error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, &bma->idx,
&bma->cur, mval, bma->firstblock, bma->flist,
- &tmp_logflags);
+ &tmp_logflags, bma->rcur);
/*
* Log the inode core unconditionally in the unwritten extent conversion
* path because the conversion might not have done so (e.g., if the
@@ -4633,6 +4805,7 @@ xfs_bmapi_write(
}
*nmap = n;
+ free_rcur(&bma.rcur, XFS_BTREE_NOERROR);
/*
* Transform from btree to extents, give it cur.
*/
@@ -4652,6 +4825,7 @@ xfs_bmapi_write(
XFS_IFORK_MAXEXT(ip, whichfork));
error = 0;
error0:
+ free_rcur(&bma.rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
/*
* Log everything. Do this after conversion, there's no point in
* logging the extent records if we've converted to btree format.
@@ -4704,7 +4878,8 @@ xfs_bmap_del_extent(
xfs_btree_cur_t *cur, /* if null, not a btree */
xfs_bmbt_irec_t *del, /* data to remove from extents */
int *logflagsp, /* inode logging flags */
- int whichfork) /* data or attr fork */
+ int whichfork, /* data or attr fork */
+ struct xfs_btree_cur *rcur) /* rmap btree */
{
xfs_filblks_t da_new; /* new delay-alloc indirect blocks */
xfs_filblks_t da_old; /* old delay-alloc indirect blocks */
@@ -4822,6 +4997,9 @@ xfs_bmap_del_extent(
XFS_IFORK_NEXT_SET(ip, whichfork,
XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
flags |= XFS_ILOG_CORE;
+ error = xfs_rmap_delete(rcur, ip->i_ino, whichfork, &got);
+ if (error)
+ goto done;
if (!cur) {
flags |= xfs_ilog_fext(whichfork);
break;
@@ -4849,6 +5027,10 @@ xfs_bmap_del_extent(
}
xfs_bmbt_set_startblock(ep, del_endblock);
trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+ error = xfs_rmap_move(rcur, ip->i_ino, whichfork,
+ &got, del->br_blockcount);
+ if (error)
+ goto done;
if (!cur) {
flags |= xfs_ilog_fext(whichfork);
break;
@@ -4875,6 +5057,10 @@ xfs_bmap_del_extent(
break;
}
trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+ error = xfs_rmap_resize(rcur, ip->i_ino, whichfork,
+ &got, -del->br_blockcount);
+ if (error)
+ goto done;
if (!cur) {
flags |= xfs_ilog_fext(whichfork);
break;
@@ -4900,6 +5086,15 @@ xfs_bmap_del_extent(
if (!delay) {
new.br_startblock = del_endblock;
flags |= XFS_ILOG_CORE;
+ error = xfs_rmap_resize(rcur, ip->i_ino,
+ whichfork, &got,
+ temp - got.br_blockcount);
+ if (error)
+ goto done;
+ error = xfs_rmap_insert(rcur, ip->i_ino,
+ whichfork, &new);
+ if (error)
+ goto done;
if (cur) {
if ((error = xfs_bmbt_update(cur,
got.br_startoff,
@@ -5052,6 +5247,7 @@ xfs_bunmapi(
int wasdel; /* was a delayed alloc extent */
int whichfork; /* data or attribute fork */
xfs_fsblock_t sum;
+ struct xfs_btree_cur *rcur = NULL; /* rmap btree */
trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
@@ -5136,6 +5332,11 @@ xfs_bunmapi(
got.br_startoff + got.br_blockcount - 1);
if (bno < start)
break;
+
+ error = alloc_rcur(mp, tp, &rcur, got.br_startblock);
+ if (error)
+ goto error0;
+
/*
* Then deal with the (possibly delayed) allocated space
* we found.
@@ -5195,7 +5396,7 @@ xfs_bunmapi(
del.br_state = XFS_EXT_UNWRITTEN;
error = xfs_bmap_add_extent_unwritten_real(tp, ip,
&lastx, &cur, &del, firstblock, flist,
- &logflags);
+ &logflags, rcur);
if (error)
goto error0;
goto nodelete;
@@ -5253,7 +5454,8 @@ xfs_bunmapi(
lastx--;
error = xfs_bmap_add_extent_unwritten_real(tp,
ip, &lastx, &cur, &prev,
- firstblock, flist, &logflags);
+ firstblock, flist, &logflags,
+ rcur);
if (error)
goto error0;
goto nodelete;
@@ -5262,7 +5464,8 @@ xfs_bunmapi(
del.br_state = XFS_EXT_UNWRITTEN;
error = xfs_bmap_add_extent_unwritten_real(tp,
ip, &lastx, &cur, &del,
- firstblock, flist, &logflags);
+ firstblock, flist, &logflags,
+ rcur);
if (error)
goto error0;
goto nodelete;
@@ -5315,7 +5518,7 @@ xfs_bunmapi(
goto error0;
}
error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
- &tmp_logflags, whichfork);
+ &tmp_logflags, whichfork, rcur);
logflags |= tmp_logflags;
if (error)
goto error0;
@@ -5339,6 +5542,7 @@ nodelete:
}
*done = bno == (xfs_fileoff_t)-1 || bno < start || lastx < 0;
+ free_rcur(&rcur, XFS_BTREE_NOERROR);
/*
* Convert to a btree if necessary.
*/
@@ -5366,6 +5570,7 @@ nodelete:
*/
error = 0;
error0:
+ free_rcur(&rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
/*
* Log everything. Do this after conversion, there's no point in
* logging the extent records if we've converted to btree format.
@@ -5438,7 +5643,8 @@ xfs_bmse_merge(
struct xfs_bmbt_rec_host *gotp, /* extent to shift */
struct xfs_bmbt_rec_host *leftp, /* preceding extent */
struct xfs_btree_cur *cur,
- int *logflags) /* output */
+ int *logflags, /* output */
+ struct xfs_btree_cur *rcur) /* rmap btree */
{
struct xfs_bmbt_irec got;
struct xfs_bmbt_irec left;
@@ -5469,6 +5675,13 @@ xfs_bmse_merge(
XFS_IFORK_NEXT_SET(ip, whichfork,
XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
*logflags |= XFS_ILOG_CORE;
+ error = xfs_rmap_resize(rcur, ip->i_ino, whichfork, &left,
+ blockcount - left.br_blockcount);
+ if (error)
+ return error;
+ error = xfs_rmap_delete(rcur, ip->i_ino, whichfork, &got);
+ if (error)
+ return error;
if (!cur) {
*logflags |= XFS_ILOG_DEXT;
return 0;
@@ -5511,7 +5724,8 @@ xfs_bmse_shift_one(
struct xfs_bmbt_rec_host *gotp,
struct xfs_btree_cur *cur,
int *logflags,
- enum shift_direction direction)
+ enum shift_direction direction,
+ struct xfs_btree_cur *rcur)
{
struct xfs_ifork *ifp;
struct xfs_mount *mp;
@@ -5561,7 +5775,7 @@ xfs_bmse_shift_one(
offset_shift_fsb)) {
return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
*current_ext, gotp, adj_irecp,
- cur, logflags);
+ cur, logflags, rcur);
}
} else {
startoff = got.br_startoff + offset_shift_fsb;
@@ -5598,6 +5812,10 @@ update_current_ext:
(*current_ext)--;
xfs_bmbt_set_startoff(gotp, startoff);
*logflags |= XFS_ILOG_CORE;
+ error = xfs_rmap_slide(rcur, ip->i_ino, whichfork,
+ &got, startoff - got.br_startoff);
+ if (error)
+ return error;
if (!cur) {
*logflags |= XFS_ILOG_DEXT;
return 0;
@@ -5649,6 +5867,7 @@ xfs_bmap_shift_extents(
int error = 0;
int whichfork = XFS_DATA_FORK;
int logflags = 0;
+ struct xfs_btree_cur *rcur = NULL;
if (unlikely(XFS_TEST_ERROR(
(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
@@ -5737,9 +5956,14 @@ xfs_bmap_shift_extents(
}
while (nexts++ < num_exts) {
+ xfs_bmbt_get_all(gotp, &got);
+ error = alloc_rcur(mp, tp, &rcur, got.br_startblock);
+ if (error)
+ return error;
+
error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
&current_ext, gotp, cur, &logflags,
- direction);
+ direction, rcur);
if (error)
goto del_cursor;
/*
@@ -5765,6 +5989,7 @@ xfs_bmap_shift_extents(
}
del_cursor:
+ free_rcur(&rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
if (cur)
xfs_btree_del_cursor(cur,
error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
@@ -5801,6 +6026,7 @@ xfs_bmap_split_extent_at(
int error = 0;
int logflags = 0;
int i = 0;
+ struct xfs_btree_cur *rcur = NULL;
if (unlikely(XFS_TEST_ERROR(
(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
@@ -5895,6 +6121,18 @@ xfs_bmap_split_extent_at(
XFS_WANT_CORRUPTED_GOTO(mp, i == 1, del_cursor);
}
+ /* update rmapbt */
+ error = alloc_rcur(mp, tp, &rcur, new.br_startblock);
+ if (error)
+ goto del_cursor;
+ error = xfs_rmap_resize(rcur, ip->i_ino, whichfork, &got, -gotblkcnt);
+ if (error)
+ goto del_cursor;
+ error = xfs_rmap_insert(rcur, ip->i_ino, whichfork, &new);
+ if (error)
+ goto del_cursor;
+ free_rcur(&rcur, XFS_BTREE_NOERROR);
+
/*
* Convert to a btree if necessary.
*/
@@ -5908,6 +6146,8 @@ xfs_bmap_split_extent_at(
}
del_cursor:
+ free_rcur(&rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+
if (cur) {
cur->bc_private.b.allocated = 0;
xfs_btree_del_cursor(cur,
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 89fa3dd..59f26cf 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -56,6 +56,7 @@ struct xfs_bmalloca {
bool aeof; /* allocated space at eof */
bool conv; /* overwriting unwritten extents */
int flags;
+ struct xfs_btree_cur *rcur; /* rmap btree cursor */
};
/*
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 045f9a7..d821b1a 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -563,3 +563,299 @@ out_error:
xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
return error;
}
+
+/* Encode logical offset for a rmapbt record */
+STATIC uint64_t
+b2r_off(
+ int whichfork,
+ xfs_fileoff_t off)
+{
+ uint64_t x;
+
+ x = off;
+ if (whichfork == XFS_ATTR_FORK)
+ x |= XFS_RMAP_OFF_ATTR;
+ return x;
+}
+
+/* Encode blockcount for a rmapbt record */
+STATIC xfs_extlen_t
+b2r_len(
+ struct xfs_bmbt_irec *irec)
+{
+ xfs_extlen_t x;
+
+ x = irec->br_blockcount;
+ if (irec->br_state == XFS_EXT_UNWRITTEN)
+ x |= XFS_RMAP_LEN_UNWRITTEN;
+ return x;
+}
+
+/* Combine two adjacent rmap extents */
+int
+xfs_rmap_combine(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *LEFT,
+ struct xfs_bmbt_irec *RIGHT,
+ struct xfs_bmbt_irec *PREV)
+{
+ int error;
+
+ if (!rcur)
+ return 0;
+
+ trace_xfs_rmap_combine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, LEFT, PREV, RIGHT);
+
+ /* Delete right rmap */
+ error = xfs_rmapbt_delete(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, RIGHT->br_startblock),
+ b2r_len(RIGHT), ino,
+ b2r_off(whichfork, RIGHT->br_startoff));
+ if (error)
+ goto done;
+
+ /* Delete prev rmap */
+ if (!isnullstartblock(PREV->br_startblock)) {
+ error = xfs_rmapbt_delete(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp,
+ PREV->br_startblock),
+ b2r_len(PREV), ino,
+ b2r_off(whichfork, PREV->br_startoff));
+ if (error)
+ goto done;
+ }
+
+ /* Enlarge left rmap */
+ return xfs_rmap_resize(rcur, ino, whichfork, LEFT,
+ PREV->br_blockcount + RIGHT->br_blockcount);
+done:
+ return error;
+}
+
+/* Extend a left rmap extent */
+int
+xfs_rmap_lcombine(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *LEFT,
+ struct xfs_bmbt_irec *PREV)
+{
+ int error;
+
+ if (!rcur)
+ return 0;
+
+ trace_xfs_rmap_lcombine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, LEFT, PREV);
+
+ /* Delete prev rmap */
+ if (!isnullstartblock(PREV->br_startblock)) {
+ error = xfs_rmapbt_delete(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp,
+ PREV->br_startblock),
+ b2r_len(PREV), ino,
+ b2r_off(whichfork, PREV->br_startoff));
+ if (error)
+ goto done;
+ }
+
+ /* Enlarge left rmap */
+ return xfs_rmap_resize(rcur, ino, whichfork, LEFT, PREV->br_blockcount);
+done:
+ return error;
+}
+
+/* Extend a right rmap extent */
+int
+xfs_rmap_rcombine(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *RIGHT,
+ struct xfs_bmbt_irec *PREV,
+ struct xfs_bmbt_irec *new)
+{
+ int error;
+
+ if (!rcur)
+ return 0;
+ ASSERT(PREV->br_startoff == new->br_startoff);
+
+ trace_xfs_rmap_rcombine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, RIGHT, PREV);
+
+ /* Delete prev rmap */
+ if (!isnullstartblock(PREV->br_startblock)) {
+ error = xfs_rmapbt_delete(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp,
+ PREV->br_startblock),
+ b2r_len(PREV), ino,
+ b2r_off(whichfork, PREV->br_startoff));
+ if (error)
+ goto done;
+ }
+
+ /* Enlarge right rmap */
+ return xfs_rmap_resize(rcur, ino, whichfork, RIGHT,
+ -PREV->br_blockcount);
+done:
+ return error;
+}
+
+/* Insert a rmap extent */
+int
+xfs_rmap_insert(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *new)
+{
+ if (!rcur)
+ return 0;
+
+ trace_xfs_rmap_insert(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, new);
+
+ return xfs_rmapbt_insert(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, new->br_startblock),
+ b2r_len(new), ino,
+ b2r_off(whichfork, new->br_startoff));
+}
+
+/* Delete a rmap extent */
+int
+xfs_rmap_delete(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *new)
+{
+ if (!rcur)
+ return 0;
+
+ trace_xfs_rmap_delete(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, new);
+
+ return xfs_rmapbt_delete(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, new->br_startblock),
+ b2r_len(new), ino,
+ b2r_off(whichfork, new->br_startoff));
+}
+
+/* Change the start of an rmap */
+int
+xfs_rmap_move(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *PREV,
+ long start_adj)
+{
+ int error;
+ struct xfs_bmbt_irec irec;
+
+ if (!rcur)
+ return 0;
+
+ trace_xfs_rmap_move(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, PREV, start_adj);
+
+ /* Delete prev rmap */
+ error = xfs_rmapbt_delete(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+ b2r_len(PREV), ino,
+ b2r_off(whichfork, PREV->br_startoff));
+ if (error)
+ goto done;
+
+ /* Re-add rmap with new start */
+ irec = *PREV;
+ irec.br_startblock += start_adj;
+ irec.br_startoff += start_adj;
+ irec.br_blockcount -= start_adj;
+ return xfs_rmapbt_insert(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, irec.br_startblock),
+ b2r_len(&irec), ino,
+ b2r_off(whichfork, irec.br_startoff));
+done:
+ return error;
+}
+
+/* Change the logical offset of an rmap */
+int
+xfs_rmap_slide(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *PREV,
+ long start_adj)
+{
+ int error;
+
+ if (!rcur)
+ return 0;
+
+ trace_xfs_rmap_slide(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, PREV, start_adj);
+
+ /* Delete prev rmap */
+ error = xfs_rmapbt_delete(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+ b2r_len(PREV), ino,
+ b2r_off(whichfork, PREV->br_startoff));
+ if (error)
+ goto done;
+
+ /* Re-add rmap with new logical offset */
+ return xfs_rmapbt_insert(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+ b2r_len(PREV), ino,
+ b2r_off(whichfork, PREV->br_startoff + start_adj));
+done:
+ return error;
+}
+
+/* Change the size of an rmap */
+int
+xfs_rmap_resize(
+ struct xfs_btree_cur *rcur,
+ xfs_ino_t ino,
+ int whichfork,
+ struct xfs_bmbt_irec *PREV,
+ long size_adj)
+{
+ int i;
+ int error;
+ struct xfs_bmbt_irec irec;
+ struct xfs_rmap_irec rrec;
+
+ if (!rcur)
+ return 0;
+
+ trace_xfs_rmap_resize(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+ whichfork, PREV, size_adj);
+
+ error = xfs_rmap_lookup_eq(rcur,
+ XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+ b2r_len(PREV), ino,
+ b2r_off(whichfork, PREV->br_startoff), &i);
+ if (error)
+ goto done;
+ XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+ error = xfs_rmap_get_rec(rcur, &rrec, &i);
+ if (error)
+ goto done;
+ XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+ irec = *PREV;
+ irec.br_blockcount += size_adj;
+ rrec.rm_blockcount = b2r_len(&irec);
+ error = xfs_rmap_update(rcur, &rrec);
+ if (error)
+ goto done;
+done:
+ return error;
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index d7c9722..0131d9a 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -68,4 +68,24 @@ int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
struct xfs_owner_info *oinfo);
+/* functions for updating the rmapbt based on bmbt map/unmap operations */
+int xfs_rmap_combine(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *LEFT, struct xfs_bmbt_irec *RIGHT,
+ struct xfs_bmbt_irec *PREV);
+int xfs_rmap_lcombine(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *LEFT, struct xfs_bmbt_irec *PREV);
+int xfs_rmap_rcombine(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *RIGHT, struct xfs_bmbt_irec *PREV,
+ struct xfs_bmbt_irec *new);
+int xfs_rmap_insert(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *new);
+int xfs_rmap_delete(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *new);
+int xfs_rmap_move(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *PREV, long start_adj);
+int xfs_rmap_slide(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *PREV, long start_adj);
+int xfs_rmap_resize(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
+ struct xfs_bmbt_irec *PREV, long size_adj);
+
#endif /* __XFS_RMAP_BTREE_H__ */
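
For readers following along, the three adjustment helpers above (xfs_rmap_move, xfs_rmap_slide, xfs_rmap_resize) differ only in which fields of the extent record they change. Below is a minimal userspace sketch of that arithmetic. struct demo_irec and the demo_* functions are hypothetical stand-ins invented for illustration, not kernel interfaces, and the sketch leaves out the btree lookups, deletes, and inserts entirely.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_bmbt_irec. */
struct demo_irec {
    uint64_t startoff;   /* logical file offset, in blocks */
    uint64_t startblock; /* physical block number */
    uint64_t blockcount; /* extent length, in blocks */
};

/* "move": the start shifts forward, so the extent shrinks from the front. */
static struct demo_irec demo_move(struct demo_irec prev, long start_adj)
{
    prev.startblock += start_adj;
    prev.startoff += start_adj;
    prev.blockcount -= start_adj;
    return prev;
}

/* "slide": only the logical offset changes; the physical extent stays put. */
static struct demo_irec demo_slide(struct demo_irec prev, long start_adj)
{
    prev.startoff += start_adj;
    return prev;
}

/* "resize": only the length changes; both start values stay put. */
static struct demo_irec demo_resize(struct demo_irec prev, long size_adj)
{
    prev.blockcount += size_adj;
    return prev;
}
```

In the patch itself, move and slide are implemented as a delete followed by an insert, while resize updates the existing record in place; the sketch only shows the resulting field values.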
* [PATCH 26/58] xfs: add rmap btree geometry feature flag
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (24 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 25/58] xfs: bmap btree changes should update rmap btree Darrick J. Wong
@ 2015-10-07 4:57 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 27/58] xfs: add rmap btree block detection to log recovery Darrick J. Wong
` (31 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:57 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Add a geometry flag so that xfs_info and other userspace utilities can
tell that the filesystem is using the rmap btree.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_fs.h | 1 +
fs/xfs/xfs_fsops.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 89689c6..9fbdb86 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -240,6 +240,7 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_FTYPE 0x10000 /* inode directory types */
#define XFS_FSOP_GEOM_FLAGS_FINOBT 0x20000 /* free inode btree */
#define XFS_FSOP_GEOM_FLAGS_SPINODES 0x40000 /* sparse inode chunks */
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT 0x80000 /* Reverse mapping btree */
/*
* Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 031dd92..64bb02b 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -104,7 +104,9 @@ xfs_fs_geometry(
(xfs_sb_version_hasfinobt(&mp->m_sb) ?
XFS_FSOP_GEOM_FLAGS_FINOBT : 0) |
(xfs_sb_version_hassparseinodes(&mp->m_sb) ?
- XFS_FSOP_GEOM_FLAGS_SPINODES : 0);
+ XFS_FSOP_GEOM_FLAGS_SPINODES : 0) |
+ (xfs_sb_version_hasrmapbt(&mp->m_sb) ?
+ XFS_FSOP_GEOM_FLAGS_RMAPBT : 0);
geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
mp->m_sb.sb_logsectsize : BBSIZE;
geo->rtsectsize = mp->m_sb.sb_blocksize;
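
As a rough illustration of how a userspace consumer might test the new bit, here is a small sketch. The flag values are copied from the xfs_fs.h hunk above; demo_geom_has_rmapbt is a hypothetical helper name, and the sketch assumes the caller has already obtained the geometry flags word by whatever means the tool normally uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in the xfs_fs.h hunk of this patch. */
#define XFS_FSOP_GEOM_FLAGS_FINOBT   0x20000 /* free inode btree */
#define XFS_FSOP_GEOM_FLAGS_SPINODES 0x40000 /* sparse inode chunks */
#define XFS_FSOP_GEOM_FLAGS_RMAPBT   0x80000 /* reverse mapping btree */

/* Hypothetical helper: does this geometry flags word advertise the rmapbt? */
static bool demo_geom_has_rmapbt(uint32_t geo_flags)
{
    return (geo_flags & XFS_FSOP_GEOM_FLAGS_RMAPBT) != 0;
}
```

Each feature occupies its own bit, so the test is independent of whichever other features are enabled.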
* [PATCH 27/58] xfs: add rmap btree block detection to log recovery
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (25 preceding siblings ...)
2015-10-07 4:57 ` [PATCH 26/58] xfs: add rmap btree geometry feature flag Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 28/58] xfs: enable the rmap btree functionality Darrick J. Wong
` (30 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
So that such blocks can be correctly identified and have their
operations structures attached, to validate that recovery has not
resulted in a corrupt block.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_log_recover.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 129b9a1..0f5fb8f 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1847,6 +1847,7 @@ xlog_recover_get_buf_lsn(
case XFS_ABTC_CRC_MAGIC:
case XFS_ABTB_MAGIC:
case XFS_ABTC_MAGIC:
+ case XFS_RMAP_CRC_MAGIC:
case XFS_IBT_CRC_MAGIC:
case XFS_IBT_MAGIC: {
struct xfs_btree_block *btb = blk;
@@ -2015,6 +2016,9 @@ xlog_recover_validate_buf_type(
case XFS_BMAP_MAGIC:
bp->b_ops = &xfs_bmbt_buf_ops;
break;
+ case XFS_RMAP_CRC_MAGIC:
+ bp->b_ops = &xfs_rmapbt_buf_ops;
+ break;
default:
xfs_warn(mp, "Bad btree block magic!");
ASSERT(0);
* [PATCH 28/58] xfs: enable the rmap btree functionality
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (26 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 27/58] xfs: add rmap btree block detection to log recovery Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 29/58] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
` (29 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_format.h | 3 ++-
fs/xfs/libxfs/xfs_sb.c | 5 +++++
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index f0cf383..c4a73e9 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -449,7 +449,8 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_FINOBT (1 << 0) /* free inode btree */
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
- (XFS_SB_FEAT_RO_COMPAT_FINOBT)
+ (XFS_SB_FEAT_RO_COMPAT_FINOBT | \
+ XFS_SB_FEAT_RO_COMPAT_RMAPBT)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
static inline bool
xfs_sb_has_ro_compat_feature(
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 0103a75..8b437f5 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -213,6 +213,11 @@ xfs_mount_validate_sb(
return -EINVAL;
}
+ if (xfs_sb_version_hasrmapbt(sbp)) {
+ xfs_alert(mp,
+"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
+ }
+
/*
* More sanity checking. Most of these were stolen directly from
* xfs_repair.
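
The effect of widening XFS_SB_FEAT_RO_COMPAT_ALL can be sketched with plain bit arithmetic. The DEMO_* names and demo_unknown_bits below are hypothetical and only mirror the shape of the macros in the xfs_format.h hunk; this is an illustration of the mask logic, not the kernel's mount path.

```c
#include <stdint.h>

/* Bit positions mirror the xfs_format.h hunk above. */
#define DEMO_RO_COMPAT_FINOBT (1u << 0) /* free inode btree */
#define DEMO_RO_COMPAT_RMAPBT (1u << 1) /* reverse map btree */

/* Supported masks before and after this patch. */
#define DEMO_RO_COMPAT_ALL_OLD (DEMO_RO_COMPAT_FINOBT)
#define DEMO_RO_COMPAT_ALL_NEW (DEMO_RO_COMPAT_FINOBT | DEMO_RO_COMPAT_RMAPBT)

/* Hypothetical helper: which on-disk feature bits fall outside "supported"? */
static uint32_t demo_unknown_bits(uint32_t features, uint32_t supported)
{
    return features & ~supported;
}
```

With the old mask, a superblock carrying the RMAPBT bit reports an unknown feature; with the new mask the same superblock reports none, which is what lets the kernel mount it normally.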
* [PATCH 29/58] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (27 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 28/58] xfs: enable the rmap btree functionality Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 30/58] xfs: implement " Darrick J. Wong
` (28 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs, Dave Chinner
From: Dave Chinner <dchinner@redhat.com>
Swapping extents between two inodes requires the owner to be updated
in the rmap tree for all the extents that are swapped. This code
does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until
support has been implemented. This will need to be done before the
rmap btree code can have the experimental tag removed.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 8b2e505..f41a6f7 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1685,6 +1685,19 @@ xfs_swap_extents(
__uint64_t tmp;
int lock_flags;
+ /*
+ * We can't swap extents on rmap btree enabled filesystems yet
+ * as there is no mechanism to update the owner of extents in
+ * the rmap tree yet. Hence, for the moment, just reject attempts
+ * to swap extents with EINVAL after emitting a warning once to remind
+ * us this needs fixing.
+ */
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+ WARN_ONCE(1,
+ "XFS: XFS_IOC_SWAPEXT not supported on RMAP enabled filesystems\n");
+ return -EINVAL;
+ }
+
tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
if (!tempifp) {
error = -ENOMEM;
* [PATCH 30/58] xfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (28 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 29/58] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 31/58] libxfs: refactor short btree block verification Darrick J. Wong
` (27 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Implement extent swapping when reverse-mapping is enabled.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_btree.c | 17 ++++++
fs/xfs/libxfs/xfs_rmap.c | 112 ++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_rmap_btree.h | 8 +++
fs/xfs/xfs_bmap_util.c | 38 +++++++++-----
4 files changed, 161 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 13971c6..2d7dc8c 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -32,6 +32,7 @@
#include "xfs_trace.h"
#include "xfs_cksum.h"
#include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
/*
* Cursor allocation zone.
@@ -3988,6 +3989,8 @@ xfs_btree_block_change_owner(
struct xfs_btree_block *block;
struct xfs_buf *bp;
union xfs_btree_ptr rptr;
+ struct xfs_owner_info old_oinfo, new_oinfo;
+ int error;
/* do right sibling readahead */
xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA);
@@ -3999,6 +4002,20 @@ xfs_btree_block_change_owner(
else
block->bb_u.s.bb_owner = cpu_to_be32(new_owner);
+ /* change rmap owners (bmbt blocks only) */
+ if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+ XFS_RMAP_INO_BMBT_OWNER(&old_oinfo,
+ cur->bc_private.b.ip->i_ino,
+ cur->bc_private.b.whichfork);
+ XFS_RMAP_INO_BMBT_OWNER(&new_oinfo,
+ new_owner,
+ cur->bc_private.b.whichfork);
+ error = xfs_rmap_change_bmbt_owner(cur, bp, &old_oinfo,
+ &new_oinfo);
+ if (error)
+ return error;
+ }
+
/*
* If the block is a root block hosted in an inode, we might not have a
* buffer pointer here and we shouldn't attempt to log the change as the
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index d821b1a..b2781f5 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -35,6 +35,8 @@
#include "xfs_trace.h"
#include "xfs_error.h"
#include "xfs_extent_busy.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
/*
* Lookup the first record less than or equal to [bno, len, owner, offset]
@@ -859,3 +861,113 @@ xfs_rmap_resize(
done:
return error;
}
+
+/**
+ * Change ownership of a file's BMBT block reverse-mappings.
+ */
+int
+xfs_rmap_change_bmbt_owner(
+ struct xfs_btree_cur *bcur,
+ struct xfs_buf *bp,
+ struct xfs_owner_info *old_owner,
+ struct xfs_owner_info *new_owner)
+{
+ struct xfs_buf *agfbp;
+ xfs_fsblock_t fsbno;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ int error;
+
+ if (!xfs_sb_version_hasrmapbt(&bcur->bc_mp->m_sb) || !bp)
+ return 0;
+
+ fsbno = XFS_DADDR_TO_FSB(bcur->bc_mp, XFS_BUF_ADDR(bp));
+ agno = XFS_FSB_TO_AGNO(bcur->bc_mp, fsbno);
+ agbno = XFS_FSB_TO_AGBNO(bcur->bc_mp, fsbno);
+
+ error = xfs_read_agf(bcur->bc_mp, bcur->bc_tp, agno, 0, &agfbp);
+
+ error = xfs_rmap_free(bcur->bc_tp, agfbp, agno, agbno, 1, old_owner);
+ if (error)
+ goto err;
+
+ error = xfs_rmap_alloc(bcur->bc_tp, agfbp, agno, agbno, 1, new_owner);
+ if (error)
+ goto err;
+
+err:
+ xfs_trans_brelse(bcur->bc_tp, agfbp);
+ return error;
+}
+
+/**
+ * Change the ownership on a file's extent's reverse-mappings.
+ */
+int
+xfs_rmap_change_extent_owner(
+ struct xfs_mount *mp,
+ struct xfs_inode *ip,
+ xfs_ino_t ino,
+ xfs_fileoff_t isize,
+ struct xfs_trans *tp,
+ int whichfork,
+ xfs_ino_t new_owner)
+{
+ struct xfs_bmbt_irec imap;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_buf *agfbp = NULL;
+ int nimaps;
+ xfs_fileoff_t offset;
+ xfs_filblks_t len;
+ xfs_agnumber_t agno;
+ int flags = 0;
+ int error;
+
+ if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return 0;
+
+ if (whichfork == XFS_ATTR_FORK)
+ flags |= XFS_BMAPI_ATTRFORK;
+
+ offset = 0;
+ len = XFS_B_TO_FSB(mp, isize);
+ nimaps = 1;
+ error = xfs_bmapi_read(ip, offset, len, &imap, &nimaps, flags);
+ while (error == 0 && nimaps > 0) {
+ if (imap.br_startblock == HOLESTARTBLOCK ||
+ imap.br_startblock == DELAYSTARTBLOCK)
+ goto advloop;
+
+ agno = XFS_FSB_TO_AGNO(mp, imap.br_startblock);
+
+ error = xfs_read_agf(mp, tp, agno, 0, &agfbp);
+ if (error)
+ break;
+
+ cur = xfs_rmapbt_init_cursor(mp, tp, agfbp, agno);
+
+ error = xfs_rmap_delete(cur, ino, whichfork, &imap);
+ if (error)
+ break;
+ error = xfs_rmap_insert(cur, new_owner, whichfork, &imap);
+ if (error)
+ break;
+
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ cur = NULL;
+ xfs_trans_brelse(tp, agfbp);
+ agfbp = NULL;
+advloop:
+ offset += imap.br_blockcount;
+ len -= imap.br_blockcount;
+ nimaps = 1;
+ error = xfs_bmapi_read(ip, offset, len, &imap, &nimaps, flags);
+ }
+
+ if (cur)
+ xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR :
+ XFS_BTREE_NOERROR);
+ if (agfbp)
+ xfs_trans_brelse(tp, agfbp);
+ return error;
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 0131d9a..5d248b5 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -88,4 +88,12 @@ int xfs_rmap_slide(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
int xfs_rmap_resize(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
struct xfs_bmbt_irec *PREV, long size_adj);
+/* functions for changing rmap ownership */
+int xfs_rmap_change_extent_owner(struct xfs_mount *mp, struct xfs_inode *ip,
+ xfs_ino_t ino, xfs_fileoff_t isize, struct xfs_trans *tp,
+ int whichfork, xfs_ino_t new_owner);
+int xfs_rmap_change_bmbt_owner(struct xfs_btree_cur *bcur, struct xfs_buf *bp,
+ struct xfs_owner_info *old_owner,
+ struct xfs_owner_info *new_owner);
+
#endif /* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f41a6f7..245a34a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -40,6 +40,7 @@
#include "xfs_trace.h"
#include "xfs_icache.h"
#include "xfs_log.h"
+#include "xfs_rmap_btree.h"
/* Kernel only BMAP related definitions and functions */
@@ -1668,6 +1669,17 @@ xfs_swap_extent_flush(
return 0;
}
+static int
+change_extent_owner(
+ struct xfs_inode *ip,
+ struct xfs_trans *tp,
+ int whichfork,
+ struct xfs_inode *tip)
+{
+ return xfs_rmap_change_extent_owner(ip->i_mount, ip, ip->i_ino,
+ ip->i_d.di_size, tp, whichfork, tip->i_ino);
+}
+
int
xfs_swap_extents(
xfs_inode_t *ip, /* target inode */
@@ -1684,19 +1696,7 @@ xfs_swap_extents(
int taforkblks = 0;
__uint64_t tmp;
int lock_flags;
-
- /*
- * We can't swap extents on rmap btree enabled filesystems yet
- * as there is no mechanism to update the owner of extents in
- * the rmap tree yet. Hence, for the moment, just reject attempts
- * to swap extents with EINVAL after emitting a warning once to remind
- * us this needs fixing.
- */
- if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
- WARN_ONCE(1,
- "XFS: XFS_IOC_SWAPEXT not supported on RMAP enabled filesystems\n");
- return -EINVAL;
- }
+ struct xfs_trans_res tres = {.tr_logres = 262144, .tr_logcount = 1, .tr_logflags = 0};
tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
if (!tempifp) {
@@ -1734,7 +1734,9 @@ xfs_swap_extents(
goto out_unlock;
tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT);
- error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+ /* XXX How do we create a potentially huge transaction here? */
+ /* error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); */
+ error = xfs_trans_reserve(tp, &tres, 0, 0);
if (error) {
xfs_trans_cancel(tp);
goto out_unlock;
@@ -1834,6 +1836,14 @@ xfs_swap_extents(
goto out_trans_cancel;
}
+ /* Change owners in the extent rmaps */
+ error = change_extent_owner(ip, tp, XFS_DATA_FORK, tip);
+ if (error)
+ goto out_trans_cancel;
+ error = change_extent_owner(tip, tp, XFS_DATA_FORK, ip);
+ if (error)
+ goto out_trans_cancel;
+
/*
* Swap the data forks of the inodes
*/
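
The owner-change walk in xfs_rmap_change_extent_owner can be pictured with a toy model: for every mapped extent of the file (holes are skipped), the matching reverse-mapping record is dropped under the old owner and re-added under the new one. The sketch below is an assumption-laden illustration: the structs, DEMO_HOLE, and demo_change_owner are all hypothetical, a flat array stands in for the rmap btree, and an in-place owner rewrite stands in for the delete plus insert pair.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HOLE UINT64_MAX /* hypothetical marker for an unmapped range */

struct demo_extent {
    uint64_t startoff;   /* logical offset */
    uint64_t startblock; /* physical block, or DEMO_HOLE */
    uint64_t blockcount;
};

struct demo_rmap {
    uint64_t startblock;
    uint64_t owner;
    uint64_t offset;
    uint64_t blockcount;
};

/* Re-own every rmap that matches a mapped extent; returns how many changed. */
static size_t demo_change_owner(struct demo_rmap *rmaps, size_t nr_rmaps,
                                const struct demo_extent *exts, size_t nr_exts,
                                uint64_t old_owner, uint64_t new_owner)
{
    size_t changed = 0;

    for (size_t i = 0; i < nr_exts; i++) {
        if (exts[i].startblock == DEMO_HOLE)
            continue; /* nothing is mapped here, so there is no rmap */

        for (size_t j = 0; j < nr_rmaps; j++) {
            if (rmaps[j].startblock == exts[i].startblock &&
                rmaps[j].owner == old_owner &&
                rmaps[j].offset == exts[i].startoff &&
                rmaps[j].blockcount == exts[i].blockcount) {
                rmaps[j].owner = new_owner; /* "delete" + "insert" */
                changed++;
                break;
            }
        }
    }
    return changed;
}
```

Because the match is on the full (block, owner, offset, length) tuple, records that belong to other files are left alone even if they share an owner-independent field with the extent being processed.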
* [PATCH 31/58] libxfs: refactor short btree block verification
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (29 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 30/58] xfs: implement " Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 32/58] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
` (26 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Create xfs_btree_sblock_verify() to verify short-format btree blocks
(i.e. the per-AG btrees with 32-bit block pointers) instead of
open-coding them.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_alloc_btree.c | 34 ++--------------------
fs/xfs/libxfs/xfs_btree.c | 58 ++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_btree.h | 3 ++
fs/xfs/libxfs/xfs_ialloc_btree.c | 26 ++---------------
fs/xfs/libxfs/xfs_rmap_btree.c | 23 ++-------------
5 files changed, 70 insertions(+), 74 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 90de071..1352322 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -293,14 +293,7 @@ xfs_allocbt_verify(
level = be16_to_cpu(block->bb_level);
switch (block->bb_magic) {
case cpu_to_be32(XFS_ABTB_CRC_MAGIC):
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return false;
- if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
- return false;
- if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
- return false;
- if (pag &&
- be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+ if (!xfs_btree_sblock_v5hdr_verify(bp))
return false;
/* fall through */
case cpu_to_be32(XFS_ABTB_MAGIC):
@@ -311,14 +304,7 @@ xfs_allocbt_verify(
return false;
break;
case cpu_to_be32(XFS_ABTC_CRC_MAGIC):
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return false;
- if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
- return false;
- if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
- return false;
- if (pag &&
- be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+ if (!xfs_btree_sblock_v5hdr_verify(bp))
return false;
/* fall through */
case cpu_to_be32(XFS_ABTC_MAGIC):
@@ -332,21 +318,7 @@ xfs_allocbt_verify(
return false;
}
- /* numrecs verification */
- if (be16_to_cpu(block->bb_numrecs) > mp->m_alloc_mxr[level != 0])
- return false;
-
- /* sibling pointer verification */
- if (!block->bb_u.s.bb_leftsib ||
- (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
- block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
- return false;
- if (!block->bb_u.s.bb_rightsib ||
- (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
- block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
- return false;
-
- return true;
+ return xfs_btree_sblock_verify(bp, mp->m_alloc_mxr[level != 0]);
}
static void
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 2d7dc8c..a24c092 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4087,3 +4087,61 @@ xfs_btree_change_owner(
return 0;
}
+
+/**
+ * xfs_btree_sblock_v5hdr_verify() -- verify the v5 fields of a short-format
+ * btree block
+ *
+ * @bp: buffer containing the btree block
+ * Checks that the filesystem has CRCs enabled and that the block's UUID,
+ * block number, and owning AG number match the buffer and mount.
+ */
+bool
+xfs_btree_sblock_v5hdr_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
+ struct xfs_perag *pag = bp->b_pag;
+
+ if (!xfs_sb_version_hascrc(&mp->m_sb))
+ return false;
+ if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
+ return false;
+ if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
+ return false;
+ if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+ return false;
+ return true;
+}
+
+/**
+ * xfs_btree_sblock_verify() -- verify a short-format btree block
+ *
+ * @bp: buffer containing the btree block
+ * @max_recs: maximum records allowed in this btree node
+ */
+bool
+xfs_btree_sblock_verify(
+ struct xfs_buf *bp,
+ unsigned int max_recs)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
+
+ /* numrecs verification */
+ if (be16_to_cpu(block->bb_numrecs) > max_recs)
+ return false;
+
+ /* sibling pointer verification */
+ if (!block->bb_u.s.bb_leftsib ||
+ (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
+ block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
+ return false;
+ if (!block->bb_u.s.bb_rightsib ||
+ (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
+ block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
+ return false;
+
+ return true;
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 48ab2b1..dd29d15 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -471,4 +471,7 @@ static inline int xfs_btree_get_level(struct xfs_btree_block *block)
#define XFS_BTREE_TRACE_ARGR(c, r)
#define XFS_BTREE_TRACE_CURSOR(c, t)
+bool xfs_btree_sblock_v5hdr_verify(struct xfs_buf *bp);
+bool xfs_btree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs);
+
#endif /* __XFS_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index bd8a1da..97228d6 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -224,7 +224,6 @@ xfs_inobt_verify(
{
struct xfs_mount *mp = bp->b_target->bt_mount;
struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
- struct xfs_perag *pag = bp->b_pag;
unsigned int level;
/*
@@ -240,14 +239,7 @@ xfs_inobt_verify(
switch (block->bb_magic) {
case cpu_to_be32(XFS_IBT_CRC_MAGIC):
case cpu_to_be32(XFS_FIBT_CRC_MAGIC):
- if (!xfs_sb_version_hascrc(&mp->m_sb))
- return false;
- if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
- return false;
- if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
- return false;
- if (pag &&
- be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+ if (!xfs_btree_sblock_v5hdr_verify(bp))
return false;
/* fall through */
case cpu_to_be32(XFS_IBT_MAGIC):
@@ -257,24 +249,12 @@ xfs_inobt_verify(
return 0;
}
- /* numrecs and level verification */
+ /* level verification */
level = be16_to_cpu(block->bb_level);
if (level >= mp->m_in_maxlevels)
return false;
- if (be16_to_cpu(block->bb_numrecs) > mp->m_inobt_mxr[level != 0])
- return false;
-
- /* sibling pointer verification */
- if (!block->bb_u.s.bb_leftsib ||
- (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
- block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
- return false;
- if (!block->bb_u.s.bb_rightsib ||
- (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
- block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
- return false;
- return true;
+ return xfs_btree_sblock_verify(bp, mp->m_inobt_mxr[level != 0]);
}
static void
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 5fe717b..6546d80 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -253,13 +253,10 @@ xfs_rmapbt_verify(
if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
return false;
- if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
- return false;
- if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
- return false;
- if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+ if (!xfs_btree_sblock_v5hdr_verify(bp))
return false;
+ /* level verification */
level = be16_to_cpu(block->bb_level);
if (pag && pag->pagf_init) {
if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
@@ -267,21 +264,7 @@ xfs_rmapbt_verify(
} else if (level >= mp->m_ag_maxlevels)
return false;
- /* numrecs verification */
- if (be16_to_cpu(block->bb_numrecs) > mp->m_rmap_mxr[level != 0])
- return false;
-
- /* sibling pointer verification */
- if (!block->bb_u.s.bb_leftsib ||
- (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
- block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
- return false;
- if (!block->bb_u.s.bb_rightsib ||
- (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
- block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
- return false;
-
- return true;
+ return xfs_btree_sblock_verify(bp, mp->m_rmap_mxr[level != 0]);
}
static void
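
The checks that xfs_btree_sblock_verify centralizes can be restated as a small standalone sketch. This assumes host-endian values (the kernel code converts from big-endian on-disk fields) and assumes the null AG block sentinel is the all-ones 32-bit value; the demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NULLAGBLOCK UINT32_MAX /* assumed "no sibling" sentinel */

/*
 * A sibling pointer is acceptable if it is the null sentinel or a nonzero
 * block number inside the allocation group.
 */
static bool demo_sibling_ok(uint32_t sib, uint32_t agblocks)
{
    if (sib == 0)
        return false;
    if (sib >= agblocks && sib != DEMO_NULLAGBLOCK)
        return false;
    return true;
}

/* Record count plus both sibling pointers, as in the refactored helper. */
static bool demo_sblock_ok(uint16_t numrecs, unsigned int max_recs,
                           uint32_t leftsib, uint32_t rightsib,
                           uint32_t agblocks)
{
    if (numrecs > max_recs)
        return false;
    return demo_sibling_ok(leftsib, agblocks) &&
           demo_sibling_ok(rightsib, agblocks);
}
```

Passing max_recs in as a plain number is what lets the allocbt, inobt, and rmapbt verifiers share one helper while each supplies its own m_*_mxr limit.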
* [PATCH 32/58] xfs: don't update rmapbt when fixing agfl
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (30 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 31/58] libxfs: refactor short btree block verification Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 33/58] xfs: introduce refcount btree definitions Darrick J. Wong
` (25 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist. xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_alloc.c | 40 ++++++++++++++++++++++++++--------------
fs/xfs/libxfs/xfs_alloc.h | 5 ++++-
2 files changed, 30 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index d7b9d43..be44873 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2099,26 +2099,38 @@ xfs_alloc_fix_freelist(
* anything other than extra overhead when we need to put more blocks
* back on the free list? Maybe we should only do this when space is
* getting low or the AGFL is more than half full?
+ *
+ * The NOSHRINK flag prevents the AGFL from being shrunk if it's too
+ * big; the NORMAP flag prevents AGFL expand/shrink operations from
+ * updating the rmapbt. Both flags are used in xfs_repair while we're
+ * rebuilding the rmapbt, and neither are used by the kernel. They're
+ * both required to ensure that rmaps are correctly recorded for the
+ * regenerated AGFL, bnobt, and cntbt. See repair/phase5.c and
+ * repair/rmap.c in xfsprogs for details.
*/
- XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
- while (pag->pagf_flcount > need) {
- struct xfs_buf *bp;
+ memset(&targs, 0, sizeof(targs));
+ if (!(flags & XFS_ALLOC_FLAG_NOSHRINK)) {
+ if (!(flags & XFS_ALLOC_FLAG_NORMAP))
+ XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
+ while (pag->pagf_flcount > need) {
+ struct xfs_buf *bp;
- error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
- if (error)
- goto out_agbp_relse;
- error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
- &targs.oinfo, 1);
- if (error)
- goto out_agbp_relse;
- bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
- xfs_trans_binval(tp, bp);
+ error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
+ if (error)
+ goto out_agbp_relse;
+ error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
+ &targs.oinfo, 1);
+ if (error)
+ goto out_agbp_relse;
+ bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
+ xfs_trans_binval(tp, bp);
+ }
}
- memset(&targs, 0, sizeof(targs));
targs.tp = tp;
targs.mp = mp;
- XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
+ if (!(flags & XFS_ALLOC_FLAG_NORMAP))
+ XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
targs.agbp = agbp;
targs.agno = args->agno;
targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 67e564e..754b5dd 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -54,6 +54,9 @@ typedef unsigned int xfs_alloctype_t;
*/
#define XFS_ALLOC_FLAG_TRYLOCK 0x00000001 /* use trylock for buffer locking */
#define XFS_ALLOC_FLAG_FREEING 0x00000002 /* indicate caller is freeing extents*/
+#define XFS_ALLOC_FLAG_NORMAP 0x00000004 /* don't modify the rmapbt */
+#define XFS_ALLOC_FLAG_NOSHRINK 0x00000008 /* don't shrink the freelist */
+
/*
* Argument structure for xfs_alloc routines.
@@ -86,7 +89,7 @@ typedef struct xfs_alloc_arg {
char isfl; /* set if is freelist blocks - !acctg */
char userdata; /* set if this is user data */
xfs_fsblock_t firstblock; /* io first block allocated */
- struct xfs_owner_info oinfo; /* owner of blocks being allocated */
+ struct xfs_owner_info oinfo; /* owner of blocks being allocated */
} xfs_alloc_arg_t;
/*
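
The way the two new flags gate xfs_alloc_fix_freelist can be summarized as a small decision function. This is a sketch of the control flow only: the flag values are taken from the xfs_alloc.h hunk above, while struct demo_plan and demo_fix_freelist_plan are hypothetical names made up for illustration.

```c
#include <stdbool.h>

/* Values as they appear in the xfs_alloc.h hunk of this patch. */
#define DEMO_ALLOC_FLAG_NORMAP   0x00000004 /* don't modify the rmapbt */
#define DEMO_ALLOC_FLAG_NOSHRINK 0x00000008 /* don't shrink the freelist */

struct demo_plan {
    bool shrink;      /* free surplus AGFL blocks back to the free space btrees */
    bool update_rmap; /* record an owner so the rmapbt gets updated */
};

/* Decide what a freelist fixup would do for the given flags and counts. */
static struct demo_plan demo_fix_freelist_plan(unsigned int flags,
                                               unsigned int flcount,
                                               unsigned int need)
{
    struct demo_plan plan;

    plan.update_rmap = !(flags & DEMO_ALLOC_FLAG_NORMAP);
    plan.shrink = !(flags & DEMO_ALLOC_FLAG_NOSHRINK) && flcount > need;
    return plan;
}
```

With neither flag set the behavior matches the kernel's existing path; per the comment in the patch, only xfs_repair is expected to pass the flags.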
* [PATCH 33/58] xfs: introduce refcount btree definitions
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (31 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 32/58] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 34/58] xfs: add refcount btree stats infrastructure Darrick J. Wong
` (24 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Add new per-AG refcount btree definitions to the per-AG structures.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_alloc.c | 5 +++++
fs/xfs/libxfs/xfs_btree.c | 5 +++--
fs/xfs/libxfs/xfs_btree.h | 4 ++++
fs/xfs/libxfs/xfs_format.h | 34 ++++++++++++++++++++++++++++++++--
fs/xfs/libxfs/xfs_types.h | 2 +-
fs/xfs/xfs_inode.h | 5 +++++
fs/xfs/xfs_mount.h | 3 +++
7 files changed, 53 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index be44873..6e7f0a6 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2406,6 +2406,10 @@ xfs_agf_verify(
be32_to_cpu(agf->agf_btreeblks) > be32_to_cpu(agf->agf_length))
return false;
+ if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+ be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS)
+ return false;
+
return true;;
}
@@ -2525,6 +2529,7 @@ xfs_alloc_read_agf(
be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
pag->pagf_levels[XFS_BTNUM_RMAPi] =
be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+ pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
spin_lock_init(&pag->pagb_lock);
pag->pagb_count = 0;
pag->pagb_tree = RB_ROOT;
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index a24c092..b6ebfc6 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -44,9 +44,10 @@ kmem_zone_t *xfs_btree_cur_zone;
*/
static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
- XFS_FIBT_MAGIC },
+ XFS_FIBT_MAGIC, 0 },
{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
- XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
+ XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC,
+ XFS_REFC_CRC_MAGIC }
};
#define xfs_btree_magic(cur) \
xfs_magics[!!((cur)->bc_flags & XFS_BTREE_CRC_BLOCKS)][cur->bc_btnum]
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index dd29d15..4b49e35 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -66,6 +66,7 @@ union xfs_btree_rec {
#define XFS_BTNUM_INO ((xfs_btnum_t)XFS_BTNUM_INOi)
#define XFS_BTNUM_FINO ((xfs_btnum_t)XFS_BTNUM_FINOi)
#define XFS_BTNUM_RMAP ((xfs_btnum_t)XFS_BTNUM_RMAPi)
+#define XFS_BTNUM_REFC ((xfs_btnum_t)XFS_BTNUM_REFCi)
/*
* For logging record fields.
@@ -98,6 +99,7 @@ do { \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break; \
case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(rmap, stat); break; \
+ case XFS_BTNUM_REFC: break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
@@ -113,6 +115,7 @@ do { \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_ADD(rmap, stat, val); break; \
+ case XFS_BTNUM_REFC: break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
@@ -217,6 +220,7 @@ typedef struct xfs_btree_cur
union {
struct { /* needed for BNO, CNT, INO */
struct xfs_buf *agbp; /* agf/agi buffer pointer */
+ struct xfs_bmap_free *flist; /* list to free after */
xfs_agnumber_t agno; /* ag number */
} a;
struct { /* needed for BMAP */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index c4a73e9..600b327 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -448,6 +448,7 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_FINOBT (1 << 0) /* free inode btree */
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT (1 << 1) /* reverse map btree */
+#define XFS_SB_FEAT_RO_COMPAT_REFLINK (1 << 2) /* reflinked files */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
XFS_SB_FEAT_RO_COMPAT_RMAPBT)
@@ -526,6 +527,12 @@ static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
}
+static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
+{
+ return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+ (sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
+}
+
/*
* XFS_SB_FEAT_INCOMPAT_META_UUID indicates that the metadata UUID
* is stored separately from the user-visible UUID; this allows the
@@ -632,12 +639,15 @@ typedef struct xfs_agf {
__be32 agf_btreeblks; /* # of blocks held in AGF btrees */
uuid_t agf_uuid; /* uuid of filesystem */
+ __be32 agf_refcount_root; /* refcount tree root block */
+ __be32 agf_refcount_level; /* refcount btree levels */
+
/*
* reserve some contiguous space for future logged fields before we add
* the unlogged fields. This makes the range logging via flags and
* structure offsets much simpler.
*/
- __be64 agf_spare64[16];
+ __be64 agf_spare64[15];
/* unlogged fields, written during buffer writeback. */
__be64 agf_lsn; /* last write sequence */
@@ -1024,6 +1034,18 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
XFS_DIFLAG_EXTSZINHERIT | XFS_DIFLAG_NODEFRAG | XFS_DIFLAG_FILESTREAM)
/*
+ * Values for di_flags2
+ * There should be a one-to-one correspondence between these flags and the
+ * XFS_XFLAG_s.
+ */
+#define XFS_DIFLAG2_REFLINK_BIT 0 /* file's blocks may be reflinked */
+#define XFS_DIFLAG2_REFLINK (1 << XFS_DIFLAG2_REFLINK_BIT)
+
+#define XFS_DIFLAG2_ANY \
+ (XFS_DIFLAG2_REFLINK)
+
+
+/*
* Inode number format:
* low inopblog bits - offset in block
* next agblklog bits - block number in ag
@@ -1368,7 +1390,8 @@ XFS_RMAP_INO_OWNER(
#define XFS_RMAP_OWN_AG (-5ULL) /* AG freespace btree blocks */
#define XFS_RMAP_OWN_INOBT (-6ULL) /* Inode btree blocks */
#define XFS_RMAP_OWN_INODES (-7ULL) /* Inode chunk */
-#define XFS_RMAP_OWN_MIN (-8ULL) /* guard */
+#define XFS_RMAP_OWN_REFC (-8ULL) /* refcount tree */
+#define XFS_RMAP_OWN_MIN (-9ULL) /* guard */
#define XFS_RMAP_NON_INODE_OWNER(owner) (!!((owner) & (1ULL << 63)))
@@ -1471,6 +1494,13 @@ xfs_owner_info_pack(
}
/*
+ * Reference Count Btree format definitions
+ *
+ */
+#define XFS_REFC_CRC_MAGIC 0x52334643 /* 'R3FC' */
+
+
+/*
* BMAP Btree format definitions
*
* This includes both the root block definition that sits inside an inode fork
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 3d50364..be7b6de 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -109,7 +109,7 @@ typedef enum {
typedef enum {
XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
- XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
+ XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_MAX
} xfs_btnum_t;
struct xfs_name {
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ca9e119..6436a96 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -202,6 +202,11 @@ xfs_get_initial_prid(struct xfs_inode *dp)
return XFS_PROJID_DEFAULT;
}
+static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
+{
+ return ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK;
+}
+
/*
* In-core inode flags.
*/
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index cdced0b..4b286cc 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -315,6 +315,9 @@ typedef struct xfs_perag {
/* for rcu-safe freeing */
struct rcu_head rcu_head;
int pagb_count; /* pagb slots in use */
+
+ /* reference count */
+ __uint8_t pagf_refcount_level;
} xfs_perag_t;
extern int xfs_log_sbcount(xfs_mount_t *);
* [PATCH 34/58] xfs: add refcount btree stats infrastructure
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (32 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 33/58] xfs: introduce refcount btree definitions Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:58 ` [PATCH 35/58] xfs: refcount btree add more reserved blocks Darrick J. Wong
` (23 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
The refcount btree presents the same stats as the other btrees, so
add all the code for that now.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
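Not part of the patch: a small sketch of how the XFSSTAT_END_* offsets
chain together. Each stats group is identified by the index one past its
last counter, so a group's length is the difference between its end and
the previous group's end. The base value DEMO_END_RMAP_V2 and the names
demo_groups and group_len are hypothetical; only the +15 and +6 deltas
come from the patch.

```c
#include <stddef.h>

/* Hypothetical end index of the rmapbt group; the real value differs. */
#define DEMO_END_RMAP_V2   100
/* Deltas taken from the patch: 15 refcount btree counters, 6 xqm counters. */
#define DEMO_END_REFCOUNT  (DEMO_END_RMAP_V2 + 15)
#define DEMO_END_XQMSTAT   (DEMO_END_REFCOUNT + 6)

struct stat_group {
	const char	*name;
	int		end;	/* index one past the group's last counter */
};

static const struct stat_group demo_groups[] = {
	{ "rmapbt",   DEMO_END_RMAP_V2 },
	{ "refcntbt", DEMO_END_REFCOUNT },
	{ "xqm",      DEMO_END_XQMSTAT },
};

/* Number of counters in group i; group 0 is assumed to start at index 0. */
static int group_len(const struct stat_group *tbl, size_t i)
{
	int start = (i == 0) ? 0 : tbl[i - 1].end;

	return tbl[i].end - start;
}
```

With these values, group_len(demo_groups, 1) is 15, matching the fifteen
xs_refcbt_2_* fields added to struct xfsstats.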
fs/xfs/libxfs/xfs_btree.h | 4 ++--
fs/xfs/xfs_stats.c | 1 +
fs/xfs/xfs_stats.h | 18 +++++++++++++++++-
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 4b49e35..8c59521 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -99,7 +99,7 @@ do { \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break; \
case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(rmap, stat); break; \
- case XFS_BTNUM_REFC: break; \
+ case XFS_BTNUM_REFC: __XFS_BTREE_STATS_INC(refcbt, stat); break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
@@ -115,7 +115,7 @@ do { \
case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_ADD(rmap, stat, val); break; \
- case XFS_BTNUM_REFC: break; \
+ case XFS_BTNUM_REFC: __XFS_BTREE_STATS_ADD(refcbt, stat, val); break; \
case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
} \
} while (0)
diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
index 67bbfa2..64a60ef 100644
--- a/fs/xfs/xfs_stats.c
+++ b/fs/xfs/xfs_stats.c
@@ -61,6 +61,7 @@ static int xfs_stat_proc_show(struct seq_file *m, void *v)
{ "ibt2", XFSSTAT_END_IBT_V2 },
{ "fibt2", XFSSTAT_END_FIBT_V2 },
{ "rmapbt", XFSSTAT_END_RMAP_V2 },
+ { "refcntbt", XFSSTAT_END_REFCOUNT },
/* we print both series of quota information together */
{ "qm", XFSSTAT_END_QM },
};
diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h
index 8414db2..f6e4de6 100644
--- a/fs/xfs/xfs_stats.h
+++ b/fs/xfs/xfs_stats.h
@@ -215,7 +215,23 @@ struct xfsstats {
__uint32_t xs_rmap_2_alloc;
__uint32_t xs_rmap_2_free;
__uint32_t xs_rmap_2_moves;
-#define XFSSTAT_END_XQMSTAT (XFSSTAT_END_RMAP_V2+6)
+#define XFSSTAT_END_REFCOUNT (XFSSTAT_END_RMAP_V2 + 15)
+ __uint32_t xs_refcbt_2_lookup;
+ __uint32_t xs_refcbt_2_compare;
+ __uint32_t xs_refcbt_2_insrec;
+ __uint32_t xs_refcbt_2_delrec;
+ __uint32_t xs_refcbt_2_newroot;
+ __uint32_t xs_refcbt_2_killroot;
+ __uint32_t xs_refcbt_2_increment;
+ __uint32_t xs_refcbt_2_decrement;
+ __uint32_t xs_refcbt_2_lshift;
+ __uint32_t xs_refcbt_2_rshift;
+ __uint32_t xs_refcbt_2_split;
+ __uint32_t xs_refcbt_2_join;
+ __uint32_t xs_refcbt_2_alloc;
+ __uint32_t xs_refcbt_2_free;
+ __uint32_t xs_refcbt_2_moves;
+#define XFSSTAT_END_XQMSTAT (XFSSTAT_END_REFCOUNT + 6)
__uint32_t xs_qm_dqreclaims;
__uint32_t xs_qm_dqreclaim_misses;
__uint32_t xs_qm_dquot_dups;
* [PATCH 35/58] xfs: refcount btree add more reserved blocks
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (33 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 34/58] xfs: add refcount btree stats infrastructure Darrick J. Wong
@ 2015-10-07 4:58 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 36/58] xfs: define the on-disk refcount btree format Darrick J. Wong
` (22 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:58 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
XFS reserves a small amount of space in each AG as the minimum free
space needed for an operation. Reserve one more block per AG in case
the operation touches the refcount btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
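Not part of the patch: a userspace sketch of the block-number arithmetic in
xfs_refc_block() and xfs_prealloc_blocks(). The struct ag_features type and
the ibt_block parameter are invented for illustration, and rmap_block()
encodes an assumption that the rmap btree root sits directly after the
last inode btree root; the real XFS_RMAP_BLOCK() macro is not shown in
this patch.

```c
#include <stdbool.h>

/* Hypothetical feature summary standing in for the superblock tests. */
struct ag_features {
	bool	finobt;
	bool	rmapbt;
	bool	reflink;
};

/* ibt_block: assumed AG block number of the inode btree root. */
static unsigned int fibt_block(unsigned int ibt_block)
{
	return ibt_block + 1;
}

/* Assumption: the rmapbt root follows the finobt root if there is one. */
static unsigned int rmap_block(const struct ag_features *f,
			       unsigned int ibt_block)
{
	return f->finobt ? fibt_block(ibt_block) + 1 : ibt_block + 1;
}

/* Same decision order as xfs_refc_block() in the patch. */
static unsigned int refc_block(const struct ag_features *f,
			       unsigned int ibt_block)
{
	if (f->rmapbt)
		return rmap_block(f, ibt_block) + 1;
	if (f->finobt)
		return fibt_block(ibt_block) + 1;
	return ibt_block + 1;
}

/* Same decision order as xfs_prealloc_blocks() after the patch. */
static unsigned int prealloc_blocks(const struct ag_features *f,
				    unsigned int ibt_block)
{
	if (f->reflink)
		return refc_block(f, ibt_block) + 1;
	if (f->rmapbt)
		return rmap_block(f, ibt_block) + 1;
	if (f->finobt)
		return fibt_block(ibt_block) + 1;
	return ibt_block + 1;
}
```

In this model, enabling reflink always raises the preallocated block count
by exactly one, whichever of the other btrees are present.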
fs/xfs/libxfs/xfs_alloc.c | 13 +++++++++++++
fs/xfs/libxfs/xfs_format.h | 2 ++
2 files changed, 15 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6e7f0a6..bd8aa8c 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -50,10 +50,23 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
+unsigned int
+xfs_refc_block(
+ struct xfs_mount *mp)
+{
+ if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+ return XFS_RMAP_BLOCK(mp) + 1;
+ if (xfs_sb_version_hasfinobt(&mp->m_sb))
+ return XFS_FIBT_BLOCK(mp) + 1;
+ return XFS_IBT_BLOCK(mp) + 1;
+}
+
xfs_extlen_t
xfs_prealloc_blocks(
struct xfs_mount *mp)
{
+ if (xfs_sb_version_hasreflink(&mp->m_sb))
+ return xfs_refc_block(mp) + 1;
if (xfs_sb_version_hasrmapbt(&mp->m_sb))
return XFS_RMAP_BLOCK(mp) + 1;
if (xfs_sb_version_hasfinobt(&mp->m_sb))
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 600b327..5757440 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1499,6 +1499,8 @@ xfs_owner_info_pack(
*/
#define XFS_REFC_CRC_MAGIC 0x52334643 /* 'R3FC' */
+unsigned int xfs_refc_block(struct xfs_mount *mp);
+
/*
* BMAP Btree format definitions
* [PATCH 36/58] xfs: define the on-disk refcount btree format
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (34 preceding siblings ...)
2015-10-07 4:58 ` [PATCH 35/58] xfs: refcount btree add more reserved blocks Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 37/58] xfs: define tracepoints for refcount/reflink activities Darrick J. Wong
` (21 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
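Not part of the patch: a userspace sketch of the capacity calculation done
by xfs_refcountbt_maxrecs(). The struct and function names are simplified
stand-ins, host-endian uint32_t replaces the on-disk __be32 fields, and
REFCOUNT_BLOCK_LEN is assumed to be 56 bytes for the short-form CRC btree
block header; check XFS_BTREE_SBLOCK_CRC_LEN before relying on that figure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed header size, see lead-in. */
#define REFCOUNT_BLOCK_LEN	56

/* Leaf record: (startblock, blockcount, refcount), 12 bytes. */
struct refcount_rec {
	uint32_t	rc_startblock;
	uint32_t	rc_blockcount;
	uint32_t	rc_refcount;
};

/* Node key: only the starting block, 4 bytes. */
struct refcount_key {
	uint32_t	rc_startblock;
};

/* Short-form (AG-relative) btree pointer, 4 bytes. */
typedef uint32_t refcount_ptr_t;

/*
 * Leaves hold records; interior nodes hold key/pointer pairs. Both share
 * whatever is left of the block after the header.
 */
static int refcountbt_maxrecs(int blocklen, bool leaf)
{
	blocklen -= REFCOUNT_BLOCK_LEN;

	if (leaf)
		return blocklen / (int)sizeof(struct refcount_rec);
	return blocklen / (int)(sizeof(struct refcount_key) +
				sizeof(refcount_ptr_t));
}
```

Under the 56-byte assumption, a 4096-byte block holds 336 leaf records or
505 key/pointer pairs; the m_refc_mnr[] minimums set in
xfs_sb_mount_common() are half of those values.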
fs/xfs/Makefile | 1
fs/xfs/libxfs/xfs_btree.c | 3 +
fs/xfs/libxfs/xfs_btree.h | 3 +
fs/xfs/libxfs/xfs_format.h | 32 +++++++
fs/xfs/libxfs/xfs_refcount_btree.c | 173 ++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_refcount_btree.h | 65 ++++++++++++++
fs/xfs/libxfs/xfs_sb.c | 9 ++
fs/xfs/libxfs/xfs_shared.h | 2
fs/xfs/xfs_mount.h | 2
9 files changed, 290 insertions(+)
create mode 100644 fs/xfs/libxfs/xfs_refcount_btree.c
create mode 100644 fs/xfs/libxfs/xfs_refcount_btree.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index dc40462..c394ec2 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -53,6 +53,7 @@ xfs-y += $(addprefix libxfs/, \
xfs_log_rlimit.o \
xfs_rmap.o \
xfs_rmap_btree.o \
+ xfs_refcount_btree.o \
xfs_sb.o \
xfs_symlink_remote.o \
xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index b6ebfc6..c90feee 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1121,6 +1121,9 @@ xfs_btree_set_refs(
case XFS_BTNUM_RMAP:
xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
break;
+ case XFS_BTNUM_REFC:
+ xfs_buf_set_ref(bp, XFS_REFC_BTREE_REF);
+ break;
default:
ASSERT(0);
}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 8c59521..94848a1 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -43,6 +43,7 @@ union xfs_btree_key {
xfs_alloc_key_t alloc;
struct xfs_inobt_key inobt;
struct xfs_rmap_key rmap;
+ struct xfs_refcount_key refc;
};
union xfs_btree_rec {
@@ -51,6 +52,7 @@ union xfs_btree_rec {
struct xfs_alloc_rec alloc;
struct xfs_inobt_rec inobt;
struct xfs_rmap_rec rmap;
+ struct xfs_refcount_rec refc;
};
/*
@@ -208,6 +210,7 @@ typedef struct xfs_btree_cur
xfs_bmbt_irec_t b;
xfs_inobt_rec_incore_t i;
struct xfs_rmap_irec r;
+ struct xfs_refcount_irec rc;
} bc_rec; /* current insert/search record value */
struct xfs_buf *bc_bufs[XFS_BTREE_MAXLEVELS]; /* buf ptr per level */
int bc_ptrs[XFS_BTREE_MAXLEVELS]; /* key/record # */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 5757440..98e52ab 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1501,6 +1501,38 @@ xfs_owner_info_pack(
unsigned int xfs_refc_block(struct xfs_mount *mp);
+/*
+ * Data record/key structure
+ *
+ * Each record associates a range of physical blocks (starting at
+ * rc_startblock and ending rc_blockcount blocks later) with a
+ * reference count (rc_refcount). A record is only stored in the
+ * btree if the refcount is > 1. An entry in the free block btree
+ * means that the refcount is 0, and no entries anywhere means that
+ * the refcount is 1, as was true in XFS before reflinking.
+ */
+struct xfs_refcount_rec {
+ __be32 rc_startblock; /* starting block number */
+ __be32 rc_blockcount; /* count of blocks */
+ __be32 rc_refcount; /* number of inodes linked here */
+};
+
+struct xfs_refcount_key {
+ __be32 rc_startblock; /* starting block number */
+};
+
+struct xfs_refcount_irec {
+ xfs_agblock_t rc_startblock; /* starting block number */
+ xfs_extlen_t rc_blockcount; /* count of blocks */
+ xfs_nlink_t rc_refcount; /* number of inodes linked here */
+};
+
+#define MAXREFCOUNT ((xfs_nlink_t)~0U)
+#define MAXREFCEXTLEN ((xfs_extlen_t)~0U)
+
+/* btree pointer type */
+typedef __be32 xfs_refcount_ptr_t;
+
/*
* BMAP Btree format definitions
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
new file mode 100644
index 0000000..7067a05
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -0,0 +1,173 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+
+static struct xfs_btree_cur *
+xfs_refcountbt_dup_cursor(
+ struct xfs_btree_cur *cur)
+{
+ return xfs_refcountbt_init_cursor(cur->bc_mp, cur->bc_tp,
+ cur->bc_private.a.agbp, cur->bc_private.a.agno,
+ cur->bc_private.a.flist);
+}
+
+STATIC bool
+xfs_refcountbt_verify(
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = bp->b_target->bt_mount;
+ struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp);
+ struct xfs_perag *pag = bp->b_pag;
+ unsigned int level;
+
+ if (block->bb_magic != cpu_to_be32(XFS_REFC_CRC_MAGIC))
+ return false;
+
+ if (!xfs_sb_version_hasreflink(&mp->m_sb))
+ return false;
+ if (!xfs_btree_sblock_v5hdr_verify(bp))
+ return false;
+
+ level = be16_to_cpu(block->bb_level);
+ if (pag && pag->pagf_init) {
+ if (level >= pag->pagf_refcount_level)
+ return false;
+ } else if (level >= mp->m_ag_maxlevels)
+ return false;
+
+ return xfs_btree_sblock_verify(bp, mp->m_refc_mxr[level != 0]);
+}
+
+STATIC void
+xfs_refcountbt_read_verify(
+ struct xfs_buf *bp)
+{
+ if (!xfs_btree_sblock_verify_crc(bp))
+ xfs_buf_ioerror(bp, -EFSBADCRC);
+ else if (!xfs_refcountbt_verify(bp))
+ xfs_buf_ioerror(bp, -EFSCORRUPTED);
+
+ if (bp->b_error) {
+ trace_xfs_btree_corrupt(bp, _RET_IP_);
+ xfs_verifier_error(bp);
+ }
+}
+
+STATIC void
+xfs_refcountbt_write_verify(
+ struct xfs_buf *bp)
+{
+ if (!xfs_refcountbt_verify(bp)) {
+ trace_xfs_btree_corrupt(bp, _RET_IP_);
+ xfs_buf_ioerror(bp, -EFSCORRUPTED);
+ xfs_verifier_error(bp);
+ return;
+ }
+ xfs_btree_sblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_refcountbt_buf_ops = {
+ .verify_read = xfs_refcountbt_read_verify,
+ .verify_write = xfs_refcountbt_write_verify,
+};
+
+static const struct xfs_btree_ops xfs_refcountbt_ops = {
+ .rec_len = sizeof(struct xfs_refcount_rec),
+ .key_len = sizeof(struct xfs_refcount_key),
+
+ .dup_cursor = xfs_refcountbt_dup_cursor,
+ .buf_ops = &xfs_refcountbt_buf_ops,
+};
+
+/**
+ * xfs_refcountbt_init_cursor() -- Allocate a new refcount btree cursor.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ */
+struct xfs_btree_cur *
+xfs_refcountbt_init_cursor(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ xfs_agnumber_t agno,
+ struct xfs_bmap_free *flist)
+{
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp);
+ struct xfs_btree_cur *cur;
+
+ ASSERT(agno != NULLAGNUMBER);
+ ASSERT(agno < mp->m_sb.sb_agcount);
+ cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
+
+ cur->bc_tp = tp;
+ cur->bc_mp = mp;
+ cur->bc_btnum = XFS_BTNUM_REFC;
+ cur->bc_blocklog = mp->m_sb.sb_blocklog;
+ cur->bc_ops = &xfs_refcountbt_ops;
+
+ cur->bc_nlevels = be32_to_cpu(agf->agf_refcount_level);
+
+ cur->bc_private.a.agbp = agbp;
+ cur->bc_private.a.agno = agno;
+ cur->bc_private.a.flist = flist;
+ cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
+
+ return cur;
+}
+
+/**
+ * xfs_refcountbt_maxrecs() -- Calculate number of records in a refcount
+ * btree block.
+ * @mp: XFS mount object
+ * @blocklen: Length of block, in bytes.
+ * @leaf: true if this is a leaf btree block, false otherwise
+ */
+int
+xfs_refcountbt_maxrecs(
+ struct xfs_mount *mp,
+ int blocklen,
+ bool leaf)
+{
+ blocklen -= XFS_REFCOUNT_BLOCK_LEN;
+
+ if (leaf)
+ return blocklen / sizeof(struct xfs_refcount_rec);
+ return blocklen / (sizeof(struct xfs_refcount_key) +
+ sizeof(xfs_refcount_ptr_t));
+}
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
new file mode 100644
index 0000000..d51dc1a
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -0,0 +1,65 @@
+/*
+ * Copyright (c) 2000,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_REFCOUNT_BTREE_H__
+#define __XFS_REFCOUNT_BTREE_H__
+
+/*
+ * Reference Count Btree on-disk structures
+ */
+
+struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+
+/*
+ * Btree block header size
+ */
+#define XFS_REFCOUNT_BLOCK_LEN XFS_BTREE_SBLOCK_CRC_LEN
+
+/*
+ * Record, key, and pointer address macros for btree blocks.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+#define XFS_REFCOUNT_REC_ADDR(block, index) \
+ ((struct xfs_refcount_rec *) \
+ ((char *)(block) + \
+ XFS_REFCOUNT_BLOCK_LEN + \
+ (((index) - 1) * sizeof(struct xfs_refcount_rec))))
+
+#define XFS_REFCOUNT_KEY_ADDR(block, index) \
+ ((struct xfs_refcount_key *) \
+ ((char *)(block) + \
+ XFS_REFCOUNT_BLOCK_LEN + \
+ ((index) - 1) * sizeof(struct xfs_refcount_key)))
+
+#define XFS_REFCOUNT_PTR_ADDR(block, index, maxrecs) \
+ ((xfs_refcount_ptr_t *) \
+ ((char *)(block) + \
+ XFS_REFCOUNT_BLOCK_LEN + \
+ (maxrecs) * sizeof(struct xfs_refcount_key) + \
+ ((index) - 1) * sizeof(xfs_refcount_ptr_t)))
+
+extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp,
+ struct xfs_trans *tp, struct xfs_buf *agbp, xfs_agnumber_t agno,
+ struct xfs_bmap_free *flist);
+extern int xfs_refcountbt_maxrecs(struct xfs_mount *mp, int blocklen,
+ bool leaf);
+
+#endif /* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 8b437f5..5bc638f 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -36,6 +36,8 @@
#include "xfs_alloc_btree.h"
#include "xfs_ialloc_btree.h"
#include "xfs_rmap_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
/*
* Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -728,6 +730,13 @@ xfs_sb_mount_common(
mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
+ mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
+ true);
+ mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
+ false);
+ mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2;
+ mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2;
+
mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
sbp->sb_inopblock);
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 88efbb4..77d1220 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_agf_buf_ops;
extern const struct xfs_buf_ops xfs_agfl_buf_ops;
extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
+extern const struct xfs_buf_ops xfs_refcountbt_buf_ops;
extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
@@ -216,6 +217,7 @@ int xfs_log_calc_minimum_size(struct xfs_mount *);
#define XFS_INO_REF 2
#define XFS_ATTR_BTREE_REF 1
#define XFS_DQUOT_REF 1
+#define XFS_REFC_BTREE_REF 1
/*
* Flags for xfs_trans_ichgtime().
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 4b286cc..aba42d7 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -92,6 +92,8 @@ typedef struct xfs_mount {
uint m_inobt_mnr[2]; /* min inobt btree records */
uint m_rmap_mxr[2]; /* max rmap btree records */
uint m_rmap_mnr[2]; /* min rmap btree records */
+ uint m_refc_mxr[2]; /* max refc btree records */
+ uint m_refc_mnr[2]; /* min refc btree records */
uint m_ag_maxlevels; /* XFS_AG_MAXLEVELS */
uint m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
uint m_in_maxlevels; /* max inobt btree levels. */
* [PATCH 37/58] xfs: define tracepoints for refcount/reflink activities
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (35 preceding siblings ...)
2015-10-07 4:59 ` [PATCH 36/58] xfs: define the on-disk refcount btree format Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 38/58] xfs: add refcount btree support to growfs Darrick J. Wong
` (20 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Define the tracepoints needed to inspect refcount and reflink
operations at runtime.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_trace.h | 672 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 672 insertions(+)
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f8ec848..2f6015c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2453,6 +2453,678 @@ DEFINE_DISCARD_EVENT(xfs_discard_toosmall);
DEFINE_DISCARD_EVENT(xfs_discard_exclude);
DEFINE_DISCARD_EVENT(xfs_discard_busy);
+/* reflink/refcount tracepoint classes */
+
+/* reuse the discard trace class for agbno/aglen-based traces */
+#define DEFINE_AG_EXTENT_EVENT(name) DEFINE_DISCARD_EVENT(name)
+
+/* ag btree lookup tracepoint class */
+#define XFS_AG_BTREE_CMP_FORMAT_STR \
+ { XFS_LOOKUP_EQ, "eq" }, \
+ { XFS_LOOKUP_LE, "le" }, \
+ { XFS_LOOKUP_GE, "ge" }
+DECLARE_EVENT_CLASS(xfs_ag_btree_lookup_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, xfs_lookup_t dir),
+ TP_ARGS(mp, agno, agbno, dir),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, agbno)
+ __field(xfs_lookup_t, dir)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->agbno = agbno;
+ __entry->dir = dir;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u cmp %s(%d)",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->agbno,
+ __print_symbolic(__entry->dir, XFS_AG_BTREE_CMP_FORMAT_STR),
+ __entry->dir)
+)
+
+#define DEFINE_AG_BTREE_LOOKUP_EVENT(name) \
+DEFINE_EVENT(xfs_ag_btree_lookup_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_agblock_t agbno, xfs_lookup_t dir), \
+ TP_ARGS(mp, agno, agbno, dir))
+
+/* two-file io tracepoint class */
+DECLARE_EVENT_CLASS(xfs_double_io_class,
+ TP_PROTO(struct xfs_inode *src, xfs_off_t soffset, xfs_off_t len,
+ struct xfs_inode *dest, xfs_off_t doffset),
+ TP_ARGS(src, soffset, len, dest, doffset),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, src_ino)
+ __field(loff_t, src_isize)
+ __field(loff_t, src_disize)
+ __field(loff_t, src_offset)
+ __field(size_t, len)
+ __field(xfs_ino_t, dest_ino)
+ __field(loff_t, dest_isize)
+ __field(loff_t, dest_disize)
+ __field(loff_t, dest_offset)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(src)->i_sb->s_dev;
+ __entry->src_ino = src->i_ino;
+ __entry->src_isize = VFS_I(src)->i_size;
+ __entry->src_disize = src->i_d.di_size;
+ __entry->src_offset = soffset;
+ __entry->len = len;
+ __entry->dest_ino = dest->i_ino;
+ __entry->dest_isize = VFS_I(dest)->i_size;
+ __entry->dest_disize = dest->i_d.di_size;
+ __entry->dest_offset = doffset;
+ ),
+ TP_printk("dev %d:%d count %zd "
+ "ino 0x%llx isize 0x%llx disize 0x%llx offset 0x%llx -> "
+ "ino 0x%llx isize 0x%llx disize 0x%llx offset 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->len,
+ __entry->src_ino,
+ __entry->src_isize,
+ __entry->src_disize,
+ __entry->src_offset,
+ __entry->dest_ino,
+ __entry->dest_isize,
+ __entry->dest_disize,
+ __entry->dest_offset)
+)
+
+#define DEFINE_DOUBLE_IO_EVENT(name) \
+DEFINE_EVENT(xfs_double_io_class, name, \
+ TP_PROTO(struct xfs_inode *src, xfs_off_t soffset, xfs_off_t len, \
+ struct xfs_inode *dest, xfs_off_t doffset), \
+ TP_ARGS(src, soffset, len, dest, doffset))
+
+/* two-file vfs io tracepoint class */
+DECLARE_EVENT_CLASS(xfs_double_vfs_io_class,
+ TP_PROTO(struct inode *src, u64 soffset, u64 len,
+ struct inode *dest, u64 doffset),
+ TP_ARGS(src, soffset, len, dest, doffset),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(unsigned long, src_ino)
+ __field(loff_t, src_isize)
+ __field(loff_t, src_offset)
+ __field(size_t, len)
+ __field(unsigned long, dest_ino)
+ __field(loff_t, dest_isize)
+ __field(loff_t, dest_offset)
+ ),
+ TP_fast_assign(
+ __entry->dev = src->i_sb->s_dev;
+ __entry->src_ino = src->i_ino;
+ __entry->src_isize = i_size_read(src);
+ __entry->src_offset = soffset;
+ __entry->len = len;
+ __entry->dest_ino = dest->i_ino;
+ __entry->dest_isize = i_size_read(dest);
+ __entry->dest_offset = doffset;
+ ),
+ TP_printk("dev %d:%d count %zd "
+ "ino 0x%lx isize 0x%llx offset 0x%llx -> "
+ "ino 0x%lx isize 0x%llx offset 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->len,
+ __entry->src_ino,
+ __entry->src_isize,
+ __entry->src_offset,
+ __entry->dest_ino,
+ __entry->dest_isize,
+ __entry->dest_offset)
+)
+
+#define DEFINE_DOUBLE_VFS_IO_EVENT(name) \
+DEFINE_EVENT(xfs_double_vfs_io_class, name, \
+ TP_PROTO(struct inode *src, u64 soffset, u64 len, \
+ struct inode *dest, u64 doffset), \
+ TP_ARGS(src, soffset, len, dest, doffset))
+
+/* CoW write tracepoint */
+DECLARE_EVENT_CLASS(xfs_copy_on_write_class,
+ TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk, xfs_fsblock_t pblk,
+ xfs_extlen_t len, xfs_fsblock_t new_pblk),
+ TP_ARGS(ip, lblk, pblk, len, new_pblk),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(xfs_fileoff_t, lblk)
+ __field(xfs_fsblock_t, pblk)
+ __field(xfs_extlen_t, len)
+ __field(xfs_fsblock_t, new_pblk)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->lblk = lblk;
+ __entry->pblk = pblk;
+ __entry->len = len;
+ __entry->new_pblk = new_pblk;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx pblk 0x%llx "
+ "len 0x%x new_pblk %llu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->lblk,
+ __entry->pblk,
+ __entry->len,
+ __entry->new_pblk)
+)
+
+#define DEFINE_COW_EVENT(name) \
+DEFINE_EVENT(xfs_copy_on_write_class, name, \
+ TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk, xfs_fsblock_t pblk, \
+ xfs_extlen_t len, xfs_fsblock_t new_pblk), \
+ TP_ARGS(ip, lblk, pblk, len, new_pblk))
+
+/* single-rlext tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_extent_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ struct xfs_refcount_irec *irec),
+ TP_ARGS(mp, agno, irec),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, startblock)
+ __field(xfs_extlen_t, blockcount)
+ __field(xfs_nlink_t, refcount)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->startblock = irec->rc_startblock;
+ __entry->blockcount = irec->rc_blockcount;
+ __entry->refcount = irec->rc_refcount;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->startblock,
+ __entry->blockcount,
+ __entry->refcount)
+)
+
+#define DEFINE_REFCOUNT_EXTENT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_extent_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ struct xfs_refcount_irec *irec), \
+ TP_ARGS(mp, agno, irec))
+
+/* single-rlext and an agbno tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ struct xfs_refcount_irec *irec, xfs_agblock_t agbno),
+ TP_ARGS(mp, agno, irec, agbno),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, startblock)
+ __field(xfs_extlen_t, blockcount)
+ __field(xfs_nlink_t, refcount)
+ __field(xfs_agblock_t, agbno)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->startblock = irec->rc_startblock;
+ __entry->blockcount = irec->rc_blockcount;
+ __entry->refcount = irec->rc_refcount;
+ __entry->agbno = agbno;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u @ agbno %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->startblock,
+ __entry->blockcount,
+ __entry->refcount,
+ __entry->agbno)
+)
+
+#define DEFINE_REFCOUNT_EXTENT_AT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_extent_at_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ struct xfs_refcount_irec *irec, xfs_agblock_t agbno), \
+ TP_ARGS(mp, agno, irec, agbno))
+
+/* double-rlext tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2),
+ TP_ARGS(mp, agno, i1, i2),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, i1_startblock)
+ __field(xfs_extlen_t, i1_blockcount)
+ __field(xfs_nlink_t, i1_refcount)
+ __field(xfs_agblock_t, i2_startblock)
+ __field(xfs_extlen_t, i2_blockcount)
+ __field(xfs_nlink_t, i2_refcount)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->i1_startblock = i1->rc_startblock;
+ __entry->i1_blockcount = i1->rc_blockcount;
+ __entry->i1_refcount = i1->rc_refcount;
+ __entry->i2_startblock = i2->rc_startblock;
+ __entry->i2_blockcount = i2->rc_blockcount;
+ __entry->i2_refcount = i2->rc_refcount;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u -- "
+ "agbno %u len %u refcount %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->i1_startblock,
+ __entry->i1_blockcount,
+ __entry->i1_refcount,
+ __entry->i2_startblock,
+ __entry->i2_blockcount,
+ __entry->i2_refcount)
+)
+
+#define DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_double_extent_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2), \
+ TP_ARGS(mp, agno, i1, i2))
+
+/* double-rlext and an agbno tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2,
+ xfs_agblock_t agbno),
+ TP_ARGS(mp, agno, i1, i2, agbno),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, i1_startblock)
+ __field(xfs_extlen_t, i1_blockcount)
+ __field(xfs_nlink_t, i1_refcount)
+ __field(xfs_agblock_t, i2_startblock)
+ __field(xfs_extlen_t, i2_blockcount)
+ __field(xfs_nlink_t, i2_refcount)
+ __field(xfs_agblock_t, agbno)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->i1_startblock = i1->rc_startblock;
+ __entry->i1_blockcount = i1->rc_blockcount;
+ __entry->i1_refcount = i1->rc_refcount;
+ __entry->i2_startblock = i2->rc_startblock;
+ __entry->i2_blockcount = i2->rc_blockcount;
+ __entry->i2_refcount = i2->rc_refcount;
+ __entry->agbno = agbno;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u -- "
+ "agbno %u len %u refcount %u @ agbno %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->i1_startblock,
+ __entry->i1_blockcount,
+ __entry->i1_refcount,
+ __entry->i2_startblock,
+ __entry->i2_blockcount,
+ __entry->i2_refcount,
+ __entry->agbno)
+)
+
+#define DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_double_extent_at_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, \
+ xfs_agblock_t agbno), \
+ TP_ARGS(mp, agno, i1, i2, agbno))
+
+/* triple-rlext tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+ struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2,
+ struct xfs_refcount_irec *i3),
+ TP_ARGS(mp, agno, i1, i2, i3),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(xfs_agblock_t, i1_startblock)
+ __field(xfs_extlen_t, i1_blockcount)
+ __field(xfs_nlink_t, i1_refcount)
+ __field(xfs_agblock_t, i2_startblock)
+ __field(xfs_extlen_t, i2_blockcount)
+ __field(xfs_nlink_t, i2_refcount)
+ __field(xfs_agblock_t, i3_startblock)
+ __field(xfs_extlen_t, i3_blockcount)
+ __field(xfs_nlink_t, i3_refcount)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->i1_startblock = i1->rc_startblock;
+ __entry->i1_blockcount = i1->rc_blockcount;
+ __entry->i1_refcount = i1->rc_refcount;
+ __entry->i2_startblock = i2->rc_startblock;
+ __entry->i2_blockcount = i2->rc_blockcount;
+ __entry->i2_refcount = i2->rc_refcount;
+ __entry->i3_startblock = i3->rc_startblock;
+ __entry->i3_blockcount = i3->rc_blockcount;
+ __entry->i3_refcount = i3->rc_refcount;
+ ),
+ TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u -- "
+ "agbno %u len %u refcount %u -- "
+ "agbno %u len %u refcount %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->i1_startblock,
+ __entry->i1_blockcount,
+ __entry->i1_refcount,
+ __entry->i2_startblock,
+ __entry->i2_blockcount,
+ __entry->i2_refcount,
+ __entry->i3_startblock,
+ __entry->i3_blockcount,
+ __entry->i3_refcount)
+);
+
+#define DEFINE_REFCOUNT_TRIPLE_EXTENT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_triple_extent_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, \
+ struct xfs_refcount_irec *i3), \
+ TP_ARGS(mp, agno, i1, i2, i3))
+
+/* simple AG-based error/%ip tracepoint class */
+DECLARE_EVENT_CLASS(xfs_ag_error_class,
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error,
+ unsigned long caller_ip),
+ TP_ARGS(mp, agno, error, caller_ip),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_agnumber_t, agno)
+ __field(int, error)
+ __field(unsigned long, caller_ip)
+ ),
+ TP_fast_assign(
+ __entry->dev = mp->m_super->s_dev;
+ __entry->agno = agno;
+ __entry->error = error;
+ __entry->caller_ip = caller_ip;
+ ),
+ TP_printk("dev %d:%d agno %u error %d caller %ps",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->agno,
+ __entry->error,
+ (char *)__entry->caller_ip)
+);
+
+#define DEFINE_AG_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_ag_error_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error, \
+ unsigned long caller_ip), \
+ TP_ARGS(mp, agno, error, caller_ip))
+
+/* simple inode-based error/%ip tracepoint class */
+DECLARE_EVENT_CLASS(xfs_inode_error_class,
+ TP_PROTO(struct xfs_inode *ip, int error, unsigned long caller_ip),
+ TP_ARGS(ip, error, caller_ip),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(int, error)
+ __field(unsigned long, caller_ip)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->error = error;
+ __entry->caller_ip = caller_ip;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx error %d caller %ps",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->error,
+ (char *)__entry->caller_ip)
+);
+
+#define DEFINE_INODE_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_inode_error_class, name, \
+ TP_PROTO(struct xfs_inode *ip, int error, \
+ unsigned long caller_ip), \
+ TP_ARGS(ip, error, caller_ip))
+
+/* refcount/reflink tracepoint definitions */
+
+/* reflink allocator */
+TRACE_EVENT(xfs_reflink_relink_blocks,
+ TP_PROTO(struct xfs_inode *ip, xfs_fsblock_t fsbno,
+ xfs_extlen_t len),
+ TP_ARGS(ip, fsbno, len),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(xfs_fsblock_t, fsbno)
+ __field(xfs_extlen_t, len)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->fsbno = fsbno;
+ __entry->len = len;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx fsbno 0x%llx len 0x%x",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->fsbno,
+ __entry->len)
+);
+
+/* refcount btree tracepoints */
+DEFINE_AG_BTREE_LOOKUP_EVENT(xfs_refcountbt_lookup);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_get);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_update);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_insert);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_delete);
+
+/* refcount adjustment tracepoints */
+DEFINE_AG_EXTENT_EVENT(xfs_refcount_increase);
+DEFINE_AG_EXTENT_EVENT(xfs_refcount_decrease);
+DEFINE_REFCOUNT_TRIPLE_EXTENT_EVENT(xfs_refcount_merge_center_extents);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_modify_extent);
+DEFINE_REFCOUNT_EXTENT_AT_EVENT(xfs_refcount_split_left_extent);
+DEFINE_REFCOUNT_EXTENT_AT_EVENT(xfs_refcount_split_right_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_left_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_right_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(xfs_refcount_find_left_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(xfs_refcount_find_right_extent);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_adjust_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_center_extents_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_modify_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_split_left_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_split_right_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_left_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_right_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_find_left_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_find_right_extent_error);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_rec_order_error);
+
+/* reflink tracepoints */
+DEFINE_INODE_EVENT(xfs_reflink_set_inode_flag);
+DEFINE_INODE_EVENT(xfs_reflink_unset_inode_flag);
+DEFINE_ITRUNC_EVENT(xfs_reflink_update_inode_size);
+DEFINE_IOMAP_EVENT(xfs_reflink_read_iomap);
+TRACE_EVENT(xfs_reflink_main_loop,
+ TP_PROTO(struct xfs_inode *src, xfs_fileoff_t soffset,
+ xfs_filblks_t len, struct xfs_inode *dest,
+ xfs_fileoff_t doffset),
+ TP_ARGS(src, soffset, len, dest, doffset),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, src_ino)
+ __field(xfs_fileoff_t, src_lblk)
+ __field(xfs_filblks_t, len)
+ __field(xfs_ino_t, dest_ino)
+ __field(xfs_fileoff_t, dest_lblk)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(src)->i_sb->s_dev;
+ __entry->src_ino = src->i_ino;
+ __entry->src_lblk = soffset;
+ __entry->len = len;
+ __entry->dest_ino = dest->i_ino;
+ __entry->dest_lblk = doffset;
+ ),
+ TP_printk("dev %d:%d len 0x%llx "
+ "ino 0x%llx offset 0x%llx blocks -> "
+ "ino 0x%llx offset 0x%llx blocks",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->len,
+ __entry->src_ino,
+ __entry->src_lblk,
+ __entry->dest_ino,
+ __entry->dest_lblk)
+);
+TRACE_EVENT(xfs_reflink_punch_range,
+ TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk,
+ xfs_extlen_t len),
+ TP_ARGS(ip, lblk, len),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(xfs_fileoff_t, lblk)
+ __field(xfs_extlen_t, len)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->lblk = lblk;
+ __entry->len = len;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->lblk,
+ __entry->len)
+);
+TRACE_EVENT(xfs_reflink_remap_range,
+ TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk,
+ xfs_extlen_t len, xfs_fsblock_t new_pblk),
+ TP_ARGS(ip, lblk, len, new_pblk),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(xfs_fileoff_t, lblk)
+ __field(xfs_extlen_t, len)
+ __field(xfs_fsblock_t, new_pblk)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->lblk = lblk;
+ __entry->len = len;
+ __entry->new_pblk = new_pblk;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x new_pblk %llu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->lblk,
+ __entry->len,
+ __entry->new_pblk)
+);
+DEFINE_DOUBLE_IO_EVENT(xfs_reflink_range);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_range_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_set_inode_flag_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_update_inode_size_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_reflink_main_loop_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_read_iomap_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_punch_range_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_remap_range_error);
+
+/* dedupe tracepoints */
+DEFINE_DOUBLE_IO_EVENT(xfs_reflink_compare_extents);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_compare_extents_error);
+
+/* ioctl tracepoints */
+DEFINE_DOUBLE_VFS_IO_EVENT(xfs_ioctl_reflink);
+DEFINE_DOUBLE_VFS_IO_EVENT(xfs_ioctl_clone_range);
+DEFINE_DOUBLE_VFS_IO_EVENT(xfs_ioctl_file_extent_same);
+TRACE_EVENT(xfs_ioctl_clone,
+ TP_PROTO(struct inode *src, struct inode *dest),
+ TP_ARGS(src, dest),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(unsigned long, src_ino)
+ __field(loff_t, src_isize)
+ __field(unsigned long, dest_ino)
+ __field(loff_t, dest_isize)
+ ),
+ TP_fast_assign(
+ __entry->dev = src->i_sb->s_dev;
+ __entry->src_ino = src->i_ino;
+ __entry->src_isize = i_size_read(src);
+ __entry->dest_ino = dest->i_ino;
+ __entry->dest_isize = i_size_read(dest);
+ ),
+ TP_printk("dev %d:%d "
+ "ino 0x%lx isize 0x%llx -> "
+ "ino 0x%lx isize 0x%llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->src_ino,
+ __entry->src_isize,
+ __entry->dest_ino,
+ __entry->dest_isize)
+);
+
+/* unshare tracepoints */
+DEFINE_INODE_EVENT(xfs_reflink_unshare);
+DEFINE_PAGE_EVENT(xfs_reflink_unshare_page);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_unshare_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_dirty_page_error);
+
+/* copy on write events */
+TRACE_EVENT(xfs_reflink_bounce_direct_write,
+ TP_PROTO(struct xfs_inode *ip, struct xfs_bmbt_irec *irec),
+ TP_ARGS(ip, irec),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(xfs_ino_t, ino)
+ __field(xfs_fileoff_t, lblk)
+ __field(xfs_extlen_t, len)
+ __field(xfs_fsblock_t, pblk)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->ino = ip->i_ino;
+ __entry->lblk = irec->br_startoff;
+ __entry->len = irec->br_blockcount;
+ __entry->pblk = irec->br_startblock;
+ ),
+ TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x pblk %llu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->ino,
+ __entry->lblk,
+ __entry->len,
+ __entry->pblk)
+);
+DEFINE_COW_EVENT(xfs_reflink_reserve_fork_block);
+DEFINE_COW_EVENT(xfs_reflink_write_fork_block);
+DEFINE_COW_EVENT(xfs_reflink_remap_after_io);
+DEFINE_COW_EVENT(xfs_reflink_free_forked);
+DEFINE_COW_EVENT(xfs_reflink_fork_buf);
+DEFINE_COW_EVENT(xfs_reflink_finish_fork_buf);
+DEFINE_RW_EVENT(xfs_reflink_force_getblocks);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_reserve_fork_block_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_remap_after_io_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_free_forked_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_fork_buf_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_finish_fork_buf_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_write_fork_block_error);
+
#endif /* _TRACE_XFS_H */
#undef TRACE_INCLUDE_PATH
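
The class/event split used throughout the tracepoint hunks above can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: every demo_* name is hypothetical, and snprintf() stands in for the real TP_printk machinery. It shows one shared formatter (the "class") and a macro that stamps out thin, named wrappers (the "events"), which is roughly the relationship between DECLARE_EVENT_CLASS and DEFINE_EVENT.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical "class": one formatter shared by every event of this shape.
 * Note there is no trailing newline in the format string; the consumer of
 * the trace output is assumed to add its own line termination.
 */
static int demo_refcount_extent_class(char *buf, size_t len, const char *name,
				      unsigned agno, unsigned agbno,
				      unsigned blockcount, unsigned refcount)
{
	return snprintf(buf, len, "%s: agno %u agbno %u len %u refcount %u",
			name, agno, agbno, blockcount, refcount);
}

/* Hypothetical "event" macro: a named wrapper around the shared class. */
#define DEMO_DEFINE_REFCOUNT_EXTENT_EVENT(name)				\
static int demo_trace_##name(char *buf, size_t len, unsigned agno,	\
			     unsigned agbno, unsigned blockcount,	\
			     unsigned refcount)				\
{									\
	return demo_refcount_extent_class(buf, len, #name, agno, agbno,	\
					  blockcount, refcount);	\
}

DEMO_DEFINE_REFCOUNT_EXTENT_EVENT(refcountbt_insert)
DEMO_DEFINE_REFCOUNT_EXTENT_EVENT(refcountbt_delete)
```

Each additional event costs one macro line, which is why the patch can define a long list of refcount and reflink events against a handful of classes.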
* [PATCH 38/58] xfs: add refcount btree support to growfs
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (36 preceding siblings ...)
2015-10-07 4:59 ` [PATCH 37/58] xfs: define tracepoints for refcount/reflink activities Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 39/58] xfs: add refcount btree operations Darrick J. Wong
` (19 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Modify the growfs code to initialize new refcount btree blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
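
A rough userspace sketch of what this patch sets up for each new AG, assuming simplified stand-in structures: the AGF records the root block with a tree height of 1, while the root block itself is an empty leaf (level 0, zero records). The demo_* types and helpers are hypothetical, not the real struct xfs_agf or struct xfs_btree_block, and the magic value is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the real on-disk XFS structures. */
struct demo_btree_hdr {
	uint8_t magic[4];		/* big-endian magic number */
	uint8_t level[2];		/* big-endian level; 0 means leaf */
	uint8_t numrecs[2];		/* big-endian record count */
};

struct demo_agf {
	uint8_t refcount_root[4];	/* big-endian AG block of the root */
	uint8_t refcount_level[4];	/* big-endian tree height */
};

#define DEMO_REFC_MAGIC	0x52334643u	/* "R3FC"; assumed for illustration */

static void demo_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t demo_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void demo_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static uint16_t demo_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Point the AGF at the root block and write an empty leaf root. */
static void demo_init_new_ag(struct demo_agf *agf, struct demo_btree_hdr *root,
			     uint32_t root_agbno)
{
	demo_put_be32(agf->refcount_root, root_agbno);
	demo_put_be32(agf->refcount_level, 1);	/* height: one level */

	memset(root, 0, sizeof(*root));
	demo_put_be32(root->magic, DEMO_REFC_MAGIC);
	demo_put_be16(root->level, 0);		/* the only block is a leaf */
	demo_put_be16(root->numrecs, 0);	/* no shared extents yet */
}
```

The height stored in the AGF counts levels, so a tree consisting of a single leaf has height 1 even though that block's own level field is 0.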
fs/xfs/xfs_fsops.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 64bb02b..7c37c22 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -260,6 +260,11 @@ xfs_growfs_data_private(
agf->agf_longest = cpu_to_be32(tmpsize);
if (xfs_sb_version_hascrc(&mp->m_sb))
uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ agf->agf_refcount_root = cpu_to_be32(
+ xfs_refc_block(mp));
+ agf->agf_refcount_level = cpu_to_be32(1);
+ }
error = xfs_bwrite(bp);
xfs_buf_relse(bp);
@@ -451,6 +456,17 @@ xfs_growfs_data_private(
rrec->rm_offset = 0;
be16_add_cpu(&block->bb_numrecs, 1);
+ /* account for refc btree root */
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ rrec = XFS_RMAP_REC_ADDR(block, 5);
+ rrec->rm_startblock = cpu_to_be32(
+ xfs_refc_block(mp));
+ rrec->rm_blockcount = cpu_to_be32(1);
+ rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
+ rrec->rm_offset = 0;
+ be16_add_cpu(&block->bb_numrecs, 1);
+ }
+
error = xfs_bwrite(bp);
xfs_buf_relse(bp);
if (error)
@@ -508,6 +524,28 @@ xfs_growfs_data_private(
goto error0;
}
+ /*
+ * refcount btree root block
+ */
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ bp = xfs_growfs_get_hdr_buf(mp,
+ XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
+ BTOBB(mp->m_sb.sb_blocksize), 0,
+ &xfs_refcountbt_buf_ops);
+ if (!bp) {
+ error = -ENOMEM;
+ goto error0;
+ }
+
+ xfs_btree_init_block(mp, bp, XFS_REFC_CRC_MAGIC,
+ 0, 0, agno,
+ XFS_BTREE_CRC_BLOCKS);
+
+ error = xfs_bwrite(bp);
+ xfs_buf_relse(bp);
+ if (error)
+ goto error0;
+ }
}
xfs_trans_agblocks_delta(tp, nfree);
/*
* [PATCH 39/58] xfs: add refcount btree operations
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (37 preceding siblings ...)
2015-10-07 4:59 ` [PATCH 38/58] xfs: add refcount btree support to growfs Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 40/58] libxfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
` (18 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Implement the generic btree operations required to manipulate refcount
btree blocks. The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
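
For readers unfamiliar with the generic btree callbacks, here is a minimal userspace sketch of the ordering rules this patch plugs in, assuming a flat sorted array in place of real btree blocks. All demo_* names are hypothetical. The sign convention of demo_key_diff() mirrors xfs_refcountbt_key_diff() (key minus search target), and demo_recs_inorder() mirrors the non-overlap check, except that it widens to 64 bits before adding, which the patch itself does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-core record: an extent of blocks and its refcount. */
struct demo_refc_rec {
	uint32_t startblock;
	uint32_t blockcount;
	uint32_t refcount;
};

/* Key minus search target: negative means the key sorts before bno. */
static int64_t demo_key_diff(const struct demo_refc_rec *rec, uint32_t bno)
{
	return (int64_t)rec->startblock - (int64_t)bno;
}

/* Index of the last record with startblock <= bno, or -1 if none. */
static int demo_lookup_le(const struct demo_refc_rec *recs, size_t n,
			  uint32_t bno)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (demo_key_diff(&recs[mid], bno) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (int)lo - 1;
}

/* Index of the first record with startblock >= bno, or -1 if none. */
static int demo_lookup_ge(const struct demo_refc_rec *recs, size_t n,
			  uint32_t bno)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (demo_key_diff(&recs[mid], bno) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n ? (int)lo : -1;
}

/* Adjacent records are in order when the first ends before the second starts. */
static int demo_recs_inorder(const struct demo_refc_rec *r1,
			     const struct demo_refc_rec *r2)
{
	return (uint64_t)r1->startblock + r1->blockcount <= r2->startblock;
}
```

Because only the start block is part of the key, a lookup for a block in the middle of an extent has to use the LE form and then check the returned record's length.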
fs/xfs/Makefile | 1
fs/xfs/libxfs/xfs_refcount.c | 169 ++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_refcount.h | 29 +++++
fs/xfs/libxfs/xfs_refcount_btree.c | 203 ++++++++++++++++++++++++++++++++++++
4 files changed, 402 insertions(+)
create mode 100644 fs/xfs/libxfs/xfs_refcount.c
create mode 100644 fs/xfs/libxfs/xfs_refcount.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c394ec2..3565db6 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -53,6 +53,7 @@ xfs-y += $(addprefix libxfs/, \
xfs_log_rlimit.o \
xfs_rmap.o \
xfs_rmap_btree.o \
+ xfs_refcount.o \
xfs_refcount_btree.o \
xfs_sb.o \
xfs_symlink_remote.o \
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
new file mode 100644
index 0000000..b3f2c25
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -0,0 +1,169 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+#include "xfs_refcount.h"
+
+/**
+ * xfs_refcountbt_lookup_le() -- Look up the first record less than or equal to
+ * bno in the btree given by cur.
+ * @cur: refcount btree cursor
+ * @bno: AG block number to look up
+ * @stat: set to 1 if successful, 0 otherwise
+ */
+int
+xfs_refcountbt_lookup_le(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t bno,
+ int *stat)
+{
+ trace_xfs_refcountbt_lookup(cur->bc_mp, cur->bc_private.a.agno, bno,
+ XFS_LOOKUP_LE);
+ cur->bc_rec.rc.rc_startblock = bno;
+ cur->bc_rec.rc.rc_blockcount = 0;
+ return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+}
+
+/**
+ * xfs_refcountbt_lookup_ge() -- Look up the first record greater than or equal
+ * to bno in the btree given by cur.
+ * @cur: refcount btree cursor
+ * @bno: AG block number to look up
+ * @stat: set to 1 if successful, 0 otherwise
+ */
+int /* error */
+xfs_refcountbt_lookup_ge(
+ struct xfs_btree_cur *cur, /* btree cursor */
+ xfs_agblock_t bno, /* starting block of extent */
+ int *stat) /* success/failure */
+{
+ trace_xfs_refcountbt_lookup(cur->bc_mp, cur->bc_private.a.agno, bno,
+ XFS_LOOKUP_GE);
+ cur->bc_rec.rc.rc_startblock = bno;
+ cur->bc_rec.rc.rc_blockcount = 0;
+ return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
+}
+
+/**
+ * xfs_refcountbt_get_rec() -- Get the data from the pointed-to record.
+ *
+ * @cur: refcount btree cursor
+ * @irec: set to the record currently pointed to by the btree cursor
+ * @stat: set to 1 if successful, 0 otherwise
+ */
+int
+xfs_refcountbt_get_rec(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *irec,
+ int *stat)
+{
+ union xfs_btree_rec *rec;
+ int error;
+
+ error = xfs_btree_get_rec(cur, &rec, stat);
+ if (!error && *stat == 1) {
+ irec->rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
+ irec->rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
+ irec->rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
+ trace_xfs_refcountbt_get(cur->bc_mp, cur->bc_private.a.agno,
+ irec);
+ }
+ return error;
+}
+
+/*
+ * Update the record referred to by cur to the value given
+ * by [bno, len, refcount].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_update(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *irec)
+{
+ union xfs_btree_rec rec;
+
+ trace_xfs_refcountbt_update(cur->bc_mp, cur->bc_private.a.agno, irec);
+ rec.refc.rc_startblock = cpu_to_be32(irec->rc_startblock);
+ rec.refc.rc_blockcount = cpu_to_be32(irec->rc_blockcount);
+ rec.refc.rc_refcount = cpu_to_be32(irec->rc_refcount);
+ return xfs_btree_update(cur, &rec);
+}
+
+/*
+ * Insert the record given by irec ([bno, len, refcount]) at the position
+ * referred to by cur.
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_insert(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *irec,
+ int *i)
+{
+ trace_xfs_refcountbt_insert(cur->bc_mp, cur->bc_private.a.agno, irec);
+ cur->bc_rec.rc.rc_startblock = irec->rc_startblock;
+ cur->bc_rec.rc.rc_blockcount = irec->rc_blockcount;
+ cur->bc_rec.rc.rc_refcount = irec->rc_refcount;
+ return xfs_btree_insert(cur, i);
+}
+
+/*
+ * Remove the record referred to by cur, then set the pointer to the spot
+ * where the record could be re-inserted, in case we want to increment or
+ * decrement the cursor.
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_delete(
+ struct xfs_btree_cur *cur,
+ int *i)
+{
+ struct xfs_refcount_irec irec;
+ int found_rec;
+ int error;
+
+ error = xfs_refcountbt_get_rec(cur, &irec, &found_rec);
+ if (error)
+ return error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+ trace_xfs_refcountbt_delete(cur->bc_mp, cur->bc_private.a.agno, &irec);
+ error = xfs_btree_delete(cur, i);
+ if (error)
+ return error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, *i == 1, out_error);
+ error = xfs_refcountbt_lookup_ge(cur, irec.rc_startblock, &found_rec);
+out_error:
+ return error;
+}
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
new file mode 100644
index 0000000..033a9b1
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2000,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_REFCOUNT_H__
+#define __XFS_REFCOUNT_H__
+
+extern int xfs_refcountbt_lookup_le(struct xfs_btree_cur *cur,
+ xfs_agblock_t bno, int *stat);
+extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
+ xfs_agblock_t bno, int *stat);
+extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *irec, int *stat);
+
+#endif /* __XFS_REFCOUNT_H__ */
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index 7067a05..d7574cd 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -43,6 +43,158 @@ xfs_refcountbt_dup_cursor(
cur->bc_private.a.flist);
}
+STATIC void
+xfs_refcountbt_set_root(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_ptr *ptr,
+ int inc)
+{
+ struct xfs_buf *agbp = cur->bc_private.a.agbp;
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp);
+ xfs_agnumber_t seqno = be32_to_cpu(agf->agf_seqno);
+ struct xfs_perag *pag = xfs_perag_get(cur->bc_mp, seqno);
+
+ ASSERT(ptr->s != 0);
+
+ agf->agf_refcount_root = ptr->s;
+ be32_add_cpu(&agf->agf_refcount_level, inc);
+ pag->pagf_refcount_level += inc;
+ xfs_perag_put(pag);
+
+ xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+}
+
+STATIC int
+xfs_refcountbt_alloc_block(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_ptr *start,
+ union xfs_btree_ptr *new,
+ int *stat)
+{
+ struct xfs_alloc_arg args; /* block allocation args */
+ int error; /* error return value */
+
+ memset(&args, 0, sizeof(args));
+ args.tp = cur->bc_tp;
+ args.mp = cur->bc_mp;
+ args.type = XFS_ALLOCTYPE_NEAR_BNO;
+ args.fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+ xfs_refc_block(args.mp));
+ args.firstblock = args.fsbno;
+ XFS_RMAP_AG_OWNER(&args.oinfo, XFS_RMAP_OWN_REFC);
+ args.minlen = args.maxlen = args.prod = 1;
+
+ error = xfs_alloc_vextent(&args);
+ if (error)
+ goto out_error;
+ if (args.fsbno == NULLFSBLOCK) {
+ XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+ *stat = 0;
+ return 0;
+ }
+ ASSERT(args.agno == cur->bc_private.a.agno);
+ ASSERT(args.len == 1);
+
+ new->s = cpu_to_be32(args.agbno);
+
+ XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+ *stat = 1;
+ return 0;
+
+out_error:
+ XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR);
+ return error;
+}
+
+STATIC int
+xfs_refcountbt_free_block(
+ struct xfs_btree_cur *cur,
+ struct xfs_buf *bp)
+{
+ struct xfs_mount *mp = cur->bc_mp;
+ struct xfs_trans *tp = cur->bc_tp;
+ xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+ struct xfs_owner_info oinfo;
+
+ XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_REFC);
+ xfs_bmap_add_free(mp, cur->bc_private.a.flist, fsbno, 1,
+ &oinfo);
+ xfs_trans_binval(tp, bp);
+ return 0;
+}
+
+STATIC int
+xfs_refcountbt_get_minrecs(
+ struct xfs_btree_cur *cur,
+ int level)
+{
+ return cur->bc_mp->m_refc_mnr[level != 0];
+}
+
+STATIC int
+xfs_refcountbt_get_maxrecs(
+ struct xfs_btree_cur *cur,
+ int level)
+{
+ return cur->bc_mp->m_refc_mxr[level != 0];
+}
+
+STATIC void
+xfs_refcountbt_init_key_from_rec(
+ union xfs_btree_key *key,
+ union xfs_btree_rec *rec)
+{
+ ASSERT(rec->refc.rc_startblock != 0);
+
+ key->refc.rc_startblock = rec->refc.rc_startblock;
+}
+
+STATIC void
+xfs_refcountbt_init_rec_from_key(
+ union xfs_btree_key *key,
+ union xfs_btree_rec *rec)
+{
+ ASSERT(key->refc.rc_startblock != 0);
+
+ rec->refc.rc_startblock = key->refc.rc_startblock;
+}
+
+STATIC void
+xfs_refcountbt_init_rec_from_cur(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_rec *rec)
+{
+ ASSERT(cur->bc_rec.rc.rc_startblock != 0);
+
+ rec->refc.rc_startblock = cpu_to_be32(cur->bc_rec.rc.rc_startblock);
+ rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount);
+ rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount);
+}
+
+STATIC void
+xfs_refcountbt_init_ptr_from_cur(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_ptr *ptr)
+{
+ struct xfs_agf *agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp);
+
+ ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno));
+ ASSERT(agf->agf_refcount_root != 0);
+
+ ptr->s = agf->agf_refcount_root;
+}
+
+STATIC __int64_t
+xfs_refcountbt_key_diff(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_key *key)
+{
+ struct xfs_refcount_irec *rec = &cur->bc_rec.rc;
+ struct xfs_refcount_key *kp = &key->refc;
+
+ return (__int64_t)be32_to_cpu(kp->rc_startblock) - rec->rc_startblock;
+}
+
STATIC bool
xfs_refcountbt_verify(
struct xfs_buf *bp)
@@ -104,12 +256,63 @@ const struct xfs_buf_ops xfs_refcountbt_buf_ops = {
.verify_write = xfs_refcountbt_write_verify,
};
+#if defined(DEBUG) || defined(XFS_WARN)
+STATIC int
+xfs_refcountbt_keys_inorder(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_key *k1,
+ union xfs_btree_key *k2)
+{
+ return be32_to_cpu(k1->refc.rc_startblock) <
+ be32_to_cpu(k2->refc.rc_startblock);
+}
+
+STATIC int
+xfs_refcountbt_recs_inorder(
+ struct xfs_btree_cur *cur,
+ union xfs_btree_rec *r1,
+ union xfs_btree_rec *r2)
+{
+ struct xfs_refcount_irec a, b;
+
+ int ret = be32_to_cpu(r1->refc.rc_startblock) +
+ be32_to_cpu(r1->refc.rc_blockcount) <=
+ be32_to_cpu(r2->refc.rc_startblock);
+ if (!ret) {
+ a.rc_startblock = be32_to_cpu(r1->refc.rc_startblock);
+ a.rc_blockcount = be32_to_cpu(r1->refc.rc_blockcount);
+ a.rc_refcount = be32_to_cpu(r1->refc.rc_refcount);
+ b.rc_startblock = be32_to_cpu(r2->refc.rc_startblock);
+ b.rc_blockcount = be32_to_cpu(r2->refc.rc_blockcount);
+ b.rc_refcount = be32_to_cpu(r2->refc.rc_refcount);
+ trace_xfs_refcount_rec_order_error(cur->bc_mp,
+ cur->bc_private.a.agno, &a, &b);
+ }
+
+ return ret;
+}
+#endif /* DEBUG */
+
static const struct xfs_btree_ops xfs_refcountbt_ops = {
.rec_len = sizeof(struct xfs_refcount_rec),
.key_len = sizeof(struct xfs_refcount_key),
.dup_cursor = xfs_refcountbt_dup_cursor,
+ .set_root = xfs_refcountbt_set_root,
+ .alloc_block = xfs_refcountbt_alloc_block,
+ .free_block = xfs_refcountbt_free_block,
+ .get_minrecs = xfs_refcountbt_get_minrecs,
+ .get_maxrecs = xfs_refcountbt_get_maxrecs,
+ .init_key_from_rec = xfs_refcountbt_init_key_from_rec,
+ .init_rec_from_key = xfs_refcountbt_init_rec_from_key,
+ .init_rec_from_cur = xfs_refcountbt_init_rec_from_cur,
+ .init_ptr_from_cur = xfs_refcountbt_init_ptr_from_cur,
+ .key_diff = xfs_refcountbt_key_diff,
.buf_ops = &xfs_refcountbt_buf_ops,
+#if defined(DEBUG) || defined(XFS_WARN)
+ .keys_inorder = xfs_refcountbt_keys_inorder,
+ .recs_inorder = xfs_refcountbt_recs_inorder,
+#endif
};
/**
* [PATCH 40/58] libxfs: adjust refcount of an extent of blocks in refcount btree
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (38 preceding siblings ...)
2015-10-07 4:59 ` [PATCH 39/58] xfs: add refcount btree operations Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-27 19:05 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 41/58] libxfs: adjust refcount when unmapping file blocks Darrick J. Wong
` (17 subsequent siblings)
57 siblings, 1 reply; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Provide functions to adjust the reference counts for an extent of
physical blocks stored in the refcount btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_refcount.c | 771 ++++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_refcount.h | 8
2 files changed, 779 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index b3f2c25..02892bc 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -167,3 +167,774 @@ xfs_refcountbt_delete(
out_error:
return error;
}
+
+/*
+ * Adjusting the Reference Count
+ *
+ * As stated elsewhere, the reference count btree (refcbt) stores
+ * >1 reference counts for extents of physical blocks. In this
+ * operation, we're either raising or lowering the reference count of
+ * some subrange stored in the tree:
+ *
+ * <------ adjustment range ------>
+ * ----+ +---+-----+ +--+--------+---------
+ * 2 | | 3 | 4 | |17| 55 | 10
+ * ----+ +---+-----+ +--+--------+---------
+ * X axis is physical blocks number;
+ * reference counts are the numbers inside the rectangles
+ *
+ * The first thing we need to do is to ensure that there are no
+ * refcount extents crossing either boundary of the range to be
+ * adjusted. For any extent that does cross a boundary, split it into
+ * two extents so that we can increment the refcount of one of the
+ * pieces later:
+ *
+ * <------ adjustment range ------>
+ * ----+ +---+-----+ +--+--------+----+----
+ * 2 | | 3 | 2 | |17| 55 | 10 | 10
+ * ----+ +---+-----+ +--+--------+----+----
+ *
+ * For this next step, let's assume that all the physical blocks in
+ * the adjustment range are mapped to a file and are therefore in use
+ * at least once. Therefore, we can infer that any gap in the
+ * refcount tree within the adjustment range represents a physical
+ * extent with refcount == 1:
+ *
+ * <------ adjustment range ------>
+ * ----+---+---+-----+-+--+--------+----+----
+ * 2 |"1"| 3 | 2 |1|17| 55 | 10 | 10
+ * ----+---+---+-----+-+--+--------+----+----
+ * ^
+ *
+ * For each extent that falls within the interval range, figure out
+ * which extent is to the left or the right of that extent. Now we
+ * have a left, current, and right extent. If the new reference count
+ * of the center extent enables us to merge left, center, and right
+ * into one record covering all three, do so. If the center extent is
+ * at the left end of the range, abuts the left extent, and its new
+ * reference count matches the left extent's record, then merge them.
+ * If the center extent is at the right end of the range, abuts the
+ * right extent, and the reference counts match, merge those. In the
+ * example, we can left merge (assuming an increment operation):
+ *
+ * <------ adjustment range ------>
+ * --------+---+-----+-+--+--------+----+----
+ * 2 | 3 | 2 |1|17| 55 | 10 | 10
+ * --------+---+-----+-+--+--------+----+----
+ * ^
+ *
+ * For all other extents within the range, adjust the reference count
+ * or delete it if the refcount falls below 2. If we were
+ * incrementing, the end result looks like this:
+ *
+ * <------ adjustment range ------>
+ * --------+---+-----+-+--+--------+----+----
+ * 2 | 4 | 3 |2|18| 56 | 11 | 10
+ * --------+---+-----+-+--+--------+----+----
+ *
+ * The result of a decrement operation looks as such:
+ *
+ * <------ adjustment range ------>
+ * ----+ +---+ +--+--------+----+----
+ * 2 | | 2 | |16| 54 | 9 | 10
+ * ----+ +---+ +--+--------+----+----
+ * DDDD 111111DD
+ *
+ * The blocks marked "D" are freed; the blocks marked "1" are only
+ * referenced once and therefore the record is removed from the
+ * refcount btree.
+ */
+
+#define RLNEXT(rl) ((rl).rc_startblock + (rl).rc_blockcount)
+/*
+ * Split a left rlextent that crosses agbno.
+ */
+STATIC int
+try_split_left_rlextent(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t agbno)
+{
+ struct xfs_refcount_irec left, tmp;
+ int found_rec;
+ int error;
+
+ error = xfs_refcountbt_lookup_le(cur, agbno, &found_rec);
+ if (error)
+ goto out_error;
+ if (!found_rec)
+ return 0;
+
+ error = xfs_refcountbt_get_rec(cur, &left, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+ if (left.rc_startblock >= agbno || RLNEXT(left) <= agbno)
+ return 0;
+
+ trace_xfs_refcount_split_left_extent(cur->bc_mp, cur->bc_private.a.agno,
+ &left, agbno);
+ tmp = left;
+ tmp.rc_blockcount = agbno - left.rc_startblock;
+ error = xfs_refcountbt_update(cur, &tmp);
+ if (error)
+ goto out_error;
+
+ error = xfs_btree_increment(cur, 0, &found_rec);
+ if (error)
+ goto out_error;
+
+ tmp = left;
+ tmp.rc_startblock = agbno;
+ tmp.rc_blockcount -= (agbno - left.rc_startblock);
+ error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+ return error;
+
+out_error:
+ trace_xfs_refcount_split_left_extent_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Split a right rlextent that crosses agbnext.
+ */
+STATIC int
+try_split_right_rlextent(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t agbnext)
+{
+ struct xfs_refcount_irec right, tmp;
+ int found_rec;
+ int error;
+
+ error = xfs_refcountbt_lookup_le(cur, agbnext - 1, &found_rec);
+ if (error)
+ goto out_error;
+ if (!found_rec)
+ return 0;
+
+ error = xfs_refcountbt_get_rec(cur, &right, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+ if (RLNEXT(right) <= agbnext)
+ return 0;
+
+ trace_xfs_refcount_split_right_extent(cur->bc_mp,
+ cur->bc_private.a.agno, &right, agbnext);
+ tmp = right;
+ tmp.rc_startblock = agbnext;
+ tmp.rc_blockcount -= (agbnext - right.rc_startblock);
+ error = xfs_refcountbt_update(cur, &tmp);
+ if (error)
+ goto out_error;
+
+ tmp = right;
+ tmp.rc_blockcount = agbnext - right.rc_startblock;
+ error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+ return error;
+
+out_error:
+ trace_xfs_refcount_split_right_extent_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Merge the left, center, and right extents.
+ */
+STATIC int
+merge_center(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *left,
+ struct xfs_refcount_irec *center,
+ unsigned long long extlen,
+ xfs_agblock_t *agbno,
+ xfs_extlen_t *aglen)
+{
+ int error;
+ int found_rec;
+
+ error = xfs_refcountbt_lookup_ge(cur, center->rc_startblock,
+ &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+ error = xfs_refcountbt_delete(cur, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+ if (center->rc_refcount > 1) {
+ error = xfs_refcountbt_delete(cur, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+ out_error);
+ }
+
+ error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
+ &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+ left->rc_blockcount = extlen;
+ error = xfs_refcountbt_update(cur, left);
+ if (error)
+ goto out_error;
+
+ *aglen = 0;
+ return error;
+
+out_error:
+ trace_xfs_refcount_merge_center_extents_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Merge with the left extent.
+ */
+STATIC int
+merge_left(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *left,
+ struct xfs_refcount_irec *cleft,
+ xfs_agblock_t *agbno,
+ xfs_extlen_t *aglen)
+{
+ int error;
+ int found_rec;
+
+ if (cleft->rc_refcount > 1) {
+ error = xfs_refcountbt_lookup_le(cur, cleft->rc_startblock,
+ &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+ out_error);
+
+ error = xfs_refcountbt_delete(cur, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+ out_error);
+ }
+
+ error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
+ &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+ left->rc_blockcount += cleft->rc_blockcount;
+ error = xfs_refcountbt_update(cur, left);
+ if (error)
+ goto out_error;
+
+ *agbno += cleft->rc_blockcount;
+ *aglen -= cleft->rc_blockcount;
+ return error;
+
+out_error:
+ trace_xfs_refcount_merge_left_extent_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Merge with the right extent.
+ */
+STATIC int
+merge_right(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *right,
+ struct xfs_refcount_irec *cright,
+ xfs_agblock_t *agbno,
+ xfs_extlen_t *aglen)
+{
+ int error;
+ int found_rec;
+
+ if (cright->rc_refcount > 1) {
+ error = xfs_refcountbt_lookup_le(cur, cright->rc_startblock,
+ &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+ out_error);
+
+ error = xfs_refcountbt_delete(cur, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+ out_error);
+ }
+
+ error = xfs_refcountbt_lookup_le(cur, right->rc_startblock,
+ &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+ right->rc_startblock -= cright->rc_blockcount;
+ right->rc_blockcount += cright->rc_blockcount;
+ error = xfs_refcountbt_update(cur, right);
+ if (error)
+ goto out_error;
+
+ *aglen -= cright->rc_blockcount;
+ return error;
+
+out_error:
+ trace_xfs_refcount_merge_right_extent_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Find the left extent and the one after it (cleft). This function assumes
+ * that we've already split any extent crossing agbno.
+ */
+STATIC int
+find_left_extent(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *left,
+ struct xfs_refcount_irec *cleft,
+ xfs_agblock_t agbno,
+ xfs_extlen_t aglen)
+{
+ struct xfs_refcount_irec tmp;
+ int error;
+ int found_rec;
+
+ left->rc_blockcount = cleft->rc_blockcount = 0;
+ error = xfs_refcountbt_lookup_le(cur, agbno - 1, &found_rec);
+ if (error)
+ goto out_error;
+ if (!found_rec)
+ return 0;
+
+ error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+ if (RLNEXT(tmp) != agbno)
+ return 0;
+ /* We have a left extent; retrieve (or invent) the next right one */
+ *left = tmp;
+
+ error = xfs_btree_increment(cur, 0, &found_rec);
+ if (error)
+ goto out_error;
+ if (found_rec) {
+ error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+ out_error);
+
+ if (tmp.rc_startblock == agbno)
+ *cleft = tmp;
+ else {
+ cleft->rc_startblock = agbno;
+ cleft->rc_blockcount = min(aglen,
+ tmp.rc_startblock - agbno);
+ cleft->rc_refcount = 1;
+ }
+ } else {
+ cleft->rc_startblock = agbno;
+ cleft->rc_blockcount = aglen;
+ cleft->rc_refcount = 1;
+ }
+ trace_xfs_refcount_find_left_extent(cur->bc_mp, cur->bc_private.a.agno,
+ left, cleft, agbno);
+ return error;
+
+out_error:
+ trace_xfs_refcount_find_left_extent_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Find the right extent and the one before it (cright). This function
+ * assumes that we've already split any extents crossing agbno + aglen.
+ */
+STATIC int
+find_right_extent(
+ struct xfs_btree_cur *cur,
+ struct xfs_refcount_irec *right,
+ struct xfs_refcount_irec *cright,
+ xfs_agblock_t agbno,
+ xfs_extlen_t aglen)
+{
+ struct xfs_refcount_irec tmp;
+ int error;
+ int found_rec;
+
+ right->rc_blockcount = cright->rc_blockcount = 0;
+ error = xfs_refcountbt_lookup_ge(cur, agbno + aglen, &found_rec);
+ if (error)
+ goto out_error;
+ if (!found_rec)
+ return 0;
+
+ error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+ if (tmp.rc_startblock != agbno + aglen)
+ return 0;
+ /* We have a right extent; retrieve (or invent) the next left one */
+ *right = tmp;
+
+ error = xfs_btree_decrement(cur, 0, &found_rec);
+ if (error)
+ goto out_error;
+ if (found_rec) {
+ error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+ out_error);
+
+ if (tmp.rc_startblock == agbno)
+ *cright = tmp;
+ else {
+ cright->rc_startblock = max(agbno,
+ RLNEXT(tmp));
+ cright->rc_blockcount = right->rc_startblock -
+ cright->rc_startblock;
+ cright->rc_refcount = 1;
+ }
+ } else {
+ cright->rc_startblock = agbno;
+ cright->rc_blockcount = aglen;
+ cright->rc_refcount = 1;
+ }
+ trace_xfs_refcount_find_right_extent(cur->bc_mp, cur->bc_private.a.agno,
+ cright, right, agbno + aglen);
+ return error;
+
+out_error:
+ trace_xfs_refcount_find_right_extent_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+#undef RLNEXT
+
+/*
+ * Try to merge with any extents on the boundaries of the adjustment range.
+ */
+STATIC int
+try_merge_rlextents(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t *agbno,
+ xfs_extlen_t *aglen,
+ int adjust)
+{
+ struct xfs_refcount_irec left, cleft, cright, right;
+ int error;
+ unsigned long long ulen;
+
+ left.rc_blockcount = cleft.rc_blockcount = 0;
+ cright.rc_blockcount = right.rc_blockcount = 0;
+
+ /*
+ * Find extents abutting the start and end of the range, and
+ * the adjacent extents inside the range.
+ */
+ error = find_left_extent(cur, &left, &cleft, *agbno, *aglen);
+ if (error)
+ return error;
+ error = find_right_extent(cur, &right, &cright, *agbno, *aglen);
+ if (error)
+ return error;
+
+ /* No left or right extent to merge; exit. */
+ if (left.rc_blockcount == 0 && right.rc_blockcount == 0)
+ return 0;
+
+ /* Try a center merge */
+ ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
+ right.rc_blockcount;
+ if (left.rc_blockcount != 0 && right.rc_blockcount != 0 &&
+ memcmp(&cleft, &cright, sizeof(cleft)) == 0 &&
+ left.rc_refcount == cleft.rc_refcount + adjust &&
+ right.rc_refcount == cleft.rc_refcount + adjust &&
+ ulen < MAXREFCEXTLEN) {
+ trace_xfs_refcount_merge_center_extents(cur->bc_mp,
+ cur->bc_private.a.agno, &left, &cleft, &right);
+ return merge_center(cur, &left, &cleft, ulen, agbno, aglen);
+ }
+
+ /* Try a left merge */
+ ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
+ if (left.rc_blockcount != 0 &&
+ left.rc_refcount == cleft.rc_refcount + adjust &&
+ ulen < MAXREFCEXTLEN) {
+ trace_xfs_refcount_merge_left_extent(cur->bc_mp,
+ cur->bc_private.a.agno, &left, &cleft);
+ return merge_left(cur, &left, &cleft, agbno, aglen);
+ }
+
+ /* Try a right merge */
+ ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
+ if (right.rc_blockcount != 0 &&
+ right.rc_refcount == cright.rc_refcount + adjust &&
+ ulen < MAXREFCEXTLEN) {
+ trace_xfs_refcount_merge_right_extent(cur->bc_mp,
+ cur->bc_private.a.agno, &cright, &right);
+ return merge_right(cur, &right, &cright, agbno, aglen);
+ }
+
+ return error;
+}
+
+/*
+ * Adjust the refcounts of middle extents. At this point we should have
+ * split extents that crossed the adjustment range; merged with adjacent
+ * extents; and updated agbno/aglen to reflect the merges. Therefore,
+ * all we have to do is update the extents inside [agbno, agbno + aglen).
+ */
+STATIC int
+adjust_rlextents(
+ struct xfs_btree_cur *cur,
+ xfs_agblock_t agbno,
+ xfs_extlen_t aglen,
+ int adj,
+ struct xfs_bmap_free *flist,
+ struct xfs_owner_info *oinfo)
+{
+ struct xfs_refcount_irec ext, tmp;
+ int error;
+ int found_rec, found_tmp;
+ xfs_fsblock_t fsbno;
+
+ error = xfs_refcountbt_lookup_ge(cur, agbno, &found_rec);
+ if (error)
+ goto out_error;
+
+ while (aglen > 0) {
+ error = xfs_refcountbt_get_rec(cur, &ext, &found_rec);
+ if (error)
+ goto out_error;
+ if (!found_rec) {
+ ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
+ ext.rc_blockcount = 0;
+ ext.rc_refcount = 0;
+ }
+
+ /*
+ * Deal with a hole in the refcount tree; if a file maps to
+ * these blocks and there's no refcountbt record, pretend that
+ * there is one with refcount == 1.
+ */
+ if (ext.rc_startblock != agbno) {
+ tmp.rc_startblock = agbno;
+ tmp.rc_blockcount = min(aglen,
+ ext.rc_startblock - agbno);
+ tmp.rc_refcount = 1 + adj;
+ trace_xfs_refcount_modify_extent(cur->bc_mp,
+ cur->bc_private.a.agno, &tmp);
+
+ /*
+ * Either cover the hole (increment) or
+ * delete the range (decrement).
+ */
+ if (tmp.rc_refcount) {
+ error = xfs_refcountbt_insert(cur, &tmp,
+ &found_tmp);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+ found_tmp == 1, out_error);
+ } else {
+ fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+ cur->bc_private.a.agno,
+ tmp.rc_startblock);
+ xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
+ tmp.rc_blockcount, oinfo);
+ }
+
+ agbno += tmp.rc_blockcount;
+ aglen -= tmp.rc_blockcount;
+
+ error = xfs_refcountbt_lookup_ge(cur, agbno,
+ &found_rec);
+ if (error)
+ goto out_error;
+ }
+
+ /* Stop if there's nothing left to modify */
+ if (aglen == 0)
+ break;
+
+ /*
+ * Adjust the reference count and either update the tree
+ * (incr) or free the blocks (decr).
+ */
+ ext.rc_refcount += adj;
+ trace_xfs_refcount_modify_extent(cur->bc_mp,
+ cur->bc_private.a.agno, &ext);
+ if (ext.rc_refcount > 1) {
+ error = xfs_refcountbt_update(cur, &ext);
+ if (error)
+ goto out_error;
+ } else if (ext.rc_refcount == 1) {
+ error = xfs_refcountbt_delete(cur, &found_rec);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+ found_rec == 1, out_error);
+ goto advloop;
+ } else {
+ fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+ cur->bc_private.a.agno,
+ ext.rc_startblock);
+ xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
+ ext.rc_blockcount, oinfo);
+ }
+
+ error = xfs_btree_increment(cur, 0, &found_rec);
+ if (error)
+ goto out_error;
+
+advloop:
+ agbno += ext.rc_blockcount;
+ aglen -= ext.rc_blockcount;
+ }
+
+ return error;
+out_error:
+ trace_xfs_refcount_modify_extent_error(cur->bc_mp,
+ cur->bc_private.a.agno, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Adjust the reference count of a range of AG blocks.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction object
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ * @agbno: Start of range to adjust
+ * @aglen: Length of range to adjust
+ * @adj: +1 to increment, -1 to decrement reference count
+ * @flist: freelist (only required if adj == -1)
+ * @oinfo: owner of the blocks (only required if adj == -1)
+ */
+STATIC int
+xfs_refcountbt_adjust_refcount(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t agbno,
+ xfs_extlen_t aglen,
+ int adj,
+ struct xfs_bmap_free *flist,
+ struct xfs_owner_info *oinfo)
+{
+ struct xfs_btree_cur *cur;
+ int error;
+
+ cur = xfs_refcountbt_init_cursor(mp, tp, agbp, agno, flist);
+
+ /*
+ * Ensure that no rlextents cross the boundary of the adjustment range.
+ */
+ error = try_split_left_rlextent(cur, agbno);
+ if (error)
+ goto out_error;
+
+ error = try_split_right_rlextent(cur, agbno + aglen);
+ if (error)
+ goto out_error;
+
+ /*
+ * Try to merge with the left or right extents of the range.
+ */
+ error = try_merge_rlextents(cur, &agbno, &aglen, adj);
+ if (error)
+ goto out_error;
+
+ /* Now that we've taken care of the ends, adjust the middle extents */
+ error = adjust_rlextents(cur, agbno, aglen, adj, flist, oinfo);
+ if (error)
+ goto out_error;
+
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ return 0;
+
+out_error:
+ trace_xfs_refcount_adjust_error(mp, agno, error, _RET_IP_);
+ xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+ return error;
+}
+
+/**
+ * Increase the reference count of a range of AG blocks.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction object
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ * @agbno: Start of range to adjust
+ * @aglen: Length of range to adjust
+ * @flist: List of blocks to free
+ */
+int
+xfs_refcount_increase(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t agbno,
+ xfs_extlen_t aglen,
+ struct xfs_bmap_free *flist)
+{
+ trace_xfs_refcount_increase(mp, agno, agbno, aglen);
+ return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
+ aglen, 1, flist, NULL);
+}
+
+/**
+ * Decrease the reference count of a range of AG blocks.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction object
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ * @agbno: Start of range to adjust
+ * @aglen: Length of range to adjust
+ * @flist: List of blocks to free
+ * @oinfo: Extent owner
+ */
+int
+xfs_refcount_decrease(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_buf *agbp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t agbno,
+ xfs_extlen_t aglen,
+ struct xfs_bmap_free *flist,
+ struct xfs_owner_info *oinfo)
+{
+ trace_xfs_refcount_decrease(mp, agno, agbno, aglen);
+ return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
+ aglen, -1, flist, oinfo);
+}
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 033a9b1..6640e3d 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -26,4 +26,12 @@ extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
struct xfs_refcount_irec *irec, int *stat);
+extern int xfs_refcount_increase(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
+ xfs_extlen_t aglen, struct xfs_bmap_free *flist);
+extern int xfs_refcount_decrease(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
+ xfs_extlen_t aglen, struct xfs_bmap_free *flist,
+ struct xfs_owner_info *oinfo);
+
#endif /* __XFS_REFCOUNT_H__ */
* [PATCH 41/58] libxfs: adjust refcount when unmapping file blocks
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (39 preceding siblings ...)
2015-10-07 4:59 ` [PATCH 40/58] libxfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 42/58] xfs: add refcount btree block detection to log recovery Darrick J. Wong
` (16 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
When we're unmapping blocks from a reflinked file, decrease the
refcount of the affected blocks and free the extents that are no
longer in use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
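Not part of the patch: xfs_refcount_put_extent() first turns the filesystem block number into an (AG number, AG block) pair before it reads the AGF and drops the refcounts. A minimal userspace sketch of that decomposition follows, assuming the usual shift-and-mask layout where the low `agblklog` bits hold the AG block number. The `ag_addr` struct and `fsb_to_ag()` are hypothetical names used only for illustration; the patch itself uses XFS_FSB_TO_AGNO() and XFS_FSB_TO_AGBNO().

```c
#include <stdint.h>

/* Hypothetical pair type: an AG number plus a block offset within that AG. */
struct ag_addr {
	uint32_t agno;	/* allocation group number */
	uint32_t agbno;	/* block number relative to the start of the AG */
};

/*
 * Split a filesystem block number into its AG number and AG block number.
 * Assumes agblklog (log2 of the AG size rounded up) is less than 32.
 */
static struct ag_addr fsb_to_ag(uint64_t fsbno, unsigned int agblklog)
{
	struct ag_addr a;

	a.agno = (uint32_t)(fsbno >> agblklog);
	a.agbno = (uint32_t)(fsbno & ((UINT64_C(1) << agblklog) - 1));
	return a;
}
```

With agblklog = 20, block ((3 << 20) | 42) lands in AG 3 at AG block 42. Note that the patch also narrows the 64-bit length to a 32-bit xfs_extlen_t; the sketch does not model that step.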
fs/xfs/libxfs/xfs_bmap.c | 16 +++++++++++++---
fs/xfs/libxfs/xfs_refcount.c | 42 ++++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_refcount.h | 4 ++++
3 files changed, 59 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 81f0ae0..f6812ba 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -46,6 +46,7 @@
#include "xfs_attr_leaf.h"
#include "xfs_filestream.h"
#include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
kmem_zone_t *xfs_bmap_free_item_zone;
@@ -5182,9 +5183,18 @@ xfs_bmap_del_extent(
/*
* If we need to, add to list of extents to delete.
*/
- if (do_fx)
- xfs_bmap_add_free(mp, flist, del->br_startblock,
- del->br_blockcount, NULL);
+ if (do_fx) {
+ if (xfs_is_reflink_inode(ip)) {
+ error = xfs_refcount_put_extent(mp, tp, flist,
+ del->br_startblock,
+ del->br_blockcount, NULL);
+ if (error)
+ goto done;
+ } else
+ xfs_bmap_add_free(mp, flist, del->br_startblock,
+ del->br_blockcount, NULL);
+ }
+
/*
* Adjust inode # blocks in the file.
*/
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 02892bc..bd611d3 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -938,3 +938,45 @@ xfs_refcount_decrease(
return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
aglen, -1, flist, oinfo);
}
+
+/**
+ * xfs_refcount_put_extent() - release a range of blocks
+ *
+ * @mp: XFS mount object
+ * @tp: transaction that goes with the free operation
+ * @flist: List of blocks to be freed at the end of the transaction
+ * @fsbno: First fs block of the range to release
+ * @fslen: Length of range
+ * @oinfo: owner of the extent
+ */
+int
+xfs_refcount_put_extent(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_bmap_free *flist,
+ xfs_fsblock_t fsbno,
+ xfs_filblks_t fslen,
+ struct xfs_owner_info *oinfo)
+{
+ int error;
+ struct xfs_buf *agbp;
+ xfs_agnumber_t agno; /* allocation group number */
+ xfs_agblock_t agbno; /* ag start of range to free */
+ xfs_extlen_t aglen; /* ag length of range to free */
+
+ agno = XFS_FSB_TO_AGNO(mp, fsbno);
+ agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+ aglen = fslen;
+
+ /*
+ * Drop reference counts in the refcount tree.
+ */
+ error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+ if (error)
+ return error;
+
+ error = xfs_refcount_decrease(mp, tp, agbp, agno, agbno, aglen, flist,
+ oinfo);
+ xfs_trans_brelse(tp, agbp);
+ return error;
+}
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 6640e3d..074d620 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -34,4 +34,8 @@ extern int xfs_refcount_decrease(struct xfs_mount *mp, struct xfs_trans *tp,
xfs_extlen_t aglen, struct xfs_bmap_free *flist,
struct xfs_owner_info *oinfo);
+extern int xfs_refcount_put_extent(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct xfs_bmap_free *flist, xfs_fsblock_t fsbno,
+ xfs_filblks_t len, struct xfs_owner_info *oinfo);
+
#endif /* __XFS_REFCOUNT_H__ */
* [PATCH 42/58] xfs: add refcount btree block detection to log recovery
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (40 preceding siblings ...)
2015-10-07 4:59 ` [PATCH 41/58] libxfs: adjust refcount when unmapping file blocks Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 43/58] xfs: map an inode's offset to an exact physical block Darrick J. Wong
` (15 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Teach log recovery how to deal with refcount btree blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_log_recover.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 0f5fb8f..9780702 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1848,6 +1848,7 @@ xlog_recover_get_buf_lsn(
case XFS_ABTB_MAGIC:
case XFS_ABTC_MAGIC:
case XFS_RMAP_CRC_MAGIC:
+ case XFS_REFC_CRC_MAGIC:
case XFS_IBT_CRC_MAGIC:
case XFS_IBT_MAGIC: {
struct xfs_btree_block *btb = blk;
@@ -2019,6 +2020,9 @@ xlog_recover_validate_buf_type(
case XFS_RMAP_CRC_MAGIC:
bp->b_ops = &xfs_rmapbt_buf_ops;
break;
+ case XFS_REFC_CRC_MAGIC:
+ bp->b_ops = &xfs_refcountbt_buf_ops;
+ break;
default:
xfs_warn(mp, "Bad btree block magic!");
ASSERT(0);
* [PATCH 43/58] xfs: map an inode's offset to an exact physical block
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (41 preceding siblings ...)
2015-10-07 4:59 ` [PATCH 42/58] xfs: add refcount btree block detection to log recovery Darrick J. Wong
@ 2015-10-07 4:59 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 44/58] xfs: add reflink feature flag to geometry Darrick J. Wong
` (14 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Teach the bmap routine to map a range of file blocks to a
specific range of physical blocks, instead of simply allocating fresh
blocks. This enables reflink to map a file to blocks that are already
in use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
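Not part of the patch: before "allocating" the caller-supplied block, xfs_bmap_remap() checks that the block number decodes to a valid AG and a valid block within that AG. A minimal userspace sketch of that check follows. The `geom` struct and `remap_target_ok()` are hypothetical names for illustration, and the shift-and-mask decoding is an assumption about how XFS_FSB_TO_AGNO() and XFS_FSB_TO_AGBNO() behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock geometry needed for the check. */
struct geom {
	uint32_t agcount;	/* number of allocation groups */
	uint32_t agblocks;	/* blocks per allocation group */
	unsigned int agblklog;	/* log2 of agblocks, rounded up (< 32) */
};

/*
 * Return true if fsbno names a block inside an existing AG.  A false
 * result corresponds to the -EFSCORRUPTED returns in the patch.
 */
static bool remap_target_ok(const struct geom *g, uint64_t fsbno)
{
	uint64_t agno = fsbno >> g->agblklog;
	uint64_t agbno = fsbno & ((UINT64_C(1) << g->agblklog) - 1);

	return agno < g->agcount && agbno < g->agblocks;
}
```

Because agblocks need not be a power of two, the second comparison matters: a block number can decode to a valid AG and still point past the end of it.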
fs/xfs/libxfs/xfs_bmap.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_bmap.h | 9 ++++++
2 files changed, 72 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f6812ba..5f457b4 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4003,6 +4003,56 @@ xfs_bmap_btalloc(
}
/*
+ * For a remap operation, just "allocate" an extent at the address that the
+ * caller passed in, and ensure that the AGFL is the right size. The caller
+ * will then map the "allocated" extent into the file somewhere.
+ */
+static int
+xfs_bmap_remap(
+ struct xfs_bmalloca *ap)
+{
+ struct xfs_trans *tp = ap->tp;
+ struct xfs_mount *mp = tp->t_mountp;
+ xfs_agblock_t bno;
+ struct xfs_alloc_arg args;
+ int error;
+
+ /*
+ * Validate that the block number is legal; this enables us to detect
+ * and handle a silent filesystem corruption rather than crashing.
+ */
+ memset(&args, 0, sizeof(struct xfs_alloc_arg));
+ args.tp = ap->tp;
+ args.mp = ap->tp->t_mountp;
+ bno = *ap->firstblock;
+ args.agno = XFS_FSB_TO_AGNO(mp, bno);
+ if (args.agno >= mp->m_sb.sb_agcount)
+ return -EFSCORRUPTED;
+
+ args.agbno = XFS_FSB_TO_AGBNO(mp, bno);
+ if (args.agbno >= mp->m_sb.sb_agblocks)
+ return -EFSCORRUPTED;
+
+ /* "Allocate" the extent from the range we passed in. */
+ trace_xfs_reflink_relink_blocks(ap->ip, *ap->firstblock,
+ ap->length);
+ ap->blkno = bno;
+ ap->ip->i_d.di_nblocks += ap->length;
+ xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
+
+ /* Fix the freelist, like a real allocator does. */
+
+ args.pag = xfs_perag_get(args.mp, args.agno);
+ ASSERT(args.pag);
+
+ error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING);
+ if (error)
+ goto error0;
+error0:
+ xfs_perag_put(args.pag);
+ return error;
+}
+/*
* xfs_bmap_alloc is called by xfs_bmapi to allocate an extent for a file.
* It figures out where to ask the underlying allocator to put the new extent.
*/
@@ -4010,6 +4060,8 @@ STATIC int
xfs_bmap_alloc(
struct xfs_bmalloca *ap) /* bmap alloc argument struct */
{
+ if (ap->flags & XFS_BMAPI_REMAP)
+ return xfs_bmap_remap(ap);
if (XFS_IS_REALTIME_INODE(ap->ip) && ap->userdata)
return xfs_bmap_rtalloc(ap);
return xfs_bmap_btalloc(ap);
@@ -4694,6 +4746,12 @@ xfs_bmapi_write(
ASSERT(len > 0);
ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+ if (whichfork == XFS_ATTR_FORK)
+ ASSERT(!(flags & XFS_BMAPI_REMAP));
+ if (flags & XFS_BMAPI_REMAP) {
+ ASSERT(!(flags & XFS_BMAPI_PREALLOC));
+ ASSERT(!(flags & XFS_BMAPI_CONVERT));
+ }
if (unlikely(XFS_TEST_ERROR(
(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
@@ -4743,6 +4801,12 @@ xfs_bmapi_write(
wasdelay = !inhole && isnullstartblock(bma.got.br_startblock);
/*
+ * Make sure we only reflink into a hole.
+ */
+ if (flags & XFS_BMAPI_REMAP)
+ ASSERT(inhole);
+
+ /*
* First, deal with the hole before the allocated space
* that we found, if any.
*/
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 59f26cf..78a56b69 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -110,6 +110,12 @@ typedef struct xfs_bmap_free
* from written to unwritten, otherwise convert from unwritten to written.
*/
#define XFS_BMAPI_CONVERT 0x040
+/*
+ * Map the inode offset to the block given in ap->firstblock. Primarily
+ * used for reflink. The range must be in a hole, and this flag cannot be
+ * turned on with PREALLOC or CONVERT, and cannot be used on the attr fork.
+ */
+#define XFS_BMAPI_REMAP 0x100
#define XFS_BMAPI_FLAGS \
{ XFS_BMAPI_ENTIRE, "ENTIRE" }, \
@@ -118,7 +124,8 @@ typedef struct xfs_bmap_free
{ XFS_BMAPI_PREALLOC, "PREALLOC" }, \
{ XFS_BMAPI_IGSTATE, "IGSTATE" }, \
{ XFS_BMAPI_CONTIG, "CONTIG" }, \
- { XFS_BMAPI_CONVERT, "CONVERT" }
+ { XFS_BMAPI_CONVERT, "CONVERT" }, \
+ { XFS_BMAPI_REMAP, "REMAP" }
static inline int xfs_bmapi_aflag(int w)
* [PATCH 44/58] xfs: add reflink feature flag to geometry
From: Darrick J. Wong @ 2015-10-07 4:59 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Report the reflink feature in the XFS geometry so that xfs_info and
friends know the filesystem has this feature.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
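For consumers of the geometry flags, a check might look like the sketch below. The two flag values are copied from the xfs_fs.h hunk in this patch; the helper names are hypothetical. In real code the flags word would presumably come from the geometry ioctl rather than a literal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the xfs_fs.h hunk in this patch. */
#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	 /* reverse mapping btree */
#define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */

/* Hypothetical helper: does the geometry flags word advertise reflink? */
static bool geom_has_reflink(uint32_t flags)
{
	assert((XFS_FSOP_GEOM_FLAGS_REFLINK & XFS_FSOP_GEOM_FLAGS_RMAPBT) == 0);
	return (flags & XFS_FSOP_GEOM_FLAGS_REFLINK) != 0;
}

/* Hypothetical helper: does the geometry flags word advertise the rmapbt? */
static bool geom_has_rmapbt(uint32_t flags)
{
	return (flags & XFS_FSOP_GEOM_FLAGS_RMAPBT) != 0;
}
```

The two features are reported independently, so a tool should test each bit on its own rather than infer one from the other.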
fs/xfs/libxfs/xfs_fs.h | 3 ++-
fs/xfs/xfs_fsops.c | 4 +++-
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 9fbdb86..b6ee5d8 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -240,7 +240,8 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_FTYPE 0x10000 /* inode directory types */
#define XFS_FSOP_GEOM_FLAGS_FINOBT 0x20000 /* free inode btree */
#define XFS_FSOP_GEOM_FLAGS_SPINODES 0x40000 /* sparse inode chunks */
-#define XFS_FSOP_GEOM_FLAGS_RMAPBT 0x80000 /* Reverse mapping btree */
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT 0x80000 /* reverse mapping btree */
+#define XFS_FSOP_GEOM_FLAGS_REFLINK 0x100000 /* files can share blocks */
/*
* Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 7c37c22..920db9d 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -106,7 +106,9 @@ xfs_fs_geometry(
(xfs_sb_version_hassparseinodes(&mp->m_sb) ?
XFS_FSOP_GEOM_FLAGS_SPINODES : 0) |
(xfs_sb_version_hasrmapbt(&mp->m_sb) ?
- XFS_FSOP_GEOM_FLAGS_RMAPBT : 0);
+ XFS_FSOP_GEOM_FLAGS_RMAPBT : 0) |
+ (xfs_sb_version_hasreflink(&mp->m_sb) ?
+ XFS_FSOP_GEOM_FLAGS_REFLINK : 0);
geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
mp->m_sb.sb_logsectsize : BBSIZE;
geo->rtsectsize = mp->m_sb.sb_blocksize;
* [PATCH 45/58] xfs: create a separate workqueue for copy-on-write activities
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Create a separate workqueue to handle copy-on-write blocks so that we
don't explode the number of kworkers if a flood of writes comes
through. We could possibly use m_buf_workqueue for this too...
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_super.c | 36 +++++++++++++++++++++++++++++++++++-
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index aba42d7..6f4c335 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -142,6 +142,7 @@ typedef struct xfs_mount {
struct workqueue_struct *m_reclaim_workqueue;
struct workqueue_struct *m_log_workqueue;
struct workqueue_struct *m_eofblocks_workqueue;
+ struct workqueue_struct *m_cow_workqueue;
/*
* Generation of the filesysyem layout. This is incremented by each
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 6dc5993..c6a26b0 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -883,6 +883,33 @@ xfs_destroy_mount_workqueues(
destroy_workqueue(mp->m_buf_workqueue);
}
+STATIC int
+xfs_init_feature_workqueues(
+ struct xfs_mount *mp)
+{
+ if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+ /* XXX awful, but it'll improve when bh's go away */
+ /* XXX probably need locking around the endio rbtree */
+ mp->m_cow_workqueue = alloc_workqueue("xfs-cow/%s",
+ WQ_MEM_RECLAIM|WQ_FREEZABLE|WQ_UNBOUND,
+ 1, mp->m_fsname);
+ if (!mp->m_cow_workqueue)
+ goto out;
+ }
+
+ return 0;
+out:
+ return -ENOMEM;
+}
+
+STATIC void
+xfs_destroy_feature_workqueues(
+ struct xfs_mount *mp)
+{
+ if (mp->m_cow_workqueue)
+ destroy_workqueue(mp->m_cow_workqueue);
+}
+
/*
* Flush all dirty data to disk. Must not be called while holding an XFS_ILOCK
* or a page lock. We use sync_inodes_sb() here to ensure we block while waiting
@@ -1490,6 +1517,10 @@ xfs_fs_fill_super(
if (error)
goto out_free_sb;
+ error = xfs_init_feature_workqueues(mp);
+ if (error)
+ goto out_filestream_unmount;
+
/*
* we must configure the block size in the superblock before we run the
* full mount process as the mount process can lookup and cache inodes.
@@ -1526,7 +1557,7 @@ xfs_fs_fill_super(
error = xfs_mountfs(mp);
if (error)
- goto out_filestream_unmount;
+ goto out_destroy_feature_workqueues;
root = igrab(VFS_I(mp->m_rootip));
if (!root) {
@@ -1541,6 +1572,8 @@ xfs_fs_fill_super(
return 0;
+out_destroy_feature_workqueues:
+ xfs_destroy_feature_workqueues(mp);
out_filestream_unmount:
xfs_filestream_unmount(mp);
out_free_sb:
@@ -1570,6 +1603,7 @@ xfs_fs_put_super(
struct xfs_mount *mp = XFS_M(sb);
xfs_notice(mp, "Unmounting Filesystem");
+ xfs_destroy_feature_workqueues(mp);
xfs_filestream_unmount(mp);
xfs_unmountfs(mp);
* [PATCH 46/58] xfs: implement copy-on-write for reflinked blocks
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Implement a copy-on-write handler for the buffered write path. When
writepages is called, allocate a new block (which we then tell the log
that we intend to delete so that it's freed if we crash), and then
write the buffer to the new block. Upon completion, remove the freed
block intent from the log and remap the file so that the changes
appear.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
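The life cycle of a forked block can be modelled in userspace roughly as follows. This is a toy sketch, not the patch's implementation: every name is hypothetical, each file has a single block, the log intent is reduced to a "pending" field, and a plain refcount array stands in for the refcount btree. In the patch, reservation happens in write_begin and page_mkwrite, the redirect in writepage, and the remap or cancel at ioend completion.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_NBLOCKS	8
#define TOY_NONE	UINT32_MAX

struct toy_fs {
	uint32_t refcount[TOY_NBLOCKS];	/* owners per block; 0 means free */
	char	 data[TOY_NBLOCKS];	/* one byte of "contents" per block */
};

struct toy_file {
	uint32_t map;		/* physical block backing the file */
	uint32_t pending;	/* reserved replacement block, or TOY_NONE */
};

/* Reserve a replacement block if the current one is shared. */
static int toy_reserve(struct toy_fs *fs, struct toy_file *f)
{
	uint32_t i;

	assert(f->map < TOY_NBLOCKS);
	if (fs->refcount[f->map] <= 1 || f->pending != TOY_NONE)
		return 0;	/* not shared, or already reserved */
	for (i = 0; i < TOY_NBLOCKS; i++) {
		if (fs->refcount[i] == 0) {
			fs->refcount[i] = 1;
			f->pending = i;	/* remembered so it can be freed */
			return 0;
		}
	}
	return -ENOSPC;
}

/* Redirect the write to the reserved block, if there is one. */
static uint32_t toy_write(struct toy_fs *fs, struct toy_file *f, char byte)
{
	uint32_t target = (f->pending != TOY_NONE) ? f->pending : f->map;

	fs->data[target] = byte;
	return target;
}

/* I/O succeeded: drop the old block and point the file at the new one. */
static void toy_complete(struct toy_fs *fs, struct toy_file *f)
{
	if (f->pending == TOY_NONE)
		return;
	fs->refcount[f->map]--;
	f->map = f->pending;
	f->pending = TOY_NONE;
}

/* I/O failed or was abandoned: give the reserved block back. */
static void toy_cancel(struct toy_fs *fs, struct toy_file *f)
{
	if (f->pending == TOY_NONE)
		return;
	fs->refcount[f->pending] = 0;
	f->pending = TOY_NONE;
}
```

The file mapping changes only in toy_complete(), so a failure before that point leaves the shared block and the other owner's view untouched.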
fs/xfs/Makefile | 1
fs/xfs/xfs_aops.c | 52 +++
fs/xfs/xfs_aops.h | 5
fs/xfs/xfs_file.c | 11 +
fs/xfs/xfs_icache.c | 3
fs/xfs/xfs_inode.h | 2
fs/xfs/xfs_reflink.c | 756 ++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.h | 41 ++
fs/xfs/xfs_trans.h | 3
fs/xfs/xfs_trans_extfree.c | 27 ++
10 files changed, 894 insertions(+), 7 deletions(-)
create mode 100644 fs/xfs/xfs_reflink.c
create mode 100644 fs/xfs/xfs_reflink.h
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 3565db6..0b7fa41 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -88,6 +88,7 @@ xfs-y += xfs_aops.o \
xfs_message.o \
xfs_mount.o \
xfs_mru_cache.o \
+ xfs_reflink.o \
xfs_super.o \
xfs_symlink.o \
xfs_sysfs.o \
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 50ab287..06a1d2f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -31,6 +31,8 @@
#include "xfs_bmap.h"
#include "xfs_bmap_util.h"
#include "xfs_bmap_btree.h"
+#include "xfs_reflink.h"
+#include <linux/aio.h>
#include <linux/gfp.h>
#include <linux/mpage.h>
#include <linux/pagevec.h>
@@ -190,6 +192,8 @@ xfs_finish_ioend(
if (ioend->io_type == XFS_IO_UNWRITTEN)
queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
+ else if (ioend->io_type == XFS_IO_FORKED)
+ queue_work(mp->m_cow_workqueue, &ioend->io_work);
else if (ioend->io_append_trans)
queue_work(mp->m_data_workqueue, &ioend->io_work);
else
@@ -212,6 +216,25 @@ xfs_end_io(
ioend->io_error = -EIO;
goto done;
}
+
+ /*
+ * If we forked the block, we need to remap the bmbt and possibly
+ * finish up the i_size transaction too... or clean up after a
+ * failed write.
+ */
+ if (ioend->io_type == XFS_IO_FORKED) {
+ if (ioend->io_error) {
+ error = xfs_reflink_cancel_fork_ioend(ioend);
+ goto done;
+ }
+ error = xfs_reflink_fork_ioend(ioend);
+ if (error)
+ goto done;
+ if (ioend->io_append_trans)
+ error = xfs_setfilesize_ioend(ioend);
+ goto done;
+ }
+
if (ioend->io_error)
goto done;
@@ -266,6 +289,7 @@ xfs_alloc_ioend(
ioend->io_append_trans = NULL;
INIT_WORK(&ioend->io_work, xfs_end_io);
+ INIT_LIST_HEAD(&ioend->io_reflink_endio_list);
return ioend;
}
@@ -547,6 +571,7 @@ xfs_cancel_ioend(
} while ((bh = next_bh) != NULL);
+ xfs_reflink_cancel_fork_ioend(ioend);
mempool_free(ioend, xfs_ioend_pool);
} while ((ioend = next) != NULL);
}
@@ -563,7 +588,8 @@ xfs_add_to_ioend(
xfs_off_t offset,
unsigned int type,
xfs_ioend_t **result,
- int need_ioend)
+ int need_ioend,
+ struct xfs_reflink_ioend *eio)
{
xfs_ioend_t *ioend = *result;
@@ -584,6 +610,8 @@ xfs_add_to_ioend(
bh->b_private = NULL;
ioend->io_size += bh->b_size;
+ if (eio)
+ xfs_reflink_add_ioend(ioend, eio);
}
STATIC void
@@ -784,7 +812,7 @@ xfs_convert_page(
if (type != XFS_IO_OVERWRITE)
xfs_map_at_offset(inode, bh, imap, offset);
xfs_add_to_ioend(inode, bh, offset, type,
- ioendp, done);
+ ioendp, done, NULL);
page_dirty--;
count++;
@@ -947,6 +975,8 @@ xfs_vm_writepage(
int err, imap_valid = 0, uptodate = 1;
int count = 0;
int nonblocking = 0;
+ struct xfs_inode *ip = XFS_I(inode);
+ int err2 = 0;
trace_xfs_writepage(inode, page, 0, 0);
@@ -1115,11 +1145,15 @@ xfs_vm_writepage(
imap_valid = xfs_imap_valid(inode, &imap, offset);
}
if (imap_valid) {
+ struct xfs_reflink_ioend *eio = NULL;
+
+ err2 = xfs_reflink_write_fork_block(ip, &imap, offset,
+ &type, &eio);
lock_buffer(bh);
if (type != XFS_IO_OVERWRITE)
xfs_map_at_offset(inode, bh, &imap, offset);
xfs_add_to_ioend(inode, bh, offset, type, &ioend,
- new_ioend);
+ new_ioend, eio);
count++;
}
@@ -1133,6 +1167,9 @@ xfs_vm_writepage(
xfs_start_page_writeback(page, 1, count);
+ if (err)
+ goto error;
+
/* if there is no IO to be submitted for this page, we are done */
if (!ioend)
return 0;
@@ -1167,8 +1204,9 @@ xfs_vm_writepage(
/*
* Reserve log space if we might write beyond the on-disk inode size.
*/
- err = 0;
- if (ioend->io_type != XFS_IO_UNWRITTEN && xfs_ioend_is_append(ioend))
+ err = err2;
+ if (!err && ioend->io_type != XFS_IO_UNWRITTEN &&
+ xfs_ioend_is_append(ioend))
err = xfs_setfilesize_trans_alloc(ioend);
xfs_submit_ioend(wbc, iohead, err);
@@ -1818,6 +1856,10 @@ xfs_vm_write_begin(
if (!page)
return -ENOMEM;
+ status = xfs_reflink_reserve_fork_block(XFS_I(mapping->host), pos, len);
+ if (status)
+ return status;
+
status = __block_write_begin(page, pos, len, xfs_get_blocks);
if (unlikely(status)) {
struct inode *inode = mapping->host;
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 86afd1a..9cf206a 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -27,12 +27,14 @@ enum {
XFS_IO_DELALLOC, /* covers delalloc region */
XFS_IO_UNWRITTEN, /* covers allocated but uninitialized data */
XFS_IO_OVERWRITE, /* covers already allocated extent */
+ XFS_IO_FORKED, /* covers copy-on-write region */
};
#define XFS_IO_TYPES \
{ XFS_IO_DELALLOC, "delalloc" }, \
{ XFS_IO_UNWRITTEN, "unwritten" }, \
- { XFS_IO_OVERWRITE, "overwrite" }
+ { XFS_IO_OVERWRITE, "overwrite" }, \
+ { XFS_IO_FORKED, "forked" }
/*
* xfs_ioend struct manages large extent writes for XFS.
@@ -50,6 +52,7 @@ typedef struct xfs_ioend {
xfs_off_t io_offset; /* offset in the file */
struct work_struct io_work; /* xfsdatad work queue */
struct xfs_trans *io_append_trans;/* xact. for size update */
+ struct list_head io_reflink_endio_list;/* remappings for CoW */
} xfs_ioend_t;
extern const struct address_space_operations xfs_address_space_operations;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e78feb4..593223f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -37,6 +37,7 @@
#include "xfs_log.h"
#include "xfs_icache.h"
#include "xfs_pnfs.h"
+#include "xfs_reflink.h"
#include <linux/dcache.h>
#include <linux/falloc.h>
@@ -1502,6 +1503,14 @@ xfs_filemap_page_mkwrite(
file_update_time(vma->vm_file);
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
+ /* Set up the remapping for a CoW mmap'd page */
+ ret = xfs_reflink_reserve_fork_block(XFS_I(inode),
+ vmf->page->index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE);
+ if (ret) {
+ ret = block_page_mkwrite_return(ret);
+ goto out;
+ }
+
if (IS_DAX(inode)) {
ret = __dax_mkwrite(vma, vmf, xfs_get_blocks_direct,
xfs_end_io_dax_write);
@@ -1509,7 +1518,7 @@ xfs_filemap_page_mkwrite(
ret = __block_page_mkwrite(vma, vmf, xfs_get_blocks);
ret = block_page_mkwrite_return(ret);
}
-
+out:
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
sb_end_pagefault(inode->i_sb);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0a326bd..c409576 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -33,6 +33,7 @@
#include "xfs_bmap_util.h"
#include "xfs_dquot_item.h"
#include "xfs_dquot.h"
+#include "xfs_reflink.h"
#include <linux/kthread.h>
#include <linux/freezer.h>
@@ -80,6 +81,7 @@ xfs_inode_alloc(
ip->i_flags = 0;
ip->i_delayed_blks = 0;
memset(&ip->i_d, 0, sizeof(xfs_icdinode_t));
+ ip->i_remaps = RB_ROOT;
return ip;
}
@@ -115,6 +117,7 @@ xfs_inode_free(
ip->i_itemp = NULL;
}
+ xfs_reflink_cancel_fork_blocks(ip);
/*
* Because we use RCU freeing we need to ensure the inode always
* appears to be reclaimed with an invalid inode number when in the
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 6436a96..f4cf967 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -65,6 +65,8 @@ typedef struct xfs_inode {
xfs_icdinode_t i_d; /* most of ondisk inode */
+ struct rb_root i_remaps; /* CoW remappings in progress */
+
/* VFS inode */
struct inode i_vnode; /* embedded VFS inode */
} xfs_inode_t;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
new file mode 100644
index 0000000..1e00be2
--- /dev/null
+++ b/fs/xfs/xfs_reflink.c
@@ -0,0 +1,756 @@
+/*
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_error.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_ioctl.h"
+#include "xfs_trace.h"
+#include "xfs_log.h"
+#include "xfs_icache.h"
+#include "xfs_pnfs.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_bit.h"
+#include "xfs_alloc.h"
+#include "xfs_quota_defs.h"
+#include "xfs_quota.h"
+#include "xfs_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_reflink.h"
+
+#define CHECK_AG_NUMBER(mp, agno) \
+ do { \
+ ASSERT((agno) != NULLAGNUMBER); \
+ ASSERT((agno) < (mp)->m_sb.sb_agcount); \
+ } while (0)
+
+#define CHECK_AG_EXTENT(mp, agbno, len) \
+ do { \
+ ASSERT((agbno) != NULLAGBLOCK); \
+ ASSERT((len) > 0); \
+ ASSERT((unsigned long long)(agbno) + (len) <= \
+ (mp)->m_sb.sb_agblocks); \
+ } while (0)
+
+struct xfs_reflink_ioend {
+ struct rb_node rlei_node; /* tree of pending remappings */
+ struct list_head rlei_list; /* list of reflink ioends */
+ struct xfs_bmbt_irec rlei_mapping; /* new bmbt mapping to put in */
+ struct xfs_efi_log_item *rlei_efi; /* efi log item to cancel */
+ xfs_fsblock_t rlei_oldfsbno; /* old fsbno */
+};
+
+/**
+ * xfs_reflink_get_refcount() - get refcount and extent length for a given pblk
+ *
+ * @mp: XFS mount object
+ * @agno: AG number
+ * @agbno: AG block number
+ * @len: length of extent
+ * @nr: refcount
+ */
+int
+xfs_reflink_get_refcount(
+ struct xfs_mount *mp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t agbno,
+ xfs_extlen_t *len,
+ xfs_nlink_t *nr)
+{
+ struct xfs_btree_cur *cur;
+ struct xfs_buf *agbp;
+ struct xfs_refcount_irec tmp;
+ xfs_extlen_t aglen;
+ int error;
+ int i, have;
+ int bt_error;
+
+ if (!xfs_sb_version_hasreflink(&mp->m_sb)) {
+ *len = 0;
+ *nr = 1;
+ return 0;
+ }
+
+ error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+ if (error)
+ return error;
+ aglen = be32_to_cpu(XFS_BUF_TO_AGF(agbp)->agf_length);
+ ASSERT(agbno < aglen);
+
+ /*
+ * See if there's an extent covering the block we want.
+ */
+ bt_error = XFS_BTREE_ERROR;
+ cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+ error = xfs_refcountbt_lookup_le(cur, agbno, &have);
+ if (error)
+ goto out_error;
+ if (!have)
+ goto hole;
+ error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ if (tmp.rc_startblock + tmp.rc_blockcount <= agbno)
+ goto hole;
+
+ *len = tmp.rc_blockcount - (agbno - tmp.rc_startblock);
+ *nr = tmp.rc_refcount;
+ goto out;
+
+hole:
+ /*
+ * We're in a hole, so pretend that we have a refcount=1 extent
+ * going to the next rlextent or the end of the AG.
+ */
+ error = xfs_btree_increment(cur, 0, &have);
+ if (error)
+ goto out_error;
+ if (!have)
+ *len = aglen - agbno;
+ else {
+ error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+ if (error)
+ goto out_error;
+ XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+ *len = tmp.rc_startblock - agbno;
+ }
+ *nr = 1;
+
+out:
+ bt_error = XFS_BTREE_NOERROR;
+out_error:
+ xfs_btree_del_cursor(cur, bt_error);
+ xfs_buf_relse(agbp);
+ return error;
+}
+
+/*
+ * Allocate a replacement block for a copy-on-write operation.
+ *
+ * XXX: Ideally we'd scan up and down the incore extent list
+ * looking for a block, but do this stupid thing for now.
+ */
+STATIC int
+fork_one_block(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_inode *ip,
+ xfs_fsblock_t old,
+ xfs_fsblock_t *new,
+ xfs_fileoff_t offset)
+{
+ int error;
+ struct xfs_alloc_arg args; /* allocation arguments */
+
+ memset(&args, 0, sizeof(args));
+ args.tp = tp;
+ args.mp = mp;
+ args.type = XFS_ALLOCTYPE_NEAR_BNO;
+ args.firstblock = args.fsbno = old;
+ args.minlen = args.maxlen = args.prod = 1;
+ args.userdata = XFS_ALLOC_USERDATA;
+ error = xfs_alloc_vextent(&args);
+ if (error)
+ goto out_error;
+ ASSERT(args.len == 1);
+ ASSERT(args.fsbno != old);
+ *new = args.fsbno;
+
+out_error:
+ return error;
+}
+
+/* Compare two reflink ioend structures */
+STATIC int
+ioend_compare(
+ struct xfs_reflink_ioend *i1,
+ struct xfs_reflink_ioend *i2)
+{
+ if (i1->rlei_mapping.br_startoff > i2->rlei_mapping.br_startoff)
+ return 1;
+ if (i1->rlei_mapping.br_startoff < i2->rlei_mapping.br_startoff)
+ return -1;
+ return 0;
+}
+
+/* Attach a remapping object to an inode. */
+STATIC int
+remap_insert(
+ struct xfs_inode *ip,
+ struct xfs_reflink_ioend *eio)
+{
+ struct rb_node **new = &(ip->i_remaps.rb_node);
+ struct rb_node *parent = NULL;
+ struct xfs_reflink_ioend *this;
+ int result;
+
+ /* Figure out where to put new node */
+ while (*new) {
+ this = rb_entry(*new, struct xfs_reflink_ioend, rlei_node);
+ result = ioend_compare(eio, this);
+
+ parent = *new;
+ if (result < 0)
+ new = &((*new)->rb_left);
+ else if (result > 0)
+ new = &((*new)->rb_right);
+ else
+ return -EEXIST;
+ }
+
+ /* Add new node and rebalance tree. */
+ rb_link_node(&eio->rlei_node, parent, new);
+ rb_insert_color(&eio->rlei_node, &ip->i_remaps);
+
+ return 0;
+}
+
+/* Find a remapping object for a block in an inode */
+STATIC int
+remap_search(
+ struct xfs_inode *ip,
+ xfs_fileoff_t fsbno,
+ struct xfs_reflink_ioend **peio)
+{
+ struct rb_node *node = ip->i_remaps.rb_node;
+ struct xfs_reflink_ioend *data;
+ int result;
+ struct xfs_reflink_ioend f;
+
+ f.rlei_mapping.br_startoff = fsbno;
+ while (node) {
+ data = rb_entry(node, struct xfs_reflink_ioend, rlei_node);
+ result = ioend_compare(&f, data);
+
+ if (result < 0)
+ node = node->rb_left;
+ else if (result > 0)
+ node = node->rb_right;
+ else {
+ *peio = data;
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+/* Allocate a block to handle a copy on write later. */
+STATIC int
+__reserve_fork_block(
+ struct xfs_inode *ip,
+ struct xfs_bmbt_irec *imap,
+ xfs_off_t offset)
+{
+ xfs_fsblock_t fsbno;
+ xfs_fsblock_t new_fsbno;
+ xfs_off_t iomap_offset;
+ xfs_agnumber_t agno; /* allocation group number */
+ xfs_agblock_t agbno; /* ag start of range to free */
+ struct xfs_trans *tp = NULL;
+ int error;
+ struct xfs_reflink_ioend *eio;
+ struct xfs_mount *mp = ip->i_mount;
+
+ ASSERT(xfs_is_reflink_inode(ip));
+ iomap_offset = XFS_FSB_TO_B(mp, imap->br_startoff);
+ fsbno = imap->br_startblock + XFS_B_TO_FSB(mp, offset - iomap_offset);
+ agno = XFS_FSB_TO_AGNO(mp, fsbno);
+ agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+ CHECK_AG_NUMBER(mp, agno);
+ CHECK_AG_EXTENT(mp, agbno, 1);
+ ASSERT(imap->br_state == XFS_EXT_NORM);
+
+ /* If we've already got a remapping, we're done. */
+ error = remap_search(ip, XFS_B_TO_FSB(mp, offset), &eio);
+ if (!error)
+ return 0;
+
+ /*
+ * Ok, we have to fork this block. Allocate a replacement block,
+ * stash the new mapping, and add an EFI entry for recovery. When
+ * the (redirected) IO completes, we'll deal with remapping.
+ */
+ tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
+ XFS_DIOSTRAT_SPACE_RES(mp, 2), 0);
+ if (error)
+ goto out_cancel;
+
+ error = fork_one_block(mp, tp, ip, fsbno, &new_fsbno,
+ XFS_B_TO_FSB(mp, offset));
+ if (error)
+ goto out_cancel;
+
+ trace_xfs_reflink_reserve_fork_block(ip, XFS_B_TO_FSB(mp, offset),
+ fsbno, 1, new_fsbno);
+
+ eio = kmem_zalloc(sizeof(*eio), KM_SLEEP | KM_NOFS);
+ eio->rlei_mapping.br_startblock = new_fsbno;
+ eio->rlei_mapping.br_startoff = XFS_B_TO_FSB(mp, offset);
+ eio->rlei_mapping.br_blockcount = 1;
+ eio->rlei_mapping.br_state = XFS_EXT_NORM;
+ eio->rlei_oldfsbno = fsbno;
+ eio->rlei_efi = xfs_trans_get_efi(tp, 1);
+ xfs_trans_log_efi_extent(tp, eio->rlei_efi, new_fsbno, 1);
+
+ error = remap_insert(ip, eio);
+ if (error)
+ goto out_cancel;
+
+ /*
+ * ...and we're done.
+ */
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+
+ return error;
+
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_reserve_fork_block_error(ip, error, _RET_IP_);
+ return error;
+}
+
+/**
+ * xfs_reflink_reserve_fork_block() -- Allocate blocks to satisfy a copy on
+ * write operation.
+ * @ip: XFS inode
+ * @pos: file offset to start forking
+ * @len: number of bytes to fork
+ */
+int
+xfs_reflink_reserve_fork_block(
+ struct xfs_inode *ip,
+ xfs_off_t pos,
+ xfs_off_t len)
+{
+ struct xfs_bmbt_irec imap;
+ int nimaps;
+ int error;
+ xfs_fileoff_t lblk;
+ xfs_fileoff_t next_lblk;
+ xfs_off_t offset;
+ bool type;
+
+ if (!xfs_is_reflink_inode(ip))
+ return 0;
+
+ trace_xfs_reflink_force_getblocks(ip, len, pos, 0);
+
+ error = 0;
+ lblk = XFS_B_TO_FSBT(ip->i_mount, pos);
+ next_lblk = 1 + XFS_B_TO_FSBT(ip->i_mount, pos + len - 1);
+ while (lblk < next_lblk) {
+ offset = XFS_FSB_TO_B(ip->i_mount, lblk);
+ /* Read extent from the source file */
+ nimaps = 1;
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ error = xfs_bmapi_read(ip, lblk, next_lblk - lblk, &imap,
+ &nimaps, 0);
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ if (error)
+ break;
+
+ if (nimaps == 0)
+ break;
+
+ error = xfs_reflink_should_fork_block(ip, &imap, offset, &type);
+ if (error)
+ break;
+ if (!type)
+ goto advloop;
+
+ error = __reserve_fork_block(ip, &imap, offset);
+ if (error)
+ break;
+
+advloop:
+ lblk += imap.br_blockcount;
+ }
+
+ return error;
+}
+
+/**
+ * xfs_reflink_write_fork_block() -- find a remapping object and redirect the
+ * write.
+ *
+ * @ip: XFS inode
+ * @offset: file offset we're trying to write
+ * @imap: the mapping for this block (I/O)
+ * @type: the io type (I/O)
+ * @peio: pointer to a reflink ioend; caller must attach to an ioend (O)
+ */
+int
+xfs_reflink_write_fork_block(
+ struct xfs_inode *ip,
+ struct xfs_bmbt_irec *imap,
+ xfs_off_t offset,
+ unsigned int *type,
+ struct xfs_reflink_ioend **peio)
+{
+ int error;
+ struct xfs_reflink_ioend *eio = NULL;
+
+ if (!xfs_is_reflink_inode(ip))
+ return 0;
+ if (*type == XFS_IO_DELALLOC || *type == XFS_IO_UNWRITTEN)
+ return 0;
+
+ error = remap_search(ip, XFS_B_TO_FSB(ip->i_mount, offset), &eio);
+ if (error == -ENOENT)
+ return 0;
+ else if (error) {
+ trace_xfs_reflink_write_fork_block_error(ip, error, _RET_IP_);
+ return error;
+ }
+
+ trace_xfs_reflink_write_fork_block(ip, eio->rlei_mapping.br_startoff,
+ eio->rlei_oldfsbno, 1, eio->rlei_mapping.br_startblock);
+
+ *imap = eio->rlei_mapping;
+ *type = XFS_IO_FORKED;
+ *peio = eio;
+ return 0;
+}
+
+/* Remap a range of file blocks after forking. */
+STATIC int
+xfs_reflink_remap_after_io(
+ struct xfs_mount *mp,
+ struct xfs_inode *ip,
+ struct xfs_reflink_ioend *eio)
+{
+ struct xfs_trans *tp = NULL;
+ int error;
+ xfs_agnumber_t agno; /* allocation group number */
+ xfs_agblock_t agbno; /* ag start of range to free */
+ xfs_fsblock_t firstfsb;
+ int committed;
+ struct xfs_bmbt_irec imaps[1];
+ int nimaps = 1;
+ int done;
+ struct xfs_bmap_free free_list;
+ struct xfs_bmbt_irec *imap = &eio->rlei_mapping;
+ struct xfs_efd_log_item *efd;
+ unsigned int resblks;
+
+ ASSERT(xfs_is_reflink_inode(ip));
+ agno = XFS_FSB_TO_AGNO(mp, imap->br_startblock);
+ agbno = XFS_FSB_TO_AGBNO(mp, imap->br_startblock);
+ CHECK_AG_NUMBER(mp, agno);
+ CHECK_AG_EXTENT(mp, agbno, 1);
+ ASSERT(imap->br_state == XFS_EXT_NORM);
+
+ trace_xfs_reflink_remap_after_io(ip, imap->br_startoff,
+ eio->rlei_oldfsbno, imap->br_blockcount,
+ imap->br_startblock);
+
+
+ /* Delete temporary mapping */
+ error = remap_search(ip, imap->br_startoff, &eio);
+ if (error)
+ return error;
+ rb_erase(&eio->rlei_node, &ip->i_remaps);
+
+ /* Unmap the old blocks */
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, imap->br_blockcount * 3);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
+ if (error)
+ goto out_cancel;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+ xfs_bmap_init(&free_list, &firstfsb);
+ error = xfs_bunmapi(tp, ip, imap->br_startoff, imap->br_blockcount, 0,
+ imap->br_blockcount, &firstfsb, &free_list, &done);
+ if (error)
+ goto out_freelist;
+
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_cancel;
+
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+
+ /* Cancel the EFI and map the new block into the file. */
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, imap->br_blockcount * 3);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
+ if (error)
+ goto out_cancel;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+ efd = xfs_trans_get_efd(tp, eio->rlei_efi, 1);
+ xfs_trans_undelete_extent(tp, efd, imap->br_startblock,
+ imap->br_blockcount);
+
+ error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
+ XFS_BMAPI_REMAP, &imap->br_startblock,
+ imap->br_blockcount, &imaps[0], &nimaps,
+ &free_list);
+ if (error)
+ goto out_freelist;
+
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_cancel;
+
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+ return error;
+
+out_freelist:
+ xfs_bmap_cancel(&free_list);
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_remap_after_io_error(ip, error, _RET_IP_);
+ return error;
+}
+
+/**
+ * xfs_reflink_fork_ioend() - remap all blocks after forking
+ *
+ * @ioend: the io completion object
+ */
+int
+xfs_reflink_fork_ioend(
+ struct xfs_ioend *ioend)
+{
+ int error, err2;
+ struct list_head *pos, *n;
+ struct xfs_reflink_ioend *eio;
+ struct xfs_inode *ip = XFS_I(ioend->io_inode);
+ struct xfs_mount *mp = ip->i_mount;
+
+ error = 0;
+ list_for_each_safe(pos, n, &ioend->io_reflink_endio_list) {
+ eio = list_entry(pos, struct xfs_reflink_ioend, rlei_list);
+ err2 = xfs_reflink_remap_after_io(mp, ip, eio);
+ if (error == 0)
+ error = err2;
+ kfree(eio);
+ }
+ return error;
+}
+
+/**
+ * xfs_reflink_should_fork_block() - determine if a block should be forked
+ *
+ * @ip: XFS inode object
+ * @imap: the fileoff:fsblock mapping that we might fork
+ * @offset: the file offset of the block we're examining
+ * @type: set to true if reflinked, false otherwise.
+ */
+int
+xfs_reflink_should_fork_block(
+ struct xfs_inode *ip,
+ struct xfs_bmbt_irec *imap,
+ xfs_off_t offset,
+ bool *type)
+{
+ xfs_fsblock_t fsbno;
+ xfs_off_t iomap_offset;
+ xfs_agnumber_t agno; /* allocation group number */
+ xfs_agblock_t agbno; /* ag start of range to free */
+ xfs_extlen_t len;
+ xfs_nlink_t nr;
+ int error;
+ struct xfs_mount *mp = ip->i_mount;
+
+ if (!xfs_is_reflink_inode(ip) ||
+ ISUNWRITTEN(imap) ||
+ imap->br_startblock == HOLESTARTBLOCK ||
+ imap->br_startblock == DELAYSTARTBLOCK) {
+ *type = false;
+ return 0;
+ }
+
+ iomap_offset = XFS_FSB_TO_B(mp, imap->br_startoff);
+ fsbno = imap->br_startblock + XFS_B_TO_FSB(mp, offset - iomap_offset);
+ agno = XFS_FSB_TO_AGNO(mp, fsbno);
+ agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+ CHECK_AG_NUMBER(mp, agno);
+ CHECK_AG_EXTENT(mp, agbno, 1);
+ ASSERT(imap->br_state == XFS_EXT_NORM);
+
+ error = xfs_reflink_get_refcount(mp, agno, agbno, &len, &nr);
+ if (error)
+ return error;
+ ASSERT(len != 0);
+ *type = (nr > 1);
+ return error;
+}
+
+/* Cancel a forked block being held for a CoW operation */
+STATIC int
+xfs_reflink_free_forked(
+ struct xfs_mount *mp,
+ struct xfs_inode *ip,
+ struct xfs_reflink_ioend *eio)
+{
+ struct xfs_trans *tp = NULL;
+ int error;
+ xfs_agnumber_t agno; /* allocation group number */
+ xfs_agblock_t agbno; /* ag start of range to free */
+ xfs_fsblock_t firstfsb;
+ int committed;
+ struct xfs_bmap_free free_list;
+ struct xfs_bmbt_irec *imap = &eio->rlei_mapping;
+ struct xfs_efd_log_item *efd;
+ unsigned int resblks;
+
+ ASSERT(xfs_is_reflink_inode(ip));
+ agno = XFS_FSB_TO_AGNO(mp, imap->br_startblock);
+ agbno = XFS_FSB_TO_AGBNO(mp, imap->br_startblock);
+ CHECK_AG_NUMBER(mp, agno);
+ CHECK_AG_EXTENT(mp, agbno, 1);
+ ASSERT(imap->br_state == XFS_EXT_NORM);
+
+ trace_xfs_reflink_free_forked(ip, imap->br_startoff,
+ eio->rlei_oldfsbno, imap->br_blockcount,
+ imap->br_startblock);
+
+ /* Log an EFD for the held extent, then free the forked block. */
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, imap->br_blockcount * 3);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
+ if (error)
+ goto out_cancel;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ efd = xfs_trans_get_efd(tp, eio->rlei_efi, 1);
+ xfs_trans_undelete_extent(tp, efd, imap->br_startblock,
+ imap->br_blockcount);
+
+ xfs_bmap_init(&free_list, &firstfsb);
+ xfs_bmap_add_free(mp, &free_list, imap->br_startblock, 1, NULL);
+
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_cancel;
+
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+ return error;
+
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_free_forked_error(ip, error, _RET_IP_);
+ return error;
+}
+
+/**
+ * xfs_reflink_cancel_fork_ioend() - free all forked blocks attached to an ioend
+ *
+ * @ioend: the io completion object
+ */
+int
+xfs_reflink_cancel_fork_ioend(
+ struct xfs_ioend *ioend)
+{
+ int error, err2;
+ struct list_head *pos, *n;
+ struct xfs_reflink_ioend *eio;
+ struct xfs_inode *ip = XFS_I(ioend->io_inode);
+ struct xfs_mount *mp = ip->i_mount;
+
+ error = 0;
+ list_for_each_safe(pos, n, &ioend->io_reflink_endio_list) {
+ eio = list_entry(pos, struct xfs_reflink_ioend, rlei_list);
+ err2 = xfs_reflink_free_forked(mp, ip, eio);
+ if (error == 0)
+ error = err2;
+ kfree(eio);
+ }
+ return error;
+}
+
+/**
+ * xfs_reflink_cancel_fork_blocks() -- Free all forked blocks attached to
+ * an inode.
+ *
+ * @ip: The inode.
+ */
+int
+xfs_reflink_cancel_fork_blocks(
+ struct xfs_inode *ip)
+{
+ struct rb_node *node;
+ struct xfs_reflink_ioend *eio;
+ int error = 0;
+ int err2;
+
+ while ((node = rb_first(&ip->i_remaps))) {
+ eio = rb_entry(node, struct xfs_reflink_ioend, rlei_node);
+ err2 = xfs_reflink_free_forked(ip->i_mount, ip, eio);
+ if (error == 0)
+ error = err2;
+ rb_erase(node, &ip->i_remaps);
+ kfree(eio);
+ }
+
+ return error;
+}
+
+/**
+ * xfs_reflink_add_ioend() -- Hook ourselves up to the ioend processing
+ * so that we can finish forking a block after
+ * the write completes.
+ *
+ * @ioend: The regular ioend structure.
+ * @eio: The reflink ioend context.
+ */
+void
+xfs_reflink_add_ioend(
+ struct xfs_ioend *ioend,
+ struct xfs_reflink_ioend *eio)
+{
+ list_add_tail(&eio->rlei_list, &ioend->io_reflink_endio_list);
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
new file mode 100644
index 0000000..b3e12d2
--- /dev/null
+++ b/fs/xfs/xfs_reflink.h
@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+#ifndef __XFS_REFLINK_H
+#define __XFS_REFLINK_H 1
+
+struct xfs_reflink_ioend;
+
+extern int xfs_reflink_get_refcount(struct xfs_mount *mp, xfs_agnumber_t agno,
+ xfs_agblock_t agbno, xfs_extlen_t *len, xfs_nlink_t *nr);
+extern int xfs_reflink_write_fork_block(struct xfs_inode *ip,
+ struct xfs_bmbt_irec *imap, xfs_off_t offset,
+ unsigned int *type, struct xfs_reflink_ioend **peio);
+extern int xfs_reflink_reserve_fork_block(struct xfs_inode *ip,
+ xfs_off_t pos, xfs_off_t len);
+extern int xfs_reflink_redirect_directio_write(struct xfs_inode *ip,
+ struct xfs_bmbt_irec *imap, xfs_off_t offset);
+extern int xfs_reflink_cancel_fork_ioend(struct xfs_ioend *ioend);
+extern int xfs_reflink_cancel_fork_blocks(struct xfs_inode *ip);
+extern int xfs_reflink_fork_ioend(struct xfs_ioend *ioend);
+extern void xfs_reflink_add_ioend(struct xfs_ioend *ioend,
+ struct xfs_reflink_ioend *eio);
+
+extern int xfs_reflink_should_fork_block(struct xfs_inode *ip,
+ struct xfs_bmbt_irec *imap, xfs_off_t offset, bool *type);
+
+#endif /* __XFS_REFLINK_H */
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 50fe77e..07e8460 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -223,6 +223,9 @@ struct xfs_efd_log_item *xfs_trans_get_efd(xfs_trans_t *,
int xfs_trans_free_extent(struct xfs_trans *,
struct xfs_efd_log_item *, xfs_fsblock_t,
xfs_extlen_t, struct xfs_owner_info *);
+void xfs_trans_undelete_extent(struct xfs_trans *,
+ struct xfs_efd_log_item *, xfs_fsblock_t,
+ xfs_extlen_t);
int xfs_trans_commit(struct xfs_trans *);
int __xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *);
int xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index d1b8833..a2fed6e 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -146,3 +146,30 @@ xfs_trans_free_extent(
return error;
}
+
+/*
+ * Undelete this extent by logging it to the EFD. The transaction is always
+ * marked dirty, to support the EFI/EFD lifecycle rules. This should only
+ * be used when the ownership of the extent hasn't changed, i.e. during
+ * reflink copy-on-write.
+ */
+void
+xfs_trans_undelete_extent(
+ struct xfs_trans *tp,
+ struct xfs_efd_log_item *efdp,
+ xfs_fsblock_t start_block,
+ xfs_extlen_t ext_len)
+{
+ uint next_extent;
+ struct xfs_extent *extp;
+
+ tp->t_flags |= XFS_TRANS_DIRTY;
+ efdp->efd_item.li_desc->lid_flags |= XFS_LID_DIRTY;
+
+ next_extent = efdp->efd_next_extent;
+ ASSERT(next_extent < efdp->efd_format.efd_nextents);
+ extp = &(efdp->efd_format.efd_extents[next_extent]);
+ extp->ext_start = start_block;
+ extp->ext_len = ext_len;
+ efdp->efd_next_extent++;
+}
* [PATCH 47/58] xfs: handle directio copy-on-write for reflinked blocks
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (45 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 46/58] xfs: implement copy-on-write for reflinked blocks Darrick J. Wong
@ 2015-10-07 5:00 ` Darrick J. Wong
2015-10-07 5:00 ` [PATCH 48/58] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks Darrick J. Wong
` (10 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
We hope that CoW writes will be rare and that directio CoW writes will
be even rarer. Therefore, fall back to the buffered path for any such
write.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
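For readers who want the control flow without the surrounding XFS code, here
is a minimal userspace sketch of the fallback, not the patch itself. All of
the toy_* names and the "shared" flag are hypothetical stand-ins; only the
use of -EREMCHG as the "bounce to buffered" signal is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EREMCHG
#define EREMCHG 78	/* Linux value; defined here only for portability */
#endif

/* Hypothetical stand-in for a file with a single block. */
struct toy_file {
	bool	shared;			/* block has refcount > 1 */
	int	dio_writes;		/* writes served by the direct path */
	int	buffered_writes;	/* writes served by the buffered path */
};

/* Direct path: refuse to touch a shared block and ask for a fallback. */
static long toy_dio_write(struct toy_file *f, size_t len)
{
	if (f->shared)
		return -EREMCHG;
	f->dio_writes++;
	return (long)len;
}

/* Buffered path: assumed to do the CoW, so the block is private afterwards. */
static long toy_buffered_write(struct toy_file *f, size_t len)
{
	f->buffered_writes++;
	f->shared = false;
	return (long)len;
}

/* Try direct I/O first; fall back to buffered only on -EREMCHG. */
static long toy_write(struct toy_file *f, bool direct, size_t len)
{
	long ret;

	if (direct) {
		ret = toy_dio_write(f, len);
		if (ret != -EREMCHG)
			return ret;
	}
	return toy_buffered_write(f, len);
}
```

Any other error from the direct path is returned unchanged; only -EREMCHG
triggers the fallback, which matches the intent stated in the comment added
to xfs_file_write_iter().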
fs/xfs/xfs_aops.c | 6 ++++++
fs/xfs/xfs_file.c | 12 ++++++++++--
fs/xfs/xfs_reflink.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 06a1d2f..4ab24fa 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1491,6 +1491,12 @@ __xfs_get_blocks(
if (imap.br_startblock != HOLESTARTBLOCK &&
imap.br_startblock != DELAYSTARTBLOCK &&
(create || !ISUNWRITTEN(&imap))) {
+ if (create && direct) {
+ error = xfs_reflink_redirect_directio_write(ip, &imap,
+ offset);
+ if (error)
+ return error;
+ }
xfs_map_buffer(inode, bh_result, &imap, offset);
if (ISUNWRITTEN(&imap))
set_buffer_unwritten(bh_result);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 593223f..fc5b9ea 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -876,10 +876,18 @@ xfs_file_write_iter(
if (XFS_FORCED_SHUTDOWN(ip->i_mount))
return -EIO;
- if ((iocb->ki_flags & IOCB_DIRECT) || IS_DAX(inode))
+ /*
+ * Allow DIO to fall back to buffered *only* in the case that we're
+ * doing a reflink CoW.
+ */
+ if ((iocb->ki_flags & IOCB_DIRECT) || IS_DAX(inode)) {
ret = xfs_file_dio_aio_write(iocb, from);
- else
+ if (ret == -EREMCHG)
+ goto buffered;
+ } else {
+buffered:
ret = xfs_file_buffered_aio_write(iocb, from);
+ }
if (ret > 0) {
ssize_t err;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 1e00be2..226e23f 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -407,6 +407,40 @@ advloop:
}
/**
+ * xfs_reflink_redirect_directio_write() - bounce a directio write to a
+ * reflinked region down to buffered
+ * write mode.
+ *
+ * @ip: XFS inode object
+ * @imap: the fileoff:fsblock mapping that we might fork
+ * @offset: the file byte offset of the block we're examining
+ */
+int
+xfs_reflink_redirect_directio_write(
+ struct xfs_inode *ip,
+ struct xfs_bmbt_irec *imap,
+ xfs_off_t offset)
+{
+ bool type = false;
+ int error;
+
+ error = xfs_reflink_should_fork_block(ip, imap, offset, &type);
+ if (error)
+ return error;
+ if (!type)
+ return 0;
+
+ /*
+ * Are we doing a DIO write to a reflinked block? In the ideal world
+ * we at least would fork full blocks, but for now just fall back to
+ * buffered mode. Yuck. Use -EREMCHG ("remote address changed") to
+ * signal this, since in general XFS doesn't do this sort of fallback.
+ */
+ trace_xfs_reflink_bounce_direct_write(ip, imap);
+ return -EREMCHG;
+}
+
+/**
* xfs_reflink_write_fork_block() -- find a remapping object and redirect the
* write.
*
* [PATCH 48/58] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (46 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 47/58] xfs: handle directio " Darrick J. Wong
@ 2015-10-07 5:00 ` Darrick J. Wong
2015-10-21 21:17 ` Darrick J. Wong
2015-10-07 5:00 ` [PATCH 49/58] xfs: clear inode reflink flag when freeing blocks Darrick J. Wong
` (9 subsequent siblings)
57 siblings, 1 reply; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
When we're writing zeroes to a reflinked block (such as when we're
punching a reflinked range), we need to fork the block and write
to that; otherwise we can corrupt the other reflinks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
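To illustrate why zeroing in place is unsafe, here is a small userspace
model, offered as a sketch under stated assumptions and not as the kernel
logic. The toy_disk structure, its per-block refcount array, and the toy_*
helpers are all hypothetical; the only idea borrowed from the patch is
"if the block is shared, write the zeroed copy to a new block and remap".

```c
#include <assert.h>
#include <string.h>

#define TOY_NBLOCKS	8
#define TOY_BLKSZ	16

/* Hypothetical in-memory disk: block contents plus a per-block refcount. */
struct toy_disk {
	unsigned char	data[TOY_NBLOCKS][TOY_BLKSZ];
	unsigned int	refcount[TOY_NBLOCKS];
};

/* Claim the first free block (refcount 0); return -1 if none is left. */
static int toy_alloc(struct toy_disk *d)
{
	for (int i = 0; i < TOY_NBLOCKS; i++) {
		if (d->refcount[i] == 0) {
			d->refcount[i] = 1;
			return i;
		}
	}
	return -1;
}

/*
 * Zero bytes [off, off + len) of the block that *mapping points at.
 * A shared block is forked first: its contents are copied to a fresh
 * block, the copy is zeroed, the caller's mapping is redirected to the
 * copy, and the old block loses one reference.  Returns 0 or -1.
 */
static int toy_zero_range(struct toy_disk *d, int *mapping,
			  size_t off, size_t len)
{
	int old = *mapping;
	int fresh;

	if (off > TOY_BLKSZ || len > TOY_BLKSZ - off)
		return -1;

	if (d->refcount[old] > 1) {
		fresh = toy_alloc(d);
		if (fresh < 0)
			return -1;
		memcpy(d->data[fresh], d->data[old], TOY_BLKSZ);
		memset(d->data[fresh] + off, 0, len);
		d->refcount[old]--;
		*mapping = fresh;
		return 0;
	}

	/* Sole owner: zeroing in place cannot affect anyone else. */
	memset(d->data[old] + off, 0, len);
	return 0;
}
```

With two files mapping the same block, zeroing through one mapping leaves
the other file's data intact and drops the shared block's refcount to one.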
fs/xfs/xfs_bmap_util.c | 25 +++++++-
fs/xfs/xfs_reflink.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.h | 6 ++
3 files changed, 183 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 245a34a..b054b28 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -41,6 +41,7 @@
#include "xfs_icache.h"
#include "xfs_log.h"
#include "xfs_rmap_btree.h"
+#include "xfs_reflink.h"
/* Kernel only BMAP related definitions and functions */
@@ -1095,7 +1096,9 @@ xfs_zero_remaining_bytes(
xfs_buf_t *bp;
xfs_mount_t *mp = ip->i_mount;
int nimap;
- int error = 0;
+ int error = 0, err2;
+ bool should_fork;
+ struct xfs_trans *tp;
/*
* Avoid doing I/O beyond eof - it's not necessary
@@ -1136,8 +1139,14 @@ xfs_zero_remaining_bytes(
if (lastoffset > endoff)
lastoffset = endoff;
+ /* Do we need to CoW this block? */
+ error = xfs_reflink_should_fork_block(ip, &imap, offset,
+ &should_fork);
+ if (error)
+ return error;
+
/* DAX can just zero the backing device directly */
- if (IS_DAX(VFS_I(ip))) {
+ if (IS_DAX(VFS_I(ip)) && !should_fork) {
error = dax_zero_page_range(VFS_I(ip), offset,
lastoffset - offset + 1,
xfs_get_blocks_direct);
@@ -1158,10 +1167,22 @@ xfs_zero_remaining_bytes(
(offset - XFS_FSB_TO_B(mp, imap.br_startoff)),
0, lastoffset - offset + 1);
+ tp = NULL;
+ if (should_fork) {
+ error = xfs_reflink_fork_buf(ip, bp, offset_fsb, &tp);
+ if (error)
+ return error;
+ }
+
error = xfs_bwrite(bp);
+
+ err2 = xfs_reflink_finish_fork_buf(ip, bp, offset_fsb, tp,
+ error, imap.br_startblock);
xfs_buf_relse(bp);
if (error)
return error;
+ if (err2)
+ return err2;
}
return error;
}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 226e23f..f5eed2f 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -788,3 +788,157 @@ xfs_reflink_add_ioend(
{
list_add_tail(&eio->rlei_list, &ioend->io_reflink_endio_list);
}
+
+/**
+ * xfs_reflink_fork_buf() - start a transaction to fork a buffer (if needed)
+ *
+ * @mp: XFS mount point
+ * @ip: XFS inode
+ * @bp: the buffer that we might need to fork
+ * @fileoff: file offset of the buffer
+ * @ptp: pointer to an XFS transaction
+ */
+int
+xfs_reflink_fork_buf(
+ struct xfs_inode *ip,
+ struct xfs_buf *bp,
+ xfs_fileoff_t fileoff,
+ struct xfs_trans **ptp)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ xfs_fsblock_t fsbno;
+ xfs_fsblock_t new_fsbno;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ uint resblks;
+ int error;
+
+ fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+ agno = XFS_FSB_TO_AGNO(mp, fsbno);
+ agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+ CHECK_AG_NUMBER(mp, agno);
+ CHECK_AG_EXTENT(mp, agbno, 1);
+
+ /*
+ * Get ready to remap the thing...
+ */
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, 3);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
+
+ /*
+ * check for running out of space
+ */
+ if (error) {
+ /*
+ * Free the transaction structure.
+ */
+ ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+ goto out_cancel;
+ }
+ error = xfs_trans_reserve_quota(tp, mp,
+ ip->i_udquot, ip->i_gdquot, ip->i_pdquot,
+ resblks, 0, XFS_QMOPT_RES_REGBLKS);
+ if (error)
+ goto out_cancel;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+ /* fork block, remap buffer */
+ error = fork_one_block(mp, tp, ip, fsbno, &new_fsbno, fileoff);
+ if (error)
+ goto out_cancel;
+
+ trace_xfs_reflink_fork_buf(ip, fileoff, fsbno, 1, new_fsbno);
+
+ XFS_BUF_SET_ADDR(bp, XFS_FSB_TO_DADDR(mp, new_fsbno));
+ *ptp = tp;
+ return error;
+
+out_cancel:
+ xfs_trans_cancel(tp);
+ trace_xfs_reflink_fork_buf_error(ip, error, _RET_IP_);
+ return error;
+}
+
+/**
+ * xfs_reflink_finish_fork_buf() - finish forking a file buffer
+ *
+ * @ip: XFS inode
+ * @bp: the buffer that was forked
+ * @fileoff: file offset of the buffer
+ * @tp: transaction that was returned from xfs_reflink_fork_buf()
+ * @write_error: status code from writing the block
+ */
+int
+xfs_reflink_finish_fork_buf(
+ struct xfs_inode *ip,
+ struct xfs_buf *bp,
+ xfs_fileoff_t fileoff,
+ struct xfs_trans *tp,
+ int write_error,
+ xfs_fsblock_t old_fsbno)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_bmap_free free_list;
+ xfs_fsblock_t firstfsb;
+ xfs_fsblock_t fsbno;
+ struct xfs_bmbt_irec imaps[1];
+ xfs_agnumber_t agno;
+ int nimaps = 1;
+ int done;
+ int error = write_error;
+ int committed;
+ struct xfs_owner_info oinfo;
+
+ if (tp == NULL)
+ return 0;
+
+ fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+ agno = XFS_FSB_TO_AGNO(mp, fsbno);
+ XFS_RMAP_INO_OWNER(&oinfo, ip->i_ino, XFS_DATA_FORK, fileoff);
+ if (write_error != 0)
+ goto out_write_error;
+
+ trace_xfs_reflink_fork_buf(ip, fileoff, old_fsbno, 1, fsbno);
+ /*
+ * Remap the old blocks.
+ */
+ xfs_bmap_init(&free_list, &firstfsb);
+ error = xfs_bunmapi(tp, ip, fileoff, 1, 0, 1, &firstfsb, &free_list,
+ &done);
+ if (error)
+ goto out_free;
+ ASSERT(done == 1);
+
+ error = xfs_bmapi_write(tp, ip, fileoff, 1, XFS_BMAPI_REMAP, &fsbno,
+ 1, &imaps[0], &nimaps, &free_list);
+ if (error)
+ goto out_free;
+
+ /*
+ * complete the transaction
+ */
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_cancel;
+
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+
+ return error;
+out_free:
+ xfs_bmap_finish(&tp, &free_list, &committed);
+out_write_error:
+ done = xfs_free_extent(tp, fsbno, 1, &oinfo);
+ if (error == 0)
+ error = done;
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_finish_fork_buf_error(ip, error, _RET_IP_);
+ return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index b3e12d2..ce00cf6 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -38,4 +38,10 @@ extern void xfs_reflink_add_ioend(struct xfs_ioend *ioend,
extern int xfs_reflink_should_fork_block(struct xfs_inode *ip,
struct xfs_bmbt_irec *imap, xfs_off_t offset, bool *type);
+extern int xfs_reflink_fork_buf(struct xfs_inode *ip, struct xfs_buf *bp,
+ xfs_fileoff_t fileoff, struct xfs_trans **ptp);
+extern int xfs_reflink_finish_fork_buf(struct xfs_inode *ip, struct xfs_buf *bp,
+ xfs_fileoff_t fileoff, struct xfs_trans *tp, int write_error,
+ xfs_fsblock_t old_fsbno);
+
#endif /* __XFS_REFLINK_H */
* [PATCH 49/58] xfs: clear inode reflink flag when freeing blocks
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (47 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 48/58] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks Darrick J. Wong
@ 2015-10-07 5:00 ` Darrick J. Wong
2015-10-07 5:00 ` [PATCH 50/58] xfs: reflink extents from one file to another Darrick J. Wong
` (8 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Clear the inode reflink flag when freeing or truncating all blocks
in a file.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_bmap_util.c | 8 ++++++++
fs/xfs/xfs_inode.c | 6 ++++++
2 files changed, 14 insertions(+)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index b054b28..b20d136 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1338,6 +1338,14 @@ xfs_free_file_space(
}
/*
+ * Clear the reflink flag if we freed everything.
+ */
+ if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
+ ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ }
+
+ /*
* complete the transaction
*/
error = xfs_bmap_finish(&tp, &free_list, &committed);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index dc40a6d..61a468b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1613,6 +1613,12 @@ xfs_itruncate_extents(
}
/*
+ * Clear the reflink flag if we truncated everything.
+ */
+ if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip))
+ ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+ /*
* Always re-log the inode so that our permanent transaction can keep
* on rolling it forward in the log.
*/
* [PATCH 50/58] xfs: reflink extents from one file to another
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (48 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 49/58] xfs: clear inode reflink flag when freeing blocks Darrick J. Wong
@ 2015-10-07 5:00 ` Darrick J. Wong
2015-10-07 5:12 ` kbuild test robot
2015-10-07 5:00 ` [PATCH 51/58] xfs: add clone file and clone range ioctls Darrick J. Wong
` (7 subsequent siblings)
57 siblings, 1 reply; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Reflink extents from one file to another; that is to say, iteratively
remove the mappings from the destination file, copy the mappings from
the source file to the destination file, and increment the reference
count of all the blocks that got remapped.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
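The punch/remap loop described above can be modelled in userspace as
follows. This is only a sketch of the iteration, assuming a toy source
extent list, a destination block map held in an array, and a flat refcount
array; none of the toy_* names exist in XFS, and transactions, locking and
unwritten extents are ignored. Unlike the kernel code, the toy lookup may
return an extent that starts after the requested offset, so the leading
hole is punched together with the extent that follows it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HOLE	(-1L)
#define TOY_FILE_BLKS	32
#define TOY_PHYS_BLKS	64

struct toy_extent {
	long	startoff;	/* logical offset, in file blocks */
	long	startblock;	/* first physical block */
	long	blockcount;	/* length in blocks */
};

struct toy_state {
	long		dest[TOY_FILE_BLKS];	/* logical -> physical or TOY_HOLE */
	unsigned int	refcount[TOY_PHYS_BLKS];
};

/* Return the first extent that ends after off, or NULL if there is none. */
static const struct toy_extent *
toy_lookup(const struct toy_extent *ext, size_t n, long off)
{
	for (size_t i = 0; i < n; i++)
		if (ext[i].startoff + ext[i].blockcount > off)
			return &ext[i];
	return NULL;
}

/* Remap src blocks [srcoff, srcoff + len) into dest at destoff. */
static void toy_remap_blocks(const struct toy_extent *src, size_t nsrc,
			     long srcoff, struct toy_state *st,
			     long destoff, long len)
{
	while (len > 0) {
		const struct toy_extent *e = toy_lookup(src, nsrc, srcoff);
		struct toy_extent imap;
		long srcioff, step;

		if (e == NULL || e->startoff >= srcoff + len) {
			/* Nothing mapped in srange: zero-length imap at its end. */
			imap.startoff = srcoff + len;
			imap.startblock = TOY_HOLE;
			imap.blockcount = 0;
		} else {
			imap = *e;
		}

		/* If imap starts before srange, advance it to start there. */
		if (imap.startoff < srcoff) {
			imap.startblock += srcoff - imap.startoff;
			imap.blockcount -= srcoff - imap.startoff;
			imap.startoff = srcoff;
		}
		/* If imap ends after srange, truncate it. */
		if (imap.startoff + imap.blockcount > srcoff + len)
			imap.blockcount = srcoff + len - imap.startoff;

		srcioff = imap.startoff - srcoff;
		step = srcioff + imap.blockcount;

		/* Punch the leading hole plus the extent's span from dest. */
		for (long i = 0; i < step; i++) {
			long p = st->dest[destoff + i];

			if (p != TOY_HOLE) {
				st->refcount[p]--;
				st->dest[destoff + i] = TOY_HOLE;
			}
		}

		/* Bump refcounts and map the source blocks into dest. */
		for (long i = 0; i < imap.blockcount; i++) {
			st->refcount[imap.startblock + i]++;
			st->dest[destoff + srcioff + i] = imap.startblock + i;
		}

		srcoff += step;
		destoff += step;
		len -= step;
	}
}
```

Every pass advances by at least one block, so the loop terminates. The
caller is assumed to keep all offsets within the toy array bounds.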
fs/xfs/xfs_reflink.c | 511 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.h | 3
2 files changed, 514 insertions(+)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index f5eed2f..ac81b02 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -942,3 +942,514 @@ out_error:
trace_xfs_reflink_finish_fork_buf_error(ip, error, _RET_IP_);
return error;
}
+
+/*
+ * Reflinking (Block) Ranges of Two Files Together
+ *
+ * First, ensure that the reflink flag is set on both inodes. The flag is an
+ * optimization to avoid unnecessary refcount btree lookups in the write path.
+ *
+ * Now we can iteratively remap the range of extents (and holes) in src to the
+ * corresponding ranges in dest. Let drange and srange denote the ranges of
+ * logical blocks in dest and src touched by the reflink operation.
+ *
+ * While the length of drange is greater than zero,
+ * - Read src's bmbt at the start of srange ("imap")
+ * - If imap doesn't exist, make imap appear to start at the end of srange
+ * with zero length.
+ * - If imap starts before srange, advance imap to start at srange.
+ * - If imap goes beyond srange, truncate imap to end at the end of srange.
+ * - Punch (imap start - srange start + imap len) blocks from dest at
+ * offset (drange start).
+ * - If imap points to a real range of pblks,
+ * > Increase the refcount of the imap's pblks
+ * > Map imap's pblks into dest at the offset
+ * (drange start + imap start - srange start)
+ * - Advance drange and srange by (imap start - srange start + imap len)
+ *
+ * Finally, if the reflink made dest longer, update both the in-core and
+ * on-disk file sizes.
+ *
+ * ASCII Art Demonstration:
+ *
+ * Let's say we want to reflink this source file:
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS (src file)
+ * <-------------------->
+ *
+ * into this destination file:
+ *
+ * --DDDDDDDDDDDDDDDDDDD--DDD (dest file)
+ * <-------------------->
+ * '-' means a hole, and 'S' and 'D' are written blocks in the src and dest.
+ * Observe that the range has different logical offsets in either file.
+ *
+ * Consider that the first extent in the source file doesn't line up with our
+ * reflink range. Unmapping and remapping are separate operations, so we can
+ * unmap more blocks from the destination file than we remap.
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ * <------->
+ * --DDDDD---------DDDDD--DDD
+ * <------->
+ *
+ * Now remap the source extent into the destination file:
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ * <------->
+ * --DDDDD--SSSSSSSDDDDD--DDD
+ * <------->
+ *
+ * Do likewise with the second hole and extent in our range. Holes in the
+ * unmap range don't affect our operation.
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ * <---->
+ * --DDDDD--SSSSSSS-SSSSS-DDD
+ * <---->
+ *
+ * Finally, unmap and remap part of the third extent. This will increase the
+ * size of the destination file.
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ * <----->
+ * --DDDDD--SSSSSSS-SSSSS----SSS
+ * <----->
+ *
+ * Once we update the destination file's i_size, we're done.
+ */
+
+/*
+ * Ensure the reflink bit is set in both inodes.
+ */
+STATIC int
+set_inode_reflink_flag(
+ struct xfs_inode *src,
+ struct xfs_inode *dest)
+{
+ struct xfs_mount *mp = src->i_mount;
+ int error;
+ struct xfs_trans *tp;
+
+ if (xfs_is_reflink_inode(src) && xfs_is_reflink_inode(dest))
+ return 0;
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+
+ /*
+ * check for running out of space
+ */
+ if (error) {
+ /*
+ * Free the transaction structure.
+ */
+ ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+ goto out_cancel;
+ }
+
+ /* Lock both files against IO */
+ if (src->i_ino == dest->i_ino)
+ xfs_ilock(src, XFS_ILOCK_EXCL);
+ else
+ xfs_lock_two_inodes(src, dest, XFS_ILOCK_EXCL);
+
+ if (!xfs_is_reflink_inode(src)) {
+ trace_xfs_reflink_set_inode_flag(src);
+ xfs_trans_ijoin(tp, src, XFS_ILOCK_EXCL);
+ src->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK;
+ xfs_trans_log_inode(tp, src, XFS_ILOG_CORE);
+ } else
+ xfs_iunlock(src, XFS_ILOCK_EXCL);
+
+ if (src->i_ino == dest->i_ino)
+ goto commit_flags;
+
+ if (!xfs_is_reflink_inode(dest)) {
+ trace_xfs_reflink_set_inode_flag(dest);
+ xfs_trans_ijoin(tp, dest, XFS_ILOCK_EXCL);
+ dest->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK;
+ xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
+ } else
+ xfs_iunlock(dest, XFS_ILOCK_EXCL);
+
+commit_flags:
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+ return error;
+
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_set_inode_flag_error(dest, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Update destination inode size, if necessary.
+ */
+STATIC int
+update_dest_isize(
+ struct xfs_inode *dest,
+ xfs_off_t newlen)
+{
+ struct xfs_mount *mp = dest->i_mount;
+ struct xfs_trans *tp;
+ int error;
+
+ if (newlen <= i_size_read(VFS_I(dest)))
+ return 0;
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+
+ /*
+ * check for running out of space
+ */
+ if (error) {
+ /*
+ * Free the transaction structure.
+ */
+ ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+ goto out_cancel;
+ }
+
+ xfs_ilock(dest, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, dest, XFS_ILOCK_EXCL);
+
+ trace_xfs_reflink_update_inode_size(dest, newlen);
+ i_size_write(VFS_I(dest), newlen);
+ dest->i_d.di_size = newlen;
+ xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
+
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+ return error;
+
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_update_inode_size_error(dest, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Punch a range of file blocks, assuming that there's no remapping in
+ * progress and that the file is eligible for reflink.
+ *
+ * XXX: Could we just use xfs_free_file_space?
+ */
+STATIC int
+punch_range(
+ struct xfs_inode *dest,
+ xfs_fileoff_t off,
+ xfs_filblks_t len)
+{
+ struct xfs_mount *mp = dest->i_mount;
+ int error, done;
+ uint resblks;
+ struct xfs_trans *tp;
+ xfs_fsblock_t firstfsb;
+ struct xfs_bmap_free free_list;
+ int committed;
+
+ /*
+ * free file space until done or until there is an error
+ */
+ trace_xfs_reflink_punch_range(dest, off, len);
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+ error = done = 0;
+ while (!error && !done) {
+ /*
+ * allocate and setup the transaction. Allow this
+ * transaction to dip into the reserve blocks to ensure
+ * the freeing of the space succeeds at ENOSPC.
+ */
+ tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
+
+ /*
+ * check for running out of space
+ */
+ if (error) {
+ /*
+ * Free the transaction structure.
+ */
+ ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+ goto out_cancel;
+ }
+ xfs_ilock(dest, XFS_ILOCK_EXCL);
+ error = xfs_trans_reserve_quota(tp, mp,
+ dest->i_udquot, dest->i_gdquot, dest->i_pdquot,
+ resblks, 0, XFS_QMOPT_RES_REGBLKS);
+ if (error)
+ goto out_cancel;
+
+ xfs_trans_ijoin(tp, dest, XFS_ILOCK_EXCL);
+
+ /*
+ * issue the bunmapi() call to free the blocks
+ */
+ xfs_bmap_init(&free_list, &firstfsb);
+ error = xfs_bunmapi(tp, dest, off, len,
+ 0, 2, &firstfsb, &free_list, &done);
+ if (error)
+ goto out_freelist;
+
+ /*
+ * complete the transaction
+ */
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_freelist;
+
+ error = xfs_trans_commit(tp);
+ }
+ if (error)
+ goto out_error;
+
+ return error;
+out_freelist:
+ xfs_bmap_cancel(&free_list);
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_punch_range_error(dest, error, _RET_IP_);
+ return error;
+}
+
+/*
+ * Reflink a contiguous range of blocks.
+ */
+STATIC int
+remap_one_range(
+ struct xfs_inode *dest,
+ struct xfs_bmbt_irec *imap,
+ xfs_fileoff_t destoff)
+{
+ struct xfs_mount *mp = dest->i_mount;
+ int error;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ struct xfs_trans *tp;
+ uint resblks;
+ struct xfs_buf *agbp;
+ xfs_fsblock_t firstfsb;
+ struct xfs_bmap_free free_list;
+ struct xfs_bmbt_irec imap_tmp;
+ int nimaps;
+ int committed;
+
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, 1);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
+ /*
+ * Check for running out of space
+ */
+ if (error) {
+ /*
+ * Free the transaction structure.
+ */
+ ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+ goto out_cancel;
+ }
+
+ xfs_ilock(dest, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, dest, XFS_ILOCK_EXCL);
+
+ /* Update the refcount tree */
+ agno = XFS_FSB_TO_AGNO(mp, imap->br_startblock);
+ agbno = XFS_FSB_TO_AGBNO(mp, imap->br_startblock);
+ error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+ if (error)
+ goto out_cancel;
+ xfs_bmap_init(&free_list, &firstfsb);
+ error = xfs_refcount_increase(mp, tp, agbp, agno, agbno,
+ imap->br_blockcount, &free_list);
+ xfs_trans_brelse(tp, agbp);
+ if (error)
+ goto out_freelist;
+
+ /* Add this extent to the destination file */
+ trace_xfs_reflink_remap_range(dest, destoff, imap->br_blockcount,
+ imap->br_startblock);
+ nimaps = 1;
+ error = xfs_bmapi_write(tp, dest, destoff, imap->br_blockcount,
+ XFS_BMAPI_REMAP, &imap->br_startblock,
+ imap->br_blockcount, &imap_tmp, &nimaps,
+ &free_list);
+ if (error)
+ goto out_freelist;
+
+ /*
+ * Complete the transaction
+ */
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out_freelist;
+
+ error = xfs_trans_commit(tp);
+ if (error)
+ goto out_error;
+ return error;
+
+out_freelist:
+ xfs_bmap_cancel(&free_list);
+out_cancel:
+ xfs_trans_cancel(tp);
+out_error:
+ trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_);
+ return error;
+}
+
+/**
+ * Iteratively remap one file's extents (and holes) to another's.
+ */
+#define IMAPNEXT(i) ((i).br_startoff + (i).br_blockcount)
+STATIC int
+remap_blocks(
+ struct xfs_inode *src,
+ xfs_fileoff_t srcoff,
+ struct xfs_inode *dest,
+ xfs_fileoff_t destoff,
+ xfs_filblks_t len)
+{
+ struct xfs_bmbt_irec imap;
+ int nimaps;
+ int error;
+ xfs_fileoff_t srcioff;
+
+ /* drange = (destoff, destoff + len); srange = (srcoff, srcoff + len) */
+ while (len) {
+ trace_xfs_reflink_main_loop(src, srcoff, len, dest, destoff);
+ /* Read extent from the source file */
+ nimaps = 1;
+ xfs_ilock(src, XFS_ILOCK_EXCL);
+ error = xfs_bmapi_read(src, srcoff, len, &imap, &nimaps, 0);
+ xfs_iunlock(src, XFS_ILOCK_EXCL);
+ if (error)
+ break;
+
+ /*
+ * If imap doesn't exist, pretend that it does just past
+ * srange.
+ */
+ if (nimaps == 0) {
+ imap.br_startoff = srcoff + len;
+ imap.br_startblock = HOLESTARTBLOCK;
+ imap.br_blockcount = 0;
+ imap.br_state = XFS_EXT_INVALID;
+ }
+ trace_xfs_reflink_read_iomap(src, srcoff, len, XFS_IO_FORKED,
+ &imap);
+
+ /* If imap starts before srange, advance it to start there */
+ if (imap.br_startoff < srcoff) {
+ imap.br_blockcount -= srcoff - imap.br_startoff;
+ imap.br_startoff = srcoff;
+ }
+
+ /* If imap ends after srange, truncate it to match srange */
+ if (IMAPNEXT(imap) > srcoff + len)
+ imap.br_blockcount -= IMAPNEXT(imap) - (srcoff + len);
+
+ srcioff = imap.br_startoff - srcoff;
+
+ /* Punch logical blocks from drange */
+ error = punch_range(dest, destoff,
+ srcioff + imap.br_blockcount);
+ if (error)
+ break;
+
+ /*
+ * If imap points to real blocks, increase refcount and map;
+ * otherwise, skip it.
+ */
+ if (imap.br_startblock == HOLESTARTBLOCK ||
+ imap.br_startblock == DELAYSTARTBLOCK ||
+ ISUNWRITTEN(&imap))
+ goto advloop;
+
+ error = remap_one_range(dest, &imap, destoff + srcioff);
+ if (error)
+ break;
+advloop:
+ /* Advance drange/srange */
+ srcoff += srcioff + imap.br_blockcount;
+ destoff += srcioff + imap.br_blockcount;
+ len -= srcioff + imap.br_blockcount;
+ }
+
+ return error;
+}
+#undef IMAPNEXT
+
+/**
+ * xfs_reflink() - link a range of blocks from one inode to another
+ *
+ * @src: Inode to clone from
+ * @srcoff: Offset within source to start clone from
+ * @dest: Inode to clone to
+ * @destoff: Offset within @dest to start clone
+ * @len: Original length, passed by user, of range to clone
+ */
+int
+xfs_reflink(
+ struct xfs_inode *src,
+ xfs_off_t srcoff,
+ struct xfs_inode *dest,
+ xfs_off_t destoff,
+ xfs_off_t len)
+{
+ struct xfs_mount *mp = src->i_mount;
+ xfs_fileoff_t sfsbno, dfsbno;
+ xfs_filblks_t fsblen;
+ int error;
+
+ if (!xfs_sb_version_hasreflink(&mp->m_sb))
+ return -EOPNOTSUPP;
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return -EIO;
+
+ /* Don't reflink realtime inodes */
+ if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
+ return -EINVAL;
+
+ trace_xfs_reflink_range(src, srcoff, len, dest, destoff);
+
+ /* Lock both files against IO */
+ if (src->i_ino == dest->i_ino) {
+ xfs_ilock(src, XFS_IOLOCK_EXCL);
+ xfs_ilock(src, XFS_MMAPLOCK_EXCL);
+ } else {
+ xfs_lock_two_inodes(src, dest, XFS_IOLOCK_EXCL);
+ xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
+ }
+
+ error = set_inode_reflink_flag(src, dest);
+ if (error)
+ goto out_error;
+
+ dfsbno = XFS_B_TO_FSBT(mp, destoff);
+ sfsbno = XFS_B_TO_FSBT(mp, srcoff);
+ fsblen = XFS_B_TO_FSB(mp, len);
+ error = remap_blocks(src, sfsbno, dest, dfsbno, fsblen);
+ if (error)
+ goto out_error;
+
+ error = update_dest_isize(dest, destoff + len);
+
+out_error:
+ xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
+ xfs_iunlock(src, XFS_IOLOCK_EXCL);
+ if (src->i_ino != dest->i_ino) {
+ xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
+ xfs_iunlock(dest, XFS_IOLOCK_EXCL);
+ }
+ if (error)
+ trace_xfs_reflink_range_error(dest, error, _RET_IP_);
+ return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index ce00cf6..b633824 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -44,4 +44,7 @@ extern int xfs_reflink_finish_fork_buf(struct xfs_inode *ip, struct xfs_buf *bp,
xfs_fileoff_t fileoff, struct xfs_trans *tp, int write_error,
xfs_fsblock_t old_fsbno);
+extern int xfs_reflink(struct xfs_inode *src, xfs_off_t srcoff,
+ struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len);
+
#endif /* __XFS_REFLINK_H */
* [PATCH 51/58] xfs: add clone file and clone range ioctls
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (49 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 50/58] xfs: reflink extents from one file to another Darrick J. Wong
@ 2015-10-07 5:00 ` Darrick J. Wong
2015-10-07 5:13 ` kbuild test robot
` (2 more replies)
2015-10-07 5:00 ` [PATCH 52/58] xfs: emulate the btrfs dedupe extent same ioctl Darrick J. Wong
` (6 subsequent siblings)
57 siblings, 3 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Define two ioctls which allow userspace to reflink a range of blocks
between two files or to reflink one file's contents to another.
These ioctls must have the same ABI as the btrfs ioctls with similar
names.
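
For illustration only, a userspace caller of the range ioctl might look
roughly like the sketch below. It assumes the structure layout and ioctl
numbers defined in this patch; clone_range() is a hypothetical helper name
and not part of the patch. The ioctl is issued on the destination file
descriptor, and the source descriptor travels inside the argument structure.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Layout copied from this patch; it mirrors btrfs_ioctl_clone_range_args. */
struct xfs_clone_args {
	int64_t  src_fd;
	uint64_t src_offset;
	uint64_t src_length;
	uint64_t dest_offset;
};

#define XFS_IOC_CLONE        _IOW(0x94, 9, int)
#define XFS_IOC_CLONE_RANGE  _IOW(0x94, 13, struct xfs_clone_args)

/*
 * Hypothetical helper: share len bytes of src_fd, starting at src_off,
 * into dest_fd at dest_off.  In this patch a length of zero is treated
 * as "up to the end of the source file".
 */
static int clone_range(int dest_fd, int src_fd, uint64_t src_off,
		       uint64_t len, uint64_t dest_off)
{
	struct xfs_clone_args args = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dest_off,
	};

	/* Returns 0 on success, -1 with errno set on failure. */
	return ioctl(dest_fd, XFS_IOC_CLONE_RANGE, &args);
}
```

Per the checks in xfs_ioctl_reflink(), offsets and the end of the range
should be block aligned unless the range runs to the end of the source file.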
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_fs.h | 11 +++
fs/xfs/xfs_ioctl.c | 192 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_ioctl32.c | 2 +
3 files changed, 205 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b6ee5d8..2c8cd04 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -561,6 +561,17 @@ typedef struct xfs_swapext
#define XFS_IOC_GOINGDOWN _IOR ('X', 125, __uint32_t)
/* XFS_IOC_GETFSUUID ---------- deprecated 140 */
+/* reflink ioctls; these MUST match the btrfs ioctl definitions */
+/* from struct btrfs_ioctl_clone_range_args */
+struct xfs_clone_args {
+ __s64 src_fd;
+ __u64 src_offset;
+ __u64 src_length;
+ __u64 dest_offset;
+};
+
+#define XFS_IOC_CLONE _IOW (0x94, 9, int)
+#define XFS_IOC_CLONE_RANGE _IOW (0x94, 13, struct xfs_clone_args)
#ifndef HAVE_BBMACROS
/*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ea7d85a..ce4812e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -40,6 +40,7 @@
#include "xfs_symlink.h"
#include "xfs_trans.h"
#include "xfs_pnfs.h"
+#include "xfs_reflink.h"
#include <linux/capability.h>
#include <linux/dcache.h>
@@ -48,6 +49,8 @@
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/exportfs.h>
+#include <linux/fsnotify.h>
+#include <linux/security.h>
/*
* xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
@@ -1503,6 +1506,153 @@ xfs_ioc_swapext(
}
/*
+ * Flush all file writes out to disk.
+ */
+static int
+wait_for_io(
+ struct inode *inode,
+ loff_t offset,
+ size_t len)
+{
+ loff_t rounding;
+ loff_t ioffset;
+ loff_t iendoffset;
+ loff_t bs;
+ int ret;
+
+ bs = inode->i_sb->s_blocksize;
+ inode_dio_wait(inode);
+
+ rounding = max_t(xfs_off_t, bs, PAGE_CACHE_SIZE);
+ ioffset = round_down(offset, rounding);
+ iendoffset = round_up(offset + len, rounding) - 1;
+ ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
+ iendoffset);
+ return ret;
+}
+
+/*
+ * For reflink, validate the VFS parameters, convert them into the XFS
+ * equivalents, and then call the internal reflink function.
+ */
+STATIC int
+xfs_ioctl_reflink(
+ struct file *file_in,
+ loff_t pos_in,
+ struct file *file_out,
+ loff_t pos_out,
+ size_t len)
+{
+ struct inode *inode_in;
+ struct inode *inode_out;
+ ssize_t ret;
+ loff_t bs;
+ loff_t isize;
+ int same_inode;
+ loff_t blen;
+
+ if (len == 0)
+ return 0;
+ else if (len != ~0ULL && (ssize_t)len < 0)
+ return -EINVAL;
+
+ /* Do we have the correct permissions? */
+ if (!(file_in->f_mode & FMODE_READ) ||
+ !(file_out->f_mode & FMODE_WRITE) ||
+ (file_out->f_flags & O_APPEND))
+ return -EPERM;
+ ret = security_file_permission(file_out, MAY_WRITE);
+ if (ret)
+ return ret;
+
+ inode_in = file_inode(file_in);
+ inode_out = file_inode(file_out);
+ bs = inode_out->i_sb->s_blocksize;
+
+ /* Don't touch certain kinds of inodes */
+ if (IS_IMMUTABLE(inode_out))
+ return -EPERM;
+ if (IS_SWAPFILE(inode_in) ||
+ IS_SWAPFILE(inode_out))
+ return -ETXTBSY;
+
+ /* Reflink only works within this filesystem. */
+ if (inode_in->i_sb != inode_out->i_sb ||
+ file_in->f_path.mnt != file_out->f_path.mnt)
+ return -EXDEV;
+ same_inode = (inode_in->i_ino == inode_out->i_ino);
+
+ /* Don't reflink dirs, pipes, sockets... */
+ if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
+ return -EISDIR;
+ if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
+ return -ESPIPE;
+ if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
+ return -EINVAL;
+
+ /* Are we going all the way to the end? */
+ isize = i_size_read(inode_in);
+ if (isize == 0)
+ return 0;
+ if (len == ~0ULL)
+ len = isize - pos_in;
+
+ /* Ensure offsets don't wrap and the input is inside i_size */
+ if (pos_in + len < pos_in || pos_out + len < pos_out ||
+ pos_in + len > isize)
+ return -EINVAL;
+
+ /* If we're linking to EOF, continue to the block boundary. */
+ if (pos_in + len == isize)
+ blen = ALIGN(isize, bs) - pos_in;
+ else
+ blen = len;
+
+ /* Only reflink if we're aligned to block boundaries */
+ if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
+ !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
+ return -EINVAL;
+
+ /* Don't allow overlapped reflink within the same file */
+ if (same_inode && pos_out + blen > pos_in && pos_out < pos_in + blen)
+ return -EINVAL;
+
+ ret = mnt_want_write_file(file_out);
+ if (ret)
+ return ret;
+
+ /* Wait for the completion of any pending IOs on both files */
+ ret = wait_for_io(inode_in, pos_in, len);
+ if (ret)
+ goto out_unlock;
+ ret = wait_for_io(inode_out, pos_out, len);
+ if (ret)
+ goto out_unlock;
+
+ ret = xfs_reflink(XFS_I(inode_in), pos_in, XFS_I(inode_out),
+ pos_out, len);
+ if (ret < 0)
+ goto out_unlock;
+
+ /* Truncate the page cache so we don't see stale data */
+ truncate_inode_pages_range(&inode_out->i_data, pos_out,
+ PAGE_CACHE_ALIGN(pos_out + len) - 1);
+
+out_unlock:
+ if (ret == 0) {
+ fsnotify_access(file_in);
+ add_rchar(current, len);
+ fsnotify_modify(file_out);
+ add_wchar(current, len);
+ }
+ inc_syscr(current);
+ inc_syscw(current);
+
+ mnt_drop_write_file(file_out);
+ return ret;
+}
+
+/*
* Note: some of the ioctl's return positive numbers as a
* byte count indicating success, such as readlink_by_handle.
* So we don't "sign flip" like most other routines. This means
@@ -1800,6 +1950,48 @@ xfs_file_ioctl(
return xfs_icache_free_eofblocks(mp, &keofb);
}
+ case XFS_IOC_CLONE: {
+ struct fd src;
+
+ src = fdget(p);
+ if (!src.file)
+ return -EBADF;
+
+ trace_xfs_ioctl_clone(file_inode(src.file), file_inode(filp));
+
+ error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL);
+ fdput(src);
+ if (error > 0)
+ error = 0;
+
+ return error;
+ }
+
+ case XFS_IOC_CLONE_RANGE: {
+ struct fd src;
+ struct xfs_clone_args args;
+
+ if (copy_from_user(&args, arg, sizeof(args)))
+ return -EFAULT;
+ src = fdget(args.src_fd);
+ if (!src.file)
+ return -EBADF;
+ if (args.src_length == 0)
+ args.src_length = ~0ULL;
+
+ trace_xfs_ioctl_clone_range(file_inode(src.file),
+ args.src_offset, args.src_length,
+ file_inode(filp), args.dest_offset);
+
+ error = xfs_ioctl_reflink(src.file, args.src_offset, filp,
+ args.dest_offset, args.src_length);
+ fdput(src);
+ if (error > 0)
+ error = 0;
+
+ return error;
+ }
+
default:
return -ENOTTY;
}
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index b88bdc8..76d8729 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -558,6 +558,8 @@ xfs_file_compat_ioctl(
case XFS_IOC_GOINGDOWN:
case XFS_IOC_ERROR_INJECTION:
case XFS_IOC_ERROR_CLEARALL:
+ case XFS_IOC_CLONE:
+ case XFS_IOC_CLONE_RANGE:
return xfs_file_ioctl(filp, cmd, p);
#ifndef BROKEN_X86_ALIGNMENT
/* These are handled fine if no alignment issues */
* [PATCH 52/58] xfs: emulate the btrfs dedupe extent same ioctl
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (50 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 51/58] xfs: add clone file and clone range ioctls Darrick J. Wong
@ 2015-10-07 5:00 ` Darrick J. Wong
2015-10-07 5:00 ` [PATCH 53/58] xfs: teach fiemap about reflink'd extents Darrick J. Wong
` (5 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Emulate the BTRFS_IOC_EXTENT_SAME ioctl. This operation is similar
to clone_range, but the kernel must confirm that the contents of the
two extents are identical before performing the reflink.
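
As a rough userspace analogy of the comparison step (a sketch only, not the
kernel code): the two ranges are walked in chunks that never cross a page
boundary on either side, and the walk stops at the first mismatch. The page
size constant and both function names are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Bytes that can be compared before either offset crosses a page boundary. */
static size_t cmp_chunk_len(size_t srcoff, size_t destoff, size_t len)
{
	size_t src_left = SKETCH_PAGE_SIZE - (srcoff & (SKETCH_PAGE_SIZE - 1));
	size_t dest_left = SKETCH_PAGE_SIZE - (destoff & (SKETCH_PAGE_SIZE - 1));
	size_t n = src_left < dest_left ? src_left : dest_left;

	return n < len ? n : len;
}

/* Returns 1 if the two ranges hold identical bytes, 0 otherwise. */
static int ranges_match(const unsigned char *src, size_t srcoff,
			const unsigned char *dest, size_t destoff, size_t len)
{
	while (len) {
		size_t n = cmp_chunk_len(srcoff, destoff, len);

		if (memcmp(src + srcoff, dest + destoff, n))
			return 0;	/* first difference ends the walk */
		srcoff += n;
		destoff += n;
		len -= n;
	}
	return 1;
}
```

In the kernel version each chunk comes from a page cache page rather than a
flat buffer, which is why the chunk length has to respect page boundaries of
both files at once.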
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_fs.h | 30 ++++++++++++
fs/xfs/xfs_ioctl.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++--
fs/xfs/xfs_ioctl32.c | 1
fs/xfs/xfs_reflink.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.h | 6 ++
5 files changed, 275 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2c8cd04..c63afd4 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -570,8 +570,38 @@ struct xfs_clone_args {
__u64 dest_offset;
};
+/* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
+#define XFS_EXTENT_DATA_SAME 0
+#define XFS_EXTENT_DATA_DIFFERS 1
+
+/* from struct btrfs_ioctl_file_extent_same_info */
+struct xfs_extent_data_info {
+ __s64 fd; /* in - destination file */
+ __u64 logical_offset; /* in - start of extent in destination */
+ __u64 bytes_deduped; /* out - total # of bytes we were able
+ * to dedupe from this file */
+ /* status of this dedupe operation:
+ * 0 if dedup succeeds
+ * < 0 for error
+ * == XFS_EXTENT_DATA_DIFFERS if data differs
+ */
+ __s32 status; /* out - see above description */
+ __u32 reserved;
+};
+
+/* from struct btrfs_ioctl_file_extent_same_args */
+struct xfs_extent_data {
+ __u64 logical_offset; /* in - start of extent in source */
+ __u64 length; /* in - length of extent */
+ __u16 dest_count; /* in - total elements in info array */
+ __u16 reserved1;
+ __u32 reserved2;
+ struct xfs_extent_data_info info[0];
+};
+
#define XFS_IOC_CLONE _IOW (0x94, 9, int)
#define XFS_IOC_CLONE_RANGE _IOW (0x94, 13, struct xfs_clone_args)
+#define XFS_IOC_FILE_EXTENT_SAME _IOWR(0x94, 54, struct xfs_extent_data)
#ifndef HAVE_BBMACROS
/*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index ce4812e..50ea19e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1541,7 +1541,8 @@ xfs_ioctl_reflink(
loff_t pos_in,
struct file *file_out,
loff_t pos_out,
- size_t len)
+ size_t len,
+ bool is_dedupe)
{
struct inode *inode_in;
struct inode *inode_out;
@@ -1550,6 +1551,7 @@ xfs_ioctl_reflink(
loff_t isize;
int same_inode;
loff_t blen;
+ unsigned int flags;
if (len == 0)
return 0;
@@ -1629,8 +1631,12 @@ xfs_ioctl_reflink(
if (ret)
goto out_unlock;
+ flags = 0;
+ if (is_dedupe)
+ flags |= XFS_REFLINK_DEDUPE;
+
ret = xfs_reflink(XFS_I(inode_in), pos_in, XFS_I(inode_out),
- pos_out, len);
+ pos_out, len, flags);
if (ret < 0)
goto out_unlock;
@@ -1652,6 +1658,112 @@ out_unlock:
return ret;
}
+#define XFS_MAX_DEDUPE_LEN (16 * 1024 * 1024)
+
+static long
+xfs_ioctl_file_extent_same(
+ struct file *file,
+ struct xfs_extent_data __user *argp)
+{
+ struct xfs_extent_data *same;
+ struct xfs_extent_data_info *info;
+ struct inode *src;
+ u64 off;
+ u64 len;
+ int i;
+ int ret;
+ unsigned long size;
+ bool is_admin;
+ u16 count;
+
+ is_admin = capable(CAP_SYS_ADMIN);
+ src = file_inode(file);
+ if (!(file->f_mode & FMODE_READ))
+ return -EINVAL;
+
+ if (get_user(count, &argp->dest_count)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ size = offsetof(struct xfs_extent_data __user,
+ info[count]);
+
+ same = memdup_user(argp, size);
+
+ if (IS_ERR(same)) {
+ ret = PTR_ERR(same);
+ goto out;
+ }
+
+ off = same->logical_offset;
+ len = same->length;
+
+ /*
+ * Limit the total length we will dedupe for each operation.
+ * This is intended to bound the total time spent in this
+ * ioctl to something sane.
+ */
+ if (len > XFS_MAX_DEDUPE_LEN)
+ len = XFS_MAX_DEDUPE_LEN;
+
+ ret = -EISDIR;
+ if (S_ISDIR(src->i_mode))
+ goto out;
+
+ ret = -EACCES;
+ if (!S_ISREG(src->i_mode))
+ goto out;
+
+ /* pre-format output fields to sane values */
+ for (i = 0; i < count; i++) {
+ same->info[i].bytes_deduped = 0ULL;
+ same->info[i].status = 0;
+ }
+
+ for (i = 0, info = same->info; i < count; i++, info++) {
+ struct inode *dst;
+ struct fd dst_file = fdget(info->fd);
+
+ if (!dst_file.file) {
+ info->status = -EBADF;
+ continue;
+ }
+ dst = file_inode(dst_file.file);
+
+ trace_xfs_ioctl_file_extent_same(file_inode(file), off, len,
+ dst, info->logical_offset);
+
+ info->bytes_deduped = 0;
+ if (!(is_admin || (dst_file.file->f_mode & FMODE_WRITE))) {
+ info->status = -EINVAL;
+ } else if (file->f_path.mnt != dst_file.file->f_path.mnt) {
+ info->status = -EXDEV;
+ } else if (S_ISDIR(dst->i_mode)) {
+ info->status = -EISDIR;
+ } else if (!S_ISREG(dst->i_mode)) {
+ info->status = -EACCES;
+ } else {
+ info->status = xfs_ioctl_reflink(file, off,
+ dst_file.file,
+ info->logical_offset,
+ len, true);
+ if (info->status == -EBADE)
+ info->status = XFS_EXTENT_DATA_DIFFERS;
+ else if (info->status == 0)
+ info->bytes_deduped = len;
+ }
+ fdput(dst_file);
+ }
+
+ ret = copy_to_user(argp, same, size);
+ if (ret)
+ ret = -EFAULT;
+
+out:
+ return ret;
+}
+
/*
* Note: some of the ioctl's return positive numbers as a
* byte count indicating success, such as readlink_by_handle.
@@ -1959,7 +2071,7 @@ xfs_file_ioctl(
trace_xfs_ioctl_clone(file_inode(src.file), file_inode(filp));
- error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL);
+ error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL, false);
fdput(src);
if (error > 0)
error = 0;
@@ -1984,7 +2096,8 @@ xfs_file_ioctl(
file_inode(filp), args.dest_offset);
error = xfs_ioctl_reflink(src.file, args.src_offset, filp,
- args.dest_offset, args.src_length);
+ args.dest_offset, args.src_length,
+ false);
fdput(src);
if (error > 0)
error = 0;
@@ -1992,6 +2105,9 @@ xfs_file_ioctl(
return error;
}
+ case XFS_IOC_FILE_EXTENT_SAME:
+ return xfs_ioctl_file_extent_same(filp, arg);
+
default:
return -ENOTTY;
}
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 76d8729..575c292 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -560,6 +560,7 @@ xfs_file_compat_ioctl(
case XFS_IOC_ERROR_CLEARALL:
case XFS_IOC_CLONE:
case XFS_IOC_CLONE_RANGE:
+ case XFS_IOC_FILE_EXTENT_SAME:
return xfs_file_ioctl(filp, cmd, p);
#ifndef BROKEN_X86_ALIGNMENT
/* These are handled fine if no alignment issues */
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index ac81b02..dee3556 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1386,6 +1386,103 @@ advloop:
}
#undef IMAPNEXT
+/*
+ * Read a page's worth of file data into the page cache.
+ */
+STATIC struct page *
+xfs_get_page(
+ struct inode *inode, /* inode */
+ xfs_off_t offset) /* where in the inode to read */
+{
+ struct address_space *mapping;
+ struct page *page;
+ pgoff_t n;
+
+ n = offset >> PAGE_CACHE_SHIFT;
+ mapping = inode->i_mapping;
+ page = read_mapping_page(mapping, n, NULL);
+ if (IS_ERR(page))
+ return NULL;
+ if (!PageUptodate(page)) {
+ page_cache_release(page);
+ return NULL;
+ }
+ return page;
+}
+
+/*
+ * Compare extents of two files to see if they are the same.
+ */
+STATIC int
+xfs_compare_extents(
+ struct inode *src, /* first inode */
+ xfs_off_t srcoff, /* offset of first inode */
+ struct inode *dest, /* second inode */
+ xfs_off_t destoff, /* offset of second inode */
+ xfs_off_t len, /* length of data to compare */
+ bool *is_same) /* out: true if the contents match */
+{
+ xfs_off_t src_poff;
+ xfs_off_t dest_poff;
+ void *src_addr;
+ void *dest_addr;
+ struct page *src_page;
+ struct page *dest_page;
+ xfs_off_t cmp_len;
+ bool same;
+ int error;
+
+ error = -EINVAL;
+ same = true;
+ while (len) {
+ src_poff = srcoff & (PAGE_CACHE_SIZE - 1);
+ dest_poff = destoff & (PAGE_CACHE_SIZE - 1);
+ cmp_len = min(PAGE_CACHE_SIZE - src_poff,
+ PAGE_CACHE_SIZE - dest_poff);
+ cmp_len = min(cmp_len, len);
+ ASSERT(cmp_len > 0);
+
+ trace_xfs_reflink_compare_extents(XFS_I(src), srcoff, cmp_len,
+ XFS_I(dest), destoff);
+
+ src_page = xfs_get_page(src, srcoff);
+ if (!src_page)
+ goto out_error;
+ dest_page = xfs_get_page(dest, destoff);
+ if (!dest_page) {
+ page_cache_release(src_page);
+ goto out_error;
+ }
+ src_addr = kmap_atomic(src_page);
+ dest_addr = kmap_atomic(dest_page);
+
+ flush_dcache_page(src_page);
+ flush_dcache_page(dest_page);
+
+ if (memcmp(src_addr + src_poff, dest_addr + dest_poff, cmp_len))
+ same = false;
+
+ kunmap_atomic(dest_addr);
+ kunmap_atomic(src_addr);
+ page_cache_release(src_page);
+ page_cache_release(dest_page);
+
+ if (!same)
+ break;
+
+ srcoff += cmp_len;
+ destoff += cmp_len;
+ len -= cmp_len;
+ }
+
+ *is_same = same;
+ return 0;
+
+out_error:
+ trace_xfs_reflink_compare_extents_error(XFS_I(dest), error, _RET_IP_);
+ return error;
+}
+
/**
* xfs_reflink() - link a range of blocks from one inode to another
*
@@ -1394,6 +1491,7 @@ advloop:
* @dest: Inode to clone to
 * @destoff: Offset within @dest to start clone
* @len: Original length, passed by user, of range to clone
+ * @flags: Flags to modify reflink's behavior
*/
int
xfs_reflink(
@@ -1401,12 +1499,14 @@ xfs_reflink(
xfs_off_t srcoff,
struct xfs_inode *dest,
xfs_off_t destoff,
- xfs_off_t len)
+ xfs_off_t len,
+ unsigned int flags)
{
struct xfs_mount *mp = src->i_mount;
xfs_fileoff_t sfsbno, dfsbno;
xfs_filblks_t fsblen;
int error;
+ bool is_same;
if (!xfs_sb_version_hasreflink(&mp->m_sb))
return -EOPNOTSUPP;
@@ -1418,6 +1518,9 @@ xfs_reflink(
if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
return -EINVAL;
+ if (flags & ~XFS_REFLINK_ALL)
+ return -EINVAL;
+
trace_xfs_reflink_range(src, srcoff, len, dest, destoff);
/* Lock both files against IO */
@@ -1429,6 +1532,21 @@ xfs_reflink(
xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
}
+ /*
+ * Check that the extents are the same.
+ */
+ if (flags & XFS_REFLINK_DEDUPE) {
+ is_same = false;
+ error = xfs_compare_extents(VFS_I(src), srcoff, VFS_I(dest),
+ destoff, len, &is_same);
+ if (error)
+ goto out_error;
+ if (!is_same) {
+ error = -EBADE;
+ goto out_error;
+ }
+ }
+
error = set_inode_reflink_flag(src, dest);
if (error)
goto out_error;
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index b633824..c60a9bd 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -44,7 +44,11 @@ extern int xfs_reflink_finish_fork_buf(struct xfs_inode *ip, struct xfs_buf *bp,
xfs_fileoff_t fileoff, struct xfs_trans *tp, int write_error,
xfs_fsblock_t old_fsbno);
+#define XFS_REFLINK_DEDUPE 1 /* only reflink if contents match */
+#define XFS_REFLINK_ALL (XFS_REFLINK_DEDUPE)
+
extern int xfs_reflink(struct xfs_inode *src, xfs_off_t srcoff,
- struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len);
+ struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len,
+ unsigned int flags);
#endif /* __XFS_REFLINK_H */
* [PATCH 53/58] xfs: teach fiemap about reflink'd extents
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (51 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 52/58] xfs: emulate the btrfs dedupe extent same ioctl Darrick J. Wong
@ 2015-10-07 5:00 ` Darrick J. Wong
2015-10-07 5:01 ` [PATCH 54/58] xfs: swap inode reflink flags when swapping inode extents Darrick J. Wong
` (4 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:00 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Teach FIEMAP to report shared (i.e. reflinked) extents.
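
From userspace, the new flag could be observed with something like the
following sketch. It uses the standard FS_IOC_FIEMAP interface; the helper
name count_shared_extents() and the batch size are assumptions made for this
example, and only the first batch of extents is examined.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define SKETCH_MAX_EXTENTS 32	/* arbitrary batch size for this sketch */

/*
 * Hypothetical helper: return how many of the first SKETCH_MAX_EXTENTS
 * extents of fd carry FIEMAP_EXTENT_SHARED, or -1 on error.
 */
static int count_shared_extents(int fd)
{
	size_t sz = sizeof(struct fiemap) +
		    SKETCH_MAX_EXTENTS * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	int shared = 0;
	unsigned int i;

	if (!fm)
		return -1;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_extent_count = SKETCH_MAX_EXTENTS;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		free(fm);
		return -1;
	}
	for (i = 0; i < fm->fm_mapped_extents; i++)
		if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_SHARED)
			shared++;
	free(fm);
	return shared;
}
```

Because the formatter in this patch splits a mapping wherever the refcount
changes, one file extent may show up as several FIEMAP records, some shared
and some not.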
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_bmap_util.c | 2 +-
fs/xfs/xfs_bmap_util.h | 3 ++
fs/xfs/xfs_ioctl.c | 12 ++++++++-
fs/xfs/xfs_iops.c | 62 +++++++++++++++++++++++++++++++++++++++---------
4 files changed, 63 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index b20d136..f161db1 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -698,7 +698,7 @@ xfs_getbmap(
int full = 0; /* user array is full */
/* format results & advance arg */
- error = formatter(&arg, &out[i], &full);
+ error = formatter(ip, &arg, &out[i], &full);
if (error || full)
break;
}
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index af97d9a..d0dc504 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -37,7 +37,8 @@ int xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
xfs_fileoff_t start_fsb, xfs_fileoff_t length);
/* bmap to userspace formatter - copy to user & advance pointer */
-typedef int (*xfs_bmap_format_t)(void **, struct getbmapx *, int *);
+typedef int (*xfs_bmap_format_t)(struct xfs_inode *, void **, struct getbmapx *,
+ int *);
int xfs_getbmap(struct xfs_inode *ip, struct getbmapx *bmv,
xfs_bmap_format_t formatter, void *arg);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 50ea19e..92aaca0 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1352,7 +1352,11 @@ out_drop_write:
}
STATIC int
-xfs_getbmap_format(void **ap, struct getbmapx *bmv, int *full)
+xfs_getbmap_format(
+ struct xfs_inode *ip,
+ void **ap,
+ struct getbmapx *bmv,
+ int *full)
{
struct getbmap __user *base = (struct getbmap __user *)*ap;
@@ -1396,7 +1400,11 @@ xfs_ioc_getbmap(
}
STATIC int
-xfs_getbmapx_format(void **ap, struct getbmapx *bmv, int *full)
+xfs_getbmapx_format(
+ struct xfs_inode *ip,
+ void **ap,
+ struct getbmapx *bmv,
+ int *full)
{
struct getbmapx __user *base = (struct getbmapx __user *)*ap;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8294132..5eeed1b 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -38,6 +38,8 @@
#include "xfs_dir2.h"
#include "xfs_trans_space.h"
#include "xfs_pnfs.h"
+#include "xfs_bit.h"
+#include "xfs_reflink.h"
#include <linux/capability.h>
#include <linux/xattr.h>
@@ -1014,14 +1016,21 @@ xfs_vn_update_time(
*/
STATIC int
xfs_fiemap_format(
+ struct xfs_inode *ip,
void **arg,
struct getbmapx *bmv,
int *full)
{
- int error;
+ int error = 0;
struct fiemap_extent_info *fieinfo = *arg;
u32 fiemap_flags = 0;
- u64 logical, physical, length;
+ u64 logical, physical, length, loop_len, len;
+ xfs_extlen_t elen;
+ xfs_nlink_t nr;
+ xfs_fsblock_t fsbno;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ struct xfs_mount *mp = ip->i_mount;
/* Do nothing for a hole */
if (bmv->bmv_block == -1LL)
@@ -1029,7 +1038,7 @@ xfs_fiemap_format(
logical = BBTOB(bmv->bmv_offset);
physical = BBTOB(bmv->bmv_block);
- length = BBTOB(bmv->bmv_length);
+ length = loop_len = BBTOB(bmv->bmv_length);
if (bmv->bmv_oflags & BMV_OF_PREALLOC)
fiemap_flags |= FIEMAP_EXTENT_UNWRITTEN;
@@ -1038,16 +1047,45 @@ xfs_fiemap_format(
FIEMAP_EXTENT_UNKNOWN);
physical = 0; /* no block yet */
}
- if (bmv->bmv_oflags & BMV_OF_LAST)
- fiemap_flags |= FIEMAP_EXTENT_LAST;
-
- error = fiemap_fill_next_extent(fieinfo, logical, physical,
- length, fiemap_flags);
- if (error > 0) {
- error = 0;
- *full = 1; /* user array now full */
- }
+ while (loop_len > 0) {
+ u32 ext_flags = 0;
+
+ if (bmv->bmv_oflags & BMV_OF_DELALLOC) {
+ physical = 0;
+ len = loop_len;
+ nr = 1;
+ } else if (xfs_is_reflink_inode(ip)) {
+ fsbno = XFS_DADDR_TO_FSB(mp, BTOBB(physical));
+ agno = XFS_FSB_TO_AGNO(mp, fsbno);
+ agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+ error = xfs_reflink_get_refcount(mp, agno, agbno,
+ &elen, &nr);
+ if (error)
+ goto out;
+ len = XFS_FSB_TO_B(mp, elen);
+ if (len == 0 || len > loop_len)
+ len = loop_len;
+ if (nr >= 2)
+ ext_flags |= FIEMAP_EXTENT_SHARED;
+ } else
+ len = loop_len;
+ if ((bmv->bmv_oflags & BMV_OF_LAST) &&
+ len == loop_len)
+ ext_flags |= FIEMAP_EXTENT_LAST;
+
+ error = fiemap_fill_next_extent(fieinfo, logical, physical,
+ len, fiemap_flags | ext_flags);
+ if (error > 0) {
+ error = 0;
+ *full = 1; /* user array now full */
+ goto out;
+ }
+ logical += len;
+ physical += len;
+ loop_len -= len;
+ }
+out:
return error;
}
* [PATCH 54/58] xfs: swap inode reflink flags when swapping inode extents
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (52 preceding siblings ...)
2015-10-07 5:00 ` [PATCH 53/58] xfs: teach fiemap about reflink'd extents Darrick J. Wong
@ 2015-10-07 5:01 ` Darrick J. Wong
2015-10-07 5:01 ` [PATCH 55/58] vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks Darrick J. Wong
` (3 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:01 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
When we're swapping the extents of two inodes, be sure to swap the
reflink inode flag too.
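
The flag exchange amounts to swapping a single bit between two flag words.
A compact way to picture it is the sketch below; the flag value is a
placeholder chosen for the example, not the on-disk XFS_DIFLAG2_REFLINK
value, and swap_flag() is a hypothetical helper.

```c
#include <stdint.h>

#define SKETCH_FLAG_REFLINK (1ULL << 1)	/* placeholder bit for this sketch */

/* Exchange only the bits selected by flag between *a and *b. */
static void swap_flag(uint64_t *a, uint64_t *b, uint64_t flag)
{
	uint64_t diff = (*a ^ *b) & flag;	/* nonzero only if they differ */

	*a ^= diff;
	*b ^= diff;
}
```

When both inodes already agree on the flag, diff is zero and neither word
changes, which matches the "do we have to swap" test in the patch.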
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_bmap_util.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index f161db1..6d7bd6a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1948,6 +1948,18 @@ xfs_swap_extents(
break;
}
+ /* Do we have to swap reflink flags? */
+ if ((ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) ^
+ (tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)) {
+ __uint64_t f;
+
+ f = ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK;
+ ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+ ip->i_d.di_flags2 |= tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK;
+ tip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+ tip->i_d.di_flags2 |= f & XFS_DIFLAG2_REFLINK;
+ }
+
xfs_trans_log_inode(tp, ip, src_log_flags);
xfs_trans_log_inode(tp, tip, target_log_flags);
* [PATCH 55/58] vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (53 preceding siblings ...)
2015-10-07 5:01 ` [PATCH 54/58] xfs: swap inode reflink flags when swapping inode extents Darrick J. Wong
@ 2015-10-07 5:01 ` Darrick J. Wong
2015-10-07 5:01 ` [PATCH 56/58] xfs: unshare a range of blocks via fallocate Darrick J. Wong
` (2 subsequent siblings)
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:01 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Add a new mode to fallocate that unshares a range of blocks.
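
The exclusivity rule enforced in vfs_fallocate() can be restated as a small
predicate. This is a sketch for illustration; unshare_mode_ok() is a
hypothetical name, and the fallback definition uses the value proposed in
this patch.

```c
#ifndef FALLOC_FL_UNSHARE_RANGE
#define FALLOC_FL_UNSHARE_RANGE 0x40	/* value proposed by this patch */
#endif

/*
 * Mirrors the check added to vfs_fallocate(): if the unshare bit is set,
 * no other mode bit may be set alongside it.  Returns 1 if the mode is
 * acceptable as far as this rule is concerned, 0 otherwise.
 */
static int unshare_mode_ok(int mode)
{
	if ((mode & FALLOC_FL_UNSHARE_RANGE) &&
	    (mode & ~FALLOC_FL_UNSHARE_RANGE))
		return 0;
	return 1;
}
```

A caller would therefore pass FALLOC_FL_UNSHARE_RANGE on its own, without
FALLOC_FL_KEEP_SIZE or any other mode bit.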
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/open.c | 5 +++++
include/linux/falloc.h | 3 ++-
include/uapi/linux/falloc.h | 14 ++++++++++++++
3 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/fs/open.c b/fs/open.c
index b6f1e96..58e2b61 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -256,6 +256,11 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
(mode & ~FALLOC_FL_INSERT_RANGE))
return -EINVAL;
+ /* Unshare range should only be used exclusively. */
+ if ((mode & FALLOC_FL_UNSHARE_RANGE) &&
+ (mode & ~FALLOC_FL_UNSHARE_RANGE))
+ return -EINVAL;
+
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
diff --git a/include/linux/falloc.h b/include/linux/falloc.h
index 9961110..7494dc6 100644
--- a/include/linux/falloc.h
+++ b/include/linux/falloc.h
@@ -25,6 +25,7 @@ struct space_resv {
FALLOC_FL_PUNCH_HOLE | \
FALLOC_FL_COLLAPSE_RANGE | \
FALLOC_FL_ZERO_RANGE | \
- FALLOC_FL_INSERT_RANGE)
+ FALLOC_FL_INSERT_RANGE | \
+ FALLOC_FL_UNSHARE_RANGE)
#endif /* _FALLOC_H_ */
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index 3e445a7..a1c293b 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -58,4 +58,18 @@
*/
#define FALLOC_FL_INSERT_RANGE 0x20
+/*
+ * FALLOC_FL_UNSHARE_RANGE is used to unshare shared blocks within the
+ * file size without overwriting any existing data. The purpose of this
+ * call is to preemptively reallocate any blocks that are subject to
+ * copy-on-write.
+ *
+ * Different filesystems may implement different limitations on the
+ * granularity of the operation. Most will limit operations to filesystem
+ * block size boundaries, but this boundary may be larger or smaller
+ * depending on the filesystem and/or the configuration of the filesystem
+ * or file.
+ */
+#define FALLOC_FL_UNSHARE_RANGE 0x40
+
#endif /* _UAPI_FALLOC_H_ */
* [PATCH 56/58] xfs: unshare a range of blocks via fallocate
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (54 preceding siblings ...)
2015-10-07 5:01 ` [PATCH 55/58] vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks Darrick J. Wong
@ 2015-10-07 5:01 ` Darrick J. Wong
2015-10-07 5:01 ` [PATCH 57/58] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems Darrick J. Wong
2015-10-07 5:01 ` [PATCH 58/58] xfs: recognize the reflink feature bit Darrick J. Wong
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:01 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Now that we have an fallocate flag to unshare a range of blocks, make
XFS actually implement it.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_file.c | 11 ++
fs/xfs/xfs_reflink.c | 321 ++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.h | 3
3 files changed, 334 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index fc5b9ea..5756046 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -905,7 +905,7 @@ buffered:
#define XFS_FALLOC_FL_SUPPORTED \
(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \
- FALLOC_FL_INSERT_RANGE)
+ FALLOC_FL_INSERT_RANGE | FALLOC_FL_UNSHARE_RANGE)
STATIC long
xfs_file_fallocate(
@@ -982,6 +982,15 @@ xfs_file_fallocate(
goto out_unlock;
}
do_file_insert = 1;
+ } else if (mode & FALLOC_FL_UNSHARE_RANGE) {
+ if (offset + len > i_size_read(inode)) {
+ error = -EINVAL;
+ goto out_unlock;
+ }
+
+ error = xfs_reflink_unshare(ip, file, offset, len);
+ if (error)
+ goto out_unlock;
} else {
flags |= XFS_PREALLOC_SET;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index dee3556..92d8345 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1571,3 +1571,324 @@ out_error:
trace_xfs_reflink_range_error(dest, error, _RET_IP_);
return error;
}
+
+/**
+ * xfs_reflink_dirty_range() -- Dirty all the shared blocks in the file so that
+ * they're rewritten elsewhere. Similar to generic_perform_write().
+ *
+ * @filp: VFS file pointer
+ * @pos: offset to start dirtying
+ * @len: number of bytes to dirty
+ */
+STATIC int
+xfs_reflink_dirty_range(
+ struct file *filp,
+ xfs_off_t pos,
+ xfs_off_t len)
+{
+ struct address_space *mapping;
+ const struct address_space_operations *a_ops;
+ int error;
+ unsigned int flags;
+ struct page *page;
+ struct page *rpage;
+ unsigned long offset; /* Offset into pagecache page */
+ unsigned long bytes; /* Bytes to write to page */
+ void *fsdata;
+
+ mapping = filp->f_mapping;
+ a_ops = mapping->a_ops;
+ flags = AOP_FLAG_UNINTERRUPTIBLE;
+ do {
+
+ offset = (pos & (PAGE_CACHE_SIZE - 1));
+ bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset, len);
+ rpage = xfs_get_page(file_inode(filp), pos);
+ if (IS_ERR(rpage)) {
+ error = PTR_ERR(rpage);
+ break;
+ } else if (!rpage) {
+ error = -ENOMEM;
+ break;
+ }
+
+ error = a_ops->write_begin(filp, mapping, pos, bytes, flags,
+ &page, &fsdata);
+ page_cache_release(rpage);
+ if (error < 0)
+ break;
+
+ trace_xfs_reflink_unshare_page(file_inode(filp), page,
+ pos, bytes);
+
+ if (!PageUptodate(page)) {
+ pr_err("%s: STALE? ino=%lu pos=%llu\n",
+ __func__, filp->f_inode->i_ino, pos);
+ WARN_ON(1);
+ }
+ if (mapping_writably_mapped(mapping))
+ flush_dcache_page(page);
+
+ error = a_ops->write_end(filp, mapping, pos, bytes, bytes,
+ page, fsdata);
+ if (error < 0)
+ break;
+ else if (error == 0) {
+ error = -EIO;
+ break;
+ } else {
+ bytes = error;
+ error = 0;
+ }
+
+ cond_resched();
+
+ pos += bytes;
+ len -= bytes;
+
+ balance_dirty_pages_ratelimited(mapping);
+ if (fatal_signal_pending(current)) {
+ error = -EINTR;
+ break;
+ }
+ } while (len > 0);
+
+ return error;
+}
+
+/*
+ * The user wants to preemptively CoW all shared blocks in this file,
+ * which enables us to turn off the reflink flag. Iterate all
+ * extents which are not prealloc/delalloc to see which ranges are
+ * mentioned in the refcount tree, then read those blocks into the
+ * pagecache, dirty them, fsync them back out, and then we can update
+ * the inode flag. What happens if we run out of memory? :)
+ */
+STATIC int
+xfs_reflink_dirty_extents(
+ struct xfs_inode *ip,
+ struct file *filp,
+ xfs_fileoff_t fbno,
+ xfs_filblks_t end,
+ xfs_off_t isize)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ xfs_extlen_t rlen;
+ xfs_nlink_t nr;
+ xfs_off_t fpos;
+ xfs_off_t flen;
+ struct xfs_bmbt_irec map[2];
+ int nmaps;
+ int error;
+
+ while (end - fbno > 0) {
+ nmaps = 1;
+ /*
+ * Look for extents in the file. Skip holes, delalloc, or
+ * unwritten extents; they can't be reflinked.
+ */
+ error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
+ if (error)
+ goto out;
+ if (nmaps == 0)
+ break;
+ if (map[0].br_startblock == HOLESTARTBLOCK ||
+ map[0].br_startblock == DELAYSTARTBLOCK ||
+ ISUNWRITTEN(&map[0]))
+ goto next;
+
+ map[1] = map[0];
+ while (map[1].br_blockcount) {
+ agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
+ agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
+ CHECK_AG_NUMBER(mp, agno);
+ CHECK_AG_EXTENT(mp, agbno, 1);
+
+ error = xfs_reflink_get_refcount(mp, agno, agbno,
+ &rlen, &nr);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, rlen != 0, out);
+ if (rlen > map[1].br_blockcount)
+ rlen = map[1].br_blockcount;
+ if (nr < 2)
+ goto skip_copy;
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ fpos = XFS_FSB_TO_B(mp, map[1].br_startoff);
+ flen = XFS_FSB_TO_B(mp, rlen);
+ if (fpos + flen > isize)
+ flen = isize - fpos;
+ error = xfs_reflink_dirty_range(filp, fpos, flen);
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ if (error)
+ goto out;
+skip_copy:
+ map[1].br_blockcount -= rlen;
+ map[1].br_startoff += rlen;
+ map[1].br_startblock += rlen;
+ }
+
+next:
+ fbno = map[0].br_startoff + map[0].br_blockcount;
+ }
+out:
+ return error;
+}
+
+/* Iterate the extents; if there are no reflinked blocks, clear the flag. */
+STATIC int
+xfs_reflink_try_clear_inode_flag(
+ struct xfs_inode *ip,
+ xfs_off_t old_isize)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ xfs_fileoff_t fbno;
+ xfs_filblks_t end;
+ xfs_agnumber_t agno;
+ xfs_agblock_t agbno;
+ xfs_extlen_t rlen;
+ xfs_nlink_t nr;
+ struct xfs_bmbt_irec map[2];
+ int nmaps;
+ int error = 0;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+ if (old_isize != i_size_read(VFS_I(ip)))
+ goto out;
+ if (!(ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK))
+ goto out;
+
+ fbno = 0;
+ end = XFS_B_TO_FSB(mp, old_isize);
+ while (end - fbno > 0) {
+ nmaps = 1;
+ /*
+ * Look for extents in the file. Skip holes, delalloc, or
+ * unwritten extents; they can't be reflinked.
+ */
+ error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
+ if (error)
+ goto out;
+ if (nmaps == 0)
+ break;
+ if (map[0].br_startblock == HOLESTARTBLOCK ||
+ map[0].br_startblock == DELAYSTARTBLOCK ||
+ ISUNWRITTEN(&map[0]))
+ goto next;
+
+ map[1] = map[0];
+ while (map[1].br_blockcount) {
+ agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
+ agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
+ CHECK_AG_NUMBER(mp, agno);
+ CHECK_AG_EXTENT(mp, agbno, 1);
+
+ error = xfs_reflink_get_refcount(mp, agno, agbno,
+ &rlen, &nr);
+ if (error)
+ goto out;
+ XFS_WANT_CORRUPTED_GOTO(mp, rlen != 0, out);
+ if (rlen > map[1].br_blockcount)
+ rlen = map[1].br_blockcount;
+ /* Someone else is reflinking */
+ if (nr >= 2) {
+ error = 0;
+ goto out;
+ }
+
+ map[1].br_blockcount -= rlen;
+ map[1].br_startoff += rlen;
+ map[1].br_startblock += rlen;
+ }
+
+next:
+ fbno = map[0].br_startoff + map[0].br_blockcount;
+ }
+
+ /* No reflinked blocks, so clear the flag */
+ tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+ if (error) {
+ xfs_trans_cancel(tp);
+ goto out;
+ }
+ trace_xfs_reflink_unset_inode_flag(ip);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ error = xfs_trans_commit(tp);
+ if (error) {
+ xfs_trans_cancel(tp);
+ goto out;
+ }
+
+ return 0;
+out:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
+}
+
+/**
+ * xfs_reflink_unshare() - Pre-COW all shared blocks within a given range
+ * of a file and turn off the reflink flag if we
+ * unshare all of the file's blocks.
+ * @ip: XFS inode
+ * @filp: VFS file structure
+ * @offset: Offset to start
+ * @len: Length to ...
+ */
+int
+xfs_reflink_unshare(
+ struct xfs_inode *ip,
+ struct file *filp,
+ xfs_off_t offset,
+ xfs_off_t len)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ xfs_fileoff_t fbno;
+ xfs_filblks_t end;
+ xfs_off_t old_isize, isize;
+ int error;
+
+ if (!xfs_sb_version_hasreflink(&ip->i_mount->m_sb) ||
+ !xfs_is_reflink_inode(ip))
+ return 0;
+
+ trace_xfs_reflink_unshare(ip);
+
+ inode_dio_wait(VFS_I(ip));
+
+ /* Try to CoW the selected ranges */
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ fbno = XFS_B_TO_FSB(mp, offset);
+ old_isize = isize = i_size_read(VFS_I(ip));
+ end = XFS_B_TO_FSB(mp, offset + len);
+ error = xfs_reflink_dirty_extents(ip, filp, fbno, end, isize);
+ if (error)
+ goto out_unlock;
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ /* Wait for the IO to finish */
+ error = filemap_write_and_wait(filp->f_mapping);
+ if (error)
+ goto out;
+
+ /* Turn off the reflink flag if we unshared the whole file */
+ if (offset == 0 && len == isize) {
+ error = xfs_reflink_try_clear_inode_flag(ip, old_isize);
+ if (error)
+ goto out;
+ }
+
+ return 0;
+
+out_unlock:
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+ trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
+ return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index c60a9bd..4ce2cba6 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -51,4 +51,7 @@ extern int xfs_reflink(struct xfs_inode *src, xfs_off_t srcoff,
struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len,
unsigned int flags);
+extern int xfs_reflink_unshare(struct xfs_inode *ip, struct file *filp,
+ xfs_off_t offset, xfs_off_t len);
+
#endif /* __XFS_REFLINK_H */
^ permalink raw reply related [flat|nested] 67+ messages in thread
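xfs_reflink_dirty_range() in the patch above is modelled on
generic_perform_write(): each pass through the loop touches at most the bytes
that remain in the pagecache page containing pos. A minimal userspace sketch
of that per-page arithmetic follows. It assumes 4096-byte pages, and
chunk_bytes() and count_chunks() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE	4096UL	/* assumption: 4 KiB pagecache pages */

/* Bytes to handle in the page that contains pos, never more than len. */
static unsigned long chunk_bytes(uint64_t pos, uint64_t len)
{
	unsigned long offset = pos & (DEMO_PAGE_SIZE - 1);
	uint64_t room = DEMO_PAGE_SIZE - offset;

	assert(len > 0);
	return (unsigned long)(len < room ? len : room);
}

/* Number of loop iterations needed to cover [pos, pos + len). */
static unsigned int count_chunks(uint64_t pos, uint64_t len)
{
	unsigned int n = 0;

	while (len > 0) {
		unsigned long bytes = chunk_bytes(pos, len);

		pos += bytes;
		len -= bytes;
		n++;
	}
	return n;
}
```

Taking the minimum of "room left in the page" and "bytes left to do" keeps the
result positive for any len greater than zero, so the loop always makes
progress, including when the range starts in the middle of a page and is
shorter than a page.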
* [PATCH 57/58] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (55 preceding siblings ...)
2015-10-07 5:01 ` [PATCH 56/58] xfs: unshare a range of blocks via fallocate Darrick J. Wong
@ 2015-10-07 5:01 ` Darrick J. Wong
2015-10-07 5:01 ` [PATCH 58/58] xfs: recognize the reflink feature bit Darrick J. Wong
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:01 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Report the reflink/nocow flags as appropriate in the XFS-specific and
"standard" getattr ioctls.
Allow the user to clear the reflink flag (or set the nocow flag) if the
size is zero.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_fs.h | 1 +
fs/xfs/xfs_inode.c | 10 +++++++--
fs/xfs/xfs_ioctl.c | 15 +++++++++++++-
fs/xfs/xfs_reflink.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_reflink.h | 4 ++++
5 files changed, 77 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index c63afd4..0dcffc8 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -67,6 +67,7 @@ struct fsxattr {
#define XFS_XFLAG_EXTSZINHERIT 0x00001000 /* inherit inode extent size */
#define XFS_XFLAG_NODEFRAG 0x00002000 /* do not defragment */
#define XFS_XFLAG_FILESTREAM 0x00004000 /* use filestream allocator */
+#define XFS_XFLAG_REFLINK 0x00008000 /* file is reflinked */
#define XFS_XFLAG_HASATTR 0x80000000 /* no DIFLAG for this */
/*
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 61a468b..297151a 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -610,7 +610,8 @@ __xfs_iflock(
STATIC uint
_xfs_dic2xflags(
- __uint16_t di_flags)
+ __uint16_t di_flags,
+ __uint64_t di_flags2)
{
uint flags = 0;
@@ -643,6 +644,8 @@ _xfs_dic2xflags(
flags |= XFS_XFLAG_NODEFRAG;
if (di_flags & XFS_DIFLAG_FILESTREAM)
flags |= XFS_XFLAG_FILESTREAM;
+ if (di_flags2 & XFS_DIFLAG2_REFLINK)
+ flags |= XFS_XFLAG_REFLINK;
}
return flags;
@@ -654,7 +657,7 @@ xfs_ip2xflags(
{
xfs_icdinode_t *dic = &ip->i_d;
- return _xfs_dic2xflags(dic->di_flags) |
+ return _xfs_dic2xflags(dic->di_flags, dic->di_flags2) |
(XFS_IFORK_Q(ip) ? XFS_XFLAG_HASATTR : 0);
}
@@ -662,7 +665,8 @@ uint
xfs_dic2xflags(
xfs_dinode_t *dip)
{
- return _xfs_dic2xflags(be16_to_cpu(dip->di_flags)) |
+ return _xfs_dic2xflags(be16_to_cpu(dip->di_flags),
+ be64_to_cpu(dip->di_flags2)) |
(XFS_DFORK_Q(dip) ? XFS_XFLAG_HASATTR : 0);
}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 92aaca0..1dba30b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -870,6 +870,10 @@ xfs_merge_ioc_xflags(
xflags |= XFS_XFLAG_NODUMP;
else
xflags &= ~XFS_XFLAG_NODUMP;
+ if (flags & FS_NOCOW_FL)
+ xflags &= ~XFS_XFLAG_REFLINK;
+ else
+ xflags |= XFS_XFLAG_REFLINK;
return xflags;
}
@@ -1181,6 +1185,10 @@ xfs_ioctl_setattr(
trace_xfs_ioctl_setattr(ip);
+ code = xfs_reflink_check_flag_adjust(ip, &fa->fsx_xflags);
+ if (code)
+ return code;
+
code = xfs_ioctl_setattr_check_projid(ip, fa);
if (code)
return code;
@@ -1303,6 +1311,7 @@ xfs_ioc_getxflags(
unsigned int flags;
flags = xfs_di2lxflags(ip->i_d.di_flags);
+ xfs_reflink_get_lxflags(ip, &flags);
if (copy_to_user(arg, &flags, sizeof(flags)))
return -EFAULT;
return 0;
@@ -1324,11 +1333,15 @@ xfs_ioc_setxflags(
if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
FS_NOATIME_FL | FS_NODUMP_FL | \
- FS_SYNC_FL))
+ FS_SYNC_FL | FS_NOCOW_FL))
return -EOPNOTSUPP;
fa.fsx_xflags = xfs_merge_ioc_xflags(flags, xfs_ip2xflags(ip));
+ error = xfs_reflink_check_flag_adjust(ip, &fa.fsx_xflags);
+ if (error)
+ return error;
+
error = mnt_want_write_file(filp);
if (error)
return error;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 92d8345..99635aa 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1892,3 +1892,54 @@ out:
trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
return error;
}
+
+/**
+ * xfs_reflink_get_lxflags() - set reflink-related linux inode flags
+ *
+ * @ip: XFS inode
+ * @flags: Pointer to the user-visible inode flags
+ */
+void
+xfs_reflink_get_lxflags(
+ struct xfs_inode *ip, /* XFS inode */
+ unsigned int *flags) /* user flags */
+{
+ /*
+ * If this is a reflink-capable filesystem and there are no shared
+ * blocks, then this is a "nocow" file.
+ */
+ if (!xfs_sb_version_hasreflink(&ip->i_mount->m_sb) ||
+ xfs_is_reflink_inode(ip))
+ return;
+ *flags |= FS_NOCOW_FL;
+}
+
+/**
+ * xfs_reflink_check_flag_adjust() - The only change we allow to the inode
+ * reflink flag is to clear it when the
+ * fs supports reflink and the size is zero.
+ *
+ * @ip: XFS inode
+ * @xflags: XFS in-core inode flags
+ */
+int
+xfs_reflink_check_flag_adjust(
+ struct xfs_inode *ip,
+ unsigned int *xflags)
+{
+ unsigned int chg;
+
+ chg = !!(*xflags & XFS_XFLAG_REFLINK) ^ !!xfs_is_reflink_inode(ip);
+
+ if (!chg)
+ return 0;
+ if (!xfs_sb_version_hasreflink(&ip->i_mount->m_sb))
+ return -EOPNOTSUPP;
+ if (i_size_read(VFS_I(ip)) != 0)
+ return -EINVAL;
+ if (*xflags & XFS_XFLAG_REFLINK) {
+ *xflags &= ~XFS_XFLAG_REFLINK;
+ return 0;
+ }
+ return 0;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 4ce2cba6..4733bf1 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -54,4 +54,8 @@ extern int xfs_reflink(struct xfs_inode *src, xfs_off_t srcoff,
extern int xfs_reflink_unshare(struct xfs_inode *ip, struct file *filp,
xfs_off_t offset, xfs_off_t len);
+extern void xfs_reflink_get_lxflags(struct xfs_inode *ip, unsigned int *flags);
+extern int xfs_reflink_check_flag_adjust(struct xfs_inode *ip,
+ unsigned int *xflags);
+
#endif /* __XFS_REFLINK_H */
^ permalink raw reply related [flat|nested] 67+ messages in thread
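The decision made by xfs_reflink_check_flag_adjust() in the patch above can be
read as a small truth table. The sketch below models it in userspace under
stated assumptions: struct demo_inode and demo_check_flag_adjust() are
hypothetical stand-ins, and only XFS_XFLAG_REFLINK and its value come from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_XFLAG_REFLINK	0x00008000u	/* value from this patch */

/* Hypothetical stand-in for the inode state the kernel function reads. */
struct demo_inode {
	bool		fs_has_reflink;	/* superblock feature bit set? */
	bool		is_reflink;	/* inode reflink flag set? */
	uint64_t	size;		/* i_size in bytes */
};

static int demo_check_flag_adjust(const struct demo_inode *ip,
				  unsigned int *xflags)
{
	bool want = (*xflags & XFS_XFLAG_REFLINK) != 0;

	if (want == ip->is_reflink)
		return 0;		/* no change requested */
	if (!ip->fs_has_reflink)
		return -EOPNOTSUPP;
	if (ip->size != 0)
		return -EINVAL;
	/* As in the patch, a request to set the flag is dropped silently. */
	*xflags &= ~XFS_XFLAG_REFLINK;
	return 0;
}

/* Usage example. */
void demo_check_flag_adjust_examples(void)
{
	struct demo_inode nonempty = { true, true, 4096 };
	unsigned int clear = 0;

	/* Clearing the flag on a non-empty reflinked file is refused. */
	assert(demo_check_flag_adjust(&nonempty, &clear) == -EINVAL);
}
```

Note that in this model, as in the posted code, asking to set the flag on an
empty non-reflinked file returns success but leaves the flag clear.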
* [PATCH 58/58] xfs: recognize the reflink feature bit
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
` (56 preceding siblings ...)
2015-10-07 5:01 ` [PATCH 57/58] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems Darrick J. Wong
@ 2015-10-07 5:01 ` Darrick J. Wong
57 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-07 5:01 UTC (permalink / raw)
To: david, darrick.wong; +Cc: linux-fsdevel, xfs
Add the reflink feature flag to the set of recognized feature flags.
This enables users to write to reflink filesystems.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/libxfs/xfs_format.h | 3 ++-
fs/xfs/libxfs/xfs_sb.c | 9 +++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 98e52ab..e405823 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -451,7 +451,8 @@ xfs_sb_has_compat_feature(
#define XFS_SB_FEAT_RO_COMPAT_REFLINK (1 << 2) /* reflinked files */
#define XFS_SB_FEAT_RO_COMPAT_ALL \
(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
- XFS_SB_FEAT_RO_COMPAT_RMAPBT)
+ XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
+ XFS_SB_FEAT_RO_COMPAT_REFLINK)
#define XFS_SB_FEAT_RO_COMPAT_UNKNOWN ~XFS_SB_FEAT_RO_COMPAT_ALL
static inline bool
xfs_sb_has_ro_compat_feature(
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 5bc638f..0607cea 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -220,6 +220,15 @@ xfs_mount_validate_sb(
"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
}
+ if (xfs_sb_version_hasreflink(sbp)) {
+ xfs_alert(mp,
+"EXPERIMENTAL reflink feature enabled. Use at your own risk!");
+ if (xfs_sb_version_hasrmapbt(sbp)) {
+ xfs_alert(mp,
+"EXPERIMENTAL reverse mapping btree AND reflink? You crazy!");
+ }
+ }
+
/*
* More sanity checking. Most of these were stolen directly from
* xfs_repair.
^ permalink raw reply related [flat|nested] 67+ messages in thread
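Why adding one bit to XFS_SB_FEAT_RO_COMPAT_ALL is enough to permit writes can
be shown with a short sketch of the mask test. This is a userspace
illustration only: has_unknown_ro_compat() is a hypothetical helper, the
REFLINK bit position is taken from the hunk above, and the FINOBT and RMAPBT
bit positions are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FEAT_RO_COMPAT_FINOBT	(1u << 0)	/* assumed bit position */
#define FEAT_RO_COMPAT_RMAPBT	(1u << 1)	/* assumed bit position */
#define FEAT_RO_COMPAT_REFLINK	(1u << 2)	/* from the hunk above */

/*
 * Hypothetical helper: true if the superblock advertises a read-only
 * compatible feature that is missing from the known mask, in which
 * case the filesystem may only be mounted read-only.
 */
static bool has_unknown_ro_compat(uint32_t sb_features, uint32_t known_mask)
{
	return (sb_features & ~known_mask) != 0;
}

/* Usage example. */
void has_unknown_ro_compat_examples(void)
{
	uint32_t before = FEAT_RO_COMPAT_FINOBT | FEAT_RO_COMPAT_RMAPBT;
	uint32_t after = before | FEAT_RO_COMPAT_REFLINK;

	assert(has_unknown_ro_compat(FEAT_RO_COMPAT_REFLINK, before));
	assert(!has_unknown_ro_compat(FEAT_RO_COMPAT_REFLINK, after));
}
```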
* Re: [PATCH 50/58] xfs: reflink extents from one file to another
2015-10-07 5:00 ` [PATCH 50/58] xfs: reflink extents from one file to another Darrick J. Wong
@ 2015-10-07 5:12 ` kbuild test robot
0 siblings, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2015-10-07 5:12 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-fsdevel, xfs, kbuild-all, darrick.wong
[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]
Hi Darrick,
[auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please ignore]
config: sh-titan_defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh
Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings
All warnings (new ones prefixed by >>):
fs/xfs/xfs_reflink.c: In function 'remap_blocks':
>> fs/xfs/xfs_reflink.c:1385:2: warning: 'error' may be used uninitialized in this function [-Wuninitialized]
vim +/error +1385 fs/xfs/xfs_reflink.c
1369 */
1370 if (imap.br_startblock == HOLESTARTBLOCK ||
1371 imap.br_startblock == DELAYSTARTBLOCK ||
1372 ISUNWRITTEN(&imap))
1373 goto advloop;
1374
1375 error = remap_one_range(dest, &imap, destoff + srcioff);
1376 if (error)
1377 break;
1378 advloop:
1379 /* Advance drange/srange */
1380 srcoff += srcioff + imap.br_blockcount;
1381 destoff += srcioff + imap.br_blockcount;
1382 len -= srcioff + imap.br_blockcount;
1383 }
1384
> 1385 return error;
1386 }
1387 #undef IMAPNEXT
1388
1389 /**
1390 * xfs_reflink() - link a range of blocks from one inode to another
1391 *
1392 * @src: Inode to clone from
1393 * @srcoff: Offset within source to start clone from
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 15297 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
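The warning above concerns a loop in which every path that skips an extent
also skips the assignment to error, so the value returned after the loop may
never have been set. One conventional way to make the result well defined is
to initialise the variable before the loop. The sketch below shows that
pattern in userspace; demo_remap_one() and demo_remap_all() are hypothetical
stand-ins and do not reproduce the real remap_blocks().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for remap_one_range(); negative input fails. */
static int demo_remap_one(int extent)
{
	return extent < 0 ? -EIO : 0;
}

/* extent == 0 models a hole/delalloc/unwritten extent that is skipped. */
static int demo_remap_all(const int *extents, int nr)
{
	int error = 0;	/* defined even if no extent is ever remapped */
	int i;

	for (i = 0; i < nr; i++) {
		if (extents[i] == 0)
			continue;
		error = demo_remap_one(extents[i]);
		if (error)
			break;
	}
	return error;
}

/* Usage example. */
void demo_remap_examples(void)
{
	const int holes[] = { 0, 0 };
	const int bad[] = { 1, -1, 2 };

	assert(demo_remap_all(holes, 2) == 0);
	assert(demo_remap_all(bad, 3) == -EIO);
}
```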
* Re: [PATCH 51/58] xfs: add clone file and clone range ioctls
2015-10-07 5:00 ` [PATCH 51/58] xfs: add clone file and clone range ioctls Darrick J. Wong
@ 2015-10-07 5:13 ` kbuild test robot
2015-10-07 6:46 ` kbuild test robot
2015-10-07 7:35 ` kbuild test robot
2 siblings, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2015-10-07 5:13 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: kbuild-all, david, darrick.wong, linux-fsdevel, xfs
[-- Attachment #1: Type: text/plain, Size: 1379 bytes --]
Hi Darrick,
[auto build test WARNING on v4.3-rc4 -- if it's inappropriate base, please ignore]
config: i386-randconfig-i0-201540 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All warnings (new ones prefixed by >>):
fs/xfs/xfs_ioctl.c: In function 'xfs_file_ioctl':
>> fs/xfs/xfs_ioctl.c:1962:51: warning: large integer implicitly truncated to unsigned type [-Woverflow]
error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL);
^
vim +1962 fs/xfs/xfs_ioctl.c
1946 error = xfs_fs_eofblocks_from_user(&eofb, &keofb);
1947 if (error)
1948 return error;
1949
1950 return xfs_icache_free_eofblocks(mp, &keofb);
1951 }
1952
1953 case XFS_IOC_CLONE: {
1954 struct fd src;
1955
1956 src = fdget(p);
1957 if (!src.file)
1958 return -EBADF;
1959
1960 trace_xfs_ioctl_clone(file_inode(src.file), file_inode(filp));
1961
> 1962 error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL);
1963 fdput(src);
1964 if (error > 0)
1965 error = 0;
1966
1967 return error;
1968 }
1969
1970 case XFS_IOC_CLONE_RANGE: {
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 19876 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
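The -Woverflow warning above says that ~0ULL does not fit the unsigned
parameter it is passed to on i386. The sketch below illustrates the effect in
userspace. The parameter type of xfs_ioctl_reflink() is not shown in the
report, so the 32-bit parameter here is an assumption, and clone_len_narrow()
and clone_len_wide() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callee whose length parameter is only 32 bits wide. */
static uint64_t clone_len_narrow(uint32_t len)
{
	return len;	/* widened back so both results can be compared */
}

/* Hypothetical callee that takes a fixed 64-bit length. */
static uint64_t clone_len_wide(uint64_t len)
{
	return len;
}

/* Usage example. */
void clone_len_examples(void)
{
	/* The explicit cast performs the truncation the compiler warned of. */
	assert(clone_len_narrow((uint32_t)~0ULL) == 0xFFFFFFFFULL);
	assert(clone_len_wide(~0ULL) == UINT64_MAX);
}
```

With a 32-bit parameter, an "entire file" sentinel of ~0ULL silently becomes
4 GiB minus one byte; a fixed-width 64-bit length type avoids depending on the
width of the native unsigned types.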
* Re: [PATCH 51/58] xfs: add clone file and clone range ioctls
2015-10-07 5:00 ` [PATCH 51/58] xfs: add clone file and clone range ioctls Darrick J. Wong
2015-10-07 5:13 ` kbuild test robot
@ 2015-10-07 6:46 ` kbuild test robot
2015-10-07 7:35 ` kbuild test robot
2 siblings, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2015-10-07 6:46 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: kbuild-all, david, darrick.wong, linux-fsdevel, xfs
[-- Attachment #1: Type: text/plain, Size: 483 bytes --]
Hi Darrick,
[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]
config: i386-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
>> ERROR: "security_file_permission" undefined!
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 51590 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 51/58] xfs: add clone file and clone range ioctls
2015-10-07 5:00 ` [PATCH 51/58] xfs: add clone file and clone range ioctls Darrick J. Wong
2015-10-07 5:13 ` kbuild test robot
2015-10-07 6:46 ` kbuild test robot
@ 2015-10-07 7:35 ` kbuild test robot
2 siblings, 0 replies; 67+ messages in thread
From: kbuild test robot @ 2015-10-07 7:35 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: kbuild-all, david, darrick.wong, linux-fsdevel, xfs
[-- Attachment #1: Type: text/plain, Size: 503 bytes --]
Hi Darrick,
[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]
config: x86_64-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
>> ERROR: "security_file_permission" [fs/xfs/xfs.ko] undefined!
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 50053 bytes --]
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 48/58] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks
2015-10-07 5:00 ` [PATCH 48/58] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks Darrick J. Wong
@ 2015-10-21 21:17 ` Darrick J. Wong
0 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-21 21:17 UTC (permalink / raw)
To: david; +Cc: linux-fsdevel, xfs
On Tue, Oct 06, 2015 at 10:00:25PM -0700, Darrick J. Wong wrote:
> When we're writing zeroes to a reflinked block (such as when we're
> punching a reflinked range), we need to fork the block and write
> to that, otherwise we can corrupt the other reflinks.
Something that's missing from this patch series: when we shorten a file,
the VFS will zero the page contents between the new and old EOF. Therefore,
we must ensure that the block is forked prior to this happening, or else we
end up changing the shared block, corrupting the other files.
--D
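The truncate case described above can be made concrete with a little
arithmetic. The sketch below is a userspace illustration under a stated
assumption: it considers only the block that contains the new EOF, since that
block stays mapped after the file is shortened and is therefore the one that
would need forking if shared. struct zero_span and eof_block_zero_span() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the bytes zeroed past the new EOF. */
struct zero_span {
	uint64_t fileblock;	/* file block containing the new EOF */
	uint64_t start;		/* first byte zeroed */
	uint64_t len;		/* bytes zeroed up to the end of that block */
};

/*
 * Returns false when new_size is block aligned, i.e. no partial block
 * survives the truncate and nothing inside a kept block is zeroed.
 */
static bool eof_block_zero_span(uint64_t new_size, uint32_t blocksize,
				struct zero_span *out)
{
	uint64_t off;

	assert(blocksize > 0);
	off = new_size % blocksize;
	if (off == 0)
		return false;

	out->fileblock = new_size / blocksize;
	out->start = new_size;
	out->len = blocksize - off;
	return true;
}
```

If the function reports a span and the named file block has a refcount of two
or more, the zeroing would land in a block that other files still reference,
which is the corruption described above.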
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> fs/xfs/xfs_bmap_util.c | 25 +++++++-
> fs/xfs/xfs_reflink.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_reflink.h | 6 ++
> 3 files changed, 183 insertions(+), 2 deletions(-)
>
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 245a34a..b054b28 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -41,6 +41,7 @@
> #include "xfs_icache.h"
> #include "xfs_log.h"
> #include "xfs_rmap_btree.h"
> +#include "xfs_reflink.h"
>
> /* Kernel only BMAP related definitions and functions */
>
> @@ -1095,7 +1096,9 @@ xfs_zero_remaining_bytes(
> xfs_buf_t *bp;
> xfs_mount_t *mp = ip->i_mount;
> int nimap;
> - int error = 0;
> + int error = 0, err2;
> + bool should_fork;
> + struct xfs_trans *tp;
>
> /*
> * Avoid doing I/O beyond eof - it's not necessary
> @@ -1136,8 +1139,14 @@ xfs_zero_remaining_bytes(
> if (lastoffset > endoff)
> lastoffset = endoff;
>
> + /* Do we need to CoW this block? */
> + error = xfs_reflink_should_fork_block(ip, &imap, offset,
> + &should_fork);
> + if (error)
> + return error;
> +
> /* DAX can just zero the backing device directly */
> - if (IS_DAX(VFS_I(ip))) {
> + if (IS_DAX(VFS_I(ip)) && !should_fork) {
> error = dax_zero_page_range(VFS_I(ip), offset,
> lastoffset - offset + 1,
> xfs_get_blocks_direct);
> @@ -1158,10 +1167,22 @@ xfs_zero_remaining_bytes(
> (offset - XFS_FSB_TO_B(mp, imap.br_startoff)),
> 0, lastoffset - offset + 1);
>
> + tp = NULL;
> + if (should_fork) {
> + error = xfs_reflink_fork_buf(ip, bp, offset_fsb, &tp);
> + if (error)
> + return error;
> + }
> +
> error = xfs_bwrite(bp);
> +
> + err2 = xfs_reflink_finish_fork_buf(ip, bp, offset_fsb, tp,
> + error, imap.br_startblock);
> xfs_buf_relse(bp);
> if (error)
> return error;
> + if (err2)
> + return err2;
> }
> return error;
> }
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 226e23f..f5eed2f 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -788,3 +788,157 @@ xfs_reflink_add_ioend(
> {
> list_add_tail(&eio->rlei_list, &ioend->io_reflink_endio_list);
> }
> +
> +/**
> + * xfs_reflink_fork_buf() - start a transaction to fork a buffer (if needed)
> + *
> + * @mp: XFS mount point
> + * @ip: XFS inode
> + * @bp: the buffer that we might need to fork
> + * @fileoff: file offset of the buffer
> + * @ptp: pointer to an XFS transaction
> + */
> +int
> +xfs_reflink_fork_buf(
> + struct xfs_inode *ip,
> + struct xfs_buf *bp,
> + xfs_fileoff_t fileoff,
> + struct xfs_trans **ptp)
> +{
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_trans *tp;
> + xfs_fsblock_t fsbno;
> + xfs_fsblock_t new_fsbno;
> + xfs_agnumber_t agno;
> + xfs_agblock_t agbno;
> + uint resblks;
> + int error;
> +
> + fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
> + agno = XFS_FSB_TO_AGNO(mp, fsbno);
> + agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
> + CHECK_AG_NUMBER(mp, agno);
> + CHECK_AG_EXTENT(mp, agbno, 1);
> +
> + /*
> + * Get ready to remap the thing...
> + */
> + resblks = XFS_DIOSTRAT_SPACE_RES(mp, 3);
> + tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
> + error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
> +
> + /*
> + * check for running out of space
> + */
> + if (error) {
> + /*
> + * Free the transaction structure.
> + */
> + ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
> + goto out_cancel;
> + }
> + error = xfs_trans_reserve_quota(tp, mp,
> + ip->i_udquot, ip->i_gdquot, ip->i_pdquot,
> + resblks, 0, XFS_QMOPT_RES_REGBLKS);
> + if (error)
> + goto out_cancel;
> +
> + xfs_ilock(ip, XFS_ILOCK_EXCL);
> + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> +
> + /* fork block, remap buffer */
> + error = fork_one_block(mp, tp, ip, fsbno, &new_fsbno, fileoff);
> + if (error)
> + goto out_cancel;
> +
> + trace_xfs_reflink_fork_buf(ip, fileoff, fsbno, 1, new_fsbno);
> +
> + XFS_BUF_SET_ADDR(bp, XFS_FSB_TO_DADDR(mp, new_fsbno));
> + *ptp = tp;
> + return error;
> +
> +out_cancel:
> + xfs_trans_cancel(tp);
> + trace_xfs_reflink_fork_buf_error(ip, error, _RET_IP_);
> + return error;
> +}
> +
> +/**
> + * xfs_reflink_finish_fork_buf() - finish forking a file buffer
> + *
> + * @ip: XFS inode
> + * @bp: the buffer that was forked
> + * @fileoff: file offset of the buffer
> + * @tp: transaction that was returned from xfs_reflink_fork_buf()
> + * @write_error: status code from writing the block
> + */
> +int
> +xfs_reflink_finish_fork_buf(
> + struct xfs_inode *ip,
> + struct xfs_buf *bp,
> + xfs_fileoff_t fileoff,
> + struct xfs_trans *tp,
> + int write_error,
> + xfs_fsblock_t old_fsbno)
> +{
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_bmap_free free_list;
> + xfs_fsblock_t firstfsb;
> + xfs_fsblock_t fsbno;
> + struct xfs_bmbt_irec imaps[1];
> + xfs_agnumber_t agno;
> + int nimaps = 1;
> + int done;
> + int error = write_error;
> + int committed;
> + struct xfs_owner_info oinfo;
> +
> + if (tp == NULL)
> + return 0;
> +
> + fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
> + agno = XFS_FSB_TO_AGNO(mp, fsbno);
> + XFS_RMAP_INO_OWNER(&oinfo, ip->i_ino, XFS_DATA_FORK, fileoff);
> + if (write_error != 0)
> + goto out_write_error;
> +
> + trace_xfs_reflink_fork_buf(ip, fileoff, old_fsbno, 1, fsbno);
> + /*
> + * Remap the old blocks.
> + */
> + xfs_bmap_init(&free_list, &firstfsb);
> + error = xfs_bunmapi(tp, ip, fileoff, 1, 0, 1, &firstfsb, &free_list,
> + &done);
> + if (error)
> + goto out_free;
> + ASSERT(done == 1);
> +
> + error = xfs_bmapi_write(tp, ip, fileoff, 1, XFS_BMAPI_REMAP, &fsbno,
> + 1, &imaps[0], &nimaps, &free_list);
> + if (error)
> + goto out_free;
> +
> + /*
> + * complete the transaction
> + */
> + error = xfs_bmap_finish(&tp, &free_list, &committed);
> + if (error)
> + goto out_cancel;
> +
> + error = xfs_trans_commit(tp);
> + if (error)
> + goto out_error;
> +
> + return error;
> +out_free:
> + xfs_bmap_finish(&tp, &free_list, &committed);
> +out_write_error:
> + done = xfs_free_extent(tp, fsbno, 1, &oinfo);
> + if (error == 0)
> + error = done;
> +out_cancel:
> + xfs_trans_cancel(tp);
> +out_error:
> + trace_xfs_reflink_finish_fork_buf_error(ip, error, _RET_IP_);
> + return error;
> +}
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index b3e12d2..ce00cf6 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -38,4 +38,10 @@ extern void xfs_reflink_add_ioend(struct xfs_ioend *ioend,
> extern int xfs_reflink_should_fork_block(struct xfs_inode *ip,
> struct xfs_bmbt_irec *imap, xfs_off_t offset, bool *type);
>
> +extern int xfs_reflink_fork_buf(struct xfs_inode *ip, struct xfs_buf *bp,
> + xfs_fileoff_t fileoff, struct xfs_trans **ptp);
> +extern int xfs_reflink_finish_fork_buf(struct xfs_inode *ip, struct xfs_buf *bp,
> + xfs_fileoff_t fileoff, struct xfs_trans *tp, int write_error,
> + xfs_fsblock_t old_fsbno);
> +
> #endif /* __XFS_REFLINK_H */
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 25/58] xfs: bmap btree changes should update rmap btree
2015-10-07 4:57 ` [PATCH 25/58] xfs: bmap btree changes should update rmap btree Darrick J. Wong
@ 2015-10-21 21:39 ` Darrick J. Wong
0 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-21 21:39 UTC (permalink / raw)
To: david; +Cc: linux-fsdevel, xfs
On Tue, Oct 06, 2015 at 09:57:52PM -0700, Darrick J. Wong wrote:
> Any update to a file's bmap should make the corresponding change to
> the rmapbt. On a reflink filesystem, this is absolutely required
> because a given (file data) physical block can have multiple owners
> and the only sane way to find an rmap given a bmap is if there is a
> 1:1 correspondence.
>
> (At some point we can optimize this for non-reflink filesystems
> because regular merge still works there.)
Still working on this; xfs_repair will have to be taught to merge the
extent rmaps for non-reflink filesystems, and the rmap update functions
called from xfs_bmap.c will of course need to be shut off on !reflink
filesystems.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 262 ++++++++++++++++++++++++++++++++++-
> fs/xfs/libxfs/xfs_bmap.h | 1
> fs/xfs/libxfs/xfs_rmap.c | 296 ++++++++++++++++++++++++++++++++++++++++
> fs/xfs/libxfs/xfs_rmap_btree.h | 20 +++
> 4 files changed, 568 insertions(+), 11 deletions(-)
>
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index e740ef5..81f0ae0 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -45,6 +45,7 @@
> #include "xfs_symlink.h"
> #include "xfs_attr_leaf.h"
> #include "xfs_filestream.h"
> +#include "xfs_rmap_btree.h"
>
>
> kmem_zone_t *xfs_bmap_free_item_zone;
> @@ -1861,6 +1862,10 @@ xfs_bmap_add_extent_delay_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_combine(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, &LEFT, &RIGHT, &PREV);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
> @@ -1893,6 +1898,10 @@ xfs_bmap_add_extent_delay_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_lcombine(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, &LEFT, &PREV);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
> @@ -1924,6 +1933,10 @@ xfs_bmap_add_extent_delay_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_rcombine(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, &RIGHT, &PREV, new);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
> @@ -1953,6 +1966,10 @@ xfs_bmap_add_extent_delay_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, new);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG:
> @@ -1988,6 +2005,10 @@ xfs_bmap_add_extent_delay_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_lcombine(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, &LEFT, new);
> + if (error)
> + goto done;
> da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
> startblockval(PREV.br_startblock));
> xfs_bmbt_set_startblock(ep, nullstartblock(da_new));
> @@ -2023,6 +2044,10 @@ xfs_bmap_add_extent_delay_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, new);
> + if (error)
> + goto done;
>
> if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
> error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
> @@ -2071,6 +2096,8 @@ xfs_bmap_add_extent_delay_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_rcombine(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, &RIGHT, new, new);
>
> da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
> startblockval(PREV.br_startblock));
> @@ -2107,6 +2134,10 @@ xfs_bmap_add_extent_delay_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, new);
> + if (error)
> + goto done;
>
> if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
> error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
> @@ -2176,6 +2207,10 @@ xfs_bmap_add_extent_delay_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
> + XFS_DATA_FORK, new);
> + if (error)
> + goto done;
>
> if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
> error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
> @@ -2271,7 +2306,8 @@ xfs_bmap_add_extent_unwritten_real(
> xfs_bmbt_irec_t *new, /* new data to add to file extents */
> xfs_fsblock_t *first, /* pointer to firstblock variable */
> xfs_bmap_free_t *flist, /* list of extents to be freed */
> - int *logflagsp) /* inode logging flags */
> + int *logflagsp, /* inode logging flags */
> + struct xfs_btree_cur *rcur)/* rmap btree pointer */
> {
> xfs_btree_cur_t *cur; /* btree cursor */
> xfs_bmbt_rec_host_t *ep; /* extent entry for idx */
> @@ -2417,6 +2453,10 @@ xfs_bmap_add_extent_unwritten_real(
> RIGHT.br_blockcount, LEFT.br_state)))
> goto done;
> }
> + error = xfs_rmap_combine(rcur, ip->i_ino,
> + XFS_DATA_FORK, &LEFT, &RIGHT, &PREV);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
> @@ -2454,6 +2494,10 @@ xfs_bmap_add_extent_unwritten_real(
> LEFT.br_state)))
> goto done;
> }
> + error = xfs_rmap_lcombine(rcur, ip->i_ino,
> + XFS_DATA_FORK, &LEFT, &PREV);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
> @@ -2489,6 +2533,10 @@ xfs_bmap_add_extent_unwritten_real(
> newext)))
> goto done;
> }
> + error = xfs_rmap_rcombine(rcur, ip->i_ino,
> + XFS_DATA_FORK, &RIGHT, &PREV, new);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
Missing an xfs_rmap operation here to update the rmap.
> @@ -2562,6 +2610,14 @@ xfs_bmap_add_extent_unwritten_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_move(rcur, ip->i_ino,
> + XFS_DATA_FORK, &PREV, new->br_blockcount);
> + if (error)
> + goto done;
> + error = xfs_rmap_resize(rcur, ip->i_ino,
> + XFS_DATA_FORK, &LEFT, -new->br_blockcount);
The move operation moves the start of PREV rightward, so the arg to resize
should be positive to make LEFT bigger.
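A minimal sketch of the arithmetic, assuming a simplified stand-in for
struct xfs_bmbt_irec. The struct ext type and the ext_* helpers are
illustrative names, not kernel code; they only mirror what
xfs_rmap_move and xfs_rmap_resize do to the record fields:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct xfs_bmbt_irec; illustrative only. */
struct ext {
	uint64_t startoff;	/* logical offset, in blocks */
	uint64_t startblock;	/* physical block */
	uint64_t blockcount;	/* length, in blocks */
};

/* Models xfs_rmap_move(): shift the start by adj, keeping the end fixed. */
static struct ext ext_move(struct ext e, int64_t adj)
{
	e.startoff += (uint64_t)adj;
	e.startblock += (uint64_t)adj;
	e.blockcount -= (uint64_t)adj;
	return e;
}

/* Models xfs_rmap_resize(): keep the start, change the length by adj. */
static struct ext ext_resize(struct ext e, int64_t adj)
{
	e.blockcount += (uint64_t)adj;
	return e;
}

/* Nonzero if b starts exactly where a ends, logically and physically. */
static int ext_abuts(struct ext a, struct ext b)
{
	return a.startoff + a.blockcount == b.startoff &&
	       a.startblock + a.blockcount == b.startblock;
}
```

With LEFT = {0, 100, 10}, PREV = {10, 110, 8} and a new extent of 3
blocks, moving PREV by +3 leaves it starting at offset 13, so LEFT has
to grow by +3 to keep abutting it. A resize by -3 would leave a
six-block hole between the two records.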
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING:
> @@ -2600,6 +2656,14 @@ xfs_bmap_add_extent_unwritten_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_move(rcur, ip->i_ino,
> + XFS_DATA_FORK, &PREV, new->br_blockcount);
> + if (error)
> + goto done;
> + error = xfs_rmap_insert(rcur, ip->i_ino,
> + XFS_DATA_FORK, new);
> + if (error)
> + goto done;
> break;
>
> case BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
> @@ -2642,6 +2706,14 @@ xfs_bmap_add_extent_unwritten_real(
> newext)))
> goto done;
> }
> + error = xfs_rmap_resize(rcur, ip->i_ino,
> + XFS_DATA_FORK, &PREV, -new->br_blockcount);
> + if (error)
> + goto done;
> + error = xfs_rmap_move(rcur, ip->i_ino,
> + XFS_DATA_FORK, &RIGHT, -new->br_blockcount);
> + if (error)
> + goto done;
> break;
>
> case BMAP_RIGHT_FILLING:
> @@ -2682,6 +2754,14 @@ xfs_bmap_add_extent_unwritten_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_resize(rcur, ip->i_ino,
> + XFS_DATA_FORK, &PREV, -new->br_blockcount);
> + if (error)
> + goto done;
> + error = xfs_rmap_insert(rcur, ip->i_ino,
> + XFS_DATA_FORK, new);
> + if (error)
> + goto done;
> break;
>
> case 0:
> @@ -2743,6 +2823,17 @@ xfs_bmap_add_extent_unwritten_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_resize(rcur, ip->i_ino, XFS_DATA_FORK, &PREV,
> + new->br_startoff - PREV.br_startoff -
> + PREV.br_blockcount);
> + if (error)
> + goto done;
> + error = xfs_rmap_insert(rcur, ip->i_ino, XFS_DATA_FORK, new);
> + if (error)
> + goto done;
> + error = xfs_rmap_insert(rcur, ip->i_ino, XFS_DATA_FORK, &r[1]);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG:
> @@ -2946,6 +3037,7 @@ xfs_bmap_add_extent_hole_real(
> int rval=0; /* return value (logging flags) */
> int state; /* state bits, accessed thru macros */
> struct xfs_mount *mp;
> + struct xfs_bmbt_irec prev; /* fake previous extent entry */
>
> mp = bma->tp ? bma->tp->t_mountp : NULL;
> ifp = XFS_IFORK_PTR(bma->ip, whichfork);
> @@ -3053,6 +3145,12 @@ xfs_bmap_add_extent_hole_real(
> if (error)
> goto done;
> }
> + prev = *new;
> + prev.br_startblock = nullstartblock(0);
> + error = xfs_rmap_combine(bma->rcur, bma->ip->i_ino,
> + whichfork, &left, &right, &prev);
> + if (error)
> + goto done;
> break;
>
> case BMAP_LEFT_CONTIG:
> @@ -3085,6 +3183,10 @@ xfs_bmap_add_extent_hole_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_resize(bma->rcur, bma->ip->i_ino,
> + whichfork, &left, new->br_blockcount);
> + if (error)
> + goto done;
> break;
>
> case BMAP_RIGHT_CONTIG:
> @@ -3119,6 +3221,10 @@ xfs_bmap_add_extent_hole_real(
> if (error)
> goto done;
> }
> + error = xfs_rmap_move(bma->rcur, bma->ip->i_ino,
> + whichfork, &right, -new->br_blockcount);
> + if (error)
> + goto done;
> break;
>
> case 0:
> @@ -3147,6 +3253,10 @@ xfs_bmap_add_extent_hole_real(
> goto done;
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
> }
> + error = xfs_rmap_insert(bma->rcur, bma->ip->i_ino,
> + whichfork, new);
> + if (error)
> + goto done;
> break;
> }
>
> @@ -4276,6 +4386,59 @@ xfs_bmapi_delay(
> return 0;
> }
>
> +static int
> +alloc_rcur(
> + struct xfs_mount *mp,
> + struct xfs_trans *tp,
> + struct xfs_btree_cur **pcur,
> + xfs_fsblock_t fsblock)
> +{
> + struct xfs_btree_cur *cur = *pcur;
> + struct xfs_buf *agbp;
> + int error;
> + xfs_agnumber_t agno;
> +
> + agno = XFS_FSB_TO_AGNO(mp, fsblock);
> + if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> + return 0;
> + if (cur && cur->bc_private.a.agno == agno)
> + return 0;
> + if (isnullstartblock(fsblock))
> + return 0;
> +
> + error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
> + if (error)
> + return error;
> +
> + cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
> + if (!cur) {
> + xfs_trans_brelse(tp, agbp);
> + return -ENOMEM;
> + }
> +
> + *pcur = cur;
> + return 0;
> +}
> +
> +static void
> +free_rcur(
> + struct xfs_btree_cur **pcur,
> + int bt_error)
> +{
> + struct xfs_btree_cur *cur = *pcur;
> + struct xfs_buf *agbp;
> + struct xfs_trans *tp;
> +
> + if (cur == NULL)
> + return;
> +
> + agbp = cur->bc_private.a.agbp;
> + tp = cur->bc_tp;
> + xfs_btree_del_cursor(cur, bt_error);
> + xfs_trans_brelse(tp, agbp);
> +
> + *pcur = NULL;
> +}
So this whole strategy of updating rmaps right after updating the bmbt
has an unfortunate flaw -- we iterate file extents in logical offset
order, which means that we have no guarantee that the extents we
process will be in order of increasing AG. This violates the deadlock
prevention rule that says we must only lock things in increasing AG
order, and indeed I've seen some deadlocks with generic/013.

A solution to this is to pursue a strategy similar to the
xfs_bmap_free list handling -- collect rmap update intents in a list
that's kept sorted first by AG and then in chronological order, then
when we're done making bmbt updates we can use xfs_trans_roll to make
all the rmapbt updates in AG order as part of a separate transaction.

I don't know if that's a good strategy since it means that bmbt and
rmapbt updates commit in different transactions, but it at least gets
rid of some deadlock opportunities.

A side benefit is that we can hang the rmap intents off the
xfs_bmap_free structure, which should reduce the number of code
changes.
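
A rough sketch of the list ordering, not a proposed implementation.
struct rmap_intent and rmap_intent_add are hypothetical names made up
for illustration, and the op field stands in for whatever the real
intent would need to record:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deferred rmap update; names are illustrative only. */
struct rmap_intent {
	uint32_t		agno;	/* AG that the update touches */
	int			op;	/* opaque operation code */
	struct rmap_intent	*next;
};

/*
 * Insert ri so that the list stays sorted by increasing agno.  Among
 * intents with the same agno, the new one goes after the existing
 * ones, so submission (chronological) order is preserved.
 */
static void rmap_intent_add(struct rmap_intent **head, struct rmap_intent *ri)
{
	struct rmap_intent **pp = head;

	/* Skip every intent with an agno less than or equal to ours. */
	while (*pp && (*pp)->agno <= ri->agno)
		pp = &(*pp)->next;
	ri->next = *pp;
	*pp = ri;
}
```

Walking the list from the head then visits AGs in increasing order, so
the AGF buffers would be locked in the order that the deadlock
prevention rule requires, regardless of the logical offset order in
which the bmbt updates were made.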
>
> static int
> xfs_bmapi_allocate(
> @@ -4368,6 +4531,10 @@ xfs_bmapi_allocate(
> xfs_sb_version_hasextflgbit(&mp->m_sb))
> bma->got.br_state = XFS_EXT_UNWRITTEN;
>
> + error = alloc_rcur(mp, bma->tp, &bma->rcur, bma->got.br_startblock);
> + if (error)
> + return error;
> +
> if (bma->wasdel)
> error = xfs_bmap_add_extent_delay_real(bma);
> else
> @@ -4429,9 +4596,14 @@ xfs_bmapi_convert_unwritten(
> mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN)
> ? XFS_EXT_NORM : XFS_EXT_UNWRITTEN;
>
> + error = alloc_rcur(bma->ip->i_mount, bma->tp, &bma->rcur,
> + mval->br_startblock);
> + if (error)
> + return error;
> +
> error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, &bma->idx,
> &bma->cur, mval, bma->firstblock, bma->flist,
> - &tmp_logflags);
> + &tmp_logflags, bma->rcur);
> /*
> * Log the inode core unconditionally in the unwritten extent conversion
> * path because the conversion might not have done so (e.g., if the
> @@ -4633,6 +4805,7 @@ xfs_bmapi_write(
> }
> *nmap = n;
>
> + free_rcur(&bma.rcur, XFS_BTREE_NOERROR);
> /*
> * Transform from btree to extents, give it cur.
> */
> @@ -4652,6 +4825,7 @@ xfs_bmapi_write(
> XFS_IFORK_MAXEXT(ip, whichfork));
> error = 0;
> error0:
> + free_rcur(&bma.rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
> /*
> * Log everything. Do this after conversion, there's no point in
> * logging the extent records if we've converted to btree format.
> @@ -4704,7 +4878,8 @@ xfs_bmap_del_extent(
> xfs_btree_cur_t *cur, /* if null, not a btree */
> xfs_bmbt_irec_t *del, /* data to remove from extents */
> int *logflagsp, /* inode logging flags */
> - int whichfork) /* data or attr fork */
> + int whichfork, /* data or attr fork */
> + struct xfs_btree_cur *rcur) /* rmap btree */
> {
> xfs_filblks_t da_new; /* new delay-alloc indirect blocks */
> xfs_filblks_t da_old; /* old delay-alloc indirect blocks */
> @@ -4822,6 +4997,9 @@ xfs_bmap_del_extent(
> XFS_IFORK_NEXT_SET(ip, whichfork,
> XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
> flags |= XFS_ILOG_CORE;
> + error = xfs_rmap_delete(rcur, ip->i_ino, whichfork, &got);
> + if (error)
> + goto done;
> if (!cur) {
> flags |= xfs_ilog_fext(whichfork);
> break;
> @@ -4849,6 +5027,10 @@ xfs_bmap_del_extent(
> }
> xfs_bmbt_set_startblock(ep, del_endblock);
> trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
> + error = xfs_rmap_move(rcur, ip->i_ino, whichfork,
> + &got, del->br_blockcount);
> + if (error)
> + goto done;
> if (!cur) {
> flags |= xfs_ilog_fext(whichfork);
> break;
> @@ -4875,6 +5057,10 @@ xfs_bmap_del_extent(
> break;
> }
> trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
> + error = xfs_rmap_resize(rcur, ip->i_ino, whichfork,
> + &got, -del->br_blockcount);
> + if (error)
> + goto done;
> if (!cur) {
> flags |= xfs_ilog_fext(whichfork);
> break;
> @@ -4900,6 +5086,15 @@ xfs_bmap_del_extent(
> if (!delay) {
> new.br_startblock = del_endblock;
> flags |= XFS_ILOG_CORE;
> + error = xfs_rmap_resize(rcur, ip->i_ino,
> + whichfork, &got,
> + temp - got.br_blockcount);
> + if (error)
> + goto done;
> + error = xfs_rmap_insert(rcur, ip->i_ino,
> + whichfork, &new);
> + if (error)
> + goto done;
> if (cur) {
> if ((error = xfs_bmbt_update(cur,
> got.br_startoff,
> @@ -5052,6 +5247,7 @@ xfs_bunmapi(
> int wasdel; /* was a delayed alloc extent */
> int whichfork; /* data or attribute fork */
> xfs_fsblock_t sum;
> + struct xfs_btree_cur *rcur = NULL; /* rmap btree */
>
> trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
>
> @@ -5136,6 +5332,11 @@ xfs_bunmapi(
> got.br_startoff + got.br_blockcount - 1);
> if (bno < start)
> break;
> +
> + error = alloc_rcur(mp, tp, &rcur, got.br_startblock);
> + if (error)
> + goto error0;
> +
> /*
> * Then deal with the (possibly delayed) allocated space
> * we found.
> @@ -5195,7 +5396,7 @@ xfs_bunmapi(
> del.br_state = XFS_EXT_UNWRITTEN;
> error = xfs_bmap_add_extent_unwritten_real(tp, ip,
> &lastx, &cur, &del, firstblock, flist,
> - &logflags);
> + &logflags, rcur);
> if (error)
> goto error0;
> goto nodelete;
> @@ -5253,7 +5454,8 @@ xfs_bunmapi(
> lastx--;
> error = xfs_bmap_add_extent_unwritten_real(tp,
> ip, &lastx, &cur, &prev,
> - firstblock, flist, &logflags);
> + firstblock, flist, &logflags,
> + rcur);
> if (error)
> goto error0;
> goto nodelete;
> @@ -5262,7 +5464,8 @@ xfs_bunmapi(
> del.br_state = XFS_EXT_UNWRITTEN;
> error = xfs_bmap_add_extent_unwritten_real(tp,
> ip, &lastx, &cur, &del,
> - firstblock, flist, &logflags);
> + firstblock, flist, &logflags,
> + rcur);
> if (error)
> goto error0;
> goto nodelete;
> @@ -5315,7 +5518,7 @@ xfs_bunmapi(
> goto error0;
> }
> error = xfs_bmap_del_extent(ip, tp, &lastx, flist, cur, &del,
> - &tmp_logflags, whichfork);
> + &tmp_logflags, whichfork, rcur);
> logflags |= tmp_logflags;
> if (error)
> goto error0;
> @@ -5339,6 +5542,7 @@ nodelete:
> }
> *done = bno == (xfs_fileoff_t)-1 || bno < start || lastx < 0;
>
> + free_rcur(&rcur, XFS_BTREE_NOERROR);
> /*
> * Convert to a btree if necessary.
> */
> @@ -5366,6 +5570,7 @@ nodelete:
> */
> error = 0;
> error0:
> + free_rcur(&rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
> /*
> * Log everything. Do this after conversion, there's no point in
> * logging the extent records if we've converted to btree format.
> @@ -5438,7 +5643,8 @@ xfs_bmse_merge(
> struct xfs_bmbt_rec_host *gotp, /* extent to shift */
> struct xfs_bmbt_rec_host *leftp, /* preceding extent */
> struct xfs_btree_cur *cur,
> - int *logflags) /* output */
> + int *logflags, /* output */
> + struct xfs_btree_cur *rcur) /* rmap btree */
> {
> struct xfs_bmbt_irec got;
> struct xfs_bmbt_irec left;
> @@ -5469,6 +5675,13 @@ xfs_bmse_merge(
> XFS_IFORK_NEXT_SET(ip, whichfork,
> XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
> *logflags |= XFS_ILOG_CORE;
> + error = xfs_rmap_resize(rcur, ip->i_ino, whichfork, &left,
> + blockcount - left.br_blockcount);
> + if (error)
> + return error;
> + error = xfs_rmap_delete(rcur, ip->i_ino, whichfork, &got);
> + if (error)
> + return error;
> if (!cur) {
> *logflags |= XFS_ILOG_DEXT;
> return 0;
> @@ -5511,7 +5724,8 @@ xfs_bmse_shift_one(
> struct xfs_bmbt_rec_host *gotp,
> struct xfs_btree_cur *cur,
> int *logflags,
> - enum shift_direction direction)
> + enum shift_direction direction,
> + struct xfs_btree_cur *rcur)
> {
> struct xfs_ifork *ifp;
> struct xfs_mount *mp;
> @@ -5561,7 +5775,7 @@ xfs_bmse_shift_one(
> offset_shift_fsb)) {
> return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
> *current_ext, gotp, adj_irecp,
> - cur, logflags);
> + cur, logflags, rcur);
> }
> } else {
> startoff = got.br_startoff + offset_shift_fsb;
> @@ -5598,6 +5812,10 @@ update_current_ext:
> (*current_ext)--;
> xfs_bmbt_set_startoff(gotp, startoff);
> *logflags |= XFS_ILOG_CORE;
> + error = xfs_rmap_slide(rcur, ip->i_ino, whichfork,
> + &got, startoff - got.br_startoff);
> + if (error)
> + return error;
> if (!cur) {
> *logflags |= XFS_ILOG_DEXT;
> return 0;
> @@ -5649,6 +5867,7 @@ xfs_bmap_shift_extents(
> int error = 0;
> int whichfork = XFS_DATA_FORK;
> int logflags = 0;
> + struct xfs_btree_cur *rcur = NULL;
>
> if (unlikely(XFS_TEST_ERROR(
> (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> @@ -5737,9 +5956,14 @@ xfs_bmap_shift_extents(
> }
>
> while (nexts++ < num_exts) {
> + xfs_bmbt_get_all(gotp, &got);
> + error = alloc_rcur(mp, tp, &rcur, got.br_startblock);
> + if (error)
> + return error;
> +
> error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
> &current_ext, gotp, cur, &logflags,
> - direction);
> + direction, rcur);
> if (error)
> goto del_cursor;
> /*
> @@ -5765,6 +5989,7 @@ xfs_bmap_shift_extents(
> }
>
> del_cursor:
> + free_rcur(&rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
> if (cur)
> xfs_btree_del_cursor(cur,
> error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
> @@ -5801,6 +6026,7 @@ xfs_bmap_split_extent_at(
> int error = 0;
> int logflags = 0;
> int i = 0;
> + struct xfs_btree_cur *rcur = NULL;
>
> if (unlikely(XFS_TEST_ERROR(
> (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> @@ -5895,6 +6121,18 @@ xfs_bmap_split_extent_at(
> XFS_WANT_CORRUPTED_GOTO(mp, i == 1, del_cursor);
> }
>
> + /* update rmapbt */
> + error = alloc_rcur(mp, tp, &rcur, new.br_startblock);
> + if (error)
> + goto del_cursor;
> + error = xfs_rmap_resize(rcur, ip->i_ino, whichfork, &got, -gotblkcnt);
got has already been modified by the preceding code, so this puts
incorrect rmap entries into the rmapbt.
> + if (error)
> + goto del_cursor;
> + error = xfs_rmap_insert(rcur, ip->i_ino, whichfork, &new);
> + if (error)
> + goto del_cursor;
> + free_rcur(&rcur, XFS_BTREE_NOERROR);
> +
> /*
> * Convert to a btree if necessary.
> */
> @@ -5908,6 +6146,8 @@ xfs_bmap_split_extent_at(
> }
>
> del_cursor:
> + free_rcur(&rcur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
> +
> if (cur) {
> cur->bc_private.b.allocated = 0;
> xfs_btree_del_cursor(cur,
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 89fa3dd..59f26cf 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -56,6 +56,7 @@ struct xfs_bmalloca {
> bool aeof; /* allocated space at eof */
> bool conv; /* overwriting unwritten extents */
> int flags;
> + struct xfs_btree_cur *rcur; /* rmap btree cursor */
> };
>
> /*
> diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> index 045f9a7..d821b1a 100644
> --- a/fs/xfs/libxfs/xfs_rmap.c
> +++ b/fs/xfs/libxfs/xfs_rmap.c
> @@ -563,3 +563,299 @@ out_error:
> xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> return error;
> }
> +
> +/* Encode logical offset for a rmapbt record */
> +STATIC uint64_t
> +b2r_off(
> + int whichfork,
> + xfs_fileoff_t off)
> +{
> + uint64_t x;
> +
> + x = off;
> + if (whichfork == XFS_ATTR_FORK)
> + x |= XFS_RMAP_OFF_ATTR;
> + return x;
> +}
> +
> +/* Encode blockcount for a rmapbt record */
> +STATIC xfs_extlen_t
> +b2r_len(
> + struct xfs_bmbt_irec *irec)
> +{
> + xfs_extlen_t x;
> +
> + x = irec->br_blockcount;
> + if (irec->br_state == XFS_EXT_UNWRITTEN)
> + x |= XFS_RMAP_LEN_UNWRITTEN;
> + return x;
> +}
> +
> +/* Combine two adjacent rmap extents */
> +int
> +xfs_rmap_combine(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *LEFT,
> + struct xfs_bmbt_irec *RIGHT,
> + struct xfs_bmbt_irec *PREV)
> +{
> + int error;
> +
> + if (!rcur)
> + return 0;
> +
> + trace_xfs_rmap_combine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, LEFT, PREV, RIGHT);
> +
> + /* Delete right rmap */
> + error = xfs_rmapbt_delete(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, RIGHT->br_startblock),
> + b2r_len(RIGHT), ino,
> + b2r_off(whichfork, RIGHT->br_startoff));
> + if (error)
> + goto done;
> +
> + /* Delete prev rmap */
> + if (!isnullstartblock(PREV->br_startblock)) {
> + error = xfs_rmapbt_delete(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp,
> + PREV->br_startblock),
> + b2r_len(PREV), ino,
> + b2r_off(whichfork, PREV->br_startoff));
> + if (error)
> + goto done;
> + }
> +
> + /* Enlarge left rmap */
> + return xfs_rmap_resize(rcur, ino, whichfork, LEFT,
> + PREV->br_blockcount + RIGHT->br_blockcount);
> +done:
> + return error;
> +}
> +
> +/* Extend a left rmap extent */
> +int
> +xfs_rmap_lcombine(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *LEFT,
> + struct xfs_bmbt_irec *PREV)
> +{
> + int error;
> +
> + if (!rcur)
> + return 0;
> +
> + trace_xfs_rmap_lcombine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, LEFT, PREV);
> +
> + /* Delete prev rmap */
> + if (!isnullstartblock(PREV->br_startblock)) {
> + error = xfs_rmapbt_delete(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp,
> + PREV->br_startblock),
> + b2r_len(PREV), ino,
> + b2r_off(whichfork, PREV->br_startoff));
> + if (error)
> + goto done;
> + }
> +
> + /* Enlarge left rmap */
> + return xfs_rmap_resize(rcur, ino, whichfork, LEFT, PREV->br_blockcount);
> +done:
> + return error;
> +}
> +
> +/* Extend a right rmap extent */
> +int
> +xfs_rmap_rcombine(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *RIGHT,
> + struct xfs_bmbt_irec *PREV,
> + struct xfs_bmbt_irec *new)
> +{
> + int error;
> +
> + if (!rcur)
> + return 0;
> + ASSERT(PREV->br_startoff == new->br_startoff);
> +
> + trace_xfs_rmap_rcombine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, RIGHT, PREV);
> +
> + /* Delete prev rmap */
> + if (!isnullstartblock(PREV->br_startblock)) {
> + error = xfs_rmapbt_delete(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp,
> + PREV->br_startblock),
> + b2r_len(PREV), ino,
> + b2r_off(whichfork, PREV->br_startoff));
> + if (error)
> + goto done;
> + }
> +
> + /* Enlarge right rmap */
> + return xfs_rmap_resize(rcur, ino, whichfork, RIGHT,
> + -PREV->br_blockcount);
The starting point of RIGHT should be moved leftward to the start of
PREV, so this should really be an xfs_rmap_move(..., -PREV->br_blockcount);
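To illustrate the difference between the two calls, here is a small
model under the assumption that PREV sits immediately to the left of
RIGHT. struct ext and both helpers are illustrative stand-ins, not the
kernel functions:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct xfs_bmbt_irec; illustrative only. */
struct ext {
	uint64_t startoff;	/* logical offset, in blocks */
	uint64_t startblock;	/* physical block */
	uint64_t blockcount;	/* length, in blocks */
};

/* Models a move by -PREV length: RIGHT's start shifts left, its end stays. */
static struct ext rcombine_by_move(struct ext right, struct ext prev)
{
	right.startoff -= prev.blockcount;
	right.startblock -= prev.blockcount;
	right.blockcount += prev.blockcount;
	return right;
}

/* Models a resize by -PREV length: RIGHT's start stays, its length shrinks. */
static struct ext rcombine_by_resize(struct ext right, struct ext prev)
{
	right.blockcount -= prev.blockcount;
	return right;
}
```

For PREV = {10, 110, 4} and RIGHT = {14, 114, 6}, the move yields
{10, 110, 10}, which covers both extents. The resize yields
{14, 114, 2}, which covers neither PREV nor the tail of RIGHT.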
--D
> +done:
> + return error;
> +}
> +
> +/* Insert a rmap extent */
> +int
> +xfs_rmap_insert(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *new)
> +{
> + if (!rcur)
> + return 0;
> +
> + trace_xfs_rmap_insert(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, new);
> +
> + return xfs_rmapbt_insert(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, new->br_startblock),
> + b2r_len(new), ino,
> + b2r_off(whichfork, new->br_startoff));
> +}
> +
> +/* Delete a rmap extent */
> +int
> +xfs_rmap_delete(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *new)
> +{
> + if (!rcur)
> + return 0;
> +
> + trace_xfs_rmap_delete(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, new);
> +
> + return xfs_rmapbt_delete(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, new->br_startblock),
> + b2r_len(new), ino,
> + b2r_off(whichfork, new->br_startoff));
> +}
> +
> +/* Change the start of an rmap */
> +int
> +xfs_rmap_move(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *PREV,
> + long start_adj)
> +{
> + int error;
> + struct xfs_bmbt_irec irec;
> +
> + if (!rcur)
> + return 0;
> +
> + trace_xfs_rmap_move(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, PREV, start_adj);
> +
> + /* Delete prev rmap */
> + error = xfs_rmapbt_delete(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
> + b2r_len(PREV), ino,
> + b2r_off(whichfork, PREV->br_startoff));
> + if (error)
> + goto done;
> +
> + /* Re-add rmap with new start */
> + irec = *PREV;
> + irec.br_startblock += start_adj;
> + irec.br_startoff += start_adj;
> + irec.br_blockcount -= start_adj;
> + return xfs_rmapbt_insert(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, irec.br_startblock),
> + b2r_len(&irec), ino,
> + b2r_off(whichfork, irec.br_startoff));
> +done:
> + return error;
> +}
> +
> +/* Change the logical offset of an rmap */
> +int
> +xfs_rmap_slide(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *PREV,
> + long start_adj)
> +{
> + int error;
> +
> + if (!rcur)
> + return 0;
> +
> + trace_xfs_rmap_slide(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, PREV, start_adj);
> +
> + /* Delete prev rmap */
> + error = xfs_rmapbt_delete(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
> + b2r_len(PREV), ino,
> + b2r_off(whichfork, PREV->br_startoff));
> + if (error)
> + goto done;
> +
> + /* Re-add rmap with new logical offset */
> + return xfs_rmapbt_insert(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
> + b2r_len(PREV), ino,
> + b2r_off(whichfork, PREV->br_startoff + start_adj));
> +done:
> + return error;
> +}
> +
> +/* Change the size of an rmap */
> +int
> +xfs_rmap_resize(
> + struct xfs_btree_cur *rcur,
> + xfs_ino_t ino,
> + int whichfork,
> + struct xfs_bmbt_irec *PREV,
> + long size_adj)
> +{
> + int i;
> + int error;
> + struct xfs_bmbt_irec irec;
> + struct xfs_rmap_irec rrec;
> +
> + if (!rcur)
> + return 0;
> +
> + trace_xfs_rmap_resize(rcur->bc_mp, rcur->bc_private.a.agno, ino,
> + whichfork, PREV, size_adj);
> +
> + error = xfs_rmap_lookup_eq(rcur,
> + XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
> + b2r_len(PREV), ino,
> + b2r_off(whichfork, PREV->br_startoff), &i);
> + if (error)
> + goto done;
> + XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
> + error = xfs_rmap_get_rec(rcur, &rrec, &i);
> + if (error)
> + goto done;
> + XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
> + irec = *PREV;
> + irec.br_blockcount += size_adj;
> + rrec.rm_blockcount = b2r_len(&irec);
> + error = xfs_rmap_update(rcur, &rrec);
> + if (error)
> + goto done;
> +done:
> + return error;
> +}
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
> index d7c9722..0131d9a 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.h
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.h
> @@ -68,4 +68,24 @@ int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
> xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
> struct xfs_owner_info *oinfo);
>
> +/* functions for updating the rmapbt based on bmbt map/unmap operations */
> +int xfs_rmap_combine(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *LEFT, struct xfs_bmbt_irec *RIGHT,
> + struct xfs_bmbt_irec *PREV);
> +int xfs_rmap_lcombine(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *LEFT, struct xfs_bmbt_irec *PREV);
> +int xfs_rmap_rcombine(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *RIGHT, struct xfs_bmbt_irec *PREV,
> + struct xfs_bmbt_irec *new);
> +int xfs_rmap_insert(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *new);
> +int xfs_rmap_delete(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *new);
> +int xfs_rmap_move(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *PREV, long start_adj);
> +int xfs_rmap_slide(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *PREV, long start_adj);
> +int xfs_rmap_resize(struct xfs_btree_cur *rcur, xfs_ino_t ino, int whichfork,
> + struct xfs_bmbt_irec *PREV, long size_adj);
> +
> #endif /* __XFS_RMAP_BTREE_H__ */
>
* Re: [PATCH 40/58] libxfs: adjust refcount of an extent of blocks in refcount btree
2015-10-07 4:59 ` [PATCH 40/58] libxfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
@ 2015-10-27 19:05 ` Darrick J. Wong
2015-10-30 20:56 ` Darrick J. Wong
0 siblings, 1 reply; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-27 19:05 UTC (permalink / raw)
To: david; +Cc: linux-fsdevel, xfs
On Tue, Oct 06, 2015 at 09:59:33PM -0700, Darrick J. Wong wrote:
> Provide functions to adjust the reference counts for an extent of
> physical blocks stored in the refcount btree.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> fs/xfs/libxfs/xfs_refcount.c | 771 ++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/libxfs/xfs_refcount.h | 8
> 2 files changed, 779 insertions(+)
>
>
> diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> index b3f2c25..02892bc 100644
> --- a/fs/xfs/libxfs/xfs_refcount.c
> +++ b/fs/xfs/libxfs/xfs_refcount.c
> @@ -167,3 +167,774 @@ xfs_refcountbt_delete(
> out_error:
> return error;
> }
> +
> +/*
> + * Adjusting the Reference Count
> + *
> + * As stated elsewhere, the reference count btree (refcbt) stores
> + * >1 reference counts for extents of physical blocks. In this
> + * operation, we're either raising or lowering the reference count of
> + * some subrange stored in the tree:
> + *
> + * <------ adjustment range ------>
> + * ----+ +---+-----+ +--+--------+---------
> + * 2 | | 3 | 2 | |17| 55 | 10
> + * ----+ +---+-----+ +--+--------+---------
> + * X axis is physical blocks number;
> + * reference counts are the numbers inside the rectangles
> + *
> + * The first thing we need to do is to ensure that there are no
> + * refcount extents crossing either boundary of the range to be
> + * adjusted. For any extent that does cross a boundary, split it into
> + * two extents so that we can increment the refcount of one of the
> + * pieces later:
> + *
> + * <------ adjustment range ------>
> + * ----+ +---+-----+ +--+--------+----+----
> + * 2 | | 3 | 2 | |17| 55 | 10 | 10
> + * ----+ +---+-----+ +--+--------+----+----
> + *
> + * For this next step, let's assume that all the physical blocks in
> + * the adjustment range are mapped to a file and are therefore in use
> + * at least once. Therefore, we can infer that any gap in the
> + * refcount tree within the adjustment range represents a physical
> + * extent with refcount == 1:
> + *
> + * <------ adjustment range ------>
> + * ----+---+---+-----+-+--+--------+----+----
> + * 2 |"1"| 3 | 2 |1|17| 55 | 10 | 10
> + * ----+---+---+-----+-+--+--------+----+----
> + * ^
> + *
> + * For each extent that falls within the interval range, figure out
> + * which extent is to the left or the right of that extent. Now we
> + * have a left, current, and right extent. If the new reference count
> + * of the center extent enables us to merge left, center, and right
> + * into one record covering all three, do so. If the center extent is
> + * at the left end of the range, abuts the left extent, and its new
> + * reference count matches the left extent's record, then merge them.
> + * If the center extent is at the right end of the range, abuts the
> + * right extent, and the reference counts match, merge those. In the
> + * example, we can left merge (assuming an increment operation):
> + *
> + * <------ adjustment range ------>
> + * --------+---+-----+-+--+--------+----+----
> + * 2 | 3 | 2 |1|17| 55 | 10 | 10
> + * --------+---+-----+-+--+--------+----+----
> + * ^
> + *
> + * For all other extents within the range, adjust the reference count
> + * or delete it if the refcount falls below 2. If we were
> + * incrementing, the end result looks like this:
> + *
> + * <------ adjustment range ------>
> + * --------+---+-----+-+--+--------+----+----
> + * 2 | 4 | 3 |2|18| 56 | 11 | 10
> + * --------+---+-----+-+--+--------+----+----
> + *
> + * The result of a decrement operation looks as such:
> + *
> + * <------ adjustment range ------>
> + * ----+ +---+ +--+--------+----+----
> + * 2 | | 2 | |16| 54 | 9 | 10
> + * ----+ +---+ +--+--------+----+----
> + * DDDD 111111DD
> + *
> + * The blocks marked "D" are freed; the blocks marked "1" are only
> + * referenced once and therefore the record is removed from the
> + * refcount btree.
> + */
> +
> +#define RLNEXT(rl) ((rl).rc_startblock + (rl).rc_blockcount)
> +/*
> + * Split a left rlextent that crosses agbno.
> + */
> +STATIC int
> +try_split_left_rlextent(
> + struct xfs_btree_cur *cur,
> + xfs_agblock_t agbno)
> +{
> + struct xfs_refcount_irec left, tmp;
> + int found_rec;
> + int error;
> +
> + error = xfs_refcountbt_lookup_le(cur, agbno, &found_rec);
> + if (error)
> + goto out_error;
> + if (!found_rec)
> + return 0;
> +
> + error = xfs_refcountbt_get_rec(cur, &left, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> + if (left.rc_startblock >= agbno || RLNEXT(left) <= agbno)
> + return 0;
> +
> + trace_xfs_refcount_split_left_extent(cur->bc_mp, cur->bc_private.a.agno,
> + &left, agbno);
> + tmp = left;
> + tmp.rc_blockcount = agbno - left.rc_startblock;
> + error = xfs_refcountbt_update(cur, &tmp);
> + if (error)
> + goto out_error;
> +
> + error = xfs_btree_increment(cur, 0, &found_rec);
> + if (error)
> + goto out_error;
> +
> + tmp = left;
> + tmp.rc_startblock = agbno;
> + tmp.rc_blockcount -= (agbno - left.rc_startblock);
> + error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> + return error;
> +
> +out_error:
> + trace_xfs_refcount_split_left_extent_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +
> +/*
> + * Split a right rlextent that crosses agbnext.
> + */
> +STATIC int
> +try_split_right_rlextent(
> + struct xfs_btree_cur *cur,
> + xfs_agblock_t agbnext)
> +{
> + struct xfs_refcount_irec right, tmp;
> + int found_rec;
> + int error;
> +
> + error = xfs_refcountbt_lookup_le(cur, agbnext - 1, &found_rec);
> + if (error)
> + goto out_error;
> + if (!found_rec)
> + return 0;
> +
> + error = xfs_refcountbt_get_rec(cur, &right, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> + if (RLNEXT(right) <= agbnext)
> + return 0;
> +
> + trace_xfs_refcount_split_right_extent(cur->bc_mp,
> + cur->bc_private.a.agno, &right, agbnext);
> + tmp = right;
> + tmp.rc_startblock = agbnext;
> + tmp.rc_blockcount -= (agbnext - right.rc_startblock);
> + error = xfs_refcountbt_update(cur, &tmp);
> + if (error)
> + goto out_error;
> +
> + tmp = right;
> + tmp.rc_blockcount = agbnext - right.rc_startblock;
> + error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> + return error;
> +
> +out_error:
> + trace_xfs_refcount_split_right_extent_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +
> +/*
> + * Merge the left, center, and right extents.
> + */
> +STATIC int
> +merge_center(
> + struct xfs_btree_cur *cur,
> + struct xfs_refcount_irec *left,
> + struct xfs_refcount_irec *center,
> + unsigned long long extlen,
> + xfs_agblock_t *agbno,
> + xfs_extlen_t *aglen)
> +{
> + int error;
> + int found_rec;
> +
> + error = xfs_refcountbt_lookup_ge(cur, center->rc_startblock,
> + &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> + error = xfs_refcountbt_delete(cur, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> + if (center->rc_refcount > 1) {
> + error = xfs_refcountbt_delete(cur, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> + out_error);
> + }
> +
> + error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
> + &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> + left->rc_blockcount = extlen;
> + error = xfs_refcountbt_update(cur, left);
> + if (error)
> + goto out_error;
> +
> + *aglen = 0;
> + return error;
> +
> +out_error:
> + trace_xfs_refcount_merge_center_extents_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +
> +/*
> + * Merge with the left extent.
> + */
> +STATIC int
> +merge_left(
> + struct xfs_btree_cur *cur,
> + struct xfs_refcount_irec *left,
> + struct xfs_refcount_irec *cleft,
> + xfs_agblock_t *agbno,
> + xfs_extlen_t *aglen)
> +{
> + int error;
> + int found_rec;
> +
> + if (cleft->rc_refcount > 1) {
> + error = xfs_refcountbt_lookup_le(cur, cleft->rc_startblock,
> + &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> + out_error);
> +
> + error = xfs_refcountbt_delete(cur, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> + out_error);
> + }
> +
> + error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
> + &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> + left->rc_blockcount += cleft->rc_blockcount;
> + error = xfs_refcountbt_update(cur, left);
> + if (error)
> + goto out_error;
> +
> + *agbno += cleft->rc_blockcount;
> + *aglen -= cleft->rc_blockcount;
> + return error;
> +
> +out_error:
> + trace_xfs_refcount_merge_left_extent_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +
> +/*
> + * Merge with the right extent.
> + */
> +STATIC int
> +merge_right(
> + struct xfs_btree_cur *cur,
> + struct xfs_refcount_irec *right,
> + struct xfs_refcount_irec *cright,
> + xfs_agblock_t *agbno,
> + xfs_extlen_t *aglen)
> +{
> + int error;
> + int found_rec;
> +
> + if (cright->rc_refcount > 1) {
> + error = xfs_refcountbt_lookup_le(cur, cright->rc_startblock,
> + &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> + out_error);
> +
> + error = xfs_refcountbt_delete(cur, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> + out_error);
> + }
> +
> + error = xfs_refcountbt_lookup_le(cur, right->rc_startblock,
> + &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> + right->rc_startblock -= cright->rc_blockcount;
> + right->rc_blockcount += cright->rc_blockcount;
> + error = xfs_refcountbt_update(cur, right);
> + if (error)
> + goto out_error;
> +
> + *aglen -= cright->rc_blockcount;
> + return error;
> +
> +out_error:
> + trace_xfs_refcount_merge_right_extent_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +
> +/*
> + * Find the left extent and the one after it (cleft). This function assumes
> + * that we've already split any extent crossing agbno.
> + */
> +STATIC int
> +find_left_extent(
> + struct xfs_btree_cur *cur,
> + struct xfs_refcount_irec *left,
> + struct xfs_refcount_irec *cleft,
> + xfs_agblock_t agbno,
> + xfs_extlen_t aglen)
> +{
> + struct xfs_refcount_irec tmp;
> + int error;
> + int found_rec;
> +
> + left->rc_blockcount = cleft->rc_blockcount = 0;
> + error = xfs_refcountbt_lookup_le(cur, agbno - 1, &found_rec);
> + if (error)
> + goto out_error;
> + if (!found_rec)
> + return 0;
> +
> + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> + if (RLNEXT(tmp) != agbno)
> + return 0;
> + /* We have a left extent; retrieve (or invent) the next right one */
> + *left = tmp;
> +
> + error = xfs_btree_increment(cur, 0, &found_rec);
> + if (error)
> + goto out_error;
> + if (found_rec) {
> + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> + out_error);
> +
> + if (tmp.rc_startblock == agbno)
> + *cleft = tmp;
> + else {
> + cleft->rc_startblock = agbno;
> + cleft->rc_blockcount = min(aglen,
> + tmp.rc_startblock - agbno);
> + cleft->rc_refcount = 1;
> + }
> + } else {
> + cleft->rc_startblock = agbno;
> + cleft->rc_blockcount = aglen;
> + cleft->rc_refcount = 1;
> + }
> + trace_xfs_refcount_find_left_extent(cur->bc_mp, cur->bc_private.a.agno,
> + left, cleft, agbno);
> + return error;
> +
> +out_error:
> + trace_xfs_refcount_find_left_extent_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +
> +/*
> + * Find the right extent and the one before it (cright). This function
> + * assumes that we've already split any extents crossing agbno + aglen.
> + */
> +STATIC int
> +find_right_extent(
> + struct xfs_btree_cur *cur,
> + struct xfs_refcount_irec *right,
> + struct xfs_refcount_irec *cright,
> + xfs_agblock_t agbno,
> + xfs_extlen_t aglen)
> +{
> + struct xfs_refcount_irec tmp;
> + int error;
> + int found_rec;
> +
> + right->rc_blockcount = cright->rc_blockcount = 0;
> + error = xfs_refcountbt_lookup_ge(cur, agbno + aglen, &found_rec);
> + if (error)
> + goto out_error;
> + if (!found_rec)
> + return 0;
> +
> + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> +
> + if (tmp.rc_startblock != agbno + aglen)
> + return 0;
> + /* We have a right extent; retrieve (or invent) the next left one */
> + *right = tmp;
> +
> + error = xfs_btree_decrement(cur, 0, &found_rec);
> + if (error)
> + goto out_error;
> + if (found_rec) {
> + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> + out_error);
> +
> + if (tmp.rc_startblock == agbno)
Since tmp is the refcount extent immediately to the left of (agbno + aglen),
we want to copy tmp into cright only if the end of tmp coincides with (agbno +
aglen), i.e. RLNEXT(tmp) == agbno + aglen. Testing rc_startblock against agbno
is probably a copy-paste mistake from find_left_extent.

Also, s/RLNEXT/RCNEXT/ since these are refcount extents, not reflink extents
(a weird anachronism).
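
For illustration, a minimal standalone sketch of the corrected check might
look like the following. It is not the actual patch: struct rc_irec and
pick_cright() are hypothetical stand-ins for struct xfs_refcount_irec and the
cright selection inside find_right_extent(), and no btree cursor is involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_refcount_irec. */
struct rc_irec {
	uint32_t rc_startblock;
	uint32_t rc_blockcount;
	uint32_t rc_refcount;
};

/* First block past the end of a refcount extent. */
#define RCNEXT(rc)	((rc).rc_startblock + (rc).rc_blockcount)

/*
 * Hypothetical helper: given tmp, the record immediately to the left of
 * the right extent (which starts at agbno + aglen), return the extent to
 * use as cright.
 */
static struct rc_irec
pick_cright(struct rc_irec tmp, uint32_t agbno, uint32_t aglen)
{
	struct rc_irec	cright;
	uint32_t	end = agbno + aglen;
	uint32_t	tmp_end = RCNEXT(tmp);

	/* tmp sits to the left of the right extent, so it cannot pass end. */
	assert(tmp_end <= end);

	/* tmp ends exactly where the adjustment range ends: use it as is. */
	if (tmp_end == end)
		return tmp;

	/* Otherwise there is a gap before end; invent a refcount == 1 extent. */
	cright.rc_startblock = tmp_end > agbno ? tmp_end : agbno;
	cright.rc_blockcount = end - cright.rc_startblock;
	cright.rc_refcount = 1;
	return cright;
}
```

With agbno = 100 and aglen = 50, a record {100, 20, 5} starts at agbno but
ends at 120, so the original test would have copied it into cright; the sketch
instead returns the implied extent {120, 30, 1}.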
--D
> + *cright = tmp;
> + else {
> + cright->rc_startblock = max(agbno,
> + RLNEXT(tmp));
> + cright->rc_blockcount = right->rc_startblock -
> + cright->rc_startblock;
> + cright->rc_refcount = 1;
> + }
> + } else {
> + cright->rc_startblock = agbno;
> + cright->rc_blockcount = aglen;
> + cright->rc_refcount = 1;
> + }
> + trace_xfs_refcount_find_right_extent(cur->bc_mp, cur->bc_private.a.agno,
> + cright, right, agbno + aglen);
> + return error;
> +
> +out_error:
> + trace_xfs_refcount_find_right_extent_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +#undef RLNEXT
> +
> +/*
> + * Try to merge with any extents on the boundaries of the adjustment range.
> + */
> +STATIC int
> +try_merge_rlextents(
> + struct xfs_btree_cur *cur,
> + xfs_agblock_t *agbno,
> + xfs_extlen_t *aglen,
> + int adjust)
> +{
> + struct xfs_refcount_irec left, cleft, cright, right;
> + int error;
> + unsigned long long ulen;
> +
> + left.rc_blockcount = cleft.rc_blockcount = 0;
> + cright.rc_blockcount = right.rc_blockcount = 0;
> +
> + /*
> + * Find extents abutting the start and end of the range, and
> + * the adjacent extents inside the range.
> + */
> + error = find_left_extent(cur, &left, &cleft, *agbno, *aglen);
> + if (error)
> + return error;
> + error = find_right_extent(cur, &right, &cright, *agbno, *aglen);
> + if (error)
> + return error;
> +
> + /* No left or right extent to merge; exit. */
> + if (left.rc_blockcount == 0 && right.rc_blockcount == 0)
> + return 0;
> +
> + /* Try a center merge */
> + ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
> + right.rc_blockcount;
> + if (left.rc_blockcount != 0 && right.rc_blockcount != 0 &&
> + memcmp(&cleft, &cright, sizeof(cleft)) == 0 &&
> + left.rc_refcount == cleft.rc_refcount + adjust &&
> + right.rc_refcount == cleft.rc_refcount + adjust &&
> + ulen < MAXREFCEXTLEN) {
> + trace_xfs_refcount_merge_center_extents(cur->bc_mp,
> + cur->bc_private.a.agno, &left, &cleft, &right);
> + return merge_center(cur, &left, &cleft, ulen, agbno, aglen);
> + }
> +
> + /* Try a left merge */
> + ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
> + if (left.rc_blockcount != 0 &&
> + left.rc_refcount == cleft.rc_refcount + adjust &&
> + ulen < MAXREFCEXTLEN) {
> + trace_xfs_refcount_merge_left_extent(cur->bc_mp,
> + cur->bc_private.a.agno, &left, &cleft);
> + return merge_left(cur, &left, &cleft, agbno, aglen);
> + }
> +
> + /* Try a right merge */
> + ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
> + if (right.rc_blockcount != 0 &&
> + right.rc_refcount == cright.rc_refcount + adjust &&
> + ulen < MAXREFCEXTLEN) {
> + trace_xfs_refcount_merge_right_extent(cur->bc_mp,
> + cur->bc_private.a.agno, &cright, &right);
> + return merge_right(cur, &right, &cright, agbno, aglen);
> + }
> +
> + return error;
> +}
> +
> +/*
> + * Adjust the refcounts of middle extents. At this point we should have
> + * split extents that crossed the adjustment range; merged with adjacent
> + * extents; and updated agbno/aglen to reflect the merges. Therefore,
> + * all we have to do is update the extents inside [agbno, agbno + aglen).
> + */
> +STATIC int
> +adjust_rlextents(
> + struct xfs_btree_cur *cur,
> + xfs_agblock_t agbno,
> + xfs_extlen_t aglen,
> + int adj,
> + struct xfs_bmap_free *flist,
> + struct xfs_owner_info *oinfo)
> +{
> + struct xfs_refcount_irec ext, tmp;
> + int error;
> + int found_rec, found_tmp;
> + xfs_fsblock_t fsbno;
> +
> + error = xfs_refcountbt_lookup_ge(cur, agbno, &found_rec);
> + if (error)
> + goto out_error;
> +
> + while (aglen > 0) {
> + error = xfs_refcountbt_get_rec(cur, &ext, &found_rec);
> + if (error)
> + goto out_error;
> + if (!found_rec) {
> + ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
> + ext.rc_blockcount = 0;
> + ext.rc_refcount = 0;
> + }
> +
> + /*
> + * Deal with a hole in the refcount tree; if a file maps to
> + * these blocks and there's no refcountbt record, pretend that
> + * there is one with refcount == 1.
> + */
> + if (ext.rc_startblock != agbno) {
> + tmp.rc_startblock = agbno;
> + tmp.rc_blockcount = min(aglen,
> + ext.rc_startblock - agbno);
> + tmp.rc_refcount = 1 + adj;
> + trace_xfs_refcount_modify_extent(cur->bc_mp,
> + cur->bc_private.a.agno, &tmp);
> +
> + /*
> + * Either cover the hole (increment) or
> + * delete the range (decrement).
> + */
> + if (tmp.rc_refcount) {
> + error = xfs_refcountbt_insert(cur, &tmp,
> + &found_tmp);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
> + found_tmp == 1, out_error);
> + } else {
> + fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
> + cur->bc_private.a.agno,
> + tmp.rc_startblock);
> + xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
> + tmp.rc_blockcount, oinfo);
> + }
> +
> + agbno += tmp.rc_blockcount;
> + aglen -= tmp.rc_blockcount;
> +
> + error = xfs_refcountbt_lookup_ge(cur, agbno,
> + &found_rec);
> + if (error)
> + goto out_error;
> + }
> +
> + /* Stop if there's nothing left to modify */
> + if (aglen == 0)
> + break;
> +
> + /*
> + * Adjust the reference count and either update the tree
> + * (incr) or free the blocks (decr).
> + */
> + ext.rc_refcount += adj;
> + trace_xfs_refcount_modify_extent(cur->bc_mp,
> + cur->bc_private.a.agno, &ext);
> + if (ext.rc_refcount > 1) {
> + error = xfs_refcountbt_update(cur, &ext);
> + if (error)
> + goto out_error;
> + } else if (ext.rc_refcount == 1) {
> + error = xfs_refcountbt_delete(cur, &found_rec);
> + if (error)
> + goto out_error;
> + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
> + found_rec == 1, out_error);
> + goto advloop;
> + } else {
> + fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
> + cur->bc_private.a.agno,
> + ext.rc_startblock);
> + xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
> + ext.rc_blockcount, oinfo);
> + }
> +
> + error = xfs_btree_increment(cur, 0, &found_rec);
> + if (error)
> + goto out_error;
> +
> +advloop:
> + agbno += ext.rc_blockcount;
> + aglen -= ext.rc_blockcount;
> + }
> +
> + return error;
> +out_error:
> + trace_xfs_refcount_modify_extent_error(cur->bc_mp,
> + cur->bc_private.a.agno, error, _RET_IP_);
> + return error;
> +}
> +
> +/*
> + * Adjust the reference count of a range of AG blocks.
> + *
> + * @mp: XFS mount object
> + * @tp: XFS transaction object
> + * @agbp: Buffer containing the AGF
> + * @agno: AG number
> + * @agbno: Start of range to adjust
> + * @aglen: Length of range to adjust
> + * @adj: +1 to increment, -1 to decrement reference count
> + * @flist: freelist (only required if adj == -1)
> + * @oinfo: owner of the blocks (only required if adj == -1)
> + */
> +STATIC int
> +xfs_refcountbt_adjust_refcount(
> + struct xfs_mount *mp,
> + struct xfs_trans *tp,
> + struct xfs_buf *agbp,
> + xfs_agnumber_t agno,
> + xfs_agblock_t agbno,
> + xfs_extlen_t aglen,
> + int adj,
> + struct xfs_bmap_free *flist,
> + struct xfs_owner_info *oinfo)
> +{
> + struct xfs_btree_cur *cur;
> + int error;
> +
> + cur = xfs_refcountbt_init_cursor(mp, tp, agbp, agno, flist);
> +
> + /*
> + * Ensure that no rlextents cross the boundary of the adjustment range.
> + */
> + error = try_split_left_rlextent(cur, agbno);
> + if (error)
> + goto out_error;
> +
> + error = try_split_right_rlextent(cur, agbno + aglen);
> + if (error)
> + goto out_error;
> +
> + /*
> + * Try to merge with the left or right extents of the range.
> + */
> + error = try_merge_rlextents(cur, &agbno, &aglen, adj);
> + if (error)
> + goto out_error;
> +
> + /* Now that we've taken care of the ends, adjust the middle extents */
> + error = adjust_rlextents(cur, agbno, aglen, adj, flist, oinfo);
> + if (error)
> + goto out_error;
> +
> + xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> + return 0;
> +
> +out_error:
> + trace_xfs_refcount_adjust_error(mp, agno, error, _RET_IP_);
> + xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> + return error;
> +}
> +
> +/**
> + * Increase the reference count of a range of AG blocks.
> + *
> + * @mp: XFS mount object
> + * @tp: XFS transaction object
> + * @agbp: Buffer containing the AGF
> + * @agno: AG number
> + * @agbno: Start of range to adjust
> + * @aglen: Length of range to adjust
> + * @flist: List of blocks to free
> + */
> +int
> +xfs_refcount_increase(
> + struct xfs_mount *mp,
> + struct xfs_trans *tp,
> + struct xfs_buf *agbp,
> + xfs_agnumber_t agno,
> + xfs_agblock_t agbno,
> + xfs_extlen_t aglen,
> + struct xfs_bmap_free *flist)
> +{
> + trace_xfs_refcount_increase(mp, agno, agbno, aglen);
> + return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
> + aglen, 1, flist, NULL);
> +}
> +
> +/**
> + * Decrease the reference count of a range of AG blocks.
> + *
> + * @mp: XFS mount object
> + * @tp: XFS transaction object
> + * @agbp: Buffer containing the AGF
> + * @agno: AG number
> + * @agbno: Start of range to adjust
> + * @aglen: Length of range to adjust
> + * @flist: List of blocks to free
> + * @oinfo: Extent owner
> + */
> +int
> +xfs_refcount_decrease(
> + struct xfs_mount *mp,
> + struct xfs_trans *tp,
> + struct xfs_buf *agbp,
> + xfs_agnumber_t agno,
> + xfs_agblock_t agbno,
> + xfs_extlen_t aglen,
> + struct xfs_bmap_free *flist,
> + struct xfs_owner_info *oinfo)
> +{
> + trace_xfs_refcount_decrease(mp, agno, agbno, aglen);
> + return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
> + aglen, -1, flist, oinfo);
> +}
> diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
> index 033a9b1..6640e3d 100644
> --- a/fs/xfs/libxfs/xfs_refcount.h
> +++ b/fs/xfs/libxfs/xfs_refcount.h
> @@ -26,4 +26,12 @@ extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
> extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
> struct xfs_refcount_irec *irec, int *stat);
>
> +extern int xfs_refcount_increase(struct xfs_mount *mp, struct xfs_trans *tp,
> + struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> + xfs_extlen_t aglen, struct xfs_bmap_free *flist);
> +extern int xfs_refcount_decrease(struct xfs_mount *mp, struct xfs_trans *tp,
> + struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> + xfs_extlen_t aglen, struct xfs_bmap_free *flist,
> + struct xfs_owner_info *oinfo);
> +
> #endif /* __XFS_REFCOUNT_H__ */
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 40/58] libxfs: adjust refcount of an extent of blocks in refcount btree
2015-10-27 19:05 ` Darrick J. Wong
@ 2015-10-30 20:56 ` Darrick J. Wong
0 siblings, 0 replies; 67+ messages in thread
From: Darrick J. Wong @ 2015-10-30 20:56 UTC (permalink / raw)
To: david; +Cc: linux-fsdevel, xfs
On Tue, Oct 27, 2015 at 12:05:33PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 06, 2015 at 09:59:33PM -0700, Darrick J. Wong wrote:
> > Provide functions to adjust the reference counts for an extent of
> > physical blocks stored in the refcount btree.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > fs/xfs/libxfs/xfs_refcount.c | 771 ++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/libxfs/xfs_refcount.h | 8
> > 2 files changed, 779 insertions(+)
> >
> >
> > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> > index b3f2c25..02892bc 100644
> > --- a/fs/xfs/libxfs/xfs_refcount.c
> > +++ b/fs/xfs/libxfs/xfs_refcount.c
> > @@ -167,3 +167,774 @@ xfs_refcountbt_delete(
> > out_error:
> > return error;
> > }
> > +
> > +/*
> > + * Adjusting the Reference Count
> > + *
> > + * As stated elsewhere, the reference count btree (refcbt) stores
> > + * >1 reference counts for extents of physical blocks. In this
> > + * operation, we're either raising or lowering the reference count of
> > + * some subrange stored in the tree:
> > + *
> > + * <------ adjustment range ------>
> > + * ----+ +---+-----+ +--+--------+---------
> > + * 2 | | 3 | 2 | |17| 55 | 10
> > + * ----+ +---+-----+ +--+--------+---------
> > + * X axis is physical block number;
> > + * reference counts are the numbers inside the rectangles
> > + *
> > + * The first thing we need to do is to ensure that there are no
> > + * refcount extents crossing either boundary of the range to be
> > + * adjusted. For any extent that does cross a boundary, split it into
> > + * two extents so that we can increment the refcount of one of the
> > + * pieces later:
> > + *
> > + * <------ adjustment range ------>
> > + * ----+ +---+-----+ +--+--------+----+----
> > + * 2 | | 3 | 2 | |17| 55 | 10 | 10
> > + * ----+ +---+-----+ +--+--------+----+----
> > + *
> > + * For this next step, let's assume that all the physical blocks in
> > + * the adjustment range are mapped to a file and are therefore in use
> > + * at least once. Therefore, we can infer that any gap in the
> > + * refcount tree within the adjustment range represents a physical
> > + * extent with refcount == 1:
> > + *
> > + * <------ adjustment range ------>
> > + * ----+---+---+-----+-+--+--------+----+----
> > + * 2 |"1"| 3 | 2 |1|17| 55 | 10 | 10
> > + * ----+---+---+-----+-+--+--------+----+----
> > + * ^
> > + *
> > + * For each extent that falls within the interval range, figure out
> > + * which extent is to the left or the right of that extent. Now we
> > + * have a left, current, and right extent. If the new reference count
> > + * of the center extent enables us to merge left, center, and right
> > + * into one record covering all three, do so. If the center extent is
> > + * at the left end of the range, abuts the left extent, and its new
> > + * reference count matches the left extent's record, then merge them.
> > + * If the center extent is at the right end of the range, abuts the
> > + * right extent, and the reference counts match, merge those. In the
> > + * example, we can left merge (assuming an increment operation):
> > + *
> > + * <------ adjustment range ------>
> > + * --------+---+-----+-+--+--------+----+----
> > + * 2 | 3 | 2 |1|17| 55 | 10 | 10
> > + * --------+---+-----+-+--+--------+----+----
> > + * ^
> > + *
> > + * For all other extents within the range, adjust the reference count
> > + * or delete it if the refcount falls below 2. If we were
> > + * incrementing, the end result looks like this:
> > + *
> > + * <------ adjustment range ------>
> > + * --------+---+-----+-+--+--------+----+----
> > + * 2 | 4 | 3 |2|18| 56 | 11 | 10
> > + * --------+---+-----+-+--+--------+----+----
> > + *
> > + * The result of a decrement operation looks as such:
> > + *
> > + * <------ adjustment range ------>
> > + * ----+ +---+ +--+--------+----+----
> > + * 2 | | 2 | |16| 54 | 9 | 10
> > + * ----+ +---+ +--+--------+----+----
> > + * DDDD 111111DD
> > + *
> > + * The blocks marked "D" are freed; the blocks marked "1" are only
> > + * referenced once and therefore the record is removed from the
> > + * refcount btree.
> > + */
> > +
> > +#define RLNEXT(rl) ((rl).rc_startblock + (rl).rc_blockcount)
> > +/*
> > + * Split a left rlextent that crosses agbno.
> > + */
> > +STATIC int
> > +try_split_left_rlextent(
> > + struct xfs_btree_cur *cur,
> > + xfs_agblock_t agbno)
> > +{
> > + struct xfs_refcount_irec left, tmp;
> > + int found_rec;
> > + int error;
> > +
> > + error = xfs_refcountbt_lookup_le(cur, agbno, &found_rec);
> > + if (error)
> > + goto out_error;
> > + if (!found_rec)
> > + return 0;
> > +
> > + error = xfs_refcountbt_get_rec(cur, &left, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > + if (left.rc_startblock >= agbno || RLNEXT(left) <= agbno)
> > + return 0;
> > +
> > + trace_xfs_refcount_split_left_extent(cur->bc_mp, cur->bc_private.a.agno,
> > + &left, agbno);
> > + tmp = left;
> > + tmp.rc_blockcount = agbno - left.rc_startblock;
> > + error = xfs_refcountbt_update(cur, &tmp);
> > + if (error)
> > + goto out_error;
> > +
> > + error = xfs_btree_increment(cur, 0, &found_rec);
> > + if (error)
> > + goto out_error;
> > +
> > + tmp = left;
> > + tmp.rc_startblock = agbno;
> > + tmp.rc_blockcount -= (agbno - left.rc_startblock);
> > + error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > + return error;
> > +
> > +out_error:
> > + trace_xfs_refcount_split_left_extent_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
> > +
> > +/*
> > + * Split a right rlextent that crosses agbnext.
> > + */
> > +STATIC int
> > +try_split_right_rlextent(
> > + struct xfs_btree_cur *cur,
> > + xfs_agblock_t agbnext)
> > +{
> > + struct xfs_refcount_irec right, tmp;
> > + int found_rec;
> > + int error;
> > +
> > + error = xfs_refcountbt_lookup_le(cur, agbnext - 1, &found_rec);
> > + if (error)
> > + goto out_error;
> > + if (!found_rec)
> > + return 0;
> > +
> > + error = xfs_refcountbt_get_rec(cur, &right, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > + if (RLNEXT(right) <= agbnext)
> > + return 0;
> > +
> > + trace_xfs_refcount_split_right_extent(cur->bc_mp,
> > + cur->bc_private.a.agno, &right, agbnext);
> > + tmp = right;
> > + tmp.rc_startblock = agbnext;
> > + tmp.rc_blockcount -= (agbnext - right.rc_startblock);
> > + error = xfs_refcountbt_update(cur, &tmp);
> > + if (error)
> > + goto out_error;
> > +
> > + tmp = right;
> > + tmp.rc_blockcount = agbnext - right.rc_startblock;
> > + error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > + return error;
> > +
> > +out_error:
> > + trace_xfs_refcount_split_right_extent_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
> > +
> > +/*
> > + * Merge the left, center, and right extents.
> > + */
> > +STATIC int
> > +merge_center(
> > + struct xfs_btree_cur *cur,
> > + struct xfs_refcount_irec *left,
> > + struct xfs_refcount_irec *center,
> > + unsigned long long extlen,
> > + xfs_agblock_t *agbno,
> > + xfs_extlen_t *aglen)
> > +{
> > + int error;
> > + int found_rec;
> > +
> > + error = xfs_refcountbt_lookup_ge(cur, center->rc_startblock,
> > + &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > +
> > + error = xfs_refcountbt_delete(cur, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > +
> > + if (center->rc_refcount > 1) {
> > + error = xfs_refcountbt_delete(cur, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> > + out_error);
> > + }
> > +
> > + error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
> > + &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > +
> > + left->rc_blockcount = extlen;
> > + error = xfs_refcountbt_update(cur, left);
> > + if (error)
> > + goto out_error;
> > +
> > + *aglen = 0;
> > + return error;
> > +
> > +out_error:
> > + trace_xfs_refcount_merge_center_extents_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
> > +
> > +/*
> > + * Merge with the left extent.
> > + */
> > +STATIC int
> > +merge_left(
> > + struct xfs_btree_cur *cur,
> > + struct xfs_refcount_irec *left,
> > + struct xfs_refcount_irec *cleft,
> > + xfs_agblock_t *agbno,
> > + xfs_extlen_t *aglen)
> > +{
> > + int error;
> > + int found_rec;
> > +
> > + if (cleft->rc_refcount > 1) {
> > + error = xfs_refcountbt_lookup_le(cur, cleft->rc_startblock,
> > + &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> > + out_error);
> > +
> > + error = xfs_refcountbt_delete(cur, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> > + out_error);
> > + }
> > +
> > + error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
> > + &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > +
> > + left->rc_blockcount += cleft->rc_blockcount;
> > + error = xfs_refcountbt_update(cur, left);
> > + if (error)
> > + goto out_error;
> > +
> > + *agbno += cleft->rc_blockcount;
> > + *aglen -= cleft->rc_blockcount;
> > + return error;
> > +
> > +out_error:
> > + trace_xfs_refcount_merge_left_extent_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
> > +
> > +/*
> > + * Merge with the right extent.
> > + */
> > +STATIC int
> > +merge_right(
> > + struct xfs_btree_cur *cur,
> > + struct xfs_refcount_irec *right,
> > + struct xfs_refcount_irec *cright,
> > + xfs_agblock_t *agbno,
> > + xfs_extlen_t *aglen)
> > +{
> > + int error;
> > + int found_rec;
> > +
> > + if (cright->rc_refcount > 1) {
> > + error = xfs_refcountbt_lookup_le(cur, cright->rc_startblock,
> > + &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> > + out_error);
> > +
> > + error = xfs_refcountbt_delete(cur, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> > + out_error);
> > + }
> > +
> > + error = xfs_refcountbt_lookup_le(cur, right->rc_startblock,
> > + &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > +
> > + right->rc_startblock -= cright->rc_blockcount;
> > + right->rc_blockcount += cright->rc_blockcount;
> > + error = xfs_refcountbt_update(cur, right);
> > + if (error)
> > + goto out_error;
> > +
> > + *aglen -= cright->rc_blockcount;
> > + return error;
> > +
> > +out_error:
> > + trace_xfs_refcount_merge_right_extent_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
> > +
> > +/*
> > + * Find the left extent and the one after it (cleft). This function assumes
> > + * that we've already split any extent crossing agbno.
> > + */
> > +STATIC int
> > +find_left_extent(
> > + struct xfs_btree_cur *cur,
> > + struct xfs_refcount_irec *left,
> > + struct xfs_refcount_irec *cleft,
> > + xfs_agblock_t agbno,
> > + xfs_extlen_t aglen)
> > +{
> > + struct xfs_refcount_irec tmp;
> > + int error;
> > + int found_rec;
> > +
> > + left->rc_blockcount = cleft->rc_blockcount = 0;
> > + error = xfs_refcountbt_lookup_le(cur, agbno - 1, &found_rec);
> > + if (error)
> > + goto out_error;
> > + if (!found_rec)
> > + return 0;
> > +
> > + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > +
> > + if (RLNEXT(tmp) != agbno)
> > + return 0;
> > + /* We have a left extent; retrieve (or invent) the next right one */
> > + *left = tmp;
> > +
> > + error = xfs_btree_increment(cur, 0, &found_rec);
> > + if (error)
> > + goto out_error;
> > + if (found_rec) {
> > + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> > + out_error);
> > +
> > + if (tmp.rc_startblock == agbno)
> > + *cleft = tmp;
> > + else {
> > + cleft->rc_startblock = agbno;
> > + cleft->rc_blockcount = min(aglen,
> > + tmp.rc_startblock - agbno);
> > + cleft->rc_refcount = 1;
> > + }
> > + } else {
> > + cleft->rc_startblock = agbno;
> > + cleft->rc_blockcount = aglen;
> > + cleft->rc_refcount = 1;
> > + }
> > + trace_xfs_refcount_find_left_extent(cur->bc_mp, cur->bc_private.a.agno,
> > + left, cleft, agbno);
> > + return error;
> > +
> > +out_error:
> > + trace_xfs_refcount_find_left_extent_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
> > +
> > +/*
> > + * Find the right extent and the one before it (cright). This function
> > + * assumes that we've already split any extents crossing agbno + aglen.
> > + */
> > +STATIC int
> > +find_right_extent(
> > + struct xfs_btree_cur *cur,
> > + struct xfs_refcount_irec *right,
> > + struct xfs_refcount_irec *cright,
> > + xfs_agblock_t agbno,
> > + xfs_extlen_t aglen)
> > +{
> > + struct xfs_refcount_irec tmp;
> > + int error;
> > + int found_rec;
> > +
> > + right->rc_blockcount = cright->rc_blockcount = 0;
> > + error = xfs_refcountbt_lookup_ge(cur, agbno + aglen, &found_rec);
> > + if (error)
> > + goto out_error;
> > + if (!found_rec)
> > + return 0;
> > +
> > + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
> > +
> > + if (tmp.rc_startblock != agbno + aglen)
> > + return 0;
> > + /* We have a right extent; retrieve (or invent) the next left one */
> > + *right = tmp;
> > +
> > + error = xfs_btree_decrement(cur, 0, &found_rec);
> > + if (error)
> > + goto out_error;
> > + if (found_rec) {
> > + error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
> > + out_error);
> > +
> > + if (tmp.rc_startblock == agbno)
>
> Since tmp represents the refcount extent immediately to the left of (agbno +
> aglen), we want to copy tmp into cright if the end of tmp coincides with (agbno
> + aglen), not if its start coincides with agbno. This is probably a copy-paste
> mistake.
>
> Also, s/RLNEXT/RCNEXT/ since these are refcount extents, not reflink extents.
> (Weird anachronism).
>
> --D
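To make the condition described above concrete, here is a minimal standalone
sketch. It is not the patch code: struct toy_rec, toy_next() and
toy_pick_cright() are hypothetical names used only for illustration, and the
sketch assumes that RLNEXT(rec) expands to rc_startblock + rc_blockcount, which
is what its use in find_left_extent suggests.

```c
#include <assert.h>

/* Hypothetical stand-in for struct xfs_refcount_irec. */
struct toy_rec {
	unsigned int start;	/* models rc_startblock */
	unsigned int len;	/* models rc_blockcount */
	unsigned int refs;	/* models rc_refcount */
};

/* First block past the end of a record (assumed meaning of RLNEXT). */
static unsigned int toy_next(const struct toy_rec *r)
{
	return r->start + r->len;
}

/*
 * Choose cright, given tmp (the record immediately to the left of right)
 * and the adjustment range [agbno, agbno + aglen).
 */
static struct toy_rec toy_pick_cright(const struct toy_rec *tmp,
				      const struct toy_rec *right,
				      unsigned int agbno, unsigned int aglen)
{
	struct toy_rec cright;
	unsigned int next = toy_next(tmp);

	/* Precondition: right starts exactly at the end of the range. */
	assert(right->start == agbno + aglen);

	if (next == agbno + aglen) {
		/* tmp ends where the range ends, so tmp is cright. */
		cright = *tmp;
	} else {
		/* Otherwise invent a refcount == 1 record covering the gap. */
		cright.start = next > agbno ? next : agbno;
		cright.len = right->start - cright.start;
		cright.refs = 1;
	}
	return cright;
}
```

With a range starting at block 12, tmp = (15, 5, 2) and right = (20, 5, 3),
this picks tmp as cright, whereas a test against tmp.start == agbno would
invent a record instead.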
>
> > + *cright = tmp;
> > + else {
> > + cright->rc_startblock = max(agbno,
> > + RLNEXT(tmp));
> > + cright->rc_blockcount = right->rc_startblock -
> > + cright->rc_startblock;
> > + cright->rc_refcount = 1;
> > + }
> > + } else {
> > + cright->rc_startblock = agbno;
> > + cright->rc_blockcount = aglen;
> > + cright->rc_refcount = 1;
> > + }
> > + trace_xfs_refcount_find_right_extent(cur->bc_mp, cur->bc_private.a.agno,
> > + cright, right, agbno + aglen);
> > + return error;
> > +
> > +out_error:
> > + trace_xfs_refcount_find_right_extent_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
> > +#undef RLNEXT
> > +
> > +/*
> > + * Try to merge with any extents on the boundaries of the adjustment range.
> > + */
> > +STATIC int
> > +try_merge_rlextents(
> > + struct xfs_btree_cur *cur,
> > + xfs_agblock_t *agbno,
> > + xfs_extlen_t *aglen,
> > + int adjust)
> > +{
> > + struct xfs_refcount_irec left, cleft, cright, right;
> > + int error;
> > + unsigned long long ulen;
> > +
> > + left.rc_blockcount = cleft.rc_blockcount = 0;
> > + cright.rc_blockcount = right.rc_blockcount = 0;
> > +
> > + /*
> > + * Find extents abutting the start and end of the range, and
> > + * the adjacent extents inside the range.
> > + */
> > + error = find_left_extent(cur, &left, &cleft, *agbno, *aglen);
> > + if (error)
> > + return error;
> > + error = find_right_extent(cur, &right, &cright, *agbno, *aglen);
> > + if (error)
> > + return error;
> > +
> > + /* No left or right extent to merge; exit. */
> > + if (left.rc_blockcount == 0 && right.rc_blockcount == 0)
> > + return 0;
> > +
> > + /* Try a center merge */
> > + ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
> > + right.rc_blockcount;
> > + if (left.rc_blockcount != 0 && right.rc_blockcount != 0 &&
> > + memcmp(&cleft, &cright, sizeof(cleft)) == 0 &&
> > + left.rc_refcount == cleft.rc_refcount + adjust &&
> > + right.rc_refcount == cleft.rc_refcount + adjust &&
> > + ulen < MAXREFCEXTLEN) {
> > + trace_xfs_refcount_merge_center_extents(cur->bc_mp,
> > + cur->bc_private.a.agno, &left, &cleft, &right);
> > + return merge_center(cur, &left, &cleft, ulen, agbno, aglen);
> > + }
> > +
> > + /* Try a left merge */
> > + ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
> > + if (left.rc_blockcount != 0 &&
> > + left.rc_refcount == cleft.rc_refcount + adjust &&
> > + ulen < MAXREFCEXTLEN) {
> > + trace_xfs_refcount_merge_left_extent(cur->bc_mp,
> > + cur->bc_private.a.agno, &left, &cleft);
> > + return merge_left(cur, &left, &cleft, agbno, aglen);
> > + }
We shouldn't return unconditionally here. Suppose that left, cleft, cright,
and right are all distinct extents, and we want to merge both left:cleft and
cright:right. Returning right after the left merge means we'd miss the second
merge.
--D
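A rough standalone sketch of the fall-through behavior follows. It models the
records in memory rather than in the btree and leaves out the center merge.
All names (toy_rec, toy_can_merge, toy_try_merge, TOY_MAXLEN, the MERGED_*
flags) are hypothetical, and the early exit when cleft and cright are the same
extent is an assumption about what a fix would need, not something taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAXLEN 1000u	/* arbitrary stand-in for MAXREFCEXTLEN */

/* Hypothetical stand-in for struct xfs_refcount_irec. */
struct toy_rec {
	unsigned int start, len, refs;
};

enum { MERGED_LEFT = 1, MERGED_RIGHT = 2 };

/* Can the outer (left/right) record absorb the inner (cleft/cright) one? */
static bool toy_can_merge(const struct toy_rec *outer,
			  const struct toy_rec *inner, int adjust)
{
	unsigned long long ulen = (unsigned long long)outer->len + inner->len;

	return outer->len != 0 && inner->len != 0 &&
	       outer->refs == inner->refs + (unsigned int)adjust &&
	       ulen < TOY_MAXLEN;
}

/* Try the left merge, then fall through to the right merge. */
static int toy_try_merge(struct toy_rec *left, struct toy_rec *cleft,
			 struct toy_rec *cright, struct toy_rec *right,
			 unsigned int *agbno, unsigned int *aglen, int adjust)
{
	bool cequal = memcmp(cleft, cright, sizeof(*cleft)) == 0;
	int merged = 0;

	if (toy_can_merge(left, cleft, adjust)) {
		left->len += cleft->len;
		*agbno += cleft->len;
		*aglen -= cleft->len;
		merged |= MERGED_LEFT;
		/* cleft was also cright: nothing remains to merge right. */
		if (cequal)
			return merged;
	}

	if (toy_can_merge(right, cright, adjust)) {
		right->start -= cright->len;
		right->len += cright->len;
		*aglen -= cright->len;
		merged |= MERGED_RIGHT;
	}
	return merged;
}
```

With four distinct extents whose refcounts line up on both sides, this
performs both merges in one call and shrinks the range from both ends.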
> > +
> > + /* Try a right merge */
> > + ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
> > + if (right.rc_blockcount != 0 &&
> > + right.rc_refcount == cright.rc_refcount + adjust &&
> > + ulen < MAXREFCEXTLEN) {
> > + trace_xfs_refcount_merge_right_extent(cur->bc_mp,
> > + cur->bc_private.a.agno, &cright, &right);
> > + return merge_right(cur, &right, &cright, agbno, aglen);
> > + }
> > +
> > + return error;
> > +}
> > +
> > +/*
> > + * Adjust the refcounts of middle extents. At this point we should have
> > + * split extents that crossed the adjustment range; merged with adjacent
> > + * extents; and updated agbno/aglen to reflect the merges. Therefore,
> > + * all we have to do is update the extents inside [agbno, agbno + aglen).
> > + */
> > +STATIC int
> > +adjust_rlextents(
> > + struct xfs_btree_cur *cur,
> > + xfs_agblock_t agbno,
> > + xfs_extlen_t aglen,
> > + int adj,
> > + struct xfs_bmap_free *flist,
> > + struct xfs_owner_info *oinfo)
> > +{
> > + struct xfs_refcount_irec ext, tmp;
> > + int error;
> > + int found_rec, found_tmp;
> > + xfs_fsblock_t fsbno;
> > +
> > + error = xfs_refcountbt_lookup_ge(cur, agbno, &found_rec);
> > + if (error)
> > + goto out_error;
> > +
> > + while (aglen > 0) {
> > + error = xfs_refcountbt_get_rec(cur, &ext, &found_rec);
> > + if (error)
> > + goto out_error;
> > + if (!found_rec) {
> > + ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
> > + ext.rc_blockcount = 0;
> > + ext.rc_refcount = 0;
> > + }
> > +
> > + /*
> > + * Deal with a hole in the refcount tree; if a file maps to
> > + * these blocks and there's no refcountbt record, pretend that
> > + * there is one with refcount == 1.
> > + */
> > + if (ext.rc_startblock != agbno) {
> > + tmp.rc_startblock = agbno;
> > + tmp.rc_blockcount = min(aglen,
> > + ext.rc_startblock - agbno);
> > + tmp.rc_refcount = 1 + adj;
> > + trace_xfs_refcount_modify_extent(cur->bc_mp,
> > + cur->bc_private.a.agno, &tmp);
> > +
> > + /*
> > + * Either cover the hole (increment) or
> > + * delete the range (decrement).
> > + */
> > + if (tmp.rc_refcount) {
> > + error = xfs_refcountbt_insert(cur, &tmp,
> > + &found_tmp);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
> > + found_tmp == 1, out_error);
> > + } else {
> > + fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
> > + cur->bc_private.a.agno,
> > + tmp.rc_startblock);
> > + xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
> > + tmp.rc_blockcount, oinfo);
> > + }
> > +
> > + agbno += tmp.rc_blockcount;
> > + aglen -= tmp.rc_blockcount;
> > +
> > + error = xfs_refcountbt_lookup_ge(cur, agbno,
> > + &found_rec);
> > + if (error)
> > + goto out_error;
> > + }
> > +
> > + /* Stop if there's nothing left to modify */
> > + if (aglen == 0)
> > + break;
> > +
> > + /*
> > + * Adjust the reference count and either update the tree
> > + * (incr) or free the blocks (decr).
> > + */
> > + ext.rc_refcount += adj;
> > + trace_xfs_refcount_modify_extent(cur->bc_mp,
> > + cur->bc_private.a.agno, &ext);
> > + if (ext.rc_refcount > 1) {
> > + error = xfs_refcountbt_update(cur, &ext);
> > + if (error)
> > + goto out_error;
> > + } else if (ext.rc_refcount == 1) {
> > + error = xfs_refcountbt_delete(cur, &found_rec);
> > + if (error)
> > + goto out_error;
> > + XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
> > + found_rec == 1, out_error);
> > + goto advloop;
> > + } else {
> > + fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
> > + cur->bc_private.a.agno,
> > + ext.rc_startblock);
> > + xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
> > + ext.rc_blockcount, oinfo);
> > + }
> > +
> > + error = xfs_btree_increment(cur, 0, &found_rec);
> > + if (error)
> > + goto out_error;
> > +
> > +advloop:
> > + agbno += ext.rc_blockcount;
> > + aglen -= ext.rc_blockcount;
> > + }
> > +
> > + return error;
> > +out_error:
> > + trace_xfs_refcount_modify_extent_error(cur->bc_mp,
> > + cur->bc_private.a.agno, error, _RET_IP_);
> > + return error;
> > +}
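For readers following the hole handling in adjust_rlextents, here is a small
standalone model of the walk. It is only a sketch under stated assumptions: the
records live in a sorted array instead of the btree, they have already been
split at the range boundaries, and the names toy_rec and toy_adjust are
hypothetical. A resulting refcount of 0 stands for "free these blocks".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_refcount_irec. */
struct toy_rec {
	unsigned int start, len, refs;
};

/*
 * Walk [agbno, agbno + aglen) over recs[] (sorted, non-overlapping) and
 * write one output segment per hole or record, with the refcount adjusted
 * by adj. A hole is treated as an implicit record with refcount == 1.
 * Returns the number of segments, or -1 if out[] is too small.
 */
static int toy_adjust(const struct toy_rec *recs, size_t nrecs,
		      unsigned int agbno, unsigned int aglen, int adj,
		      struct toy_rec *out, size_t nout)
{
	size_t i = 0, n = 0;

	/* Models lookup_ge: skip records that start before agbno. */
	while (i < nrecs && recs[i].start < agbno)
		i++;

	while (aglen > 0) {
		unsigned int next = i < nrecs ? recs[i].start : agbno + aglen;
		struct toy_rec seg;

		if (next != agbno) {
			/* Hole: pretend there is a refcount == 1 record. */
			unsigned int gap = next - agbno;

			seg.start = agbno;
			seg.len = gap < aglen ? gap : aglen;
			seg.refs = 1 + adj;
		} else {
			/* Real record: adjust its refcount in place. */
			seg = recs[i++];
			seg.refs += adj;
		}

		/* Records must not cross the end of the range. */
		assert(seg.len <= aglen);
		if (n == nout)
			return -1;
		out[n++] = seg;
		agbno += seg.len;
		aglen -= seg.len;
	}
	return (int)n;
}
```

Incrementing blocks 5..29 over records (10, 5, 2) and (20, 5, 3) yields five
segments: three former holes at refcount 2 and the two records at 3 and 4.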
> > +
> > +/*
> > + * Adjust the reference count of a range of AG blocks.
> > + *
> > + * @mp: XFS mount object
> > + * @tp: XFS transaction object
> > + * @agbp: Buffer containing the AGF
> > + * @agno: AG number
> > + * @agbno: Start of range to adjust
> > + * @aglen: Length of range to adjust
> > + * @adj: +1 to increment, -1 to decrement reference count
> > + * @flist: freelist (only required if adj == -1)
> > + * @oinfo: owner of the blocks (only required if adj == -1)
> > + */
> > +STATIC int
> > +xfs_refcountbt_adjust_refcount(
> > + struct xfs_mount *mp,
> > + struct xfs_trans *tp,
> > + struct xfs_buf *agbp,
> > + xfs_agnumber_t agno,
> > + xfs_agblock_t agbno,
> > + xfs_extlen_t aglen,
> > + int adj,
> > + struct xfs_bmap_free *flist,
> > + struct xfs_owner_info *oinfo)
> > +{
> > + struct xfs_btree_cur *cur;
> > + int error;
> > +
> > + cur = xfs_refcountbt_init_cursor(mp, tp, agbp, agno, flist);
> > +
> > + /*
> > + * Ensure that no rlextents cross the boundary of the adjustment range.
> > + */
> > + error = try_split_left_rlextent(cur, agbno);
> > + if (error)
> > + goto out_error;
> > +
> > + error = try_split_right_rlextent(cur, agbno + aglen);
> > + if (error)
> > + goto out_error;
> > +
> > + /*
> > + * Try to merge with the left or right extents of the range.
> > + */
> > + error = try_merge_rlextents(cur, &agbno, &aglen, adj);
> > + if (error)
> > + goto out_error;
> > +
> > + /* Now that we've taken care of the ends, adjust the middle extents */
> > + error = adjust_rlextents(cur, agbno, aglen, adj, flist, oinfo);
> > + if (error)
> > + goto out_error;
> > +
> > + xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > + return 0;
> > +
> > +out_error:
> > + trace_xfs_refcount_adjust_error(mp, agno, error, _RET_IP_);
> > + xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
> > + return error;
> > +}
> > +
> > +/**
> > + * Increase the reference count of a range of AG blocks.
> > + *
> > + * @mp: XFS mount object
> > + * @tp: XFS transaction object
> > + * @agbp: Buffer containing the AGF
> > + * @agno: AG number
> > + * @agbno: Start of range to adjust
> > + * @aglen: Length of range to adjust
> > + * @flist: List of blocks to free
> > + */
> > +int
> > +xfs_refcount_increase(
> > + struct xfs_mount *mp,
> > + struct xfs_trans *tp,
> > + struct xfs_buf *agbp,
> > + xfs_agnumber_t agno,
> > + xfs_agblock_t agbno,
> > + xfs_extlen_t aglen,
> > + struct xfs_bmap_free *flist)
> > +{
> > + trace_xfs_refcount_increase(mp, agno, agbno, aglen);
> > + return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
> > + aglen, 1, flist, NULL);
> > +}
> > +
> > +/**
> > + * Decrease the reference count of a range of AG blocks.
> > + *
> > + * @mp: XFS mount object
> > + * @tp: XFS transaction object
> > + * @agbp: Buffer containing the AGF
> > + * @agno: AG number
> > + * @agbno: Start of range to adjust
> > + * @aglen: Length of range to adjust
> > + * @flist: List of blocks to free
> > + * @oinfo: Extent owner
> > + */
> > +int
> > +xfs_refcount_decrease(
> > + struct xfs_mount *mp,
> > + struct xfs_trans *tp,
> > + struct xfs_buf *agbp,
> > + xfs_agnumber_t agno,
> > + xfs_agblock_t agbno,
> > + xfs_extlen_t aglen,
> > + struct xfs_bmap_free *flist,
> > + struct xfs_owner_info *oinfo)
> > +{
> > + trace_xfs_refcount_decrease(mp, agno, agbno, aglen);
> > + return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
> > + aglen, -1, flist, oinfo);
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
> > index 033a9b1..6640e3d 100644
> > --- a/fs/xfs/libxfs/xfs_refcount.h
> > +++ b/fs/xfs/libxfs/xfs_refcount.h
> > @@ -26,4 +26,12 @@ extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
> > extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
> > struct xfs_refcount_irec *irec, int *stat);
> >
> > +extern int xfs_refcount_increase(struct xfs_mount *mp, struct xfs_trans *tp,
> > + struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> > + xfs_extlen_t aglen, struct xfs_bmap_free *flist);
> > +extern int xfs_refcount_decrease(struct xfs_mount *mp, struct xfs_trans *tp,
> > + struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
> > + xfs_extlen_t aglen, struct xfs_bmap_free *flist,
> > + struct xfs_owner_info *oinfo);
> > +
> > #endif /* __XFS_REFCOUNT_H__ */
> >
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
end of thread, other threads:[~2015-10-30 20:56 UTC | newest]
Thread overview: 67+ messages
2015-10-07 4:54 [RFCv3 00/58] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
2015-10-07 4:54 ` [PATCH 01/58] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
2015-10-07 4:54 ` [PATCH 02/58] xfs: fix log ticket type printing Darrick J. Wong
2015-10-07 4:55 ` [PATCH 03/58] xfs: introduce rmap btree definitions Darrick J. Wong
2015-10-07 4:55 ` [PATCH 04/58] xfs: add rmap btree stats infrastructure Darrick J. Wong
2015-10-07 4:55 ` [PATCH 05/58] xfs: rmap btree add more reserved blocks Darrick J. Wong
2015-10-07 4:55 ` [PATCH 06/58] xfs: add owner field to extent allocation and freeing Darrick J. Wong
2015-10-07 4:55 ` [PATCH 07/58] xfs: add extended " Darrick J. Wong
2015-10-07 4:55 ` [PATCH 08/58] xfs: introduce rmap extent operation stubs Darrick J. Wong
2015-10-07 4:55 ` [PATCH 09/58] xfs: extend rmap extent operation stubs to take full owner info Darrick J. Wong
2015-10-07 4:55 ` [PATCH 10/58] xfs: define the on-disk rmap btree format Darrick J. Wong
2015-10-07 4:55 ` [PATCH 11/58] xfs: enhance " Darrick J. Wong
2015-10-07 4:56 ` [PATCH 12/58] xfs: add rmap btree growfs support Darrick J. Wong
2015-10-07 4:56 ` [PATCH 13/58] xfs: enhance " Darrick J. Wong
2015-10-07 4:56 ` [PATCH 14/58] xfs: rmap btree transaction reservations Darrick J. Wong
2015-10-07 4:56 ` [PATCH 15/58] xfs: rmap btree requires more reserved free space Darrick J. Wong
2015-10-07 4:56 ` [PATCH 16/58] libxfs: fix min freelist length calculation Darrick J. Wong
2015-10-07 4:56 ` [PATCH 17/58] xfs: add rmap btree operations Darrick J. Wong
2015-10-07 4:57 ` [PATCH 18/58] xfs: enhance " Darrick J. Wong
2015-10-07 4:57 ` [PATCH 19/58] xfs: add an extent to the rmap btree Darrick J. Wong
2015-10-07 4:57 ` [PATCH 20/58] xfs: add tracepoints for the rmap-mirrors-bmbt functions Darrick J. Wong
2015-10-07 4:57 ` [PATCH 21/58] xfs: teach rmap_alloc how to deal with our larger rmap btree Darrick J. Wong
2015-10-07 4:57 ` [PATCH 22/58] xfs: remove an extent from the " Darrick J. Wong
2015-10-07 4:57 ` [PATCH 23/58] xfs: enhanced " Darrick J. Wong
2015-10-07 4:57 ` [PATCH 24/58] xfs: add rmap btree insert and delete helpers Darrick J. Wong
2015-10-07 4:57 ` [PATCH 25/58] xfs: bmap btree changes should update rmap btree Darrick J. Wong
2015-10-21 21:39 ` Darrick J. Wong
2015-10-07 4:57 ` [PATCH 26/58] xfs: add rmap btree geometry feature flag Darrick J. Wong
2015-10-07 4:58 ` [PATCH 27/58] xfs: add rmap btree block detection to log recovery Darrick J. Wong
2015-10-07 4:58 ` [PATCH 28/58] xfs: enable the rmap btree functionality Darrick J. Wong
2015-10-07 4:58 ` [PATCH 29/58] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
2015-10-07 4:58 ` [PATCH 30/58] xfs: implement " Darrick J. Wong
2015-10-07 4:58 ` [PATCH 31/58] libxfs: refactor short btree block verification Darrick J. Wong
2015-10-07 4:58 ` [PATCH 32/58] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
2015-10-07 4:58 ` [PATCH 33/58] xfs: introduce refcount btree definitions Darrick J. Wong
2015-10-07 4:58 ` [PATCH 34/58] xfs: add refcount btree stats infrastructure Darrick J. Wong
2015-10-07 4:58 ` [PATCH 35/58] xfs: refcount btree add more reserved blocks Darrick J. Wong
2015-10-07 4:59 ` [PATCH 36/58] xfs: define the on-disk refcount btree format Darrick J. Wong
2015-10-07 4:59 ` [PATCH 37/58] xfs: define tracepoints for refcount/reflink activities Darrick J. Wong
2015-10-07 4:59 ` [PATCH 38/58] xfs: add refcount btree support to growfs Darrick J. Wong
2015-10-07 4:59 ` [PATCH 39/58] xfs: add refcount btree operations Darrick J. Wong
2015-10-07 4:59 ` [PATCH 40/58] libxfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
2015-10-27 19:05 ` Darrick J. Wong
2015-10-30 20:56 ` Darrick J. Wong
2015-10-07 4:59 ` [PATCH 41/58] libxfs: adjust refcount when unmapping file blocks Darrick J. Wong
2015-10-07 4:59 ` [PATCH 42/58] xfs: add refcount btree block detection to log recovery Darrick J. Wong
2015-10-07 4:59 ` [PATCH 43/58] xfs: map an inode's offset to an exact physical block Darrick J. Wong
2015-10-07 4:59 ` [PATCH 44/58] xfs: add reflink feature flag to geometry Darrick J. Wong
2015-10-07 5:00 ` [PATCH 45/58] xfs: create a separate workqueue for copy-on-write activities Darrick J. Wong
2015-10-07 5:00 ` [PATCH 46/58] xfs: implement copy-on-write for reflinked blocks Darrick J. Wong
2015-10-07 5:00 ` [PATCH 47/58] xfs: handle directio " Darrick J. Wong
2015-10-07 5:00 ` [PATCH 48/58] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks Darrick J. Wong
2015-10-21 21:17 ` Darrick J. Wong
2015-10-07 5:00 ` [PATCH 49/58] xfs: clear inode reflink flag when freeing blocks Darrick J. Wong
2015-10-07 5:00 ` [PATCH 50/58] xfs: reflink extents from one file to another Darrick J. Wong
2015-10-07 5:12 ` kbuild test robot
2015-10-07 5:00 ` [PATCH 51/58] xfs: add clone file and clone range ioctls Darrick J. Wong
2015-10-07 5:13 ` kbuild test robot
2015-10-07 6:46 ` kbuild test robot
2015-10-07 7:35 ` kbuild test robot
2015-10-07 5:00 ` [PATCH 52/58] xfs: emulate the btrfs dedupe extent same ioctl Darrick J. Wong
2015-10-07 5:00 ` [PATCH 53/58] xfs: teach fiemap about reflink'd extents Darrick J. Wong
2015-10-07 5:01 ` [PATCH 54/58] xfs: swap inode reflink flags when swapping inode extents Darrick J. Wong
2015-10-07 5:01 ` [PATCH 55/58] vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks Darrick J. Wong
2015-10-07 5:01 ` [PATCH 56/58] xfs: unshare a range of blocks via fallocate Darrick J. Wong
2015-10-07 5:01 ` [PATCH 57/58] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems Darrick J. Wong
2015-10-07 5:01 ` [PATCH 58/58] xfs: recognize the reflink feature bit Darrick J. Wong