* [PATCH] xfs: xfs_swap_extents needs to handle dynamic fork offsets
@ 2010-01-12 5:01 Dave Chinner
2010-01-12 13:09 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2010-01-12 5:01 UTC (permalink / raw)
To: xfs
When swapping extents, we can corrupt inodes by swapping data forks
that are in incompatible formats. This is caused by the two indoes
having different fork offsets due to the presence of an attribute
fork on an attr2 filesystem. xfs_fsr tries to be smart about
setting the fork offset, but the trick it plays only works on attr1
(old fixed format attribute fork) filesystems.
Changing the way xfs_fsr sets up the attribute fork will prevent
this situation from ever occurring, so in the kernel code we can get
by with a preventative fix - check that the data fork in the
defragmented inode is in a format valid for the inode it is being
swapped into. This will lead to files that will silently and
potentially repeatedly fail defragmentation, so issue a warning to
the log when this particular failure occurs to let us know that
xfs_fsr needs updating/fixing.
To help identify how to improve xfs_fsr to avoid this issue, add
trace points for the inodes being swapped so that we can determine
why the swap was rejected and to confirm that the code is making the
right decisions and modifications when swapping forks.
A further complication is even when the swap is allowed to proceed
when the fork offset is different between the two inodes then value
for the maximum number of extents the data fork can hold can be
wrong. Make sure these are also set correctly after the swap occurs.
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_trace.h | 53 ++++++++++++++++
fs/xfs/xfs_dfrag.c | 140 +++++++++++++++++++++++++++++++++---------
2 files changed, 163 insertions(+), 30 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_trace.h b/fs/xfs/linux-2.6/xfs_trace.h
index 058fff1..d4145bf 100644
--- a/fs/xfs/linux-2.6/xfs_trace.h
+++ b/fs/xfs/linux-2.6/xfs_trace.h
@@ -1441,6 +1441,59 @@ TRACE_EVENT(xfs_dir2_leafn_moveents,
__entry->count)
);
+#define XFS_SWAPEXT_INODES \
+ { 0, "target" }, \
+ { 1, "temp" }
+
+#define XFS_INODE_FORMAT_STR \
+ { 0, "invalid" }, \
+ { 1, "local" }, \
+ { 2, "extent" }, \
+ { 3, "btree" }
+
+DECLARE_EVENT_CLASS(xfs_swap_extent_class,
+ TP_PROTO(struct xfs_inode *ip, int which),
+ TP_ARGS(ip, which),
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(int, which)
+ __field(xfs_ino_t, ino)
+ __field(int, format)
+ __field(int, nex)
+ __field(int, max_nex)
+ __field(int, broot_size)
+ __field(int, fork_off)
+ ),
+ TP_fast_assign(
+ __entry->dev = VFS_I(ip)->i_sb->s_dev;
+ __entry->which = which;
+ __entry->ino = ip->i_ino;
+ __entry->format = ip->i_d.di_format;
+ __entry->nex = ip->i_d.di_nextents;
+ __entry->max_nex = ip->i_df.if_ext_max;
+ __entry->broot_size = ip->i_df.if_broot_bytes;
+ __entry->fork_off = XFS_IFORK_BOFF(ip);
+ ),
+ TP_printk("dev %d:%d %s inode 0x%llx, %s format, num_extents %d, "
+ "Max in-fork extents %d, broot size %d, fork offset %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __print_symbolic(__entry->which, XFS_SWAPEXT_INODES),
+ __entry->ino,
+ __print_symbolic(__entry->format, XFS_INODE_FORMAT_STR),
+ __entry->nex,
+ __entry->max_nex,
+ __entry->broot_size,
+ __entry->fork_off)
+)
+
+#define DEFINE_SWAPEXT_EVENT(name) \
+DEFINE_EVENT(xfs_swap_extent_class, name, \
+ TP_PROTO(struct xfs_inode *ip, int which), \
+ TP_ARGS(ip, which))
+
+DEFINE_SWAPEXT_EVENT(xfs_swap_extent_before);
+DEFINE_SWAPEXT_EVENT(xfs_swap_extent_after);
+
#endif /* _TRACE_XFS_H */
#undef TRACE_INCLUDE_PATH
diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
index d1483a4..f83d127 100644
--- a/fs/xfs/xfs_dfrag.c
+++ b/fs/xfs/xfs_dfrag.c
@@ -53,7 +53,7 @@ xfs_swapext(
xfs_swapext_t *sxp)
{
xfs_inode_t *ip, *tip;
- struct file *file, *target_file;
+ struct file *file, *tmp_file;
int error = 0;
/* Pull information for the target fd */
@@ -68,56 +68,128 @@ xfs_swapext(
goto out_put_file;
}
- target_file = fget((int)sxp->sx_fdtmp);
- if (!target_file) {
+ tmp_file = fget((int)sxp->sx_fdtmp);
+ if (!tmp_file) {
error = XFS_ERROR(EINVAL);
goto out_put_file;
}
- if (!(target_file->f_mode & FMODE_WRITE) ||
- (target_file->f_flags & O_APPEND)) {
+ if (!(tmp_file->f_mode & FMODE_WRITE) ||
+ (tmp_file->f_flags & O_APPEND)) {
error = XFS_ERROR(EBADF);
- goto out_put_target_file;
+ goto out_put_tmp_file;
}
if (IS_SWAPFILE(file->f_path.dentry->d_inode) ||
- IS_SWAPFILE(target_file->f_path.dentry->d_inode)) {
+ IS_SWAPFILE(tmp_file->f_path.dentry->d_inode)) {
error = XFS_ERROR(EINVAL);
- goto out_put_target_file;
+ goto out_put_tmp_file;
}
ip = XFS_I(file->f_path.dentry->d_inode);
- tip = XFS_I(target_file->f_path.dentry->d_inode);
+ tip = XFS_I(tmp_file->f_path.dentry->d_inode);
if (ip->i_mount != tip->i_mount) {
error = XFS_ERROR(EINVAL);
- goto out_put_target_file;
+ goto out_put_tmp_file;
}
if (ip->i_ino == tip->i_ino) {
error = XFS_ERROR(EINVAL);
- goto out_put_target_file;
+ goto out_put_tmp_file;
}
if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
error = XFS_ERROR(EIO);
- goto out_put_target_file;
+ goto out_put_tmp_file;
}
error = xfs_swap_extents(ip, tip, sxp);
- out_put_target_file:
- fput(target_file);
+ out_put_tmp_file:
+ fput(tmp_file);
out_put_file:
fput(file);
out:
return error;
}
+/*
+ * We need to check that the format of the data fork in the temporary inode is
+ * valid for the target inode before doing the swap. This is not a problem with
+ * attr1 because of the fixed fok offset, but attr2 has a dynamically sized
+ * data fork depending on the space the attribute fork is taking so we can get
+ * invalid formats on the target inode.
+ *
+ * e.g. target has space for 7 extents in extent format, temp inode only has
+ * space for 6. If we defragment down to 7 extents, then the tmp format is a
+ * btree, but when swapped it needs to be in extent format. Hence we can't just
+ * blindly swap data forks on attr2 filesystems.
+ *
+ * Note that we check the swap in both directions so that we don't end up with
+ * a corrupt temporary inode, either.
+ *
+ * Note that fixing the way xfs_fsr sets up the attribute fork in the source
+ * inode will prevent this situation from occurring, so all we do here is
+ * reject and log the attempt. basically we are putting the responsibility on
+ * userspace to get this right.
+ */
+static int
+xfs_swap_extents_check_format(
+ xfs_inode_t *ip, /* target inode */
+ xfs_inode_t *tip) /* tmp inode */
+{
+
+ /* Should never get a local format */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL ||
+ tip->i_d.di_format == XFS_DINODE_FMT_LOCAL)
+ return EINVAL;
+
+ /*
+ * if the target inode has less extents that then temporary inode then
+ * why did userspace call us?
+ */
+ if (ip->i_d.di_nextents < tip->i_d.di_nextents)
+ return EINVAL;
+
+ /*
+ * if the target inode is in extent form and the temp inode is in btree
+ * form then we will end up with the target inode in the wrong format
+ * as we already know there are less extents in the temp inode.
+ */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
+ tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
+ return EINVAL;
+
+ /* Check temp in extent form to max in target */
+ if (tip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) > ip->i_df.if_ext_max)
+ return EINVAL;
+
+ /* Check target in extent form to max in temp */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) > tip->i_df.if_ext_max)
+ return EINVAL;
+
+ /* Check root block of temp in btree form to max in target */
+ if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
+ XFS_IFORK_BOFF(ip) &&
+ tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
+ return EINVAL;
+
+ /* Check root block of target in btree form to max in temp */
+ if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
+ XFS_IFORK_BOFF(tip) &&
+ ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
+ return EINVAL;
+
+ return 0;
+}
+
int
xfs_swap_extents(
- xfs_inode_t *ip,
- xfs_inode_t *tip,
+ xfs_inode_t *ip, /* target inode */
+ xfs_inode_t *tip, /* tmp inode */
xfs_swapext_t *sxp)
{
xfs_mount_t *mp;
@@ -161,13 +233,6 @@ xfs_swap_extents(
goto out_unlock;
}
- /* Should never get a local format */
- if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL ||
- tip->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
- error = XFS_ERROR(EINVAL);
- goto out_unlock;
- }
-
if (VN_CACHED(VFS_I(tip)) != 0) {
error = xfs_flushinval_pages(tip, 0, -1,
FI_REMAPF_LOCKED);
@@ -189,13 +254,15 @@ xfs_swap_extents(
goto out_unlock;
}
- /*
- * If the target has extended attributes, the tmp file
- * must also in order to ensure the correct data fork
- * format.
- */
- if ( XFS_IFORK_Q(ip) != XFS_IFORK_Q(tip) ) {
- error = XFS_ERROR(EINVAL);
+ trace_xfs_swap_extent_before(ip, 0);
+ trace_xfs_swap_extent_before(tip, 1);
+
+ /* check inode formats now that data is flushed */
+ error = xfs_swap_extents_check_format(ip, tip);
+ if (error) {
+ xfs_fs_cmn_err(CE_NOTE, mp,
+ "%s: inode 0x%llx format is incompatible for exchanging.",
+ __FILE__, ip->i_ino);
goto out_unlock;
}
@@ -276,6 +343,16 @@ xfs_swap_extents(
*tifp = *tempifp; /* struct copy */
/*
+ * Fix the in-memory data fork values that are dependent on the fork
+ * offset in the inode. We can't assume they remain the same as attr2
+ * has dynamic fork offsets.
+ */
+ ifp->if_ext_max = XFS_IFORK_SIZE(ip, XFS_DATA_FORK) /
+ (uint)sizeof(xfs_bmbt_rec_t);
+ tifp->if_ext_max = XFS_IFORK_SIZE(tip, XFS_DATA_FORK) /
+ (uint)sizeof(xfs_bmbt_rec_t);
+
+ /*
* Fix the on-disk inode values
*/
tmp = (__uint64_t)ip->i_d.di_nblocks;
@@ -347,6 +424,9 @@ xfs_swap_extents(
error = xfs_trans_commit(tp, XFS_TRANS_SWAPEXT);
+ trace_xfs_swap_extent_after(ip, 0);
+ trace_xfs_swap_extent_after(tip, 1);
+
out:
kmem_free(tempifp);
return error;
--
1.6.5
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] xfs: xfs_swap_extents needs to handle dynamic fork offsets
2010-01-12 5:01 [PATCH] xfs: xfs_swap_extents needs to handle dynamic fork offsets Dave Chinner
@ 2010-01-12 13:09 ` Christoph Hellwig
2010-01-13 22:32 ` Alex Elder
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2010-01-12 13:09 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> + TP_printk("dev %d:%d %s inode 0x%llx, %s format, num_extents %d, "
> + "Max in-fork extents %d, broot size %d, fork offset %d",
It would be nice to keep the
"dev %d:%d ino 0x%llx"
prefix as a convention so that all trace records are similar at their
beginning.
> #undef TRACE_INCLUDE_PATH
> diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
> @@ -53,7 +53,7 @@ xfs_swapext(
> xfs_swapext_t *sxp)
> {
> xfs_inode_t *ip, *tip;
> - struct file *file, *target_file;
> + struct file *file, *tmp_file;
I think these xfs_swapext belong into a separate patch. While they
make the code quite a bit more redable they're purely cleanups and
can wait for 2.6.34. And while you're at it you might also want to
merge xfs_swap_extents into xfs_swapext - there's no need for that
split at all.
> +static int
> +xfs_swap_extents_check_format(
> + xfs_inode_t *ip, /* target inode */
> + xfs_inode_t *tip) /* tmp inode */
What about just using struct xfs_inode * for new code?
> + /* Should never get a local format */
> + if (ip->i_d.di_format == XFS_DINODE_FMT_LOCAL ||
> + tip->i_d.di_format == XFS_DINODE_FMT_LOCAL)
> + return EINVAL;
Ok, same check as in the old code. Any reason we drop the XFS_ERROR
here?
> + /*
> + * if the target inode has less extents that then temporary inode then
> + * why did userspace call us?
> + */
> + if (ip->i_d.di_nextents < tip->i_d.di_nextents)
> + return EINVAL;
Ok.
> + /*
> + * if the target inode is in extent form and the temp inode is in btree
> + * form then we will end up with the target inode in the wrong format
> + * as we already know there are less extents in the temp inode.
> + */
> + if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
> + tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
> + return EINVAL;
Ok.
> + /* Check temp in extent form to max in target */
> + if (tip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
> + XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK) > ip->i_df.if_ext_max)
> + return EINVAL;
> +
> + /* Check target in extent form to max in temp */
> + if (ip->i_d.di_format == XFS_DINODE_FMT_EXTENTS &&
> + XFS_IFORK_NEXTENTS(ip, XFS_DATA_FORK) > tip->i_df.if_ext_max)
> + return EINVAL;
So we check this either way just to be sure, ok.
> + /* Check root block of temp in btree form to max in target */
> + if (tip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
> + XFS_IFORK_BOFF(ip) &&
> + tip->i_df.if_broot_bytes > XFS_IFORK_BOFF(ip))
> + return EINVAL;
> +
> + /* Check root block of target in btree form to max in temp */
> + if (ip->i_d.di_format == XFS_DINODE_FMT_BTREE &&
> + XFS_IFORK_BOFF(tip) &&
> + ip->i_df.if_broot_bytes > XFS_IFORK_BOFF(tip))
> + return EINVAL;
Same here.
Patch looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread* RE: [PATCH] xfs: xfs_swap_extents needs to handle dynamic fork offsets
2010-01-12 13:09 ` Christoph Hellwig
@ 2010-01-13 22:32 ` Alex Elder
2010-01-13 23:03 ` Dave Chinner
2010-01-14 6:35 ` Christoph Hellwig
0 siblings, 2 replies; 5+ messages in thread
From: Alex Elder @ 2010-01-13 22:32 UTC (permalink / raw)
To: Christoph Hellwig, Dave Chinner; +Cc: xfs
Christoph Hellwig wrote:
>> + TP_printk("dev %d:%d %s inode 0x%llx, %s format, num_extents %d, "
>> + "Max in-fork extents %d, broot size %d, fork offset %d",
>
> It would be nice to keep the
>
> "dev %d:%d ino 0x%llx"
>
> prefix as a convention so that all trace records are similar at their
> beginning.
Perhaps:
+ TP_printk("dev %d:%d inode 0x%llx (%s), %s format, num_extents %d, "
^
+--- symbolic entry->which
>> #undef TRACE_INCLUDE_PATH
>> diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
>> @@ -53,7 +53,7 @@ xfs_swapext(
>> xfs_swapext_t *sxp)
>> {
>> xfs_inode_t *ip, *tip;
>> - struct file *file, *target_file;
>> + struct file *file, *tmp_file;
>
> I think these xfs_swapext belong into a separate patch. While they
> make the code quite a bit more redable they're purely cleanups and
> can wait for 2.6.34. And while you're at it you might also want to
> merge xfs_swap_extents into xfs_swapext - there's no need for that
> split at all.
I agree. The change here is good (it was confusing and wrong before).
But Dave can you please re-submit this with only the critical changes
so I can get them to Linus in this release cycle? I think the trace
addition is probably fine, just get rid of the gratuitous variable
name change in xfs_swapext() and put that in a separate patch.
The rest looks all good. Thanks.
-Alex
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] xfs: xfs_swap_extents needs to handle dynamic fork offsets
2010-01-13 22:32 ` Alex Elder
@ 2010-01-13 23:03 ` Dave Chinner
2010-01-14 6:35 ` Christoph Hellwig
1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2010-01-13 23:03 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Wed, Jan 13, 2010 at 04:32:12PM -0600, Alex Elder wrote:
> Christoph Hellwig wrote:
> >> + TP_printk("dev %d:%d %s inode 0x%llx, %s format, num_extents %d, "
> >> + "Max in-fork extents %d, broot size %d, fork offset %d",
> >
> > It would be nice to keep the
> >
> > "dev %d:%d ino 0x%llx"
> >
> > prefix as a convention so that all trace records are similar at their
> > beginning.
>
> Perhaps:
> + TP_printk("dev %d:%d inode 0x%llx (%s), %s format, num_extents %d, "
> ^
> +--- symbolic entry->which
>
> >> #undef TRACE_INCLUDE_PATH
> >> diff --git a/fs/xfs/xfs_dfrag.c b/fs/xfs/xfs_dfrag.c
> >> @@ -53,7 +53,7 @@ xfs_swapext(
> >> xfs_swapext_t *sxp)
> >> {
> >> xfs_inode_t *ip, *tip;
> >> - struct file *file, *target_file;
> >> + struct file *file, *tmp_file;
> >
> > I think these xfs_swapext belong into a separate patch. While they
> > make the code quite a bit more redable they're purely cleanups and
> > can wait for 2.6.34. And while you're at it you might also want to
> > merge xfs_swap_extents into xfs_swapext - there's no need for that
> > split at all.
>
> I agree. The change here is good (it was confusing and wrong before).
I think this is all a bit crazy - I had to make this change before I
could understand the code clearly enough to fix the bug. I had to go
all the way back to xfs_fsr to work out what it was providing the
kernel to determine WTF the kernel code was doing.
Clarifying the temp/target inodes is especially important because
the bug fix results in enforcing the target/temp inode ordering by
checking the temp inode has less extents than the target otherwise
it will reject the swap. If you can't tell which inode is which,
how can you tell the code is correct?
> But Dave can you please re-submit this with only the critical changes
> so I can get them to Linus in this release cycle? I think the trace
> addition is probably fine, just get rid of the gratuitous variable
> name change in xfs_swapext() and put that in a separate patch.
The tracing is there so if someone reports a problem with the new
code we can easily determine if the correct action was taken by
swap extents or whether there's some condition I haven't handled.
I'll split it out into three patches - the bug fix, the rename
and the tracing code so you can pick and chose which ones you want
to take first...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] xfs: xfs_swap_extents needs to handle dynamic fork offsets
2010-01-13 22:32 ` Alex Elder
2010-01-13 23:03 ` Dave Chinner
@ 2010-01-14 6:35 ` Christoph Hellwig
1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2010-01-14 6:35 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Wed, Jan 13, 2010 at 04:32:12PM -0600, Alex Elder wrote:
> Christoph Hellwig wrote:
> >> + TP_printk("dev %d:%d %s inode 0x%llx, %s format, num_extents %d, "
> >> + "Max in-fork extents %d, broot size %d, fork offset %d",
> >
> > It would be nice to keep the
> >
> > "dev %d:%d ino 0x%llx"
> >
> > prefix as a convention so that all trace records are similar at their
> > beginning.
>
> Perhaps:
> + TP_printk("dev %d:%d inode 0x%llx (%s), %s format, num_extents %d, "
> ^
> +--- symbolic entry->which
Sounds fine, except s/inode/ino/ above to stay consistant.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2010-01-14 6:34 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-01-12 5:01 [PATCH] xfs: xfs_swap_extents needs to handle dynamic fork offsets Dave Chinner
2010-01-12 13:09 ` Christoph Hellwig
2010-01-13 22:32 ` Alex Elder
2010-01-13 23:03 ` Dave Chinner
2010-01-14 6:35 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox