From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, zlang@redhat.com
Subject: Re: [PATCH] [RFC] xfs: fix inode fork extent count overflow
Date: Mon, 16 Sep 2019 09:20:05 -0700 [thread overview]
Message-ID: <20190916162005.GX2229799@magnolia> (raw)
In-Reply-To: <20190911012107.26553-1-david@fromorbit.com>
On Wed, Sep 11, 2019 at 11:21:07AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> [commit message is verbose for discussion purposes - will trim it
> down later. Some questions about implementation details at the end.]
>
> Zorro Lang recently ran a new test to stress single inode extent
> counts now that they are no longer limited by memory allocation.
> The test was simply:
>
> # xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file
> # ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file
>
> This test uncovered a problem where the hole punching operation
> appeared to finish with no error, but apparently only created 268M
> extents instead of the 10 billion it was supposed to.
>
> Further, trying to punch out extents that should have been present
> resulted in success, but no change in the extent count. It looked
> like a silent failure.
>
> While running the test and observing the behaviour in real time,
> I observed the extent coutn growing at ~2M extents/minute, and saw
> this after about an hour:
>
> # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \
> > sleep 60 ; \
> > xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
> fsxattr.nextents = 127657993
> fsxattr.nextents = 129683339
> #
>
> And a few minutes later this:
>
> # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
> fsxattr.nextents = 4177861124
> #
>
> Ah, what? Where did that 4 billion extra extents suddenly come from?
>
> Stop the workload, unmount, mount:
>
> # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
> fsxattr.nextents = 166044375
> #
>
> And it's back at the expected number. i.e. the extent count is
> correct on disk, but it's screwed up in memory. I loaded up the
> extent list, and immediately:
>
> # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
> fsxattr.nextents = 4192576215
> #
>
> It's bad again. So, where does that number come from?
> xfs_fill_fsxattr():
>
> if (ip->i_df.if_flags & XFS_IFEXTENTS)
> fa->fsx_nextents = xfs_iext_count(&ip->i_df);
> else
> fa->fsx_nextents = ip->i_d.di_nextents;
>
> And that's the behaviour I just saw in a nutshell. The on disk count
> is correct, but once the tree is loaded into memory, it goes whacky.
> Clearly there's something wrong with xfs_iext_count():
>
> inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp)
> {
> return ifp->if_bytes / sizeof(struct xfs_iext_rec);
> }
>
> Simple enough, but 134M extents is 2**27, and that's right about
On the plus side, 2^27 is way better than the last time anyone tried to
create an egregious number of extents.
> where things went wrong. A struct xfs_iext_rec is 16 bytes in size,
> which means 2**27 * 2**4 = 2**31 and we're right on target for an
> integer overflow. And, sure enough:
>
> struct xfs_ifork {
> int if_bytes; /* bytes in if_u1 */
> ....
>
> Once we get 2**27 extents in a file, we overflow if_bytes and the
> in-core extent count goes wrong. And when we reach 2**28 extents,
> if_bytes wraps back to zero and things really start to go wrong
> there. This is where the silent failure comes from - only the first
> 2**28 extents can be looked up directly due to the overflow, all the
> extents above this index wrap back to somewhere in the first 2**28
> extents. Hence with a regular pattern, trying to punch a hole in the
> range that didn't have holes mapped to a hole in the first 2**28
> extents and so "succeeded" without changing anything. Hence "silent
> failure"...
>
> Fix this by converting if_bytes to a int64_t and converting all the
> index variables and size calculations to use int64_t types to avoid
> overflows in future. Signed integers are still used to enable easy
> detection of extent count underflows. This enables scalability of
> extent counts to the limits of the on-disk format - MAXEXTNUM
> (2**31) extents.
>
> Current testing is at over 500M extents and still going:
>
> fsxattr.nextents = 517310478
>
> Reported-by: Zorro Lang <zlang@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks reasonable to me; did Zorro retest w/ this patch?
If so,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> ---
> fs/xfs/libxfs/xfs_attr_leaf.c | 18 ++++++++++--------
> fs/xfs/libxfs/xfs_dir2_sf.c | 2 +-
> fs/xfs/libxfs/xfs_iext_tree.c | 2 +-
> fs/xfs/libxfs/xfs_inode_fork.c | 8 ++++----
> fs/xfs/libxfs/xfs_inode_fork.h | 14 ++++++++------
> 5 files changed, 24 insertions(+), 20 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> index b9f019603d0b..0bfb1ba919e2 100644
> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> @@ -453,13 +453,15 @@ xfs_attr_copy_value(
> * special case for dev/uuid inodes, they have fixed size data forks.
> */
> int
> -xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes)
> +xfs_attr_shortform_bytesfit(
> + struct xfs_inode *dp,
> + int bytes)
> {
> - int offset;
> - int minforkoff; /* lower limit on valid forkoff locations */
> - int maxforkoff; /* upper limit on valid forkoff locations */
> - int dsize;
> - xfs_mount_t *mp = dp->i_mount;
> + struct xfs_mount *mp = dp->i_mount;
> + int64_t dsize;
> + int minforkoff;
> + int maxforkoff;
> + int offset;
>
> /* rounded down */
> offset = (XFS_LITINO(mp, dp->i_d.di_version) - bytes) >> 3;
> @@ -525,7 +527,7 @@ xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes)
> * A data fork btree root must have space for at least
> * MINDBTPTRS key/ptr pairs if the data fork is small or empty.
> */
> - minforkoff = max(dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
> + minforkoff = max_t(int64_t, dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS));
> minforkoff = roundup(minforkoff, 8) >> 3;
>
> /* attr fork btree root can have at least this many key/ptr pairs */
> @@ -939,7 +941,7 @@ xfs_attr_shortform_verify(
> char *endp;
> struct xfs_ifork *ifp;
> int i;
> - int size;
> + int64_t size;
>
> ASSERT(ip->i_d.di_aformat == XFS_DINODE_FMT_LOCAL);
> ifp = XFS_IFORK_PTR(ip, XFS_ATTR_FORK);
> diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
> index 85f14fc2a8da..ae16ca7c422a 100644
> --- a/fs/xfs/libxfs/xfs_dir2_sf.c
> +++ b/fs/xfs/libxfs/xfs_dir2_sf.c
> @@ -628,7 +628,7 @@ xfs_dir2_sf_verify(
> int i;
> int i8count;
> int offset;
> - int size;
> + int64_t size;
> int error;
> uint8_t filetype;
>
> diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
> index 7bc87408f1a0..52451809c478 100644
> --- a/fs/xfs/libxfs/xfs_iext_tree.c
> +++ b/fs/xfs/libxfs/xfs_iext_tree.c
> @@ -596,7 +596,7 @@ xfs_iext_realloc_root(
> struct xfs_ifork *ifp,
> struct xfs_iext_cursor *cur)
> {
> - size_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec);
> + int64_t new_size = ifp->if_bytes + sizeof(struct xfs_iext_rec);
> void *new;
>
> /* account for the prev/next pointers */
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index c643beeb5a24..8fdd0424070e 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -129,7 +129,7 @@ xfs_init_local_fork(
> struct xfs_inode *ip,
> int whichfork,
> const void *data,
> - int size)
> + int64_t size)
> {
> struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
> int mem_size = size, real_size = 0;
> @@ -467,11 +467,11 @@ xfs_iroot_realloc(
> void
> xfs_idata_realloc(
> struct xfs_inode *ip,
> - int byte_diff,
> + int64_t byte_diff,
> int whichfork)
> {
> struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
> - int new_size = (int)ifp->if_bytes + byte_diff;
> + int64_t new_size = ifp->if_bytes + byte_diff;
>
> ASSERT(new_size >= 0);
> ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));
> @@ -552,7 +552,7 @@ xfs_iextents_copy(
> struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork);
> struct xfs_iext_cursor icur;
> struct xfs_bmbt_irec rec;
> - int copied = 0;
> + int64_t copied = 0;
>
> ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL | XFS_ILOCK_SHARED));
> ASSERT(ifp->if_bytes > 0);
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
> index 00c62ce170d0..7b845c052fb4 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.h
> +++ b/fs/xfs/libxfs/xfs_inode_fork.h
> @@ -13,16 +13,16 @@ struct xfs_dinode;
> * File incore extent information, present for each of data & attr forks.
> */
> struct xfs_ifork {
> - int if_bytes; /* bytes in if_u1 */
> - unsigned int if_seq; /* fork mod counter */
> + int64_t if_bytes; /* bytes in if_u1 */
> struct xfs_btree_block *if_broot; /* file's incore btree root */
> - short if_broot_bytes; /* bytes allocated for root */
> - unsigned char if_flags; /* per-fork flags */
> + unsigned int if_seq; /* fork mod counter */
> int if_height; /* height of the extent tree */
> union {
> void *if_root; /* extent tree root */
> char *if_data; /* inline file data */
> } if_u1;
> + short if_broot_bytes; /* bytes allocated for root */
> + unsigned char if_flags; /* per-fork flags */
> };
>
> /*
> @@ -93,12 +93,14 @@ int xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *);
> void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
> struct xfs_inode_log_item *, int);
> void xfs_idestroy_fork(struct xfs_inode *, int);
> -void xfs_idata_realloc(struct xfs_inode *, int, int);
> +void xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff,
> + int whichfork);
> void xfs_iroot_realloc(struct xfs_inode *, int, int);
> int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int);
> int xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *,
> int);
> -void xfs_init_local_fork(struct xfs_inode *, int, const void *, int);
> +void xfs_init_local_fork(struct xfs_inode *ip, int whichfork,
> + const void *data, int64_t size);
>
> xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp);
> void xfs_iext_insert(struct xfs_inode *, struct xfs_iext_cursor *cur,
> --
> 2.23.0.rc1
>
next prev parent reply other threads:[~2019-09-16 16:20 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-11 1:21 [PATCH] [RFC] xfs: fix inode fork extent count overflow Dave Chinner
2019-09-11 10:55 ` Christoph Hellwig
2019-09-12 1:08 ` Dave Chinner
2019-09-18 16:46 ` Christoph Hellwig
2019-09-16 16:20 ` Darrick J. Wong [this message]
2019-09-16 17:22 ` Zorro Lang
2019-09-16 22:00 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190916162005.GX2229799@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
--cc=zlang@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox