From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
bfoster@redhat.com, sandeen@redhat.com, dchinner@redhat.com,
linux-xfs@vger.kernel.org
Subject: Re: xfs: Uninitialized memory read at xlog_write
Date: Mon, 18 Sep 2017 11:21:21 -0700 [thread overview]
Message-ID: <20170918182121.GI6540@magnolia> (raw)
In-Reply-To: <20170914224431.GA10621@dastard>
On Fri, Sep 15, 2017 at 08:44:31AM +1000, Dave Chinner wrote:
> On Thu, Sep 14, 2017 at 07:15:38PM +0900, Tetsuo Handa wrote:
> > Dave Chinner wrote:
> > > On Wed, Sep 13, 2017 at 06:59:38PM +0900, Tetsuo Handa wrote:
> > > > Dave Chinner wrote:
> > > > > On Wed, Sep 13, 2017 at 04:14:37PM +0900, Tetsuo Handa wrote:
> > > > > > [ OK ] Stopped target Switch Root.
> > > > > >
> > > > > > [ OK ] Stopped target Initrd File Systems.[ 1054.691505] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff880135396660)
> > > > > > [ 1054.691506] 000000000000000093050a200000000000000000000000000000000000000000
> > > > > > [ 1054.691511] u u u u u u u u i i i i i i i i u u u u u u u u u u u u u u u u
> > > > > > [ 1054.691515] ^
> > > > > > [ 1054.691519] RIP: 0010:xlog_write+0x344/0x6b0
> > > > >
> > > > > What line of code does this correspond to?
> > > > >
> > > >
> > > > /*
> > > > * Copy region.
> > > > *
> > > > * Unmount records just log an opheader, so can have
> > > > * empty payloads with no data region to copy. Hence we
> > > > * only copy the payload if the vector says it has data
> > > > * to copy.
> > > > */
> > > > ASSERT(copy_len >= 0);
> > > > if (copy_len > 0) {
> > > > memcpy(ptr, reg->i_addr + copy_off, copy_len); // <= xlog_write+0x344/0x6b0
> > > > xlog_write_adv_cnt(&ptr, &len, &log_offset,
> > > > copy_len);
> > > > }
> > > >
> > >
> > > Ok, that's what I suspected. The region being copied is set up
> > > in xlog_cil_insert_format_items(), so problem is in one of the
> > > ->iop_format methods it calls to format the dirty metadata into the
> > > region.
> > >
> > > And given that the address is ...6660, it's likely the offset into
> > > the structure being copied is 96 bytes.
> > >
> > > $ pahole...
> > > .....
> > > struct xfs_log_dinode {
> > > .....
> > > xfs_agino_t di_next_unlinked; /* 96 4 */
> > > .....
> > >
> > > Try the patch below.
> >
> > That patch did not help.
> >
> > I checked values passed to memcpy() using below patch.
>
> ok....
>
> >
> > ----------
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index c5107c7..f91c4c7 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -2476,6 +2476,8 @@
> > */
> > ASSERT(copy_len >= 0);
> > if (copy_len > 0) {
> > + printk(KERN_INFO "ptr=%p reg->i_addr=%p copy_off=%u copy_len=%u\n",
> > + ptr, reg->i_addr, copy_off, copy_len);
>
> You need to print out the reg->i_type here. Then we'll know exactly
> where it came from.
>
> > memcpy(ptr, reg->i_addr + copy_off, copy_len);
> > xlog_write_adv_cnt(&ptr, &len, &log_offset,
> > copy_len);
> > ----------
> >
> > The copy_len was not multiple of sizeof(struct xfs_log_dinode).
> > Thus, I guess we can't assume this is "struct xfs_log_dinode".
>
> It still could be - we don't always log the entire structure.
>
> static inline uint xfs_log_dinode_size(int version)
> {
> if (version == 3)
> return sizeof(struct xfs_log_dinode);
> return offsetof(struct xfs_log_dinode, di_next_unlinked);
> }
>
> i.e. If it's v2 inode, the logged region is 96 bytes in length, not
> 176.
>
> >
> > ----------
> > Starting Load/Save Random Seed...
> >
> > Starting Configure read-only root support...
> >
> > [ 1106.927991] ptr=ffffc90001c08218 reg->i_addr=ffff880134c7fda8 copy_off=0 copy_len=16
> > [ 1106.928022] ptr=ffffc90001c08234 reg->i_addr=ffff88013395f858 copy_off=0 copy_len=56
> > [ 1106.928100] ptr=ffffc90001c08278 reg->i_addr=ffff88013395f890 copy_off=0 copy_len=96
> > [ 1106.932354] WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88013395f860)
>
> Hold on- that warning has come from the /prior/ region copy, not
> the current region!
>
> i.e. 2nd last copy region was ffff88013395f858 for 0x38 bytes, and
> this spans address that failed ffff88013395f860. i.e. it's 8 bytes
> into that region. It's 48 bytes before the region that we were
> copying when kmemcheck triggered!
>
> I'm going to guess that we've got a log op header, followed by
> a inode format header, follow by something like a v2 inode core.
> i.e. log op headers are 16 bytes, xfs_inode_log_format are 56 bytes,
> and a v2 inode core is 96 bytes.
>
> So, the inode log format header:
>
> struct xfs_inode_log_format {
> uint16_t ilf_type; /* 0 2 */
> uint16_t ilf_size; /* 2 2 */
> uint32_t ilf_fields; /* 4 4 */
> uint16_t ilf_asize; /* 8 2 */
> uint16_t ilf_dsize; /* 10 2 */
>
> /* XXX 4 bytes hole, try to pack */
>
> uint64_t ilf_ino; /* 16 8 */
> union {
> uint32_t ilfu_rdev; /* 4 */
> uuid_t ilfu_uuid; /* 16 */
> } ilf_u; /* 24 16 */
> int64_t ilf_blkno; /* 40 8 */
> int32_t ilf_len; /* 48 4 */
> int32_t ilf_boffset; /* 52 4 */
>
> /* size: 56, cachelines: 1, members: 10 */
> /* sum members: 52, holes: 1, sum holes: 4 */
> /* last cacheline: 56 bytes */
> };
>
> Oh, that's right. Someone, a long, long time ago, screwed up the on
> disk inode log structure but nobody noticed it due to the Irix MIPS
> compilers padding the structure identically on 32 and 64 bit
> systems. Port to linux, and i386/x86-64 padded it differently. The
> fix at the time was to make it right in recovery and continue to
> use the native structure at runtime. Ok, let's fix that properly
> now.
>
> Try the patch below.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> xfs: Don't log uninitialised fields in inode structures
>
> From: Dave Chinner <dchinner@redhat.com>
>
> Prevent kmemcheck from throwing warnings about reading uninitialised
> memory when formatting inodes into the incore log buffer. There are
> several issues here - we don't always log all the fields in the
> inode log format item, and we never log the inode the
> di_next_unlinked field.
>
> In the case of the inode log format item, this is aused b cerbated
"is aused b cerbated" ?
> by the old xfs_inode_log_format structure padding issue. Hence make
> the padded, 64 bit aligned version of the structure the one we always
> use for formatting the log and get rid of the 64 bit variant. This
> means we'll always log the 64-bit version and so recovery only needs
> to convert from the unpadded 32 bit version from older 32 bit
> kernels.
And those old 32-bit kernels can read the xfs_inode_log_format{,_64}
structures, right?
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/libxfs/xfs_log_format.h | 27 +++++----------
> fs/xfs/xfs_inode_item.c | 79 ++++++++++++++++++++++--------------------
> fs/xfs/xfs_ondisk.h | 2 +-
> 3 files changed, 50 insertions(+), 58 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index 8372e9bcd7b6..71de185735e0 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -270,6 +270,7 @@ typedef struct xfs_inode_log_format {
> uint32_t ilf_fields; /* flags for fields logged */
> uint16_t ilf_asize; /* size of attr d/ext/root */
> uint16_t ilf_dsize; /* size of data/ext/root */
> + uint32_t ilf_pad; /* pad for 64 bit boundary */
> uint64_t ilf_ino; /* inode number */
> union {
> uint32_t ilfu_rdev; /* rdev value for dev inode*/
> @@ -280,29 +281,17 @@ typedef struct xfs_inode_log_format {
> int32_t ilf_boffset; /* off of inode in buffer */
> } xfs_inode_log_format_t;
>
> -typedef struct xfs_inode_log_format_32 {
> - uint16_t ilf_type; /* inode log item type */
> - uint16_t ilf_size; /* size of this item */
> - uint32_t ilf_fields; /* flags for fields logged */
> - uint16_t ilf_asize; /* size of attr d/ext/root */
> - uint16_t ilf_dsize; /* size of data/ext/root */
> - uint64_t ilf_ino; /* inode number */
> - union {
> - uint32_t ilfu_rdev; /* rdev value for dev inode*/
> - uuid_t ilfu_uuid; /* mount point value */
> - } ilf_u;
> - int64_t ilf_blkno; /* blkno of inode buffer */
> - int32_t ilf_len; /* len of inode buffer */
> - int32_t ilf_boffset; /* off of inode in buffer */
> -} __attribute__((packed)) xfs_inode_log_format_32_t;
> -
> -typedef struct xfs_inode_log_format_64 {
> +/*
> + * Old 32 bit systems will log in this format without the 64 bit
> + * alignment padding. Recovery will detect this and convert it to the
> + * correct format.
> + */
> +struct xfs_inode_log_format_32 {
> uint16_t ilf_type; /* inode log item type */
> uint16_t ilf_size; /* size of this item */
> uint32_t ilf_fields; /* flags for fields logged */
> uint16_t ilf_asize; /* size of attr d/ext/root */
> uint16_t ilf_dsize; /* size of data/ext/root */
> - uint32_t ilf_pad; /* pad for 64 bit boundary */
> uint64_t ilf_ino; /* inode number */
> union {
> uint32_t ilfu_rdev; /* rdev value for dev inode*/
> @@ -311,7 +300,7 @@ typedef struct xfs_inode_log_format_64 {
> int64_t ilf_blkno; /* blkno of inode buffer */
> int32_t ilf_len; /* len of inode buffer */
> int32_t ilf_boffset; /* off of inode in buffer */
> -} xfs_inode_log_format_64_t;
> +} __attribute__((packed));
>
>
> /*
> diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
> index 6d0f74ec31e8..aec9cf36b5b7 100644
> --- a/fs/xfs/xfs_inode_item.c
> +++ b/fs/xfs/xfs_inode_item.c
> @@ -364,6 +364,9 @@ xfs_inode_to_log_dinode(
> to->di_dmstate = from->di_dmstate;
> to->di_flags = from->di_flags;
>
> + /* log a dummy value to ensure log structure is fully initialised */
> + to->di_next_unlinked = NULLAGINO;
> +
> if (from->di_version == 3) {
> to->di_changecount = inode->i_version;
> to->di_crtime.t_sec = from->di_crtime.t_sec;
> @@ -404,6 +407,11 @@ xfs_inode_item_format_core(
> * the second with the on-disk inode structure, and a possible third and/or
> * fourth with the inode data/extents/b-tree root and inode attributes
> * data/extents/b-tree root.
> + *
> + * Note: Always use the 64 bit inode log format structure so we don't
> + * leave an uninitialised hole in the format item on 64 bit systems. Log
> + * recovery on 32 bit systems handles this just fine, so there's no reason
> + * for not using an initialising the properly padded structure all the time.
> */
> STATIC void
> xfs_inode_item_format(
> @@ -412,8 +420,8 @@ xfs_inode_item_format(
> {
> struct xfs_inode_log_item *iip = INODE_ITEM(lip);
> struct xfs_inode *ip = iip->ili_inode;
> - struct xfs_inode_log_format *ilf;
> struct xfs_log_iovec *vecp = NULL;
> + struct xfs_inode_log_format *ilf;
>
> ASSERT(ip->i_d.di_version > 1);
>
> @@ -425,7 +433,17 @@ xfs_inode_item_format(
> ilf->ilf_boffset = ip->i_imap.im_boffset;
> ilf->ilf_fields = XFS_ILOG_CORE;
> ilf->ilf_size = 2; /* format + core */
> - xlog_finish_iovec(lv, vecp, sizeof(struct xfs_inode_log_format));
> +
> + /*
> + * make sure we don't leak uninitialised data into the log in the case
> + * when we don't log every field in the inode.
> + */
> + ilf->ilf_dsize = 0;
> + ilf->ilf_asize = 0;
> + ilf->ilf_pad = 0;
> + uuid_copy(&ilf->ilf_u.ilfu_uuid, &uuid_null);
> +
> + xlog_finish_iovec(lv, vecp, sizeof(*ilf));
>
> xfs_inode_item_format_core(ip, lv, &vecp);
> xfs_inode_item_format_data_fork(iip, ilf, lv, &vecp);
> @@ -855,44 +873,29 @@ xfs_istale_done(
> }
>
> /*
> - * convert an xfs_inode_log_format struct from either 32 or 64 bit versions
> - * (which can have different field alignments) to the native version
> + * convert an xfs_inode_log_format struct from the old 32 bit version
> + * (which can have different field alignments) to the native 64 bit version
> */
> int
> xfs_inode_item_format_convert(
> - xfs_log_iovec_t *buf,
> - xfs_inode_log_format_t *in_f)
> + struct xfs_log_iovec *buf,
> + struct xfs_inode_log_format *in_f)
> {
> - if (buf->i_len == sizeof(xfs_inode_log_format_32_t)) {
> - xfs_inode_log_format_32_t *in_f32 = buf->i_addr;
> -
> - in_f->ilf_type = in_f32->ilf_type;
> - in_f->ilf_size = in_f32->ilf_size;
> - in_f->ilf_fields = in_f32->ilf_fields;
> - in_f->ilf_asize = in_f32->ilf_asize;
> - in_f->ilf_dsize = in_f32->ilf_dsize;
> - in_f->ilf_ino = in_f32->ilf_ino;
> - /* copy biggest field of ilf_u */
> - uuid_copy(&in_f->ilf_u.ilfu_uuid, &in_f32->ilf_u.ilfu_uuid);
> - in_f->ilf_blkno = in_f32->ilf_blkno;
> - in_f->ilf_len = in_f32->ilf_len;
> - in_f->ilf_boffset = in_f32->ilf_boffset;
> - return 0;
> - } else if (buf->i_len == sizeof(xfs_inode_log_format_64_t)){
> - xfs_inode_log_format_64_t *in_f64 = buf->i_addr;
> -
> - in_f->ilf_type = in_f64->ilf_type;
> - in_f->ilf_size = in_f64->ilf_size;
> - in_f->ilf_fields = in_f64->ilf_fields;
> - in_f->ilf_asize = in_f64->ilf_asize;
> - in_f->ilf_dsize = in_f64->ilf_dsize;
> - in_f->ilf_ino = in_f64->ilf_ino;
> - /* copy biggest field of ilf_u */
> - uuid_copy(&in_f->ilf_u.ilfu_uuid, &in_f64->ilf_u.ilfu_uuid);
> - in_f->ilf_blkno = in_f64->ilf_blkno;
> - in_f->ilf_len = in_f64->ilf_len;
> - in_f->ilf_boffset = in_f64->ilf_boffset;
> - return 0;
> - }
> - return -EFSCORRUPTED;
> + struct xfs_inode_log_format_32 *in_f32 = buf->i_addr;
> +
> + if (buf->i_len != sizeof(*in_f32))
> + return -EFSCORRUPTED;
> +
> + in_f->ilf_type = in_f32->ilf_type;
> + in_f->ilf_size = in_f32->ilf_size;
> + in_f->ilf_fields = in_f32->ilf_fields;
> + in_f->ilf_asize = in_f32->ilf_asize;
> + in_f->ilf_dsize = in_f32->ilf_dsize;
> + in_f->ilf_ino = in_f32->ilf_ino;
> + /* copy biggest field of ilf_u */
> + uuid_copy(&in_f->ilf_u.ilfu_uuid, &in_f32->ilf_u.ilfu_uuid);
> + in_f->ilf_blkno = in_f32->ilf_blkno;
> + in_f->ilf_len = in_f32->ilf_len;
> + in_f->ilf_boffset = in_f32->ilf_boffset;
> + return 0;
> }
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 0c381d71b242..0492436a053f 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -134,7 +134,7 @@ xfs_check_ondisk_structs(void)
> XFS_CHECK_STRUCT_SIZE(struct xfs_icreate_log, 28);
> XFS_CHECK_STRUCT_SIZE(struct xfs_ictimestamp, 8);
> XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format_32, 52);
> - XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format_64, 56);
> + XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format, 56);
Will require a change to xfs/122 as well...
--D
> XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat, 20);
> XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header, 16);
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2017-09-18 18:21 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-04 12:19 xfs: Uninitialized memory read at xlog_write Tetsuo Handa
2017-09-04 17:34 ` Darrick J. Wong
2017-09-04 21:44 ` Tetsuo Handa
2017-09-11 15:01 ` Brian Foster
2017-09-13 7:14 ` Tetsuo Handa
2017-09-13 9:43 ` Dave Chinner
2017-09-13 9:59 ` Tetsuo Handa
2017-09-13 21:40 ` Dave Chinner
2017-09-14 10:15 ` Tetsuo Handa
2017-09-14 22:44 ` Dave Chinner
2017-09-15 11:19 ` Tetsuo Handa
2017-09-18 18:21 ` Darrick J. Wong [this message]
2017-09-20 0:46 ` Dave Chinner
2017-09-20 0:49 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170918182121.GI6540@magnolia \
--to=darrick.wong@oracle.com \
--cc=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=linux-xfs@vger.kernel.org \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=sandeen@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).