public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Cc: hch@lst.de, linux-xfs@vger.kernel.org, sandeen@sandeen.net,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] iomap_dio_rw: Prevent reading file data beyond iomap_dio->i_size
Date: Fri, 7 Apr 2017 09:25:28 -0700	[thread overview]
Message-ID: <20170407162528.GG4864@birch.djwong.org> (raw)
In-Reply-To: <1491557563-3278-1-git-send-email-chandan@linux.vnet.ibm.com>

On Fri, Apr 07, 2017 at 03:02:43PM +0530, Chandan Rajendra wrote:
> On a ppc64 machine executing overlayfs/019 with xfs as the lower and
> upper filesystem causes the following call trace,
> 
> WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420
> Modules linked in:
> CPU: 2 PID: 8034 Comm: fsstress Tainted: G             L  4.11.0-rc5-next-20170405 #100
> task: c000000631314880 task.stack: c0000003915d4000
> NIP: c00000000035a72c LR: c00000000035a6f4 CTR: c00000000035a660
> REGS: c0000003915d7570 TRAP: 0700   Tainted: G             L   (4.11.0-rc5-next-20170405)
> MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
>   CR: 24004284  XER: 00000000
> CFAR: c0000000006f7190 SOFTE: 1
> GPR00: c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600
> GPR04: 000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960
> GPR08: c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0
> GPR12: 0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff
> GPR16: 000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8
> GPR20: 0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff
> GPR24: c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000
> GPR28: 0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff
> NIP [c00000000035a72c] .iomap_dio_actor+0xcc/0x420
> LR [c00000000035a6f4] .iomap_dio_actor+0x94/0x420
> Call Trace:
> [c0000003915d77f0] [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable)
> [c0000003915d78f0] [c00000000035b9f4] .iomap_apply+0xf4/0x1f0
> [c0000003915d79d0] [c00000000035c320] .iomap_dio_rw+0x230/0x420
> [c0000003915d7ae0] [c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160
> [c0000003915d7b80] [c000000000512d24] .xfs_file_read_iter+0x104/0x130
> [c0000003915d7c10] [c0000000002d6234] .__vfs_read+0x114/0x1a0
> [c0000003915d7cf0] [c0000000002d7a8c] .vfs_read+0xac/0x1a0
> [c0000003915d7d90] [c0000000002d96b8] .SyS_read+0x58/0x100
> [c0000003915d7e30] [c00000000000b8e0] system_call+0x38/0xfc
> Instruction dump:
> 78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288
> 2f890004 419e00a0 2f890001 419e02a8 <0fe00000> 3b80fffb 38210100 7f83e378
> 
> The above problem can also be recreated on a regular xfs filesystem
> using the command,
> 
> $ fsstress -d /mnt -l 1000 -n 1000 -p 1000
> 
> The reason for the call trace is:
> 1. When 'reserving' blocks for delayed allocation, XFS reserves more
>    blocks (i.e. blocks past the file's current EOF) than required. This
>    is done because XFS assumes that userspace might write more data and
>    hence 'reserving' more blocks might lead to the file's new data
>    being stored contiguously on disk.
> 2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would
>    then cover the prealloc-ed EOF blocks in addition to the regular blocks.
> 3. When flushing the dirty blocks to disk, we only flush data up to
>    the file's EOF. But before writing out the dirty data, we allocate
>    blocks on the disk for holding the file's new data. This allocation
>    includes the blocks that are part of the 'prealloc EOF blocks'.
> 4. Later, when the last reference to the inode is being closed, XFS frees the
>    unused 'prealloc EOF blocks' in xfs_inactive().
> 
> In step 3 above, when allocating space on disk for the delayed
> allocation range, the space allocator might sometimes allocate fewer
> blocks than required. If such an allocation ends right at the current
> EOF of the file, we will not be able to clear the "delayed allocation"
> flag for the 'prealloc EOF blocks', since we won't have dirty buffer
> heads associated with that range of the file.
> 
> In such a situation, if a Direct I/O read operation is performed on
> the file range [X, Y] (where X < EOF and Y > EOF), we flush dirty data
> in the range [X, Y] and invalidate the page cache for that range
> (refer to iomap_dio_rw()). Later, when performing the Direct I/O read,
> XFS obtains the extent items (which are still cached in memory) for
> the file range. When doing so, we are not supposed to get an extent
> item with the IOMAP_DELALLOC flag set, since the previous flush
> operation should have converted any delayed allocation data in the
> range [X, Y]. Hence we end up hitting the WARN_ON_ONCE(1) statement
> in iomap_dio_actor().
> 
> This commit fixes the bug by preventing the read operation from going
> beyond iomap_dio->i_size.
> 
> Reported-by: Santhosh G <santhog4@linux.vnet.ibm.com>
> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks ok to me, but I'll wait for the test before making further decisions.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
> Changelog:
> RFC->V1: Made trivial changes to address Christoph's review comments.
> 
> I will send a patch to add a corresponding test to xfstests soon.
> 
>  fs/iomap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 0b457ff..1ff0c9c 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -904,6 +904,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  			break;
>  		}
>  		pos += ret;
> +
> +		if (iov_iter_rw(iter) == READ && pos >= dio->i_size)
> +			break;
>  	} while ((count = iov_iter_count(iter)) > 0);
>  	blk_finish_plug(&plug);
>  
> -- 
> 2.5.5
> 

Thread overview: 6+ messages
2017-04-07  9:32 [PATCH] iomap_dio_rw: Prevent reading file data beyond iomap_dio->i_size Chandan Rajendra
2017-04-07 16:25 ` Darrick J. Wong [this message]
2017-04-09 15:09   ` Chandan Rajendra
2017-04-11 14:37     ` Chandan Rajendra
2017-04-11 17:43       ` Darrick J. Wong
2017-04-12 16:28         ` Chandan Rajendra
