Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Liu Bo <bo.li.liu@oracle.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org, Filipe Manana <fdmanana@suse.com>
Subject: Re: [PATCH] Btrfs: fix file corruption after cloning inline extents
Date: Thu, 2 Mar 2017 14:18:21 -0800	[thread overview]
Message-ID: <20170302221821.GA26698@lim.localdomain> (raw)
In-Reply-To: <1436888088-8631-1-git-send-email-fdmanana@kernel.org>

On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Using the clone ioctl (or extent_same ioctl, which calls the same extent
> cloning function as well) we end up allowing copy an inline extent from
> the source file into a non-zero offset of the destination file. This is
> something not expected and that the btrfs code is not prepared to deal
> with - all inline extents must be at a file offset equals to 0.
>

Somehow I failed to reproduce the BUG_ON with this case.

> For example, the following excerpt of a test case for fstests triggers
> a crash/BUG_ON() on a write operation after an inline extent is cloned
> into a non-zero offset:
> 
>   _scratch_mkfs >>$seqres.full 2>&1
>   _scratch_mount
> 
>   # Create our test files. File foo has the same 2K of data at offset 4K
>   # as file bar has at its offset 0.
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>       -c "pwrite -S 0xbb 4k 2K" \
>       -c "pwrite -S 0xcc 8K 4K" \
>       $SCRATCH_MNT/foo | _filter_xfs_io
> 
>   # File bar consists of a single inline extent (2K size).
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>      $SCRATCH_MNT/bar | _filter_xfs_io
> 
>   # Now call the clone ioctl to clone the extent of file bar into file
>   # foo at its offset 4K. This made file foo have an inline extent at
>   # offset 4K, something which the btrfs code can not deal with in future
>   # IO operations because all inline extents are supposed to start at an
>   # offset of 0, resulting in all sorts of chaos.
>   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>   # what it returns for other cases dealing with inlined extents.
>   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>       $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> 
>   # Because of the inline extent at offset 4K, the following write made
>   # the kernel crash with a BUG_ON().
>   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
>

On 4.10, after allowing to clone an inline extent to dst file's offset greater
than zero, I followed the test case manually and got these

[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 2048 disk 0 disk_size 2048 -- inline
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20

[root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo 
wrote 2048/2048 bytes at offset 6144
2 KiB, 1 ops; 0.0000 sec (12.520 MiB/sec and 6410.2564 ops/sec)

[root@localhost trinity]# sync
[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 4096 disk 12582912 disk_size 4096
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00


Looks like we now are able to cope with these inline extents?


Thanks,

-liubo


>   status=0
>   exit
> 
> The stack trace of the BUG_ON() triggered by the last write is:
> 
>   [152154.035903] ------------[ cut here ]------------
>   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
>   [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>   [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
>   [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
>   [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>   [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
>   [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
>   [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
>   [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
>   [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
>   [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
>   [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
>   [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
>   [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
>   [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
>   [152154.036424] Stack:
>   [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
>   [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
>   [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
>   [152154.036424] Call Trace:
>   [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
>   [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
>   [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
>   [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
>   [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
>   [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
>   [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
>   [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
>   [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
>   [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
>   [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
>   [152154.036424]  RSP <ffff880429effc68>
>   [152154.242621] ---[ end trace e3d3376b23a57041 ]---
> 
> Fix this by returning the error EOPNOTSUPP if an attempt to copy an
> inline extent into a non-zero offset happens, just like what is done for
> other scenarios that would require copying/splitting inline extents,
> which were introduced by the following commits:
> 
>    00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
>    3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>  fs/btrfs/ioctl.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index d389815..0770c91 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3588,6 +3588,20 @@ process_slot:
>  				u64 trim = 0;
>  				u64 aligned_end = 0;
>  
> +				/*
> +				 * Don't copy an inline extent into an offset
> +				 * greater than zero. Having an inline extent
> +				 * at such an offset results in chaos as btrfs
> +				 * isn't prepared for such cases. Just skip
> +				 * this case for the same reasons as commented
> +				 * at btrfs_ioctl_clone().
> +				 */
> +				if (last_dest_end > 0) {
> +					ret = -EOPNOTSUPP;
> +					btrfs_end_transaction(trans, root);
> +					goto out;
> +				}
> +
>  				if (off > key.offset) {
>  					skip = off - key.offset;
>  					new_key.offset += skip;
> -- 
> 2.1.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2017-03-02 22:18 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-14 15:34 [PATCH] Btrfs: fix file corruption after cloning inline extents fdmanana
2017-03-02 22:18 ` Liu Bo [this message]
2017-03-03  0:36   ` Liu Bo
2017-03-03 15:36     ` Filipe Manana
2017-03-03 18:48       ` Liu Bo
2017-03-06 23:31         ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170302221821.GA26698@lim.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=fdmanana@kernel.org \
    --cc=fdmanana@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).