From: Filipe Manana <fdmanana@suse.com>
To: Greg KH <gregkh@linuxfoundation.org>, fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org, stable@vger.kernel.org, dsterba@suse.cz
Subject: Re: [PATCH 5.10.x] btrfs: fix crash after non-aligned direct IO write with O_DSYNC
Date: Tue, 16 Feb 2021 14:52:17 +0000 [thread overview]
Message-ID: <6e801bfb-01e2-2119-cfa6-e0d710102c4e@suse.com> (raw)
In-Reply-To: <YCvbvJujcuiGcBSj@kroah.com>
On 16/02/21 14:50, Greg KH wrote:
> On Tue, Feb 16, 2021 at 02:40:31PM +0000, fdmanana@kernel.org wrote:
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> Whenever we attempt to do a non-aligned direct IO write with O_DSYNC, we
>> end up triggering an assertion and crashing. Example reproducer:
>>
>> $ cat test.sh
>> #!/bin/bash
>>
>> DEV=/dev/sdj
>> MNT=/mnt/sdj
>>
>> mkfs.btrfs -f $DEV > /dev/null
>> mount $DEV $MNT
>>
>> # Do a direct IO write with O_DSYNC into a non-aligned range...
>> xfs_io -f -d -s -c "pwrite -S 0xab -b 64K 1111 64K" $MNT/foobar
>>
>> umount $MNT
>>
>> When running the reproducer an assertion fails and produces the following
>> trace:
>>
>> [ 2418.403134] assertion failed: !current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA, in fs/btrfs/space-info.c:1467
>> [ 2418.403745] ------------[ cut here ]------------
>> [ 2418.404306] kernel BUG at fs/btrfs/ctree.h:3286!
>> [ 2418.404862] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC PTI
>> [ 2418.405451] CPU: 1 PID: 64705 Comm: xfs_io Tainted: G D 5.10.15-btrfs-next-87 #1
>> [ 2418.406026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>> [ 2418.407228] RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs]
>> [ 2418.407835] Code: e6 48 c7 (...)
>> [ 2418.409078] RSP: 0018:ffffb06080d13c98 EFLAGS: 00010246
>> [ 2418.409696] RAX: 000000000000006c RBX: ffff994c1debbf08 RCX: 0000000000000000
>> [ 2418.410302] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
>> [ 2418.410904] RBP: ffff994c21770000 R08: 0000000000000000 R09: 0000000000000000
>> [ 2418.411504] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000010000
>> [ 2418.412111] R13: ffff994c22198400 R14: ffff994c21770000 R15: 0000000000000000
>> [ 2418.412713] FS: 00007f54fd7aff00(0000) GS:ffff994d35200000(0000) knlGS:0000000000000000
>> [ 2418.413326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2418.413933] CR2: 000056549596d000 CR3: 000000010b928003 CR4: 0000000000370ee0
>> [ 2418.414528] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 2418.415109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [ 2418.415669] Call Trace:
>> [ 2418.416254] btrfs_reserve_data_bytes.cold+0x22/0x22 [btrfs]
>> [ 2418.416812] btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
>> [ 2418.417380] btrfs_buffered_write+0x1b0/0x7f0 [btrfs]
>> [ 2418.418315] btrfs_file_write_iter+0x2a9/0x770 [btrfs]
>> [ 2418.418920] new_sync_write+0x11f/0x1c0
>> [ 2418.419430] vfs_write+0x2bb/0x3b0
>> [ 2418.419972] __x64_sys_pwrite64+0x90/0xc0
>> [ 2418.420486] do_syscall_64+0x33/0x80
>> [ 2418.420979] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 2418.421486] RIP: 0033:0x7f54fda0b986
>> [ 2418.421981] Code: 48 c7 c0 (...)
>> [ 2418.423019] RSP: 002b:00007ffc40569c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
>> [ 2418.423547] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54fda0b986
>> [ 2418.424075] RDX: 0000000000010000 RSI: 000056549595e000 RDI: 0000000000000003
>> [ 2418.424596] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000400
>> [ 2418.425119] R10: 0000000000000400 R11: 0000000000000246 R12: 00000000ffffffff
>> [ 2418.425644] R13: 0000000000000400 R14: 0000000000010000 R15: 0000000000000000
>> [ 2418.426148] Modules linked in: btrfs blake2b_generic (...)
>> [ 2418.429540] ---[ end trace ef2aeb44dc0afa34 ]---
>>
>> 1) At btrfs_file_write_iter() we set current->journal_info to
>> BTRFS_DIO_SYNC_STUB;
>>
>> 2) We then call __btrfs_direct_write(), which calls btrfs_direct_IO();
>>
>> 3) We can't do the direct IO write because it starts at a non-aligned
>> offset (1111). So at btrfs_direct_IO() we return -EINVAL (coming from
>> check_direct_IO() which does the alignment check), but we leave
>> current->journal_info set to BTRFS_DIO_SYNC_STUB - we only clear it
>> at btrfs_dio_iomap_begin(), because we assume we always get there;
>>
>> 4) Then at __btrfs_direct_write() we see that the attempt to do the
>> direct IO write was not successful, 0 bytes written, so we fallback
>> to a buffered write by calling btrfs_buffered_write();
>>
>> 5) There we call btrfs_check_data_free_space() which in turn calls
>> btrfs_alloc_data_chunk_ondemand() and that calls
>> btrfs_reserve_data_bytes() with flush == BTRFS_RESERVE_FLUSH_DATA;
>>
>> 6) Then at btrfs_reserve_data_bytes() we have current->journal_info set to
>> BTRFS_DIO_SYNC_STUB, therefore not NULL, and flush has the value
>> BTRFS_RESERVE_FLUSH_DATA, triggering the second assertion:
>>
>> int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
>> enum btrfs_reserve_flush_enum flush)
>> {
>> struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
>> int ret;
>>
>> ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
>> flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE);
>> ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
>> (...)
>>
>> So fix that by setting the journal to NULL whenever check_direct_IO()
>> returns a failure.
>>
>> This bug only affects 5.10 kernels, and the regression was introduced in
>> 5.10-rc1 by commit 0eb79294dbe328 ("btrfs: dio iomap DSYNC workaround").
>> The bug does not exist in 5.11 kernels due to commit ecfdc08b8cc65d
>> ("btrfs: remove dio iomap DSYNC workaround"), which depends on a large
>> patchset that went into the merge window for 5.11. So this is a fix only
>> for 5.10.x stable kernels, as there are people hitting this bug.
>>
>> Fixes: 0eb79294dbe328 ("btrfs: dio iomap DSYNC workaround")
>> CC: stable@vger.kernel.org # 5.10 (and only 5.10)
>> CC: David Sterba <dsterba@suse.cz>
>> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1181605
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> ---
>> fs/btrfs/inode.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> As this is a one-off patch, I need the btrfs maintainers to ack this and
> really justify why we can't take the larger patch or patch series here
> instead, as that is almost always the correct thing to do instead.
David will likely reply soon, but from another thread:
https://lore.kernel.org/linux-btrfs/20210216142324.GO1993@twin.jikos.cz/
Thanks.
>
> thanks,
>
> greg k-h
>
next prev parent reply other threads:[~2021-02-16 14:53 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-16 14:40 [PATCH 5.10.x] btrfs: fix crash after non-aligned direct IO write with O_DSYNC fdmanana
2021-02-16 14:50 ` Greg KH
2021-02-16 14:52 ` Filipe Manana [this message]
2021-02-16 15:15 ` David Sterba
2021-02-16 15:34 ` Greg KH
2021-02-16 17:52 ` David Sterba
2021-02-22 11:07 ` Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6e801bfb-01e2-2119-cfa6-e0d710102c4e@suse.com \
--to=fdmanana@suse.com \
--cc=dsterba@suse.cz \
--cc=fdmanana@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox