From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 236D4C433DB for ; Tue, 16 Feb 2021 14:53:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D862764E0F for ; Tue, 16 Feb 2021 14:53:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229946AbhBPOxG (ORCPT ); Tue, 16 Feb 2021 09:53:06 -0500 Received: from mx2.suse.de ([195.135.220.15]:49458 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229936AbhBPOxF (ORCPT ); Tue, 16 Feb 2021 09:53:05 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1613487138; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=58q7hE6MmQicNPupya+yGJBpiGUsOI1RPL++E1Uf840=; b=ABuU1ObXip3nf6nytjSQOU5FVD3zC0UDeivkkhtk4WNm+rkoA238VWUKg2Lyv1/q5QUPTO BSbBvas6Qv+IuDJYLk6F1NUrI7h7e7ZeHg38SLkXo0I7gJv16dmpVRwW0wCm+gTMr7rqaM wyYtPaoiuls+HbuwI+7nmsIl5HX7sFA= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 753EFAC69; Tue, 16 Feb 2021 14:52:18 +0000 (UTC) Subject: Re: [PATCH 5.10.x] btrfs: fix crash after non-aligned direct IO write with O_DSYNC To: Greg KH , fdmanana@kernel.org Cc: linux-btrfs@vger.kernel.org, stable@vger.kernel.org, dsterba@suse.cz References: <94663c8a2172dc96b760d356a538d45c36f46040.1613062764.git.fdmanana@suse.com> From: Filipe Manana Message-ID: <6e801bfb-01e2-2119-cfa6-e0d710102c4e@suse.com> Date: Tue, 16 Feb 2021 14:52:17 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org On 16/02/21 14:50, Greg KH wrote: > On Tue, Feb 16, 2021 at 02:40:31PM +0000, fdmanana@kernel.org wrote: >> From: Filipe Manana >> >> Whenever we attempt to do a non-aligned direct IO write with O_DSYNC, we >> end up triggering an assertion and crashing. Example reproducer: >> >> $ cat test.sh >> #!/bin/bash >> >> DEV=/dev/sdj >> MNT=/mnt/sdj >> >> mkfs.btrfs -f $DEV > /dev/null >> mount $DEV $MNT >> >> # Do a direct IO write with O_DSYNC into a non-aligned range... >> xfs_io -f -d -s -c "pwrite -S 0xab -b 64K 1111 64K" $MNT/foobar >> >> umount $MNT >> >> When running the reproducer an assertion fails and produces the following >> trace: >> >> [ 2418.403134] assertion failed: !current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA, in fs/btrfs/space-info.c:1467 >> [ 2418.403745] ------------[ cut here ]------------ >> [ 2418.404306] kernel BUG at fs/btrfs/ctree.h:3286! >> [ 2418.404862] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC PTI >> [ 2418.405451] CPU: 1 PID: 64705 Comm: xfs_io Tainted: G D 5.10.15-btrfs-next-87 #1 >> [ 2418.406026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 >> [ 2418.407228] RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs] >> [ 2418.407835] Code: e6 48 c7 (...) >> [ 2418.409078] RSP: 0018:ffffb06080d13c98 EFLAGS: 00010246 >> [ 2418.409696] RAX: 000000000000006c RBX: ffff994c1debbf08 RCX: 0000000000000000 >> [ 2418.410302] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff >> [ 2418.410904] RBP: ffff994c21770000 R08: 0000000000000000 R09: 0000000000000000 >> [ 2418.411504] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000010000 >> [ 2418.412111] R13: ffff994c22198400 R14: ffff994c21770000 R15: 0000000000000000 >> [ 2418.412713] FS: 00007f54fd7aff00(0000) GS:ffff994d35200000(0000) knlGS:0000000000000000 >> [ 2418.413326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 2418.413933] CR2: 000056549596d000 CR3: 000000010b928003 CR4: 0000000000370ee0 >> [ 2418.414528] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 2418.415109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 2418.415669] Call Trace: >> [ 2418.416254] btrfs_reserve_data_bytes.cold+0x22/0x22 [btrfs] >> [ 2418.416812] btrfs_check_data_free_space+0x4c/0xa0 [btrfs] >> [ 2418.417380] btrfs_buffered_write+0x1b0/0x7f0 [btrfs] >> [ 2418.418315] btrfs_file_write_iter+0x2a9/0x770 [btrfs] >> [ 2418.418920] new_sync_write+0x11f/0x1c0 >> [ 2418.419430] vfs_write+0x2bb/0x3b0 >> [ 2418.419972] __x64_sys_pwrite64+0x90/0xc0 >> [ 2418.420486] do_syscall_64+0x33/0x80 >> [ 2418.420979] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [ 2418.421486] RIP: 0033:0x7f54fda0b986 >> [ 2418.421981] Code: 48 c7 c0 (...) >> [ 2418.423019] RSP: 002b:00007ffc40569c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000012 >> [ 2418.423547] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54fda0b986 >> [ 2418.424075] RDX: 0000000000010000 RSI: 000056549595e000 RDI: 0000000000000003 >> [ 2418.424596] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000400 >> [ 2418.425119] R10: 0000000000000400 R11: 0000000000000246 R12: 00000000ffffffff >> [ 2418.425644] R13: 0000000000000400 R14: 0000000000010000 R15: 0000000000000000 >> [ 2418.426148] Modules linked in: btrfs blake2b_generic (...) >> [ 2418.429540] ---[ end trace ef2aeb44dc0afa34 ]--- >> >> 1) At btrfs_file_write_iter() we set current->journal_info to >> BTRFS_DIO_SYNC_STUB; >> >> 2) We then call __btrfs_direct_write(), which calls btrfs_direct_IO(); >> >> 3) We can't do the direct IO write because it starts at a non-aligned >> offset (1111). So at btrfs_direct_IO() we return -EINVAL (coming from >> check_direct_IO() which does the alignment check), but we leave >> current->journal_info set to BTRFS_DIO_SYNC_STUB - we only clear it >> at btrfs_dio_iomap_begin(), because we assume we always get there; >> >> 4) Then at __btrfs_direct_write() we see that the attempt to do the >> direct IO write was not successful, 0 bytes written, so we fallback >> to a buffered write by calling btrfs_buffered_write(); >> >> 5) There we call btrfs_check_data_free_space() which in turn calls >> btrfs_alloc_data_chunk_ondemand() and that calls >> btrfs_reserve_data_bytes() with flush == BTRFS_RESERVE_FLUSH_DATA; >> >> 6) Then at btrfs_reserve_data_bytes() we have current->journal_info set to >> BTRFS_DIO_SYNC_STUB, therefore not NULL, and flush has the value >> BTRFS_RESERVE_FLUSH_DATA, triggering the second assertion: >> >> int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes, >> enum btrfs_reserve_flush_enum flush) >> { >> struct btrfs_space_info *data_sinfo = fs_info->data_sinfo; >> int ret; >> >> ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA || >> flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE); >> ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA); >> (...) >> >> So fix that by setting the journal to NULL whenever check_direct_IO() >> returns a failure. >> >> This bug only affects 5.10 kernels, and the regression was introduced in >> 5.10-rc1 by commit 0eb79294dbe328 ("btrfs: dio iomap DSYNC workaround"). >> The bug does not exist in 5.11 kernels due to commit ecfdc08b8cc65d >> ("btrfs: remove dio iomap DSYNC workaround"), which depends on a large >> patchset that went into the merge window for 5.11. So this is a fix only >> for 5.10.x stable kernels, as there are people hitting this bug. >> >> Fixes: 0eb79294dbe328 ("btrfs: dio iomap DSYNC workaround") >> CC: stable@vger.kernel.org # 5.10 (and only 5.10) >> CC: David Sterba >> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1181605 >> Signed-off-by: Filipe Manana >> --- >> fs/btrfs/inode.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) > > As this is a one-off patch, I need the btrfs maintainers to ack this and > really justify why we can't take the larger patch or patch series here > instead, as that is almost always the correct thing to do instead. David will likely reply soon, but from another thread: https://lore.kernel.org/linux-btrfs/20210216142324.GO1993@twin.jikos.cz/ Thanks. > > thanks, > > greg k-h >