From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, Qu Wenruo <wqu@suse.com>,
linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: qgroup: Fix data leakage caused by race between writeback and truncate
Date: Fri, 17 Jul 2020 19:56:09 -0400
Message-ID: <3ba9208e-0f85-a7d3-e6e2-17a1dac1de2e@toxicpanda.com>
In-Reply-To: <dcc47e7f-53e0-e832-0e39-e8c1d82e318e@gmx.com>
On 7/17/20 7:38 PM, Qu Wenruo wrote:
>
>
> On 2020/7/17 11:30 PM, Josef Bacik wrote:
>> On 7/17/20 3:12 AM, Qu Wenruo wrote:
>>> [BUG]
>>> When running tests like generic/013 on a test device with btrfs quota
>>> enabled, it can lead to data leakage, detected at unmount time:
>>>
>>> BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type
>>> 0 rsv 4096
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142
>>> close_ctree+0x1dc/0x323 [btrfs]
>>> RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
>>> Call Trace:
>>> btrfs_put_super+0x15/0x17 [btrfs]
>>> generic_shutdown_super+0x72/0x110
>>> kill_anon_super+0x18/0x30
>>> btrfs_kill_super+0x17/0x30 [btrfs]
>>> deactivate_locked_super+0x3b/0xa0
>>> deactivate_super+0x40/0x50
>>> cleanup_mnt+0x135/0x190
>>> __cleanup_mnt+0x12/0x20
>>> task_work_run+0x64/0xb0
>>> __prepare_exit_to_usermode+0x1bc/0x1c0
>>> __syscall_return_slowpath+0x47/0x230
>>> do_syscall_64+0x64/0xb0
>>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> ---[ end trace caf08beafeca2392 ]---
>>> BTRFS error (device dm-3): qgroup reserved space leaked
>>>
>>> [CAUSE]
>>> In the failing case, the offending operations are:
>>> 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
>>> 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
>>>
>>> The following sequence of events could happen after the writev():
>>> CPU1 (writeback) | CPU2 (truncate)
>>> -----------------------------------------------------------------
>>> btrfs_writepages() |
>>> |- extent_write_cache_pages() |
>>> |- Got page for 1003520 |
>>> | 1003520 is Dirty, no writeback |
>>> | So (!clear_page_dirty_for_io()) |
>>> | gets called for it |
>>> |- Now page 1003520 is Clean. |
>>> | | btrfs_setattr()
>>> | | |- btrfs_setsize()
>>> | | |- truncate_setsize()
>>> | | New i_size is 18388
>>> |- __extent_writepage() |
>>> | |- page_offset() > i_size |
>>> |- btrfs_invalidatepage() |
>>> |- Page is clean, so no qgroup |
>>> callback executed
>>>
>>> This means, the qgroup reserved data space is not properly released in
>>> btrfs_invalidatepage() as the page is Clean.
>>>
>>> [FIX]
>>> Instead of checking the dirty bit of a page, call
>>> btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
>>>
>>> Since qgroup rsv is bound entirely to the QGROUP_RESERVED bit of
>>> io_tree, not to page status, we won't cause double freeing
>>> anyway.
>>>
>>> Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from
>>> going subzero")
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>
>>
>> I don't understand how this is ok. We can call invalidatepage via
>> memory pressure, so what if we have started the write and have an
>> ordered extent outstanding, and then we call into invalidatepage and
>> now unconditionally drop the qgroup reservation, even though we still
>> need it for the ordered extent? Am I missing something here? Thanks,
>
> As long as the ordered extent has been started
> (__btrfs_add_ordered_extent()), the QGROUP_RESERVED bit is cleared,
> either freed for NODATACOW writes, or released for COW writes.
>
> IIRC this recent change was suggested by you, and it paved the road for
> this fix.
>
Yeah I had it backwards in my head, this looks good to me, you can add
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
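
The point settled in this thread is that the free is keyed on the
QGROUP_RESERVED bit rather than on page state. It can be sketched with a toy
userspace model. This is a minimal illustration in C, not the kernel code:
every toy_* name, the struct layout, and the single-page granularity are
assumptions made for the example. It models the leak when the free is gated
on the dirty bit, the absence of a leak when the free is unconditional, and
why an outstanding ordered extent is unaffected, assuming (as stated above)
that creating the ordered extent clears the reserved bit.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096L

/* Hypothetical stand-in for one page-sized file range and its accounting. */
struct toy_state {
	bool page_dirty;       /* models the page Dirty flag */
	bool qgroup_reserved;  /* models the QGROUP_RESERVED bit in io_tree */
	long rsv_bytes;        /* models qgroup reserved data space */
	long ordered_bytes;    /* space now owned by an ordered extent */
};

/* A buffered write reserves qgroup space once and dirties the page. */
static void toy_buffered_write(struct toy_state *s)
{
	if (!s->qgroup_reserved) {
		s->qgroup_reserved = true;
		s->rsv_bytes += TOY_PAGE_SIZE;
	}
	s->page_dirty = true;
}

/* Free is keyed on the reserved bit, so a repeated call frees nothing. */
static long toy_qgroup_free_data(struct toy_state *s)
{
	if (!s->qgroup_reserved)
		return 0;
	s->qgroup_reserved = false;
	s->rsv_bytes -= TOY_PAGE_SIZE;
	assert(s->rsv_bytes >= 0);	/* never goes below zero */
	return TOY_PAGE_SIZE;
}

/* unconditional == false models the old dirty-bit check, true the fix. */
static void toy_invalidatepage(struct toy_state *s, bool unconditional)
{
	if (unconditional || s->page_dirty)
		toy_qgroup_free_data(s);
	s->page_dirty = false;
}

/* Creating the ordered extent clears the bit and takes over the space. */
static void toy_add_ordered_extent(struct toy_state *s)
{
	if (s->qgroup_reserved) {
		s->qgroup_reserved = false;
		s->rsv_bytes -= TOY_PAGE_SIZE;
		s->ordered_bytes += TOY_PAGE_SIZE;
	}
}

/* Replays the race: writeback cleans the page, then truncate invalidates it. */
static long toy_leak_after_race(bool unconditional)
{
	struct toy_state s = {0};

	toy_buffered_write(&s);
	s.page_dirty = false;	/* writeback side: page is now Clean */
	toy_invalidatepage(&s, unconditional);	/* page is beyond new i_size */
	return s.rsv_bytes;	/* anything left here is leaked */
}

/* Invalidate after an ordered extent exists: its space must survive. */
static long toy_ordered_bytes_after_invalidate(void)
{
	struct toy_state s = {0};

	toy_buffered_write(&s);
	s.page_dirty = false;
	toy_add_ordered_extent(&s);
	toy_invalidatepage(&s, true);
	toy_invalidatepage(&s, true);	/* a second call is also harmless */
	assert(s.rsv_bytes == 0);
	return s.ordered_bytes;
}
```

In this model toy_leak_after_race(false) leaves 4096 bytes reserved, which
matches the "rsv 4096" in the warning above, while toy_leak_after_race(true)
leaves 0. toy_ordered_bytes_after_invalidate() still reports 4096 bytes held
by the ordered extent, since the unconditional free finds the bit already
cleared.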