public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2] btrfs: qgroup: Fix data leakage caused by race between writeback and truncate
Date: Fri, 17 Jul 2020 19:56:09 -0400	[thread overview]
Message-ID: <3ba9208e-0f85-a7d3-e6e2-17a1dac1de2e@toxicpanda.com> (raw)
In-Reply-To: <dcc47e7f-53e0-e832-0e39-e8c1d82e318e@gmx.com>

On 7/17/20 7:38 PM, Qu Wenruo wrote:
> 
> 
> On 2020/7/17 下午11:30, Josef Bacik wrote:
>> On 7/17/20 3:12 AM, Qu Wenruo wrote:
>>> [BUG]
>>> When running tests like generic/013 on test device with btrfs quota
>>> enabled, it can normally lead to data leakage, detected at unmount time:
>>>
>>>     BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type
>>> 0 rsv 4096
>>>     ------------[ cut here ]------------
>>>     WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142
>>> close_ctree+0x1dc/0x323 [btrfs]
>>>     RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
>>>     Call Trace:
>>>      btrfs_put_super+0x15/0x17 [btrfs]
>>>      generic_shutdown_super+0x72/0x110
>>>      kill_anon_super+0x18/0x30
>>>      btrfs_kill_super+0x17/0x30 [btrfs]
>>>      deactivate_locked_super+0x3b/0xa0
>>>      deactivate_super+0x40/0x50
>>>      cleanup_mnt+0x135/0x190
>>>      __cleanup_mnt+0x12/0x20
>>>      task_work_run+0x64/0xb0
>>>      __prepare_exit_to_usermode+0x1bc/0x1c0
>>>      __syscall_return_slowpath+0x47/0x230
>>>      do_syscall_64+0x64/0xb0
>>>      entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>     ---[ end trace caf08beafeca2392 ]---
>>>     BTRFS error (device dm-3): qgroup reserved space leaked
>>>
>>> [CAUSE]
>>> In the offending case, the offending operations are:
>>> 2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
>>> 2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
>>>
>>> The following sequence of events could happen after the writev():
>>>      CPU1 (writeback)        |        CPU2 (truncate)
>>> -----------------------------------------------------------------
>>> btrfs_writepages()            |
>>> |- extent_write_cache_pages()        |
>>>      |- Got page for 1003520        |
>>>      |  1003520 is Dirty, no writeback    |
>>>      |  So (!clear_page_dirty_for_io())   |
>>>      |  gets called for it        |
>>>      |- Now page 1003520 is Clean.    |
>>>      |                    | btrfs_setattr()
>>>      |                    | |- btrfs_setsize()
>>>      |                    |    |- truncate_setsize()
>>>      |                    |       New i_size is 18388
>>>      |- __extent_writepage()        |
>>>      |  |- page_offset() > i_size        |
>>>         |- btrfs_invalidatepage()        |
>>>       |- Page is clean, so no qgroup |
>>>          callback executed
>>>
>>> This means, the qgroup reserved data space is not properly released in
>>> btrfs_invalidatepage() as the page is Clean.
>>>
>>> [FIX]
>>> Instead of checking the dirty bit of a page, call
>>> btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
>>>
>>> As qgroup rsv are completely binded to the QGROUP_RESERVED bit of
>>> io_tree, not binded to page status, thus we won't cause double freeing
>>> anyway.
>>>
>>> Fixes: 0b34c261e235 ("btrfs: qgroup: Prevent qgroup->reserved from
>>> going subzero")
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>
>>
>> I don't understand how this is ok.  We can call invalidatepage via
>> memory pressure, so what if we have started the write and have an
>> ordered extent outstanding, and then we call into invalidate page and
>> now unconditionally drop the qgroup reservation, even tho we still need
>> it for the ordered extent.  Am I missing something here?  Thanks,
> 
> As long as the ordered extent as been started
> (__btrfs_add_ordered_extent()), then the QGROUP_RESERVED bit is cleared,
> either freed for NODATACOW write, or released for COW writes.
> 
> IIRC this recent change is suggested by you, and that paved the road for
> this fix.
> 

Yeah I had it backwards in my head, this looks good to me, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

  reply	other threads:[~2020-07-17 23:56 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-17  7:12 [PATCH v2] btrfs: qgroup: Fix data leakage caused by race between writeback and truncate Qu Wenruo
2020-07-17 15:30 ` Josef Bacik
2020-07-17 23:38   ` Qu Wenruo
2020-07-17 23:56     ` Josef Bacik [this message]
2020-07-20 15:00 ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ba9208e-0f85-a7d3-e6e2-17a1dac1de2e@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox