From: Anand Jain <anand.jain@oracle.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 2/4] btrfs: fix race between writes to swap files and scrub
Date: Fri, 5 Feb 2021 15:44:36 +0800 [thread overview]
Message-ID: <d563cfa9-5cf2-b408-0a7f-cdb597ebc9cc@oracle.com> (raw)
In-Reply-To: <CAL3q7H5OOyZHja4hG8cmMOjcsOQSUKvMms9kZHG28i4GqNkOjA@mail.gmail.com>
On 2/4/2021 6:11 PM, Filipe Manana wrote:
> On Thu, Feb 4, 2021 at 8:48 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> On 2/3/2021 7:17 PM, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When we active a swap file, at btrfs_swap_activate(), we acquire the
>>> exclusive operation lock to prevent the physical location of the swap
>>> file extents to be changed by operations such as balance and device
>>> replace/resize/remove. We also call there can_nocow_extent() which,
>>> among other things, checks if the block group of a swap file extent is
>>> currently RO, and if it is we can not use the extent, since a write
>>> into it would result in COWing the extent.
>>>
>>> However we have no protection against a scrub operation running after we
>>> activate the swap file, which can result in the swap file extents to be
>>> COWed while the scrub is running and operating on the respective block
>>> group, because scrub turns a block group into RO before it processes it
>>> and then back again to RW mode after processing it. That means an attempt
>>> to write into a swap file extent while scrub is processing the respective
>>> block group, will result in COWing the extent, changing its physical
>>> location on disk.
>>>
>>> Fix this by making sure that block groups that have extents that are used
>>> by active swap files can not be turned into RO mode, therefore making it
>>> not possible for a scrub to turn them into RO mode.
>>
>>> When a scrub finds a
>>> block group that can not be turned to RO due to the existence of extents
>>> used by swap files, it proceeds to the next block group and logs a warning
>>> message that mentions the block group was skipped due to active swap
>>> files - this is the same approach we currently use for balance.
>>
>> It is better if this info is documented in the scrub man-page. IMO.
>>
>>> This ends up removing the need to call btrfs_extent_readonly() from
>>> can_nocow_extent(), as btrfs_swap_activate() now checks if a block group
>>> is RO through the new function btrfs_inc_block_group_swap_extents().
>>>
>>> The only other caller of can_nocow_extent() is the direct IO write path,
There is a third caller. check_can_nocow() also calls
can_nocow_extent(), which I missed before. Any idea where does it get to
know that extent is RO in the threads using check_can_nocow()? I have to
back out the RB for this reason for now.
>>> btrfs_get_blocks_direct_write(), but that already checks if a block group
>>> is RO through the call to btrfs_inc_nocow_writers().
>>> In fact, after this
>>> change we end up optimizing the direct IO write path, since we no longer
>>> iterate the block groups rbtree twice, once with btrfs_extent_readonly(),
>>> through can_nocow_extent(), and once again with btrfs_inc_nocow_writers().
>>> This can save time and reduce contention on the lock that protects the
>>> rbtree (specially because it is a spinlock and not a read/write lock) on
>>> very large filesystems, with several thousands of allocated block groups.
>>>
>>> Fixes: ed46ff3d42378 ("Btrfs: support swap files")
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> I am not sure about the optimization of direct IO part, but as such
>> changes looks good.
Clarifying about the optimization part (for both buffered and direct IO)
- After patch 1, and patch 2, now we check on the RO extents after the
functions btrfs_cross_ref_exist(), and csum_exist_in_range(), both of
them have search_slot, whereas, before patch 1, and patch 2, we used to
fail early (if the extent is RO) and avoided the search_slot, so I am
not sure if there is optimization.
Thanks, Anand
next prev parent reply other threads:[~2021-02-05 7:45 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-03 11:17 [PATCH 0/4] btrfs: fix a couple swapfile support bugs fdmanana
2021-02-03 11:17 ` [PATCH 1/4] btrfs: avoid checking for RO block group twice during nocow writeback fdmanana
2021-02-04 7:47 ` Anand Jain
2021-02-03 11:17 ` [PATCH 2/4] btrfs: fix race between writes to swap files and scrub fdmanana
2021-02-04 8:48 ` Anand Jain
2021-02-04 10:11 ` Filipe Manana
2021-02-05 7:44 ` Anand Jain [this message]
2021-02-05 12:54 ` Filipe Manana
2021-02-03 11:17 ` [PATCH 3/4] btrfs: remove no longer used function btrfs_extent_readonly() fdmanana
2021-02-03 11:17 ` [PATCH 4/4] btrfs: fix race between swap file activation and snapshot creation fdmanana
2021-02-05 12:55 ` [PATCH v2 0/3] btrfs: fix a couple swapfile support bugs fdmanana
2021-02-05 12:55 ` [PATCH v2 1/3] btrfs: avoid checking for RO block group twice during nocow writeback fdmanana
2021-02-10 12:28 ` Anand Jain
2021-02-05 12:55 ` [PATCH v2 2/3] btrfs: fix race between writes to swap files and scrub fdmanana
2021-02-10 16:54 ` Anand Jain
2021-02-05 12:55 ` [PATCH v2 3/3] btrfs: fix race between swap file activation and snapshot creation fdmanana
2021-02-10 17:19 ` Anand Jain
2021-02-10 22:15 ` [PATCH v2 0/3] btrfs: fix a couple swapfile support bugs David Sterba
2021-02-10 15:30 ` [PATCH 0/4] " Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d563cfa9-5cf2-b408-0a7f-cdb597ebc9cc@oracle.com \
--to=anand.jain@oracle.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).