linux-btrfs.vger.kernel.org archive mirror
From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/9 v2] fix replace-start and replace-cancel racing
Date: Fri, 16 Nov 2018 17:32:50 +0800
Message-ID: <1e7e50d8-72df-f9b9-2cad-78e026e6a1c0@oracle.com>
In-Reply-To: <20181115154136.GV24115@twin.jikos.cz>



On 11/15/2018 11:41 PM, David Sterba wrote:
> On Sun, Nov 11, 2018 at 10:22:15PM +0800, Anand Jain wrote:
>> v1->v2:
>>    2/9: Drop writeback required
>>    3/9: Drop writeback required
>>    7/9: Use the condition within the WARN_ON()
>>    6/9: Use the condition within the ASSERT()
>>
>> The replace-start and replace-cancel threads can race, leading to a
>> use-after-free (UAF). We use the scrub code to write the blocks on
>> the replace target. So if the replace scrub has not yet been marked
>> as running, without this patchset we just ignore the error and free
>> the target device. When this happens the system panics with a UAF
>> error.
>>
>> It's nice to see that btrfs_dev_replace_finishing() already handles
>> the ECANCELED (replace canceled) situation, but for an unknown reason
>> we aren't using it to clean up after a replace cancel. Instead we
>> just let the replace cancel ioctl thread clean up the target device
>> and return, out of sync with the scrub code.
>>
>> Patches 4/9, 5/9 and 6/9 use the return code of btrfs_scrub_cancel()
>> to check whether the scrub was really running. If it was not, they
>> return an error to the user (replace not started) so that the user
>> can retry the replace cancel. They also use the
>> btrfs_dev_replace_finishing() code to clean up after a successful
>> cancel of the replace scrub.
>>
>> Further, when a suspended replace tries to restart and fails (for
>> example because the target device is missing or an exclusive
>> operation is running), it goes to the started state, and so the CLI
>> 'btrfs replace status /mnt' hangs with no progress. Patches 2/9 and
>> 3/9 fix that.
>>
>> As the original code's idea of ECANCELED was limited to the error
>> case and did not cover a user-requested cancel, there are unnecessary
>> error and warning logs, which patches 7/9 and 8/9 fix.
>>
>> Patches 1/9 and 9/9 are good-to-have fixes: they make a function
>> static and improve code readability.
>>
>> Testing: (I made some attempts to convert these into xfstests, but
>> that needs a mechanism where a kernel thread can wait for a userland
>> script. I thought I could do it using eBPF, but it needs more digging
>> on how.)
>> As of now, hand tested using procfs to hold the kernel thread at
>> (wait_for_user(..)) until userland issues a go.
> 
> This could be tricky to get implemented but would be of course useful. I
> saw the crash about once a week so will watch if this still happens.

  That will be nice.

>> Anand Jain (9):
>>    btrfs: mark btrfs_dev_replace_start() as static
>>    btrfs: replace go back to suspended if target missing
>>    btrfs: replace back to suspend state if EXCL OP is running
>>    btrfs: fix UAF due to race between replace start and cancel
>>    btrfs: replace cancel is successful if scrub cancel is successful
>>    btrfs: replace's scrub must not be running in replace suspended state
>>    btrfs: quiten warn if the replace is canceled at finish
>>    btrfs: user requsted replace cancel is not an error
>>    btrfs: add explicit check for replace result no error
> 
> The above is merged to misc-next, except:
> 
> btrfs: quiten warn if the replace is canceled at finish
> btrfs: user requsted replace cancel is not an error
> 
> with replies under the patches on what could be improved. The changes
> can be sent independently if you need to do that in several patches.
> Thanks.

  We need these patches; otherwise you will see a WARN_ON and a
  btrfs_err message after a successful replace cancel. I will send
  revised patches.

Thanks, Anand
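
For readers following the thread, the cancel behaviour described in the
quoted cover letter can be modelled roughly as below. This is a
user-space sketch, not the kernel code: every name in it (replace_model,
model_scrub_cancel(), model_replace_finishing(), model_replace_cancel()
and the enums) is hypothetical. It also assumes that cancelling a scrub
that is not running fails with -ENOTCONN, which the cover letter does
not state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the replace states. */
enum replace_state {
	REPLACE_NEVER_STARTED,
	REPLACE_STARTED,
	REPLACE_SUSPENDED,
	REPLACE_CANCELED,
};

/* Hypothetical result codes handed back to the cancel ioctl caller. */
enum cancel_result { CANCEL_NO_ERROR, CANCEL_NOT_STARTED };

struct replace_model {
	enum replace_state state;
	bool scrub_running;	/* has the replace scrub actually begun? */
	bool cancel_requested;	/* set by cancel, consumed by the scrub side */
	bool target_freed;	/* target device has been cleaned up */
};

/* Assumption: cancelling fails with -ENOTCONN when no scrub is running. */
static int model_scrub_cancel(struct replace_model *r)
{
	if (!r->scrub_running)
		return -ENOTCONN;
	r->cancel_requested = true;
	return 0;
}

/* Finishing path on the scrub side: the only place that frees the
 * target after a cancel of a running replace. */
static void model_replace_finishing(struct replace_model *r)
{
	if (r->cancel_requested) {
		r->scrub_running = false;
		r->state = REPLACE_CANCELED;
		r->target_freed = true;
	}
}

static enum cancel_result model_replace_cancel(struct replace_model *r)
{
	switch (r->state) {
	case REPLACE_STARTED:
		/* Scrub not up yet: report "not started" so the user can
		 * retry, and do NOT free the target here. */
		if (model_scrub_cancel(r) < 0)
			return CANCEL_NOT_STARTED;
		/* Cleanup is left to the finishing path. */
		return CANCEL_NO_ERROR;
	case REPLACE_SUSPENDED:
		/* No scrub runs while suspended, so cancel cleans up itself. */
		r->state = REPLACE_CANCELED;
		r->target_freed = true;
		return CANCEL_NO_ERROR;
	default:
		return CANCEL_NOT_STARTED;
	}
}
```

The point of the model is that in the started state the cancel path never
frees the target device itself, so it cannot free a device the scrub side
is still about to use.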
