Re: [PATCH 4/4] btrfs: validate device_list at scan for stray free

public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Anand Jain <anand.jain@oracle.com>
To: Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 4/4] btrfs: validate device_list at scan for stray free
Date: Sat, 16 Mar 2024 19:43:23 +0530	[thread overview]
Message-ID: <718c4aaa-6891-49e6-a513-0bb0e3ec8036@oracle.com> (raw)
In-Reply-To: <20240314171158.GD3483638@zen.localdomain>

On 3/14/24 22:41, Boris Burkov wrote:
> On Sat, Mar 09, 2024 at 07:14:31PM +0530, Anand Jain wrote:
>> Tempfsid assumes all registered single devices in the fs_devicies list are
>> to be mounted; otherwise, they won't be in the btrfs_device list.
>>
>> We recently fixed a related bug caused by leaving failed-open device in
>> the list. This triggered tempfsid activation upon subsequent mounts of the
>> same fsid wrongly.
>>
>> To prevent this, scan the entire device list at mount for any stray
>> device and free them in btrfs_scan_one_device().
> 
> Is this an additional precaution on top of maintaining an invariant on
> every umount/failed mount that we have freed stale devices of single
> device fs-es? Or is it fundamentally impossible for us to enforce that
> invariant?
> 

Hmm. That's the ultimate goal: maintaining such an invariant. However,
there are bugs. So, this is the place where we can detect whether we
are successful. If we aren't, then we can work around it by freeing
the stale device and avoiding bad consequences.
I think I should also include a warning message when we detect and
free, so that it can be reviewed for the proper fix.
Does that seem reasonable?

> It feels like overkill to hack up free_stale_devices in this way,
> compared to just ensuring that we manage cleaning up single devices
> fs-es correctly when we are in a cleanup context. If this is practically
> the best way to ensure we don't get caught with our pants down by a
> random stale device, then I suppose it's fine.
> 
> A total aside I just thought of:
> I think it might also make sense to consider adding logic to look for
> single device fs-es with a device->bdev that is stale from the block
> layer's perspective, and somehow marking those in a way that tempfsid
> cares about. 


How would we know if the block layer considers a certain device's
block device (device->bdev) as stale?

If you mention a Write IO failure, we already put the filesystem
in read-only mode if that happens. But, we can't close the device
due to the pending write. (Some operating systems have an option
to call panic, which dumps the memory to the coredump device and
reboots.).

> That would help with things that like that case where we
> delete the block dev out from under a mounted fs and mount it a second
> time with tempfsid after it's recreated. Not a huge deal, as we've
> already discussed, though.

Yeah, thanks for brainstorming. Basically, we need a way to distinguish 
between the same physical-device with multiple nodes and different 
physical-devices with the same filesystem.

Thanks, Anand

> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/volumes.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 60d848392cd0..bb0857cfbef2 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1382,6 +1382,8 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>>   
>>   	lockdep_assert_held(&uuid_mutex);
>>   
>> +	btrfs_free_stale_devices(0, NULL, true);
>> +
>>   	/*
>>   	 * we would like to check all the supers, but that would make
>>   	 * a btrfs mount succeed after a mkfs from a different FS.
>> -- 
>> 2.38.1
>>

next prev parent reply	other threads:[~2024-03-16 14:13 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-09 13:44 [PATCH 0/4] btrfs: enhanced logic for stray single device Anand Jain
2024-03-09 13:44 ` [PATCH 1/4] btrfs: declare btrfs_free_stale_devices non-static Anand Jain
2024-03-09 13:44 ` [PATCH 2/4] btrfs: forget stray device on failed open Anand Jain
2024-03-14 17:05   ` Boris Burkov
2024-03-16 13:28     ` Anand Jain
2024-03-09 13:44 ` [PATCH 3/4] btrfs: refactor btrfs_free_stale_devices to free single stray device Anand Jain
2024-03-09 13:44 ` [PATCH 4/4] btrfs: validate device_list at scan for stray free Anand Jain
2024-03-14 17:11   ` Boris Burkov
2024-03-16 14:13     ` Anand Jain [this message]
2024-03-21 11:19       ` Anand Jain
2024-03-14 13:37 ` [PATCH 0/4] btrfs: enhanced logic for stray single device Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=718c4aaa-6891-49e6-a513-0bb0e3ec8036@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=boris@bur.io \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox