public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Alex Romosan <aromosan@gmail.com>
Cc: bernd.feige@gmx.net, dsterba@suse.cz, dsterba@suse.com,
	linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: do not skip re-registration for the mounted device
Date: Fri, 9 Feb 2024 13:41:13 +0530	[thread overview]
Message-ID: <eadf2480-950f-4e32-bc33-bef8e0093f9f@oracle.com> (raw)
In-Reply-To: <CAKLYgeKjvkkRM3D8ZDF5=o6j2hw5qrCzhufisnb0Gw-rvj12zA@mail.gmail.com>



On 2/9/24 01:34, Alex Romosan wrote:
> i'm attaching the boot log from 6.8-rc3 with your patch. update-grub
> works. i took a quick look at the log and i can see this (which wasn't
> in the unpatched kernel):
> 
> BTRFS info: devid 1 device path /dev/root changed to /dev/nvme0n1p3
> scanned by (udev-worker)
> 

  That's nice. Thanks for verifying.

Anand


> On Thu, Feb 8, 2024 at 6:22 PM Anand Jain <anand.jain@oracle.com> wrote:
>>
>>
>> Thanks for the kernel messages with debug enabled.
>>
>> I don't see the message to skip scannaing for
>> the mounted device. So it's not what we thought
>> was the issue.
>>
>>     pr_debug("BTRFS: skip registering single non-seed device %s\n", path);
>>
>>
>> Based on the assumption above, I have a fix below,
>> but I doubt its effectiveness.
>>
>>
>> https://patchwork.kernel.org/project/linux-btrfs/patch/8dd1990114aabb775d4631969f1beabeadaac5b7.1707132247.git.anand.jain@oracle.com/
>>
>> -Anand
>>
>>
>> On 2/8/24 18:05, Alex Romosan wrote:
>>> i'm attaching my boot log with 6.8.0-rc3 no fixes and btrfs debug
>>> enabled (i assume this is what you wanted). update-grub doesn't work.
>>> there was no patch in your last message. do you want me to try the
>>> patch you sent on monday? confused
>>>
>>> On Thu, Feb 8, 2024 at 3:23 AM Anand Jain <anand.jain@oracle.com> wrote:
>>>>
>>>>
>>>> Alex,
>>>>
>>>> Please provide the kernel boot messages with debugging enabled but
>>>> no fixes applied. Kindly collect these messages after reproducing
>>>> the problem.
>>>>
>>>> We've found issues with the previous fix. Please test this
>>>> new patch, as it may address the bug. Keep debugging enabled
>>>> during testing.
>>>>
>>>>
>>>> Thanks ,Anand
>>>>
>>>>
>>>> On 2/7/24 23:48, Alex Romosan wrote:
>>>>> Which version of the patch are we talking about? Let me know and I’ll
>>>>> try it with debugging on. I tried David’s patch and it seemed to work (I
>>>>> just booted into that kernel and ran update-grub) but I’ll try something
>>>>> else…
>>>>>
>>>>> On Wed, Feb 7, 2024 at 19:08 Anand Jain <anand.jain@oracle.com
>>>>> <mailto:anand.jain@oracle.com>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>       On 2/7/24 08:08, Anand Jain wrote:
>>>>>        >
>>>>>        >
>>>>>        >
>>>>>        > On 2/5/24 18:27, David Sterba wrote:
>>>>>        >> On Mon, Feb 05, 2024 at 07:45:05PM +0800, Anand Jain wrote:
>>>>>        >>> We skip device registration for a single device. However, we do
>>>>>       not do
>>>>>        >>> that if the device is already mounted, as it might be coming in
>>>>>       again
>>>>>        >>> for scanning a different path.
>>>>>        >>>
>>>>>        >>> This patch is lightly tested; for verification if it fixes.
>>>>>        >>>
>>>>>        >>> Signed-off-by: Anand Jain <anand.jain@oracle.com
>>>>>       <mailto:anand.jain@oracle.com>>
>>>>>        >>> ---
>>>>>        >>> I still have some unknowns about the problem. Pls test if this
>>>>>       fixes
>>>>>        >>> the problem.
>>>>>
>>>>>       Successfully tested with fstests (-g volume) and temp-fsid test cases.
>>>>>
>>>>>       Can someone verify if this patch fixes the problem? Also, when problem
>>>>>       occurs please provide kernel messages with Btrfs debugging support
>>>>>       option compiled in.
>>>>>
>>>>>       Thanks, Anand
>>>>>
>>>>>
>>>>>        >>>
>>>>>        >>>   fs/btrfs/volumes.c | 44
>>>>>       ++++++++++++++++++++++++++++++++++----------
>>>>>        >>>   fs/btrfs/volumes.h |  1 -
>>>>>        >>>   2 files changed, 34 insertions(+), 11 deletions(-)
>>>>>        >>>
>>>>>        >>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>        >>> index 474ab7ed65ea..192c540a650c 100644
>>>>>        >>> --- a/fs/btrfs/volumes.c
>>>>>        >>> +++ b/fs/btrfs/volumes.c
>>>>>        >>> @@ -1299,6 +1299,31 @@ int btrfs_forget_devices(dev_t devt)
>>>>>        >>>       return ret;
>>>>>        >>>   }
>>>>>        >>> +static bool btrfs_skip_registration(struct btrfs_super_block
>>>>>        >>> *disk_super,
>>>>>        >>> +                    dev_t devt, bool mount_arg_dev)
>>>>>        >>> +{
>>>>>        >>> +    struct btrfs_fs_devices *fs_devices;
>>>>>        >>> +
>>>>>        >>> +    list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
>>>>>        >>> +        struct btrfs_device *device;
>>>>>        >>> +
>>>>>        >>> +        mutex_lock(&fs_devices->device_list_mutex);
>>>>>        >>> +        list_for_each_entry(device, &fs_devices->devices,
>>>>>       dev_list) {
>>>>>        >>> +            if (device->devt == devt) {
>>>>>        >>> +                mutex_unlock(&fs_devices->device_list_mutex);
>>>>>        >>> +                return false;
>>>>>        >>> +            }
>>>>>        >>> +        }
>>>>>        >>> +        mutex_unlock(&fs_devices->device_list_mutex);
>>>>>        >>
>>>>>        >> This is locking and unlocking again before going to
>>>>>       device_list_add, so
>>>>>        >> if something changes regarding the registered device then it's
>>>>>       not up to
>>>>>        >> date.
>>>>>        >>
>>>>>        >
>>>>>
>>>>>       We are in the uuid_mutex, a potentially racing thread will have to
>>>>>       acquire this mutex to delete from the list. So there can't a race.
>>>>>
>>>>>
>>>>>
>>>>>        > Right. A race might happen, but it is not an issue. At worst, there
>>>>>        > will be a stale device in the cache, which gets removed or re-used
>>>>>        > in the next mkfs or mount of the same device.
>>>>>        >
>>>>>        > However, this is a rough cut that we need to fix. I am reviewing
>>>>>        > your approach as well. I'm fine with any fix.
>>>>>        >
>>>>>        >
>>>>>        >>
>>>>>        >>> +    }
>>>>>        >>> +
>>>>>        >>> +    if (!mount_arg_dev && btrfs_super_num_devices(disk_super)
>>>>>       == 1 &&
>>>>>        >>> +        !(btrfs_super_flags(disk_super) &
>>>>>       BTRFS_SUPER_FLAG_SEEDING))
>>>>>        >>> +        return true;
>>>>>        >>
>>>>>        >> The way I implemented it is to check the above conditions as a
>>>>>        >> prerequisite but leave the heavy work for device_list_add that
>>>>>       does all
>>>>>        >> the uuid and device list locking and we are quite sure it
>>>>>       survives all
>>>>>        >> the races between scanning and mounts.
>>>>>        >>
>>>>>        >
>>>>>        > Hm. But isn't that the bug we need to fix? That we skipped the device
>>>>>        > scan thread that wanted to replace the device path from /dev/root to
>>>>>        > /dev/sdx?
>>>>>        >
>>>>>        > And we skipped, because it was not a mount thread
>>>>>        > (%mount_arg_dev=false), and the device is already mounted and the
>>>>>        > devt will match?
>>>>>        >
>>>>>        > So my fix also checked if devt is a match, then allow it to scan
>>>>>        > (so that the device path can be updated, such as /dev/root to
>>>>>       /dev/sdx).
>>>>>        >
>>>>>        > To confirm the bug, I asked for the debug kernel messages, I don't
>>>>>        > this we got it. Also, the existing kernel log shows no such issue.
>>>>>        >
>>>>>        >
>>>>>        >>> +
>>>>>        >>> +    return false;
>>>>>        >>> +}
>>>>>        >>> +
>>>>>        >>>   /*
>>>>>        >>>    * Look for a btrfs signature on a device. This may be called
>>>>>       out
>>>>>        >>> of the mount path
>>>>>        >>>    * and we are not allowed to call set_blocksize during the scan.
>>>>>        >>> The superblock
>>>>>        >>> @@ -1316,6 +1341,7 @@ struct btrfs_device
>>>>>        >>> *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>>>>>        >>>       struct btrfs_device *device = NULL;
>>>>>        >>>       struct bdev_handle *bdev_handle;
>>>>>        >>>       u64 bytenr, bytenr_orig;
>>>>>        >>> +    dev_t devt = 0;
>>>>>        >>>       int ret;
>>>>>        >>>       lockdep_assert_held(&uuid_mutex);
>>>>>        >>> @@ -1355,18 +1381,16 @@ struct btrfs_device
>>>>>        >>> *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>>>>>        >>>           goto error_bdev_put;
>>>>>        >>>       }
>>>>>        >>> -    if (!mount_arg_dev && btrfs_super_num_devices(disk_super)
>>>>>       == 1 &&
>>>>>        >>> -        !(btrfs_super_flags(disk_super) &
>>>>>       BTRFS_SUPER_FLAG_SEEDING)) {
>>>>>        >>> -        dev_t devt;
>>>>>        >>> +    ret = lookup_bdev(path, &devt);
>>>>>        >>> +    if (ret)
>>>>>        >>> +        btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
>>>>>        >>> +               path, ret);
>>>>>        >>> -        ret = lookup_bdev(path, &devt);
>>>>>        >>
>>>>>        >> Do we actually need this check? It was added with the patch
>>>>>       skipping the
>>>>>        >> registration, so it's validating the block device but how can we
>>>>>       pass
>>>>>        >> something that is not a valid block device?
>>>>>        >>
>>>>>        >
>>>>>        > Do you mean to check if the lookup_bdev() is successful? Hm. It
>>>>>       should
>>>>>        > be okay not to check, but we do that consistently in other places.
>>>>>        >
>>>>>        >> Besides there's a call to bdev_open_by_path() that in turn does the
>>>>>        >> lookup_bdev so checking it here is redundant. It's not related
>>>>>       to the
>>>>>        >> fix itself but I deleted it in my fix.
>>>>>        >>
>>>>>        >
>>>>>        > Oh no. We need %devt to be set because:
>>>>>        >
>>>>>        > It will match if that device is already mounted/scanned.
>>>>>        > It will also free stale entries.
>>>>>        >
>>>>>        > Thx, Anand
>>>>>        >
>>>>>        >>> -        if (ret)
>>>>>        >>> -            btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
>>>>>        >>> -                   path, ret);
>>>>>        >>> -        else
>>>>>        >>> +    if (btrfs_skip_registration(disk_super, devt,
>>>>>       mount_arg_dev)) {
>>>>>        >>> +        pr_debug("BTRFS: skip registering single non-seed
>>>>>       device %s\n",
>>>>>        >>> +              path);
>>>>>        >>> +        if (devt)
>>>>>        >>>               btrfs_free_stale_devices(devt, NULL);
>>>>>        >>> -
>>>>>        >>> -        pr_debug("BTRFS: skip registering single non-seed device
>>>>>        >>> %s\n", path);
>>>>>        >>>           device = NULL;
>>>>>        >>>           goto free_disk_super;
>>>>>        >>>       }
>>>>>

  reply	other threads:[~2024-02-09  8:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-05 11:45 [PATCH] btrfs: do not skip re-registration for the mounted device Anand Jain
2024-02-05 12:57 ` David Sterba
2024-02-07  2:38   ` Anand Jain
2024-02-07 18:08     ` Anand Jain
     [not found]       ` <CAKLYge+9ngrW-1EffUhyU1y13MzgP33osNDi3D6y6UVW6poJZA@mail.gmail.com>
2024-02-08  2:23         ` Anand Jain
2024-02-08 12:35           ` Alex Romosan
2024-02-08 17:22             ` Anand Jain
2024-02-08 19:45               ` Alex Romosan
2024-02-08 20:04               ` Alex Romosan
2024-02-09  8:11                 ` Anand Jain [this message]
2024-02-09 15:27                   ` Dr. Bernd Feige
2024-02-12 14:59 ` David Sterba
2024-02-13  0:35   ` Anand Jain
2024-02-13 19:38     ` David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eadf2480-950f-4e32-bc33-bef8e0093f9f@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=aromosan@gmail.com \
    --cc=bernd.feige@gmx.net \
    --cc=dsterba@suse.com \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox