public inbox for linux-block@vger.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Martin Steigerwald <martin@lichtvoll.de>, linux-block@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Assumption on fixed device numbers in Plasma's desktop search Baloo
Date: Sat, 26 Jun 2021 08:27:54 +0800
Message-ID: <fe83dadc-bbcf-2f85-6664-bad3fcd83553@gmx.com>
In-Reply-To: <41661070.mPYKQbcTYQ@ananda>



On 2021/6/26 3:06 AM, Martin Steigerwald wrote:
> Hi!
>
> I found repeatedly that Baloo indexes the same files twice or even more
> often after a while.
>
> I reported this upstream in:
>
> Bug 438434 - Baloo appears to be indexing twice the number of files than
> are actually in my home directory
>
> https://bugs.kde.org/show_bug.cgi?id=438434
>
> And got back that if the device number changes, Baloo will think it has
> new files even though the path is still the same. And found over time that
> the device number for the single BTRFS filesystem on an NVMe SSD in a
> ThinkPad T14 Gen1 AMD can change. It is not (maybe yet) RAID 1. I do
> have BTRFS RAID 1 in another laptop and there I also had this issue
> already.

Since btrfs supports multiple devices by default, it reports an anonymous
device number rather than the number of any single underlying block
device, just as a filesystem over LVM reports the device-mapper device.

The real question is why the anonymous device number changes.

If the fs is always mounted in the same sequence, with the same
snapshots/subvolumes mounted, it should not get a new anonymous device
number.

But if snapshots or new subvolumes are involved, or subvolumes are simply
mounted/read in a different order, then the device numbers of the
subvolumes can change.
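To see what each subvolume currently reports, a minimal Python sketch
like the one below can help (the function name is only illustrative, not
from any existing tool). It splits st_dev into its major:minor pair; on
Linux anonymous device numbers use major 0, so btrfs subvolumes are
expected to show up as 0:N, and comparing N across reboots shows whether
the number moved.

```python
import os


def device_numbers(paths):
    """Return {path: (major, minor)} for the st_dev of each path.

    On btrfs each subvolume is expected to report its own anonymous
    device number (major 0); the minor is assigned at runtime.
    """
    result = {}
    for path in paths:
        st_dev = os.stat(path).st_dev
        result[path] = (os.major(st_dev), os.minor(st_dev))
    return result
```

For example, device_numbers(["/", "/home"]) on a btrfs root with a
separate home subvolume would be expected to return two different
(0, N) pairs.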

>
> I argued that a desktop application has no business to rely on a device
> number and got back that search/indexing is in the middle between an
> application and system software. And that Baloo needs an "invariant" for
> a file. See comment #11 of that bug report:
>
> https://bugs.kde.org/show_bug.cgi?id=438434#c11

Well, a lot of tools rely on the device number to detect filesystem
boundaries, like find (e.g. its -xdev option).
Thus it's a little hard to argue against.
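Roughly how such a boundary check works, as a Python sketch in the
spirit of find -xdev (not find's actual implementation; the function
name is made up): any directory whose st_dev differs from the starting
point's is not descended into. Since every btrfs subvolume has its own
anonymous st_dev, the same check also stops at subvolume boundaries,
which is exactly why the number has to differ per subvolume.

```python
import os


def walk_one_filesystem(root):
    """Yield directory paths under root without crossing into other
    filesystems, similar in spirit to `find -xdev`.

    A directory whose st_dev differs from root's is treated as a
    boundary. On btrfs this also stops at subvolume boundaries.
    """
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, _filenames in os.walk(root):
        yield dirpath
        # Prune subdirectories on a different device in place, so
        # os.walk() never descends into them.
        dirnames[:] = [
            name for name in dirnames
            if os.lstat(os.path.join(dirpath, name)).st_dev == root_dev
        ]
```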

On the other hand, it also means Baloo can't handle a regular fs over
LVM well either.

>
> I got the suggestion to try to find a way to tell the kernel to use a
> fixed device number.

I don't think that's possible for btrfs, as each subvolume gets its
anonymous device number assigned when it is first read.

Thus it's really hard to make it fixed, as the whole point of anonymous
device numbers is to avoid conflicts.
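A toy model of why the read order matters, assuming the pool hands out
the lowest free number first (my understanding of how the kernel's
anonymous device pool behaves; the class and the subvolume names are
made up for illustration):

```python
class AnonDevPool:
    """Toy model of an anonymous device number pool.

    Assumption: numbers are handed out lowest-free-first and returned
    on release, so the number a user gets depends only on what was
    allocated before it, not on which subvolume it is.
    """

    def __init__(self, first=1):
        self.first = first
        self.in_use = set()

    def alloc(self):
        minor = self.first
        while minor in self.in_use:
            minor += 1
        self.in_use.add(minor)
        return minor

    def release(self, minor):
        self.in_use.discard(minor)


def assign(order):
    """Allocate one number per subvolume name, in first-read order."""
    pool = AnonDevPool()
    return {name: pool.alloc() for name in order}


boot1 = assign(["@", "@home", "@snapshots"])
boot2 = assign(["@", "@snapshots", "@home"])  # same subvolumes, new order
print(boot1["@home"], boot2["@home"])  # prints: 2 3
```

The subvolume is the same, but its number depends purely on what was
read before it. Anything else drawing from the same pool shifts the
numbers in the same way, which is why a fixed initialization order
matters too.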

>
> I still think, an application or an infrastructure service for a desktop
> environment or even anything else in user space should not rely on a
> device number to be fixed and never change upon reboots.

Well, LVM/device mapper does the same thing, and such a big behavior
change is never a good idea for the kernel.

Thus for use cases where we really need a proper mapping, we use hashes,
not just the device number, as we did in duperemove.
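A minimal Python sketch of the idea, keying files by a hash of their
contents instead of by (st_dev, st_ino). This is only an illustration
with made-up function names: duperemove hashes data in blocks for
deduplication, while this just hashes the whole file with SHA-256.

```python
import hashlib
import os


def content_key(path, chunk_size=1 << 20):
    """Return a SHA-256 hex digest of the file's contents.

    Unlike (st_dev, st_ino), this key survives a change of the
    anonymous device number across reboots; it changes only when
    the data changes.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unstable_key(path):
    """The kind of key that breaks when st_dev changes."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```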

>
> But maybe you have a different idea about that and it is okay for an
> userspace component to do that. I would like to hear your idea about
> that.
>
> Another question would be whether I could somehow make sure that the
> device number does not change, even if just as a work-around.

If you really just want a fixed device number, you can ensure that by:

- Making sure all users of anonymous device numbers are initialized in a
   fixed sequence
   Things like device mapper/LVM and btrfs should be loaded/initialized
   in a fixed order.

- Making sure the subvolume you care about is always mounted/read before
   any other subvolume
   That way the target subvolume always gets the first device number in
   the pool.

   But this also means that any later subvolumes outside the fixed
   mount/read sequence cannot get a fixed number.
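If you go with this workaround, a small check run at each boot can tell
you whether it actually held. A hypothetical Python sketch (the function
name and the state-file location are up to you): it records the
target's st_dev and reports whether it differs from the value recorded
last time.

```python
import json
import os


def check_stable_dev(path, state_file):
    """Compare path's current st_dev with the value saved last time.

    Returns (ok, previous, current); ok is True on the first run or
    when the number is unchanged. The state file must live somewhere
    that survives reboots.
    """
    current = os.stat(path).st_dev
    previous = None
    if os.path.exists(state_file):
        with open(state_file) as f:
            previous = json.load(f).get(path)
    # Record the current value for the next run.
    with open(state_file, "w") as f:
        json.dump({path: current}, f)
    return previous is None or previous == current, previous, current
```

For example, calling check_stable_dev("/home", "/var/lib/devcheck.json")
from a boot-time script would return (False, old, new) once the number
has moved.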

Thanks,
Qu

> I know for
> NFS there is a fsid= mount option, but it does not appear to be
> something generic, at least the mount man page seems to have nothing
> related to fsid.
>
>
> Best,
>

Thread overview: 11+ messages
2021-06-25 19:06 Assumption on fixed device numbers in Plasma's desktop search Baloo Martin Steigerwald
2021-06-26  0:27 ` Qu Wenruo [this message]
2021-06-26  8:49   ` Martin Steigerwald
2021-06-26  9:33     ` Qu Wenruo
2021-06-26 10:18       ` Martin Steigerwald
2021-06-26  0:54 ` NeilBrown
2021-06-26  3:38   ` Bart Van Assche
2021-06-26  5:17     ` NeilBrown
2021-06-26  6:14       ` Andrei Borzenkov
2021-06-26  6:24         ` Qu Wenruo
2021-06-26  8:51   ` Martin Steigerwald
