From: Marc MERLIN <marc@merlins.org>
To: Anand Jain <anand.jain@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>,
dsterba@suse.com, linux-btrfs@vger.kernel.org,
kernel-team@fb.com
Subject: Re: 5.4.20: cannot mount device that blipped off the bus: duplicate device fsid:devid for
Date: Sun, 19 Apr 2020 12:13:04 -0700
Message-ID: <20200419191304.GR21716@merlins.org>
In-Reply-To: <f85fccf5-eeb4-28ef-4dc4-500cf9221619@oracle.com>
On Thu, Apr 16, 2020 at 06:43:39PM +0800, Anand Jain wrote:
> > BTRFS info (device sde1): forced readonly
>
> Unfortunately that's the only thing we do as of now.
Of course, and that's fine, but I don't understand why, after the filesystem
is cleanly unmounted, the references to the old device aren't freed.
That part really seems like a bug to me.
> So the same device reappears as sdp. But btrfs does not close a failed
> device yet (patches are on the mailing list), so the old path sde
> is still in the block layer and open. I guess /proc/partitions
> doesn't show the non-working sde.
>
Correct on all points
> > gargamel:~# mount | grep sde
> better to have grepped for sdp here as well.
it was not mounted yet; I checked that.
> And /proc/self/mounts will be more accurate as it probes the fs module.
Noted.
> > [1887142.826176] BTRFS error (device sde1): bdev /dev/sde1 errs: wr 1038, rd 4531, flush 0, corrupt 0, gen 0
> > [1887453.610947] BTRFS warning (device sde1): duplicate device fsid:devid for 727c7ba3-f6f9-462a-8472-453dd7d46d8a:1 old:/dev/sde1 new:/dev/sdp1
>
> Unmount wasn't successful above. Or it was remounted by automount? just
> guessing.
umount was successful, and automount does not handle this device.
/dev/sde was definitely not mounted, and /dev/sdp could not be mounted.
> > gargamel:/usr/local/bin# btrfs device scan --forget
> > gargamel:/usr/local/bin# mount /dev/sdp1 /mnt/mnt
> > mount: /mnt/mnt: mount(2) system call failed: File exists.
>
> Can you please send the complete kernel logs?
They contain a lot of crap that wouldn't fit on the list, but I pasted
everything relevant.
> sde disappears.
> btrfs does not close the device.
it remounts the mountpoints read-only, which is fine (they can't be
unmounted because they are in use).
> block layer creates sdp when the disappeared device reappears.
> unmount of sde was tried, but it might not have been completely
> successful; we don't have sufficient logs to prove it.
umount looked complete on my side, there is nothing in the logs that
shows otherwise, but as you said, unmount does not log anything.
> mount of sdp fails; the log indicates that sde is still mounted.
correct.
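The failure sequence above can be illustrated with a toy Python model. It is not the kernel code; it only captures the rule as described in this thread: a new path for an already registered fsid:devid is refused with EEXIST ("File exists") while the old device is still held open and the new path is a different block device. Under the usual sd numbering, sde1 is 8:65 and sdp1 is 8:241, so the same disk really does look like a different device. `Device`, `FsDevices` and `scan` are hypothetical names.

```python
import errno
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Bdev = Tuple[int, int]  # (major, minor) of a block device node


@dataclass
class Device:
    path: str
    bdev: Optional[Bdev] = None  # set while a mounted filesystem holds it open


@dataclass
class FsDevices:
    devices: Dict[int, Device] = field(default_factory=dict)  # devid -> Device


def scan(registry: Dict[str, FsDevices], fsid: str, devid: int,
         path: str, bdev: Bdev) -> int:
    """Register `path` as fsid:devid; return 0 or -errno.EEXIST (toy model)."""
    fs = registry.setdefault(fsid, FsDevices())
    dev = fs.devices.get(devid)
    if dev is None:
        fs.devices[devid] = Device(path)
        return 0
    if dev.path != path and dev.bdev is not None and dev.bdev != bdev:
        # Old path still open and the new path is another block device:
        # refuse, which mount(8) reports as "File exists".
        print(f"duplicate device fsid:devid for {fsid}:{devid} "
              f"old:{dev.path} new:{path}")
        return -errno.EEXIST
    dev.path = path  # nothing held open, or same block device: update the path
    return 0
```

In this model the new path is accepted as soon as nothing holds the old device open, which is why releasing the failed device matters.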
> So thing(s) to fix is/are:
> The root of the issue: when sde fails, we need to close the device
> so that the block layer can reuse sde (not sdp) when it reappears.
> Once btrfs closes the failed device, btrfs dev scan --forget
> can clean up the stale entries left behind during unmount.
ideally "btrfs dev scan --forget" should be automatic. It feels like
a weird command for an admin to know or have to use. Other filesystems
do not need it.
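A toy Python model of the two proposals, closing a failed device immediately and doing the equivalent of --forget automatically at unmount. It rests on an assumption taken from the explanation above and not confirmed in this thread: that a failed device which btrfs never closed keeps its reference past unmount. All names (`fail_device`, `forget`, `unmount`, `close_on_failure`, `auto_forget`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Device:
    path: str
    bdev_open: bool = True  # opened at mount time
    failed: bool = False


@dataclass
class FsDevices:
    devices: Dict[int, Device] = field(default_factory=dict)
    mounted: bool = True


def fail_device(fs: FsDevices, devid: int, close_on_failure: bool) -> None:
    """The device drops off the bus; optionally release it right away (proposed fix)."""
    dev = fs.devices[devid]
    dev.failed = True
    if close_on_failure:
        dev.bdev_open = False


def forget(registry: Dict[str, FsDevices]) -> None:
    """Like 'btrfs device scan --forget': drop unmounted filesystems with no open devices."""
    for fsid in list(registry):
        fs = registry[fsid]
        if not fs.mounted and not any(d.bdev_open for d in fs.devices.values()):
            del registry[fsid]


def unmount(registry: Dict[str, FsDevices], fsid: str, auto_forget: bool = True) -> None:
    fs = registry[fsid]
    fs.mounted = False
    for dev in fs.devices.values():
        if not dev.failed:  # toy assumption: a failed, unclosed device leaks its reference
            dev.bdev_open = False
    if auto_forget:  # the suggestion above: no manual --forget needed
        forget(registry)
```

In this model automatic forgetting alone is not enough: the stale entry only goes away once the failed device has actually been released, which matches the order of the fixes proposed above.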
> We can do something better here:
> When two different devices have the same fsid uuid and devid and one of
> them is mounted, we have to fail the scan/mount of the newer device for
> obvious reasons. That's when we get the log 'duplicate device fsid'.
> But here the case is a bit skewed: both are the same device with the same
> major number but different minor numbers (sde, sdp). I need to figure
> out a way so that we don't treat these two device paths as different
> devices. Probably we should check the guid/wwid assigned by the block
> layer, which should be the same for both of these devices, or as a
> last resort check the SCSI inquiry VPD page and get the serial number,
> but that goes too far beyond what an FS should do. Let me check with
> block layer experts what they suggest.
defense in depth sounds great here; if any of those can work too, that'd
be great.
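The identity check proposed above could look roughly like this Python sketch: when a new path claims an fsid:devid that is still in use, compare a stable hardware identifier and treat a match as the same disk under a new name rather than a duplicate. Where the identifier comes from is left open (a WWID or serial number; udev's /dev/disk/by-id/ links are one place such IDs show up), and a missing ID on either side falls back to today's refusal. `Device`, `scan` and `hw_id` are hypothetical names.

```python
import errno
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Device:
    path: str
    hw_id: Optional[str]  # stable hardware ID (WWID/serial); None if unknown
    in_use: bool = False  # held by a mounted filesystem


def scan(registry: Dict[Tuple[str, int], Device], fsid: str, devid: int,
         path: str, hw_id: Optional[str]) -> int:
    """Return 0 if `path` may serve as fsid:devid, else -errno.EEXIST (sketch)."""
    key = (fsid, devid)
    old = registry.get(key)
    if old is None:
        registry[key] = Device(path, hw_id)
        return 0
    if not old.in_use or old.path == path:
        old.path, old.hw_id = path, hw_id  # nothing to protect: just refresh
        return 0
    if old.hw_id is not None and old.hw_id == hw_id:
        # Same physical disk back under a new name (e.g. sde -> sdp):
        # follow the rename instead of calling it a duplicate.
        old.path = path
        return 0
    # Different or unknown hardware: keep refusing, as happens today.
    return -errno.EEXIST
```

Refusing whenever either ID is unknown keeps the existing protection against a genuinely different disk carrying the same fsid:devid, such as a block-level clone.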
> Still unknown:
> Was the unmount successful? The mount logs show that device sde still exists
> in btrfs.
It failed while the mountpoints were still in use; after the appropriate
fuser -kvm /path
umount worked fine and the device disappeared from /proc/mounts.
As you said, there are no kernel logs on unmount, so it's hard to say
more.
If you want me to apply a patch that puts more logging on unmount
(against 5.5 or 5.6), please let me know, but of course, it could be
weeks or months before I get that blip again.
I think this could be reproduced by simply mounting a drive, unplugging it
while the machine is live, and plugging it back in. I could technically do
it with my hardware, but it happens on a database I don't really want to
lose or corrupt.
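For anyone who wants to try that on a scratch disk rather than real data, a SCSI/SATA disk dropping off the bus can be approximated through sysfs: writing 1 to /sys/block/<disk>/device/delete removes the disk, and writing "- - -" to /sys/class/scsi_host/host<N>/scan rescans that host adapter. The Python sketch below only builds and, by default, prints those writes. `hotplug_plan` and `run` are hypothetical helpers, the host number has to be looked up by hand, and whether this reproduces the exact failure is an assumption, not something tested in this thread.

```python
import time
from pathlib import Path
from typing import List, Tuple


def hotplug_plan(disk: str, host: int) -> List[Tuple[str, str]]:
    """Return the sysfs writes that detach `disk` and then rescan SCSI host `host`."""
    return [
        # Remove the disk from the SCSI layer, roughly like it falling off the bus.
        (f"/sys/block/{disk}/device/delete", "1"),
        # Rescan every channel, target and LUN on the host so the disk comes back.
        (f"/sys/class/scsi_host/host{host}/scan", "- - -"),
    ]


def run(plan: List[Tuple[str, str]], dry_run: bool = True, pause: float = 30.0) -> None:
    """Print the writes (default) or perform them, pausing between steps.

    The pause leaves time to generate I/O on the still-mounted filesystem
    so that btrfs notices the missing device before it reappears.
    """
    for i, (path, value) in enumerate(plan):
        if dry_run:
            print(f"echo '{value}' > {path}")
            continue
        if i > 0:
            time.sleep(pause)
        Path(path).write_text(value)  # requires root
```

With dry_run=False this needs root, and it should only ever be pointed at a disk whose filesystem you are prepared to lose.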
> Sorry, I was diverted to other things when you reported this last time; let
> me take a fresh look.
No worries, we've all been there :)
Also, it's not like I can get a refund on my support contract I don't have ;)
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08