From: Zygo Blaxell <zblaxell@furryterror.org>
To: Goffredo Baroncelli <kreijack@inwind.it>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS messes up snapshot LV with origin
Date: Tue, 25 Nov 2014 15:29:48 -0500
Message-ID: <20141125202948.GP17380@hungrycats.org>
In-Reply-To: <5474AF87.6090702@inwind.it>
On Tue, Nov 25, 2014 at 05:34:15PM +0100, Goffredo Baroncelli wrote:
> On 11/23/2014 01:19 AM, Zygo Blaxell wrote:
> [...]
> > md-raid works as long as you specify the devices, and because it's always
> > the lowest layer it can ignore LVs (snapshot or otherwise). It's also
> > not a particularly common use case, while making an LV snapshot of a
> > filesystem is a typical use case.
>
> I fully agree; but you still consider a *multi-device* btrfs over lvm...
> This is like a dm over lvm... which doesn't make sense at all (as you
> already wrote)

It makes sense for btrfs because btrfs can productively use LVs on
different PVs (e.g. btrfs-raid1 on two LVs, one on each PV).  LVM is
the bottom layer because not everything in the world is btrfs--things
like ephemeral /tmp, boot, swap, and temporary backup copies of the btrfs
(e.g. before running btrfsck) have to live on the same physical drives
as the btrfs filesystems.

> >>> and mounting the filesystem fails at 3.
> >> Are you sure ?
> >
> > Yes, I'm sure. I've had to replace filesystems destroyed this way.
> >
> >> [working instance snipped]
> >
> >> On the basis of the example above, in case you want to mount a
> >> "single-disk", BTRFS seems me to work properly. You have to pay
> >> attention only to not mount the two filesystem at the same time.
> >
> > The problem is btrfs stops searching when it sees one disk with each UUID,
>
> BTRFS doesn't search for anything. It is udev which "pushes" the information
> to the kernel module. The btrfs module groups this information by UUID.
> When a new disk is inserted, it overwrites the information of the old one.

Same result: when presented with multiple devices with the same UUID,
one is chosen arbitrarily instead of rejecting all of them.
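
A toy model of that behaviour, as a minimal sketch and not the kernel's
actual code: registrations are keyed by UUID and a later one silently
replaces an earlier one, so whichever device is scanned last wins.  The
function name register_devices, the paths, and the UUID string are all
hypothetical, chosen only for illustration.

```python
# Toy model only: this is NOT the btrfs kernel code.
# The function name, device paths, and UUID string are made up.

def register_devices(scan_order):
    """Return {uuid: path} after scanning devices in the given order.

    Each entry in scan_order is a (path, uuid) pair.  A later device with
    the same UUID overwrites the earlier one instead of being rejected.
    """
    registry = {}
    for path, uuid in scan_order:
        registry[uuid] = path  # last writer wins, no duplicate check
    return registry


origin = ("/dev/vg/lv00", "uuid-dev1")
snapshot = ("/dev/vg/lv00snap", "uuid-dev1")  # snapshot carries the same UUID

# The surviving device depends only on which one was scanned last.
print(register_devices([origin, snapshot]))
print(register_devices([snapshot, origin]))
```

In this model nothing ever notices that two different block devices
claimed the same UUID, which is the point being argued: the outcome is
decided by scan order alone.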
> > so the set of disks (snapshot vs origin) that you get is *random*.
> > For a pair of origin + snapshots, there's a 50% chance it works, 50%
> > chance it eats your data.
>
> Sorry but I have to disagree: the code is quite clear
> (see fs/btrfs/volumes.c, near line 512):
>
> [...]
>
>         } else if (!device->name || strcmp(device->name->str, path)) {
>                 /*
>                  * When FS is already mounted.
>                  * 1. If you are here and if the device->name is NULL that
>                  *    means this device was missing at time of FS mount.
>                  * 2. If you are here and if the device->name is different
>                  *    from 'path' that means either
>                  *      a. The same device disappeared and reappeared with
>                  *         different name. or
>                  *      b. The missing-disk-which-was-replaced, has
>                  *         reappeared now.

If the FS is already mounted then there is no issue.  It's when you're trying
to mount the FS that the fun occurs.

>                  *
>                  * We must allow 1 and 2a above. But 2b would be a spurious
>                  * and unintentional.
>
> [...]
>
> This is case 2a; here btrfs stores the new name and mounts the device.
>
> Anyway, I made a small test: I created one btrfs filesystem and
> made an LVM snapshot of it. Then I created two different files, one in the
> snapshot and one in the origin. I ran a program which randomly mounts the
> first or the latter and checks that the correct file is present; after more
> than 130 tests I never saw your "50% chance it works": it always works.

The failing case is one btrfs filesystem on two LVs, with a snapshot of
each LV also present.
So you'd have:

        lv00     - btrfs device 1
        lv01     - btrfs device 2
        lv00snap - snapshot of lv00
        lv01snap - snapshot of lv01

If you mount by device UUID then you get one of these results at random:

        lv00     + lv01     - OK
        lv00snap + lv01snap - also OK
        lv00     + lv01snap - failure
        lv00snap + lv01     - failure

2 failures, 2 successes = 50% failure rate.

If you mount by the name of one of the devices then you only get the two
rows of the above table that match the device you named, but you still
get one success row and one failure row.

Which result you get seems to depend on the order in which LVM enumerates
the LVs, so if you are doing a mount/umount loop then you won't see any
problems as btrfs will consistently make the same choice of LVs over
and over again.  Rebooting or creating other LVs in between mounts will
definitely cause problems.
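
The arithmetic above can be checked by enumerating the combinations.
This is only an illustrative sketch: the LV names come from the example,
while the helper names (is_consistent, outcomes) and the rule "all
origins or all snapshots is OK, anything mixed is a failure" are
assumptions that restate the table, not anything btrfs itself computes.

```python
# Illustrative enumeration only; LV names are taken from the example above.
# The helpers below are hypothetical and simply restate the table.
from itertools import product

# Each btrfs device slot has two candidates carrying the same UUID.
candidates = {
    "device 1": ["lv00", "lv00snap"],
    "device 2": ["lv01", "lv01snap"],
}


def is_consistent(combo):
    """A set is usable only if it is all origins or all snapshots."""
    return len({name.endswith("snap") for name in combo}) == 1


def outcomes(named=None):
    """List (combo, ok) pairs, optionally restricted to a named device."""
    combos = product(*candidates.values())
    return [(c, is_consistent(c)) for c in combos if named is None or named in c]


# Mount by UUID: any of the four combinations may be assembled.
for combo, ok in outcomes():
    print(" + ".join(combo), "-", "OK" if ok else "failure")

# Mount by name: only the rows containing the named device remain.
for combo, ok in outcomes(named="lv00"):
    print("mount lv00:", " + ".join(combo), "-", "OK" if ok else "failure")
```

Naming one device pins that slot only; the other slot is still open to
either the origin or its snapshot, which is why one success row and one
failure row remain.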
> BR
> G.Baroncelli
>
> >
> >> BR
> >> G.Baroncelli
> >>
> >>
> >> --
> >> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> >> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
>
>
> --
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
>