* btrfs und lvm-cache?
@ 2015-12-23 10:45 Neuer User
2015-12-23 11:21 ` Martin Steigerwald
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Neuer User @ 2015-12-23 10:45 UTC (permalink / raw)
To: linux-btrfs
Hello
I want to set up a small home server, based on an HP Microserver Gen8 (4GB
RAM, 2x3TB HDD + 1x120GB SSD) with Proxmox as the distro.
The server will be used to host a (small) number of virtual machines,
most of them LXC containers, a few of them KVM machines. One of the
LXC containers will host a fileserver with approx. 1 TB of data and another
one a backup system for the desktops / laptops in my household, thus
probably holding quite a lot of files. The lxc containers will use the
filesystem of the proxmox host, the KVM machines probably raw disk files
(or qcow2).
I would like to combine high data integrity with some speed, so I
thought of the following layout:
- both hdd and ssd in one LVM VG
- one LV on each hdd, containing a btrfs filesystem
- both btrfs LV configured as RAID1
- the single SSD used as an LVM cache device for both HDD LVs to speed up
random access, where possible
Now, I wonder if that is a good architecture to go for. Any input on
that? Is btrfs the right way to go, or would I be better off with ZFS
(and purchasing some more gigs of RAM)?
Will there be any problems arising from the lvmcache? btrfs only sees
the HDDs, LVM does the SSD handling.
Thanks for any input. I like btrfs very much, but data integrity is
important for this.
Michael
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 10:45 btrfs und lvm-cache? Neuer User
@ 2015-12-23 11:21 ` Martin Steigerwald
2015-12-23 11:38 ` Neuer User
2015-12-23 20:24 ` Neuer User
` (2 subsequent siblings)
3 siblings, 1 reply; 18+ messages in thread
From: Martin Steigerwald @ 2015-12-23 11:21 UTC (permalink / raw)
To: Neuer User; +Cc: linux-btrfs
On Wednesday, 23 December 2015, 11:45:28 CET, Neuer User wrote:
> Hello
Hi.
> I want to set up a small home server, based on an HP Microserver Gen8 (4GB
> RAM, 2x3TB HDD + 1x120GB SSD) with Proxmox as the distro.
>
> The server will be used to host a (small) number of virtual machines,
> most of them LXC containers, a few of them KVM machines. One of the
> LXC containers will host a fileserver with approx. 1 TB of data and another
> one a backup system for the desktops / laptops in my household, thus
> probably holding quite a lot of files. The lxc containers will use the
> filesystem of the proxmox host, the KVM machines probably raw disk files
> (or qcow2).
>
> I would like to combine high data integrity with some speed, so I
> thought of the following layout:
>
> - both hdd and ssd in one LVM VG
> - one LV on each hdd, containing a btrfs filesystem
> - both btrfs LV configured as RAID1
> - the single SSD used as an LVM cache device for both HDD LVs to speed up
> random access, where possible
>
> Now, I wonder if that is a good architecture to go for. Any input on
> that? Is btrfs the right way to go, or would I be better off with ZFS
> (and purchasing some more gigs of RAM)?
>
> Will there be any problems arising from the lvmcache? btrfs only sees
> the HDDs, LVM does the SSD handling.
As far as I understand, this way you basically lose the RAID 1 semantics of
BTRFS. While the data is redundant on the HDDs, it is not redundant on the
SSD. It may work for a pure read cache, but for write-through you definitely
lose any data integrity protection a RAID 1 gives you.
Of course, you can use two SSDs and have them work as RAID 1 as well.
There is a patch set for in-BTRFS SSD caching. It consists of a patch set to
add hot data tracking to VFS and a patch set for adding support in BTRFS. But
I haven't seen anything of these in quite some time.
Happy Christmas,
--
Martin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 11:21 ` Martin Steigerwald
@ 2015-12-23 11:38 ` Neuer User
2015-12-23 19:45 ` Noah Massey
2015-12-23 19:49 ` Chris Murphy
0 siblings, 2 replies; 18+ messages in thread
From: Neuer User @ 2015-12-23 11:38 UTC (permalink / raw)
To: linux-btrfs
On 23.12.2015 at 12:21, Martin Steigerwald wrote:
> Hi.
>
> As far as I understand, this way you basically lose the RAID 1 semantics of
> BTRFS. While the data is redundant on the HDDs, it is not redundant on the
> SSD. It may work for a pure read cache, but for write-through you definitely
> lose any data integrity protection a RAID 1 gives you.
>
Hmm, are you sure? I thought LVM lies underneath btrfs. Btrfs thus
should not know about the caching SSD at all. It only knows of the two
LVs on the HDDs, reading and writing data from or to one or both of the
two LVs.
Only then does lvmcache decide whether it reads the data from the underlying HDD
or from the cache SSD. LVM shouldn't even know that the two LVs are
configured as RAID1 on btrfs as this is a level higher. So for LVM the
two LVs are different data, both of which would need to be cached
independently on the SSD.
What might happen, though, is that there is data loss on the SSD,
returning a mismatching checksum, so btrfs might think that the data is
incorrect on one LV (=HDD), although it is in fact correct there. That
would lead btrfs to read the data from the second LV (which might also
be in the SSD cache or not) and then update the (correct and
identical) data of the first LV with it.
Or do I see that wrong?
> Of course, you can use two SSDs and have them work as RAID 1 as well.
>
> There is a patch set for in-BTRFS SSD caching. It consists of a patch set to
> add hot data tracking to VFS and a patch set for adding support in BTRFS. But
> I haven't seen anything of these in quite some time.
That would be interesting, but for my project it's probably too late.
>
> Happy christmas,
>
Yeah, happy Christmas to you and everybody on the list.
Michael
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 11:38 ` Neuer User
@ 2015-12-23 19:45 ` Noah Massey
2015-12-23 20:07 ` Neuer User
2015-12-23 19:49 ` Chris Murphy
1 sibling, 1 reply; 18+ messages in thread
From: Noah Massey @ 2015-12-23 19:45 UTC (permalink / raw)
To: Neuer User; +Cc: linux-btrfs
On Wed, Dec 23, 2015 at 6:38 AM, Neuer User <auslands-kv@gmx.de> wrote:
> On 23.12.2015 at 12:21, Martin Steigerwald wrote:
>> Hi.
>>
>> As far as I understand, this way you basically lose the RAID 1 semantics of
>> BTRFS. While the data is redundant on the HDDs, it is not redundant on the
>> SSD. It may work for a pure read cache, but for write-through you definitely
>> lose any data integrity protection a RAID 1 gives you.
>>
> Hmm, are you sure? I thought LVM lies underneath btrfs. Btrfs thus
> should not know about the caching SSD at all. It only knows of the two
> LVs on the HDDs, reading and writing data from or to one or both of the
> two LVs.
I believe Martin's concern is two-fold:
The first, major issue concerns writeback cache mode,
which makes the SSD a single point of failure.
(In writeback mode, a write to a block that is cached will go only to
the cache and the block will be marked dirty in the metadata.) If the
SSD fails with dirty data in the cache which has not been flushed to
the backing devices, the filesystem may be in an unrecoverable state,
because writes which BTRFS was told had succeeded are not present on disk.
The second potential issue is that if the SSD performs internal
deduplication, the two copies of cached data (contents of drive 1,
contents of drive 2) may actually be a reference to the same bits of
internal storage, meaning a single corruption will affect both cached
copies. If in writeback, then corrupted data could flush down to both
disks. I'm not sure what would happen in writethrough.
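The difference between the two cache modes when the SSD dies can be sketched with a toy model. This is only an illustration under simplifying assumptions, not lvmcache's actual implementation: ToyCache, demo and the block numbers are made-up names, each device is a plain dict, and "acknowledged" simply means that write() returned.

```python
class ToyCache:
    """Toy block cache in front of one backing disk (not lvmcache's real logic)."""

    def __init__(self, mode):
        assert mode in ("writethrough", "writeback")
        self.mode = mode
        self.ssd = {}       # block number -> data held on the cache device
        self.hdd = {}       # block number -> data held on the backing device
        self.dirty = set()  # blocks whose latest data exists only on the SSD

    def write(self, block, data):
        self.ssd[block] = data
        if self.mode == "writethrough":
            self.hdd[block] = data   # the HDD has the data before write() returns
        else:
            self.dirty.add(block)    # write() returns while the HDD is still stale

    def flush(self):
        """Copy all dirty blocks from the SSD down to the HDD."""
        for block in sorted(self.dirty):
            self.hdd[block] = self.ssd[block]
        self.dirty.clear()

    def ssd_dies(self):
        """Drop the cache device; return the blocks whose latest data is gone."""
        lost = sorted(self.dirty)
        self.ssd.clear()
        self.dirty.clear()
        return lost


def demo(mode):
    cache = ToyCache(mode)
    cache.write(0, "old")
    cache.flush()
    cache.write(0, "new")    # overwrite after the last flush
    cache.write(1, "more")
    lost = cache.ssd_dies()
    return lost, dict(cache.hdd)


if __name__ == "__main__":
    for mode in ("writethrough", "writeback"):
        print(mode, demo(mode))
```

In this model demo("writethrough") reports no lost blocks and an up-to-date HDD, while demo("writeback") reports blocks 0 and 1 as lost and leaves only the stale "old" copy of block 0 on the HDD.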
~ Noah
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 11:38 ` Neuer User
2015-12-23 19:45 ` Noah Massey
@ 2015-12-23 19:49 ` Chris Murphy
2015-12-23 20:21 ` Neuer User
1 sibling, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2015-12-23 19:49 UTC (permalink / raw)
To: Neuer User; +Cc: Btrfs BTRFS
On Wed, Dec 23, 2015 at 4:38 AM, Neuer User <auslands-kv@gmx.de> wrote:
> On 23.12.2015 at 12:21, Martin Steigerwald wrote:
>> Hi.
>>
>> As far as I understand, this way you basically lose the RAID 1 semantics of
>> BTRFS. While the data is redundant on the HDDs, it is not redundant on the
>> SSD. It may work for a pure read cache, but for write-through you definitely
>> lose any data integrity protection a RAID 1 gives you.
>>
> Hmm, are you sure? I thought LVM lies underneath btrfs. Btrfs thus
> should not know about the caching SSD at all. It only knows of the two
> LVs on the HDDs, reading and writing data from or to one or both of the
> two LVs.
>
> Only then does lvmcache decide whether it reads the data from the underlying HDD
> or from the cache SSD. LVM shouldn't even know that the two LVs are
> configured as RAID1 on btrfs as this is a level higher. So for LVM the
> two LVs are different data, both of which would need to be cached
> independently on the SSD.
>
> What might happen, though, is that there is data loss on the SSD,
> returning a mismatching checksum, so btrfs might think that the data is
> incorrect on one LV (=HDD), although it is in fact correct there. That
> would lead btrfs to read the data from the second LV (which might also
> be in the SSD cache or not) and then update the (correct and
> identical) data of the first LV with it.
Seems to me if the LVs on the two HDDs are exposed, the lvmcache has
to separately keep track of those LVs. So as long as everything is
working correctly, it should be fine. That includes either transient
or persistent, but consistent, errors for either HDD or the SSD, and
Btrfs can fix up those bad reads with data from the other. If the SSD
were to decide to go nutty, chances are reads through lvmcache would
be corrupt no matter what LV is being read by Btrfs, and it'll be
aware of that and discard those reads. Any corrupt writes in this
case won't be immediately known by Btrfs because it (like any file
system) assumes writes are OK unless the device reports a write
failure, but those too would be found on read.
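The read path described above (checksum mismatch, fall back to the other copy, rewrite the bad one) can be modelled roughly as follows. This is a simplified sketch, not btrfs code: read_with_repair and the dict-based mirrors are hypothetical, and zlib's CRC-32 stands in for the crc32c checksum btrfs uses by default.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in for btrfs's crc32c; plain CRC-32 is enough for the illustration.
    return zlib.crc32(data)


def read_with_repair(mirrors, block, expected_sum):
    """Read `block`, trying each mirror in turn.

    Returns (data, failed), where `failed` lists the mirrors whose copy did
    not match the checksum and was rewritten from the good copy.
    """
    failed = []
    for index, mirror in enumerate(mirrors):
        data = mirror[block]
        if checksum(data) == expected_sum:
            for bad in failed:
                mirrors[bad][block] = data   # overwrite the bad copy with good data
            return data, failed
        failed.append(index)
    raise OSError(f"block {block}: no mirror returned data matching its checksum")


if __name__ == "__main__":
    mirrors = [{7: b"hello"}, {7: b"hello"}]   # two copies of block 7
    stored_sum = checksum(b"hello")
    mirrors[0][7] = b"hellp"                   # corrupt the first copy
    print(read_with_repair(mirrors, 7, stored_sum))
    print(mirrors[0][7])
```

If every copy fails its checksum, which is the "SSD goes nutty for both LVs" case, the model raises an error instead of returning bad data.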
The question I have, that I don't know the answer to, is if the stack
arrives at a point where all writes are corrupt but hardware isn't
reporting write errors, and it continues to happen for a while, once
you've resolved that problem and try to mount the file system again,
how well does Btrfs disregard all those bad writes? How well would any
filesystem?
--
Chris Murphy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 19:45 ` Noah Massey
@ 2015-12-23 20:07 ` Neuer User
2015-12-23 20:38 ` Holger Hoffstätte
0 siblings, 1 reply; 18+ messages in thread
From: Neuer User @ 2015-12-23 20:07 UTC (permalink / raw)
To: linux-btrfs; +Cc: linux-btrfs
On 23.12.2015 at 20:45, Noah Massey wrote:
> On Wed, Dec 23, 2015 at 6:38 AM, Neuer User <auslands-kv@gmx.de> wrote:
> I believe Martin's concern is two-fold:
>
> The first, major issue concerns writeback cache mode,
> which makes the SSD a single point of failure.
> (In writeback mode, a write to a block that is cached will go only to
> the cache and the block will be marked dirty in the metadata.) If the
> SSD fails with dirty data in the cache which has not been flushed to
> the backing devices, the filesystem may be in an unrecoverable state,
> because writes which BTRFS was told had succeeded are not present on disk.
Ok, I see. Would it help, if the cache were set to writethrough then? In
this case the data on the hdds should be always ok, right? (At least as
long as the hdds are fine.)
>
> The second potential issue is that if the SSD performs internal
> deduplication, the two copies of cached data (contents on drive 1,
> content on drive 2) may actually be a reference to the same bits of
> internal storage, meaning a single corruption will affect both cached
> copies. If in writeback, then corrupted data could flush down to both
> disks. I'm not sure what would happen in writethrough.
>
Understood. However, do SSDs really do automatic deduplication? I might
be completely wrong here, but that sounds like a rather complex
mechanism, requiring lots of RAM to deduplicate 100 GB. I wouldn't have
thought that typical SSDs include that?
> ~ Noah
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 19:49 ` Chris Murphy
@ 2015-12-23 20:21 ` Neuer User
2015-12-23 20:56 ` Chris Murphy
0 siblings, 1 reply; 18+ messages in thread
From: Neuer User @ 2015-12-23 20:21 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On 23.12.2015 at 20:49, Chris Murphy wrote:
> Seems to me if the LVs on the two HDDs are exposed, the lvmcache has
> to separately keep track of those LVs. So as long as everything is
> working correctly, it should be fine. That includes either transient
> or persistent, but consistent, errors for either HDD or the SSD, and
> Btrfs can fix up those bad reads with data from the other. If the SSD
> were to decide to go nutty, chances are reads through lvmcache would
> be corrupt no matter what LV is being read by Btrfs, and it'll be
> aware of that and discard those reads. Any corrupt writes in this
> case won't be immediately known by Btrfs because it (like any file
> system) assumes writes are OK unless the device reports a write
> failure, but those too would be found on read.
What corrupt write do you mean? The "nuts" SSD is not going to write to
the HDDs, that will be done by lvmcache. So the HDDs should get the
correct data, only the SSD will be bad, right?
And that would become obvious with the next reads, in which case btrfs
probably would throw an error as it gets crazy data from apparently both
LVs (but only coming from the SSD). So, that could be fixed by removing
the SSD without any data loss from the HDDs, right?
>
> The question I have, that I don't know the answer to, is if the stack
> arrives at a point where all writes are corrupt but hardware isn't
> reporting write errors, and it continues to happen for a while, once
> you've resolved that problem and try to mount the file system again,
> how well does Btrfs disregard all those bad writes? How well would any
> filesystem?
>
Hmm, again the writes to the HDDs should be OK. Only the SSD would have
pretty corrupt data, right? In such a case it might depend on how much
bad data is read back from the SSD and what the filesystem does in
reaction to it?
P.S.: Of course, one other possibility would be to use a second SSD, so
that each LV has a separate caching SSD. In this case, there would
always be a valid source (given that not both SSDs go nuts the same
time...).
But I would need another slot for this. If the pros are very high,
that's ok. If it works nicely with just one SSD, then even better.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 10:45 btrfs und lvm-cache? Neuer User
2015-12-23 11:21 ` Martin Steigerwald
@ 2015-12-23 20:24 ` Neuer User
2015-12-23 20:59 ` Chris Murphy
2015-12-24 2:04 ` Duncan
2015-12-24 14:56 ` Piotr Pawłow
3 siblings, 1 reply; 18+ messages in thread
From: Neuer User @ 2015-12-23 20:24 UTC (permalink / raw)
To: linux-btrfs
One other thing:
I read that btrfs has some options that are turned off for SSDs as they
might be harmful. In my case, however, btrfs would not know about
the SSD and would probably use its HDD-optimized settings. The result,
however, would also be forwarded to the SSD via lvmcache. Do I see that
right? Would that cause any serious problems?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 20:07 ` Neuer User
@ 2015-12-23 20:38 ` Holger Hoffstätte
0 siblings, 0 replies; 18+ messages in thread
From: Holger Hoffstätte @ 2015-12-23 20:38 UTC (permalink / raw)
To: linux-btrfs
On 12/23/15 21:07, Neuer User wrote:
> Understood. However, do SSDs really do automatic deduplication? I might
> be completely wrong here, but that sounds like a rather complex
> mechanism, requiring lots of RAM to deduplicate 100 GB. I wouldn't have
> thought that typical SSDs include that?
tl;dr: no, because delta encoding/write buffer coalescing is not dedupe.
This is one of those persistent myths that have been kept alive by the
internet rumor machine. It has its roots in a series of blog articles [1]
and turned out to be panic coupled with FUD and fueled by a lack of factual
information.
I suggest everyone read the article(s), ALL the comments and then get back
to drinking. :o)
In SSD arrays dedupe is generally seen as a good thing.
-h
[1] http://storagemojo.com/2011/06/27/de-dup-too-much-of-good-thing/
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 20:21 ` Neuer User
@ 2015-12-23 20:56 ` Chris Murphy
2015-12-24 15:19 ` Neuer User
0 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2015-12-23 20:56 UTC (permalink / raw)
To: Neuer User; +Cc: Chris Murphy, Btrfs BTRFS
On Wed, Dec 23, 2015 at 1:21 PM, Neuer User <auslands-kv@gmx.de> wrote:
> On 23.12.2015 at 20:49, Chris Murphy wrote:
>> Seems to me if the LVs on the two HDDs are exposed, the lvmcache has
>> to separately keep track of those LVs. So as long as everything is
>> working correctly, it should be fine. That includes either transient
>> or persistent, but consistent, errors for either HDD or the SSD, and
>> Btrfs can fix up those bad reads with data from the other. If the SSD
>> were to decide to go nutty, chances are reads through lvmcache would
>> be corrupt no matter what LV is being read by Btrfs, and it'll be
>> aware of that and discard those reads. Any corrupt writes in this
>> case won't be immediately known by Btrfs because it (like any file
>> system) assumes writes are OK unless the device reports a write
>> failure, but those too would be found on read.
>
> What corrupt write do you mean? The "nuts" SSD is not going to write to
> the HDDs, that will be done by lvmcache. So the HDDs should get the
> correct data, only the SSD will be bad, right?
Btrfs always writes to the 'cache LV' and then it's up to lvmcache to
determine how and when things are written to the 'cache pool LV' vs
the 'origin LV'. I have no idea if there's a case with writeback
mode where things are written to the SSD and only later get copied from SSD
to the HDD, in which case a wildly misbehaving SSD might corrupt data
on the origin.
If you use writethrough, the default, then the data on HDDs should be
fine even if the single SSD goes crazy for some reason. Even if all
reads go bad, worst case is Btrfs should stop and go read-only. If the
SSD read errors are more transient, then Btrfs tries to fix them with
COW writes, so even if these fixes aren't needed on HDD, they should
arrive safely on both HDDs and hence still no corruption.
I mean *really* if data integrity is paramount you probably would do
this with production methods. Anything that has high IOPS like a mail
server, just write that stuff only to the SSD, and then occasionally
rsync it to conventionally raided (md or lvm) HDDs with XFS. You could
even use lvm snapshots and do this often, and now you not only have
something fast and safe but also an integrated backup that's
mirrored; in a sense you have three copies. Whereas what you're
attempting is rather complicated, and while it ought to work and it
gets testing, you're really being a test candidate, not only for
Btrfs but also for lvmcache, and for the combination of the two. I'd
just say make sure you have regular backups: snapshot the rw
subvolume regularly and sync it to another filesystem. As often as the
workflow can tolerate.
>
> And that would become obvious with the next reads, in which case btrfs
> probably would throw an error as it gets crazy data from apparently both
> LVs (but only coming from the SSD). So, that could be fixed by removing
> the SSD without any data loss from the HDDs, right?
Only if you're using writethrough mode, but yes.
>
>>
>> The question I have, that I don't know the answer to, is if the stack
>> arrives at a point where all writes are corrupt but hardware isn't
>> reporting write errors, and it continues to happen for a while, once
>> you've resolved that problem and try to mount the file system again,
>> how well does Btrfs disregard all those bad writes? How well would any
>> filesystem?
>>
> Hmm, again the writes to the HDDs should be OK. Only the SSD would have
> pretty corrupt data, right? In such a case it might depend on how much
> bad data is read back from the SSD and what the filesystem does in
> reaction to it?
>
> P.S.: Of course, one other possibility would be to use a second SSD, so
> that each LV has a separate caching SSD. In this case, there would
> always be a valid source (given that not both SSDs go nuts the same
> time...).
Simplistically, SSDs seem to fail two ways: a series of transient
errors that Btrfs can pretty much always account for; and then totally
face planting. The way they faceplant can be all writes fail, reads
work, or the whole device just vanishes off the bus. I don't know how
that affects lvmcache writethrough if the entire cache pool vanishes.
It should still write to the HDDs but I don't know that it does.
> But I would need another slot for this. If the pros are very high,
> that's ok. If it works nicely with just one SSD, then even better.
Yeah if it's a decent name brand SSD and not one of the ones with
known crap firmware, then I think it's fine to just have one. Either
way, each origin LV gets a separate cache pool LV if I understand
lvmcache correctly.
Off hand I don't know if you need separate VGs to make sure the 'cache
LVs' you format with Btrfs in fact use different PVs as origins.
That's important. The usual lvcreate command has a way to specify one
or more PVs to use, rather than have it just grab a pile of extents
from the VG (which could be from either PV), but I don't know if
that's the way it works in conjunction with lvmcache.
You're probably best off configuring this and, while it is writing, pulling
a device. Do that three times, once for each HDD and once for the SSD,
and see if you can recover. If it has to be bulletproof, you need to
spray it with bullets.
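For a pull-the-device test like this, it helps to know afterwards which acknowledged writes actually survived. A minimal sketch, assuming a hypothetical pair of helpers (write_files and verify_files are made-up names) and assuming the returned manifest of checksums is kept somewhere other than the filesystem under test:

```python
import hashlib
import os


def write_files(directory, count, size=4096):
    """Write `count` files of random data; return {filename: sha256 hex digest}.

    Only files whose write and fsync completed are recorded, so the manifest
    lists exactly the data the storage stack claimed to have stored.
    """
    manifest = {}
    for i in range(count):
        name = f"blob-{i:05d}.bin"
        data = os.urandom(size)
        try:
            with open(os.path.join(directory, name), "wb") as handle:
                handle.write(data)
                handle.flush()
                os.fsync(handle.fileno())   # ask the kernel to push it to the device
        except OSError:
            break                           # device gone or filesystem read-only
        manifest[name] = hashlib.sha256(data).hexdigest()
    return manifest


def verify_files(directory, manifest):
    """Return the sorted names of files that are missing or whose content changed."""
    damaged = []
    for name, digest in manifest.items():
        try:
            with open(os.path.join(directory, name), "rb") as handle:
                actual = hashlib.sha256(handle.read()).hexdigest()
        except OSError:
            damaged.append(name)
            continue
        if actual != digest:
            damaged.append(name)
    return sorted(damaged)
```

The idea would be to run write_files against a directory on the test filesystem while a device is pulled, then call verify_files with the same manifest once the filesystem is mounted again; an empty list means every acknowledged file came back intact.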
--
Chris Murphy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 20:24 ` Neuer User
@ 2015-12-23 20:59 ` Chris Murphy
0 siblings, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2015-12-23 20:59 UTC (permalink / raw)
To: Neuer User; +Cc: Btrfs BTRFS
On Wed, Dec 23, 2015 at 1:24 PM, Neuer User <auslands-kv@gmx.de> wrote:
> One other thing:
>
> I read that btrfs has some options that are turned off for SSDs as they
> might be harmful. In my case, however, btrfs would not know about
> the SSD and would probably use its HDD-optimized settings. The result,
> however, would also be forwarded to the SSD via lvmcache. Do I see that
> right? Would that cause any serious problems?
No, Btrfs is fine on an SSD with or without the optimization, and the
optimization is OK for hard drives as well. I think you're unlikely to
notice any difference, but you can test it if you want with the mount
options ssd or nossd, depending on how the cache LV is detected (I'd
guess it's detected as non-rotational, so the ssd option is the default).
--
Chris Murphy
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 10:45 btrfs und lvm-cache? Neuer User
2015-12-23 11:21 ` Martin Steigerwald
2015-12-23 20:24 ` Neuer User
@ 2015-12-24 2:04 ` Duncan
2015-12-24 15:24 ` Neuer User
2015-12-24 14:56 ` Piotr Pawłow
3 siblings, 1 reply; 18+ messages in thread
From: Duncan @ 2015-12-24 2:04 UTC (permalink / raw)
To: linux-btrfs
Neuer User posted on Wed, 23 Dec 2015 11:45:28 +0100 as excerpted:
> - both hdd and ssd in one LVM VG
> - one LV on each hdd, containing a btrfs filesystem
> - both btrfs LV configured as RAID1
> - the single SSD used as an LVM cache device for both HDD LVs to speed up
> random access, where possible
I'll let others debate the lvm-cache details, which I don't know much
about, but I do have a couple points to add, one of which is detail,
one rather higher level. The higher level one first:
1) While I've seen both bcache and lvm-cache discussed as potential
options here, there is at least one user running btrfs on top of bcache
who posts to bcache-related threads here with some regularity.
While there were some serious bugs to work through early on, his
recent posts suggest current bcache works very well with current
btrfs, and given that he has posted to several threads with some
time separation between them, he does appear to be a regular here,
and I expect he'd be posting pretty fast if things started going
buggy for him once again.
There hasn't been a corresponding regular poster here using lvm-cache,
so while it may work well, we don't know that. At minimum, postings
thus suggest that btrfs on bcache is the better tested solution at
this point, and thus would be recommended, while btrfs on lvm-cache,
while an equally valid technical choice in theory, doesn't have much
if any real-world data going for it at this point, and is thus
in practice an unknown.
2) Not being the person using bcache and not being familiar with it
or lvm-cache personally, I don't know how either one handles btrfs
multi-device. However, it occurs to me that if it's necessary,
in addition to the multiple ssds suggested by the others to cover
such multi-device caching, you should also be able to partition
up the ssd, and use each partition as an individual device cache.
That's almost certainly what I'd do here if I needed to (except
that above a certain size, ssd prices per GiB start to go up
dramatically, so if I wanted total ssd cache sizes above that I'd
of course pay less for multiple smaller ssds again) instead of
fiddling with multiple physical ssds, but again, not knowing
how the caching works, I'm not sure if multiple cache devices
would be needed to cache a multi-device btrfs at the back end,
or not, so I don't know whether I'd need to bother with such
partitioning or not.
The key here is that on ssds, seek time is effectively zero anyway, so
partitioning up the ssd and using both partitions as cache
doesn't have the latency issues that attempting to do something
like that (or for example btrfs raid1 on two partitions on the
same physical device) would have on spinning rust.
I thought I'd throw those points out, in case you had failed to
notice bcache as an option and would prefer it as better tested,
once you knew about it, and in case the partitioned ssd idea
does help with the multi-device btrfs caching thing.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 10:45 btrfs und lvm-cache? Neuer User
` (2 preceding siblings ...)
2015-12-24 2:04 ` Duncan
@ 2015-12-24 14:56 ` Piotr Pawłow
2015-12-24 15:29 ` Neuer User
3 siblings, 1 reply; 18+ messages in thread
From: Piotr Pawłow @ 2015-12-24 14:56 UTC (permalink / raw)
To: Neuer User, linux-btrfs
Hello,
> - both hdd and ssd in one LVM VG
> - one LV on each hdd, containing a btrfs filesystem
> - both btrfs LV configured as RAID1
> - the single SSD used as an LVM cache device for both HDD LVs to speed up
> random access, where possible
I have a setup like this for my /home. It works but it's a crappy solution.
The effective capacity for caching is halved, and it takes twice as much
time to fully cache your working set, because you get a cache miss at
least once for each mirror.
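Both effects can be reproduced with a small LRU model in which the cache is keyed by (LV, block), since lvmcache treats the two mirror LVs as unrelated data. This is a toy simulation under an assumed access pattern (each pass reads the whole working set from one mirror, alternating mirrors between passes), not a measurement; LruCache and simulate are made-up names.

```python
from collections import OrderedDict


class LruCache:
    """Minimal LRU cache that only tracks which keys are resident."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # hit: mark as most recently used
            return
        self.misses += 1                       # miss: fetch from the HDD and cache it
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used key


def simulate(mirrors, working_set=100, capacity=200, passes=4):
    """Return (misses, cache slots used) after reading the working set repeatedly."""
    cache = LruCache(capacity)
    for current_pass in range(passes):
        lv = current_pass % mirrors            # assumption: alternate mirrors per pass
        for block in range(working_set):
            cache.read((lv, block))
    return cache.misses, len(cache.entries)


if __name__ == "__main__":
    print("one LV:     ", simulate(mirrors=1))
    print("two mirrors:", simulate(mirrors=2))
```

With these numbers a single LV needs 100 misses and 100 cache slots to hold the working set, while the mirrored pair needs 200 of each for the same 100 logical blocks.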
There are also some gotchas:
- you should use "device=" mount options, or else there is a danger of
btrfs mounting origin devices and even mixing cached and origin in one
FS. I completely broke my FS before realizing what's going on.
- you should use writethrough mode if you only have one SSD. There was a
bug in LVM where it wouldn't save the caching mode and revert to
writeback after restart, so make sure you use the latest version of LVM
tools.
- if your SSD dies, you may have to use vgcfgbackup, manual config edit,
then vgcfgrestore to remove the cache, because last time I checked, LVM
tools still were handling writethrough cache the same as writeback,
disallowing volume activation without the cache and removal of missing
cache.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs und lvm-cache?
2015-12-23 20:56 ` Chris Murphy
@ 2015-12-24 15:19 ` Neuer User
0 siblings, 0 replies; 18+ messages in thread
From: Neuer User @ 2015-12-24 15:19 UTC (permalink / raw)
To: linux-btrfs
On 23.12.2015 at 21:56, Chris Murphy wrote:
> Btrfs always writes to the 'cache LV' and then it's up to lvmcache to
> determine how and when things are written to the 'cache pool LV' vs
> the 'origin LV'. I have no idea if there's a case with writeback
> mode where things are written to the SSD and only later get copied from SSD
> to the HDD, in which case a wildly misbehaving SSD might corrupt data
> on the origin.
>
> If you use writethrough, the default, then the data on HDDs should be
> fine even if the single SSD goes crazy for some reason. Even if all
> reads go bad, worst case is Btrfs should stop and go read-only. If the
> SSD read errors are more transient, then Btrfs tries to fix them with
> COW writes, so even if these fixes aren't needed on HDD, they should
> arrive safely on both HDDs and hence still no corruption.
>
Yeah, that's what I hoped for.
> I mean *really* if data integrity is paramount you probably would do
> this with production methods. Anything that has high IOPS like a mail
> server, just write that stuff only to the SSD, and then occasionally
> rsync it to conventionally raided (md or lvm) HDDs with XFS. You could
> even use lvm snapshots and do this often, and now you not only have
> something fast and safe but also an integrated backup that's
> mirrored; in a sense you have three copies. Whereas what you're
Something like that is what I used before: ext4 on LVM with LVM
snapshots and external backups. But it does not help against silent
bitrot. That is the scary thing I would like to guard against, and it
is why I am taking a deeper look at btrfs.
http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/
> attempting is rather complicated, and while it ought to work and it
> gets testing, you're really being a test candidate not least of which
> is Btrfs but also lvmcache, but you're also combining both tests. I'd
> just say make sure you have regular backups - snapshot the rw
> subvolume regularly and sync it to another filesystem. As often as the
> workflow can tolerate.
>
Yeah, that's what I thought of. Snapshotting the system and
send/receiving it to a second external disk. Additionally, using
duplicity/rsync with Amazon Glacier.
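A rough sketch of the snapshot plus send/receive round described above.
It only prints the commands instead of running them, so it is safe to
try. The function name and all paths are made up for illustration; the
btrfs subcommands (subvolume snapshot -r, send, send -p, receive) are the
standard ones, but check them against your btrfs-progs version.

```shell
# Print (do not run) the commands for one backup round.
# Arguments: source subvolume, snapshot directory, receive directory,
# new snapshot name, and optionally the previous snapshot name for an
# incremental send. All paths used with it are hypothetical.
plan_snapshot_backup() {
    src=$1; snapdir=$2; dest=$3; name=$4; parent=${5:-}
    # A read-only snapshot is required as the source of "btrfs send".
    echo "btrfs subvolume snapshot -r $src $snapdir/$name"
    if [ -n "$parent" ]; then
        # Incremental: only send the difference to the parent snapshot.
        echo "btrfs send -p $snapdir/$parent $snapdir/$name | btrfs receive $dest"
    else
        # Full send of the new snapshot.
        echo "btrfs send $snapdir/$name | btrfs receive $dest"
    fi
}
```

Called as "plan_snapshot_backup /home /home/.snapshots /mnt/external
2015-12-24" it prints the two commands for a first, full transfer;
passing the name of the previous snapshot as a fifth argument switches
to an incremental send.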
>
> Yeah if it's a decent name brand SSD and not one of the ones with
> known crap firmware, then I think it's fine to just have one. Either
> way, each origin LV gets a separate cache pool LV if I understand
> lvmcache correctly.
>
Is there a list of "SSDs with known crap firmware"? Currently I have a
Toshiba Q300 in use.
> Off hand I don't know if you need separate VGs to make sure the 'cache
> LVs' you format with Btrfs in fact use different PVs as origins.
> That's important. The usual lvcreate command has a way to specify one
> or more PVs to use, rather than have it just grab a pile of extents
> from the VG (which could be from either PV), but I don't know if
> that's the way it works in conjunction with lvmcache.
>
> You're probably best off configuring this, and while doing writes, pull
> a device. Do that three times, once for each HDD and once for the SSD,
> and see if you can recover. If it has to be bullet proof, you need to
> spray it with bullets.
>
Yeah, I should do a test like this. Looks scary, but it is probably the
best.
Thanks for your input!
>
* Re: btrfs und lvm-cache?
2015-12-24 2:04 ` Duncan
@ 2015-12-24 15:24 ` Neuer User
0 siblings, 0 replies; 18+ messages in thread
From: Neuer User @ 2015-12-24 15:24 UTC (permalink / raw)
To: linux-btrfs
On 24.12.2015 at 03:04, Duncan wrote:
I had a look at bcache, but focused on lvmcache mainly because of the
flexibility it offers. It can be easily added and removed. For LVM it is
just another LV, so all the LVM magic applies.
But thanks, I should take another look at bcache.
> I'll let others debate the lvm-cache details, which I don't know much
> about, but I do have a couple points to add, one of which is detail,
> one rather higher level. The higher level one first:
>
> 1) While I've seen both bcache and lvm-cache discussed as potential
> options here, there is at least one user running btrfs on top of bcache
> who posts to bcache-related threads here with some regularity.
> While there were some serious bugs to work thru early on, his
> recent posts suggest current bcache works very well with current
> btrfs, and given that he has posted to several threads with some
> time separation between them, he does appear to be a regular here,
> and I expect he'd be posting pretty fast if things started going
> buggy for him once again.
>
> There hasn't been a corresponding regular poster here using lvm-cache,
> so while it may work well, we don't know that. At minimum, postings
> thus suggest that btrfs on bcache is a better tested solution at
> this point, and thus, would be recommended, while btrfs on lvm-cache,
> while an equally valid technical choice in theory, doesn't have much
> if any real-world data going for it at this point, and is thus
> in practice an unknown.
>
> 2) Not being the person using bcache and not being familiar with it
> or lvm-cache personally, I don't know how either one handles btrfs
> multi-device. However, it occurs to me that if it's necessary,
> in addition to the multiple ssds suggested by the others to cover
> such multi-device caching, you should also be able to partition
> up the ssd, and use each partition as an individual device cache.
> That's almost certainly what I'd do here if I needed to (except
> that above a certain size, ssd prices per GiB start to go up
> dramatically, so if I wanted total ssd cache sizes above that I'd
> of course pay less for multiple smaller ssds again) instead of
> fiddling with multiple physical ssds, but again, not knowing
> how the caching works, I'm not sure if multiple cache devices
> would be needed to cache a multi-device btrfs at the back end,
> or not, so I don't know whether I'd need to bother with such
> partitioning or not.
>
> The key here is that on ssds, seek time is effectively zero anyway, so
> partitioning up the ssd and using both partitions as cache
> doesn't have the latency issues that attempting to do something
> like that (or for example btrfs raid1 on two partitions on the
> same physical device) would have on spinning rust.
>
>
> I thought I'd throw those points out, in case you had failed to
> notice bcache as an option and would prefer it as better tested,
> once you knew about it, and in case the partitioned ssd idea
> does help with the multi-device btrfs caching thing.
>
* Re: btrfs und lvm-cache?
2015-12-24 14:56 ` Piotr Pawłow
@ 2015-12-24 15:29 ` Neuer User
2015-12-24 16:42 ` Piotr Pawłow
0 siblings, 1 reply; 18+ messages in thread
From: Neuer User @ 2015-12-24 15:29 UTC (permalink / raw)
To: linux-btrfs
On 24.12.2015 at 15:56, Piotr Pawłow wrote:
> Hello,
>> - both hdd and ssd in one LVM VG
>> - one LV on each hdd, containing a btrfs filesystem
>> - both btrfs LV configured as RAID1
>> - the single SSD used as an LVM cache device for both HDD LVs to speed up
>> random access, where possible
>
> I have a setup like this for my /home. It works but it's a crappy solution.
>
Indeed? Exactly like this? Great to hear. But sad to hear it is not a
good solution.
> The effective capacity for caching is halved, and it takes twice as much
> time to fully cache your working set, because you get a cache miss at
> least once for each mirror.
>
> There are also some gotchas:
>
> - you should use "device=" mount options, or else there is a danger of
> btrfs mounting origin devices and even mixing cached and origin in one
> FS. I completely broke my FS before realizing what's going on.
Hmm, strange. I thought btrfs should not even know of the lvmcache. It
would just try to mount the HDD LVs and the caching is done
automatically via lvmcache?
> - you should use writethrough mode if you only have one SSD. There was a
> bug in LVM where it wouldn't save the caching mode and revert to
> writeback after restart, so make sure you use the latest version of LVM
> tools.
Do you know which version is good?
> - if your SSD dies, you may have to use vgcfgbackup, manual config edit,
> then vgcfgrestore to remove the cache, because last time I checked, LVM
> tools still were handling writethrough cache the same as writeback,
> disallowing volume activation without the cache and removal of missing
> cache.
>
Sounds complicated, but possible. Pity that LVM does not auto-remove the
cache if it is not there...
* Re: btrfs und lvm-cache?
2015-12-24 15:29 ` Neuer User
@ 2015-12-24 16:42 ` Piotr Pawłow
2015-12-25 17:11 ` Neuer User
0 siblings, 1 reply; 18+ messages in thread
From: Piotr Pawłow @ 2015-12-24 16:42 UTC (permalink / raw)
To: Neuer User, linux-btrfs
On 24.12.2015 at 16:29, Neuer User wrote:
> On 24.12.2015 at 15:56, Piotr Pawłow wrote:
>> Hello,
>>> - both hdd and ssd in one LVM VG
>>> - one LV on each hdd, containing a btrfs filesystem
>>> - both btrfs LV configured as RAID1
>>> - the single SSD used as an LVM cache device for both HDD LVs to speed up
>>> random access, where possible
>> I have a setup like this for my /home. It works but it's a crappy solution.
>>
> Indeed? Exactly like this? Great to hear. But sad to hear it is not a
> good solution.
Exactly. Single SSD caching 2 LVs used for btrfs RAID1. Don't get me
wrong, it's still a lot better than without caching, but not optimal. An
optimal solution would have to be integrated with the FS like in ZFS.
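For readers who want to reproduce a layout like this, here is a minimal
sketch that prints (does not run) one plausible command sequence: one
origin LV pinned to each HDD, one cache pool per origin on the shared
SSD, writethrough mode, then btrfs RAID1 across the two cached LVs. The
function name, LV names and sizes are placeholders, not the commands
actually used for the setup above; the lvcreate/lvconvert options follow
lvmcache(7) as I understand it, so verify them against your LVM version.

```shell
# Print (do not run) a hypothetical command plan for two cached origin LVs.
# Arguments: VG name, HDD PV 1, HDD PV 2, SSD PV. Sizes are placeholders.
plan_cached_raid1() {
    vg=$1; hdd1=$2; hdd2=$3; ssd=$4
    i=1
    for hdd in "$hdd1" "$hdd2"; do
        # Naming the PV explicitly keeps each origin LV on its own HDD.
        echo "lvcreate -L 1T -n home$i $vg $hdd"
        # Each origin needs its own cache pool; both pools live on the SSD.
        echo "lvcreate --type cache-pool -L 50G -n cache$i $vg $ssd"
        # Attach the pool; writethrough keeps the HDD copy authoritative.
        echo "lvconvert --type cache --cachepool $vg/cache$i --cachemode writethrough $vg/home$i"
        i=$((i + 1))
    done
    # btrfs mirrors data and metadata across the two cached LVs.
    echo "mkfs.btrfs -d raid1 -m raid1 /dev/$vg/home1 /dev/$vg/home2"
}
```

For example, "plan_cached_raid1 vg0 /dev/sda1 /dev/sdb1 /dev/sdc1"
prints seven commands. Running "lvs -a -o +devices" afterwards shows
which PV each LV really ended up on.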
>> The effective capacity for caching is halved, and it takes twice as much
>> time to fully cache your working set, because you get a cache miss at
>> least once for each mirror.
>>
>> There are also some gotchas:
>>
>> - you should use "device=" mount options, or else there is a danger of
>> btrfs mounting origin devices and even mixing cached and origin in one
>> FS. I completely broke my FS before realizing what's going on.
> Hmm, strange. I thought btrfs should not even know of the lvmcache. It
> would just try to mount the HDD LVs and the caching is done
> automatically via lvmcache?
Unfortunately, at least on my system, there are device files for origin LVs:
# lvs
LV    VG Attr       LSize Pool     Origin        Data%  Meta% Move Log Cpy%Sync Convert
home1 pp Cwi-aoC--- 1,42t [cache0] [home1_corig] 100,00 11,02          0,00
home2 pp Cwi-aoC--- 1,42t [cache1] [home2_corig] 100,00 11,02          0,00
[...]
# ls -1 /dev/mapper/
[...]
pp-home1
pp-home1_corig
pp-home2
pp-home2_corig
[...]
... which btrfs would detect, pick up at random and assemble into the
RAID set. I had to do this in fstab to force only specified devices:
UUID=[...] /home btrfs noatime,autodefrag,subvol=@home,device=/dev/mapper/pp-home1,device=/dev/mapper/pp-home2
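A small sketch of how one might spot the exposed origin devices and
build the "device=" options without typing them by hand. Both function
names are made up; the "_corig" suffix is taken from the listing above,
and the /dev/mapper/ prefix is assumed to match your system.

```shell
# Print the device-mapper names that look like lvmcache origin LVs.
# Reads one name per line on stdin, e.g. from "ls -1 /dev/mapper/".
list_cache_origins() {
    grep '_corig$' || true
}

# Build the btrfs "device=" mount options for the given cached LV names
# (the /dev/mapper names of the cached LVs, not of the origins).
btrfs_device_opts() {
    opts=""
    for name in "$@"; do
        # Join the entries with commas, without a leading comma.
        opts="${opts:+$opts,}device=/dev/mapper/$name"
    done
    printf '%s\n' "$opts"
}
```

"ls -1 /dev/mapper/ | list_cache_origins" lists the devices btrfs must
not pick up, and "btrfs_device_opts pp-home1 pp-home2" prints the
option string for fstab.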
>> - you should use writethrough mode if you only have one SSD. There was a
>> bug in LVM where it wouldn't save the caching mode and revert to
>> writeback after restart, so make sure you use the latest version of LVM
>> tools.
> Do you know which version is good?
I know it was buggy in Ubuntu Vivid (version 2.02.111 I think), and in
Wily it's OK (currently 2.02.122).
Looking at the changelog, it may have been fixed in 2.02.112 by commit
9d57aa9a0fe00322cb188ad1f3103d57392546e7:
"cache-pool: Fix specification of cachemode when converting to cache-pool
Failure to copy the 'feature_flags' lvconvert_param to the matching
lv_segment field meant that when a user specified the cachemode argument,
the request was not honored."
Of course I may be wrong, I haven't bisected it.
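To check an installed version against that threshold, something like
the following could be used. It is only a sketch: the function names are
made up, it assumes GNU "sort -V" is available, and it assumes that
"lvm version" prints a line of the form "LVM version: 2.02.122(2) ...".
The 2.02.112 threshold itself is the unconfirmed guess from above.

```shell
# Extract the plain version number from "lvm version" output on stdin.
# Assumes a line like "  LVM version:     2.02.122(2) (2015-06-20)".
parse_lvm_version() {
    sed -n 's/^ *LVM version: *\([0-9.]*\).*/\1/p'
}

# Succeed if version $1 is at least version $2 (needs GNU sort -V).
version_ge() {
    # The smaller of the two versions sorts first; $1 >= $2 iff that is $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: version_ge "$(lvm version | parse_lvm_version)" 2.02.112 &&
echo "cachemode fix should be included"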
* Re: btrfs und lvm-cache?
2015-12-24 16:42 ` Piotr Pawłow
@ 2015-12-25 17:11 ` Neuer User
0 siblings, 0 replies; 18+ messages in thread
From: Neuer User @ 2015-12-25 17:11 UTC (permalink / raw)
To: linux-btrfs
Thanks for all the answers from all you guys. They are really very much
appreciated!
Taken together, it seems I am left with the following options:
1) Btrfs/RAID1 with lvmcache: Not well proven, at least partly buggy.
Caches can easily be added to and removed from existing volumes.
2) Btrfs/RAID1 with bcache: Seems to be more stable. However, the HDDs
cannot easily be used without bcache; a complete data conversion is needed.
3) ZFS with ZFS cache device: Well proven and stable, but VERY memory
hungry and not in the mainline kernel.
Well, I guess I should take some time to think about it...
To everybody, enjoy Christmas!