* layering question.
@ 2015-08-04 16:20 A. James Lewis
2015-08-04 17:01 ` Jens-U. Mozdzen
2015-08-05 6:28 ` Kai Krakow
0 siblings, 2 replies; 14+ messages in thread
From: A. James Lewis @ 2015-08-04 16:20 UTC (permalink / raw)
To: linux-bcache
Hi all...
I've heard rumours that layering bcache with other block device drivers
might not be recommended... I wonder what the truth really is... perhaps
someone can advise.
I was planning to use 2 SSD's... combined with 4 large spinning drives
to create a large filesystem with BTRFS... my questions are as follows.
1. Is there a way to use 2 SSD's directly, or would it be OK to use MD
to stripe them... and then use the MD array as the cache device?
2. I would be using BTRFS, so would it be better to create 4 separate
bcache devices each attached to the single cache device, and then use
BTRFS to raid 4 bcache devices... obviously this would be more flexible,
or would I need to make an MD raid of the 4 devices, and then use that
to create a single bcache device and build a BTRFS filesystem on top of
that.
Hope that's clear, any clarification would be appreciated...
Also, there's talk about a pending on-disk cache format change some time
around 3.19, but no details... is this over with, or still pending?
James
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: layering question.
2015-08-04 16:20 layering question A. James Lewis
@ 2015-08-04 17:01 ` Jens-U. Mozdzen
2015-08-04 17:16 ` A. James Lewis
2015-08-05 6:28 ` Kai Krakow
1 sibling, 1 reply; 14+ messages in thread
From: Jens-U. Mozdzen @ 2015-08-04 17:01 UTC (permalink / raw)
To: A. James Lewis; +Cc: linux-bcache
Hi James,
Zitat von "A. James Lewis" <james@fsck.co.uk>:
> Hi all...
>
> I've heard rumours that layering bcache with other block device
> drivers might not be recommended... I wonder what the truth really
> is... perhaps someone can advise.
to me it's more than rumors. We're facing severe difficulties (server
reboots, disks marked faulty by MDRAID, hangs) in our layered setup:
- physical disks MD-RAID6 (data) plus two SSDs MD-RAID1 (cache)
- bcache
- LVM
- DRBD for many of the logical volumes (always primary, no fail-overs)
- ext4 fs
- NFS / Samba / SCST (fileio)
> I was planning to use 2 SSD's... combined with 4 large spinning
> drives to create a large filesystem with BTRFS... my questions are
> as follows.
>
> 1. Is there a way to use 2 SSD's directly, or would it be OK to use
> MD to stripe them... and then use the MD array as the cache device?
MD-RAID1 is what our current configuration looks like. We've also
combined the spinning disks into a RAID6.
> 2. I would be using BTRFS, so would it be better to create 4
> separate bcache devices each attached to the single cache device,
> and then use BTRFS to raid 4 bcache devices... obviously this would
> be more flexible, or would I need to make an MD raid of the 4
> devices, and then use that to create a single bcache device and
> build a BTRFS filesystem on top of that.
I have no btrfs experience, so I cannot answer that one. I went for a
single data and cache device (via RAID) so I won't have to partition
my SSDs - that would not have been scalable (we're planning to add
plenty of physical disks over time, and to use many LVs/file systems).
> Also, there's talk about a pending on-disk cache format change some
> time around 3.19, but no details... is this over with, or still
> pending?
Someone else might want to help with that one as well?
Regards,
Jens
* Re: layering question.
2015-08-04 17:01 ` Jens-U. Mozdzen
@ 2015-08-04 17:16 ` A. James Lewis
2015-08-05 6:56 ` Jens-U. Mozdzen
0 siblings, 1 reply; 14+ messages in thread
From: A. James Lewis @ 2015-08-04 17:16 UTC (permalink / raw)
To: linux-bcache
Thanks for the details... to clarify, you are using raid1 for SSD cache
devices, and then creating a RAID6 MD device to act as backing store?
What kernel are you using? You are having some stability issues, but in
principle it works?... what is performance like?
James
On 04/08/15 18:01, Jens-U. Mozdzen wrote:
> Hi James,
>
> Zitat von "A. James Lewis" <james@fsck.co.uk>:
>> Hi all...
>>
>> I've heard rumours that layering bcache with other block device
>> drivers might not be recommended... I wonder what the truth really
>> is... perhaps someone can advise.
>
> to me it's more than rumors. We're facing severe difficulties (server
> reboots, disks marked faulty by MDRAID, hangs) in our layered setup:
>
> - physical disks MD-RAID6 (data) plus two SSDs MD-RAID1 (cache)
> - bcache
> - LVM
> - DRBD for many of the logical volumes (always primary, no fail-overs)
> - ext4 fs
> - NFS / Samba / SCST (fileio)
>
>> I was planning to use 2 SSD's... combined with 4 large spinning
>> drives to create a large filesystem with BTRFS... my questions are
>> as follows.
>>
>> 1. Is there a way to use 2 SSD's directly, or would it be OK to use
>> MD to stripe them... and then use the MD array as the cache device?
>
> MD-RAID1 is what our current configuration looks like. We've also
> combined the spinning disks into a RAID6.
>
>> 2. I would be using BTRFS, so would it be better to create 4 separate
>> bcache devices each attached to the single cache device, and then use
>> BTRFS to raid 4 bcache devices... obviously this would be more
>> flexible, or would I need to make an MD raid of the 4 devices, and
>> then use that to create a single bcache device and build a BTRFS
>> filesystem on top of that.
>
> I have no btrfs experience, so I cannot answer that one. I went for a
> single data and cache device (via RAID) so I won't have to partition
> my SSDs - that would not have been scalable (we're planning to add
> plenty of physical disks over time, and to use many LVs/file systems).
>
>> Also, there's talk about a pending on-disk cache format change some
>> time around 3.19, but no details... is this over with, or still pending?
>
> Someone else might want to help with that one as well?
>
> Regards,
> Jens
>
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: layering question.
2015-08-04 16:20 layering question A. James Lewis
2015-08-04 17:01 ` Jens-U. Mozdzen
@ 2015-08-05 6:28 ` Kai Krakow
2015-08-05 7:04 ` Jens-U. Mozdzen
1 sibling, 1 reply; 14+ messages in thread
From: Kai Krakow @ 2015-08-05 6:28 UTC (permalink / raw)
To: linux-bcache
A. James Lewis <james@fsck.co.uk> schrieb:
> I've heard rumours that layering bcache with other block device drivers
> might not be recommended... I wonder what the truth really is... perhaps
> someone can advise.
I think this is not just rumours. Multiple people have reported
problems when layering caching or backing devices on top of MD devices.
This may be an implementation problem in MD which is gone in later
kernel versions and had to do with correctly passing discards through
the layers. If you want to use that, I'd at least recommend disabling
discards, and disabling write-back caching (just use write-through or
write-around, which is obviously slower but might fit your workload).
> I was planning to use 2 SSD's... combined with 4 large spinning drives
> to create a large filesystem with BTRFS... my questions are as follows.
Using one SSD with 3 spinning drives here.
> 1. Is there a way to use 2 SSD's directly, or would it be OK to use MD
> to stripe them... and then use the MD array as the cache device?
I think currently you can only add one caching device to a cache set.
I think it is planned to allow more in a later development stage, but
currently your only way to go would be an MD array if you want to use
MD. I'd rather suggest using hardware RAID for that.
> 2. I would be using BTRFS, so would it be better to create 4 separate
> bcache devices each attached to the single cache device, and then use
> BTRFS to raid 4 bcache devices... obviously this would be more flexible,
> or would I need to make an MD raid of the 4 devices, and then use that
> to create a single bcache device and build a BTRFS filesystem on top of
> that.
You can just attach multiple backing devices (each sub-device of your
btrfs pool) to the same cache set - so your caching device would cache
all backing devices.
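An untested sketch of that layout, with four spinning disks sharing a
single SSD cache set (all device names are examples):

```shell
# Create one cache set on the SSD and format four backing devices:
make-bcache -C /dev/sde                             # SSD becomes the cache set
make-bcache -B /dev/sda /dev/sdb /dev/sdc /dev/sdd  # four backing devices

# Attach every backing device to the cache set by its cset UUID:
UUID=$(bcache-super-show /dev/sde | awk '/cset.uuid/ {print $2}')
for disk in sda sdb sdc sdd; do
    echo "$UUID" > /sys/block/$disk/bcache/attach
done
```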
> Hope that's clear, any clarification would be appreciated...
I'd go with the following setup:
I'm not sure which btrfs RAID level you are going to use. Maybe RAID 10,
probably RAID 0. This means, btrfs tries to evenly spread writes and reads
across all devices.
I suggest using 2 cache sets: one bcache for btrfs pool members 1 and
2, one bcache for btrfs pool members 3 and 4. If you add more members
to the pool later, just attach them in alternating order to the first
or second cache set.
This should give you most value out of your current setup.
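A hedged sketch of this two-cache-set layout (example device names;
when make-bcache is given both -C and -B, the devices come up already
attached):

```shell
make-bcache -C /dev/sde -B /dev/sda /dev/sdb   # cache set 1: pool members 1+2
make-bcache -C /dev/sdf -B /dev/sdc /dev/sdd   # cache set 2: pool members 3+4

# Build the btrfs pool on the resulting bcache devices:
mkfs.btrfs -d raid0 -m raid1 /dev/bcache0 /dev/bcache1 /dev/bcache2 /dev/bcache3
```

A later pool member 5 would be formatted with make-bcache -B and
attached to cache set 1 via its cset.uuid, member 6 to cache set 2,
and so on.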
If you plan to use 2 SSDs for improved robustness (combined into RAID
1), you obviously don't have this option. You could try it with MDRAID,
though I wouldn't recommend this without doing your own tests and
backups. Better to use hardware RAID then, though most controllers will
disable the ability to use discard, so you probably want to leave some
spare space unused on your SSDs for improved lifetime and long-term
performance.
> Also, there's talk about a pending on-disk cache format change some time
> around 3.19, but no details... is this over with, or still pending?
No idea - I'm on 4.1 now and have used bcache since 3.18 or 3.19, not
sure which. It has worked well.
PS: Disable "autodefrag" if you use btrfs+bcache... ;-) It helps
performance a bit, but it eats SSD lifetime (it used 20% of my
projected SSD lifetime in only 2 months, according to smartctl). Better
to just defragment metadata from time to time (read: use btrfs defrag
on directories only) if you want to defrag.
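In practice that could look like this (untested; /mnt/pool is a
placeholder mount point):

```shell
# Drop autodefrag at runtime (or remove it from the fstab options):
mount -o remount,noautodefrag /mnt/pool

# Without -r, "btrfs filesystem defragment" on a directory defragments
# only the directory's metadata, not the files below it:
find /mnt/pool -type d -print0 | xargs -0 btrfs filesystem defragment
```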
--
Replies to list only preferred.
* Re: layering question.
2015-08-04 17:16 ` A. James Lewis
@ 2015-08-05 6:56 ` Jens-U. Mozdzen
0 siblings, 0 replies; 14+ messages in thread
From: Jens-U. Mozdzen @ 2015-08-05 6:56 UTC (permalink / raw)
To: A. James Lewis; +Cc: linux-bcache
Hi James,
Zitat von "A. James Lewis" <james@fsck.co.uk>:
> Thanks for the details... to clarify, you are using raid1 for SSD
> cache devices, and then creating a RAID6 MD device to act as backing
> store?
yes - my two SSDs are RAID1 (I named it /dev/md/linux:san02-cache),
seven 1TB disks are RAID6 (/dev/md/linux:san02-data), and these two
were prepared using "make-bcache -C /dev/md/linux\:san02-cache -B
/dev/md/linux\:san02-data" to create /dev/bcache0.
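Reconstructed as a sketch, the whole stack creation would look roughly
like this (the member disk names are invented for illustration):

```shell
# Two SSDs mirrored as the cache, seven spinning disks as RAID6 data:
mdadm --create /dev/md/linux:san02-cache --level=1 --raid-devices=2 \
      /dev/sdh1 /dev/sdi1
mdadm --create /dev/md/linux:san02-data --level=6 --raid-devices=7 \
      /dev/sd[a-g]1

# One cache set, one backing device, yielding /dev/bcache0:
make-bcache -C /dev/md/linux\:san02-cache -B /dev/md/linux\:san02-data
```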
> What kernel are you using? You are having some stability issues, but
> in principle it works?... what is performance like?
Originally, I used the latest OpenSUSE 13.1 stable kernel
(kernel-default-3.11.10-29.1), but seeing random reboots that seemed
to match bugs fixed in later DRBD versions, the servers were updated
to kernel-default-3.18.8-5.1.
Basically, the system works like a charm; we were enthusiastic about
the first results. I don't have absolute numbers in terms of
throughput, but since switching to bcache, our I/O waits dropped from
"5 to 45%" to "0 to 5%". The servers are mainly used to provide
virtual disks for a number of virtual machines, which are running on a
separate server farm connected via Fiber Channel. Additionally, the
servers provide NFS access to various file systems (among them the
home directories of the local users and the working area for a
distributed development environment). Add in a small amount of SMB
traffic for a few MS-Win machines and you have the overall picture...
mostly small-sized accesses, with plenty of reads and writes from
various sources. Even with lots of memory caching, that mix did bring
our servers to a user-noticeable I/O load, which basically vanished
when introducing bcache. Only large consecutive writes (e.g. ISOs) go
to the disks directly and hence lead to measurable I/O waits... but
that's rare and only turns up in monitoring, rather than being "felt
by users".
As I detailed in the other recent thread, when switching to 3.18.8 we
suddenly were unable to create new file systems on one of the servers;
mkfs reproducibly led to a server reboot. Turning bcache to
"writethrough" solved this, but made MD report disks in our backing
device as failing, always in the context of what seemed to be hanging
disk accesses, matching the "bcache locking problem" pattern. (I did
apply the set of known patches from this mailing list.)
Since last Saturday, we're back to "writeback" and no more disks
failed - but I haven't tried creating new file systems since, I'll
have to wait for a maintenance window ;)
Just for completeness: we use /dev/bcache0 as the only "physical
volume" in an LVM volume group, and create various logical volumes on
top (both for local system use, and many that are used per NFS
resource and per "Fiber Channel"-connected VM). The non-system LVs are
mirrored to a separate machine via DRBD (which then will periodically
break the link and back up the "snapshots" to external media), and the
actual file systems (Ext4) are created on top of these DRBD resources.
Regards,
Jens
* Re: layering question.
2015-08-05 6:28 ` Kai Krakow
@ 2015-08-05 7:04 ` Jens-U. Mozdzen
2015-08-05 23:10 ` Kai Krakow
0 siblings, 1 reply; 14+ messages in thread
From: Jens-U. Mozdzen @ 2015-08-05 7:04 UTC (permalink / raw)
To: Kai Krakow; +Cc: linux-bcache
Hi Kai,
Zitat von Kai Krakow <hurikhan77@gmail.com>:
> A. James Lewis <james@fsck.co.uk> schrieb:
>
>> I've heard rumours that layering bcache with other block device drivers
>> might not be recommended... I wonder what the truth really is... perhaps
>> someone can advise.
>
> I think this is not just rumours. Multiple people reported problems when
> layering caching or backing devices on top of MD devices. This may be an
> implementation problem in MD which is gone in later kernel versions [...]
being rather new to bcache, I only browsed the last few months of
mailing list history - are you saying that these problems were fixed
(or simply vanished) at some point after 3.18.8? Because if so, I'd of
course try to upgrade our servers to a more recent kernel :)
Regards,
Jens
* Re: layering question.
2015-08-05 7:04 ` Jens-U. Mozdzen
@ 2015-08-05 23:10 ` Kai Krakow
2015-08-06 0:54 ` A. James Lewis
0 siblings, 1 reply; 14+ messages in thread
From: Kai Krakow @ 2015-08-05 23:10 UTC (permalink / raw)
To: linux-bcache
Jens-U. Mozdzen <jmozdzen@nde.ag> schrieb:
> Zitat von Kai Krakow <hurikhan77@gmail.com>:
>> A. James Lewis <james@fsck.co.uk> schrieb:
>>
>>> I've heard rumours that layering bcache with other block device drivers
>>> might not be recommended... I wonder what the truth really is... perhaps
>>> someone can advise.
>>
>> I think this is not just rumours. Multiple people reported problems when
>> layering caching or backing devices on top of MD devices. This may be an
>> implementation problem in MD which is gone in later kernel versions [...]
>
> being rather new to bcache, I did only browse the last few months of
> mailing list history - are you saying that these problems were fixed
> (or simply vanished) some point after 3.18.8? Because if so, I'd of
> course try to upgrade our servers to a more recent kernel :)
The latest posts imply it is still a problem, and it fits with earlier
reports: caching on a native device, backing on an MD device... bcache
breaks within the caching device (although this one is not on MD).
There still seem to be bugs in how bcache and MD interact.
It was suspected that bcache uses a faulty discard implementation;
some reports miss details about this setting. However, my setups are
working fine with discards fully enabled on SSD - but without using
MD. And it has been robust to accidental or unintended reboots for as
long as I've been using it (even with btrfs as the filesystem on
bcache).
So I'd probably remove MD from your plans for using bcache.
BTW: My system uses vanilla gentoo kernel, 4.1.4 currently.
--
Replies to list only preferred.
* Re: layering question.
2015-08-05 23:10 ` Kai Krakow
@ 2015-08-06 0:54 ` A. James Lewis
2015-08-06 23:12 ` Kai Krakow
0 siblings, 1 reply; 14+ messages in thread
From: A. James Lewis @ 2015-08-06 0:54 UTC (permalink / raw)
To: linux-bcache
The problem is, though... with a very large backing store, I'm not
really happy with a single point of failure in the cache... is there
another way to mirror the cache device?
Does anyone make an M.2 RAID controller... being PCIe, I don't know if
that could even be possible.
James
On 06/08/15 00:10, Kai Krakow wrote:
> Jens-U. Mozdzen <jmozdzen@nde.ag> schrieb:
>
>> Zitat von Kai Krakow <hurikhan77@gmail.com>:
>>> A. James Lewis <james@fsck.co.uk> schrieb:
>>>
>>>> I've heard rumours that layering bcache with other block device drivers
>>>> might not be recommended... I wonder what the truth really is... perhaps
>>>> someone can advise.
>>> I think this is not just rumours. Multiple people reported problems when
>>> layering caching or backing devices on top of MD devices. This may be an
>>> implementation problem in MD which is gone in later kernel versions [...]
>> being rather new to bcache, I did only browse the last few months of
>> mailing list history - are you saying that these problems were fixed
>> (or simply vanished) some point after 3.18.8? Because if so, I'd of
>> course try to upgrade our servers to a more recent kernel :)
> Latest posts imply it is still a problem. It fits with earlier reports:
> Caching on native device, backing on md device... Bcache breaks within the
> caching device (although this is not on md). There seem to still be bugs
> with bcache and md to properly interact.
>
> It was suspected that bcache uses a faulty discard implementation. Some
> reports miss details about this setting. However, my setups are working fine
> with discards fully enabled on SSD - but without using MD. And it has been
> robust to accidental or implied reboots since all time I'm using it (even
> with btrfs as the filesystem on bcache).
>
> So I'd probably remove MD from your plans on using bcache.
>
> BTW: My system uses vanilla gentoo kernel, 4.1.4 currently.
>
* Re: layering question.
2015-08-06 0:54 ` A. James Lewis
@ 2015-08-06 23:12 ` Kai Krakow
2015-08-07 12:43 ` Jens-U. Mozdzen
0 siblings, 1 reply; 14+ messages in thread
From: Kai Krakow @ 2015-08-06 23:12 UTC (permalink / raw)
To: linux-bcache
Hi!
A. James Lewis <james@fsck.co.uk> schrieb:
> The problem is tho... with a very large backing store, I'm not really
> happy with a single point of failure in the cache... is there another
> way to mirror the cache device?
Well, AFAIR there are plans to add such capabilities to bcache itself -
read: make it possible to add more than one caching device to a cache
set. It will use some sort of hybrid mirroring/striping to get the best
combination of speed and safety - at least that's what the idea is
about. I just don't remember where I read about it, nor do I know the
status of it.
If you want to eliminate the single point of failure, you may want to
try mdadm with its write-mostly option instead of using bcache. It's
obviously slower for writes, but it gracefully falls back if the SSD
fails. Obviously, you also can't benefit from the huge storage,
because it's classic RAID-1 and thus the smallest member will limit
your storage size.
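A minimal sketch of that mdadm alternative (example device names,
untested):

```shell
# RAID1 of one SSD and one HDD; --write-mostly marks the HDD so reads
# are served from the SSD whenever possible:
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/sde --write-mostly /dev/sda

# Optionally let writes to the HDD lag behind (requires a bitmap):
#   mdadm --create ... --bitmap=internal --write-behind=256 --write-mostly /dev/sda
```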
Bcache also has countermeasures for a failing caching device, but I
haven't really looked into that yet. You should read the documentation
about it in Documentation/bcache.txt (Error Handling). The safest mode
to use here is writethrough.
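The error-handling policy referred to there is a per-cache-set sysfs
knob (hedged sketch; the UUID is a placeholder):

```shell
# Show the current policy; per Documentation/bcache.txt the choices
# are "unregister" and "panic":
cat /sys/fs/bcache/<cset-uuid>/errors

# On cache errors, detach the cache and keep running off the backing device:
echo unregister > /sys/fs/bcache/<cset-uuid>/errors
```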
> On 06/08/15 00:10, Kai Krakow wrote:
>> Jens-U. Mozdzen <jmozdzen@nde.ag> schrieb:
>>
>>> Zitat von Kai Krakow <hurikhan77@gmail.com>:
>>>> A. James Lewis <james@fsck.co.uk> schrieb:
>>>>
>>>>> I've heard rumours that layering bcache with other block device
>>>>> drivers might not be recommended... I wonder what the truth really
>>>>> is... perhaps someone can advise.
>>>> I think this is not just rumours. Multiple people reported problems
>>>> when layering caching or backing devices on top of MD devices. This may
>>>> be an implementation problem in MD which is gone in later kernel
>>>> versions [...]
>>> being rather new to bcache, I did only browse the last few months of
>>> mailing list history - are you saying that these problems were fixed
>>> (or simply vanished) some point after 3.18.8? Because if so, I'd of
>>> course try to upgrade our servers to a more recent kernel :)
>> Latest posts imply it is still a problem. It fits with earlier reports:
>> Caching on native device, backing on md device... Bcache breaks within
>> the caching device (although this is not on md). There seem to still be
>> bugs with bcache and md to properly interact.
>>
>> It was suspected that bcache uses a faulty discard implementation. Some
>> reports miss details about this setting. However, my setups are working
>> fine with discards fully enabled on SSD - but without using MD. And it
>> has been robust to accidental or implied reboots since all time I'm using
>> it (even with btrfs as the filesystem on bcache).
>>
>> So I'd probably remove MD from your plans on using bcache.
>>
>> BTW: My system uses vanilla gentoo kernel, 4.1.4 currently.
>>
--
Replies to list only preferred.
* Re: layering question.
2015-08-06 23:12 ` Kai Krakow
@ 2015-08-07 12:43 ` Jens-U. Mozdzen
2015-08-07 14:38 ` A. James Lewis
0 siblings, 1 reply; 14+ messages in thread
From: Jens-U. Mozdzen @ 2015-08-07 12:43 UTC (permalink / raw)
To: linux-bcache
Hi *,
Zitat von Kai Krakow <hurikhan77@gmail.com>:
> Hi!
>
> A. James Lewis <james@fsck.co.uk> schrieb:
>
>> The problem is tho... with a very large backing store, I'm not really
>> happy with a single point of failure in the cache... is there another
>> way to mirror the cache device?
>
> Well, AFAIR there are plans to add such capabilities into bcache itself -
> read: make it possible to add more than one caching device to a cache set.
> It will use some sort of hybrid mirror / striping to get the best
> combination of speed and safety - at least that's what the idea is about. I
> just don't remember where I've read about it, neither do I know the status
> of it.
>
> If you want to eliminate the single point of failure, you may want to try
> mdadm with its write-mostly option instead of using bcache. It's slower for
> writes obviously but gracefully falls back if the SSD fails. Obviously, you
> can also not benefit from having a huge storage because it's classic RAID-1
> and thus the smallest member will limit your storage size.
>
> Bcache also has countermeasures for a failing caching device but I didn't
> really look into that yet. You should read the documentation about it in
> Documentation/bcache.txt (Error Handling). The safest mode to use here is
> writethrough.
A word of caution here: at least in my layered (kernel 3.18.8)
situation, the upper layers from time to time run into some sort of
time-out situation when writing to the (bcached) disk. The writes abort
(bad, but tolerable in my circumstances), but on top of that this makes
MD mark the current disk faulty, degrading your RAID.
When using "writeback", the likelihood of this happening is relatively
small (not more than once every few days), probably because the writes
to the SSD are fairly quick. These hits have then always been on the
caching device (MD-RAID1 in my case).
When using "writethrough", the likelihood was much higher (I saw 2
hits within 6 hours, no later than 28 hours after switching to
"writethrough"), and the hit was on the data device (MD-RAID6 in my
case).
Had I only set up RAID5, my data array would have dropped dead then.
After switching back to "writeback", I've had *one* further incident,
again on the caching device, within 6 days.
I would definitely not call "writethrough" "the safest mode" when
using MD-RAID for the bcache devices, on kernel 3.18.8.
Regards,
Jens
* Re: layering question.
2015-08-07 12:43 ` Jens-U. Mozdzen
@ 2015-08-07 14:38 ` A. James Lewis
2015-08-07 15:36 ` Jens-U. Mozdzen
0 siblings, 1 reply; 14+ messages in thread
From: A. James Lewis @ 2015-08-07 14:38 UTC (permalink / raw)
To: linux-bcache, Jens-U. Mozdzen
That's interesting, are you putting your MD on top of multiple bcache
devices... rather than bcache on top of an MD device... I wonder what
the rationale behind this is?
Also, can anyone give me a summary of how bcache compares with dm-cache?
James
On 07/08/15 13:43, Jens-U. Mozdzen wrote:
> Hi *,
>
> Zitat von Kai Krakow <hurikhan77@gmail.com>:
>> Hi!
>>
>> A. James Lewis <james@fsck.co.uk> schrieb:
>>
>>> The problem is tho... with a very large backing store, I'm not really
>>> happy with a single point of failure in the cache... is there another
>>> way to mirror the cache device?
>>
>> Well, AFAIR there are plans to add such capabilities into bcache
>> itself -
>> read: make it possible to add more than one caching device to a cache
>> set.
>> It will use some sort of hybrid mirror / striping to get the best
>> combination of speed and safety - at least that's what the idea is
>> about. I
>> just don't remember where I've read about it, neither do I know the
>> status
>> of it.
>>
>> If you want to eliminate the single point of failure, you may want to
>> try
>> mdadm with its write-mostly option instead of using bcache. It's
>> slower for
>> writes obviously but gracefully falls back if the SSD fails.
>> Obviously, you
>> can also not benefit from having a huge storage because it's classic
>> RAID-1
>> and thus the smallest member will limit your storage size.
>>
>> Bcache also has countermeasures for a failing caching device but I
>> didn't
>> really look into that yet. You should read the documentation about it in
>> Documentation/bcache.txt (Error Handling). The safest mode to use
>> here is
>> writethrough.
>
> A word of caution here: at least in my layered (kernel 3.18.8)
> situation, the upper layers from time to time run into some sort of
> time-out situation when writing to the (bcached) disk. The writes
> abort (bad, but tolerable in my circumstances), but on top of that
> this makes MD mark the current disk faulty, degrading your RAID.
>
> When using "writeback", the likeliness for this to happen is
> relatively small (not more than once every few days), probably because
> the writes to SSD are fairly quick. These hit then have always been on
> the caching device (MD-RAID1 in my case).
>
> When using "writethrough", the likeliness was extremely higher (I've
> seen 2 hits within 6 hours, not later than 28 hours after switching to
> "writethrough") and the hit was on the data device (MD-RAID6 in my case).
>
> Had I only set up RAID5, my data array would have dropped dead then.
>
> After switching back to "writeback", I've had *one* further incident,
> again on the caching device, within 6 days.
>
> I would definitely not call "writethrough" "the safest mode" when
> using MD-RAID for the bcache devices, on kernel 3.18.8.
>
> Regards,
> Jens
>
* Re: layering question.
2015-08-07 14:38 ` A. James Lewis
@ 2015-08-07 15:36 ` Jens-U. Mozdzen
2015-08-07 16:16 ` A. James Lewis
0 siblings, 1 reply; 14+ messages in thread
From: Jens-U. Mozdzen @ 2015-08-07 15:36 UTC (permalink / raw)
To: A. James Lewis; +Cc: linux-bcache
Hi James,
Zitat von "A. James Lewis" <james@fsck.co.uk>:
> That's interesting, are you putting your MD on top of multiple
> bcache devices... rather than bcache on top of an MD device... I
> wonder what the rationale behind this is?
Hi James, no such thing here...
bcache is running on top of two MD-RAIDs - RAID6 with 7 spinning
drives and RAID1 with two SSDs.
The stack is, from bottom to top:
- MD-RAID6 data, MD-RAID1 cache
- bcache (/dev/bcache0, used as an LVM PV)
- LVM
- many LVs
- DRBD on top of most of the LVs
- Ext4 on each of the DRBD devices
- SCST / NFS / SMB sharing these file systems
In the referenced incidents, SCST reports that (many) writes failed
due to time-out, and MD reports a single disk faulty. No other traces
in syslog, especially no stalled processes, locking problems or kernel
bugs.
The i/o pattern is highly parallel reads and writes, mostly via SCST.
Regards,
Jens
* Re: layering question.
2015-08-07 15:36 ` Jens-U. Mozdzen
@ 2015-08-07 16:16 ` A. James Lewis
0 siblings, 0 replies; 14+ messages in thread
From: A. James Lewis @ 2015-08-07 16:16 UTC (permalink / raw)
To: linux-bcache
OK, but in that case bcache is not between your MD RAID and its disks,
so if your disks are dropping out of the MD array, that has to be
either an independent problem or a very complex bug.
James
On 07/08/15 16:36, Jens-U. Mozdzen wrote:
> Hi James,
>
> Zitat von "A. James Lewis" <james@fsck.co.uk>:
>> That's interesting, are you putting your MD on top of multiple bcache
>> devices... rather than bcache on top of an MD device... I wonder what
>> the rationale behind this is?
>
> Hi James, no such thing here...
>
> bcache is running on top of two MD-RAIDs - RAID6 with 7 spinning
> drives and RAID1 with two SSDs.
>
> The stack is, from bottom to top:
>
> - MD-RAID6 data, MD-RAID1 cache
> - bcache (/dev/bcache0, used as an LVM PV)
> - LVM
> - many LVs
> - DRBD on top of most of the LVs
> - Ext4 on each of the DRBD devices
> - SCST / NFS / SMB sharing these file systems
>
> In the referenced incidents, SCST reports that (many) writes failed
> due to time-out, and MD reports a single disk faulty. No other traces
> in syslog, especially no stalled processes, locking problems or kernel
> bugs.
>
> The i/o pattern is highly parallel reads and writes, mostly via SCST.
>
> Regards,
> Jens
>
* Re: layering question.
@ 2015-08-07 16:24 Jens-U. Mozdzen
0 siblings, 0 replies; 14+ messages in thread
From: Jens-U. Mozdzen @ 2015-08-07 16:24 UTC (permalink / raw)
To: linux-bcache
Hi James,
Zitat von "A. James Lewis" <james@fsck.co.uk>:
> OK, but in that case bcache is not between your MD RAID and its
> disks, so if your disks are dropping out of the MD array, that has
> to be either an independent problem, or a very complex bug.
My guess is that it's a rather simple timeout/locking problem, which
leads to an expiring timer in the MD code. And bcache has a well-known
history of locking problems, according to the mailing list.
Regards,
Jens
> James
>
>
> On 07/08/15 16:36, Jens-U. Mozdzen wrote:
>> Hi James,
>>
>> Zitat von "A. James Lewis" <james@fsck.co.uk>:
>>> That's interesting, are you putting your MD on top of multiple
>>> bcache devices... rather than bcache on top of an MD device... I
>>> wonder what the rationale behind this is?
>>
>> Hi James, no such thing here...
>>
>> bcache is running on top of two MD-RAIDs - RAID6 with 7 spinning
>> drives and RAID1 with two SSDs.
>>
>> The stack is, from bottom to top:
>>
>> - MD-RAID6 data, MD-RAID1 cache
>> - bcache (/dev/bcache0, used as an LVM PV)
>> - LVM
>> - many LVs
>> - DRBD on top of most of the LVs
>> - Ext4 on each of the DRBD devices
>> - SCST / NFS / SMB sharing these file systems
>>
>> In the referenced incidents, SCST reports that (many) writes failed
>> due to time-out, and MD reports a single disk faulty. No other
>> traces in syslog, especially no stalled processes, locking problems
>> or kernel bugs.
>>
>> The i/o pattern is highly parallel reads and writes, mostly via SCST.
>>
>> Regards,
>> Jens