public inbox for linux-bcache@vger.kernel.org
* 3.10LTS ok for production?
@ 2013-11-09  3:01 Paul B. Henson
       [not found] ` <20131109030128.GJ5474-eJ6RpuielZ6oHZ9hTG1MgCsmlnnoMqry@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Paul B. Henson @ 2013-11-09  3:01 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA

I'd kinda like to use the 3.10 LTS kernel for a virtualization server
I'm building, but it seems like every time somebody reports a problem
the recommendation is to make sure you're using the latest bleeding edge
kernel. Is it intended for bcache to be considered production ready in
the 3.10 LTS branch, or do you pretty much have to run the latest stable
of the week for now if you want to be sure to get all the bcache bugfixes
necessary for a stable system? Specifically, I'd like to use a raid1 of 2
256G SSDs to be a write-back cache for a raid10 of 4 2TB HDs. Occasional
reboots aren't an issue for kernel updates, but I'd prefer to avoid the
potential instability and config churn of tracking the mainline kernel.
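
For concreteness, the layering described above could be assembled along
these lines (device names are hypothetical and the commands are an
untested sketch, not a recipe):

```shell
# Hypothetical device names -- substitute your own.
# Backing store: RAID10 across the four 2TB HDs.
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]
# Cache: RAID1 across the two 256G SSDs.
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[ef]

# Format both for bcache (bcache-tools).
make-bcache -B /dev/md0    # -B: backing device
make-bcache -C /dev/md1    # -C: cache device

# Attach the backing device to the cache set, then enable writeback.
bcache-super-show /dev/md1 | grep cset.uuid
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode
```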

Thanks...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 3.10LTS ok for production?
       [not found] ` <20131109030128.GJ5474-eJ6RpuielZ6oHZ9hTG1MgCsmlnnoMqry@public.gmane.org>
@ 2013-11-09  5:29   ` Matthew Patton
       [not found]     ` <op.w59n7e06f3gqgg-r49W/1Cwd2cba4AQcYcrVKxOck334EZe@public.gmane.org>
  2013-11-09  6:47   ` Kent Overstreet
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Patton @ 2013-11-09  5:29 UTC (permalink / raw)
  To: linux-bcache-u79uwXL29TY76Z2rM5mHXA, Paul B. Henson

The following is opinion, MY opinion.

On Fri, 08 Nov 2013 22:01:28 -0500, Paul B. Henson <henson-HInyCGIudOg@public.gmane.org> wrote:

> kernel. Is it intended for bcache to be considered production ready in
> the 3.10 LTS branch, or do you pretty much have to run the latest stable
> of the week for now if you want to be sure to get all the bcache bugfixes
> necessary for a stable system?

I think that's hard to say. The .10 code wasn't re-worked like the .11  
branch and it may well have fewer issues than the .11 series. It's also  
not clear that EVERY bug uncovered in the .11 branch (that wasn't narrowly  
specific to .11) has been properly back-ported.

> Specifically, I'd like to use a raid1 of 2
> 256G SSDs to be a write-back cache for a raid10 of 4 2TB HDs. Occasional
> reboots aren't an issue for kernel updates, but I'd prefer to avoid the
> potential instability and config churn of tracking the mainline kernel.

Storage is the LAST place to cut corners. Unless of course your data isn't  
important, can be thrown away, or recreated without a lot of time and  
sweat. Don't get me wrong, I like what BCache is trying to do and I sent  
Kent $100 of my own money to support his efforts back when continued  
development seemed to be in jeopardy.

Personally I think it needs another 3 months to bake, even in the 3.11.6  
guise.

As to your specific example, are WRITE IOPs of critical importance? If  
not, just use WRITE-THRU and have the SSDs be a READ cache for hot data.  
There is little to no risk to your data in that configuration.  
Despite all the hand-waving by sysadmins, a READ cache is far more useful  
as a practical matter than a WRITE cache. If you have a heavy WRITE load,  
then there is no good solution that doesn't cost money.
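
If bcache is the tool, the write-through setup described above is a
one-line sysfs change (a sketch, assuming the device registered as
bcache0):

```shell
# Write-through: reads are cached on the SSDs, but every write is
# acknowledged only after it reaches the backing device, so a cache
# failure cannot lose acknowledged writes.
echo writethrough > /sys/block/bcache0/bcache/cache_mode
# The active mode is shown in [brackets]:
cat /sys/block/bcache0/bcache/cache_mode
```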

If your 4 disks can't support the desired IOPs, then bite the bullet and  
get faster disks, more disks, or more cache on the RAID controller, or try  
the alternative software solutions, both of which are free: IOEnhance from  
STEC or the in-kernel MD-hotspot. I have no useful degree of experience  
with either, however.

Failing that, shell out the money for a ZFS-friendly setup and abstract  
the storage away from your virtual machines. Indeed that's a much better  
design anyway.

I personally run LSI controllers with CacheCade (sadly limited to 500GB of  
SSD cache) or you can spring for an equivalent feature set from Adaptec -7  
series (unlimited SSD cache) for under $800.

My other fancy controller is an Areca with 4GB of battery-backed RAM.

My storage nodes also have battery-backed 512MB NVRAM boards (dirt cheap  
on Ebay) and I use those as targets for filesystem journals or MD Raid1  
intent logs.
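
As a sketch of that last use, mdadm can keep an array's write-intent
bitmap in an external file, e.g. on a filesystem backed by such an NVRAM
board (paths hypothetical, untested):

```shell
# Drop any internal bitmap, then point the write-intent bitmap at a
# file on the battery-backed device (mounted at /nvram here).
mdadm --grow /dev/md0 --bitmap=none
mdadm --grow /dev/md0 --bitmap=/nvram/md0-bitmap
```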

Lastly maybe forget KVM/Xen and get VMware ESXi as your hypervisor. It  
supports SSDs as block cache too but I'm not sure which level of product  
is needed to activate it. It can be as cheap as $500 for 3 two-socket  
physical hosts to $1500+/socket.

In conclusion, if staying with BCache use it in write-thru mode.


* Re: 3.10LTS ok for production?
       [not found] ` <20131109030128.GJ5474-eJ6RpuielZ6oHZ9hTG1MgCsmlnnoMqry@public.gmane.org>
  2013-11-09  5:29   ` Matthew Patton
@ 2013-11-09  6:47   ` Kent Overstreet
  2013-11-09  7:11     ` Stefan Priebe
  2013-11-13  0:21     ` Paul B. Henson
  1 sibling, 2 replies; 7+ messages in thread
From: Kent Overstreet @ 2013-11-09  6:47 UTC (permalink / raw)
  To: Paul B. Henson; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

On Fri, Nov 08, 2013 at 07:01:28PM -0800, Paul B. Henson wrote:
> I'd kinda like to use the 3.10 LTS kernel for a virtualization server
> I'm building, but it seems like every time somebody reports a problem
> the recommendation is to make sure you're using the latest bleeding edge
> kernel. Is it intended for bcache to be considered production ready in
> the 3.10 LTS branch, or do you pretty much have to run the latest stable
> of the week for now if you want to be sure to get all the bcache bugfixes
> necessary for a stable system? Specifically, I'd like to use a raid1 of 2
> 256G SSDs to be a write-back cache for a raid10 of 4 2TB HDs. Occasional
> reboots aren't an issue for kernel updates, but I'd prefer to avoid the
> potential instability and config churn of tracking the mainline kernel.

Yes - 3.10 LTS (or 3.11) has been what you want to be running for a while
now; I've been making sure all the bugfixes get backported quickly. The
only bugfix I know of that wasn't backported was a fix for a suspend
issue, because it was part of a fairly involved allocator rework.


* Re: 3.10LTS ok for production?
  2013-11-09  6:47   ` Kent Overstreet
@ 2013-11-09  7:11     ` Stefan Priebe
       [not found]       ` <527DE027.2050606-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
  2013-11-13  0:21     ` Paul B. Henson
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Priebe @ 2013-11-09  7:11 UTC (permalink / raw)
  To: Kent Overstreet, Paul B. Henson; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

At least I'm suffering from two problems on 3.10:

1.) the dirty value is often wrong / can go negative
2.) the writeback cache is only cleared / written back when having 
writeback_percent => 0

The first one is already fixed by Kent - just waiting for a backport.
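
For anyone checking whether they are affected, the relevant sysfs knobs
look like this (a sketch, assuming the device registered as bcache0; on
a kernel with the bug above the flush may not complete):

```shell
# Dirty data that so far exists only on the cache device:
cat /sys/block/bcache0/bcache/dirty_data
# Lowering the dirty target normally makes bcache write back more
# aggressively; 0 asks it to flush everything.
echo 0 > /sys/block/bcache0/bcache/writeback_percent
```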

Greets,
Stefan

Am 09.11.2013 07:47, schrieb Kent Overstreet:
> On Fri, Nov 08, 2013 at 07:01:28PM -0800, Paul B. Henson wrote:
>> I'd kinda like to use the 3.10 LTS kernel for a virtualization server
>> I'm building, but it seems like every time somebody reports a problem
>> the recommendation is to make sure you're using the latest bleeding edge
>> kernel. Is it intended for bcache to be considered production ready in
>> the 3.10 LTS branch, or do you pretty much have to run the latest stable
>> of the week for now if you want to be sure to get all the bcache bugfixes
>> necessary for a stable system? Specifically, I'd like to use a raid1 of 2
>> 256G SSDs to be a write-back cache for a raid10 of 4 2TB HDs. Occasional
>> reboots aren't an issue for kernel updates, but I'd prefer to avoid the
>> potential instability and config churn of tracking the mainline kernel.
>
> Yes - 3.10 LTS (or 3.11) has been what you want to be running for a while
> now; I've been making sure all the bugfixes get backported quickly. The
> only bugfix I know of that wasn't backported was a fix for a suspend
> issue, because it was part of a fairly involved allocator rework.


* RE: 3.10LTS ok for production?
       [not found]     ` <op.w59n7e06f3gqgg-r49W/1Cwd2cba4AQcYcrVKxOck334EZe@public.gmane.org>
@ 2013-11-13  0:17       ` Paul B. Henson
  0 siblings, 0 replies; 7+ messages in thread
From: Paul B. Henson @ 2013-11-13  0:17 UTC (permalink / raw)
  To: 'Matthew Patton', linux-bcache-u79uwXL29TY76Z2rM5mHXA

> From: Matthew Patton [mailto:pattonme-/E1597aS9LQAvxtiuMwx3w@public.gmane.org]
> Sent: Friday, November 08, 2013 9:29 PM
>
> The following is opinion, MY opinion.

Noted; thanks for taking the time to share it :).

> I think that's hard to say. The .10 code wasn't re-worked like the .11
> branch and it may well have fewer issues than the .11 series.

There was a re-factoring between .10 and .11? I hadn't noticed that.
 
> Storage is the LAST place to cut corners. Unless of course your data isn't
> important, can be thrown away, or recreated without a lot of time and
> sweat.

Well, technically, this particular deployment is for my house ;), and while
I wouldn't really agree with any of those statements for my data, this hobby
box has already become ridiculously expensive, and I'd like to make the best
of the pieces I already have.
 
> Personally I think it needs another 3 months to bake, even in the 3.11.6
> guise.

Hmm, won't 3.11 be EOL before then? So presumably the result of that bake
time would be in 3.12.

> As to your specific example, are WRITE IOPs of critical importance? If
> not, just use WRITE-THRU and have the SSDs be a READ cache for hot data.
>
> There is no or almost zero risk to your data in that configuration.

Well, I don't know if I'd agree with that; bugs in bcache could still
result in corrupted data being returned from reads or ending up on the
backing device even in write-through. Definitely less risk than
write-back, I would think, but none?

> Despite all the hand-waving by sysadmins, READ cache is far more useful as
> a practical matter than WRITE. If you have a heavy WRITE load, then there
> is no good solution that doesn't cost money.

Theoretically, caching the writes through the SSD should decrease latency
and turn random IO into a sequential stream for the backing device,
resulting in increased performance. Ideally, I'd like to avail myself of
that :). 

> the alternative software solutions both of which are free: IOEnhance from
> STEC

It looks like there was some activity back in February about getting that
into the staging driver section of the kernel, but I don't see it there,
and I don't see any further activity, so I'm not sure what happened. I'd
prefer to use functionality in the standard kernel, as opposed to
compiling in outside stuff.

> the in-kernel MD-hotspot

Do you have a reference for that? I can't seem to find anything via Google.

> Failing that, shell out the money for a ZFS-friendly setup and abstract
> the storage away from your virtual machines. Indeed that's a much better
> design anyway.

I actually have a storage server sitting right next to the virtualization
server running illumos/zfs, with roughly 21TB of storage, which is going to
provide bulk storage, but I plan to have the vm operating system files and
smaller data on the virtualization server itself.

> Lastly maybe forget KVM/Xen and get VMware ESXi as your hypervisor.

We use ESXi at my day job, it's got a pretty good feature set, but I'm
trying to stick with open source for my home deployments...

Thanks for your thoughts.


* RE: 3.10LTS ok for production?
  2013-11-09  6:47   ` Kent Overstreet
  2013-11-09  7:11     ` Stefan Priebe
@ 2013-11-13  0:21     ` Paul B. Henson
  1 sibling, 0 replies; 7+ messages in thread
From: Paul B. Henson @ 2013-11-13  0:21 UTC (permalink / raw)
  To: 'Kent Overstreet'; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

> From: Kent Overstreet [mailto:kmo-PEzghdH756F8UrSeD/g0lQ@public.gmane.org]
> Sent: Friday, November 08, 2013 10:47 PM
>
> Yes - 3.10 LTS (or 3.11) has been what you want to be running for awhile
> now; I've been making sure all the bugfixes get backported quickly.

Cool, thanks for the feedback. I ended up starting with a 3.11.7 kernel
after all; I'm going to play with that and see what happens. I'm looking
forward to the potential support for redundant cache devices within bcache
itself, so I won't have to mirror my two SSDs but will still have
redundancy for writeback and more overall space for read caching. Not sure
what the timeline is for that, but I imagine it wouldn't be backported to
3.10.


* RE: 3.10LTS ok for production?
       [not found]       ` <527DE027.2050606-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
@ 2013-11-13  0:21         ` Paul B. Henson
  0 siblings, 0 replies; 7+ messages in thread
From: Paul B. Henson @ 2013-11-13  0:21 UTC (permalink / raw)
  To: 'Stefan Priebe'; +Cc: linux-bcache-u79uwXL29TY76Z2rM5mHXA

> From: Stefan Priebe [mailto:s.priebe-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org]
> Sent: Friday, November 08, 2013 11:12 PM
>
> 1.) dirty value is often wrong / can go negative
> 2.) writeback cache is only cleared / written back when having
> writeback_percent => 0

Hmm, neither of those results in data loss or corruption though?


end of thread [newest: 2013-11-13  0:21 UTC]

Thread overview: 7+ messages
2013-11-09  3:01 3.10LTS ok for production? Paul B. Henson
     [not found] ` <20131109030128.GJ5474-eJ6RpuielZ6oHZ9hTG1MgCsmlnnoMqry@public.gmane.org>
2013-11-09  5:29   ` Matthew Patton
     [not found]     ` <op.w59n7e06f3gqgg-r49W/1Cwd2cba4AQcYcrVKxOck334EZe@public.gmane.org>
2013-11-13  0:17       ` Paul B. Henson
2013-11-09  6:47   ` Kent Overstreet
2013-11-09  7:11     ` Stefan Priebe
     [not found]       ` <527DE027.2050606-2Lf/h1ldwEHR5kwTpVNS9A@public.gmane.org>
2013-11-13  0:21         ` Paul B. Henson
2013-11-13  0:21     ` Paul B. Henson
