linux-bcache.vger.kernel.org archive mirror
* RE: How important is bcache cache device in write-thru mode? (was Re: Tiered bcache)
@ 2014-01-28 14:20 ` Dr. Greg Wettstein
  2014-01-28 15:45   ` [Scst-devel] " matthew patton
                     ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Dr. Greg Wettstein @ 2014-01-28 14:20 UTC (permalink / raw)
  To: Patrick Zwahlen, linux-bcache@vger.kernel.org; +Cc: scst-devel

On Jan 21,  6:44pm, Patrick Zwahlen wrote:
} Subject: RE: How important is bcache cache device in write-thru mode? (was

Hi, hope the week is going well for everyone.

> > -----Original Message-----
> > From: linux-bcache-owner@vger.kernel.org
> > [mailto:linux-bcache-owner@vger.kernel.org] On Behalf Of matthew patton
> > Sent: Tuesday, 21 January 2014 18:35
> > To: linux-bcache@vger.kernel.org
> > Cc: Patrick Zwahlen
> > Subject: How important is bcache cache device in write-thru mode? (was
> > Re: Tiered bcache)
> > 
> > >>  wait, the cache DEVICE for bcache is a Btier device composed of
> > >>  an SSD and RAM? So in effect you want btier to move the really
> > >>  hot blocks within the bcache cache device into RAM? So
> > >>  effectively bcache metadata and any really hot blocks will live
> > >>  in RAM and the rest of the 'read' cache will sit on the SSD.
> > >
> > > Exactly! Apologies if that wasn't clear in the first place but that
> > > describes 100% what we're currently testing.

> > you REALLY want to check with Kent as to what happens when the bcache
> > caching device (and any meta-data it stores there) routinely get blown
> > away or run a high risk of experiencing sudden destruction. I'm afraid
> > this is not a test case that has undergone enough scrutiny.

> Thanks Matthew for raising this in this list.
>
> I should add that we have two SAN servers sharing the
> JBOD. Clustering is managed by pacemaker. During normal operations,
> we can migrate a whole RAID from one node to the other and we do a
> proper cache detach on node #1 (that would even write dirty data if
> we were doing write-back) and re-attach the RAID to the existing cache
> on the node #2. Beauty here is we can "share" a cache set between
> multiple backend devices.
>
> We made the assumption that as bcache is designed for potentially
> failing SSDs, moving to a potentially failing SSD+RAM shouldn't make
> a difference.
>
> I'm definitely not expert enough to assess the risk any further and
> I rely on you guys.
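
The detach/re-attach hand-over described above goes through bcache's
sysfs files (`detach` and `attach` under /sys/block/<dev>/bcache/).
A minimal Python sketch of that sequence follows. It is not the actual
pacemaker resource agent used on these servers; the function names,
the `sysfs_root` parameter and the device name `bcache0` are
assumptions made purely for illustration.

```python
from pathlib import Path


def bcache_attr(bdev: str, attr: str, sysfs_root: str = "/sys") -> Path:
    """Path of a bcache attribute of a bcache device, e.g. bcache0/detach."""
    return Path(sysfs_root) / "block" / bdev / "bcache" / attr


def detach_cache(bdev: str, sysfs_root: str = "/sys") -> None:
    """Ask bcache to detach the backing device from its cache set.

    Per the kernel's bcache documentation, dirty data (if any) is
    flushed to the backing device before the detach completes.
    """
    bcache_attr(bdev, "detach", sysfs_root).write_text("1\n")


def attach_cache(bdev: str, cset_uuid: str, sysfs_root: str = "/sys") -> None:
    """Attach the backing device to the cache set with the given UUID."""
    bcache_attr(bdev, "attach", sysfs_root).write_text(cset_uuid + "\n")
```

On node #1 this would be `detach_cache("bcache0")` before the RAID is
released; on node #2, once the backing device is registered there,
`attach_cache("bcache0", cset_uuid)` with the UUID of that node's own
cache set. Both calls need root.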

Interestingly enough we have been working on infrastructure to support
this type of model for some time.  Our primary focus is on
accelerating SCST based storage targets and software defined storage
(SDS) devices.

At one point in time we had entertained discussions with the SCST
developers to pay for an implementation of RAM based block device
caching in SCST itself, but for a variety of reasons that didn't move
forward.  We recognized early on that Kent's work with bcache was
going to make that strategy irrelevant.

SCST using FILEIO is blindingly fast but I don't know of any serious
storage architects that are going to trust 50-60 gigabytes of a
database or filesystem to the Linux pagecache and associated vagaries
of VM writeback behavior.  So the architectural question becomes how
to exploit the fact that it is now tractable to provision commodity
based storage targets with a quarter terabyte of RAM, and to do so in
a manner which protects data and provides deterministic performance
characteristics.

So Izzy (our golden retriever) and I spent a lot of time down at our
lake place over the holidays cross-country skiing and working on a
hugepage backed block device driver.  We are in the process of putting
the beta through various beatings and addressing some issues with the
device model implementation.  We are hoping to have something to
release in the next week or so before we leave for Colorado and some
downhill skiing.

The goal for this driver is a block based interface to RAM for use as
a cache set for bcache.  Since it sits directly on the physical
hugepage allocator and associated page magazine, the block devices can
be dynamically configured, unlike the current RAM based block device,
which also has the disadvantage of being implemented on top of the page
cache.  None of this should be construed as a gripe against the Linux
VM but obviously one does not want memory pressure to start driving a
high speed cache store out onto disk.
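
To get a feel for the hugepage budget such a cache set would need, the
arithmetic can be sketched as below.  This is only a sizing sketch in
Python, not the driver itself.  It assumes the usual /proc/meminfo
field names (`Hugepagesize`, `HugePages_Free`); the function names,
the 64 GiB cache size and the sample text are made up for
illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Name:  value [kB]' lines into integers."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])
    return fields


def hugepages_needed(cache_bytes: int, hugepage_kib: int) -> int:
    """Number of hugepages needed to back cache_bytes (rounded up)."""
    page_bytes = hugepage_kib * 1024
    return -(-cache_bytes // page_bytes)  # ceiling division


# Illustrative sample only; on a real box read /proc/meminfo instead.
SAMPLE = """\
HugePages_Total:   40000
HugePages_Free:    40000
Hugepagesize:       2048 kB
"""

info = parse_meminfo(SAMPLE)
need = hugepages_needed(64 * 2**30, info["Hugepagesize"])
print(need, need <= info["HugePages_Free"])  # 32768 True
```

With 2 MiB hugepages a 64 GiB cache set needs 32768 pages, so the pool
has to be reserved up front (for example via /proc/sys/vm/nr_hugepages)
before any device can be carved out of it.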

This model is obviously dependent on solid behavior of bcache in
write-through mode.  We are testing aggressively against 3.10.x and
haven't tipped it over yet, but we will turn up the pressure on that and
see if it gives.  I'm pretty confident there is enough community and
commercial interest in all this to get the bugs beaten out pretty
thoroughly, provided people report them back.... :-)
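
Since all of this hinges on the backing device really being in
write-through mode, it is worth checking before a test run.  bcache
reports the mode in the `cache_mode` sysfs file, with the active choice
in square brackets.  A minimal Python sketch follows; the helper name
is hypothetical and the sample string is illustrative rather than
captured from a real system.

```python
import re


def active_cache_mode(cache_mode_text: str) -> str:
    """Return the bracketed (active) entry from a cache_mode string."""
    match = re.search(r"\[(\w+)\]", cache_mode_text)
    if match is None:
        raise ValueError("no active cache mode found in %r" % cache_mode_text)
    return match.group(1)


# On a live system the text would come from
# /sys/block/bcache0/bcache/cache_mode (device name is an assumption).
sample = "[writethrough] writeback writearound none"
print(active_cache_mode(sample))  # writethrough
```

A test harness could refuse to start unless this returns
"writethrough", which keeps a stray write-back setting from
invalidating the results.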

We will copy both the bcache and SCST lists when we have something up
on the FTP site as it would seem to be of interest to both
communities.

> - Patrick -

Have a good week.

Greg

}-- End of excerpt from Patrick Zwahlen

As always,
Greg Wettstein, Ph.D.       Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Heaven goes by favor.  If it went by merit, you would stay out and your
 dog would go in."
                                -- Mark Twain


* Re: [Scst-devel] How important is bcache cache device in write-thru mode? (was Re: Tiered bcache)
  2014-01-28 14:20 ` How important is bcache cache device in write-thru mode? (was Re: Tiered bcache) Dr. Greg Wettstein
@ 2014-01-28 15:45   ` matthew patton
  2014-01-28 18:06   ` How important is bcache cache device in write-thru mode? (was " jason
  2014-01-28 22:21   ` How important is bcache cache device in write-thru mode? (was " Patrick Zwahlen
  2 siblings, 0 replies; 4+ messages in thread
From: matthew patton @ 2014-01-28 15:45 UTC (permalink / raw)
  To: linux-bcache@vger.kernel.org; +Cc: scst-devel@lists.sourceforge.net

<quote>
SCST using FILEIO is blindingly fast but I don't know of any serious
storage architects that are going to trust 50-60 gigabytes of a
database or filesystem to the Linux pagecache and associated vagaries
of VM writeback behavior.
</quote>

Maybe it's not much, and with memory footprints that big my solution may be too small. But why not write the EXT3/4 journal to an NVRAM board? The 512MB ones are pretty darn cheap on eBay. I've got a fistful myself. If you journal meta+data, at least you have recovery points. Otherwise a RAID set of high-quality SSDs could be a reliable journal, no?

Linux VFS has lots of tunables, and I think you'd be able to find a happy medium between absorbing sudden, big writes and reliably de-staging to disk. I wish EXT4/XFS et al. had the ability to do round-robin journaling à la Oracle LogWriter, so that once the first chunk of ~256MB was written to the journal everything didn't come to a screeching stop until the corresponding bits got written out to 'permanent' storage. But I guess at that point maybe the answer is to use ZFS?
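
Two of the tunables in question are `vm.dirty_background_ratio` and `vm.dirty_ratio`: the percentages of memory at which background writeback starts and at which writers get throttled. A rough Python sketch of what they mean in bytes follows. The 256 GiB figure and the 10/20 ratios are assumptions for illustration (10 and 20 are common defaults, but check `sysctl vm` on the box in question), and the kernel computes against dirtyable memory rather than total RAM, so treat the numbers as upper bounds.

```python
def dirty_threshold_bytes(dirtyable_bytes: int, ratio_percent: int) -> int:
    """Bytes of dirty page cache allowed at the given vm.dirty_* ratio."""
    return dirtyable_bytes * ratio_percent // 100


ram = 256 * 2**30  # assumed: a quarter-terabyte storage target
background = dirty_threshold_bytes(ram, 10)  # assumed vm.dirty_background_ratio
hard = dirty_threshold_bytes(ram, 20)        # assumed vm.dirty_ratio

print(f"background writeback starts near {background / 2**30:.1f} GiB")
print(f"writers are throttled near {hard / 2**30:.1f} GiB")
```

Under those assumptions the throttling point lands around 51 GiB of dirty data, which is the same ballpark as the 50-60 gigabytes mentioned in the quote, so lowering the ratios (or using the byte-based `vm.dirty_bytes` / `vm.dirty_background_bytes` knobs) is where the "happy medium" would have to be found.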

> block based interface to RAM

I could have sworn there already was such a driver...Yeah 'brd'. Did that not fit the intended use?
http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/drivers/block/brd.c

and another exercise
http://www.linuxforu.com/2012/02/device-drivers-disk-on-ram-block-drivers/


though this one is probably more interesting yet
http://rapiddisk.org/index.php?title=Changelog

http://rapiddisk.org/index.php?title=RapidCache


* Re: How important is bcache cache device in write-thru mode? (was Tiered bcache)
  2014-01-28 14:20 ` How important is bcache cache device in write-thru mode? (was Re: Tiered bcache) Dr. Greg Wettstein
  2014-01-28 15:45   ` [Scst-devel] " matthew patton
@ 2014-01-28 18:06   ` jason
  2014-01-28 22:21   ` How important is bcache cache device in write-thru mode? (was " Patrick Zwahlen
  2 siblings, 0 replies; 4+ messages in thread
From: jason @ 2014-01-28 18:06 UTC (permalink / raw)
  To: greg; +Cc: paz, linux-bcache, scst-devel


On Jan 28, 2014, Dr. Greg Wettstein <greg@wind.enjellic.com> wrote:
>
>The goal for this driver is a block based interface to RAM for use as
>a cache set for bcache. Since it sits directly on the physical
>hugepage allocator and associated page magazine the block devices can
>be dynamically configured, unlike the current RAM based block device,
>which also has the disadvantage of being implemented on top of page
>cache. None of this should be construed as a gripe against the Linux
>VM but obviously one does not want memory pressure to start driving a
>high speed cache store out onto disk.

Thank you.  In my opinion a dynamic ram disk is one of the few pieces still needed in the block device layer.  I would be happy to help beat the crap out of it!
 
Any thoughts on this driver making use of the DM or MD framework?  Obviously you're not going to have data persist across a reboot, but do you know how you will handle boot-time configuration for devices that you want to come back without manual or scripted intervention?

>
>This model is obviously dependent on solid behavior of bcache in
>write-through mode. We are testing aggressively against 3.10.x and
>haven't tipped it over yet, but we will turn up the pressure on that and
>see if it gives. I'm pretty confident there is enough community and
>commercial interest in all this to get the bugs beaten out pretty
>thoroughly, provided people report them back.... :-)
>
>We will copy both the bcache and SCST lists when we have something up
>on the FTP site as it would seem to be of interest to both
>communities.


* RE: How important is bcache cache device in write-thru mode? (was Re: Tiered bcache)
  2014-01-28 14:20 ` How important is bcache cache device in write-thru mode? (was Re: Tiered bcache) Dr. Greg Wettstein
  2014-01-28 15:45   ` [Scst-devel] " matthew patton
  2014-01-28 18:06   ` How important is bcache cache device in write-thru mode? (was " jason
@ 2014-01-28 22:21   ` Patrick Zwahlen
  2 siblings, 0 replies; 4+ messages in thread
From: Patrick Zwahlen @ 2014-01-28 22:21 UTC (permalink / raw)
  To: linux-bcache@vger.kernel.org


> -----Original Message-----
> From: Dr. Greg Wettstein [mailto:greg@wind.enjellic.com]
> Sent: Tuesday, 28 January 2014 15:20
> To: Patrick Zwahlen; linux-bcache@vger.kernel.org
> Cc: scst-devel@lists.sourceforge.net
> Subject: RE: How important is bcache cache device in write-thru mode?
> (was Re: Tiered bcache)
> 
> Interestingly enough we have been working on infrastructure to support
> this type of model for some time.  Our primary focus is on
> accelerating SCST based storage targets and software defined storage
> (SDS) devices.
> 
> At one point in time we had entertained discussions with the SCST
> developers to pay for an implementation of RAM based block device
> caching in SCST itself, for a variety of reasons that didn't move
> forward.  We recognized early on that Kent's work with bcache was
> going to make that strategy irrelevant.
> 
> SCST using FILEIO is blindingly fast but I don't know of any serious
> storage architects that are going to trust 50-60 gigabytes of a
> database or filesystem to the Linux pagecache and associated vagaries
> of VM writeback behavior.  So the architectural question becomes how
> to take advantage of the fact that it is now tractable to provision
> commodity based storage targets with a quarter terabyte of RAM and
> how to take advantage of this in a manner which protects data and
> provides deterministic performance characteristics.
> 
> So Izzy (our golden retriever) and I spent a lot of time down at our
> lake place over the holidays cross-country skiing and working on a
> hugepage backed block device driver.  We are in the process of putting
> the beta through various beatings and addressing some issues with the
> device model implementation.  We are hoping to have something to
> release in the next week or so before we leave for Colorado and some
> downhill skiing.
> 
> The goal for this driver is a block based interface to RAM for use as
> a cache set for bcache.  Since it sits directly on the physical
> hugepage allocator and associated page magazine the block devices can
> be dynamically configured, unlike the current RAM based block device,
> which also has the disadvantage of being implemented on top of page
> cache.  None of this should be construed as a gripe against the Linux
> VM but obviously one does not want memory pressure to start driving a
> high speed cache store out onto disk.

Our initial idea with btier and bcache was to use both SSD and DRAM. I
understand that you focus on DRAM only.

However, I also understand that at some point we might be able to mix
several cachesets so we can dream of a kind of tiered cache implemented
entirely within bcache.

> 
> This model is obviously dependent on solid behavior of bcache in
> write-through mode.  We are testing aggressively against 3.10.x and
> haven't tipped it over yet, but we will turn up the pressure on that and
> see if it gives.  I'm pretty confident there is enough community and
> commercial interest in all this to get the bugs beaten out pretty
> thoroughly, provided people report them back.... :-)
> 
> We will copy both the bcache and SCST lists when we have something up
> on the FTP site as it would seem to be of interest to both
> communities.

Regards, - Patrick -

