From: Daniel Phillips <phillips@phunq.net>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [ANNOUNCE] Ramback: faster than a speeding bullet
Date: Mon, 10 Mar 2008 19:50:40 -0800
Message-ID: <200803102050.40567.phillips@phunq.net>
In-Reply-To: <20080310092213.7ba878b3@core>
Hi Alan,
Nice to see so many redhatters taking an avid interest in storage :-)
On Monday 10 March 2008 02:22, Alan Cox wrote:
> > So now you can ask some hard questions: what if the power goes out
> > completely or the host crashes or something else goes wrong while
> > critical data is still in the ramdisk? Easy: use reliable components.
>
> Nice fiction - stuff crashes eventually - not that this isn't useful. For
> a long time simply loading a 2-3GB Ramdisk off hard disk has been a good
> way to build things like compile engines where loss of state is not bad.
Right, and now with ramback you will be able to preserve that state and
have the performance too. It is a wonderful world.
> > If UPS power runs out while ramback still holds unflushed dirty data
> > then things get ugly. Hopefully a fsck -f will be able to pull
> > something useful out of the mess. (This is where you might want to be
> > running Ext3.) The name of the game is to install sufficient UPS power
> > to get your dirty ramdisk data onto stable storage this time, every
> > time.
>
> Ext3 is only going to help you if the ramdisk writeback respects barriers
> and ordering rules ?
I was alluding to e2fsck's amazing repair ability, not ext3's journal.
> > * Previously saved data must be reloaded into the ramdisk on startup.
>
> /bin/cp from initrd
But that does not satisfy the requirement you snipped:
* Applications need to be able to read and write ramback data during
initial loading.
> > * Cannot transfer directly between ramdisk and backing store, so must
> > first transfer into memory then relaunch to destination.
>
> Why not - providing you clear the dirty bit before the write and you
> check it again after ? And on the disk size as you are going to have to
More accurately: in general, cannot transfer directly. The ramdisk may
be external and not present a memory interface. Even an external
ramdisk with a memory interface (the Violin box has this) would require
extra programming to maintain cache consistency. Then there is the
issue of ramdisks on the way that exceed the 40 bit physical addressing
of current generation processors.
Even for the simple case where the ramdisk is just part of the kernel
unified cache, I would rather not go delving into that code when these
transfers are on the slow path anyway. Application IO does its normal
single copy_to/from_user thing. If somebody wants to fiddle with vm,
the place to attack is right there. The copy_to/from_user can be
eliminated (provided alignment requirements are met) using stupid page
table tricks. In spite of Linus claiming there is no performance win
to be had, I would like to see that put to the test.
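For illustration only, the scheme under discussion here (clear the dirty bit before the write, check it again after, with the data staged in memory on the way to the backing store) can be sketched as a small simulation. This is not ramback code: the names flush_chunk, write_chunk and the on_copy hook are hypothetical, and plain Python lists stand in for the ramdisk, the backing store and the dirty map.

```python
def write_chunk(dirty, ramdisk, chunk, data):
    """Application write: update the ramdisk chunk and mark it dirty."""
    ramdisk[chunk] = bytearray(data)
    dirty[chunk] = True


def flush_chunk(dirty, ramdisk, backing, chunk, on_copy=None):
    """Flush one chunk; return True if the chunk is clean afterwards."""
    dirty[chunk] = False            # clear the dirty bit before the write
    staged = bytes(ramdisk[chunk])  # first transfer into memory
    if on_copy is not None:
        on_copy()                   # hypothetical hook simulating a racing writer
    backing[chunk] = staged         # then relaunch to the destination
    return not dirty[chunk]         # check the bit again after the write


ramdisk = [bytearray(b"aaaa"), bytearray(b"bbbb")]
backing = [b"", b""]
dirty = [True, True]

# No concurrent writer: the chunk ends up clean.
print(flush_chunk(dirty, ramdisk, backing, 0))

# A writer races with the flush: the chunk stays dirty and must be flushed again.
racing = lambda: write_chunk(dirty, ramdisk, 1, b"cccc")
print(flush_chunk(dirty, ramdisk, backing, 1, on_copy=racing))
print(flush_chunk(dirty, ramdisk, backing, 1))
```

A chunk that is rewritten during its flush simply remains dirty, so the stale copy on the backing store is replaced by a later pass. A real implementation would need atomic bit operations and would not hold chunk contents in Python objects.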
> suck all the content back in presumably a log structure is not a big
> concern ?
Sorry, I failed to parse that.
> > * Per chunk locking is not feasible for a terabyte scale ramdisk.
>
> And we care 8) ?
"640K should be enough for anyone"
http://www.violin-memory.com/products/violin1010.html <- 504 GB ramdisk
> > * Handle chunk size other than PAGE_SIZE.
>
> If you are prepared to go bigger than the fs chunk size so lose the
> ordering guarantees your chunk size really ought to be *big* IMHO
The finer the granularity the faster the ramdisk syncs to backing
store. The only attraction of coarse granularity I know of is
shrinking the bitmap, which is currently not so big that it presents
a problem.
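For a rough sense of scale, the bitmap size at different chunk sizes can be worked out as below. Two assumptions are not stated in this message: the bitmap holds one dirty bit per chunk, and the 504 GB figure for the Violin box is read as 504 GiB. The helper name bitmap_bytes is hypothetical.

```python
def bitmap_bytes(ramdisk_bytes, chunk_bytes):
    """Size in bytes of a dirty bitmap with one bit per chunk (assumed layout)."""
    chunks = -(-ramdisk_bytes // chunk_bytes)  # ceiling division
    return -(-chunks // 8)                     # round up to whole bytes


GIB = 1 << 30
ramdisk = 504 * GIB  # assumption: "504 GB" taken as 504 GiB

for chunk in (4096, 65536, 1 << 20):
    size = bitmap_bytes(ramdisk, chunk)
    print(f"{chunk:>8} byte chunks -> {size:>10} byte bitmap")
```

Under those assumptions a 4 KiB chunk size gives a bitmap of 15.75 MiB and a 1 MiB chunk size gives 63 KiB, so the bitmap is small next to the ramdisk at either granularity.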
Your comment re fs chunk size reveals that I have failed to
communicate the most basic principle of the ramback design: the
backing store is not expected to represent a consistent filesystem
state during normal operation. Only the ramdisk needs to maintain a
consistent state, which I have taken care to ensure. You just need
to believe in your battery, Linux and the hardware it runs on. Which
of these do you mistrust?
Regards,
Daniel