From: Willy Tarreau <w@1wt.eu>
To: Daniel Phillips <phillips@phunq.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>,
David Newall <davidn@davidnewall.com>,
linux-kernel@vger.kernel.org
Subject: Re: [ANNOUNCE] Ramback: faster than a speeding bullet
Date: Sun, 16 Mar 2008 07:56:25 +0100
Message-ID: <20080316065625.GA32135@1wt.eu>
In-Reply-To: <200803152033.08788.phillips@phunq.net>
On Sat, Mar 15, 2008 at 07:33:07PM -0800, Daniel Phillips wrote:
> On Saturday 15 March 2008 16:22, Willy Tarreau wrote:
> > > That would have been a miscommunication then. I see arguments coming
> > > in that suggest embedded solutions, EMC for example, are inherently more
> > > reliable than a Linux based solution. Well guess what? Some of those
> > > embedded solutions already use Linux.
> >
> > But their RAM does not depend on a lot of factors to remain valid and
> > usable, which is the problem with the common PC.
>
> For example?
What I mean is that in a PC, RAM contents are very fragile:
- weak batteries in your UPS => end of game
- loose power cable between UPS and PC => end of game (BTW I have a customer
  who had such a problem: both cables had disconnected under their own
  weight).
- kernel panic => end of game
- user error during planned maintenance => end of game
- flaky driver writing to the wrong memory location => can't trust your data

In a normal PC, even if the RAM itself is a reliable component (ECC, ...),
any of these problems can render its contents unusable. If you have to
reboot, your BIOS will clean it up for you. That's why people are trying
to explain to you that Linux is not reliable enough to work like this.

Now if you have all your RAM on a PCI-E board with a battery, and it is
not initialized by the BIOS so that it survives reboots, that changes a LOT
of things, because all the problems mentioned above go away. Let me
repeat it: the problem is not that those components are too unreliable
to build a transactional system, it is that used in this manner, a very
simple failure of any one of them is enough to lose or corrupt all of your
data. That is why people insist on ordered writes with regular flushes.
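
To make "ordered writes with regular flushes" concrete, here is a minimal
sketch. It is not Ramback code: the class, its method names and the
flush_every parameter are all hypothetical, and a plain dict stands in for
the backing disk. RAM serves every read and write, dirty blocks are queued
in arrival order, and each flush applies them to the backing store in that
same order, so after a crash the disk holds an older but consistent state.

```python
class OrderedWriteback:
    """RAM-first block store with in-order write-back (illustrative only)."""

    def __init__(self, flush_every=4):
        self.ram = {}        # block number -> data; serves all reads/writes
        self.pending = []    # (block, data) in the order the writes arrived
        self.disk = {}       # stand-in for the slow backing device
        self.flush_every = flush_every

    def write(self, block, data):
        self.ram[block] = data
        self.pending.append((block, data))
        if len(self.pending) >= self.flush_every:
            self.flush()

    def read(self, block):
        return self.ram[block]

    def flush(self):
        # Apply queued writes strictly in arrival order. A real device
        # would also need a barrier/cache flush before the queue is dropped.
        for block, data in self.pending:
            self.disk[block] = data
        self.pending.clear()

    def crash(self):
        """Simulate losing RAM: only what reached the disk survives."""
        self.ram = dict(self.disk)
        self.pending.clear()


store = OrderedWriteback(flush_every=2)
store.write(0, b"old")
store.write(1, b"other")   # second write triggers a flush
store.write(0, b"new")     # still only in RAM
store.crash()
print(store.read(0))       # the last flushed version of block 0
```

With this scheme a failure costs the writes since the last flush, not the
whole data set.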
> Anecdote time. Remember there used to be "brand name" floppy disks and
> generic floppy disks, and the brand name ones cost a lot more because
> they were supposedly safer? Well, big secret, studies were done and
> the no-name disks came out better. Why? Because selling at commodity
> prices the generic makers could not afford returns. So they made them
> well.
That was not my experience when I was a student. We would buy very cheap
diskettes which were only sold in packs of 100. 20% of them were already
defective, and 20% of the remaining ones could not keep our data till the
next morning! I knew guys who finally stopped copying games because of those
diskettes, so we believed they were sold by the game publishers :-)
> It is like that with PCs. Supposedly you get a lot more reliability
> when you spend more money and buy all high end near-custom gear. In
> fact, the cheap stuff just keeps on chugging, because those guys can't
> afford to have it break.
>
> So please don't underestimate the reliability of a PC.
If you have understood what I explained above, you'll now understand that
I'm not underestimating the reliability of my PC. I'm pointing out that
keeping access to my RAM contents involves a lot of components, any of which
will definitely ruin my data if it fails.
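
As a rough illustration of that argument: when the data only survives if
every component in the chain keeps working, the survival probability is the
product of the individual ones, so it is always worse than the weakest link.
The figures below are made-up assumptions for the sake of the example, not
measurements, and the names are hypothetical.

```python
import math

# Hypothetical yearly probabilities that each component does NOT fail.
# Illustrative values only, not measured data.
components = {
    "UPS batteries": 0.99,
    "power cabling": 0.995,
    "kernel (no panic)": 0.98,
    "operator (no mistake)": 0.97,
    "drivers (no stray write)": 0.99,
}


def chain_survival(probabilities):
    """Probability that every component of a series chain survives."""
    return math.prod(probabilities)


survival = chain_survival(components.values())
print(f"RAM contents survive the year with probability {survival:.3f}")
print(f"weakest single component: {min(components.values()):.3f}")
```

Each component can look very reliable on its own while the chain as a whole
is noticeably less so.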
> There are bits of Linux that are undeniably dodgy. We get a lot of bug
> reports about usb for example, keyboards just quitting and it's not the
> keyboard's fault. Just say no to usb in a server, at least until some
> fundamental cleanup happens there.
Unfortunately, new servers are often USB-only.
> The worst bug I've seen in a server this year? A buggy bios in a Dell
> server that would issue a keyboard error and sit and wait for somebody
> to press F1 when there was no keyboard attached.
I thought this stupidity had disappeared about 5 years ago? Back then I was
about to build PIC-based PS/2 "terminators" to plug into machines to avoid
this problem.
> That is embedded software for you. Personally, I think we do way
> better than that in Linux.
>
> > > Also, peecees are much more reliable than people give them credit for,
> > > especially if you harden up the obvious points of failure such as fans
> > > and spinning disks.
> >
> > and PSU.
>
> Yes. Dual power supplies are highly recommended for this application.
> With dual power supplies you can carry out preemptive maintenance on
> the UPS.
>
> > Securing every component simply reduces the risk of a loss of service.
> > What is important with data is to know the consequences of loss of service.
> > If that only means that no one can work and that the last second of work is
> > lost, it's generally acceptable. If it means everything is lost to a corrupted
> > FS, obviously it's not.
>
> So mirror two of them, I keep saying. If that is not good enough for
> you, then make it three way, and replicate for good measure. The thing
> is, none of that hurts the microsecond level performance, and it gets
> you whatever data security you desire. Whereas anything that requires
> waiting on disk transactions does hurt performance. Since my interest
> currently lies in high performance, that is where my effort goes.
I never spoke about waiting for disk transactions. The RAM must be the
only source and target of user data. Disk is there for permanent storage
and should be written to in the background. YOU proposed the write-through
alternative with your "echo 1". But obviously this voids any advantage of
your work.
> And do I need to say it: patches gratefully accepted.
Hey thanks, but we're not on freshmeat: "here's version 0.1 of foobar,
right now it does nothing but given a massive amount of contributors it
will replace a datacenter in a matchbox".
> For my immediate application... hacking the kernel in comfort... just
> replicating will provide all the data safety I need.
Daniel, you must understand that suiting *your* needs is not enough for
your project to get broad adoption. Many people are showing you what they
don't like in it, and it's not even a design problem, it's just the way
data are synchronized. I think that if you had spent your time on your code
instead of arguing by mail against each of us, you would already have
ordered writes working.
> > Sorry if I was not clear. I was not speaking about replacing the RAM with
> > flash, but only the disks. You keep the RAM for the speed, and use flash
> > for permanent storage instead of disks. No seek time, average RW speed now
> > slightly better than disks, that combined with your ramdisk and ordered
> > write-backs writes will have the best of both worlds : RAM speed and flash
> > reliability.
>
> Right. What we are talking about is filling in a missing level in the
> cache hierarchy, something like:
>
> L1 .3 ns
> L2 3 ns
> L3 30 ns
> Ramdisk 2 us
> Flash 20 us
> Disk 3 ms
>
> Approximate, numbers not necessarily too accurate, but you know what I
> mean. Currently there is this gigantic performance cliff between L3
> memory and disk. Something like the Violin ramdisk fills it in nicely.
> And see, you still need that rotating media because it always will be
> an order of magnitude cheaper than flash.
"always" is far from being a certitude here. "still" is right though.
Prices are driven by customer demand. And building a 128 GB flash device
requires a lot less effort than building a hard drive containing a lot of
fragile mechanics. However, I'm not sure that flash will be as resistant
to the environmental annoyances that we're happy to ignore today, such as
solar winds and cosmic rays. The future will tell.
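
For what it's worth, the "cliff" in the table quoted above is easy to
quantify. The sketch below uses only the quoted figures (which are
explicitly approximate) converted to nanoseconds; the function name is
hypothetical.

```python
# Latencies from the quoted table, in nanoseconds (approximate figures).
levels = [
    ("L1", 0.3),
    ("L2", 3),
    ("L3", 30),
    ("Ramdisk", 2_000),
    ("Flash", 20_000),
    ("Disk", 3_000_000),
]


def step_ratios(levels):
    """Return (faster, slower, slowdown factor) for each adjacent pair."""
    return [
        (a, b, lat_b / lat_a)
        for (a, lat_a), (b, lat_b) in zip(levels, levels[1:])
    ]


for faster, slower, ratio in step_ratios(levels):
    print(f"{faster:>7} -> {slower:<7} x{ratio:,.0f}")

# Without the two middle levels, the jump from L3 straight to disk:
latency = dict(levels)
print(f"     L3 -> Disk    x{latency['Disk'] / latency['L3']:,.0f}")
```

With ramdisk and flash in between, no single step in the quoted numbers is
worse than a factor of 150; without them, L3 to disk is a factor of 100,000.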
> Tape might still fit in
> there too, though these days it seems increasingly doubtful.
Tapes are used for long-term archival. You can read a tape 20 years
after having written it. A disk... well, the interface to plug it
into does not exist anymore; even the electronics processes have
changed, as well as the voltage levels. Check in your boxes if you have
an old MFM or RLL disk, and see where you can plug it in. Maybe you'll
find an old ISA controller with a corrupted BIOS (too old), or at least
one which does not support machines faster than 25 MHz.
Tape vendors will still sell you the tape drive (at an amazing
price BTW).
Willy