From: Andreas Dilger <adilger@clusterfs.com>
To: Larry McVoy <lm@work.bitmover.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Linus Torvalds <torvalds@transmeta.com>,
Cort Dougan <cort@fsmlabs.com>,
Benjamin LaHaise <bcrl@redhat.com>,
Rusty Russell <rusty@rustcorp.com.au>,
Robert Love <rml@tech9.net>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: latest linus-2.5 BK broken
Date: Thu, 20 Jun 2002 01:26:17 -0600
Message-ID: <20020620072617.GK22427@clusterfs.com>
In-Reply-To: <20020619222444.A26194@work.bitmover.com>

On Jun 19, 2002 22:24 -0700, Larry McVoy wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
> > The compute cluster problem is an interesting one. The big items
> > I see on the todo list are:
> >
> > - Scalable fast distributed file system (Lustre looks like a
> > possibility)

Well, I can speak to this a little bit... Given Lustre's ext3
underpinnings, we have been thinking of some interesting methods
by which we could take an existing ext3 filesystem on a disk and
"clusterify" it (i.e. have distributed coherency across multiple
clients). This would be perfectly suited for application on a
CC cluster.

Given that the network communication protocols are also abstracted
out from the Lustre core, it would probably be trivial for someone
with network/VM experience to write a "no-op" networking layer
which basically did little more than passing around page addresses
and faulting the right pages into each OSlet. The protocol design
is already set up to handle direct DMA between client and storage
target, and a CC cluster could also do away with the actual copy
involved in the DMA. We can already do "zero copy" I/O between
user-space and a remote disk with O_DIRECT and the right network
hardware (which does direct DMA from one node to another).
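As a rough illustration of the "no-op" networking layer described above (not Lustre code), here is a minimal sketch of a pluggable transport. The names Transport, CopyingTransport, PageRefTransport and read_block are all hypothetical, and a Python bytearray stands in for a page of memory. One transport copies the page, as a conventional network would; the other passes only a reference, which is roughly what sharing a page between OSlets would amount to.

```python
from abc import ABC, abstractmethod

PAGE_SIZE = 4096  # assumed page size, for illustration only


class Transport(ABC):
    """Hypothetical stand-in for a pluggable network layer."""

    @abstractmethod
    def deliver(self, page: bytearray) -> bytearray:
        """Hand one page to the peer and return what the peer sees."""


class CopyingTransport(Transport):
    """Models a conventional network: the peer gets its own copy."""

    def deliver(self, page: bytearray) -> bytearray:
        return bytearray(page)


class PageRefTransport(Transport):
    """Models the "no-op" layer: only a reference to the page is passed."""

    def deliver(self, page: bytearray) -> bytearray:
        return page


def read_block(transport: Transport, storage: dict, block: int) -> bytearray:
    """Fetch one block from a storage target through the given transport."""
    return transport.deliver(storage[block])


# One storage target holding a single zero-filled page.
storage = {0: bytearray(PAGE_SIZE)}

copied = read_block(CopyingTransport(), storage, 0)
shared = read_block(PageRefTransport(), storage, 0)

# The owning side updates the page after both reads.
storage[0][0] = 0xAB
```

After the update, `shared` reflects the new byte because it is the same page, while `copied` still holds the old contents. The code above the transport does not change between the two cases, which is the point of keeping the network layer abstracted out.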
> "Paul McKenney" <Paul.McKenney@us.ibm.com> writes:
> Access to Devices Owned by Some Other OSlet
>
> Larry mentioned a /rdev, but if we discussed any details
> of this, I have lost them. Presumably, one would use some
> sort of IPC or doors to make this work.

I would just make access to remote devices act like NBD or something,
and have similar "network/proxy" kernel drivers for all "remote" devices.
At boot time something like devfs would instantiate the "proxy"
drivers for all of the kernels except the one which is "in control"
of that device.

For example /dev/hda would be a real IDE disk device driver on the
controlling node, but would be NBD in all of the other OSlets. It would
have the same major/minor number across all OSlets so that it presented
a uniform interface to user-space. While in some cases (e.g. FC) you
could have shared-access directly to the device, other devices don't
have the correct locking mechanisms internally to be accessed by more
than one thread at a time.

As the "network" layer between two OSlets would run basically at memory
speeds, this would not impose much of an overhead. The proxy device
interfaces would be equally useful between OSlets as with two remote
machines (e.g. remote modem access), so I have no doubt that many of
them already exist, and the others could be written rather easily.
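The proxy arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: RealDisk, ProxyDisk and build_device_tables are hypothetical names, the OSlet names are made up, and a Python dict stands in for whatever devfs-like mechanism would populate each kernel's device table at boot. The device number (3, 0) is the traditional major/minor pair for /dev/hda.

```python
HDA = (3, 0)  # (major, minor) traditionally assigned to /dev/hda


class RealDisk:
    """Stand-in for the real driver on the OSlet that owns the hardware."""

    def __init__(self, nblocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(nblocks)]

    def read(self, n):
        return self.blocks[n]

    def write(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("write must be exactly one block")
        self.blocks[n] = bytes(data)


class ProxyDisk:
    """Stand-in for an NBD-like proxy: forwards every request to the owner."""

    def __init__(self, owner_driver):
        self.owner = owner_driver
        self.forwarded = 0  # number of requests sent to the owning OSlet

    def read(self, n):
        self.forwarded += 1
        return self.owner.read(n)

    def write(self, n, data):
        self.forwarded += 1
        self.owner.write(n, data)


def build_device_tables(oslets, owner, devnum, real):
    """Give every OSlet an entry for devnum: real on the owner, proxy elsewhere."""
    tables = {}
    for name in oslets:
        driver = real if name == owner else ProxyDisk(real)
        tables[name] = {devnum: driver}
    return tables


disk = RealDisk(nblocks=8)
tables = build_device_tables(["oslet0", "oslet1", "oslet2"], "oslet0", HDA, disk)

# A write issued on oslet1 goes through its proxy to the owner's driver.
tables["oslet1"][HDA].write(2, b"x" * 512)
```

Every OSlet sees the same device number, so user-space gets a uniform interface, while all requests funnel through the single real driver on the controlling OSlet. That single entry point is also where access to a device without internal locking would be serialized.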
> Access to Filesystems Owned by Some Other OSlet.
>
> For the most part, this reduces to the mmap case. However,
> partitioning popular filesystems over the OSlets could be
> very helpful. Larry mentioned that this had been prototyped.
> Paul cannot remember if Larry promised to send papers or
> other documentation, but duly requests them after the fact.
>
> Larry suggests having a local /tmp, so that /tmp is in effect
> private to each OSlet. There would be a /gtmp that would
> be a globally visible /tmp equivalent. We went round and
> round on software compatibility, Paul suggesting a hashed
> filesystem as an alternative. Larry eventually pointed out
> that one could just issue different mount commands to get
> a global filesystem in /tmp, and create a per-OSlet /ltmp.
> This would allow people to determine their own level of
> risk/performance.

Nah, just use a cluster filesystem for everything ;-). As I mentioned
previously, Lustre could run from a single (optionally shared-access) disk
(with proper, relatively minor, hacks that are just in the discussion
phase now), or it can run from distributed disks that serve the data to
the remote clients. With smart allocation of resources, OSlets will
prefer to create new files on their "local" storage unless there are
resource shortages. The fast "networking" between OSlets means even
"remote" disk access is cheap.
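The "prefer local storage unless there are resource shortages" policy could look something like the sketch below. This is an assumed policy for illustration, not Lustre's actual allocator: choose_target, the reserve parameter and the fallback rule (pick the target with the most free space) are all hypothetical.

```python
import errno


def choose_target(local, free_blocks, needed, reserve=0):
    """Pick the storage target for a new file.

    Prefer the caller's local target as long as the allocation leaves at
    least `reserve` blocks free there; otherwise fall back to whichever
    target that can hold the file has the most free space.
    """
    if free_blocks.get(local, 0) - needed >= reserve:
        return local

    candidates = {t: f for t, f in free_blocks.items() if f >= needed}
    if not candidates:
        raise OSError(errno.ENOSPC, "no target has enough free space")
    return max(candidates, key=candidates.get)


# Free blocks per storage target, keyed by the OSlet that is "local" to it.
free = {"oslet0": 100, "oslet1": 5, "oslet2": 50}

first = choose_target("oslet0", free, needed=10)              # local has room
second = choose_target("oslet1", free, needed=10)             # local is short
third = choose_target("oslet2", free, needed=45, reserve=10)  # would dip below reserve
```

Here `first` stays on the local target, while `second` and `third` spill over to the target with the most free space. Since "remote" access between OSlets is cheap, the fallback costs little, so the policy only needs to avoid filling any one target.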
Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/