From: David Howells <dhowells@cambridge.redhat.com>
To: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: David Howells <dhowells@cambridge.redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] AFS filesystem for Linux (2/2)
Date: Fri, 04 Oct 2002 16:34:32 +0100
Message-ID: <27276.1033745672@warthog.cambridge.redhat.com>
In-Reply-To: Message from Jan Harkes <jaharkes@cs.cmu.edu> of "Fri, 04 Oct 2002 10:02:29 EDT." <20021004140229.GA11066@ravel.coda.cs.cmu.edu>
Hi Jan,
> We don't. Coda has the all or nothing fetch. When we get an open upcall and
> the file isn't cached we get the whole file, and return the filehandle we
> just used to fetch the file, that way even when pages haven't been flushed
> to disk yet, the kernel will see the same data. All reads and writes are
> wrapped in such a way that readpage and writepage directly access the cache
> file, when an mmap is active the Coda inode isn't even touched anymore.
So, say you've got a really big file in your cache. Someone changes a single
byte in it. You can work out that the file has changed by one of a number of
means. How do you avoid having to throw the entire file back to the server?
> Pretty much, but we need that extra space for the disconnected writes
> anyways, that way we can always roll back to a consistent version.
How do you sync up with the server on reconnection? This is similar to the
problem you pointed out that I have to solve - how to deal with the file on
the server being changed by another client whilst I'm also trying to write to
it.
> Ok, traditional AFS semantics is 'session semantics'. Very strict, whenever
> you read from a file, everything is consistent wrt. the time you opened the
> file. Whatever you write isn't committed on the server until you close the
> file. This model has great advantages such as minimizing network traffic,
> giving lock free read/write consistency.
Whatever model you choose, you have to accept some compromises.
The model I'm thinking of is as follows:
 (*) Data that the client doesn't have immediately to hand (ie is not yet
     cached) is fetched a chunk at a time upon request of the VM (thus
     allowing data readahead to be driven by the VM).
 (*) All data cached by a client for a particular file is zapped if I get a
     callback from the server and/or the data version number of that file
     appears to have changed.
 (*) In O_SYNC mode, data is written back to the server as promptly as
     possible within the write() call (maybe through the auspices of
     prepare_write and commit_write).
 (*) In non-O_SYNC mode, I would like the data to be written back through the
     page cache's writepage() routine(s). By setting the dirty bits on pages,
     the write will be scheduled by the VM at some point. This would permit
     better write coalescing locally. However, security becomes a problem,
     since I have to say to the server which user I'm doing a store as, and if
     the data is coalesced from writes done as several different users, then
     there could be a problem if the store is rejected.
How does Coda deal with this security problem?
I admit there are a number of problems with this model that might be
alleviated by better operations being available in the AFS spec (such as data
insertion without having to nominate a new EOF position, and data appending
without needing to know the old or specify a new EOF position).
> Now people started throwing big databases in the filesystem, and the cache
> issues became important. So they introduced 'chunked access', dirty chunks
> are still written when the file is closed, but also when the cache is full
> (oops, lost write consistency).
Anyone using a network filesystem of any type to store a big database is
probably just asking for trouble. IMHO they're far better off talking to a
distributed DB through its own network access protocol. But that's beside the
point.
> > (4) "Diff" the page in the pagecache against a copy stored in the cache
> > and try to send the changes to the server.
>
> As far as I know this is impossible without changing the existing AFS
> servers.
It can be done entirely in the client, provided it has a copy of the
unmodified page still in its cache.
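(Illustrative sketch, not code from the posted patch. Assuming the client keeps a pristine copy of the page in its cache, a client-side "diff" can be as simple as finding the smallest byte range that covers every difference, so that only that range needs to be stored back. `diff_range` and `page_diff` are hypothetical names.)

```c
#include <assert.h>
#include <stddef.h>

/* Smallest single byte range covering every difference between two pages. */
struct diff_range {
    size_t start;  /* offset of the first differing byte */
    size_t len;    /* bytes to send; 0 means the pages are identical */
};

/*
 * Compare the modified page-cache page against the unmodified copy held
 * in the local cache and report which bytes would need to be stored.
 */
static struct diff_range page_diff(const unsigned char *pristine,
                                   const unsigned char *modified,
                                   size_t page_size)
{
    struct diff_range r = { 0, 0 };
    size_t first = 0, last;

    assert(pristine != NULL && modified != NULL);

    /* Skip the unchanged prefix. */
    while (first < page_size && pristine[first] == modified[first])
        first++;
    if (first == page_size)
        return r;  /* nothing changed */

    /* Skip the unchanged suffix. */
    last = page_size - 1;
    while (last > first && pristine[last] == modified[last])
        last--;

    r.start = first;
    r.len = last - first + 1;
    return r;
}
```

A single-byte change then yields a one-byte range rather than a whole page. Two changes far apart in the same page produce one range spanning both, which is the price of keeping the sketch to a single range.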
> Doesn't AFS3 give callbacks when it updates a chunk of the file? I guess it
> still has retained at least that part of the original semantics, send
> callbacks when the file is closed (and the data is 'officially'
> committed). It is still up in the air what clients see that read the file
> between the chunked writes and the actual file close.
All it says is that a given file has changed. It doesn't provide a clue as to
where. Hence the entire file has to be flushed.
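(Illustrative sketch, not code from the posted patch. Because the notification carries no offset or length, the only safe reaction is to drop every cached chunk of the file. The same applies when a status fetch shows a different data version number. `cached_file`, `NCHUNKS` and the function names are hypothetical.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NCHUNKS 8  /* illustrative chunk count only */

struct cached_file {
    uint64_t data_version;          /* version the cached chunks belong to */
    bool     chunk_valid[NCHUNKS];  /* which chunks are present in the cache */
};

/* Drop every cached chunk: the server does not say which part changed. */
static void zap_all_chunks(struct cached_file *f)
{
    memset(f->chunk_valid, 0, sizeof(f->chunk_valid));
}

/* React to the server breaking its callback promise on this file. */
static void on_callback_break(struct cached_file *f)
{
    assert(f != NULL);
    zap_all_chunks(f);
}

/*
 * Compare against the data version seen in a fresh status fetch.
 * Returns true if the cached data had to be discarded.
 */
static bool check_data_version(struct cached_file *f, uint64_t server_version)
{
    assert(f != NULL);
    if (server_version == f->data_version)
        return false;
    zap_all_chunks(f);
    f->data_version = server_version;
    return true;
}
```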
> Disconnected operation has never been 'AFS semantics'. That's a Coda thing.
I didn't say it was. It's explicitly denied in the AFS docs, though it's
discussed under the future developments section.
> A Coda cell is simply a FQDN, whenever the userspace cachemanager accesses a
> new cell it a locally unique ID, which will exist as long as there are
> objects from that cell in the cache.
The 32-bit ID still has to be mapped through a catalogue somewhere (may be a
text file that the cache manager reads on starting), and if the catalogue is
external to the cache, the catalogue may change what the ID corresponds to
without the cache being invalidated.
> Why would it probably be swapped out to disc? If you're really worried about
> that you could mlock the memory. And if you think that is too expensive, it
> is still better to mlock memory in userspace that to allocate that same
> memory in kernel space.
I'm storing mine on disc as do normal disc-based FS's. That means it can be a
lot bigger. Besides, you don't really want to mlock memory or store it in the
kernel - that would be a big chunk of memory permanently committed and
unavailable for other uses.
> Yeah, that's why Coda is using a recoverable VM, basically a mmapped
> file with an log where modifications are recorded so that we can
> replay/rollback uncommitted operations when we're restarting.
What's "VM" in this context?
David