From: ebiederman@uswest.net (Eric W. Biederman)
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Alexander Viro <viro@math.psu.edu>,
Daniel Phillips <phillips@innominate.de>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] Generic deferred file writing
Date: 30 Dec 2000 15:00:43 -0700
Message-ID: <m1u27lpo1g.fsf@frodo.biederman.org>
In-Reply-To: <Pine.LNX.4.10.10012301214210.1017-100000@penguin.transmeta.com>
In-Reply-To: Linus Torvalds's message of "Sat, 30 Dec 2000 12:21:50 -0800 (PST)"

Linus Torvalds <torvalds@transmeta.com> writes:
> In short, I don't see _those_ kinds of issues. I do see error reporting as
> a major issue, though. If we need to do proper low-level block allocation
> in order to get correct ENOSPC handling, then the win from doing deferred
> writes is not very big.

To get ENOSPC handling 99% correct, all we need to do is decrement a
counter that tracks how many disk blocks are free. If we need a better
estimate than just the data blocks (indirect blocks and other metadata),
it should not be hard to add an extra callback to the filesystem.
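
A minimal sketch of that counter in C. It assumes a hypothetical
per-filesystem `struct fs_space` and `reserve_blocks()`; neither is an
existing kernel interface. Each deferred write reserves its blocks up
front, so ENOSPC is reported at write() time even though the real
allocation happens later. The optional `metadata_estimate` pointer stands
in for the extra filesystem callback, and the one-indirect-block-per-1024
rule is an arbitrary example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical free-space estimate used by deferred writes. */
struct fs_space {
    long free_blocks;                  /* blocks not yet promised to anyone */
    /* Optional filesystem callback: extra (metadata) blocks needed to
     * write 'nblocks' data blocks.  NULL means "data blocks only". */
    long (*metadata_estimate)(long nblocks);
};

/* Called when a write dirties 'nblocks' blocks whose disk location is
 * not chosen yet.  Returns 0, or -ENOSPC while still inside write(). */
static int reserve_blocks(struct fs_space *fs, long nblocks)
{
    long need = nblocks;

    assert(nblocks >= 0);
    if (fs->metadata_estimate)
        need += fs->metadata_estimate(nblocks);
    if (need > fs->free_blocks)
        return -ENOSPC;                /* counter is left untouched */
    fs->free_blocks -= need;
    return 0;
}

/* Example callback (an assumption): one indirect block per 1024 data blocks. */
static long one_indirect_per_1024(long nblocks)
{
    return (nblocks + 1023) / 1024;
}
```

A write that dirties four new blocks on a filesystem with ten free calls
`reserve_blocks(&fs, 4)` and leaves six; a following request for seven
fails without changing the counter.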

There appear to be some interesting cases to handle when a filesystem
fills up. Before actually failing with ENOSPC, the filesystem might
want to fsync itself and see how accurate its estimates were. But that
is the rare case and shouldn't affect performance.
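
The fill-up case could look roughly like this sketch. It assumes a
hypothetical `flush_all` hook that performs the internal fsync and reports
how many blocks the allocator really used; `struct fs_est`,
`reserve_or_flush()` and `flush_uses_three_quarters()` are illustrative
names only. The fast path is a single comparison, and the flush only runs
when the estimate says the filesystem is full.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: 'free_blocks' is the estimate, 'reserved' is the
 * part promised to dirty data that has not been allocated on disk yet. */
struct fs_est {
    long free_blocks;
    long reserved;
    /* Hypothetical hook: write back all dirty data (an internal fsync)
     * and return how many blocks were actually allocated for it. */
    long (*flush_all)(struct fs_est *fs);
};

static int try_reserve(struct fs_est *fs, long need)
{
    if (need > fs->free_blocks)
        return -ENOSPC;
    fs->free_blocks -= need;
    fs->reserved += need;
    return 0;
}

static int reserve_or_flush(struct fs_est *fs, long need)
{
    long used;

    if (try_reserve(fs, need) == 0)
        return 0;                      /* common case: no extra cost */

    used = fs->flush_all(fs);          /* rare case: sync, then re-check */
    assert(used >= 0);
    /* Reservations were guesses; settle them against real usage. */
    fs->free_blocks += fs->reserved - used;
    fs->reserved = 0;
    return try_reserve(fs, need);      /* still -ENOSPC if truly full */
}

/* Example hook (an assumption): writeback needed only 3/4 of what was
 * reserved because the metadata estimate was pessimistic. */
static long flush_uses_three_quarters(struct fs_est *fs)
{
    return fs->reserved * 3 / 4;
}
```

Because reservations are worst-case guesses, flushing usually hands some
space back, so a write that looked as if it would fail may still succeed
on the retry.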

<rant>
In the long term, VFS support for deferred writes looks like a major
win. Knowing how large a file is before we write it to disk allows very
efficient disk organization and fast file access (especially when
combined with an extent-based fs). Support for compressing files in
real time falls out naturally, as does support for filesystems that
maintain coherency by never writing a block back to the same disk
location.
</rant>

One other thing to think about for the VFS/MM layer is limiting the
total number of dirty pages in the system (to what disk pressure shows
the disk can handle), to keep system performance smooth when swapping.
All cases except mmapped files are easy. Those could be handled by a
modified page fault handler that puts the dirty bit directly on the
struct page, except that this is buggy with respect to clearing the
dirty bit on the struct page. In reality we would have to create a
queue of pointers to dirty ptes from the page fault handler and,
depending on a timer or memory pressure, move the dirty bits to the
actual page. Combined with the code to make sync and fsync work on the
page cache, would msync then be obsolete?
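
A toy model of the dirty-pte queue, assuming `struct page` and
`struct pte` are reduced to one dirty flag each (these are not the
kernel's definitions; `DIRTY_QUEUE_MAX`, `drain_dirty_ptes()`,
`clean_page_for_writeback()` and `over_dirty_limit()` are hypothetical).
The fault handler only records the pte; a timer or memory pressure drains
the queue and moves the dirty bits onto the pages, which keeps an exact
count of dirty pages for throttling. Writeback drains the queue before
clearing a page's dirty bit, sidestepping the clearing problem described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not the kernel's definitions. */
struct page { bool dirty; };
struct pte  { bool dirty; struct page *page; };

#define DIRTY_QUEUE_MAX 64              /* hypothetical queue size */

struct dirty_queue {
    struct pte *entries[DIRTY_QUEUE_MAX];
    size_t len;
    size_t nr_dirty_pages;              /* pages whose struct page is dirty */
};

/* Run from a timer or under memory pressure (hypothetical hooks):
 * move queued pte dirty bits onto their struct pages. */
static void drain_dirty_ptes(struct dirty_queue *q)
{
    for (size_t i = 0; i < q->len; i++) {
        struct pte *pte = q->entries[i];

        /* Clearing the pte bit models write-protecting it again, so the
         * next write to this page faults and is queued once more. */
        pte->dirty = false;
        if (!pte->page->dirty) {
            pte->page->dirty = true;
            q->nr_dirty_pages++;
        }
    }
    q->len = 0;
}

/* Write fault on a shared mapping: mark the pte dirty and remember it,
 * instead of touching the struct page directly. */
static void write_fault(struct dirty_queue *q, struct pte *pte)
{
    if (pte->dirty)
        return;                         /* already queued since last drain */
    if (q->len == DIRTY_QUEUE_MAX)
        drain_dirty_ptes(q);            /* queue full: drain early */
    pte->dirty = true;
    q->entries[q->len++] = pte;
}

/* Before writing 'page' out: drain so no queued pte still holds a dirty
 * bit for it, then clear the page's bit.  True if it must be written. */
static bool clean_page_for_writeback(struct dirty_queue *q, struct page *page)
{
    drain_dirty_ptes(q);
    if (!page->dirty)
        return false;
    page->dirty = false;
    assert(q->nr_dirty_pages > 0);
    q->nr_dirty_pages--;
    return true;
}

/* Throttle writers once too many pages are dirty; 'limit' would come
 * from what the disk has shown it can absorb. */
static bool over_dirty_limit(const struct dirty_queue *q, size_t limit)
{
    return q->nr_dirty_pages > limit;
}
```

Several ptes may map the same page; the page's own flag keeps it from
being counted twice.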

Of course, the most important part is that once all of that is
working, the VFS/MM layer would be perfect. World domination would be
achieved. For who would be caught using an OS with an imperfect VFS
layer? :)

Eric