From: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
To: Bill Huey <billh@gnuppy.monkey.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Andreas Dilger <adilger@clusterfs.com>,
Marat Buharov <marat.buharov@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Galbraith <efault@gmx.de>,
LKML <linux-kernel@vger.kernel.org>,
Jens Axboe <jens.axboe@oracle.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
Alex Tomas <alex@clusterfs.com>
Subject: Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)
Date: Sun, 29 Apr 2007 00:38:03 +0200 (CEST) [thread overview]
Message-ID: <Pine.LNX.4.64.0704290006330.12830@artax.karlin.mff.cuni.cz> (raw)
In-Reply-To: <20070428215756.GA13477@gnuppy.monkey.org>
> On Sat, Apr 28, 2007 at 07:37:17AM +0200, Mikulas Patocka wrote:
>> SpadFS doesn't write to unallocated parts like log filesystems (LFS) or
>> phase tree filesystems (TUX2); it writes inside normal used structures,
>> but it marks each structure with generation tags --- when it updates
>> global table of tags, it atomically makes several structures valid. I
>> don't know about this idea being used elsewhere.
>
> So how is this generation structure organized ? paper ?
Paper is in CITSA 2006 proceedings (but you likely don't have them and I
signed some statement that I can't post it elsewhere :-( )
Basicly the idea is this:
* you have array containing 65536 32-bit numbers --- crash count table ---
that array is on disk and in memory (see struct __spadfs->cct in my sources)
* you have 16-bit value --- crash count, that value is on disk and in memory
too (see struct __spadfs->cc)
* On mount, you load crash count table and crash count from disk to
memory. You increment carsh count on disk (but leave old in memory). You
increment one entry in crash count table - cct[cc] in memory, but leave
old on disk.
* On sync you write all metadata buffers, do write barrier, write one
sector of crash count table from memory to disk and do write
barrier again.
* On unmount, you sync and decrement crash count on disk.
--- so crash count counts crashes --- it is increased each time you mount
and don't unmount.
Consistency of structures:
* Each directory entry has two tags --- 32-bit transaction count (txc)
and 16-bit crash count(cc).
* You create directory entry with entry->txc = fs->txc[fs->cc] and
entry->cc = fs->cc
* Directory entry is considered valid if fs->txc[entry->cc] >= entry->txc
(see macro CC_VALID)
* If the directory entry is not valid, it is skipped during directory
scan, as if it wasn't there
--- so you create a directory entry and its valid. If the system crashes,
it will load crash count table from disk and there's one-less value than
entry->txc, so the entry will be invalid. It will also run with increased
cc, so it will never touch txc at an old index, so the entry will be valid
forever.
--- if you sync, you write crash count table to disk and directory entry
will be atomically made valid forever (because values in crash count table
never decrease)
In my implementation, the top bit of entry->txc is used to mark whether
the entry is scheduled for adding or delete, so that you can atomically
add one directory entry and delete other.
Space allocation bitmaps or lists are managed in such a way that there are
two copies and cc/txc pair determining which one is valid.
Files are extended in such a way that each file has two "size" entries and
cc/txc pair denoting which one is valid, so that you can atomically
extend/truncate file and mark its space allocated/freed in bitmaps or
lists (BTW. this cc/txc pair is the same one that denotes if the directory
entry is valid and another bit determines one of these two functions ---
to save space).
Mikulas
next prev parent reply other threads:[~2007-04-28 22:38 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1177660767.6567.41.camel@Homer.simpson.net>
2007-04-27 8:33 ` [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation) Andrew Morton
2007-04-27 9:23 ` Mike Galbraith
2007-04-27 10:17 ` Mike Galbraith
2007-04-27 11:59 ` Marat Buharov
2007-04-27 12:30 ` Peter Zijlstra
2007-04-27 13:50 ` Mark Lord
2007-04-27 12:39 ` Manoj Joseph
2007-04-27 15:30 ` Linus Torvalds
2007-04-27 19:31 ` Andreas Dilger
2007-04-27 19:44 ` Mike Galbraith
2007-04-27 19:50 ` Linus Torvalds
2007-04-27 20:05 ` Hua Zhong
2007-04-27 20:12 ` Bill Huey
2007-04-28 5:37 ` Mikulas Patocka
2007-04-28 5:45 ` Mikulas Patocka
2007-04-28 21:57 ` Bill Huey
2007-04-28 22:38 ` Mikulas Patocka [this message]
2007-04-27 20:29 ` Gabriel C
2007-04-27 20:54 ` Manoj Joseph
2007-04-28 8:45 ` Matthias Andree
2007-04-27 22:18 ` Andrew Morton
2007-05-03 17:38 ` Alex Tomas
2007-05-03 23:54 ` Andrew Morton
2007-05-04 6:18 ` Alex Tomas
2007-05-04 6:38 ` Andrew Morton
2007-05-04 6:57 ` Alex Tomas
2007-05-04 7:18 ` Andrew Morton
2007-05-04 7:39 ` Alex Tomas
2007-05-04 8:02 ` Andrew Morton
2007-08-16 18:20 ` Alex Tomas
2007-08-16 18:46 ` Andrew Morton
2007-08-17 2:24 ` Alex Tomas
2007-08-17 6:52 ` Andrew Morton
2007-08-17 8:36 ` Alex Tomas
2007-08-17 9:02 ` Andrew Morton
2007-08-17 18:42 ` Alex Tomas
2007-04-28 8:44 ` Matthias Andree
2007-04-28 20:46 ` Mikulas Patocka
2007-04-28 21:12 ` Lee Revell
2007-04-29 20:49 ` Mark Lord
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0704290006330.12830@artax.karlin.mff.cuni.cz \
--to=mikulas@artax.karlin.mff.cuni.cz \
--cc=adilger@clusterfs.com \
--cc=akpm@linux-foundation.org \
--cc=alex@clusterfs.com \
--cc=billh@gnuppy.monkey.org \
--cc=efault@gmx.de \
--cc=jens.axboe@oracle.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marat.buharov@gmail.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).