From: Theodore Tso <tytso@MIT.EDU>
To: Alexey Zaytsev <alexey.zaytsev@gmail.com>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Rik van Riel <riel@surriel.com>
Subject: Re: Mentor for a GSoC application wanted (Online ext2/3 filesystem checker)
Date: Mon, 21 Apr 2008 08:53:58 -0400
Message-ID: <20080421125358.GD9700@mit.edu>
In-Reply-To: <f19298770804201723v12b78da6w187984debf8ef97c@mail.gmail.com>
On Mon, Apr 21, 2008 at 04:23:42AM +0400, Alexey Zaytsev wrote:
> Not really. In my application I propose some changes to the fsck pass
> order to avoid the need to rerun it. And I don't get what dependency you
> are talking about. The only one I see is between the directory entries and
> the directory inode. Should not be hard to solve.
> (Or do I miss something? Could you give more examples maybe?)
And *this* is why I ultimately decided I didn't have the time to
mentor you. There are large numbers of other dependencies.
For example, between the direct and indirect blocks in the inode, and
the block allocation bitmaps. (Note that e2fsck keeps up to 3
different block bitmaps and 6 different inode bitmaps.)
You need to know which inodes are directories and which inodes are
regular files. E2fsck currently keeps these bitmaps so we don't have
to cache the entire 128-byte inode for all inodes. (Instead, we
cache a single bit for every single inode. There's a ***reason*** for
all of these bitmaps.)
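
To make the memory trade-off above concrete, here is a minimal sketch of a one-bit-per-inode bitmap. It is an illustration only, not e2fsck's actual code: the class name `InodeBitmap`, its methods, and the inode count of one million are all hypothetical. The 128-byte inode size is the figure given above, and inode 2 is the ext2/3 root directory.

```python
class InodeBitmap:
    """Hypothetical one-bit-per-inode map; inodes are numbered from 1."""

    def __init__(self, inode_count):
        self.inode_count = inode_count
        # Round up so that every inode gets a bit.
        self.bits = bytearray((inode_count + 7) // 8)

    def mark(self, ino):
        idx = ino - 1
        self.bits[idx >> 3] |= 1 << (idx & 7)

    def test(self, ino):
        idx = ino - 1
        return bool(self.bits[idx >> 3] & (1 << (idx & 7)))


INODE_SIZE = 128          # bytes per on-disk inode, as stated above
inode_count = 1_000_000   # assumed filesystem size, for illustration

dir_map = InodeBitmap(inode_count)
dir_map.mark(2)           # record that the root inode is a directory

bitmap_bytes = len(dir_map.bits)              # memory for one bitmap
full_cache_bytes = inode_count * INODE_SIZE   # memory to cache every inode
```

Under these assumptions a single bitmap needs 1/1024 of the memory of a full inode cache, so even several bitmaps held at once stay far cheaper than caching the inodes themselves.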
You also need to know which blocks are being used to store extended
attributes, which may potentially be shared across multiple inodes.
That's just *three* additional dependencies, and there are many more.
If you can't think of them, how much time would it take for me as
mentor to explain all of this to you?
> > In either case, there is still the issue of knowing exactly whether a
> > particular read happened before or after some change in the
> > filesystem. This race condition is a really hard one to deal with,
> > especially on a multiple CPU system and the filesystem checker is
> > running in userspace.
>
> I don't see why should fsck care about this. The notification is always sent
> after the write happened, so fsck should just re-read the data. No problem
> if it already read the (half-)updated version just before the notification.
Keep in mind that when a file gets deleted, a *large* number of
metadata blocks will potentially get updated. So while e2fsck is
handling these reads, a bunch more can start coming in from other
filesystem transactions, and since the kernel doesn't know what
userspace has already cached, it will have to send them again... and
again...
In fact, if the filesystem is being updated very quickly, the
notifications could easily overrun whatever buffers have been set up to
transfer this information from the kernel to userspace. Worse
yet, unless you also send down transaction boundaries, the userspace
won't know when the filesystem has reached a "stable state" which
would be internally consistent.
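
The two problems described above, buffer overrun and missing transaction boundaries, can be sketched with a toy model. Nothing here corresponds to a real kernel interface: `NotificationQueue`, `COMMIT`, and `stable_prefix` are hypothetical names, and the capacity of 4 is arbitrary.

```python
from collections import deque

COMMIT = ("commit", None)  # hypothetical transaction-boundary marker


class NotificationQueue:
    """Fixed-capacity queue standing in for a kernel-to-userspace buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overrun = False

    def post(self, event):
        # When the buffer is full, the event is lost and only a flag remains.
        if len(self.events) >= self.capacity:
            self.overrun = True
            return False
        self.events.append(event)
        return True

    def drain(self):
        events = list(self.events)
        self.events.clear()
        lost, self.overrun = self.overrun, False
        return events, lost


def stable_prefix(events):
    """Return the events up to and including the last commit marker."""
    last_commit = -1
    for i, ev in enumerate(events):
        if ev == COMMIT:
            last_commit = i
    return events[: last_commit + 1]


q = NotificationQueue(capacity=4)
for ev in [("block", 10), ("block", 11), COMMIT, ("block", 12), ("block", 13)]:
    q.post(ev)

events, lost = q.drain()
stable = stable_prefix(events)
```

In this run the fifth notification is dropped, so `lost` is true and the checker cannot trust anything it has cached. Even without the overrun, only the events up to the commit marker describe a consistent state; without such markers the checker could not tell where that point is.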
There are ways that this could be solved, but at the end of the day,
the $1,000,000 question is why not just do a kernel-side snapshot?
Then you don't have to completely rewrite e2fsck --- and given that
you've claimed the e2fsck code is "hard to understand", it seems
especially audacious that you would have thought you could do this in
3 months. If you really don't want to use LVM, you could have
proposed a snapshot solution which didn't involve devicemapper. It's
not clear it would have entered mainline, but at least there would
have been some non-zero chance that you would complete the project
successfully.
Regards,
- Ted