From: "Alexey Zaytsev" <alexey.zaytsev@gmail.com>
To: "Theodore Tso" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Rik van Riel" <riel@surriel.com>
Subject: Re: Mentor for a GSoC application wanted (Online ext2/3 filesystem checker)
Date: Mon, 21 Apr 2008 04:23:42 +0400
Message-ID: <f19298770804201723v12b78da6w187984debf8ef97c@mail.gmail.com>
In-Reply-To: <20080419185603.GA30449@mit.edu>
On Sat, Apr 19, 2008 at 10:56 PM, Theodore Tso <tytso@mit.edu> wrote:
> On Sat, Apr 19, 2008 at 01:44:51PM +0400, Alexey Zaytsev wrote:
> > If it is a block containing a metadata object fsck has already read,
> > then we already know what kind of object it is (there must be a way
> > to quickly find all cached objects derived from a given block), and
> > can update the cached version. And if fsck has not yet read the
> > block, it can just be ignored, no matter what kind of data it
> > contains. If it contains metadata and fsck is interested in it, it
> > will read it sooner or later anyway. If it contains file data, why
> > should fsck even care?
>
> The problem is that e2fsck makes calculations on the filesystem data
> read out from the disk and stores that in a highly compressed format.
> So it doesn't remember that block #12345 was an indirect block for
> inode #123, and that it contained data block numbers 17, 42, and 45.
> Instead it just marks blocks #12345, #17, #42, and #45 as in use, and
> then moves on.
>
> If you are going to store all of the cached objects then you will need
> to effectively store *all* of the filesystem metadata in memory at
> the same time. For a large filesystem, you won't have enough *room*
> in memory to store all of the cached objects. That's one of the reasons
> why e2fsck has a lot of very clever design so that summary information
> can be stored in a very compressed form in memory so that things can
> be fast (by avoiding re-reading objects from disk) as well as not
> requiring vast amounts of memory.
>
Yes, I agree this is a problem. Do you have any estimates of how
much RAM the current e2fsck uses in some test cases? I hope
my approach will not add much to this. The only big thing I see
is the data needed to associate each inode/dir entry with its parent
block. Probably one radix tree to enumerate the blocks and a
pointer added to the ext2_inode and ext2_dir_entry structures
to form a linked list of objects belonging to the same block.
I still have no idea how much RAM the whole thing would consume.
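
To make the idea above concrete, here is a minimal sketch in C of a
block-to-object index: a fixed-depth radix tree keyed by block number,
where each leaf slot heads a singly linked list of the cached objects
read from that block. Everything here is an assumption for illustration:
struct cached_obj stands in for the cached inode/dir entry structures,
and the names (radix_slot, index_add, index_lookup) and the 32-bit block
number with 8 bits per level are hypothetical, not e2fsck code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached inode or directory entry. */
enum obj_kind { OBJ_INODE, OBJ_DIRENT };

struct cached_obj {
    enum obj_kind kind;
    uint32_t id;                      /* illustrative identifier only */
    struct cached_obj *next_in_block; /* the one extra pointer per object */
};

#define RADIX_BITS   8
#define RADIX_FANOUT (1u << RADIX_BITS)
#define RADIX_LEVELS 4                /* 4 * 8 bits = 32-bit block number */

struct radix_node {
    void *slot[RADIX_FANOUT];         /* children, or list heads at the leaves */
};

static struct radix_node *radix_new(void)
{
    return calloc(1, sizeof(struct radix_node));
}

/* Find the leaf slot for blk; allocate missing interior nodes if create. */
static void **radix_slot(struct radix_node *root, uint32_t blk, int create)
{
    struct radix_node *n = root;

    for (int level = RADIX_LEVELS - 1; level > 0; level--) {
        unsigned idx = (blk >> (level * RADIX_BITS)) & (RADIX_FANOUT - 1);

        if (!n->slot[idx]) {
            if (!create)
                return NULL;
            n->slot[idx] = radix_new();
            if (!n->slot[idx])
                return NULL;          /* out of memory */
        }
        n = n->slot[idx];
    }
    return &n->slot[blk & (RADIX_FANOUT - 1)];
}

/* Link obj at the head of the list of objects derived from block blk. */
static int index_add(struct radix_node *root, uint32_t blk,
                     struct cached_obj *obj)
{
    void **slot;

    assert(obj != NULL);
    slot = radix_slot(root, blk, 1);
    if (!slot)
        return -1;
    obj->next_in_block = *slot;
    *slot = obj;
    return 0;
}

/* Return the first cached object derived from blk, or NULL if none. */
static struct cached_obj *index_lookup(struct radix_node *root, uint32_t blk)
{
    void **slot = radix_slot(root, blk, 0);

    return slot ? *slot : NULL;
}
```

In this sketch the per-object cost is one pointer; the cost of the tree
itself depends on how scattered the metadata blocks are, which is the
part I cannot estimate yet.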
> Even if you *do* store all of the cached objects, it still takes time
> to examine all of the objects and in the mean time, more changes will
> have come rolling in, and you will either need to add a huge amount of
> dependency to figure out what internal data structures need to be
> updated based on the changes in some of the cached objects --- or you
> will end up restarting the e2fsck checking process from scratch.
>
Not really. In my application I propose some changes to the fsck pass
order to avoid the need to rerun it. And I don't get what dependency you
are talking about. The only one I see is between the directory entries and
the directory inode. That should not be hard to solve.
(Or am I missing something? Could you give more examples, maybe?)
> In either case, there is still the issue of knowing exactly whether a
> particular read happened before or after some change in the
> filesystem. This race condition is a really hard one to deal with,
> especially on a multiple CPU system and the filesystem checker is
> running in userspace.
I don't see why fsck should care about this. The notification is always sent
after the write has happened, so fsck should just re-read the data. No problem
if it already read the (half-)updated version just before the notification.

Btw, how about an even simpler method: just watch the journal commits
(changes to jbd needed). This way we can get all the actual metadata updates,
without being flooded by the file data updates.
>
> > But you are probably right, this project may be not doable in just three
> > months. The changes on the kernel side probably are, but there is a
> > huge e2fsck work.
>
> Yes, that is the concern. And without implementing the user-space
> side, you'll never be sure whether you completely got the kernel side
> changes right!
>
> Regards,
>
> - Ted
>