linux-fsdevel.vger.kernel.org archive mirror
From: "Alexey Zaytsev" <alexey.zaytsev@gmail.com>
To: "Theodore Tso" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Rik van Riel" <riel@surriel.com>
Subject: Re: Mentor for a GSoC application wanted (Online ext2/3 filesystem checker)
Date: Mon, 21 Apr 2008 04:23:42 +0400
Message-ID: <f19298770804201723v12b78da6w187984debf8ef97c@mail.gmail.com>
In-Reply-To: <20080419185603.GA30449@mit.edu>

On Sat, Apr 19, 2008 at 10:56 PM, Theodore Tso <tytso@mit.edu> wrote:
> On Sat, Apr 19, 2008 at 01:44:51PM +0400, Alexey Zaytsev wrote:
>  > If it is a block containing a metadata object fsck has already read,
>  > then we already know what kind of object it is (there must be a way
>  > to quickly find all cached objects derived from a given block), and
>  > can update the cached version. And if fsck has not yet read the
>  > block, it can just be ignored, no matter what kind of data it
>  > contains. If it contains metadata and fsck is interested in it, it
>  > will read it sooner or later anyway. If it contains file data, why
>  > should fsck even care?
>
>  The problem is that e2fsck makes calculations on the filesystem data
>  read out from the disk and stores that in a highly compressed format.
>  So it doesn't remember that block #12345 was an indirect block for
>  inode #123, and that it contained data block numbers 17, 42, and 45.
>  Instead it just marks blocks #12345, #17, #42, and #45 as in use, and
>  then moves on.
>
>  If you are going to store all of the cached objects then you will need
>  to effectively store *all* of the filesystem metadata in memory at
>  the same time.  For a large filesystem, you won't have enough *room*
>  in memory to store all of the cached objects.  That's one of the reasons
>  why e2fsck has a lot of very clever design so that summary information
>  can be stored in a very compressed form in memory so that things can
>  be fast (by avoiding re-reading objects from disk) as well as not
>  requiring vast amounts of memory.
>

Yes, I agree this is a problem. Do you have any estimates of how
much RAM the current e2fsck uses in some test cases? I hope
my approach will not add much to this. The only big addition I see
is the data needed to associate each inode/dir entry with its parent
block: probably one radix tree to enumerate the blocks, plus a
pointer added to the ext2_inode and ext2_dir_entry structures
to form a linked list of objects belonging to the same block.
I still have no idea how much RAM the whole thing would consume.
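
To make the bookkeeping concrete, here is a rough sketch of the
block-to-objects association described above. It is only an
illustration: every name in it (block_index, cached_obj, index_add,
index_count) is made up, a small chained hash table stands in for the
radix tree, and the objects are standalone stubs rather than the real
ext2_inode and ext2_dir_entry structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024   /* arbitrary size for this sketch */

enum obj_kind { OBJ_INODE, OBJ_DIRENT };

/* Stub for a cached metadata object. next_in_block is the one extra
 * pointer that chains together all objects read from the same block. */
struct cached_obj {
    enum obj_kind kind;
    uint32_t id;                        /* inode number or entry offset */
    struct cached_obj *next_in_block;
};

/* One node per on-disk block that has at least one cached object. */
struct block_node {
    uint64_t blk;
    struct cached_obj *objs;            /* head of the per-block list */
    size_t nobjs;
    struct block_node *next;            /* hash-bucket chain */
};

/* Hypothetical index; a hash table replaces the radix tree here. */
struct block_index {
    struct block_node *buckets[NBUCKETS];
};

struct block_node *index_lookup(struct block_index *ix, uint64_t blk)
{
    struct block_node *n;

    assert(ix != NULL);
    n = ix->buckets[blk % NBUCKETS];
    while (n != NULL && n->blk != blk)
        n = n->next;
    return n;
}

/* Record that an object of the given kind was derived from block blk.
 * Returns 0 on success, -1 if memory allocation fails. */
int index_add(struct block_index *ix, uint64_t blk,
              enum obj_kind kind, uint32_t id)
{
    struct block_node *n = index_lookup(ix, blk);
    struct cached_obj *o;

    if (n == NULL) {
        n = calloc(1, sizeof(*n));
        if (n == NULL)
            return -1;
        n->blk = blk;
        n->next = ix->buckets[blk % NBUCKETS];
        ix->buckets[blk % NBUCKETS] = n;
    }
    o = malloc(sizeof(*o));
    if (o == NULL)
        return -1;
    o->kind = kind;
    o->id = id;
    o->next_in_block = n->objs;         /* push onto the per-block list */
    n->objs = o;
    n->nobjs++;
    return 0;
}

/* Number of cached objects derived from blk (0 if never read). */
size_t index_count(struct block_index *ix, uint64_t blk)
{
    struct block_node *n = index_lookup(ix, blk);

    return n != NULL ? n->nobjs : 0;
}
```

The per-object cost in this sketch is the single next_in_block pointer,
plus one block_node per block that fsck has actually read. The sketch
never frees anything and says nothing about the total memory a real
checker would need.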

>  Even if you *do* store all of the cached objects, it still takes time
>  to examine all of the objects and in the mean time, more changes will
>  have come rolling in, and you will either need to add a huge amount of
>  dependency to figure out what internal data structures need to be
>  updated based on the changes in some of the cached objects --- or you
>  will end up restarting the e2fsck checking process from scratch.
>

Not really. In my application I propose some changes to the fsck pass
order to avoid the need to rerun it. And I don't see which dependencies
you are talking about. The only one I see is between the directory entries
and the directory inode, which should not be hard to solve.
(Or am I missing something? Could you maybe give more examples?)

>  In either case, there is still the issue of knowing exactly whether a
>  particular read happened before or after some change in the
>  filesystem.  This race condition is a really hard one to deal with,
>  especially on a multiple CPU system and the filesystem checker is
>  running in userspace.

I don't see why fsck should care about this. The notification is always sent
after the write has happened, so fsck should just re-read the data. It is no
problem if it already read the (half-)updated version just before the
notification.
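
The rule I have in mind can be sketched as follows. This is a toy,
single-threaded model, not real e2fsck or kernel code: the in-memory
"disk" array, the cache, and the names checker_read and
on_block_written are all invented for illustration, and the sketch
simply assumes that every notification arrives after its write has
completed. It does not model the multi-CPU ordering issue at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* toy block size */
#define NBLOCKS    8    /* toy device size */

/* Stand-in for the block device. */
unsigned char disk[NBLOCKS][BLOCK_SIZE];

/* The checker's private copy of the blocks it has read so far. */
struct cache_slot {
    int valid;                          /* has the checker read it? */
    unsigned char data[BLOCK_SIZE];
};
struct cache_slot cache[NBLOCKS];

/* The checker reads (or re-reads) a block into its cache. */
void checker_read(uint32_t blk)
{
    assert(blk < NBLOCKS);
    memcpy(cache[blk].data, disk[blk], BLOCK_SIZE);
    cache[blk].valid = 1;
}

/* Hypothetical notification handler, called after a write to blk.
 * A block the checker never read is ignored; a block it did read is
 * re-read, so a half-updated copy picked up just before the
 * notification gets replaced. Returns 1 if re-read, 0 if ignored. */
int on_block_written(uint32_t blk)
{
    assert(blk < NBLOCKS);
    if (!cache[blk].valid)
        return 0;
    checker_read(blk);
    return 1;
}
```

For example, if the checker has read block 3 and block 3 is then
rewritten, on_block_written(3) refreshes the cached copy, while a write
to a block the checker has not yet reached changes nothing in the cache.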

Btw, how about an even simpler method: just watch the journal commits
(changes to jbd needed). This way we can get all actual metadata updates,
without being flooded by the file data updates.

>
>  > But you are probably right, this project may be not doable in just three
>  > months. The changes on the kernel side probably are, but there is a
>  > huge e2fsck work.
>
>  Yes, that is the concern.  And without implementing the user-space
>  side, you'll never be sure whether you completely got the kernel side
>  changes right!
>
>  Regards,
>
>                                                 - Ted
>
