From: David Woodhouse <dwmw2@infradead.org>
To: dedekind@infradead.org
Cc: linux-mtd@lists.infradead.org
Subject: Re: Duplication of dirent names in JFFS2 summary
Date: Fri, 19 May 2006 15:50:03 +0100
Message-ID: <1148050203.3875.167.camel@pmac.infradead.org>
In-Reply-To: <1148048246.3199.34.camel@sauron.oktetlabs.ru>
On Fri, 2006-05-19 at 18:17 +0400, Artem B. Bityutskiy wrote:
> Let's talk not in 'summary' context. Suppose there is no summary. I'm
> talking about how to avoid reading names (and CRC-checking) in this
> case.
OK.
> In the current implementation, we have to read names. At the source
> code level, that is because we need nhash in jffs2_add_fd_to_list().
We could use the name_crc for that instead. We don't actually need the
name itself. It just has to be a hash of the name -- the Linux nhash
just happened to be a good, cheap one.
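As a rough illustration of that point, here is a minimal sketch in C of a hash-ordered list insert keyed on name_crc rather than nhash. The struct fd_node type and fd_list_insert() are hypothetical, simplified stand-ins for illustration, not the actual JFFS2 structures or the real jffs2_add_fd_to_list():

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified list node; not the real JFFS2 dirent structure. */
struct fd_node {
    struct fd_node *next;
    uint32_t hash;  /* assumption: name_crc taken from the node header */
    uint32_t ino;   /* target inode number */
};

/*
 * Insert new_fd so the list stays sorted by ascending hash.
 * Only the hash is consulted; the name itself is never touched.
 */
static void fd_list_insert(struct fd_node **list, struct fd_node *new_fd)
{
    struct fd_node **prev = list;

    /* Walk past every node whose hash is strictly smaller. */
    while (*prev && (*prev)->hash < new_fd->hash)
        prev = &(*prev)->next;

    new_fd->next = *prev;
    *prev = new_fd;
}
```

Any value that is a stable function of the name would order the list equally well; the sketch only assumes that the CRC is already available without reading the name.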
> But I'll repeat my statement. The only fundamental reason why we need
> names at the scanning time is to provide correct nlink of inodes. This
> is because of the way JFFS2 unlinks: it writes a dirent with the same
> name but with target ino=0.
We still only need the name _if_ the crc32 and length match a previous
dirent in the same directory. For ~99% of dirents, we _don't_ need the
name at scan time at all.
We only actually need to look at the name in the case where there are
two or more dirents which have the same pino, crc32 (or nhash) and
length.
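The condition above can be sketched in C as follows. The struct scan_dirent record and needs_name_read() are hypothetical names invented for this illustration; only the fields pino, name_crc and the name length correspond to things mentioned in this thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scan-time record holding only header fields, no name. */
struct scan_dirent {
    uint32_t pino;      /* parent directory inode number */
    uint32_t name_crc;  /* CRC32 of the name */
    uint8_t  nsize;     /* name length in bytes */
    uint32_t version;   /* dirent version */
    uint32_t ino;       /* target inode; 0 marks an unlink */
};

/*
 * Returns true only when two dirents could refer to the same name,
 * i.e. when parent, CRC and length all match. Only in that case
 * would the names have to be fetched from flash and compared.
 */
static bool needs_name_read(const struct scan_dirent *a,
                            const struct scan_dirent *b)
{
    return a->pino == b->pino &&
           a->name_crc == b->name_crc &&
           a->nsize == b->nsize;
}
```

For any pair of dirents where this returns false, the names cannot be equal, so the scan can proceed without them.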
Let us assume that genuine hash collisions are rare enough to be
ignored, statistically. So on NOR flash without summary, where we
actually mark the old nodes obsolete, you'll almost _never_ actually
have to use the real name in jffs2_add_fd_to_list(). You might as well
not read it during the scan.
On NAND flash, or with summary, we don't mark old nodes obsolete. So
it'll happen a bit more often -- specifically, in two cases:
First, there's the case where we've garbage-collected a dirent node but
haven't _yet_ deleted the original. But deleting the original is the
whole _point_ of GC, so that's not a common case.
Second, there's the case where we've unlinked a dirent node but the
original hasn't yet been actually erased. That may be a _little_ more
common, but I don't think it'll be _so_ common.
So I still think that the benefit we get from dropping the full name
from all the rest of the dirents in the summary (or just not reading it
at scan time, in the !SUMMARY version) is going to be _more_ than the
slowdown caused by occasionally having to go back to the flash and read
the name when there's a collision.
If the time taken by reading names later really _is_ significant, we
could perhaps reduce that further by including "obsoleted version
number" in the summary for dirent nodes.
> So, if we'd not read names at the scanning time at all, we'd have no
> possibility to calculate correct nlinks.
Not so. We could calculate correct nlinks in the majority of cases. We
really don't need the full name very often.
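A minimal sketch in C of how nlink might be counted from header fields alone, under the assumption stated earlier that genuine hash collisions are rare enough to ignore. The struct scan_dirent, same_key() and count_nlink() names are hypothetical and the quadratic loop is for clarity only; this is not the actual JFFS2 scan code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scan-time record holding only header fields, no name. */
struct scan_dirent {
    uint32_t pino;      /* parent directory inode number */
    uint32_t name_crc;  /* CRC32 of the name */
    uint8_t  nsize;     /* name length in bytes */
    uint32_t version;   /* dirent version */
    uint32_t ino;       /* target inode; 0 marks an unlink */
};

/* Assumption: equal (pino, name_crc, nsize) is treated as "same name". */
static bool same_key(const struct scan_dirent *a, const struct scan_dirent *b)
{
    return a->pino == b->pino && a->name_crc == b->name_crc &&
           a->nsize == b->nsize;
}

/*
 * Count the dirents that point at target_ino (nonzero) and have not been
 * superseded by a newer dirent with the same key, such as an unlink
 * dirent with ino=0.
 */
static unsigned count_nlink(const struct scan_dirent *d, size_t n,
                            uint32_t target_ino)
{
    unsigned nlink = 0;

    for (size_t i = 0; i < n; i++) {
        bool superseded = false;

        if (d[i].ino != target_ino)
            continue;
        for (size_t j = 0; j < n; j++) {
            if (j != i && same_key(&d[i], &d[j]) &&
                d[j].version > d[i].version) {
                superseded = true;
                break;
            }
        }
        if (!superseded)
            nlink++;
    }
    return nlink;
}
```

In the rare case where same_key() matches but the names actually differ, this count would be too low, which is exactly when the names would have to be read from flash.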
> They'd be greater than or equal to the correct values. And some dead
> inodes would not go away at scan time. And my idea: so what? Nothing
> really bad in this. We'll adjust them later.
But nlink is visible to userspace. We'd be giving incorrect values to
userspace until we've finished the scan.
--
dwmw2