From: Rogier Wolff <R.E.Wolff@BitWizard.nl>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: Christian Brandt <brandtc@psi5.com>, linux-ext4@vger.kernel.org
Subject: Re: fsck.ext4 taking months
Date: Tue, 29 Mar 2011 08:03:00 +0200
Message-ID: <20110329060300.GA27142@bitwizard.nl>
In-Reply-To: <4D909E92.4080209@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 1979 bytes --]
On Mon, Mar 28, 2011 at 10:43:30AM -0400, Ric Wheeler wrote:
> On 03/27/2011 07:28 AM, Christian Brandt wrote:
> >Situation: External 500GB drive holds lots of snapshots using lots of
> >hard links made by rsync --link-dest. The controller went bad and
> >destroyed the superblock and directory structures. The drive contains
> >roughly a million files and four complete directory-tree snapshots,
> >each with roughly a million hard links.
> >
> >I tried:
> >
> >e2fsck 1.41.12 (17-May-2010)
> > Using EXT2FS Library version 1.41.12, 17-May-2010
> >
> >e2fsck 1.41.11 (14-Mar-2010)
> > Using EXT2FS Library version 1.41.11, 14-Mar-2010
> >
> >Symptoms: fsck.ext4 -y -f takes nearly a month to fix the structures on
> >a P4 @ 2.8 GHz, with very little disk I/O and 100% CPU use.
> >
> >The output of fsck looks much like this:
> >
> >File ??? (Inode #123456, mod time Wed Jul 22 16:20:23 2009)
> > has 6144 multiply-claimed block(s), shared with 4 file(s):
> > <filesystem metadata>
> > ??? (Inode #123457, mod time Wed Jul 22 16:20:23 2009)
> > ??? (Inode #123458, mod time Wed Jul 22 16:20:23 2009)
> > ...
> >multiply claimed block map? Yes
> >
> >Is there an ad hoc way of getting my data back faster?
> >
> >Is the slow performance with lots of hard links a known issue?
Yes, it is a known issue.
You get to test my patch. :-)
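I strongly suspect that (just like me) at some point in the past you
saw e2fsck run out of memory and were advised to enable the on-disk
databases. Those tdb databases are created with tdb's default hash
size of 131 buckets, far too few for millions of inodes, so lookups
degrade into long hash-chain walks. The attached patch enlarges the
hash tables (and, for the inode counts, replaces the hash function).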
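To see why an undersized hash table costs this much CPU, here is a minimal sketch in C. It assumes tdb-style separate chaining with bucket = hash % hash_size, uses consecutive inode numbers as keys, and swaps in the MurmurHash3 32-bit finalizer as a stand-in for a well-distributing hash. `avg_probes` and `mix32` are hypothetical helpers, not part of e2fsprogs or tdb. The sketch counts how many records a successful lookup examines on average with 131 buckets versus one bucket per entry.

```c
#include <stdint.h>
#include <stdlib.h>

/* MurmurHash3's 32-bit finalizer: a stand-in for a well-distributing
 * hash (assumption; this is neither tdb's hash nor my_tdb_hash). */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/*
 * Hypothetical helper: store inode numbers 1..nkeys in a chained hash
 * table with nbuckets buckets (bucket = hash % nbuckets) and return the
 * average number of records a successful lookup has to examine.
 * Returns -1.0 on bad input or allocation failure.
 */
static double avg_probes(uint32_t nkeys, uint32_t nbuckets)
{
	uint32_t *chain_len;
	double total = 0.0;

	if (nkeys == 0 || nbuckets == 0)
		return -1.0;
	chain_len = calloc(nbuckets, sizeof(*chain_len));
	if (chain_len == NULL)
		return -1.0;

	for (uint32_t i = 0; i < nkeys; i++) {
		uint32_t ino = i + 1;
		uint32_t bucket = mix32(ino) % nbuckets;

		/* The k-th record in a chain costs k comparisons to find
		 * (whether new records are prepended or appended, the sum
		 * is the same), so adding the chain length after each
		 * insert accumulates the total cost of finding every record. */
		chain_len[bucket]++;
		total += chain_len[bucket];
	}
	free(chain_len);
	return total / nkeys;
}
```

With a million entries, avg_probes(1000000, 131) should come out around 3,800 records per lookup, while avg_probes(1000000, 1000000) stays near 1.5: roughly 2,500 times more work for every inode-count lookup when the table keeps the default size.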
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
[-- Attachment #2: tdb_init_fix.diff --]
[-- Type: text/x-diff, Size: 2610 bytes --]
diff --git a/e2fsck/dirinfo.c b/e2fsck/dirinfo.c
index 901235c..9b29f23 100644
--- a/e2fsck/dirinfo.c
+++ b/e2fsck/dirinfo.c
@@ -62,7 +62,7 @@ static void setup_tdb(e2fsck_t ctx, ext2_ino_t num_dirs)
uuid_unparse(ctx->fs->super->s_uuid, uuid);
sprintf(db->tdb_fn, "%s/%s-dirinfo-XXXXXX", tdb_dir, uuid);
fd = mkstemp(db->tdb_fn);
- db->tdb = tdb_open(db->tdb_fn, 0, TDB_CLEAR_IF_FIRST,
+ db->tdb = tdb_open(db->tdb_fn, 999931, TDB_NOLOCK | TDB_NOSYNC,
O_RDWR | O_CREAT | O_TRUNC, 0600);
close(fd);
}
diff --git a/lib/ext2fs/icount.c b/lib/ext2fs/icount.c
index bec0f5f..bdd5b26 100644
--- a/lib/ext2fs/icount.c
+++ b/lib/ext2fs/icount.c
@@ -173,6 +173,19 @@ static void uuid_unparse(void *uu, char *out)
uuid.node[3], uuid.node[4], uuid.node[5]);
}
+static unsigned int my_tdb_hash(TDB_DATA *key)
+{
+ unsigned int value; /* Used to compute the hash value. */
+ int i; /* Index into the key bytes. */
+
+ /* initial value 0 is as good as any one. */
+ for (value = 0, i=0; i < key->dsize; i++)
+ value = value * 256 + key->dptr[i] + (value >> 24) * 241;
+
+ return value;
+}
+
+
errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
int flags, ext2_icount_t *ret)
{
@@ -180,6 +193,7 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
errcode_t retval;
char *fn, uuid[40];
int fd;
+ int hash_size;
retval = alloc_icount(fs, flags, &icount);
if (retval)
@@ -192,9 +206,20 @@ errcode_t ext2fs_create_icount_tdb(ext2_filsys fs, char *tdb_dir,
sprintf(fn, "%s/%s-icount-XXXXXX", tdb_dir, uuid);
fd = mkstemp(fn);
+ /*
+ hash_size should be of the same order as the number of entries actually
+ used. The tdb default used to be 131, which gives us a big performance
+ penalty with realistic numbers of inodes. We now trust the superblock.
+ If it's wrong, don't worry: tdb will manage, it will just cost a little
+ more CPU time.
+ If the hash function is good and distributes the values uniformly across
+ the 32-bit output space, it doesn't really matter that we didn't choose a
+ prime. The default tdb hash function is pretty worthless. Someone didn't
+ read Knuth. */
+ hash_size = fs->super->s_inodes_count - fs->super->s_free_inodes_count;
icount->tdb_fn = fn;
- icount->tdb = tdb_open(fn, 0, TDB_CLEAR_IF_FIRST,
- O_RDWR | O_CREAT | O_TRUNC, 0600);
+ icount->tdb = tdb_open_ex(fn, hash_size, TDB_NOLOCK | TDB_NOSYNC,
+ O_RDWR | O_CREAT | O_TRUNC, 0600, NULL, my_tdb_hash);
if (icount->tdb) {
close(fd);
*ret = icount;
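The patch comment's remark about primes can be checked with a small experiment. This sketch uses hypothetical helpers: the identity function plays a deliberately poor hash, and the MurmurHash3 32-bit finalizer again stands in for a good one. It reduces keys with a regular stride modulo the table size, as a bucket = hash % hash_size scheme would, and counts how many buckets end up in use.

```c
#include <stdint.h>
#include <stdlib.h>

/* A deliberately poor "hash" (hypothetical): the key itself. */
static uint32_t identity_hash(uint32_t key)
{
	return key;
}

/* MurmurHash3's 32-bit finalizer, standing in for a good hash. */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/*
 * Hypothetical helper: map nkeys keys spaced `stride` apart
 * (0, stride, 2*stride, ...) to buckets with hash(key) % nbuckets and
 * count how many distinct buckets are used. Returns 0 on bad input.
 */
static uint32_t occupied_buckets(uint32_t (*hash)(uint32_t), uint32_t nkeys,
				 uint32_t stride, uint32_t nbuckets)
{
	unsigned char *used;
	uint32_t count = 0;

	if (nbuckets == 0)
		return 0;
	used = calloc(nbuckets, 1);
	if (used == NULL)
		return 0;

	for (uint32_t i = 0; i < nkeys; i++) {
		uint32_t bucket = hash(i * stride) % nbuckets;

		if (!used[bucket]) {
			used[bucket] = 1;
			count++;
		}
	}
	free(used);
	return count;
}
```

With 10,000 keys 0, 4, 8, ..., the identity hash fills only 250 of 1,000 buckets but all 997 buckets of a prime-sized table, while the mixed hash fills essentially all 1,000 buckets either way. A prime table size only rescues a weak hash; with a well-mixing hash, any reasonably sized table works.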