From: Chris Mason <chris.mason@fusionio.com>
To: Zach Brown <zab@zabbo.net>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: getdents spinning on 0x7fffffff
Date: Mon, 17 Dec 2012 18:28:40 -0500
Message-ID: <20121217232840.GB20954@shiny>
In-Reply-To: <20121217230907.GI9195@lenny.home.zabbo.net>

On Mon, Dec 17, 2012 at 04:09:07PM -0700, Zach Brown wrote:
> I was flipping through the code recently and noticed that we still have
> the double whammy of allocating dir entry positions with
> parent_dir->counter++ and that weird setting of f_pos to 2^31-1.
>
> So after enough creates (and deletes :)) in a directory we end up with
> an entry item whose key is past that value. f_pos gets rewound instead
> of being set to that magical EOF. readdir() gets stuck returning the
> entries after INT_MAX over and over (just one in this strace):
>
> getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
> getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
>
> It took around 10 hours on a workstationy box over here to reproduce
> this with createmany.c from the lustre tests ("./createmany -m f- -u f-
> 0x8000000" mknod()s and unlink()s 2^31 files), but that's tedious. It's
> easier to force initialization of index_cnt in the kernel to test
> things.
>
> 1) The fundamental fix is to re-use deleted entry positions. Do we add
> another cache to index unlinked positions? Do we add an unreliable
> best-effort walk of the tree looking for holes in the key space? At the
> very least test index_cnt in unlink to get the basically useless
> index_cnt--? :)

The index is dense enough that we can search for free spots without too
much pain. But, more below.
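As a rough illustration of what such a search could look like, here is a
userspace sketch. It is not btrfs code: find_free_index() is a hypothetical
helper, and the sorted array is only a stand-in for walking the directory's
index keys in the tree.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a btrfs function.
 *
 * 'used' holds the index values currently in use, sorted ascending.
 * Returns the lowest unused index >= first. UINT64_MAX is reserved to
 * mean "no free slot found" (an assumption of this sketch).
 */
static uint64_t find_free_index(const uint64_t *used, size_t n, uint64_t first)
{
    uint64_t candidate = first;

    for (size_t i = 0; i < n; i++) {
        if (used[i] < candidate)
            continue;           /* below the range we care about */
        if (used[i] > candidate)
            break;              /* hole found before this entry */
        if (candidate == UINT64_MAX)
            return UINT64_MAX;  /* ran out of key space */
        candidate++;            /* slot taken, try the next one */
    }
    return candidate;
}
```

With used = {2, 3, 4, 6, 7} and first = 2 this returns 5, the first hole.
The cost is linear in the number of entries scanned, which is only
tolerable if the index really is dense.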
>
> 2) Regardless of that, we have to deal with existing entry items with
> giant keys. If for no other reason than big jerks making corrupt images
> and leaving them on usb keys in Josef's driveway. Should we drop the
> silly INT_MAX setting for 64bit callers and return -EOVERFLOW for 32bit
> callers? (That'd be gross, but not unheard of. ext4 has grown htree
> behaviour that depends on compat detection: see its is_32bit_api()
> callers.)
>
> I can make up some fixes but I'd love to hear strong opinions first, if
> anyone's got 'em :).

If we go past the 32-bit number we can use the hash offsets in readdir,
and just flag the directory as hashme-in-readdir.
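A minimal sketch of that idea, under loud assumptions: struct dir_state,
alloc_dir_index() and readdir_offset() are made-up names for illustration,
not existing btrfs structures or functions, and the sketch ignores hash
collisions and whether the hash itself fits in a 32-bit offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2^31 - 1, the value from the subject line that readdir uses as EOF. */
#define DIR_POS_LIMIT 0x7fffffffULL

/* Hypothetical per-directory state, not a btrfs structure. */
struct dir_state {
    uint64_t next_index;   /* next sequence position to hand out */
    bool hash_readdir;     /* the "hashme-in-readdir" flag */
};

/*
 * Hand out the next entry position. Once a position reaches the EOF
 * marker (treated here as unusable, which is an assumption), flag the
 * directory so readdir stops relying on sequence positions.
 */
static uint64_t alloc_dir_index(struct dir_state *dir)
{
    uint64_t idx = dir->next_index++;

    if (idx >= DIR_POS_LIMIT)
        dir->hash_readdir = true;
    return idx;
}

/* Choose the offset readdir reports for an entry. */
static uint64_t readdir_offset(const struct dir_state *dir,
                               uint64_t index, uint64_t name_hash)
{
    return dir->hash_readdir ? name_hash : index;
}
```

The flag is sticky in this sketch: once set, every entry in the directory
is reported by hash, including the ones whose index is still small.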
-chris