From: Chris Mason <chris.mason@fusionio.com>
To: Zach Brown <zab@zabbo.net>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: getdents spinning on 0x7fffffff
Date: Mon, 17 Dec 2012 18:28:40 -0500 [thread overview]
Message-ID: <20121217232840.GB20954@shiny> (raw)
In-Reply-To: <20121217230907.GI9195@lenny.home.zabbo.net>
On Mon, Dec 17, 2012 at 04:09:07PM -0700, Zach Brown wrote:
> I was flipping through the code recently and noticed that we still have
> the double whammy of allocating dir entry positions with
> parent_dir->counter++ and that weird setting of f_pos to 2^31-1.
>
> So after enough creates (and deletes :)) in a directory we end up with
> an entry item whose key is past that value. f_pos gets rewound instead
> of being set to that magical EOF. readdir() gets stuck returning the
> entries after INT_MAX over and over (just one in this strace):
>
> getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
> getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
>
> It took around 10 hours on a workstationy box over here to reproduce
> this with createmany.c from the lustre tests ("./createmany -m f- -u f-
> 0x8000000" mknod()s and unlink()s 2^31 files), but that's tedious. It's
> easier to force initialization of index_cnt in the kernel to test
> things.
>
> 1) The fundamental fix is to re-use deleted entry positions. Do we add
> another cache to index unlinked positions? Do we add an unreliable
> best-effort walk of the tree looking for holes in the key space? At the
> very least test index_cnt in unlink to get the basically useless
> index_cnt--? :)
The index is dense enough that we can search for free spots without too
much pain. But, more below.
>
> 2) Regardless of that, we have to deal with existing entry items with
> giant keys. If for no other reason than big jerks making corrupt images
> and leaving them on usb keys in Josef's driveway. Should we drop the
> silly INT_MAX setting for 64bit callers and return -EOVERFLOW for 32bit
> callers? (That'd be gross, but not unheard of. ext4 has grown htree
> behaviour that depends on compat detection: see its is_32bit_api()
> callers.)
>
> I can make up some fixes but I'd love to hear strong opinions first, if
> anyone's got 'em :).
If we go past the 32 bit number we can use the hash offsets in readdir,
and just flag the directory as hashme-in-readdir
-chris
next prev parent reply other threads:[~2012-12-17 23:28 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-17 23:09 getdents spinning on 0x7fffffff Zach Brown
2012-12-17 23:28 ` Chris Mason [this message]
2012-12-17 23:50 ` Zach Brown
2012-12-18 0:06 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121217232840.GB20954@shiny \
--to=chris.mason@fusionio.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=zab@zabbo.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).