* getdents spinning on 0x7fffffff
@ 2012-12-17 23:09 Zach Brown
2012-12-17 23:28 ` Chris Mason
0 siblings, 1 reply; 4+ messages in thread
From: Zach Brown @ 2012-12-17 23:09 UTC (permalink / raw)
To: linux-btrfs
I was flipping through the code recently and noticed that we still have
the double whammy of allocating dir entry positions with
parent_dir->counter++ and that weird setting of f_pos to 2^31-1.
So after enough creates (and deletes :)) in a directory we end up with
an entry item whose key is past that value. f_pos gets rewound instead
of being set to that magical EOF. readdir() gets stuck returning the
entries after INT_MAX over and over (just one in this strace):
getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
It took around 10 hours on a workstationy box over here to reproduce
this with createmany.c from the lustre tests ("./createmany -m f- -u f-
0x8000000" mknod()s and unlink()s 2^31 files), but that's tedious. It's
easier to force initialization of index_cnt in the kernel to test
things.
1) The fundamental fix is to re-use deleted entry positions. Do we add
another cache to index unlinked positions? Do we add an unreliable
best-effort walk of the tree looking for holes in the key space? At the
very least test index_cnt in unlink to get the basically useless
index_cnt--? :)
2) Regardless of that, we have to deal with existing entry items with
giant keys. If for no other reason than big jerks making corrupt images
and leaving them on usb keys in Josef's driveway. Should we drop the
silly INT_MAX setting for 64bit callers and return -EOVERFLOW for 32bit
callers? (That'd be gross, but not unheard of. ext4 has grown htree
behaviour that depends on compat detection: see its is_32bit_api()
callers.)
I can make up some fixes but I'd love to hear strong opinions first, if
anyone's got 'em :).
- z
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: getdents spinning on 0x7fffffff
2012-12-17 23:09 getdents spinning on 0x7fffffff Zach Brown
@ 2012-12-17 23:28 ` Chris Mason
2012-12-17 23:50 ` Zach Brown
0 siblings, 1 reply; 4+ messages in thread
From: Chris Mason @ 2012-12-17 23:28 UTC (permalink / raw)
To: Zach Brown; +Cc: linux-btrfs@vger.kernel.org
On Mon, Dec 17, 2012 at 04:09:07PM -0700, Zach Brown wrote:
> I was flipping through the code recently and noticed that we still have
> the double whammy of allocating dir entry positions with
> parent_dir->counter++ and that weird setting of f_pos to 2^31-1.
>
> So after enough creates (and deletes :)) in a directory we end up with
> an entry item whose key is past that value. f_pos gets rewound instead
> of being set to that magical EOF. readdir() gets stuck returning the
> entries after INT_MAX over and over (just one in this strace):
>
> getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
> getdents(3, {{d_ino=257, d_off=2147483647, d_reclen=32, d_name="file-54"}}, 32768) = 32
>
> It took around 10 hours on a workstationy box over here to reproduce
> this with createmany.c from the lustre tests ("./createmany -m f- -u f-
> 0x8000000" mknod()s and unlink()s 2^31 files), but that's tedious. It's
> easier to force initialization of index_cnt in the kernel to test
> things.
>
> 1) The fundamental fix is to re-use deleted entry positions. Do we add
> another cache to index unlinked positions? Do we add an unreliable
> best-effort walk of the tree looking for holes in the key space? At the
> very least test index_cnt in unlink to get the basically useless
> index_cnt--? :)
The index is dense enough that we can search for free spots without too
much pain. But, more below.
>
> 2) Regardless of that, we have to deal with existing entry items with
> giant keys. If for no other reason than big jerks making corrupt images
> and leaving them on usb keys in Josef's driveway. Should we drop the
> silly INT_MAX setting for 64bit callers and return -EOVERFLOW for 32bit
> callers? (That'd be gross, but not unheard of. ext4 has grown htree
> behaviour that depends on compat detection: see its is_32bit_api()
> callers.)
>
> I can make up some fixes but I'd love to hear strong opinions first, if
> anyone's got 'em :).
If we go past the 32 bit number we can use the hash offsets in readdir,
and just flag the directory as hashme-in-readdir
-chris
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: getdents spinning on 0x7fffffff
2012-12-17 23:28 ` Chris Mason
@ 2012-12-17 23:50 ` Zach Brown
2012-12-18 0:06 ` Chris Mason
0 siblings, 1 reply; 4+ messages in thread
From: Zach Brown @ 2012-12-17 23:50 UTC (permalink / raw)
To: Chris Mason, linux-btrfs@vger.kernel.org
On Mon, Dec 17, 2012 at 06:28:40PM -0500, Chris Mason wrote:
> On Mon, Dec 17, 2012 at 04:09:07PM -0700, Zach Brown wrote:
> > 1) The fundamental fix is to re-use deleted entry positions. Do we add
> > another cache to index unlinked positions? Do we add an unreliable
> > best-effort walk of the tree looking for holes in the key space? At the
> > very least test index_cnt in unlink to get the basically useless
> > index_cnt--? :)
>
> The index is dense enough that we can search for free spots without too
> much pain. But, more below.
OK. Want me to take a stab at it? Is there a similar use somewhere I
should work from?
> > 2) Regardless of that, we have to deal with existing entry items with
> > giant keys. If for no other reason than big jerks making corrupt images
> > and leaving them on usb keys in Josef's driveway. Should we drop the
> > silly INT_MAX setting for 64bit callers and return -EOVERFLOW for 32bit
> > callers? (That'd be gross, but not unheard of. ext4 has grown htree
> > behaviour that depends on compat detection: see its is_32bit_api()
> > callers.)
> >
> > I can make up some fixes but I'd love to hear strong opinions first, if
> > anyone's got 'em :).
>
> If we go past the 32 bit number we can use the hash offsets in readdir,
> and just flag the directory as hashme-in-readdir
Hmm. That sounds painful given either hash collisions or reconnecting
nfs clients with previous f_pos values in use.
With a dencent entry key allocator it sounds a lot cleaner to me to just
admit that 32bit apps can't see more dirents than their f_pos can
represent. It's already based on the number of entries rather than the
bytes of entries so it'd be pretty hard to exhaust.
Am I .. not right? :)
- z
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: getdents spinning on 0x7fffffff
2012-12-17 23:50 ` Zach Brown
@ 2012-12-18 0:06 ` Chris Mason
0 siblings, 0 replies; 4+ messages in thread
From: Chris Mason @ 2012-12-18 0:06 UTC (permalink / raw)
To: Zach Brown; +Cc: Chris Mason, linux-btrfs@vger.kernel.org
On Mon, Dec 17, 2012 at 04:50:39PM -0700, Zach Brown wrote:
> On Mon, Dec 17, 2012 at 06:28:40PM -0500, Chris Mason wrote:
> > On Mon, Dec 17, 2012 at 04:09:07PM -0700, Zach Brown wrote:
> > > 1) The fundamental fix is to re-use deleted entry positions. Do we add
> > > another cache to index unlinked positions? Do we add an unreliable
> > > best-effort walk of the tree looking for holes in the key space? At the
> > > very least test index_cnt in unlink to get the basically useless
> > > index_cnt--? :)
> >
> > The index is dense enough that we can search for free spots without too
> > much pain. But, more below.
>
> OK. Want me to take a stab at it?
Sure, please do. I'd say that we should allow a higher value on 64 bit
machines though. I'd also cache the lowest free one if possible.
> Is there a similar use somewhere I
> should work from?
>
> > > 2) Regardless of that, we have to deal with existing entry items with
> > > giant keys. If for no other reason than big jerks making corrupt images
> > > and leaving them on usb keys in Josef's driveway. Should we drop the
> > > silly INT_MAX setting for 64bit callers and return -EOVERFLOW for 32bit
> > > callers? (That'd be gross, but not unheard of. ext4 has grown htree
> > > behaviour that depends on compat detection: see its is_32bit_api()
> > > callers.)
> > >
> > > I can make up some fixes but I'd love to hear strong opinions first, if
> > > anyone's got 'em :).
> >
> > If we go past the 32 bit number we can use the hash offsets in readdir,
> > and just flag the directory as hashme-in-readdir
>
> Hmm. That sounds painful given either hash collisions or reconnecting
> nfs clients with previous f_pos values in use.
>
> With a dencent entry key allocator it sounds a lot cleaner to me to just
> admit that 32bit apps can't see more dirents than their f_pos can
> represent. It's already based on the number of entries rather than the
> bytes of entries so it'd be pretty hard to exhaust.
>
> Am I .. not right? :)
Oh no, you're right. But the 32 bit case is the exception and not the
rule, so I'm not against pushing them through some ugly corners to keep
64 bit happy/fast.
-chris
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-12-18 0:06 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-17 23:09 getdents spinning on 0x7fffffff Zach Brown
2012-12-17 23:28 ` Chris Mason
2012-12-17 23:50 ` Zach Brown
2012-12-18 0:06 ` Chris Mason
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).