* [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents()
@ 2009-09-23 16:49 Eric Sandeen
2009-09-23 18:42 ` Eric Sandeen
2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
0 siblings, 2 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-09-23 16:49 UTC (permalink / raw)
To: xfs mailing list; +Cc: tobias
This is for bug #850,
http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31
The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because
xfs_bmapi() ASSERTs that it has been asked for at least one map,
and it was getting 0.
The root cause is that our guesstimated "bufsize" from xfs_file_readdir
was fairly small, and the
bufsize -= length;
in the loop was going negative - except bufsize is a size_t, so it
was wrapping to a very large number.
Then when we did
ra_want = howmany(bufsize + mp->m_dirblksize,
mp->m_sb.sb_blocksize) - 1;
with that very large number, the (int) ra_want was coming out
negative, and a subsequent compare:
if (1 + ra_want > map_blocks ...
was coming out -true- (negative int compare w/ uint) and we went
back to xfs_bmapi() for more, even though we did not need more,
and asked for 0 maps, and hit the ASSERT.
We have kind of a type mess here, but just keeping bufsize from
going negative is probably sufficient to avoid the problem.
I'm open to more tidy or obvious solutions though. :)
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index fa913e4..946562d 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
*/
ra_want = howmany(bufsize + mp->m_dirblksize,
mp->m_sb.sb_blocksize) - 1;
+ ASSERT(ra_want >= 0);
/*
* If we don't have as many as we want, and we haven't
@@ -1088,7 +1089,11 @@ xfs_dir2_leaf_getdents(
*/
ptr += length;
curoff += length;
- bufsize -= length;
+ /* bufsize may have just been a guess; don't go negative */
+ if (bufsize >= length)
+ bufsize -= length;
+ else
+ bufsize = 0;
}
/*
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents()
2009-09-23 16:49 [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents() Eric Sandeen
@ 2009-09-23 18:42 ` Eric Sandeen
2009-09-23 20:29 ` Michael Monnerie
2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
1 sibling, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-09-23 18:42 UTC (permalink / raw)
To: xfs mailing list
Eric Sandeen wrote:
> This is for bug #850,
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
> XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31
Grr well this slowed things down a little, on about 200,000 entries in a
~10MB directory on a single sata spindle.
stracing /bin/ls (no color/stats, output to /dev/null) 4x in a row with
cache drops in between shows:
stock:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.241378 414 583 getdents
100.00 0.231012 396 583 getdents
100.00 0.244977 420 583 getdents
100.00 0.258624 444 583 getdents
patched:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.285928 769 372 getdents
100.00 0.273747 736 372 getdents
100.00 0.271060 729 372 getdents
100.00 0.251360 676 372 getdents
so that's slowed down a bit. Weird that more calls, originally, made it
faster overall...?
But one thing I noticed is that we choose readahead based on a guess at
the readdir buffer size, and at least for glibc's readdir it has this:
const size_t default_allocation =
(4 * BUFSIZ < sizeof (struct dirent64) ?
sizeof (struct dirent64) : 4 * BUFSIZ);
where BUFSIZ is a magical 8192.
But we do at max PAGE_SIZE which gives us almost no readahead ...
So bumping our "bufsize" up to 32k, things speed up nicely. Wonder if
the stock broken bufsize method led to more inadvertent readahead....
32k:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.176826 475 372 getdents
100.00 0.177491 477 372 getdents
100.00 0.176548 475 372 getdents
100.00 0.139812 376 372 getdents
Think it's worth it?
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents()
2009-09-23 18:42 ` Eric Sandeen
@ 2009-09-23 20:29 ` Michael Monnerie
0 siblings, 0 replies; 8+ messages in thread
From: Michael Monnerie @ 2009-09-23 20:29 UTC (permalink / raw)
To: xfs
On Mittwoch 23 September 2009 Eric Sandeen wrote:
> so that's slowed down a bit. Weird that more calls, originally, made
> it faster overall...?
You wrote in the patch description that bufsize went very large when it
got below zero. Could it mean that a that times a big readahead appeared
and that's why the speed improved?
Why have there been more calls before that patch?
> But one thing I noticed is that we choose readahead based on a guess
> at the readdir buffer size, and at least for glibc's readdir it has
> this:
>
> const size_t default_allocation =
> (4 * BUFSIZ < sizeof (struct dirent64) ?
> sizeof (struct dirent64) : 4 * BUFSIZ);
>
> where BUFSIZ is a magical 8192.
>
> But we do at max PAGE_SIZE which gives us almost no readahead ...
>
> So bumping our "bufsize" up to 32k, things speed up nicely. Wonder
> if the stock broken bufsize method led to more inadvertent
> readahead....
Is it possible to increase it more, to see if things still improve?
Maybe that made the difference in the old version?
In general, I'd opt for at least 64KB buffers, that's the smallest I/O
size to keep hard disks busy, and RAIDs usually have 64KB stripe sets or
bigger. But I don't know how scattered dirs in XFS are, or if you can
expect them to be sequential.
mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
2009-09-23 16:49 [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents() Eric Sandeen
2009-09-23 18:42 ` Eric Sandeen
@ 2009-09-25 19:42 ` Eric Sandeen
2009-09-26 17:04 ` Christoph Hellwig
1 sibling, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-09-25 19:42 UTC (permalink / raw)
To: xfs mailing list; +Cc: tobias
This is for bug #850,
http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31
The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because
xfs_bmapi() ASSERTs that it has been asked for at least one map,
and it was getting 0.
The root cause is that our guesstimated "bufsize" from xfs_file_readdir
was fairly small, and the
bufsize -= length;
in the loop was going negative - except bufsize is a size_t, so it
was wrapping to a very large number.
Then when we did
ra_want = howmany(bufsize + mp->m_dirblksize,
mp->m_sb.sb_blocksize) - 1;
with that very large number, the (int) ra_want was coming out
negative, and a subsequent compare:
if (1 + ra_want > map_blocks ...
was coming out -true- (negative int compare w/ uint) and we went
back to xfs_bmapi() for more, even though we did not need more,
and asked for 0 maps, and hit the ASSERT.
We have kind of a type mess here, but just keeping bufsize from
going negative is probably sufficient to avoid the problem.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---
V2: use min() as suggested by Jeff, it's tidier.
diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index fa913e4..4467d61 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
*/
ra_want = howmany(bufsize + mp->m_dirblksize,
mp->m_sb.sb_blocksize) - 1;
+ ASSERT(ra_want >= 0);
/*
* If we don't have as many as we want, and we haven't
@@ -1088,7 +1089,8 @@ xfs_dir2_leaf_getdents(
*/
ptr += length;
curoff += length;
- bufsize -= length;
+ /* bufsize may have just been a guess; don't go negative */
+ bufsize = min((bufsize - length), (size_t)0);
}
/*
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
@ 2009-09-26 17:04 ` Christoph Hellwig
2009-09-26 18:03 ` Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2009-09-26 17:04 UTC (permalink / raw)
To: Eric Sandeen; +Cc: tobias, xfs mailing list
On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
> V2: use min() as suggested by Jeff, it's tidier.
I disagree with that, with the cast it looks pretty horrible.
At least use min_t to avoid the case, but what's wrong with:
> + /* bufsize may have just been a guess; don't go negative */
> + bufsize = min((bufsize - length), (size_t)0);
bufsize = bufsize - length > 0 ? bufsize - length : 0;
Anyway, takes this as a
Reviewed-by: Christoph Hellwig <hch@lst.de>
for any variant.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
2009-09-26 17:04 ` Christoph Hellwig
@ 2009-09-26 18:03 ` Eric Sandeen
2009-10-07 22:22 ` Alex Elder
0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-09-26 18:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: tobias, xfs mailing list
Christoph Hellwig wrote:
> On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
>> V2: use min() as suggested by Jeff, it's tidier.
>
> I disagree with that, with the cast it looks pretty horrible.
> At least use min_t to avoid the case, but what's wrong with:
>
>> + /* bufsize may have just been a guess; don't go negative */
>> + bufsize = min((bufsize - length), (size_t)0);
>
> bufsize = bufsize - length > 0 ? bufsize - length : 0;
ok, that's fine too.
I'll pick one.
> Anyway, takes this as a
>
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> for any variant.
>
thanks,
-Eric
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
2009-09-26 18:03 ` Eric Sandeen
@ 2009-10-07 22:22 ` Alex Elder
2009-10-07 22:24 ` Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Alex Elder @ 2009-10-07 22:22 UTC (permalink / raw)
To: Eric Sandeen, Christoph Hellwig; +Cc: xfs mailing list, tobias
Eric Sandeen wrote:
> Christoph Hellwig wrote:
>> On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
>>> V2: use min() as suggested by Jeff, it's tidier.
>>
>> I disagree with that, with the cast it looks pretty horrible.
>> At least use min_t to avoid the case, but what's wrong with:
>>
>>> + /* bufsize may have just been a guess; don't go negative */
>>> + bufsize = min((bufsize - length), (size_t)0);
>>
>> bufsize = bufsize - length > 0 ? bufsize - length : 0;
>
> ok, that's fine too.
>
> I'll pick one.
>
>> Anyway, takes this as a
>>
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>
>> for any variant.
>>
>
> thanks,
> -Eric
I'm going to put in this version:
bufsize = bufsize > length ? bufsize - length : 0;
I.e.:
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
*/
ra_want = howmany(bufsize + mp->m_dirblksize,
mp->m_sb.sb_blocksize) - 1;
+ ASSERT(ra_want >= 0);
/*
* If we don't have as many as we want, and we haven't
@@ -1088,7 +1089,8 @@ xfs_dir2_leaf_getdents(
*/
ptr += length;
curoff += length;
- bufsize -= length;
+ /* bufsize may have just been a guess; don't go negative */
+ bufsize = bufsize > length ? bufsize - length : 0;
}
/*
-Alex
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
2009-10-07 22:22 ` Alex Elder
@ 2009-10-07 22:24 ` Eric Sandeen
0 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-10-07 22:24 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs mailing list, tobias
Alex Elder wrote:
> Eric Sandeen wrote:
>> Christoph Hellwig wrote:
>>> On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
>>>> V2: use min() as suggested by Jeff, it's tidier.
>>> I disagree with that, with the cast it looks pretty horrible.
>>> At least use min_t to avoid the case, but what's wrong with:
>>>
>>>> + /* bufsize may have just been a guess; don't go negative */
>>>> + bufsize = min((bufsize - length), (size_t)0);
>>> bufsize = bufsize - length > 0 ? bufsize - length : 0;
>> ok, that's fine too.
>>
>> I'll pick one.
>>
>>> Anyway, takes this as a
>>>
>>>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>
>>> for any variant.
>>>
>> thanks,
>> -Eric
>
> I'm going to put in this version:
>
> bufsize = bufsize > length ? bufsize - length : 0;
Fine by me, thanks!
-Eric
> I.e.:
>
> --- a/fs/xfs/xfs_dir2_leaf.c
> +++ b/fs/xfs/xfs_dir2_leaf.c
> @@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
> */
> ra_want = howmany(bufsize + mp->m_dirblksize,
> mp->m_sb.sb_blocksize) - 1;
> + ASSERT(ra_want >= 0);
>
> /*
> * If we don't have as many as we want, and we haven't
> @@ -1088,7 +1089,8 @@ xfs_dir2_leaf_getdents(
> */
> ptr += length;
> curoff += length;
> - bufsize -= length;
> + /* bufsize may have just been a guess; don't go negative */
> + bufsize = bufsize > length ? bufsize - length : 0;
> }
>
> /*
>
> -Alex
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2009-10-07 22:23 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-23 16:49 [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents() Eric Sandeen
2009-09-23 18:42 ` Eric Sandeen
2009-09-23 20:29 ` Michael Monnerie
2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
2009-09-26 17:04 ` Christoph Hellwig
2009-09-26 18:03 ` Eric Sandeen
2009-10-07 22:22 ` Alex Elder
2009-10-07 22:24 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox