public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents()
@ 2009-09-23 16:49 Eric Sandeen
  2009-09-23 18:42 ` Eric Sandeen
  2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-09-23 16:49 UTC (permalink / raw)
  To: xfs mailing list; +Cc: tobias

This is for bug #850,
http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31

The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because
xfs_bmapi() ASSERTs that it has been asked for at least one map,
and it was getting 0.

The root cause is that our guesstimated "bufsize" from xfs_file_readdir
was fairly small, and the

		bufsize -= length;

in the loop was going negative - except bufsize is a size_t, so it 
was wrapping to a very large number.

Then when we did
		ra_want = howmany(bufsize + mp->m_dirblksize,
				  mp->m_sb.sb_blocksize) - 1;

with that very large number, the (int) ra_want was coming out
negative, and a subsequent compare:

		if (1 + ra_want > map_blocks ...

was coming out -true- (negative int compare w/ uint) and we went 
back to xfs_bmapi() for more, even though we did not need more,
and asked for 0 maps, and hit the ASSERT.

We have kind of a type mess here, but just keeping bufsize from 
going negative is probably sufficient to avoid the problem.

I'm open to more tidy or obvious solutions though.  :)

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---

diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index fa913e4..946562d 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
 			 */
 			ra_want = howmany(bufsize + mp->m_dirblksize,
 					  mp->m_sb.sb_blocksize) - 1;
+			ASSERT(ra_want >= 0);
 
 			/*
 			 * If we don't have as many as we want, and we haven't
@@ -1088,7 +1089,11 @@ xfs_dir2_leaf_getdents(
 		 */
 		ptr += length;
 		curoff += length;
-		bufsize -= length;
+		/* bufsize may have just been a guess; don't go negative */
+		if (bufsize >= length)
+			bufsize -= length;
+		else
+			bufsize = 0;
 	}
 
 	/*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents()
  2009-09-23 16:49 [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents() Eric Sandeen
@ 2009-09-23 18:42 ` Eric Sandeen
  2009-09-23 20:29   ` Michael Monnerie
  2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-09-23 18:42 UTC (permalink / raw)
  To: xfs mailing list

Eric Sandeen wrote:
> This is for bug #850,
> http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
> XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31


Grr well this slowed things down a little, on about 200,000 entries in a
~10MB directory on a single sata spindle.

stracing /bin/ls (no color/stats, output to /dev/null) 4x in a row with
cache drops in between shows:

stock:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.241378         414       583           getdents
100.00    0.231012         396       583           getdents
100.00    0.244977         420       583           getdents
100.00    0.258624         444       583           getdents

patched:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.285928         769       372           getdents
100.00    0.273747         736       372           getdents
100.00    0.271060         729       372           getdents
100.00    0.251360         676       372           getdents

so that's slowed down a bit.  Weird that more calls, originally, made it
faster overall...?

But one thing I noticed is that we choose readahead based on a guess at
the readdir buffer size, and at least for glibc's readdir it has this:

const size_t default_allocation =
  (4 * BUFSIZ < sizeof (struct dirent64) ?
	sizeof (struct dirent64) : 4 * BUFSIZ);

where BUFSIZ is a magical 8192.

But we do at max PAGE_SIZE which gives us almost no readahead ...

So bumping our "bufsize" up to 32k, things speed up nicely.  Wonder if
the stock broken bufsize method led to more inadvertent readahead....

32k:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.176826         475       372           getdents
100.00    0.177491         477       372           getdents
100.00    0.176548         475       372           getdents
100.00    0.139812         376       372           getdents

Think it's worth it?

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents()
  2009-09-23 18:42 ` Eric Sandeen
@ 2009-09-23 20:29   ` Michael Monnerie
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Monnerie @ 2009-09-23 20:29 UTC (permalink / raw)
  To: xfs

On Mittwoch 23 September 2009 Eric Sandeen wrote:
> so that's slowed down a bit.  Weird that more calls, originally, made
> it faster overall...?

You wrote in the patch description that bufsize went very large when it 
got below zero. Could it mean that a that times a big readahead appeared 
and that's why the speed improved?

Why have there been more calls before that patch?

> But one thing I noticed is that we choose readahead based on a guess
> at the readdir buffer size, and at least for glibc's readdir it has
> this:
>
> const size_t default_allocation =
>   (4 * BUFSIZ < sizeof (struct dirent64) ?
>         sizeof (struct dirent64) : 4 * BUFSIZ);
>
> where BUFSIZ is a magical 8192.
>
> But we do at max PAGE_SIZE which gives us almost no readahead ...
>
> So bumping our "bufsize" up to 32k, things speed up nicely.  Wonder
> if the stock broken bufsize method led to more inadvertent
> readahead....

Is it possible to increase it more, to see if things still improve? 
Maybe that made the difference in the old version?

In general, I'd opt for at least 64KB buffers, that's the smallest I/O 
size to keep hard disks busy, and RAIDs usually have 64KB stripe sets or 
bigger. But I don't know how scattered dirs in XFS are, or if you can 
expect them to be sequential.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
  2009-09-23 16:49 [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents() Eric Sandeen
  2009-09-23 18:42 ` Eric Sandeen
@ 2009-09-25 19:42 ` Eric Sandeen
  2009-09-26 17:04   ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-09-25 19:42 UTC (permalink / raw)
  To: xfs mailing list; +Cc: tobias

This is for bug #850,
http://oss.sgi.com/bugzilla/show_bug.cgi?id=850
XFS file system segfaults , repeatedly and 100% reproducable in 2.6.30 , 2.6.31

The above only showed up on a CONFIG_XFS_DEBUG=y kernel, because
xfs_bmapi() ASSERTs that it has been asked for at least one map,

and it was getting 0.

The root cause is that our guesstimated "bufsize" from xfs_file_readdir
was fairly small, and the

		bufsize -= length;

in the loop was going negative - except bufsize is a size_t, so it 
was wrapping to a very large number.

Then when we did
		ra_want = howmany(bufsize + mp->m_dirblksize,
				  mp->m_sb.sb_blocksize) - 1;

with that very large number, the (int) ra_want was coming out
negative, and a subsequent compare:

		if (1 + ra_want > map_blocks ...

was coming out -true- (negative int compare w/ uint) and we went 
back to xfs_bmapi() for more, even though we did not need more,
and asked for 0 maps, and hit the ASSERT.

We have kind of a type mess here, but just keeping bufsize from 
going negative is probably sufficient to avoid the problem.


Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
---

V2: use min() as suggested by Jeff, it's tidier.

diff --git a/fs/xfs/xfs_dir2_leaf.c b/fs/xfs/xfs_dir2_leaf.c
index fa913e4..4467d61 100644
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
 			 */
 			ra_want = howmany(bufsize + mp->m_dirblksize,
 					  mp->m_sb.sb_blocksize) - 1;
+			ASSERT(ra_want >= 0);
 
 			/*
 			 * If we don't have as many as we want, and we haven't
@@ -1088,7 +1089,8 @@ xfs_dir2_leaf_getdents(
 		 */
 		ptr += length;
 		curoff += length;
-		bufsize -= length;
+		/* bufsize may have just been a guess; don't go negative */
+		bufsize = min((bufsize - length), (size_t)0);
 	}
 
 	/*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
  2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
@ 2009-09-26 17:04   ` Christoph Hellwig
  2009-09-26 18:03     ` Eric Sandeen
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2009-09-26 17:04 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: tobias, xfs mailing list

On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
> V2: use min() as suggested by Jeff, it's tidier.

I disagree with that, with the cast it looks pretty horrible.
At least use min_t to avoid the case, but what's wrong with:

> +		/* bufsize may have just been a guess; don't go negative */
> +		bufsize = min((bufsize - length), (size_t)0);

		bufsize = bufsize - length > 0 ? bufsize - length : 0;

Anyway, takes this as a


Reviewed-by: Christoph Hellwig <hch@lst.de>

for any variant.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
  2009-09-26 17:04   ` Christoph Hellwig
@ 2009-09-26 18:03     ` Eric Sandeen
  2009-10-07 22:22       ` Alex Elder
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-09-26 18:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: tobias, xfs mailing list

Christoph Hellwig wrote:
> On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
>> V2: use min() as suggested by Jeff, it's tidier.
> 
> I disagree with that, with the cast it looks pretty horrible.
> At least use min_t to avoid the case, but what's wrong with:
> 
>> +		/* bufsize may have just been a guess; don't go negative */
>> +		bufsize = min((bufsize - length), (size_t)0);
> 
> 		bufsize = bufsize - length > 0 ? bufsize - length : 0;

ok, that's fine too.

I'll pick one.

> Anyway, takes this as a
> 
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> for any variant.
> 

thanks,
-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
  2009-09-26 18:03     ` Eric Sandeen
@ 2009-10-07 22:22       ` Alex Elder
  2009-10-07 22:24         ` Eric Sandeen
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Elder @ 2009-10-07 22:22 UTC (permalink / raw)
  To: Eric Sandeen, Christoph Hellwig; +Cc: xfs mailing list, tobias

Eric Sandeen wrote:
> Christoph Hellwig wrote:
>> On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
>>> V2: use min() as suggested by Jeff, it's tidier.
>> 
>> I disagree with that, with the cast it looks pretty horrible.
>> At least use min_t to avoid the case, but what's wrong with:
>> 
>>> +		/* bufsize may have just been a guess; don't go negative */
>>> +		bufsize = min((bufsize - length), (size_t)0);
>> 
>> 		bufsize = bufsize - length > 0 ? bufsize - length : 0;
> 
> ok, that's fine too.
> 
> I'll pick one.
> 
>> Anyway, takes this as a
>> 
>> 
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> 
>> for any variant.
>> 
> 
> thanks,
> -Eric

I'm going to put in this version:

		bufsize = bufsize > length ? bufsize - length : 0;

I.e.:

--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
 			 */
 			ra_want = howmany(bufsize + mp->m_dirblksize,
 					  mp->m_sb.sb_blocksize) - 1;
+			ASSERT(ra_want >= 0);
 
 			/*
 			 * If we don't have as many as we want, and we haven't
@@ -1088,7 +1089,8 @@ xfs_dir2_leaf_getdents(
 		 */
 		ptr += length;
 		curoff += length;
-		bufsize -= length;
+		/* bufsize may have just been a guess; don't go negative */
+		bufsize = bufsize > length ? bufsize - length : 0;
 	}
 
 	/*

					-Alex

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] fix readahead calculations in xfs_dir2_leaf_getdents()
  2009-10-07 22:22       ` Alex Elder
@ 2009-10-07 22:24         ` Eric Sandeen
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-10-07 22:24 UTC (permalink / raw)
  To: Alex Elder; +Cc: Christoph Hellwig, xfs mailing list, tobias

Alex Elder wrote:
> Eric Sandeen wrote:
>> Christoph Hellwig wrote:
>>> On Fri, Sep 25, 2009 at 02:42:26PM -0500, Eric Sandeen wrote:
>>>> V2: use min() as suggested by Jeff, it's tidier.
>>> I disagree with that, with the cast it looks pretty horrible.
>>> At least use min_t to avoid the case, but what's wrong with:
>>>
>>>> +		/* bufsize may have just been a guess; don't go negative */
>>>> +		bufsize = min((bufsize - length), (size_t)0);
>>> 		bufsize = bufsize - length > 0 ? bufsize - length : 0;
>> ok, that's fine too.
>>
>> I'll pick one.
>>
>>> Anyway, takes this as a
>>>
>>>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>
>>> for any variant.
>>>
>> thanks,
>> -Eric
> 
> I'm going to put in this version:
> 
> 		bufsize = bufsize > length ? bufsize - length : 0;

Fine by me, thanks!

-Eric

> I.e.:
> 
> --- a/fs/xfs/xfs_dir2_leaf.c
> +++ b/fs/xfs/xfs_dir2_leaf.c
> @@ -854,6 +854,7 @@ xfs_dir2_leaf_getdents(
>  			 */
>  			ra_want = howmany(bufsize + mp->m_dirblksize,
>  					  mp->m_sb.sb_blocksize) - 1;
> +			ASSERT(ra_want >= 0);
>  
>  			/*
>  			 * If we don't have as many as we want, and we haven't
> @@ -1088,7 +1089,8 @@ xfs_dir2_leaf_getdents(
>  		 */
>  		ptr += length;
>  		curoff += length;
> -		bufsize -= length;
> +		/* bufsize may have just been a guess; don't go negative */
> +		bufsize = bufsize > length ? bufsize - length : 0;
>  	}
>  
>  	/*
> 
> 					-Alex
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-10-07 22:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-23 16:49 [PATCH] fix readahead calculations in xfs_dir2_leaf_getdents() Eric Sandeen
2009-09-23 18:42 ` Eric Sandeen
2009-09-23 20:29   ` Michael Monnerie
2009-09-25 19:42 ` [PATCH V2] " Eric Sandeen
2009-09-26 17:04   ` Christoph Hellwig
2009-09-26 18:03     ` Eric Sandeen
2009-10-07 22:22       ` Alex Elder
2009-10-07 22:24         ` Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox