[RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
@ 2012-03-23 22:17 Andi Kleen
  2012-03-24  1:10 ` Andreas Dilger
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2012-03-23 22:17 UTC (permalink / raw)
  To: tytso, linux-ext4, tim.c.chen

Hi,

We found that the super block cache line containing the extent cache hit/miss 
statistics counters is very hot in IO workloads.  

Besides the counter seems racy anyways and can lose updates because it's not
atomic.

Disabling the counter make a very measurable difference in some IO micro 
benchmarks on a 4S system.

I just disabled it for now because it doesn't seem very useful due to its
racy nature. An alternative would be to turn it into a per cpu counter,
if someone really needs it.

Simple patch to disable it appended.

-Andi

---

Disable statistics counter in ext4

This super bloc cache line is very hot and slows down high IO 
workloads.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 74f23c2..7310b0e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2049,10 +2049,12 @@ static int ext4_ext_check_cache(struct inode *inode, ext4_lblk_t block,
 		ret = 1;
 	}
 errout:
+#if 0
 	if (!ret)
 		sbi->extent_cache_misses++;
 	else
 		sbi->extent_cache_hits++;
+#endif
 	trace_ext4_ext_in_cache(inode, block, ret);
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
 	return ret;

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-03-23 22:17 [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache Andi Kleen
@ 2012-03-24  1:10 ` Andreas Dilger
  2012-03-24  3:13   ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Andreas Dilger @ 2012-03-24  1:10 UTC (permalink / raw)
  To: Andi Kleen; +Cc: tytso, linux-ext4, tim.c.chen

On 2012-03-23, at 4:17 PM, Andi Kleen wrote:
> We found that the super block cache line containing the extent cache
> hit/miss statistics counters is very hot in IO workloads.  
> 
> Besides the counter seems racy anyways and can lose updates because
> it's not atomic.
> 
> Disabling the counter make a very measurable difference in some IO
> micro benchmarks on a 4S system.
> 
> I just disabled it for now because it doesn't seem very useful due
> to its racy nature. An alternative would be to turn it into a per
> cpu counter, if someone really needs it.
> 
> Simple patch to disable it appended.
> 
> -Andi

The patch only disables the statistics counting, but not the sysfs
files that will now report bogus statistics.  I don't disagree with
disabling these stats, but if you are disabling the accounting you
should also comment out the functions in fs/ext4/super.c that are
printing out these statistics.

Cheers, Andreas

> ---
> 
> Disable statistics counter in ext4
> 
> This super bloc cache line is very hot and slows down high IO 
> workloads.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 74f23c2..7310b0e 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2049,10 +2049,12 @@ static int ext4_ext_check_cache(struct inode *inode, ext4_lblk_t block,
> 		ret = 1;
> 	}
> errout:
> +#if 0
> 	if (!ret)
> 		sbi->extent_cache_misses++;
> 	else
> 		sbi->extent_cache_hits++;
> +#endif
> 	trace_ext4_ext_in_cache(inode, block, ret);
> 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> 	return ret;
> 
> -- 
> ak@linux.intel.com -- Speaking for myself only
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html





^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-03-24  1:10 ` Andreas Dilger
@ 2012-03-24  3:13   ` Andi Kleen
  2012-03-26 22:26     ` Vivek Haldar
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2012-03-24  3:13 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: tytso, linux-ext4, tim.c.chen

> The patch only disables the statistics counting, but not the sysfs
> files that will now report bogus statistics.  I don't disagree with
> disabling these stats, but if you are disabling the accounting you
> should also comment out the functions in fs/ext4/super.c that are
> printing out these statistics.

Aware of that -- that is why it was just a RFC.
The choice is between per cpu counters or completely removing them.
I would prefer the later.

-Andi


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-03-24  3:13   ` Andi Kleen
@ 2012-03-26 22:26     ` Vivek Haldar
  2012-03-26 23:00       ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Vivek Haldar @ 2012-03-26 22:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andreas Dilger, tytso, linux-ext4, tim.c.chen

Andi --

I realized the problem soon after the original patch, and submitted
another patch to make these per cpu  counters.

http://patchwork.ozlabs.org/patch/124213/

However, looks like that fell through the cracks.

Andreas, Ted --  could you please take a look at that patch?

Andi --  perhaps you could patch it in  and check if  you still see
the hot cache line?

On Fri, Mar 23, 2012 at 8:13 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> The patch only disables the statistics counting, but not the sysfs
>> files that will now report bogus statistics.  I don't disagree with
>> disabling these stats, but if you are disabling the accounting you
>> should also comment out the functions in fs/ext4/super.c that are
>> printing out these statistics.
>
> Aware of that -- that is why it was just a RFC.
> The choice is between per cpu counters or completely removing them.
> I would prefer the later.
>
> -Andi
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Regards,
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-03-26 22:26     ` Vivek Haldar
@ 2012-03-26 23:00       ` Andi Kleen
  2012-03-26 23:57         ` Ted Ts'o
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2012-03-26 23:00 UTC (permalink / raw)
  To: Vivek Haldar; +Cc: Andreas Dilger, tytso, linux-ext4, tim.c.chen

On 3/26/2012 3:26 PM, Vivek Haldar wrote:
> Andi --
>
> I realized the problem soon after the original patch, and submitted
> another patch to make these per cpu  counters.

Is there a clear use case having these counters on every production system?

Especially given that they are racy and prone to lost updates today.

And you can still get the information if you really need it by putting a 
systemtap etc. probe.

-Andi


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-03-26 23:00       ` Andi Kleen
@ 2012-03-26 23:57         ` Ted Ts'o
  2012-04-11 16:59           ` Andi Kleen
  0 siblings, 1 reply; 21+ messages in thread
From: Ted Ts'o @ 2012-03-26 23:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Vivek Haldar, Andreas Dilger, linux-ext4, tim.c.chen

On Mon, Mar 26, 2012 at 04:00:47PM -0700, Andi Kleen wrote:
> On 3/26/2012 3:26 PM, Vivek Haldar wrote:
> >Andi --
> >
> >I realized the problem soon after the original patch, and submitted
> >another patch to make these per cpu  counters.
> 
> Is there a clear use case having these counters on every production system?

Today, with the current single entry extent cache, I don't think
there's a good justification for it, no.

Vivek has worked on a rather more sophisticated extent cache which
could cache several extent entries (and indeed, combine multiple
on-disk extent entries into a single in-memory extent).  There are a
variety of reasons that hasn't gone upstream yet; one of which is
there are some interesting questions about how to control memory usage
of the extent cache; how do we trim it back in the case of memory
pressure?

One of the other things that we need to consider as we think about
getting this upstream is the "status" or "delayed" extents patches
which Allison and Yongqiang were looking at.  Does it make sense to
have two parallel datastructures which are indexed by logical block
number?  On the one hand, using an in-memory tree structure is pretty
expensive, just because of all of the 64-bit logical block numbers and
64-bit pointers.  On the other hand, would that make things too
complicated?

Once we start having multiple knobs to adjust, having these counters
available does make sense.  For now, using a per-cpu counter is
relatively low cost, except on extreme SGI Altix-like machines with
hundreds of CPU's, where the memory utilization is something to think
about.  Given that Vivek has submitted a patch to convert to per-cpu,
I can see applying it just to fix it; or just removing the stats for
now until we get the more sophisticated extent cache merged in.

    	     	     	  		       - Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-03-26 23:57         ` Ted Ts'o
@ 2012-04-11 16:59           ` Andi Kleen
  2012-04-13 17:42             ` Linus Torvalds
  0 siblings, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2012-04-11 16:59 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Vivek Haldar, Andreas Dilger, linux-ext4, tim.c.chen, torvalds

Ted Ts'o <tytso@mit.edu> writes:

> On Mon, Mar 26, 2012 at 04:00:47PM -0700, Andi Kleen wrote:
>> On 3/26/2012 3:26 PM, Vivek Haldar wrote:
>> >Andi --
>> >
>> >I realized the problem soon after the original patch, and submitted
>> >another patch to make these per cpu  counters.
>> 
>> Is there a clear use case having these counters on every production system?
>
> Today, with the current single entry extent cache, I don't think
> there's a good justification for it, no.

Ping. This scalability problem is still in 3.4-rc* and causes
major slowdowns.

Can we please revert fix it or revert 
556b27abf73833923d5cd4be80006292e1b31662 before release.

-Andi

(keeping context)

>
> Vivek has worked on a rather more sophisticated extent cache which
> could cache several extent entries (and indeed, combine multiple
> on-disk extent entries into a single in-memory extent).  There are a
> variety of reasons that hasn't gone upstream yet; one of which is
> there are some interesting questions about how to control memory usage
> of the extent cache; how do we trim it back in the case of memory
> pressure?
>
> One of the other things that we need to consider as we think about
> getting this upstream is the "status" or "delayed" extents patches
> which Allison and Yongqiang were looking at.  Does it make sense to
> have two parallel datastructures which are indexed by logical block
> number?  On the one hand, using an in-memory tree structure is pretty
> expensive, just because of all of the 64-bit logical block numbers and
> 64-bit pointers.  On the other hand, would that make things too
> complicated?
>
> Once we start having multiple knobs to adjust, having these counters
> available does make sense.  For now, using a per-cpu counter is
> relatively low cost, except on extreme SGI Altix-like machines with
> hundreds of CPU's, where the memory utilization is something to think
> about.  Given that Vivek has submitted a patch to convert to per-cpu,
> I can see applying it just to fix it; or just removing the stats for
> now until we get the more sophisticated extent cache merged in.
>
>     	     	     	  		       - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-11 16:59           ` Andi Kleen
@ 2012-04-13 17:42             ` Linus Torvalds
  2012-04-13 17:53               ` Tim Chen
                                 ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Linus Torvalds @ 2012-04-13 17:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ted Ts'o, Vivek Haldar, Andreas Dilger, linux-ext4,
	tim.c.chen

On Wed, Apr 11, 2012 at 9:59 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
> Ping. This scalability problem is still in 3.4-rc* and causes
> major slowdowns.

Do you have numbers?

> Can we please revert fix it or revert
> 556b27abf73833923d5cd4be80006292e1b31662 before release.

That commit ID doesn't make any sense, and doesn't seem to have
anything to do with any statistics counters that your email talks
about. So regardless, you'd need to explain why that commit causes the
problems you talk about, I'm not going to revert a random commit that
doesn't even look what you describe.

Ted?

                Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 17:42             ` Linus Torvalds
@ 2012-04-13 17:53               ` Tim Chen
  2012-04-13 18:08                 ` Linus Torvalds
  2012-04-13 17:57               ` Andi Kleen
  2012-04-13 18:06               ` Ted Ts'o
  2 siblings, 1 reply; 21+ messages in thread
From: Tim Chen @ 2012-04-13 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, 2012-04-13 at 10:42 -0700, Linus Torvalds wrote:
> On Wed, Apr 11, 2012 at 9:59 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> > Ping. This scalability problem is still in 3.4-rc* and causes
> > major slowdowns.
> 
> Do you have numbers?
> 

In a benchmark that does mmaped-read, there is a 28% speed up
after getting rid of the counters.

> > Can we please revert fix it or revert
> > 556b27abf73833923d5cd4be80006292e1b31662 before release.
> 
> That commit ID doesn't make any sense, and doesn't seem to have
> anything to do with any statistics counters that your email talks
> about. So regardless, you'd need to explain why that commit causes the
> problems you talk about, I'm not going to revert a random commit that
> doesn't even look what you describe.
> 
> Ted?
> 
>                 Linus



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 17:53               ` Tim Chen
@ 2012-04-13 18:08                 ` Linus Torvalds
  2012-04-13 18:31                   ` Tim Chen
  0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2012-04-13 18:08 UTC (permalink / raw)
  To: Tim Chen; +Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, Apr 13, 2012 at 10:53 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>>
>> Do you have numbers?
>
> In a benchmark that does mmaped-read, there is a 28% speed up
> after getting rid of the counters.

Ok, that's big. But is it any actual real workload on a real
filesystem? It looks like this should only happen for actual IO, so I
get the feeling that this is some made-up benchmark for a filesystem
on a RAM-disk?

                  Linus

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:08                 ` Linus Torvalds
@ 2012-04-13 18:31                   ` Tim Chen
  2012-04-13 18:33                     ` Linus Torvalds
  2012-04-13 18:37                     ` Ted Ts'o
  0 siblings, 2 replies; 21+ messages in thread
From: Tim Chen @ 2012-04-13 18:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, 2012-04-13 at 11:08 -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2012 at 10:53 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> >>
> >> Do you have numbers?
> >
> > In a benchmark that does mmaped-read, there is a 28% speed up
> > after getting rid of the counters.
> 
> Ok, that's big. But is it any actual real workload on a real
> filesystem? It looks like this should only happen for actual IO, so I
> get the feeling that this is some made-up benchmark for a filesystem
> on a RAM-disk?
> 
>                   Linus

Benchmark is working on files on normal hard disk.  
However, I have I have a large number
of processes (80 processes, one for each cpu), each reading
a separate mmaped file.  The files are in the same directory.
That makes cache line bouncing on the counters particularly bad
due to the large number of processes running.

Tim


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:31                   ` Tim Chen
@ 2012-04-13 18:33                     ` Linus Torvalds
  2012-04-13 18:37                     ` Ted Ts'o
  1 sibling, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2012-04-13 18:33 UTC (permalink / raw)
  To: Tim Chen; +Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, Apr 13, 2012 at 11:31 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> Benchmark is working on files on normal hard disk.
> However, I have I have a large number
> of processes (80 processes, one for each cpu), each reading
> a separate mmaped file.  The files are in the same directory.
> That makes cache line bouncing on the counters particularly bad
> due to the large number of processes running.

Ok, that sounds like a potentially realistic case then, even if your
benchmark probably shows the worst possible case.

Considering that the counters are apparently not all that useful, it
does sound like they should just be removed, Ted.

                          Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:31                   ` Tim Chen
  2012-04-13 18:33                     ` Linus Torvalds
@ 2012-04-13 18:37                     ` Ted Ts'o
  2012-04-13 18:41                       ` Andi Kleen
  2012-04-13 19:26                       ` Tim Chen
  1 sibling, 2 replies; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 18:37 UTC (permalink / raw)
  To: Tim Chen
  Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, Apr 13, 2012 at 11:31:16AM -0700, Tim Chen wrote:
> 
> Benchmark is working on files on normal hard disk.  
> However, I have I have a large number
> of processes (80 processes, one for each cpu), each reading
> a separate mmaped file.  The files are in the same directory.
> That makes cache line bouncing on the counters particularly bad
> due to the large number of processes running.

OK, so this is with an 80 CPU machine?

And when you say 20% speed up, do you mean to say we are actually
being CPU constrained when reading from files on a normal hard disk?

The reason why I ask this is we're seeing anything like this with Eric
Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
I don't even see evidence of a larger than usual CPU utilization
compared to other file systems.

So still I'm trying to understand why your results are so different
from what Eric has been seeing, and I'm still puzzled why this is
super urgent.

Ultimately, this isn't a regression and if Linus is willing to take a
change at this point, I'm willing to send it --- but I really don't
understand the urgency.

Best regards,

						- Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:37                     ` Ted Ts'o
@ 2012-04-13 18:41                       ` Andi Kleen
  2012-04-13 18:48                         ` Ted Ts'o
  2012-04-13 19:26                       ` Tim Chen
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2012-04-13 18:41 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Tim Chen, Linus Torvalds, Andi Kleen, Vivek Haldar,
	Andreas Dilger, linux-ext4

On Fri, Apr 13, 2012 at 02:37:22PM -0400, Ted Ts'o wrote:
> On Fri, Apr 13, 2012 at 11:31:16AM -0700, Tim Chen wrote:
> > 
> > Benchmark is working on files on normal hard disk.  
> > However, I have I have a large number
> > of processes (80 processes, one for each cpu), each reading
> > a separate mmaped file.  The files are in the same directory.
> > That makes cache line bouncing on the counters particularly bad
> > due to the large number of processes running.
> 
> OK, so this is with an 80 CPU machine?

4 sockets, 40 cores, 80 threads.

> 
> And when you say 20% speed up, do you mean to say we are actually
> being CPU constrained when reading from files on a normal hard disk?

The files are in memory, but we're still CPU constrained due to various
other issues.

> The reason why I ask this is we're seeing anything like this with Eric
> Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
> I don't even see evidence of a larger than usual CPU utilization
> compared to other file systems.

I bet Eric didn't test with this statistic counter.

> Ultimately, this isn't a regression and if Linus is willing to take a

The old kernel didn't have that problem, so it's a regression.

> change at this point, I'm willing to send it --- but I really don't
> understand the urgency.

If we don't fix performance regressions before each release then Linux
will get slower and slower. At least I don't want a slow Linux.

-Andi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:41                       ` Andi Kleen
@ 2012-04-13 18:48                         ` Ted Ts'o
  2012-04-13 19:01                           ` Eric Whitney
  0 siblings, 1 reply; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 18:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Tim Chen, Linus Torvalds, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, Apr 13, 2012 at 08:41:58PM +0200, Andi Kleen wrote:
> > The reason why I ask this is we're seeing anything like this with Eric
> > Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
> > I don't even see evidence of a larger than usual CPU utilization
> > compared to other file systems.
> 
> I bet Eric didn't test with this statistic counter.

Huh?  You can't turn it off, and he's been doing regular scalability
tests at least once per kernel release.

Can you say a bit more about exactly how you are doing this test and
what are the "other issues" where this is becoming a bottleneck?  If
possible I'd like to ask Eric if he can add it to his regular
scalability tests.

Thanks,

						 - Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:48                         ` Ted Ts'o
@ 2012-04-13 19:01                           ` Eric Whitney
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Whitney @ 2012-04-13 19:01 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Andi Kleen, Tim Chen, Linus Torvalds, Vivek Haldar,
	Andreas Dilger, linux-ext4

On 04/13/2012 02:48 PM, Ted Ts'o wrote:
> On Fri, Apr 13, 2012 at 08:41:58PM +0200, Andi Kleen wrote:
>>> The reason why I ask this is we're seeing anything like this with Eric
>>> Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
>>> I don't even see evidence of a larger than usual CPU utilization
>>> compared to other file systems.
>>
>> I bet Eric didn't test with this statistic counter.
>
> Huh?  You can't turn it off, and he's been doing regular scalability
> tests at least once per kernel release.

Yes, as recently as 3.4-rc1.  I saw Andi's patch, and tested it this 
week against that baseline with the ffsb profiles we've been using for 
ext4 (and other filesystem) scalability measurements.

I didn't get a noticeable delta for throughput or reported CPU 
utilization on my 48 core eight node NUMA test setup.  That said, I plan 
to look at this more closely to verify that my workloads should have 
seen a delta in the first place.  Ted knows them well, though.  It's 
worth noting that I've got plenty of free CPU capacity while running the 
workload, which differs from Andi's/Tim's description.

>
> Can you say a bit more about exactly how you are doing this test and
> what are the "other issues" where this is becoming a bottleneck?  If
> possible I'd like to ask Eric if he can add it to his regular
> scalability tests.

Yes, I'm certainly willing to do that if practical, and I'm curious to 
know more about what the workload looks like.

Eric

>
> Thanks,
>
> 						 - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:37                     ` Ted Ts'o
  2012-04-13 18:41                       ` Andi Kleen
@ 2012-04-13 19:26                       ` Tim Chen
  2012-04-13 19:33                         ` Ted Ts'o
  1 sibling, 1 reply; 21+ messages in thread
From: Tim Chen @ 2012-04-13 19:26 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, 2012-04-13 at 14:37 -0400, Ted Ts'o wrote:
> On Fri, Apr 13, 2012 at 11:31:16AM -0700, Tim Chen wrote:
> > 
> > Benchmark is working on files on normal hard disk.  
> > However, I have I have a large number
> > of processes (80 processes, one for each cpu), each reading
> > a separate mmaped file.  The files are in the same directory.
> > That makes cache line bouncing on the counters particularly bad
> > due to the large number of processes running.
> 
> OK, so this is with an 80 CPU machine?
> 
> And when you say 20% speed up, do you mean to say we are actually
> being CPU constrained when reading from files on a normal hard disk?
> 

The files are sparse files.  So the amount of IO is limited and we are
not IO constrained.  

Tim


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 19:26                       ` Tim Chen
@ 2012-04-13 19:33                         ` Ted Ts'o
  0 siblings, 0 replies; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 19:33 UTC (permalink / raw)
  To: Tim Chen
  Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
	linux-ext4

On Fri, Apr 13, 2012 at 12:26:53PM -0700, Tim Chen wrote:
> 
> The files are sparse files.  So the amount of IO is limited and we are
> not IO constrained.  
> 

... OK, and exactly how sparse are these files?  Can you say something
about the relatistic use case this workload is supposed to represent?
Or is this more of an artificial/micro benchmark?

					- Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 17:42             ` Linus Torvalds
  2012-04-13 17:53               ` Tim Chen
@ 2012-04-13 17:57               ` Andi Kleen
  2012-04-13 18:06               ` Ted Ts'o
  2 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2012-04-13 17:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
	linux-ext4, tim.c.chen


> > Can we please revert fix it or revert
> > 556b27abf73833923d5cd4be80006292e1b31662 before release.
> 
> That commit ID doesn't make any sense, and doesn't seem to have
> anything to do with any statistics counters that your email talks
> about. So regardless, you'd need to explain why that commit causes the
> problems you talk about, I'm not going to revert a random commit that
> doesn't even look what you describe.

Sorry correct problematic commit is

commit 77f4135f2a219a2127be6cc1208c42e6175b11dd
Author: Vivek Haldar <haldar@google.com>
Date:   Sun May 22 21:24:16 2011 -0400

    ext4: count hits/misses of extent cache and expose in sysfs

The problem is that every IO operation in ext4 bangs on this super block
cache line when it checks the extent cache and then updates the hit/miss
count. On a system with more than two sockets this is very costly
in terms of interconnect traffic under high IO load.

A straight revert has a minor conflict on a trace statement that
was added later.

-Andi

Original patch:

commit 77f4135f2a219a2127be6cc1208c42e6175b11dd
Author: Vivek Haldar <haldar@google.com>
Date:   Sun May 22 21:24:16 2011 -0400

    ext4: count hits/misses of extent cache and expose in sysfs
    
    The number of hits and misses for each filesystem is exposed in
    /sys/fs/ext4/<dev>/extent_cache_{hits, misses}.
    
    Tested: fsstress, manual checks.
    Signed-off-by: Vivek Haldar <haldar@google.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 17:42             ` Linus Torvalds
  2012-04-13 17:53               ` Tim Chen
  2012-04-13 17:57               ` Andi Kleen
@ 2012-04-13 18:06               ` Ted Ts'o
  2012-04-13 18:22                 ` Andi Kleen
  2 siblings, 1 reply; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 18:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Vivek Haldar, Andreas Dilger, linux-ext4, tim.c.chen

On Fri, Apr 13, 2012 at 10:42:10AM -0700, Linus Torvalds wrote:
> 
> > Can we please revert fix it or revert
> > 556b27abf73833923d5cd4be80006292e1b31662 before release.
> 
> That commit ID doesn't make any sense, and doesn't seem to have
> anything to do with any statistics counters that your email talks
> about. So regardless, you'd need to explain why that commit causes the
> problems you talk about, I'm not going to revert a random commit that
> doesn't even look what you describe.
> 
> Ted?

I think Andi cited the wrong commit.  The commit in question is
77f4135f2a219a2127be6cc1208c42e6175b11dd, which first showed up in
2.6.39.  If you have a very fast (PCIe-attached would be required I
suspect) flash device, with a system with a large SMP box, and a
random-read benchmark which reads from large number of CPU's in
parallel, such that we thrash cache line containing
sbi->extent_cache_hits.

It makes sense that there is a problem here, and what had been
discussed was either (a) removing these counters entirely, or (b)
replacing them with a percpu counter.  At the moment these counters
aren't really worth it, although in the future when we replace the
single extent cache with a larger cache, the statistics would be more
interesting.

I just ran out of time before the 3.4 merge window, and quite frankly
I didn't think it was worth persuing this with a high urgency, since
(a) it's not a regression (the commit in question which added these
counters has been around since 2.6.39), and (b) in most circumstances
people it's not noticeable at all.

I'm willing to push commit to you to remove the counters for now, and
we'll probably add it back later using percpu counters if you think
it's worth making a change at this time --- or if Andi can explain why
he's treating this with a high degree of urgency.  Is there some
common use case that I'm missing which is being very badly impacted
with this cache-line thrashing?

Regards,

						- Ted

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
  2012-04-13 18:06               ` Ted Ts'o
@ 2012-04-13 18:22                 ` Andi Kleen
  0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2012-04-13 18:22 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
	linux-ext4, tim.c.chen

> I think Andi cited the wrong commit.  The commit in question is
> 77f4135f2a219a2127be6cc1208c42e6175b11dd, which first showed up in
> 2.6.39.  If you have a very fast (PCIe-attached would be required I

I don't think it was a highend IO device.
This already hits for buffered IO I think.

> I'm willing to push commit to you to remove the counters for now, and
> we'll probably add it back later using percpu counters if you think
> it's worth making a change at this time --- or if Andi can explain why
> he's treating this with a high degree of urgency.  Is there some
> common use case that I'm missing which is being very badly impacted
> with this cache-line thrashing?

Well IO writes is a pretty common use case. On a smaller
system it's likely not ~30%, but likely a drag too. And we normally
try to fix scalability regressions each release.  Otherwise things
will just get worse and worse over time.

Scalability is unfortunately quite fragile and needs constant
attention. Every bad hot cache line can break it.

So yes I would like this to be reverted for the release.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-04-13 19:34 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-23 22:17 [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache Andi Kleen
2012-03-24  1:10 ` Andreas Dilger
2012-03-24  3:13   ` Andi Kleen
2012-03-26 22:26     ` Vivek Haldar
2012-03-26 23:00       ` Andi Kleen
2012-03-26 23:57         ` Ted Ts'o
2012-04-11 16:59           ` Andi Kleen
2012-04-13 17:42             ` Linus Torvalds
2012-04-13 17:53               ` Tim Chen
2012-04-13 18:08                 ` Linus Torvalds
2012-04-13 18:31                   ` Tim Chen
2012-04-13 18:33                     ` Linus Torvalds
2012-04-13 18:37                     ` Ted Ts'o
2012-04-13 18:41                       ` Andi Kleen
2012-04-13 18:48                         ` Ted Ts'o
2012-04-13 19:01                           ` Eric Whitney
2012-04-13 19:26                       ` Tim Chen
2012-04-13 19:33                         ` Ted Ts'o
2012-04-13 17:57               ` Andi Kleen
2012-04-13 18:06               ` Ted Ts'o
2012-04-13 18:22                 ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).