* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 17:42 ` Linus Torvalds
@ 2012-04-13 17:53 ` Tim Chen
2012-04-13 18:08 ` Linus Torvalds
2012-04-13 17:57 ` Andi Kleen
2012-04-13 18:06 ` Ted Ts'o
2 siblings, 1 reply; 21+ messages in thread
From: Tim Chen @ 2012-04-13 17:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, 2012-04-13 at 10:42 -0700, Linus Torvalds wrote:
> On Wed, Apr 11, 2012 at 9:59 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >
> > Ping. This scalability problem is still in 3.4-rc* and causes
> > major slowdowns.
>
> Do you have numbers?
>
In a benchmark that does mmaped-read, there is a 28% speed up
after getting rid of the counters.
> > Can we please fix it or revert
> > 556b27abf73833923d5cd4be80006292e1b31662 before release.
>
> That commit ID doesn't make any sense, and doesn't seem to have
> anything to do with any statistics counters that your email talks
> about. So regardless, you'd need to explain why that commit causes the
> problems you talk about, I'm not going to revert a random commit that
> doesn't even look like what you describe.
>
> Ted?
>
> Linus
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 17:53 ` Tim Chen
@ 2012-04-13 18:08 ` Linus Torvalds
2012-04-13 18:31 ` Tim Chen
0 siblings, 1 reply; 21+ messages in thread
From: Linus Torvalds @ 2012-04-13 18:08 UTC (permalink / raw)
To: Tim Chen; +Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, Apr 13, 2012 at 10:53 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>>
>> Do you have numbers?
>
> In a benchmark that does mmaped-read, there is a 28% speed up
> after getting rid of the counters.
Ok, that's big. But is it an actual real workload on a real
filesystem? It looks like this should only happen for actual IO, so I
get the feeling that this is some made-up benchmark for a filesystem
on a RAM-disk?
Linus
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:08 ` Linus Torvalds
@ 2012-04-13 18:31 ` Tim Chen
2012-04-13 18:33 ` Linus Torvalds
2012-04-13 18:37 ` Ted Ts'o
0 siblings, 2 replies; 21+ messages in thread
From: Tim Chen @ 2012-04-13 18:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, 2012-04-13 at 11:08 -0700, Linus Torvalds wrote:
> On Fri, Apr 13, 2012 at 10:53 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> >>
> >> Do you have numbers?
> >
> > In a benchmark that does mmaped-read, there is a 28% speed up
> > after getting rid of the counters.
>
> Ok, that's big. But is it an actual real workload on a real
> filesystem? It looks like this should only happen for actual IO, so I
> get the feeling that this is some made-up benchmark for a filesystem
> on a RAM-disk?
>
> Linus
Benchmark is working on files on normal hard disk.
However, I have a large number
of processes (80 processes, one for each cpu), each reading
a separate mmaped file. The files are in the same directory.
That makes cache line bouncing on the counters particularly bad
due to the large number of processes running.
Tim
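The contention Tim describes can be sketched in a few lines of userspace C (a hypothetical illustration, not ext4 code): many threads incrementing one shared counter bounce its cache line between CPUs on every update, while per-thread cacheline-padded slots, summed only at read time, keep each thread on its own line.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 8
#define ITERS 100000L

/* Shared counter: every thread hammers the same cache line,
 * mirroring the shared ext4 hit/miss counters Tim describes. */
static atomic_long shared_hits;

static void *hammer_shared(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add(&shared_hits, 1);  /* line bounces between CPUs */
    return NULL;
}

/* Per-thread slots padded out to a cache line: each thread dirties
 * only its own line; the total is computed when it is read. */
struct slot { long hits; char pad[64 - sizeof(long)]; };
static struct slot slots[NTHREADS];

static void *hammer_private(void *arg)
{
    struct slot *s = arg;
    for (long i = 0; i < ITERS; i++)
        s->hits++;                          /* stays in the local cache */
    return NULL;
}

long run_shared(void)
{
    pthread_t t[NTHREADS];
    atomic_store(&shared_hits, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, hammer_shared, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&shared_hits);
}

long run_private(void)
{
    pthread_t t[NTHREADS];
    long sum = 0;
    for (int i = 0; i < NTHREADS; i++) {
        slots[i].hits = 0;
        pthread_create(&t[i], NULL, hammer_private, &slots[i]);
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        sum += slots[i].hits;
    }
    return sum;
}
```

Both variants produce the same total; on a large multi-socket machine the shared version is far slower purely because of interconnect traffic for the bouncing line.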
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:31 ` Tim Chen
@ 2012-04-13 18:33 ` Linus Torvalds
2012-04-13 18:37 ` Ted Ts'o
1 sibling, 0 replies; 21+ messages in thread
From: Linus Torvalds @ 2012-04-13 18:33 UTC (permalink / raw)
To: Tim Chen; +Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, Apr 13, 2012 at 11:31 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
>
> Benchmark is working on files on normal hard disk.
> However, I have a large number
> of processes (80 processes, one for each cpu), each reading
> a separate mmaped file. The files are in the same directory.
> That makes cache line bouncing on the counters particularly bad
> due to the large number of processes running.
Ok, that sounds like a potentially realistic case then, even if your
benchmark probably shows the worst possible case.
Considering that the counters are apparently not all that useful, it
does sound like they should just be removed, Ted.
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:31 ` Tim Chen
2012-04-13 18:33 ` Linus Torvalds
@ 2012-04-13 18:37 ` Ted Ts'o
2012-04-13 18:41 ` Andi Kleen
2012-04-13 19:26 ` Tim Chen
1 sibling, 2 replies; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 18:37 UTC (permalink / raw)
To: Tim Chen
Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, Apr 13, 2012 at 11:31:16AM -0700, Tim Chen wrote:
>
> Benchmark is working on files on normal hard disk.
> However, I have a large number
> of processes (80 processes, one for each cpu), each reading
> a separate mmaped file. The files are in the same directory.
> That makes cache line bouncing on the counters particularly bad
> due to the large number of processes running.
OK, so this is with an 80 CPU machine?
And when you say 28% speed up, do you mean to say we are actually
being CPU constrained when reading from files on a normal hard disk?
The reason why I ask this is we're not seeing anything like this with Eric
Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
I don't even see evidence of a larger than usual CPU utilization
compared to other file systems.
So I'm still trying to understand why your results are so different
from what Eric has been seeing, and I'm still puzzled why this is
super urgent.
Ultimately, this isn't a regression and if Linus is willing to take a
change at this point, I'm willing to send it --- but I really don't
understand the urgency.
Best regards,
- Ted
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:37 ` Ted Ts'o
@ 2012-04-13 18:41 ` Andi Kleen
2012-04-13 18:48 ` Ted Ts'o
2012-04-13 19:26 ` Tim Chen
1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2012-04-13 18:41 UTC (permalink / raw)
To: Ted Ts'o
Cc: Tim Chen, Linus Torvalds, Andi Kleen, Vivek Haldar,
Andreas Dilger, linux-ext4
On Fri, Apr 13, 2012 at 02:37:22PM -0400, Ted Ts'o wrote:
> On Fri, Apr 13, 2012 at 11:31:16AM -0700, Tim Chen wrote:
> >
> > Benchmark is working on files on normal hard disk.
> > However, I have a large number
> > of processes (80 processes, one for each cpu), each reading
> > a separate mmaped file. The files are in the same directory.
> > That makes cache line bouncing on the counters particularly bad
> > due to the large number of processes running.
>
> OK, so this is with an 80 CPU machine?
4 sockets, 40 cores, 80 threads.
>
> And when you say 28% speed up, do you mean to say we are actually
> being CPU constrained when reading from files on a normal hard disk?
The files are in memory, but we're still CPU constrained due to various
other issues.
> The reason why I ask this is we're not seeing anything like this with Eric
> Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
> I don't even see evidence of a larger than usual CPU utilization
> compared to other file systems.
I bet Eric didn't test with this statistic counter.
> Ultimately, this isn't a regression and if Linus is willing to take a
The old kernel didn't have that problem, so it's a regression.
> change at this point, I'm willing to send it --- but I really don't
> understand the urgency.
If we don't fix performance regressions before each release then Linux
will get slower and slower. At least I don't want a slow Linux.
-Andi
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:41 ` Andi Kleen
@ 2012-04-13 18:48 ` Ted Ts'o
2012-04-13 19:01 ` Eric Whitney
0 siblings, 1 reply; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 18:48 UTC (permalink / raw)
To: Andi Kleen
Cc: Tim Chen, Linus Torvalds, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, Apr 13, 2012 at 08:41:58PM +0200, Andi Kleen wrote:
> > The reason why I ask this is we're not seeing anything like this with Eric
> > Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
> > I don't even see evidence of a larger than usual CPU utilization
> > compared to other file systems.
>
> I bet Eric didn't test with this statistic counter.
Huh? You can't turn it off, and he's been doing regular scalability
tests at least once per kernel release.
Can you say a bit more about exactly how you are doing this test and
what are the "other issues" where this is becoming a bottleneck? If
possible I'd like to ask Eric if he can add it to his regular
scalability tests.
Thanks,
- Ted
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:48 ` Ted Ts'o
@ 2012-04-13 19:01 ` Eric Whitney
0 siblings, 0 replies; 21+ messages in thread
From: Eric Whitney @ 2012-04-13 19:01 UTC (permalink / raw)
To: Ted Ts'o
Cc: Andi Kleen, Tim Chen, Linus Torvalds, Vivek Haldar,
Andreas Dilger, linux-ext4
On 04/13/2012 02:48 PM, Ted Ts'o wrote:
> On Fri, Apr 13, 2012 at 08:41:58PM +0200, Andi Kleen wrote:
>>> The reason why I ask this is we're not seeing anything like this with Eric
>>> Whitney's 48 CPU scalability testing; we're not CPU bottlenecked, and
>>> I don't even see evidence of a larger than usual CPU utilization
>>> compared to other file systems.
>>
>> I bet Eric didn't test with this statistic counter.
>
> Huh? You can't turn it off, and he's been doing regular scalability
> tests at least once per kernel release.
Yes, as recently as 3.4-rc1. I saw Andi's patch, and tested it this
week against that baseline with the ffsb profiles we've been using for
ext4 (and other filesystem) scalability measurements.
I didn't get a noticeable delta for throughput or reported CPU
utilization on my 48 core eight node NUMA test setup. That said, I plan
to look at this more closely to verify that my workloads should have
seen a delta in the first place. Ted knows them well, though. It's
worth noting that I've got plenty of free CPU capacity while running the
workload, which differs from Andi's/Tim's description.
>
> Can you say a bit more about exactly how you are doing this test and
> what are the "other issues" where this is becoming a bottleneck? If
> possible I'd like to ask Eric if he can add it to his regular
> scalability tests.
Yes, I'm certainly willing to do that if practical, and I'm curious to
know more about what the workload looks like.
Eric
>
> Thanks,
>
> - Ted
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:37 ` Ted Ts'o
2012-04-13 18:41 ` Andi Kleen
@ 2012-04-13 19:26 ` Tim Chen
2012-04-13 19:33 ` Ted Ts'o
1 sibling, 1 reply; 21+ messages in thread
From: Tim Chen @ 2012-04-13 19:26 UTC (permalink / raw)
To: Ted Ts'o
Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, 2012-04-13 at 14:37 -0400, Ted Ts'o wrote:
> On Fri, Apr 13, 2012 at 11:31:16AM -0700, Tim Chen wrote:
> >
> > Benchmark is working on files on normal hard disk.
> > However, I have a large number
> > of processes (80 processes, one for each cpu), each reading
> > a separate mmaped file. The files are in the same directory.
> > That makes cache line bouncing on the counters particularly bad
> > due to the large number of processes running.
>
> OK, so this is with an 80 CPU machine?
>
> And when you say 28% speed up, do you mean to say we are actually
> being CPU constrained when reading from files on a normal hard disk?
>
The files are sparse files. So the amount of IO is limited and we are
not IO constrained.
Tim
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 19:26 ` Tim Chen
@ 2012-04-13 19:33 ` Ted Ts'o
0 siblings, 0 replies; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 19:33 UTC (permalink / raw)
To: Tim Chen
Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
linux-ext4
On Fri, Apr 13, 2012 at 12:26:53PM -0700, Tim Chen wrote:
>
> The files are sparse files. So the amount of IO is limited and we are
> not IO constrained.
>
... OK, and exactly how sparse are these files? Can you say something
about the realistic use case this workload is supposed to represent?
Or is this more of an artificial/micro benchmark?
- Ted
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 17:42 ` Linus Torvalds
2012-04-13 17:53 ` Tim Chen
@ 2012-04-13 17:57 ` Andi Kleen
2012-04-13 18:06 ` Ted Ts'o
2 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2012-04-13 17:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andi Kleen, Ted Ts'o, Vivek Haldar, Andreas Dilger,
linux-ext4, tim.c.chen
> > Can we please fix it or revert
> > 556b27abf73833923d5cd4be80006292e1b31662 before release.
>
> That commit ID doesn't make any sense, and doesn't seem to have
> anything to do with any statistics counters that your email talks
> about. So regardless, you'd need to explain why that commit causes the
> problems you talk about, I'm not going to revert a random commit that
> doesn't even look like what you describe.
Sorry correct problematic commit is
commit 77f4135f2a219a2127be6cc1208c42e6175b11dd
Author: Vivek Haldar <haldar@google.com>
Date: Sun May 22 21:24:16 2011 -0400
ext4: count hits/misses of extent cache and expose in sysfs
The problem is that every IO operation in ext4 bangs on this super block
cache line when it checks the extent cache and then updates the hit/miss
count. On a system with more than two sockets this is very costly
in terms of interconnect traffic under high IO load.
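The pattern Andi is describing reduces to roughly this sketch (field names follow the commit; the actual extent lookup logic is elided and the function name here is a stand-in):

```c
/* Stripped-down sketch: every extent-cache lookup, on every CPU,
 * increments a plain counter that lives in the single shared
 * per-filesystem structure, dirtying the same cache line. */
struct ext4_sb_info {
    unsigned long extent_cache_hits;
    unsigned long extent_cache_misses;
};

/* Stand-in for the real lookup path: only the statistics update
 * that causes the cache line bouncing is kept. */
static int extent_cache_lookup(struct ext4_sb_info *sbi, int hit)
{
    if (hit)
        sbi->extent_cache_hits++;    /* shared line written on */
    else
        sbi->extent_cache_misses++;  /* every single lookup    */
    return hit;
}
```

With 80 CPUs doing lookups in parallel, each increment forces the line to migrate to the writing CPU in exclusive state, which is what shows up as interconnect traffic.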
A straight revert has a minor conflict on a trace statement that
was added later.
-Andi
Original patch:
commit 77f4135f2a219a2127be6cc1208c42e6175b11dd
Author: Vivek Haldar <haldar@google.com>
Date: Sun May 22 21:24:16 2011 -0400
ext4: count hits/misses of extent cache and expose in sysfs
The number of hits and misses for each filesystem is exposed in
/sys/fs/ext4/<dev>/extent_cache_{hits, misses}.
Tested: fsstress, manual checks.
Signed-off-by: Vivek Haldar <haldar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 17:42 ` Linus Torvalds
2012-04-13 17:53 ` Tim Chen
2012-04-13 17:57 ` Andi Kleen
@ 2012-04-13 18:06 ` Ted Ts'o
2012-04-13 18:22 ` Andi Kleen
2 siblings, 1 reply; 21+ messages in thread
From: Ted Ts'o @ 2012-04-13 18:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andi Kleen, Vivek Haldar, Andreas Dilger, linux-ext4, tim.c.chen
On Fri, Apr 13, 2012 at 10:42:10AM -0700, Linus Torvalds wrote:
>
> > Can we please fix it or revert
> > 556b27abf73833923d5cd4be80006292e1b31662 before release.
>
> That commit ID doesn't make any sense, and doesn't seem to have
> anything to do with any statistics counters that your email talks
> about. So regardless, you'd need to explain why that commit causes the
> problems you talk about, I'm not going to revert a random commit that
> doesn't even look like what you describe.
>
> Ted?
I think Andi cited the wrong commit. The commit in question is
77f4135f2a219a2127be6cc1208c42e6175b11dd, which first showed up in
2.6.39. The problem would show up if you have a very fast
(PCIe-attached would be required, I suspect) flash device on a large
SMP box, and a random-read benchmark which reads from a large number
of CPUs in parallel, such that we thrash the cache line containing
sbi->extent_cache_hits.
It makes sense that there is a problem here, and what had been
discussed was either (a) removing these counters entirely, or (b)
replacing them with a percpu counter. At the moment these counters
aren't really worth it, although in the future when we replace the
single extent cache with a larger cache, the statistics would be more
interesting.
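The percpu alternative Ted mentions can be modeled in userspace C like this (an illustrative model of the kernel's percpu_counter idea, not its actual API; names are made up): each CPU accumulates into a private slot and folds it into the shared count only once per batch, so the shared cache line is written rarely, and an exact read sums in the residues.

```c
#define NR_CPUS 4
#define BATCH   32

/* Model of a percpu counter: `count` is the shared value, `local`
 * holds per-CPU deltas (padded to a cache line in a real version). */
struct pcpu_counter {
    long count;
    long local[NR_CPUS];
};

/* Increment on a given CPU: touches only the private slot until a
 * full batch has accumulated, then folds it into the shared count. */
static void pcpu_inc(struct pcpu_counter *c, int cpu)
{
    if (++c->local[cpu] >= BATCH) {
        c->count += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Exact read: shared count plus whatever is still sitting in the
 * per-CPU slots. Slow path, taken only when reporting statistics. */
static long pcpu_sum(const struct pcpu_counter *c)
{
    long sum = c->count;
    for (int i = 0; i < NR_CPUS; i++)
        sum += c->local[i];
    return sum;
}
```

Reads become O(NR_CPUS), which is why this layout suits statistics that are written constantly but read only occasionally through sysfs.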
I just ran out of time before the 3.4 merge window, and quite frankly
I didn't think it was worth pursuing this with a high urgency, since
(a) it's not a regression (the commit in question which added these
counters has been around since 2.6.39), and (b) in most circumstances
it's not noticeable at all.
I'm willing to push a commit to you to remove the counters for now, and
we'll probably add it back later using percpu counters if you think
it's worth making a change at this time --- or if Andi can explain why
he's treating this with a high degree of urgency. Is there some
common use case that I'm missing which is being very badly impacted
with this cache-line thrashing?
Regards,
- Ted
* Re: [RFC, PATCH] Avoid hot statistics cache line in ext4 extent cache
2012-04-13 18:06 ` Ted Ts'o
@ 2012-04-13 18:22 ` Andi Kleen
0 siblings, 0 replies; 21+ messages in thread
From: Andi Kleen @ 2012-04-13 18:22 UTC (permalink / raw)
To: Ted Ts'o
Cc: Linus Torvalds, Andi Kleen, Vivek Haldar, Andreas Dilger,
linux-ext4, tim.c.chen
> I think Andi cited the wrong commit. The commit in question is
> 77f4135f2a219a2127be6cc1208c42e6175b11dd, which first showed up in
> 2.6.39. If you have a very fast (PCIe-attached would be required I
I don't think it was a high-end IO device.
This already hits for buffered IO I think.
> I'm willing to push commit to you to remove the counters for now, and
> we'll probably add it back later using percpu counters if you think
> it's worth making a change at this time --- or if Andi can explain why
> he's treating this with a high degree of urgency. Is there some
> common use case that I'm missing which is being very badly impacted
> with this cache-line thrashing?
Well, IO writes are a pretty common use case. On a smaller
system it's likely not ~30%, but likely a drag too. And we normally
try to fix scalability regressions each release. Otherwise things
will just get worse and worse over time.
Scalability is unfortunately quite fragile and needs constant
attention. Every bad hot cache line can break it.
So yes I would like this to be reverted for the release.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.