* [PATCH v6 6/8] fs: only send IPI to invalidate LRU BH when needed @ 2012-01-08 16:27 Gilad Ben-Yossef 2012-01-11 7:04 ` Milton Miller 0 siblings, 1 reply; 3+ messages in thread From: Gilad Ben-Yossef @ 2012-01-08 16:27 UTC (permalink / raw) To: linux-kernel Cc: Gilad Ben-Yossef, Christoph Lameter, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen, Mel Gorman, Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity, Michal Nazarewicz, Kosaki Motohiro In several code paths, such as when unmounting a file system (but not only) we send an IPI to ask each cpu to invalidate its local LRU BHs. For multi-cores systems that have many cpus that may not have any LRU BH because they are idle or because they have no performed any file system access since last invalidation (e.g. CPU crunching on high perfomance computing nodes that write results to shared memory) this can lead to loss of performance each time someone switches KVM (the virtual keyboard and screen type, not the hypervisor) that has a USB storage stuck in. This patch attempts to only send the IPI to cpus that have LRU BH. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> CC: Christoph Lameter <cl@linux.com> CC: Chris Metcalf <cmetcalf@tilera.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Russell King <linux@arm.linux.org.uk> CC: linux-mm@kvack.org CC: Pekka Enberg <penberg@kernel.org> CC: Matt Mackall <mpm@selenic.com> CC: Sasha Levin <levinsasha928@gmail.com> CC: Rik van Riel <riel@redhat.com> CC: Andi Kleen <andi@firstfloor.org> CC: Mel Gorman <mel@csn.ul.ie> CC: Andrew Morton <akpm@linux-foundation.org> CC: Alexander Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org CC: Avi Kivity <avi@redhat.com> CC: Michal Nazarewicz <mina86@mina86.org> CC: Kosaki Motohiro <kosaki.motohiro@gmail.com> --- fs/buffer.c | 15 ++++++++++++++- 1 files changed, 14 insertions(+), 1 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 19d8eb7..b2378d4 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1434,10 +1434,23 @@ static void invalidate_bh_lru(void *arg) } put_cpu_var(bh_lrus); } + +static int local_bh_lru_avail(int cpu, void *dummy) +{ + struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu); + int i; + for (i = 0; i < BH_LRU_SIZE; i++) { + if (b->bhs[i]) + return 1; + } + + return 0; +} + void invalidate_bh_lrus(void) { - on_each_cpu(invalidate_bh_lru, NULL, 1); + on_each_cpu_cond(local_bh_lru_avail, invalidate_bh_lru, NULL, 1); } EXPORT_SYMBOL_GPL(invalidate_bh_lrus); -- 1.7.0.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v6 6/8] fs: only send IPI to invalidate LRU BH when needed 2012-01-08 16:27 [PATCH v6 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef @ 2012-01-11 7:04 ` Milton Miller 2012-01-22 9:59 ` Gilad Ben-Yossef 0 siblings, 1 reply; 3+ messages in thread From: Milton Miller @ 2012-01-11 7:04 UTC (permalink / raw) To: Gilad Ben-Yossef, linux-kernel Cc: Christoph Lameter, Michal Nazarewicz, Mel Gorman, KOSAKI Motohiro, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin, Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity On Sun Jan 08 2012 about 11:28:17 EST, Gilad Ben-Yossef wrote: > In several code paths, such as when unmounting a file system (but > not only) we send an IPI to ask each cpu to invalidate its local > LRU BHs. > > For multi-cores systems that have many cpus that may not have For multi-core systems that have many cpus, many may not have > any LRU BH because they are idle or because they have no performed not performed > any file system access since last invalidation (e.g. CPU crunching accesses > on high perfomance computing nodes that write results to shared > memory) this can lead to loss of performance each time someone memory). This can lead to a loss Also: or only using filesystems that do not use the bh layer. > switches KVM (the virtual keyboard and screen type, not the switches the KVM > hypervisor) that has a USB storage stuck in. if it has > > This patch attempts to only send the IPI to cpus that have LRU BH. send an IPI > + > +static int local_bh_lru_avail(int cpu, void *dummy) > +{ This is not about the availibilty of the lru, but rather the decision if it is empty. How about has_bh_in_lru() ? > + struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu); > + int i; > > + for (i = 0; i < BH_LRU_SIZE; i++) { > + if (b->bhs[i]) > + return 1; > + } If we change the loop in invalidate_bh to be end to beginning, then we could get by only checking b->bhs[0] instead of all BH_LRU_SIZE words. (The other loops all start by having entry 0 as a valid entry and pushing towards higher slots as they age.) We might say we don't care, but I think we need to know if another cpu is still invalidating in case it gets stuck in brelse, we need to wait for all the invalidates to occur before we can continue to kill the device. The other question is locking, what covers the window from getting the bh until it is installed if the lru was empty? It looks like it could be a large hole, but I'm not sure it wasn't there before. By when do we need them freed? The locking seems to be irq-disable for smp and preempt-disable for up, can we use an RCU grace period? There seem to be more on_each_cpu calls in the bdev invalidate so we need more patches, although each round trip though ipi takes time; we could also consider if they take time. > + > + return 0; > +} > + > void invalidate_bh_lrus(void) > { > - on_each_cpu(invalidate_bh_lru, NULL, 1); > + on_each_cpu_cond(local_bh_lru_avail, invalidate_bh_lru, NULL, 1); > } > EXPORT_SYMBOL_GPL(invalidate_bh_lrus); milton -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v6 6/8] fs: only send IPI to invalidate LRU BH when needed 2012-01-11 7:04 ` Milton Miller @ 2012-01-22 9:59 ` Gilad Ben-Yossef 0 siblings, 0 replies; 3+ messages in thread From: Gilad Ben-Yossef @ 2012-01-22 9:59 UTC (permalink / raw) To: Milton Miller Cc: linux-kernel, Christoph Lameter, Michal Nazarewicz, Mel Gorman, KOSAKI Motohiro, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin, Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity On Wed, Jan 11, 2012 at 9:04 AM, Milton Miller <miltonm@bga.com> wrote: > On Sun Jan 08 2012 about 11:28:17 EST, Gilad Ben-Yossef wrote: > >> + >> +static int local_bh_lru_avail(int cpu, void *dummy) >> +{ > > This is not about the availibilty of the lru, but rather the > decision if it is empty. How about has_bh_in_lru() ? Right. Good point. > >> + struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu); >> + int i; >> >> + for (i = 0; i < BH_LRU_SIZE; i++) { >> + if (b->bhs[i]) >> + return 1; >> + } > > > If we change the loop in invalidate_bh to be end to beginning, then we > could get by only checking b->bhs[0] instead of all BH_LRU_SIZE words. > (The other loops all start by having entry 0 as a valid entry and pushing > towards higher slots as they age.) We might say we don't care, but I > think we need to know if another cpu is still invalidating in case it > gets stuck in brelse, we need to wait for all the invalidates to occur > before we can continue to kill the device. hmm... less cycles for the check and less cache lines to pull from all the CPUs is certainly better, but I'm guessing walking backwards on the bh_lru array is less cache friendly then doing it in the straight up way because of data cache prediction logic. Which one has a bigger impact? I'm not really sure. I do think it's a little bit more complicated then the current dead simple approach. I don't really mind making the change but those invalidates are rare. The rest of logic in the bh lru management is "dopey-but-simple" (so says the comment so my gut feeling is that its better to have simple code even if possibly less optimized in this code path. > The other question is locking, what covers the window from getting > the bh until it is installed if the lru was empty? It looks like > it could be a large hole, but I'm not sure it wasn't there before. > By when do we need them freed? The locking seems to be irq-disable > for smp and preempt-disable for up, can we use an RCU grace period? Good question. As you realized already the only case where we race is when going from empty list to having a single item. We'll send an IPI in all other cases. The way I see it, with the current code if you were about to install a new bh lru and you got hit with the invalidating IPI before you disabled preemption/interrupts, you have the same race, and you also have the same race if you got the invalidate IPI, cleaned everything up and then someone calls to install a bh in the LRU, since there is nothing stopping you from installing an LRU bh after the invalidate. So, if the callers to the invalidate path care about this, they must somehow make sure whatever type of BHs (belong to a specific device or fs) they are interested in getting off the LRU are never being added (on any CPU) after the point they call the invalidate or they are broken already. Again, this is all true for the current code to the best of my understanding. My code doesn't change that. It does make the race window slightly bigger by a couple of instructions, though so it might expose subtle bugs we had all along. I can't really tell if it's a good or bad thing :-) > There seem to be more on_each_cpu calls in the bdev invalidate > so we need more patches, although each round trip though ipi > takes time; we could also consider if they take time. Sure, I plan to get to all of them in the end (and also all the work queue variants) and at least look at them, but I prefer to tackle the low hanging fruits first. I maintain a rough TODO list of those I intend to visit and status here: https://github.com/gby/linux/wiki Thanks! Gilad -- Gilad Ben-Yossef Chief Coffee Drinker gilad@benyossef.com Israel Cell: +972-52-8260388 US Cell: +1-973-8260388 http://benyossef.com "Unfortunately, cache misses are an equal opportunity pain provider." -- Mike Galbraith, LKML -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2012-01-22 9:59 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-01-08 16:27 [PATCH v6 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef 2012-01-11 7:04 ` Milton Miller 2012-01-22 9:59 ` Gilad Ben-Yossef
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).