* [parisc-linux] Improving performance of munmap
@ 2004-10-07 23:02 Randolph Chung
2004-10-08 2:03 ` Carlos O'Donell
0 siblings, 1 reply; 6+ messages in thread
From: Randolph Chung @ 2004-10-07 23:02 UTC
To: parisc-linux
Hi all,
On a 2.6 SMP kernel, munmap()ing 16MB currently takes on the order of
0.8-0.9 seconds on a 440MHz machine. on a UP machine it takes about 5ms.
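A measurement like this can be approximated from userspace. The following is a minimal sketch, not the test program used for the numbers above (which is not shown in this thread); the function name time_munmap and the choice of an anonymous private mapping are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper: map `len` bytes anonymously, touch every byte so
 * the pages are really populated, then time the munmap() call alone.
 * Returns the elapsed time in seconds, or -1.0 on any failure.
 */
double time_munmap(size_t len)
{
    struct timespec t0, t1;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1.0;

    memset(p, 0xa5, len);               /* fault in every page */

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
        munmap(p, len);
        return -1.0;
    }
    if (munmap(p, len) != 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_munmap(16UL << 20) would time the 16MB case discussed here; the result depends entirely on the machine and kernel it runs on.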
profiling shows that we are spending a lot of time flushing caches:
tausq@ios:~/parisc/linux-2.6$ sort -nr -k3 ~/test/mmap/after |head
48586 cpu_idle 759.1562
37906 machine_restart 592.2812
10951 flush_user_icache_range_asm 304.1944
10946 flush_user_dcache_range_asm 304.0556
129 flush_kernel_icache_page 1.2900
29 _spin_unlock_bh 0.6042
9 fdsync 0.4500
10 _spin_unlock_irq 0.4167
i believe this is related to the way we implement flush_tlb_mm(). on SMP
we currently flush the entire tlb, instead of just invalidating the
process context. this is needed to get the correct behavior if a
multithreaded app is simultaneously running on >1 CPUs. James had
suggested previously that we might be able to do something smarter, such
as sending an ipi to the other CPU to switch the context.
would anybody like to look into this problem, or suggest some ways to
tackle this?
randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: [parisc-linux] Improving performance of munmap
2004-10-07 23:02 [parisc-linux] Improving performance of munmap Randolph Chung
@ 2004-10-08 2:03 ` Carlos O'Donell
2004-10-08 4:16 ` Randolph Chung
0 siblings, 1 reply; 6+ messages in thread
From: Carlos O'Donell @ 2004-10-08 2:03 UTC
To: Randolph Chung; +Cc: parisc-linux
On Thu, Oct 07, 2004 at 04:02:32PM -0700, Randolph Chung wrote:
> Hi all,
>
> On a 2.6 SMP kernel, munmap()ing 16MB currently takes on the order of
> 0.8-0.9 seconds on a 440MHz machine. on a UP machine it takes about 5ms.
>
> profiling shows that we are spending a lot of time flushing caches:
>
> tausq@ios:~/parisc/linux-2.6$ sort -nr -k3 ~/test/mmap/after |head
> 48586 cpu_idle 759.1562
> 37906 machine_restart 592.2812
> 10951 flush_user_icache_range_asm 304.1944
> 10946 flush_user_dcache_range_asm 304.0556
> 129 flush_kernel_icache_page 1.2900
> 29 _spin_unlock_bh 0.6042
> 9 fdsync 0.4500
> 10 _spin_unlock_irq 0.4167
>
> i believe this is related to the way we implement flush_tlb_mm(). on SMP
> we currently flush the entire tlb, instead of just invalidating the
> process context. this is needed to get the correct behavior if a
> multithreaded app is simultaneously running on >1 CPUs. James had
> suggested previously that we might be able to do something smarter, such
> as sending an ipi to the other CPU to switch the context.
>
> would anybody like to look into this problem, or suggest some ways to
> tackle this?
Is munmap() speed really a big issue? Does userland see the benefits of
a faster flush_tlb_mm()?
Perhaps if you call dlclose() *a lot* you might notice.
Or if you create and destroy threads really quickly.
Aside from that, who else in userspace is a big munmap caller?
c.
* Re: [parisc-linux] Improving performance of munmap
2004-10-08 2:03 ` Carlos O'Donell
@ 2004-10-08 4:16 ` Randolph Chung
2004-10-08 16:59 ` Randolph Chung
0 siblings, 1 reply; 6+ messages in thread
From: Randolph Chung @ 2004-10-08 4:16 UTC
To: Carlos O'Donell; +Cc: parisc-linux
> Is munmap() speed really a big issue? Does userland see the benefits of
> a faster flush_tlb_mm()?
actually i was wrong, the problem is not with flush_tlb_mm(), but with
flush_user_{dcache,icache}_range.
> Perhaps if you call dlclose() *a lot* you might notice.
>
> Or if you create and destroy threads really quickly.
>
> Aside from that, who else in userspace is a big munmap caller?
lots of apps use mmap a lot..... but even if they don't, 0.9 seconds is
a very long time to make somebody wait.
i made some changes to the flush_user_{dcache,icache}_range
implementation and it seems to be better now. will post the patch
tomorrow after some cleanups.
randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
* Re: [parisc-linux] Improving performance of munmap
2004-10-08 4:16 ` Randolph Chung
@ 2004-10-08 16:59 ` Randolph Chung
2004-10-08 17:05 ` Randolph Chung
2004-10-08 17:35 ` Grant Grundler
0 siblings, 2 replies; 6+ messages in thread
From: Randolph Chung @ 2004-10-08 16:59 UTC
To: Carlos O'Donell; +Cc: parisc-linux
In reference to a message from Randolph Chung, dated Oct 07:
> i made some changes to the flush_user_{dcache,icache}_range
> implementation and it seems to be better now. will post the patch
> tomorrow after some cleanups.
Here's the patch. Any comments before i check it in?
On the a500 that i am testing on, this bumps the cache threshold up
from 0.5MB to 1MB after the timing.
I've retained the assumption in the code that the icache and dcache
thresholds are the same. i don't know if that's really a good
assumption though.
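The threshold arithmetic in the patch below is a break-even estimate: if flushing `size` bytes by range costs `rangetime` cycles and a whole-cache flush costs `alltime` cycles, the two cost the same at roughly size * alltime / rangetime bytes. A minimal userspace sketch of that calculation follows. The function name compute_flush_threshold, the 64-bit intermediate and the zero-rangetime guard are assumptions added for illustration and are not in the patch.

```c
#include <stdint.h>

#define FLUSH_THRESHOLD 0x80000UL   /* 0.5MB fallback, as in the patch */

/*
 * Hypothetical helper mirroring the arithmetic in
 * parisc_setup_cache_timing(): compute the range length at which a
 * range flush is estimated to cost as much as a whole-cache flush,
 * rounded up to a multiple of the cache line size.
 * `line_bytes` must be a power of two.
 */
unsigned long compute_flush_threshold(unsigned long size,
                                      unsigned long alltime,
                                      unsigned long rangetime,
                                      unsigned long line_bytes)
{
    uint64_t t;

    if (rangetime == 0)             /* guard added here, not in the patch */
        return FLUSH_THRESHOLD;

    /* 64-bit intermediate so size * alltime is less likely to overflow */
    t = (uint64_t)size * alltime / rangetime;

    /* round up to the next cache line boundary */
    t = (t + line_bytes - 1) & ~((uint64_t)line_bytes - 1);

    if (t == 0)
        return FLUSH_THRESHOLD;

    return (unsigned long)t;
}
```

With made-up inputs, a 1MB range that takes half as long to flush as the whole cache (size 0x100000, alltime 2000, rangetime 1000) gives a threshold of 0x200000 bytes.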
One more thing -- the performance of flush_data_cache() (i.e. the
architected whole-cache flush) seems to be dependent on the current
contents of the cache. i suppose that when the cache is more heavily
populated it takes longer to flush. so if you do multiple timings of
flush_data_cache() one after the other, the calls after the first tend
to be much faster (2-3x in one experiment). i've decided to only use the
first measurement, with the assumption that on a running system the
cache is normally fairly populated.
randolph
Index: arch/parisc/kernel/cache.c
===================================================================
RCS file: /var/cvs/linux-2.6/arch/parisc/kernel/cache.c,v
retrieving revision 1.21
diff -u -p -r1.21 cache.c
--- arch/parisc/kernel/cache.c 13 Sep 2004 15:22:24 -0000 1.21
+++ arch/parisc/kernel/cache.c 8 Oct 2004 16:48:22 -0000
@@ -55,6 +55,11 @@ flush_data_cache(void)
{
on_each_cpu((void (*)(void *))flush_data_cache_local, NULL, 1, 1);
}
+void
+flush_instruction_cache(void)
+{
+ on_each_cpu((void (*)(void *))flush_instruction_cache_local, NULL, 1, 1);
+}
#endif
void
@@ -326,4 +331,36 @@ void clear_user_page_asm(void *page, uns
purge_tlb_start();
__clear_user_page_asm(page, vaddr);
purge_tlb_end();
+}
+
+#define FLUSH_THRESHOLD 0x80000 /* 0.5MB */
+int parisc_cache_flush_threshold = FLUSH_THRESHOLD;
+
+void parisc_setup_cache_timing(void)
+{
+ unsigned long rangetime, alltime;
+ extern char _text; /* start of kernel code, defined by linker */
+ extern char _end; /* end of BSS, defined by linker */
+ unsigned long size;
+
+ alltime = mfctl(16);
+ flush_data_cache();
+ alltime = mfctl(16) - alltime;
+
+ size = (unsigned long)(&_end - &_text);
+ rangetime = mfctl(16);
+ flush_kernel_dcache_range((unsigned long)&_text, size);
+ rangetime = mfctl(16) - rangetime;
+
+ printk(KERN_DEBUG "Whole cache flush %lu cycles, flushing %lu bytes %lu cycles\n",
+ alltime, size, rangetime);
+
+ /* Racy, but if we see an intermediate value, it's ok too... */
+ parisc_cache_flush_threshold = size * alltime / rangetime;
+
+ parisc_cache_flush_threshold = (parisc_cache_flush_threshold + L1_CACHE_BYTES - 1) &~ (L1_CACHE_BYTES - 1);
+ if (!parisc_cache_flush_threshold)
+ parisc_cache_flush_threshold = FLUSH_THRESHOLD;
+
+ printk("Setting cache flush threshold to %x (%d CPUs online)\n", parisc_cache_flush_threshold, num_online_cpus());
}
Index: arch/parisc/kernel/setup.c
===================================================================
RCS file: /var/cvs/linux-2.6/arch/parisc/kernel/setup.c,v
retrieving revision 1.8
diff -u -p -r1.8 setup.c
--- arch/parisc/kernel/setup.c 4 Oct 2004 19:12:49 -0000 1.8
+++ arch/parisc/kernel/setup.c 8 Oct 2004 16:48:22 -0000
@@ -310,6 +310,8 @@ static int __init parisc_init(void)
boot_cpu_data.cpu_hz / 1000000,
boot_cpu_data.cpu_hz % 1000000 );
+ parisc_setup_cache_timing();
+
/* These are in a non-obvious order, will fix when we have an iotree */
#if defined(CONFIG_IOSAPIC)
iosapic_init();
Index: include/asm-parisc/cache.h
===================================================================
RCS file: /var/cvs/linux-2.6/include/asm-parisc/cache.h,v
retrieving revision 1.3
diff -u -p -r1.3 cache.h
--- include/asm-parisc/cache.h 5 Apr 2004 02:47:39 -0000 1.3
+++ include/asm-parisc/cache.h 8 Oct 2004 16:48:44 -0000
@@ -34,6 +34,7 @@ extern void flush_data_cache_local(void)
extern void flush_instruction_cache_local(void); /* flushes local code-cache only */
#ifdef CONFIG_SMP
extern void flush_data_cache(void); /* flushes data-cache only (all processors) */
+extern void flush_instruction_cache(void); /* flushes i-cache only (all processors) */
#else
#define flush_data_cache flush_data_cache_local
#define flush_instruction_cache flush_instruction_cache_local
Index: include/asm-parisc/cacheflush.h
===================================================================
RCS file: /var/cvs/linux-2.6/include/asm-parisc/cacheflush.h,v
retrieving revision 1.15
diff -u -p -r1.15 cacheflush.h
--- include/asm-parisc/cacheflush.h 30 Sep 2004 12:08:46 -0000 1.15
+++ include/asm-parisc/cacheflush.h 8 Oct 2004 16:48:44 -0000
@@ -33,36 +33,25 @@ static inline void flush_cache_all(void)
#define flush_cache_vmap(start, end) flush_cache_all()
#define flush_cache_vunmap(start, end) flush_cache_all()
-/* The following value needs to be tuned and probably scaled with the
- * cache size.
- */
-
-#define FLUSH_THRESHOLD 0x80000
+extern int parisc_cache_flush_threshold;
+void parisc_setup_cache_timing(void);
static inline void
flush_user_dcache_range(unsigned long start, unsigned long end)
{
-#ifdef CONFIG_SMP
- flush_user_dcache_range_asm(start,end);
-#else
- if ((end - start) < FLUSH_THRESHOLD)
+ if ((end - start) < parisc_cache_flush_threshold)
flush_user_dcache_range_asm(start,end);
else
flush_data_cache();
-#endif
}
static inline void
flush_user_icache_range(unsigned long start, unsigned long end)
{
-#ifdef CONFIG_SMP
- flush_user_icache_range_asm(start,end);
-#else
- if ((end - start) < FLUSH_THRESHOLD)
+ if ((end - start) < parisc_cache_flush_threshold)
flush_user_icache_range_asm(start,end);
else
flush_instruction_cache();
-#endif
}
extern void flush_dcache_page(struct page *page);
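The decision made by the new flush_user_dcache_range() and flush_user_icache_range() above can be modelled outside the kernel. This is only a sketch: flush_threshold stands in for parisc_cache_flush_threshold, and the enum and the function name choose_flush are hypothetical names introduced here.

```c
/* Stand-in for parisc_cache_flush_threshold (0.5MB default from the patch) */
static unsigned long flush_threshold = 0x80000UL;

/* Hypothetical labels for the two paths taken by the inline helpers */
enum flush_kind {
    FLUSH_RANGE,    /* flush_user_{d,i}cache_range_asm(start, end) */
    FLUSH_WHOLE     /* flush_data_cache() / flush_instruction_cache() */
};

/*
 * Ranges shorter than the threshold are flushed line by line;
 * anything at or above it falls back to the whole-cache flush.
 */
enum flush_kind choose_flush(unsigned long start, unsigned long end)
{
    return (end - start) < flush_threshold ? FLUSH_RANGE : FLUSH_WHOLE;
}
```

Under this model a single page takes the range path, while the 16MB munmap() case from the start of the thread takes the whole-cache path on both UP and SMP.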
* Re: [parisc-linux] Improving performance of munmap
2004-10-08 16:59 ` Randolph Chung
@ 2004-10-08 17:05 ` Randolph Chung
2004-10-08 17:35 ` Grant Grundler
1 sibling, 0 replies; 6+ messages in thread
From: Randolph Chung @ 2004-10-08 17:05 UTC
To: Carlos O'Donell; +Cc: parisc-linux
> On the a500 that i am testing on, this bumps the cache threshold up
> from 0.5MB to 1MB after the timing.
sorry, that should read:
"On the a500 that i am testing on, this bumps the cache threshold up
from 0.5MB to 2MB after the timing."
randolph
--
Randolph Chung
Debian GNU/Linux Developer, hppa/ia64 ports
http://www.tausq.org/
* Re: [parisc-linux] Improving performance of munmap
2004-10-08 16:59 ` Randolph Chung
2004-10-08 17:05 ` Randolph Chung
@ 2004-10-08 17:35 ` Grant Grundler
1 sibling, 0 replies; 6+ messages in thread
From: Grant Grundler @ 2004-10-08 17:35 UTC
To: Randolph Chung; +Cc: parisc-linux
On Fri, Oct 08, 2004 at 09:59:26AM -0700, Randolph Chung wrote:
> +#define FLUSH_THRESHOLD 0x80000 /* 0.5MB */
> +int parisc_cache_flush_threshold = FLUSH_THRESHOLD;
6 months or so ago Alex Williamson ran some perf tests comparing a
CONSTANT vs global when tuning the IOMMU page size on IA64 ZX1 chipsets.
IIRC, using a global was 3-5% slower if it was the same value as the constant.
I don't know whether parisc would show a similar performance difference; if so,
a per-arch constant (PA11 vs PA20) should work fine for everything but PA8800.
It may not be a problem, just something to consider.
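The trade-off between a constant and a global can be sketched as follows. All names here are hypothetical, and the 3-5% figure above comes from the IA64 test, not from this code. With the macro the compiler can fold the threshold into the comparison; with the global each call has to load the value from memory, in exchange for being tunable at run time.

```c
/* Compile-time constant: the compare can use an immediate operand */
#define FLUSH_THRESHOLD_CONST 0x80000UL

/* Run-time global: same initial value, but loaded from memory on each use */
unsigned long flush_threshold_var = 0x80000UL;

static inline int over_threshold_const(unsigned long len)
{
    return len >= FLUSH_THRESHOLD_CONST;
}

static inline int over_threshold_var(unsigned long len)
{
    return len >= flush_threshold_var;
}
```

Both helpers give the same answer while the global holds the constant's value; only the global variant follows a threshold that is retuned at boot.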
The rest looks "correct" to me - i.e. I don't see anything obviously wrong.
thanks,
grant