* Inefficient TLB flushing
@ 2003-11-12 20:01 Jack Steiner
2003-11-12 20:49 ` David Mosberger
2003-11-13 3:56 ` David S. Miller
0 siblings, 2 replies; 6+ messages in thread
From: Jack Steiner @ 2003-11-12 20:01 UTC (permalink / raw)
To: davidm; +Cc: linux-ia64, linux-kernel
Something appears broken in TLB flushing on IA64 (& possibly other
architectures). Functionally, it works, but performance is bad on
systems with large cpu counts.
The result is that TLB flushing in exit_mmap() is frequently done via
IPIs to all cpus rather than with a "ptc" instruction or with a new context.
Take a look at exit_mmap() (in mm/mmap.c):
void exit_mmap(struct mm_struct *mm)
{
(1)     tlb = tlb_gather_mmu(mm, 1);
        unmap_vmas(&tlb, ...
        ...
(2)     tlb_finish_mmu(tlb, 0, MM_VM_SIZE(mm));
        ...
}

static inline void
tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
        ...
        ia64_tlb_flush_mmu(tlb, start, end);
        ...
}

static inline void
ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
        ...
(3)     if (tlb->fullmm) {
                flush_tlb_mm(tlb->mm);
(4)     } else if (unlikely (end - start >= 1TB ...
                /* This should be very rare and is not worth optimizing for. */
(5)             flush_tlb_all();
        ...
flush_tlb_all() flushes by sending an IPI to every cpu.
<start> & <end> are passed from exit_mmap() at line (2) as 0..MM_VM_SIZE, which is
0..0xa000000000000000 on IA64, so the expression at (4) is always TRUE. If tlb->fullmm is
false, the code in ia64_tlb_flush_mmu() calls flush_tlb_all(), which uses IPIs to
do the TLB flushing.
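To make the arithmetic at (4) concrete, here is a minimal user-space sketch (the MM_VM_SIZE value is the one cited above; the 1 TB threshold and the helper name are stand-ins for illustration, not the kernel's actual code):

```c
#include <stdint.h>

/* From the analysis above: on IA64 the range passed from
 * exit_mmap() is 0..MM_VM_SIZE = 0..0xa000000000000000. */
#define MM_VM_SIZE_IA64 0xa000000000000000UL
#define ONE_TB (1UL << 40)

/* Returns non-zero when the flushed range trips the "huge range"
 * test at (4), which forces flush_tlb_all() and hence IPIs. */
static int range_forces_global_flush(unsigned long start, unsigned long end)
{
        return end - start >= ONE_TB;
}
```

Plugging in start = 0 and end = 0xa000000000000000 shows the test is always true for a full-mm teardown, so any such path that reaches (4) with fullmm clear flushes with IPIs.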
Normally, tlb->fullmm at line (3) is 1, since it is initialized to 1 at line (1). However, in
unmap_vmas(), if a resched is needed during VMA teardown, the TLB gather is
finished & reinitialized with tlb_gather_mmu(mm, 0). When the
call to tlb_finish_mmu() is made at line (2), tlb->fullmm is then 0 and TLB flushing is done with IPIs.
Does this analysis look correct?? AFAICT, this bug (inefficiency) may apply to other
architectures, although I can't assess its performance impact on them.
Here is the patch that I am currently testing:
--- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
+++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
@@ -574,9 +574,10 @@
if ((long)zap_bytes > 0)
continue;
if (need_resched()) {
+ int fullmm = (*tlbp)->fullmm;
tlb_finish_mmu(*tlbp, tlb_start, start);
cond_resched_lock(&mm->page_table_lock);
- *tlbp = tlb_gather_mmu(mm, 0);
+ *tlbp = tlb_gather_mmu(mm, fullmm);
tlb_start_valid = 0;
}
zap_bytes = ZAP_BLOCK_SIZE;
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Inefficient TLB flushing
2003-11-12 20:01 Inefficient TLB flushing Jack Steiner
@ 2003-11-12 20:49 ` David Mosberger
2003-11-12 21:22 ` Andrew Morton
2003-11-13 3:56 ` David S. Miller
1 sibling, 1 reply; 6+ messages in thread
From: David Mosberger @ 2003-11-12 20:49 UTC (permalink / raw)
To: Jack Steiner; +Cc: akpm, davidm, linux-ia64, linux-kernel
>>>>> On Wed, 12 Nov 2003 14:01:19 -0600, Jack Steiner <steiner@sgi.com> said:
Jack> Does this analysis look correct??
Yup.
Jack> Here is the patch that I am currently testing:
Jack> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
Jack> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
Jack> @@ -574,9 +574,10 @@
Jack> if ((long)zap_bytes > 0)
Jack> continue;
Jack> if (need_resched()) {
Jack> + int fullmm = (*tlbp)->fullmm;
Jack> tlb_finish_mmu(*tlbp, tlb_start, start);
Jack> cond_resched_lock(&mm->page_table_lock);
Jack> - *tlbp = tlb_gather_mmu(mm, 0);
Jack> + *tlbp = tlb_gather_mmu(mm, fullmm);
Jack> tlb_start_valid = 0;
Jack> }
Jack> zap_bytes = ZAP_BLOCK_SIZE;
I think the patch will work fine, but it's not very clean, because it
bypasses the TLB-flush API and directly accesses
implementation-specific internals. Perhaps it would be better to pass
a "fullmm" flag to unmap_vmas(). Andrew?
--david
* Re: Inefficient TLB flushing
2003-11-12 20:49 ` David Mosberger
@ 2003-11-12 21:22 ` Andrew Morton
2003-11-13 3:18 ` Jack Steiner
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2003-11-12 21:22 UTC (permalink / raw)
To: davidm; +Cc: steiner, linux-ia64, linux-kernel
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> Jack> Here is the patch that I am currently testing:
>
> Jack> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
> Jack> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
> Jack> @@ -574,9 +574,10 @@
> Jack> if ((long)zap_bytes > 0)
> Jack> continue;
> Jack> if (need_resched()) {
> Jack> + int fullmm = (*tlbp)->fullmm;
> Jack> tlb_finish_mmu(*tlbp, tlb_start, start);
> Jack> cond_resched_lock(&mm->page_table_lock);
> Jack> - *tlbp = tlb_gather_mmu(mm, 0);
> Jack> + *tlbp = tlb_gather_mmu(mm, fullmm);
> Jack> tlb_start_valid = 0;
> Jack> }
> Jack> zap_bytes = ZAP_BLOCK_SIZE;
>
> I think the patch will work fine, but it's not very clean, because it
> bypasses the TLB-flush API and directly accesses
> implementation-specific internals. Perhaps it would be better to pass
> a "fullmm" flag to unmap_vmas(). Andrew?
Either that, or add a new interface function
int mmu_gather_is_full_mm(mmu_gather *tlb);
and use it...
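A minimal sketch of what that accessor could look like (the mmu_gather layout below is a simplified stand-in showing only the field this thread discusses, not the real kernel structure):

```c
/* Simplified stand-in for the real mmu_gather; only the field
 * relevant to this discussion is shown. */
struct mmu_gather {
        int fullmm;     /* non-zero for a full address-space teardown */
};

/* The accessor Andrew proposes: callers ask the question through
 * the API instead of reading implementation-specific internals. */
static inline int mmu_gather_is_full_mm(struct mmu_gather *tlb)
{
        return tlb->fullmm;
}
```

The resched path in unmap_vmas() could then re-gather with tlb_gather_mmu(mm, mmu_gather_is_full_mm(*tlbp)) without knowing the struct layout.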
* Re: Inefficient TLB flushing
2003-11-12 21:22 ` Andrew Morton
@ 2003-11-13 3:18 ` Jack Steiner
2003-11-13 3:31 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Jack Steiner @ 2003-11-13 3:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: davidm, linux-ia64, linux-kernel
On Wed, Nov 12, 2003 at 01:22:53PM -0800, Andrew Morton wrote:
> David Mosberger <davidm@napali.hpl.hp.com> wrote:
> >
> > Jack> Here is the patch that I am currently testing:
> >
> > Jack> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
> > Jack> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
> > Jack> @@ -574,9 +574,10 @@
> > Jack> if ((long)zap_bytes > 0)
> > Jack> continue;
> > Jack> if (need_resched()) {
> > Jack> + int fullmm = (*tlbp)->fullmm;
> > Jack> tlb_finish_mmu(*tlbp, tlb_start, start);
> > Jack> cond_resched_lock(&mm->page_table_lock);
> > Jack> - *tlbp = tlb_gather_mmu(mm, 0);
> > Jack> + *tlbp = tlb_gather_mmu(mm, fullmm);
> > Jack> tlb_start_valid = 0;
> > Jack> }
> > Jack> zap_bytes = ZAP_BLOCK_SIZE;
> >
> > I think the patch will work fine, but it's not very clean, because it
> > bypasses the TLB-flush API and directly accesses
> > implementation-specific internals. Perhaps it would be better to pass
> > a "fullmm" flag to unmap_vmas(). Andrew?
>
> Either that, or add a new interface function
>
> int mmu_gather_is_full_mm(mmu_gather *tlb);
>
> and use it...
>
How implementation-independent should it be? Currently, there is only one
field in the mmu_gather structure that must be preserved. However, if we
want to make the interface truly implementation-independent, it seems
like we should define something like:
        if (need_resched()) {
                struct mmu_gather_state state;

                tlb_mmu_gather_save_state(*tlbp, &state);
                tlb_finish_mmu(*tlbp, tlb_start, start);
                ...
                *tlbp = tlb_mmu_gather_restore_state(&state);
        }
Is this overkill?
Should we use the patch given above for 2.6.0 & replace it with an
implementation-independent interface for 2.6.1?
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.
* Re: Inefficient TLB flushing
2003-11-13 3:18 ` Jack Steiner
@ 2003-11-13 3:31 ` Andrew Morton
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2003-11-13 3:31 UTC (permalink / raw)
To: Jack Steiner; +Cc: davidm, linux-ia64, linux-kernel
Jack Steiner <steiner@sgi.com> wrote:
>
> > Either that, or add a new interface function
> >
> > int mmu_gather_is_full_mm(mmu_gather *tlb);
> >
> > and use it...
> >
>
> How implementation independent should it be? Currently, there is only one
> field in the mmu_gather structure that must be preserved. However, if we
> want to make the interface truly implementation independent, it seems
> like we should define something like:
>
> if (need_resched()) {
> struct mmu_gather_state state;
> tlb_mmu_gather_save_state(*tlbp, &state);
> tlb_finish_mmu(*tlbp, tlb_start, start);
> ...
> *tlbp = tlb_mmu_gather_restore_state(&state);
> }
>
> Is this overkill?
Think so ;) The `full_mm_flush' boolean is the only piece of state we can pass
into tlb_gather_mmu anyway.
>
> Should we use the patch given above for 2.6.0 & replace it with an implementation
> independent interface for 2.6.1?
Just the little wrapper which doesn't assume the presence of
mmu_gather.full_mm should suffice.
* Re: Inefficient TLB flushing
2003-11-12 20:01 Inefficient TLB flushing Jack Steiner
2003-11-12 20:49 ` David Mosberger
@ 2003-11-13 3:56 ` David S. Miller
1 sibling, 0 replies; 6+ messages in thread
From: David S. Miller @ 2003-11-13 3:56 UTC (permalink / raw)
To: Jack Steiner; +Cc: davidm, linux-ia64, linux-kernel
On Wed, 12 Nov 2003 14:01:19 -0600
Jack Steiner <steiner@sgi.com> wrote:
> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
> @@ -574,9 +574,10 @@
> if ((long)zap_bytes > 0)
> continue;
> if (need_resched()) {
> + int fullmm = (*tlbp)->fullmm;
> tlb_finish_mmu(*tlbp, tlb_start, start);
> cond_resched_lock(&mm->page_table_lock);
> - *tlbp = tlb_gather_mmu(mm, 0);
> + *tlbp = tlb_gather_mmu(mm, fullmm);
> tlb_start_valid = 0;
> }
> zap_bytes = ZAP_BLOCK_SIZE;
This patch looks perfectly fine, good analysis.