* Inefficient TLB flushing
@ 2003-11-12 20:01 Jack Steiner
2003-11-12 20:49 ` David Mosberger
2003-11-13 3:56 ` David S. Miller
0 siblings, 2 replies; 6+ messages in thread
From: Jack Steiner @ 2003-11-12 20:01 UTC (permalink / raw)
To: davidm; +Cc: linux-ia64, linux-kernel
Something appears broken in TLB flushing on IA64 (& possibly other
architectures). Functionally, it works, but performance is bad on
systems with large cpu counts.
The result is that TLB flushing in exit_mmap() is frequently done via
IPIs to all cpus rather than with a "ptc" instruction or with a new context.
Take a look at exit_mmap() (in mm/mmap.c):
void exit_mmap(struct mm_struct *mm)
{
(1)     tlb = tlb_gather_mmu(mm, 1);
        unmap_vmas(&tlb, ...
        ...
(2)     tlb_finish_mmu(tlb, 0, MM_VM_SIZE(mm));
        ...
}

static inline void
tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
        ...
        ia64_tlb_flush_mmu(tlb, start, end);
        ...
}

static inline void
ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
{
        ...
(3)     if (tlb->fullmm) {
                flush_tlb_mm(tlb->mm);
(4)     } else if (unlikely (end - start >= 1TB ...
                /* This should be very rare and is not worth optimizing for. */
(5)             flush_tlb_all();
        ...
flush_tlb_all() flushes by sending an IPI to every cpu.
<start> & <end> are passed from exit_mmap() at line (2) as 0..MM_VM_SIZE, which is
0..0xa000000000000000 on IA64, so the expression at (4) is always TRUE. If tlb->fullmm is
false, the code in ia64_tlb_flush_mmu() calls flush_tlb_all(), which uses IPIs to
do the TLB flushing.
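To make the arithmetic at (4) concrete, here is a minimal user-space sketch (the MM_VM_SIZE value is the one cited above; the 1 TB threshold and the helper name are stand-ins for illustration, not the kernel's actual code):

```c
#include <stdint.h>

/* From the analysis above: on IA64 the range passed from
 * exit_mmap() is 0..MM_VM_SIZE = 0..0xa000000000000000. */
#define MM_VM_SIZE_IA64 0xa000000000000000UL
#define ONE_TB (1UL << 40)

/* Returns non-zero when the flushed range trips the "huge range"
 * test at (4), which forces flush_tlb_all() and hence IPIs. */
static int range_forces_global_flush(unsigned long start, unsigned long end)
{
        return end - start >= ONE_TB;
}
```

Plugging in start = 0 and end = 0xa000000000000000 shows the test is always true for a full-mm teardown, so any such path that reaches (4) with fullmm clear flushes with IPIs.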
Normally, tlb->fullmm at line (3) is 1, since it is initialized to 1 at line (1). However, in
unmap_vmas(), if a resched is needed during VMA teardown, the TLB gather is
finished & reinitialized with tlb_gather_mmu(mm, 0). When the
call to tlb_finish_mmu() is made at line (2), tlb->fullmm is then 0 and TLB flushing is done with IPIs.
Does this analysis look correct?? AFAICT, this bug (inefficiency) may apply to other
architectures, although I can't assess its performance impact on them.
Here is the patch that I am currently testing:
--- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
+++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
@@ -574,9 +574,10 @@
if ((long)zap_bytes > 0)
continue;
if (need_resched()) {
+ int fullmm = (*tlbp)->fullmm;
tlb_finish_mmu(*tlbp, tlb_start, start);
cond_resched_lock(&mm->page_table_lock);
- *tlbp = tlb_gather_mmu(mm, 0);
+ *tlbp = tlb_gather_mmu(mm, fullmm);
tlb_start_valid = 0;
}
zap_bytes = ZAP_BLOCK_SIZE;
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Inefficient TLB flushing
2003-11-12 20:01 Inefficient TLB flushing Jack Steiner
@ 2003-11-12 20:49 ` David Mosberger
2003-11-12 21:22 ` Andrew Morton
2003-11-13 3:56 ` David S. Miller
1 sibling, 1 reply; 6+ messages in thread
From: David Mosberger @ 2003-11-12 20:49 UTC (permalink / raw)
To: Jack Steiner; +Cc: akpm, davidm, linux-ia64, linux-kernel
>>>>> On Wed, 12 Nov 2003 14:01:19 -0600, Jack Steiner <steiner@sgi.com> said:
Jack> Does this analysis look correct??
Yup.
Jack> Here is the patch that I am currently testing:
Jack> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
Jack> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
Jack> @@ -574,9 +574,10 @@
Jack> if ((long)zap_bytes > 0)
Jack> continue;
Jack> if (need_resched()) {
Jack> + int fullmm = (*tlbp)->fullmm;
Jack> tlb_finish_mmu(*tlbp, tlb_start, start);
Jack> cond_resched_lock(&mm->page_table_lock);
Jack> - *tlbp = tlb_gather_mmu(mm, 0);
Jack> + *tlbp = tlb_gather_mmu(mm, fullmm);
Jack> tlb_start_valid = 0;
Jack> }
Jack> zap_bytes = ZAP_BLOCK_SIZE;
I think the patch will work fine, but it's not very clean, because it
bypasses the TLB-flush API and directly accesses
implementation-specific internals. Perhaps it would be better to pass
a "fullmm" flag to unmap_vmas(). Andrew?
--david
* Re: Inefficient TLB flushing
2003-11-12 20:49 ` David Mosberger
@ 2003-11-12 21:22 ` Andrew Morton
2003-11-13 3:18 ` Jack Steiner
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2003-11-12 21:22 UTC (permalink / raw)
To: davidm; +Cc: steiner, linux-ia64, linux-kernel
David Mosberger <davidm@napali.hpl.hp.com> wrote:
>
> Jack> Here is the patch that I am currently testing:
>
> Jack> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
> Jack> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
> Jack> @@ -574,9 +574,10 @@
> Jack> if ((long)zap_bytes > 0)
> Jack> continue;
> Jack> if (need_resched()) {
> Jack> + int fullmm = (*tlbp)->fullmm;
> Jack> tlb_finish_mmu(*tlbp, tlb_start, start);
> Jack> cond_resched_lock(&mm->page_table_lock);
> Jack> - *tlbp = tlb_gather_mmu(mm, 0);
> Jack> + *tlbp = tlb_gather_mmu(mm, fullmm);
> Jack> tlb_start_valid = 0;
> Jack> }
> Jack> zap_bytes = ZAP_BLOCK_SIZE;
>
> I think the patch will work fine, but it's not very clean, because it
> bypasses the TLB-flush API and directly accesses
> implementation-specific internals. Perhaps it would be better to pass
> a "fullmm" flag to unmap_vmas(). Andrew?
Either that, or add a new interface function
int mmu_gather_is_full_mm(mmu_gather *tlb);
and use it...
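A minimal sketch of what that accessor could look like (the mmu_gather layout below is a simplified stand-in showing only the field this thread discusses, not the real kernel structure):

```c
/* Simplified stand-in for the real mmu_gather; only the field
 * relevant to this discussion is shown. */
struct mmu_gather {
        int fullmm;     /* non-zero for a full address-space teardown */
};

/* The accessor Andrew proposes: callers ask the question through
 * the API instead of reading implementation-specific internals. */
static inline int mmu_gather_is_full_mm(struct mmu_gather *tlb)
{
        return tlb->fullmm;
}
```

The resched path in unmap_vmas() could then re-gather with tlb_gather_mmu(mm, mmu_gather_is_full_mm(*tlbp)) without knowing the struct layout.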
* Re: Inefficient TLB flushing
2003-11-12 21:22 ` Andrew Morton
@ 2003-11-13 3:18 ` Jack Steiner
2003-11-13 3:31 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Jack Steiner @ 2003-11-13 3:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: davidm, linux-ia64, linux-kernel
On Wed, Nov 12, 2003 at 01:22:53PM -0800, Andrew Morton wrote:
> David Mosberger <davidm@napali.hpl.hp.com> wrote:
> >
> > Jack> Here is the patch that I am currently testing:
> >
> > Jack> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
> > Jack> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
> > Jack> @@ -574,9 +574,10 @@
> > Jack> if ((long)zap_bytes > 0)
> > Jack> continue;
> > Jack> if (need_resched()) {
> > Jack> + int fullmm = (*tlbp)->fullmm;
> > Jack> tlb_finish_mmu(*tlbp, tlb_start, start);
> > Jack> cond_resched_lock(&mm->page_table_lock);
> > Jack> - *tlbp = tlb_gather_mmu(mm, 0);
> > Jack> + *tlbp = tlb_gather_mmu(mm, fullmm);
> > Jack> tlb_start_valid = 0;
> > Jack> }
> > Jack> zap_bytes = ZAP_BLOCK_SIZE;
> >
> > I think the patch will work fine, but it's not very clean, because it
> > bypasses the TLB-flush API and directly accesses
> > implementation-specific internals. Perhaps it would be better to pass
> > a "fullmm" flag to unmap_vmas(). Andrew?
>
> Either that, or add a new interface function
>
> int mmu_gather_is_full_mm(mmu_gather *tlb);
>
> and use it...
>
How implementation-independent should it be? Currently, there is only one
field in the mmu_gather structure that must be preserved. However, if we
want to make the interface truly implementation-independent, it seems
like we should define something like:
        if (need_resched()) {
                struct mmu_gather_state state;

                tlb_mmu_gather_save_state(*tlbp, &state);
                tlb_finish_mmu(*tlbp, tlb_start, start);
                ...
                *tlbp = tlb_mmu_gather_restore_state(&state);
        }
Is this overkill?
Should we use the patch given above for 2.6.0 & replace it with an
implementation-independent interface for 2.6.1?
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.
* Re: Inefficient TLB flushing
2003-11-13 3:18 ` Jack Steiner
@ 2003-11-13 3:31 ` Andrew Morton
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2003-11-13 3:31 UTC (permalink / raw)
To: Jack Steiner; +Cc: davidm, linux-ia64, linux-kernel
Jack Steiner <steiner@sgi.com> wrote:
>
> > Either that, or add a new interface function
> >
> > int mmu_gather_is_full_mm(mmu_gather *tlb);
> >
> > and use it...
> >
>
> How implementation independent should it be? Currently, there is only one
> field in the mmu_gather structure that must be preserved. However, if we
> want to make the interface truly implementation independent, it seems
> like we should define something like:
>
> if (need_resched()) {
> struct mmu_gather_state state;
> tlb_mmu_gather_save_state(*tlbp, &state);
> tlb_finish_mmu(*tlbp, tlb_start, start);
> ...
> *tlbp = tlb_mmu_gather_restore_state(&state);
> }
>
> Is this overkill?
Think so ;) The `full_mm_flush' boolean is the only piece of state we can pass
into tlb_gather_mmu anyway.
>
> Should we use the patch given above for 2.6.0 & replace it with an implementation
> independent interface for 2.6.1?
Just the little wrapper which doesn't assume the presence of
mmu_gather.full_mm should suffice.
* Re: Inefficient TLB flushing
2003-11-12 20:01 Inefficient TLB flushing Jack Steiner
2003-11-12 20:49 ` David Mosberger
@ 2003-11-13 3:56 ` David S. Miller
1 sibling, 0 replies; 6+ messages in thread
From: David S. Miller @ 2003-11-13 3:56 UTC (permalink / raw)
To: Jack Steiner; +Cc: davidm, linux-ia64, linux-kernel
On Wed, 12 Nov 2003 14:01:19 -0600
Jack Steiner <steiner@sgi.com> wrote:
> --- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79 Wed Nov 12 13:56:25 2003
> +++ linux/mm/memory.c Wed Nov 12 12:57:25 2003
> @@ -574,9 +574,10 @@
> if ((long)zap_bytes > 0)
> continue;
> if (need_resched()) {
> + int fullmm = (*tlbp)->fullmm;
> tlb_finish_mmu(*tlbp, tlb_start, start);
> cond_resched_lock(&mm->page_table_lock);
> - *tlbp = tlb_gather_mmu(mm, 0);
> + *tlbp = tlb_gather_mmu(mm, fullmm);
> tlb_start_valid = 0;
> }
> zap_bytes = ZAP_BLOCK_SIZE;
This patch looks perfectly fine, good analysis.