Date: Wed, 12 Nov 2003 14:01:19 -0600
From: Jack Steiner <steiner@sgi.com>
To: davidm@hpl.hp.com
Cc: linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Inefficient TLB flushing
Message-ID: <20031112200119.GA22429@sgi.com>

Something appears broken in TLB flushing on IA64 (and possibly other
architectures). Functionally it works, but performance is bad on systems
with large cpu counts: the TLB flush in exit_mmap() is frequently being
done via IPIs to all cpus rather than with a "ptc" instruction or by
switching to a new context.

Take a look at exit_mmap() (in mm/mmap.c):

	void exit_mmap(struct mm_struct *mm)
	{
(1)		tlb = tlb_gather_mmu(mm, 1);
		unmap_vmas(&tlb, ...);
		...
(2)		tlb_finish_mmu(tlb, 0, MM_VM_SIZE(mm));
		...
	}

	static inline void
	tlb_finish_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
	{
		...
		ia64_tlb_flush_mmu(tlb, start, end);
		...
	}

	static inline void
	ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
	{
		...
(3)		if (tlb->fullmm) {
			flush_tlb_mm(tlb->mm);
(4)		} else if (unlikely (end - start >= 1TB ...
			/* This should be very rare and is not worth optimizing for. */
(5)			flush_tlb_all();
		...

flush_tlb_all() does its flushing with an IPI to every cpu. The start/end
arguments passed from exit_mmap() at line (2) are 0..MM_VM_SIZE(mm), which
is 0..0xa000000000000000 on IA64, so the expression at (4) is always TRUE.
Whenever tlb->fullmm is false, ia64_tlb_flush_mmu() therefore falls through
to flush_tlb_all() and the flush is done with IPIs.

Normally tlb->fullmm at line (3) is 1, since it is initialized to 1 at
line (1). However, if need_resched() becomes true in unmap_vmas() during
VMA teardown, the mmu_gather is finished and then reinitialized with
tlb_gather_mmu(mm, 0). When tlb_finish_mmu() is subsequently called at
line (2), tlb->fullmm is 0 and the TLB flush is done with an IPI.

Does this analysis look correct? AFAICT, this bug (inefficiency) may apply
to other architectures as well, although I can't assess its performance
impact there.

Here is the patch that I am currently testing:

--- /usr/tmp/TmpDir.19957-0/linux/mm/memory.c_1.79	Wed Nov 12 13:56:25 2003
+++ linux/mm/memory.c	Wed Nov 12 12:57:25 2003
@@ -574,9 +574,10 @@
 		if ((long)zap_bytes > 0)
 			continue;
 		if (need_resched()) {
+			int fullmm = (*tlbp)->fullmm;
 			tlb_finish_mmu(*tlbp, tlb_start, start);
 			cond_resched_lock(&mm->page_table_lock);
-			*tlbp = tlb_gather_mmu(mm, 0);
+			*tlbp = tlb_gather_mmu(mm, fullmm);
 			tlb_start_valid = 0;
 		}
 		zap_bytes = ZAP_BLOCK_SIZE;

--
Thanks

Jack Steiner (steiner@sgi.com)		651-683-5302
Principal Engineer			SGI - Silicon Graphics, Inc.
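
P.S. To make the cost difference concrete, here is a rough sketch of what
the two flush paths boil down to on IA64. This is paraphrased from memory,
not the actual kernel source -- error handling, the cpu_vm_mask bookkeeping,
and the ptc.g broadcast variant are all elided:

	/*
	 * fullmm path: never touches the TLB at all.  Retiring the mm's
	 * context (region ID) makes any stale entries unreachable once
	 * the region registers are reloaded with a new RID.  O(1).
	 */
	void flush_tlb_mm (struct mm_struct *mm)
	{
		mm->context = 0;		/* force a new context */
		if (mm == current->active_mm)
			activate_context(mm);	/* reload region registers */
	}

	/*
	 * !fullmm fallback: interrupt every cpu and have each one purge
	 * its entire TLB with ptc.e.  O(cpus) IPIs -- this is what hurts
	 * on a large system.
	 */
	void flush_tlb_all (void)
	{
		on_each_cpu((void (*)(void *))local_flush_tlb_all, NULL, 1, 1);
	}

So the patch above is just making sure a full-mm teardown stays on the
cheap path even when it gets rescheduled partway through.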