From: "Aneesh Kumar K.V"
To: Paul Mackerras
Cc: benh@kernel.crashing.org, mpe@ellerman.id.au, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 00/16] Remove hash page table slot tracking from linux PTE
In-Reply-To: <87r2tk22wo.fsf@linux.vnet.ibm.com>
References: <20171027040833.3644-1-aneesh.kumar@linux.vnet.ibm.com>
 <20171027043430.GA27483@fergus.ozlabs.ibm.com>
 <20171027054136.GC27483@fergus.ozlabs.ibm.com>
 <87tvyg26us.fsf@linux.vnet.ibm.com>
 <87r2tk22wo.fsf@linux.vnet.ibm.com>
Date: Mon, 30 Oct 2017 19:19:25 +0530
Message-Id: <87o9oo21ay.fsf@linux.vnet.ibm.com>
List-Id: Linux on PowerPC Developers Mail List

"Aneesh Kumar K.V" writes:

> "Aneesh Kumar K.V" writes:
>
>> I looked at the perf data and with the test, we are doing a larger
>> number of hash faults and then around 10k flush_hash_range calls. Can
>> the small improvement in the numbers be due to the fact that we are
>> not storing the slot number when doing an insert now? Also, in the
>> flush path we are now not using real_pte_t.
>>
>
> With THP disabled I am finding the below.
>
> Without patch:
>
>  35.62%  a.out  [kernel.vmlinux]  [k] clear_user_page
>   8.54%  a.out  [kernel.vmlinux]  [k] __lock_acquire
>   3.86%  a.out  [kernel.vmlinux]  [k] native_flush_hash_range
>   3.38%  a.out  [kernel.vmlinux]  [k] save_context_stack
>   2.98%  a.out  a.out             [.] main
>   2.59%  a.out  [kernel.vmlinux]  [k] lock_acquire
>   2.29%  a.out  [kernel.vmlinux]  [k] mark_lock
>   2.23%  a.out  [kernel.vmlinux]  [k] native_hpte_insert
>   1.87%  a.out  [kernel.vmlinux]  [k] get_mem_cgroup_from_mm
>   1.71%  a.out  [kernel.vmlinux]  [k] rcu_lockdep_current_cpu_online
>   1.68%  a.out  [kernel.vmlinux]  [k] lock_release
>   1.47%  a.out  [kernel.vmlinux]  [k] __handle_mm_fault
>   1.41%  a.out  [kernel.vmlinux]  [k] validate_sp
>
> With patch:
>
>  35.40%  a.out  [kernel.vmlinux]  [k] clear_user_page
>   8.82%  a.out  [kernel.vmlinux]  [k] __lock_acquire
>   3.66%  a.out  a.out             [.] main
>   3.49%  a.out  [kernel.vmlinux]  [k] save_context_stack
>   2.77%  a.out  [kernel.vmlinux]  [k] lock_acquire
>   2.45%  a.out  [kernel.vmlinux]  [k] mark_lock
>   1.80%  a.out  [kernel.vmlinux]  [k] get_mem_cgroup_from_mm
>   1.80%  a.out  [kernel.vmlinux]  [k] native_hpte_insert
>   1.79%  a.out  [kernel.vmlinux]  [k] rcu_lockdep_current_cpu_online
>   1.78%  a.out  [kernel.vmlinux]  [k] lock_release
>   1.73%  a.out  [kernel.vmlinux]  [k] native_flush_hash_range
>   1.53%  a.out  [kernel.vmlinux]  [k] __handle_mm_fault
>
> That is, we are now spending less time in native_flush_hash_range.
>
> -aneesh

One possible explanation is that with slot tracking we do:

	slot += hidx & _PTEIDX_GROUP_IX;
	hptep = htab_address + slot;
	want_v = hpte_encode_avpn(vpn, psize, ssize);
	native_lock_hpte(hptep);
	hpte_v = be64_to_cpu(hptep->v);
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
	if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
		native_unlock_hpte(hptep);

and without slot tracking we do:

	for (i = 0; i < HPTES_PER_GROUP; i++, hptep++) {
		/* check locklessly first */
		hpte_v = be64_to_cpu(hptep->v);
		if (cpu_has_feature(CPU_FTR_ARCH_300))
			hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
		if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
			continue;

		native_lock_hpte(hptep);

That is, without the patch series we always take the hpte lock, even
when the hpte does not match. Hence, in perf annotate we find the lock
to be highly contended without the patch series. I will change that to
compare the pte without taking the lock and see if that has any impact.

-aneesh
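
PS: something along the lines of the below is what I have in mind for
the slot-tracking path (untested sketch stitched together from the two
snippets above, not code from the series; it assumes the surrounding
flush loop, and the compare has to be repeated under the lock because
the entry can change between the lockless read and native_lock_hpte()):

	slot += hidx & _PTEIDX_GROUP_IX;
	hptep = htab_address + slot;
	want_v = hpte_encode_avpn(vpn, psize, ssize);

	/* check locklessly first, as the non-tracking loop does */
	hpte_v = be64_to_cpu(hptep->v);
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
	if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
		continue;	/* tracked slot is stale, skip this entry */

	native_lock_hpte(hptep);
	/* recheck under the lock: the entry may have changed meanwhile */
	hpte_v = be64_to_cpu(hptep->v);
	if (cpu_has_feature(CPU_FTR_ARCH_300))
		hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
	if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
		native_unlock_hpte(hptep);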