Date: Mon, 9 Oct 2017 15:53:38 +0800
From: Aaron Lu
Subject: Re: [PATCH] page_alloc.c: inline __rmqueue()
Message-ID: <20171009075338.GC1798@intel.com>
References: <20171009054434.GA1798@intel.com>
To: Anshuman Khandual
Cc: linux-mm, lkml, Andrew Morton, Andi Kleen, Dave Hansen, Huang Ying, Tim Chen, Kemi Wang

On Mon, Oct 09, 2017 at 01:07:36PM +0530, Anshuman Khandual wrote:
> On 10/09/2017 11:14 AM, Aaron Lu wrote:
> > __rmqueue() is called by rmqueue_bulk() and rmqueue() under zone->lock
> > and that lock can be heavily contended with memory intensive applications.
> >
> > Since __rmqueue() is a small function, inlining it can save us some time.
> > With the will-it-scale/page_fault1/process benchmark, when using nr_cpu
> > processes to stress buddy:
> >
> > On a 2 sockets Intel-Skylake machine:
> >     base        %change     head
> >     77342       +6.3%       82203   will-it-scale.per_process_ops
> >
> > On a 4 sockets Intel-Skylake machine:
> >     base        %change     head
> >     75746       +4.6%       79248   will-it-scale.per_process_ops
> >
> > This patch adds inline to __rmqueue().
> >
> > Signed-off-by: Aaron Lu
>
> Ran it through kernel bench and ebizzy micro benchmarks. Results
> were comparable with and without the patch.
> Maybe these are not the appropriate tests for this inlining
> improvement.

I think so. The benefit only appears when the lock contention is huge
enough, e.g. perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
is as high as 80% with the workload I have used.

> Anyway, it does not have any performance degradation either.
>
> Reviewed-by: Anshuman Khandual
> Tested-by: Anshuman Khandual

Thanks!