From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED,USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E437C4360F for ; Thu, 21 Feb 2019 07:53:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6E4522148D for ; Thu, 21 Feb 2019 07:53:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726974AbfBUHxk (ORCPT ); Thu, 21 Feb 2019 02:53:40 -0500 Received: from mga14.intel.com ([192.55.52.115]:2215 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725858AbfBUHxj (ORCPT ); Thu, 21 Feb 2019 02:53:39 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 20 Feb 2019 23:53:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,394,1544515200"; d="scan'208";a="135986157" Received: from richard.sh.intel.com (HELO localhost) ([10.239.159.54]) by orsmga002.jf.intel.com with ESMTP; 20 Feb 2019 23:53:36 -0800 Date: Thu, 21 Feb 2019 15:53:13 +0800 From: Wei Yang To: "Huang, Ying" Cc: Greg Kroah-Hartman , kernel test robot , Wei Yang , Stephen Rothwell , "Rafael J. Wysocki" , lkp@01.org, LKML Subject: Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression Message-ID: <20190221075313.GA4113@richard> Reply-To: Wei Yang References: <20190218075442.GI29177@shao2-debian> <20190219005945.GA16734@richard> <20190219121904.GA24103@kroah.com> <20190221031049.GE28258@shao2-debian> <20190221071023.GA28637@kroah.com> <8736oh1uf5.fsf@yhuang-dev.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8736oh1uf5.fsf@yhuang-dev.intel.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Feb 21, 2019 at 03:18:22PM +0800, Huang, Ying wrote: >Greg Kroah-Hartman writes: > >> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote: >>> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote: >>> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote: >>> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote: >>> > > >Greeting, >>> > > > >>> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due to commit: >>> > > > >>> > > > >>> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move device->knode_class to device_private") >>> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master >>> > > > >>> > > >>> > > This is interesting. >>> > > >>> > > I didn't expect the move of this field will impact the performance. >>> > > >>> > > The reason is struct device is a hotter memory than device->device_private? >>> > > >>> > > >in testcase: will-it-scale >>> > > >on test machine: 288 threads Knights Mill with 80G memory >>> > > >with following parameters: >>> > > > >>> > > > nr_task: 100% >>> > > > mode: thread >>> > > > test: unlink2 >>> > > > cpufreq_governor: performance >>> > > > >>> > > >test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two. >>> > > >test-url: https://github.com/antonblanchard/will-it-scale >>> > > > >>> > > >In addition to that, the commit also has significant impact on the following tests: >>> > > > >>> > > >+------------------+---------------------------------------------------------------+ >>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression | >>> > > >| test machine | 288 threads Knights Mill with 80G memory | >>> > > >| test parameters | cpufreq_governor=performance | >>> > > >| | mode=thread | >>> > > >| | nr_task=100% | >>> > > >| | test=signal1 | >>> > >>> > Ok, I'm going to blame your testing system, or something here, and not >>> > the above patch. >>> > >>> > All this test does is call raise(3). That does not touch the driver >>> > core at all. >>> > >>> > > >+------------------+---------------------------------------------------------------+ >>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression | >>> > > >| test machine | 288 threads Knights Mill with 80G memory | >>> > > >| test parameters | cpufreq_governor=performance | >>> > > >| | mode=thread | >>> > > >| | nr_task=100% | >>> > > >| | test=open1 | >>> > > >+------------------+---------------------------------------------------------------+ >>> > >>> > Same here, open1 just calls open/close a lot. No driver core >>> > interaction at all there either. >>> > >>> > So are you _sure_ this is the offending patch? >>> >>> Hi Greg, >>> >>> We did an experiment, recovered the layout of struct device. and we >>> found the regression is gone. I guess the regession is not from the >>> patch but related to the struct layout. >>> >>> >>> tests: 1 >>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01 >>> >>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f >>> ---------------- -------------------------- >>> %stddev change %stddev >>> \ | \ >>> 237096 14% 270789 will-it-scale.workload >>> 823 14% 939 will-it-scale.per_thread_ops >>> >>> >>> tests: 1 >>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01 >>> >>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f >>> ---------------- -------------------------- >>> %stddev change %stddev >>> \ | \ >>> 93.51 3% 48% 138.53 3% will-it-scale.time.user_time >>> 186 40% 261 will-it-scale.per_thread_ops >>> 53909 40% 75507 will-it-scale.workload >>> >>> >>> tests: 1 >>> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01 >>> >>> 570d0200123fb4f8 a36dc70b810afe9183de2ea18f >>> ---------------- -------------------------- >>> %stddev change %stddev >>> \ | \ >>> 447722 22% 546258 10% will-it-scale.time.involuntary_context_switches >>> 226995 19% 269751 will-it-scale.workload >>> 787 19% 936 will-it-scale.per_thread_ops >>> >>> >>> >>> commit a36dc70b810afe9183de2ea18faa4c0939c139ac >>> Author: 0day robot >>> Date: Wed Feb 20 14:21:19 2019 +0800 >>> >>> backfile klist_node in struct device for debugging >>> >>> Signed-off-by: 0day robot >>> >>> diff --git a/include/linux/device.h b/include/linux/device.h >>> index d0e452fd0bff2..31666cb72b3ba 100644 >>> --- a/include/linux/device.h >>> +++ b/include/linux/device.h >>> @@ -1035,6 +1035,7 @@ struct device { >>> spinlock_t devres_lock; >>> struct list_head devres_head; >>> >>> + struct klist_node knode_class_test_by_rongc; >>> struct class *class; >>> const struct attribute_group **groups; /* optional groups */ >> >> While this is fun to worry about alignment and structure size of 'struct >> device' I find it odd given that the syscalls and userspace load of >> those test programs have nothing to do with 'struct device' at all. >> >> So I can work on fixing up the alignment of struct device, as that's a >> nice thing to do for systems with 30k of these in memory, but that >> shouldn't affect a workload of a constant string of signal calls. > >Hi, Greg, > >I don't think this is an issues of struct device. As you said, struct >device isn't access much during test. Struct device may share slab page >with some other data structures (signal related, or fd related (as in >some other test cases)), so that the alignment of these data structures >are affected, so caused the performance regression. > I didn't get the point here neither. slab allocator ask memory from page allocator Page by Page and split the page into pre-defined size. For example, 128B, 512B... Just as shown in /proc/slabinfo. Per my understanding, each struct device / device_private will sits in its own aligned space. struct device would sits in 1K slab and struct device_private would sits in 256B slab, both before and after this patch if I am correct. Hmm... I am just curious about how this alignment is affected. Maybe I lost some point? >Best Regards, >Huang, Ying > >> thanks, >> >> greg k-h -- Wei Yang Help you, Help me