From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD1EFC4320D for ; Wed, 25 Sep 2019 09:01:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id A3EC220673 for ; Wed, 25 Sep 2019 09:01:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727215AbfIYJBB (ORCPT ); Wed, 25 Sep 2019 05:01:01 -0400 Received: from mga02.intel.com ([134.134.136.20]:21106 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725808AbfIYJAM (ORCPT ); Wed, 25 Sep 2019 05:00:12 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 25 Sep 2019 02:00:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,547,1559545200"; d="scan'208";a="213976110" Received: from xingzhen-mobl1.ccr.corp.intel.com (HELO [10.255.29.233]) ([10.255.29.233]) by fmsmga004.fm.intel.com with ESMTP; 25 Sep 2019 02:00:04 -0700 Subject: Re: [LKP] [SUNRPC] 0472e47660: fsmark.app_overhead 16.0% regression From: Xing Zhengjun To: Trond Myklebust , "rong.a.chen@intel.com" Cc: "torvalds@linux-foundation.org" , "lkp@01.org" , "linux-kernel@vger.kernel.org" References: <20190520055434.GZ31424@shao2-debian> <9a07c589f955e5af5acc0fa09a16a3256089e764.camel@hammerspace.com> <9753a9a4a82943f6aacc2bfb0f93efc5f96bcaa5.camel@hammerspace.com> <2bbe636a-14f1-4592-d1f9-a9f765a02939@linux.intel.com> <81fb0e7d-1879-9267-83da-4671fec50920@linux.intel.com> <491bd283-f607-3111-32ae-07294eda123d@linux.intel.com> <081447bc-69c5-aa45-8f85-29add0b83c15@linux.intel.com> Message-ID: <17e6b70e-7148-73a0-045c-a74d0d2795ad@linux.intel.com> Date: Wed, 25 Sep 2019 17:00:03 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 8/30/2019 8:43 AM, Xing Zhengjun wrote: > > > On 8/7/2019 3:56 PM, Xing Zhengjun wrote: >> >> >> On 7/24/2019 1:17 PM, Xing Zhengjun wrote: >>> >>> >>> On 7/12/2019 2:42 PM, Xing Zhengjun wrote: >>>> Hi Trond, >>>> >>>>      I attached perf-profile part big changes, hope it is useful for >>>> analyzing the issue. >>> >>> Ping... >> >> ping... >> > ping... ping... >>> >>>> >>>> >>>> In testcase: fsmark >>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ >>>> 3.00GHz with 384G memory >>>> with following parameters: >>>> >>>>          iterations: 20x >>>>          nr_threads: 64t >>>>          disk: 1BRD_48G >>>>          fs: xfs >>>>          fs2: nfsv4 >>>>          filesize: 4M >>>>          test_size: 80G >>>>          sync_method: fsyncBeforeClose >>>>          cpufreq_governor: performance >>>> >>>> test-description: The fsmark is a file system benchmark to test >>>> synchronous write workloads, for example, mail servers workload. >>>> test-url: https://sourceforge.net/projects/fsmark/ >>>> >>>> commit: >>>>    e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()") >>>>    0472e47660 ("SUNRPC: Convert socket page send code to use >>>> iov_iter()") >>>> >>>> e791f8e9380d945e 0472e476604998c127f3c80d291 >>>> ---------------- --------------------------- >>>>           %stddev     %change         %stddev >>>>               \          |                \ >>>>      527.29           -22.6%     407.96        fsmark.files_per_sec >>>>        1.97 ± 11%      +0.9        2.88 ±  4% >>>> perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry >>>> >>>>        0.00            +0.9        0.93 ±  4% >>>> perf-profile.calltrace.cycles-pp.tcp_write_xmit.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages >>>> >>>>        2.11 ± 10%      +0.9        3.05 ±  4% >>>> perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary >>>> >>>>        5.29 ±  2%      +1.2        6.46 ±  7% >>>> perf-profile.calltrace.cycles-pp.svc_recv.nfsd.kthread.ret_from_fork >>>>        9.61 ±  5%      +3.1       12.70 ±  2% >>>> perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork >>>>        9.27 ±  5%      +3.1       12.40 ±  2% >>>> perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork >>>> >>>>       34.52 ±  4%      +3.3       37.78 ±  2% >>>> perf-profile.calltrace.cycles-pp.ret_from_fork >>>>       34.52 ±  4%      +3.3       37.78 ±  2% >>>> perf-profile.calltrace.cycles-pp.kthread.ret_from_fork >>>>        0.00            +3.4        3.41 ±  4% >>>> perf-profile.calltrace.cycles-pp.memcpy_erms.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg >>>> >>>>        0.00            +3.4        3.44 ±  4% >>>> perf-profile.calltrace.cycles-pp.memcpy_from_page._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg >>>> >>>>        0.00            +3.5        3.54 ±  4% >>>> perf-profile.calltrace.cycles-pp._copy_from_iter_full.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages >>>> >>>>        2.30 ±  5%      +3.7        6.02 ±  3% >>>> perf-profile.calltrace.cycles-pp.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread.kthread >>>> >>>>        2.30 ±  5%      +3.7        6.02 ±  3% >>>> perf-profile.calltrace.cycles-pp.rpc_async_schedule.process_one_work.worker_thread.kthread.ret_from_fork >>>> >>>>        1.81 ±  4%      +3.8        5.59 ±  4% >>>> perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread >>>> >>>>        1.80 ±  3%      +3.8        5.59 ±  3% >>>> perf-profile.calltrace.cycles-pp.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work >>>> >>>>        1.73 ±  4%      +3.8        5.54 ±  4% >>>> perf-profile.calltrace.cycles-pp.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute.rpc_async_schedule >>>> >>>>        1.72 ±  4%      +3.8        5.54 ±  4% >>>> perf-profile.calltrace.cycles-pp.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit.__rpc_execute >>>> >>>>        0.00            +5.4        5.42 ±  4% >>>> perf-profile.calltrace.cycles-pp.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request >>>> >>>>        0.00            +5.5        5.52 ±  4% >>>> perf-profile.calltrace.cycles-pp.tcp_sendmsg.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit >>>> >>>>        0.00            +5.5        5.53 ±  4% >>>> perf-profile.calltrace.cycles-pp.sock_sendmsg.xs_sendpages.xs_tcp_send_request.xprt_transmit.call_transmit >>>> >>>>        9.61 ±  5%      +3.1       12.70 ±  2% >>>> perf-profile.children.cycles-pp.worker_thread >>>>        9.27 ±  5%      +3.1       12.40 ±  2% >>>> perf-profile.children.cycles-pp.process_one_work >>>>        6.19            +3.2        9.40 ±  4% >>>> perf-profile.children.cycles-pp.memcpy_erms >>>>       34.53 ±  4%      +3.3       37.78 ±  2% >>>> perf-profile.children.cycles-pp.ret_from_fork >>>>       34.52 ±  4%      +3.3       37.78 ±  2% >>>> perf-profile.children.cycles-pp.kthread >>>>        0.00            +3.5        3.46 ±  4% >>>> perf-profile.children.cycles-pp.memcpy_from_page >>>>        0.00            +3.6        3.56 ±  4% >>>> perf-profile.children.cycles-pp._copy_from_iter_full >>>>        2.47 ±  4%      +3.7        6.18 ±  3% >>>> perf-profile.children.cycles-pp.__rpc_execute >>>>        2.30 ±  5%      +3.7        6.02 ±  3% >>>> perf-profile.children.cycles-pp.rpc_async_schedule >>>>        1.90 ±  4%      +3.8        5.67 ±  3% >>>> perf-profile.children.cycles-pp.call_transmit >>>>        1.89 ±  3%      +3.8        5.66 ±  3% >>>> perf-profile.children.cycles-pp.xprt_transmit >>>>        1.82 ±  4%      +3.8        5.62 ±  3% >>>> perf-profile.children.cycles-pp.xs_tcp_send_request >>>>        1.81 ±  4%      +3.8        5.62 ±  3% >>>> perf-profile.children.cycles-pp.xs_sendpages >>>>        0.21 ± 17%      +5.3        5.48 ±  4% >>>> perf-profile.children.cycles-pp.tcp_sendmsg_locked >>>>        0.25 ± 18%      +5.3        5.59 ±  3% >>>> perf-profile.children.cycles-pp.tcp_sendmsg >>>>        0.26 ± 16%      +5.3        5.60 ±  3% >>>> perf-profile.children.cycles-pp.sock_sendmsg >>>>        1.19 ±  5%      +0.5        1.68 ±  3% >>>> perf-profile.self.cycles-pp.get_page_from_freelist >>>>        6.10            +3.2        9.27 ±  4% >>>> perf-profile.self.cycles-pp.memcpy_erms >>>> >>>> >>>> On 7/9/2019 10:39 AM, Xing Zhengjun wrote: >>>>> Hi Trond, >>>>> >>>>> On 7/8/2019 7:44 PM, Trond Myklebust wrote: >>>>>> I've asked several times now about how to interpret your results. >>>>>> As far as I can tell from your numbers, the overhead appears to be >>>>>> entirely contained in the NUMA section of your results. >>>>>> IOW: it would appear to be a scheduling overhead due to NUMA. I've >>>>>> been asking whether or not that is a correct interpretation of the >>>>>> numbers you published. >>>>> Thanks for your feedback. I used the same hardware and the same >>>>> test parameters to test the two commits: >>>>>     e791f8e938 ("SUNRPC: Convert xs_send_kvec() to use >>>>> iov_iter_kvec()") >>>>>     0472e47660 ("SUNRPC: Convert socket page send code to use >>>>> iov_iter()") >>>>> >>>>> If it is caused by NUMA, why only commit 0472e47660 throughput is >>>>> decreased? The filesystem we test is NFS, commit 0472e47660 is >>>>> related with the network, could you help to check if have any other >>>>> clues for the regression. Thanks. >>>>> >>>> >>> >> > -- Zhengjun Xing