From: Joe Mario
Date: Wed, 25 Mar 2015 08:22:34 -0400
To: David Ahern, Don Zickus
Cc: acme@kernel.org, linux-kernel@vger.kernel.org, Jiri Olsa
Subject: Re: [PATCH] perf tool: Fix ppid for synthesized fork events
Message-ID: <5512A88A.1010608@redhat.com>
In-Reply-To: <5511D349.7070508@gmail.com>
References: <1426786875-18025-1-git-send-email-dsahern@gmail.com>
 <20150319205648.GC199787@redhat.com>
 <550B3A66.2030902@gmail.com>
 <20150324201020.GH199787@redhat.com>
 <5511D349.7070508@gmail.com>

On 03/24/2015 05:12 PM, David Ahern wrote:
> On 3/24/15 2:10 PM, Don Zickus wrote:
>> He does this with and without the patch. The difference is usually over
>> 50% extra time with the patch for both the record and report timings. :-(
>
> I find that shocking. The patch only populates ppid and ptid with a value
> read from the file that is already opened and processed. Most of the patch
> is just plumbing the value from the low-level function that processes the
> status file back to synthesize_fork.
>
> What benchmark is this? Is it something I can download and run? If so,
> details? Site, command, args?
>
> Thanks,
> David

Hi David:

We ran:

   time perf mem record -a -e cpu/mem-loads,ldlat=50/pp -e cpu/mem-stores/pp sleep 10

on a system that was running SPECjbb2013 in the background. There were
about 10,000 Java threads, with roughly 500 to 800 in a runnable state at
any given time. We ran it on a 4-socket x86 IVB server.

We had two perf binaries, one with your patch and one without it. Because
the benchmark's load is not constant, we ran the above perf command in a
loop, alternating between the patched and unpatched versions. The elapsed
wall-clock time (the "real" field from time) for the patched perf was
typically at least 50% longer than for the equivalent unpatched perf.

We can't give out a copy of the SPEC benchmark (that's part of the SPEC
agreement), but try to find an application (or create your own) that loads
the system with lots of busy threads.

Joe
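
P.S. In case it helps: below is a minimal, untested sketch of the kind of
load generator I mean. It just spins up a lot of busy pthreads; the thread
count and runtime are arbitrary placeholders, and it's only a rough stand-in
for the SPECjbb thread load, not a reproduction of it:

   /* load.c: spawn NTHREADS busy-looping threads so that
    * "perf record -a" has lots of runnable threads whose fork
    * events it must synthesize from /proc at startup. */
   #include <pthread.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <unistd.h>

   #define NTHREADS 1000           /* arbitrary placeholder; scale to taste */

   static volatile unsigned long sink;

   static void *spin(void *arg)
   {
           (void)arg;
           for (;;)
                   sink++;         /* burn CPU forever */
           return NULL;
   }

   int main(void)
   {
           pthread_t tid;
           int i;

           for (i = 0; i < NTHREADS; i++) {
                   if (pthread_create(&tid, NULL, spin, NULL) != 0) {
                           perror("pthread_create");
                           exit(1);
                   }
           }
           sleep(600);             /* keep the load up while profiling */
           return 0;
   }

Build with "gcc -O2 -pthread load.c -o load", start it, then run the patched
and unpatched perf commands against it as described above.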