public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Alexey Budankov <alexey.budankov@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>, Kan Liang <kan.liang@intel.com>,
	Dmitri Prokhorov <Dmitry.Prohorov@intel.com>,
	Valery Cherepennikov <valery.cherepennikov@intel.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Stephane Eranian <eranian@google.com>,
	David Carrillo-Cisneros <davidcc@google.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Vince Weaver <vince@deater.net>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC][PATCH] perf: Rewrite enabled/running timekeeping
Date: Tue, 5 Sep 2017 15:06:26 +0300	[thread overview]
Message-ID: <f66ce7fe-db8d-2456-5510-c99db6cdcb2a@linux.intel.com> (raw)
In-Reply-To: <9c3c1fc7-5a2e-2745-dab1-390dc7d58d50@linux.intel.com>

Hi,
On 05.09.2017 13:17, Alexey Budankov wrote:
> On 04.09.2017 18:41, Peter Zijlstra wrote:
>> On Mon, Sep 04, 2017 at 05:56:06PM +0300, Alexey Budankov wrote:
>>> On 04.09.2017 15:08, Peter Zijlstra wrote:
>>>> On Mon, Sep 04, 2017 at 01:46:45PM +0300, Alexey Budankov wrote:
>>>>>> So the below completely rewrites timekeeping (and probably breaks
>>>>>> world) but does away with the need to touch events that don't get
>>>>>> scheduled.
>>>>>
>>>>> We still need and do iterate thru all events at some points e.g. on context switches.
>>>>
>>>> Why do we _need_ to?
>>>
>>> We do so in the current implementation with several tstamp_* fields.
>>
>> Right, but we want to stop doing so asap :-)
>>
> 
> Well, I see you point :). It turns out that with straightforward timekeeping 
> we can also avoid whole tree iteration on context switches additionally to RB 
> tree based iterations and rotations on hrtimer interrupt. That brings even more 
> performance and rotation switch can be avoided.
> 
> However we can't completely get rid of whole tree iterations because of 
> inheritance code on forks in perf_event_init_context() here:
> 
> perf_event_groups_for_each(event, &parent_ctx->pinned_groups, group_node) {
> 		ret = inherit_task_group(event, parent, parent_ctx,
> 					 child, ctxn, &inherited_all);
> 		if (ret)
> 			goto out_unlock;
> }
> 
> and here:
> 
> perf_event_groups_for_each(event, &parent_ctx->flexible_groups, group_node) {
> 		ret = inherit_task_group(event, parent, parent_ctx,
> 					 child, ctxn, &inherited_all);
> 		if (ret)
> 			goto out_unlock;
> }
> 
> Below is the patch set put on top of the timekeeping rework. 
> 
> It is for tip/master branch.
> 
> ---
>  include/linux/perf_event.h |  40 +--
>  kernel/events/core.c       | 839 +++++++++++++++++++++++----------------------
>  2 files changed, 448 insertions(+), 431 deletions(-)
> 

Got this under perf_fuzzer on Xeon Phi (KNL):

[ 6614.226280] ------------[ cut here ]------------
[ 6614.226305] WARNING: CPU: 45 PID: 43385 at kernel/events/core.c:239 event_function+0xb3/0xe0
[ 6614.226310] Modules linked in: btrfs xor raid6_pq ufs hfsplus hfs minix vfat msdos fat jfs xfs reiserfs binfmt_misc xt_CHECKSUM iptable_mangle fuse ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables cmac arc4 md4 nls_utf8 nfsv3 cifs rpcsec_gss_krb5 nfsv4 nfs ccm dns_resolver fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm intel_rapl hfi1 sb_edac nfsd x86_pkg_temp_thermal intel_powerclamp rdmavt ipmi_ssif coretemp iTCO_wdt joydev ib_core crct10dif_pclmul iTCO_vendor_support crc32_pclmul ipmi_si ghash_clmulni_intel
[ 6614.226444]  auth_rpcgss ipmi_devintf mei_me tpm_tis intel_cstate mei tpm_tis_core intel_uncore pcspkr intel_rapl_perf tpm ipmi_msghandler shpchp nfs_acl lpc_ich lockd i2c_i801 wmi grace acpi_power_meter acpi_pad sunrpc mgag200 drm_kms_helper ttm drm igb crc32c_intel ptp pps_core dca i2c_algo_bit
[ 6614.226501] CPU: 45 PID: 43385 Comm: perf_fuzzer Not tainted 4.13.0-v11.1.2+ #5
[ 6614.226506] Hardware name: Intel Corporation S7200AP/S7200AP, BIOS S72C610.86B.01.01.0190.080520162104 08/05/2016
[ 6614.226511] task: ffff8d0f5866c000 task.stack: ffffa6b05aae8000
[ 6614.226517] RIP: 0010:event_function+0xb3/0xe0
[ 6614.226522] RSP: 0018:ffffa6b05aaebc30 EFLAGS: 00010087
[ 6614.226528] RAX: 0000000000000000 RBX: ffffc6b03c545a90 RCX: ffffa6b05aaebd00
[ 6614.226532] RDX: 0000000000000000 RSI: ffffa6b05aaebca0 RDI: ffffc6b03c545a98
[ 6614.226536] RBP: ffffa6b05aaebc58 R08: 000000000001f8e0 R09: ffff8d0fa99fd120
[ 6614.226540] R10: ffff8d0fbb407900 R11: 0000000000000000 R12: ffffc6b03ba05a90
[ 6614.226544] R13: 0000000000000000 R14: ffffa6b05aaebd48 R15: ffff8d0f9c48a000
[ 6614.226550] FS:  00007f05cff85740(0000) GS:ffff8d0fbc540000(0000) knlGS:0000000000000000
[ 6614.226555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6614.226559] CR2: 00000000007211c0 CR3: 0000002f269fa000 CR4: 00000000001407e0
[ 6614.226563] Call Trace:
[ 6614.226577]  remote_function+0x3b/0x50
[ 6614.226585]  generic_exec_single+0x9a/0xd0
[ 6614.226592]  smp_call_function_single+0xc8/0x100
[ 6614.226599]  cpu_function_call+0x43/0x60
[ 6614.226606]  ? cpu_clock_event_read+0x10/0x10
[ 6614.226612]  perf_event_read+0xc7/0xe0
[ 6614.226619]  ? perf_install_in_context+0xf0/0xf0
[ 6614.226625]  __perf_read_group_add+0x25/0x180
[ 6614.226632]  perf_read+0xcb/0x2b0
[ 6614.226640]  __vfs_read+0x37/0x160
[ 6614.226648]  ? security_file_permission+0x9d/0xc0
[ 6614.226655]  vfs_read+0x8c/0x130
[ 6614.226661]  SyS_read+0x55/0xc0
[ 6614.226670]  do_syscall_64+0x67/0x180
[ 6614.226678]  entry_SYSCALL64_slow_path+0x25/0x25
[ 6614.226684] RIP: 0033:0x7f05cfaad980
[ 6614.226688] RSP: 002b:00007fff1b562b48 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 6614.226694] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f05cfaad980
[ 6614.226699] RDX: 0000000000004fcf RSI: 0000000000735680 RDI: 0000000000000003
[ 6614.226703] RBP: 00007fff1b562b60 R08: 00007f05cfd800f4 R09: 00007f05cfd80140
[ 6614.226707] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401980
[ 6614.226711] R13: 00007fff1b564f60 R14: 0000000000000000 R15: 0000000000000000
[ 6614.226716] Code: e2 48 89 de 4c 89 ff 41 ff 56 08 31 c0 4d 85 ed 74 05 41 c6 45 08 00 c6 43 08 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 39 dc 74 cf <0f> ff eb cb 0f ff 0f 1f 80 00 00 00 00 e9 78 ff ff ff 0f ff 66 
[ 6614.226812] ---[ end trace ff12704813059a28 ]---

static int event_function(void *info)
{
	struct event_function_struct *efs = info;
	struct perf_event *event = efs->event;
	struct perf_event_context *ctx = event->ctx;
	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
	struct perf_event_context *task_ctx = cpuctx->task_ctx;
	int ret = 0;

	WARN_ON_ONCE(!irqs_disabled());

	perf_ctx_lock(cpuctx, task_ctx);
	/*
	 * Since we do the IPI call without holding ctx->lock things can have
	 * changed, double check we hit the task we set out to hit.
	 */
	if (ctx->task) {
		if (ctx->task != current) {
			ret = -ESRCH;
			goto unlock;
		}

		/*
		 * We only use event_function_call() on established contexts,
		 * and event_function() is only ever called when active (or
		 * rather, we'll have bailed in task_function_call() or the
		 * above ctx->task != current test), therefore we must have
		 * ctx->is_active here.
		 */
		WARN_ON_ONCE(!ctx->is_active);
		/*
		 * And since we have ctx->is_active, cpuctx->task_ctx must
		 * match.
		 */
		WARN_ON_ONCE(task_ctx != ctx);
	} else {
===>		WARN_ON_ONCE(&cpuctx->ctx != ctx);
	}

	efs->func(event, cpuctx, ctx, efs->data);
unlock:
	perf_ctx_unlock(cpuctx, task_ctx);

	return ret;
}

  parent reply	other threads:[~2017-09-05 12:06 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-02  8:11 [PATCH v6 0/3] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi Alexey Budankov
2017-08-02  8:13 ` [PATCH v6 1/3] perf/core: use rb trees for pinned/flexible groups Alexey Budankov
2017-08-03 13:00   ` Peter Zijlstra
2017-08-03 20:30     ` Alexey Budankov
2017-08-04 14:36       ` Peter Zijlstra
2017-08-07  7:17         ` Alexey Budankov
2017-08-07  8:39           ` Peter Zijlstra
2017-08-07  9:13             ` Peter Zijlstra
2017-08-07 15:32               ` Alexey Budankov
2017-08-07 15:55                 ` Peter Zijlstra
2017-08-07 16:27                   ` Alexey Budankov
2017-08-07 16:57                     ` Peter Zijlstra
2017-08-07 17:39                       ` Andi Kleen
2017-08-07 18:12                         ` Peter Zijlstra
2017-08-07 18:13                       ` Alexey Budankov
2017-08-15 17:28           ` Alexey Budankov
2017-08-23 13:39             ` Alexander Shishkin
2017-08-23 14:18               ` Alexey Budankov
2017-08-29 13:51             ` Alexander Shishkin
2017-08-30  8:30               ` Alexey Budankov
2017-08-30 10:18                 ` Alexander Shishkin
2017-08-30 10:30                   ` Alexey Budankov
2017-08-30 11:13                     ` Alexander Shishkin
2017-08-30 11:16                 ` Alexey Budankov
2017-08-31 10:12                   ` Alexey Budankov
2017-08-31 10:12             ` Alexey Budankov
2017-08-04 14:53       ` Peter Zijlstra
2017-08-07 15:22         ` Alexey Budankov
2017-08-02  8:15 ` [PATCH v6 2/3]: perf/core: use context tstamp_data for skipped events on mux interrupt Alexey Budankov
2017-08-03 13:04   ` Peter Zijlstra
2017-08-03 14:00   ` Peter Zijlstra
2017-08-03 15:58     ` Alexey Budankov
2017-08-04 12:36       ` Peter Zijlstra
2017-08-03 15:00   ` Peter Zijlstra
2017-08-03 18:47     ` Alexey Budankov
2017-08-04 12:35       ` Peter Zijlstra
2017-08-04 12:51         ` Peter Zijlstra
2017-08-04 14:25           ` Alexey Budankov
2017-08-04 14:23         ` Alexey Budankov
2017-08-10 15:57     ` Alexey Budankov
2017-08-22 20:47       ` Peter Zijlstra
2017-08-23  8:54         ` Alexey Budankov
2017-08-31 17:18           ` [RFC][PATCH] perf: Rewrite enabled/running timekeeping Peter Zijlstra
2017-08-31 19:51             ` Stephane Eranian
2017-09-05  7:51               ` Stephane Eranian
2017-09-05  9:44                 ` Peter Zijlstra
2017-09-01 10:45             ` Alexey Budankov
2017-09-01 12:31               ` Peter Zijlstra
2017-09-01 11:17             ` Alexey Budankov
2017-09-01 12:42               ` Peter Zijlstra
2017-09-01 21:03             ` Vince Weaver
2017-09-04 10:46             ` Alexey Budankov
2017-09-04 12:08               ` Peter Zijlstra
2017-09-04 14:56                 ` Alexey Budankov
2017-09-04 15:41                   ` Peter Zijlstra
2017-09-04 15:58                     ` Peter Zijlstra
2017-09-05 10:17                     ` Alexey Budankov
2017-09-05 11:19                       ` Peter Zijlstra
2017-09-11  6:55                         ` Alexey Budankov
2017-09-05 12:06                       ` Alexey Budankov [this message]
2017-09-05 12:59                         ` Peter Zijlstra
2017-09-05 16:03                         ` Peter Zijlstra
2017-09-06 13:48                           ` Alexey Budankov
2017-09-08  8:47                           ` Alexey Budankov
2018-03-12 17:43                             ` [tip:perf/core] perf/cor: Use RB trees for pinned/flexible groups tip-bot for Alexey Budankov
2017-08-02  8:16 ` [PATCH v6 3/3]: perf/core: add mux switch to skip to the current CPU's events list on mux interrupt Alexey Budankov
2017-08-18  5:17 ` [PATCH v7 0/2] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi Alexey Budankov
2017-08-18  5:21   ` [PATCH v7 1/2] perf/core: use rb trees for pinned/flexible groups Alexey Budankov
2017-08-23 11:17     ` Alexander Shishkin
2017-08-23 17:23       ` Alexey Budankov
2017-08-18  5:22   ` [PATCH v7 2/2] perf/core: add mux switch to skip to the current CPU's events list on mux interrupt Alexey Budankov
2017-08-23 11:54     ` Alexander Shishkin
2017-08-23 18:12       ` Alexey Budankov
2017-08-22 20:21   ` [PATCH v7 0/2] perf/core: addressing 4x slowdown during per-process profiling of STREAM benchmark on Intel Xeon Phi Peter Zijlstra
2017-08-23  8:54     ` Alexey Budankov
2017-08-31 10:12     ` Alexey Budankov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f66ce7fe-db8d-2456-5510-c99db6cdcb2a@linux.intel.com \
    --to=alexey.budankov@linux.intel.com \
    --cc=Dmitry.Prohorov@intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=davidcc@google.com \
    --cc=eranian@google.com \
    --cc=kan.liang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=valery.cherepennikov@intel.com \
    --cc=vince@deater.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox