Message-ID: <09ed8cb5-707d-4b13-b230-cff4fab02b72@linux.intel.com>
Date: Tue, 20 May 2025 12:09:02 -0400
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v4 04/38] perf: Add a EVENT_GUEST flag
To: Namhyung Kim, Mingwei Zhang
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
 Sean Christopherson, Paolo Bonzini, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Ian Rogers, Adrian Hunter, Liang@google.com,
 "H. Peter Anvin", linux-perf-users@vger.kernel.org,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Yongwei Ma, Xiong Zhang, Dapeng Mi,
 Jim Mattson, Sandipan Das, Zide Chen, Eranian Stephane, Shukla Manali,
 Nikunj Dadhania
References: <20250324173121.1275209-1-mizhang@google.com>
 <20250324173121.1275209-5-mizhang@google.com>
Content-Language: en-US
From: "Liang, Kan"
Content-Type: text/plain; charset=UTF-8

On 2025-05-19 2:58 a.m., Namhyung Kim wrote:
> Hello,
> 
> On Mon, Mar 24, 2025 at 05:30:44PM +0000, Mingwei Zhang wrote:
>> From: Kan Liang
>>
>> Current perf doesn't explicitly schedule out all exclude_guest events
>> while the guest is running. There is no problem with the current
>> emulated vPMU, because perf owns all the PMU counters. It can mask the
>> counter which is assigned to an exclude_guest event when a guest is
>> running (Intel way), or set the corresponding HOSTONLY bit in the
>> eventsel (AMD way). The counter doesn't count when a guest is running.
>>
>> However, neither way works with the introduced passthrough vPMU.
>> A guest owns all the PMU counters when it's running. The host should
>> not mask any counters. The counter may be used by the guest, and the
>> eventsel may be overwritten.
>>
>> Perf should explicitly schedule out all exclude_guest events to release
>> the PMU resources when entering a guest, and resume the counting when
>> exiting the guest.
>>
>> It's possible that an exclude_guest event is created when a guest is
>> running. The new event should not be scheduled in either.
>>
>> The ctx time is shared among different PMUs. The time cannot be stopped
>> when a guest is running. It is required to calculate the time for events
>> from other PMUs, e.g., uncore events. Add timeguest to track the guest
>> run time. For an exclude_guest event, the elapsed time equals
>> the ctx time - guest time.
>> Cgroup has dedicated times. Use the same method to deduct the guest time
>> from the cgroup time as well.
>>
>> Co-developed-by: Peter Zijlstra (Intel)
>> Signed-off-by: Peter Zijlstra (Intel)
>> Signed-off-by: Kan Liang
>> Signed-off-by: Mingwei Zhang
>> ---
>>  include/linux/perf_event.h |   6 ++
>>  kernel/events/core.c       | 209 +++++++++++++++++++++++++++++--------
>>  2 files changed, 169 insertions(+), 46 deletions(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index a2fd1bdc955c..7bda1e20be12 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -999,6 +999,11 @@ struct perf_event_context {
>>  	 */
>>  	struct perf_time_ctx	time;
>>  
>> +	/*
>> +	 * Context clock, runs when in the guest mode.
>> +	 */
>> +	struct perf_time_ctx	timeguest;
> 
> Why not make it an array as you use it later?

Do you mean struct perf_time_ctx times[2]?

I don't see a big benefit of using times[T_GUEST] vs. timeguest.

> 
>> +
>>  	/*
>>  	 * These fields let us detect when two contexts have both
>>  	 * been cloned (inherited) from a common ancestor.
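
To illustrate the point above: with the static_assert the patch adds in
kernel/events/core.c (quoted below), time and timeguest are pinned
adjacent, so &ctx->time can already be indexed as a two-element array;
times[2] would only rename the fields. A minimal userspace sketch of the
trick, using made-up demo_ctx/demo_event_time names rather than anything
from the patch:

	#include <assert.h>
	#include <stddef.h>
	#include <stdint.h>
	#include <stdio.h>

	struct perf_time_ctx {
		uint64_t time, stamp, offset;
	};

	#define T_TOTAL	0
	#define T_GUEST	1

	struct demo_ctx {
		struct perf_time_ctx time;	/* total context time */
		struct perf_time_ctx timeguest;	/* time spent in guest */
	};

	/* The same layout guarantee the patch enforces with static_assert(). */
	static_assert(offsetof(struct demo_ctx, timeguest) -
		      offsetof(struct demo_ctx, time) ==
		      sizeof(struct perf_time_ctx),
		      "time and timeguest must be adjacent");

	/* Mirrors __perf_event_time_ctx(): exclude_guest deducts guest time. */
	static uint64_t demo_event_time(const struct perf_time_ctx *times,
					int exclude_guest)
	{
		uint64_t t = times[T_TOTAL].time;

		if (exclude_guest)
			t -= times[T_GUEST].time;
		return t;
	}

	int main(void)
	{
		struct demo_ctx ctx = {
			.time      = { .time = 100 },	/* ctx ran for 100 units */
			.timeguest = { .time = 30 },	/* 30 of them in a guest */
		};

		/* exclude_guest event: enabled for 100 - 30 = 70 units */
		printf("%llu\n", (unsigned long long)demo_event_time(&ctx.time, 1));
		return 0;
	}

Either spelling compiles to the same accesses, so the choice between
times[2] and time/timeguest is cosmetic.
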
>> @@ -1089,6 +1094,7 @@ struct bpf_perf_event_data_kern {
>>   */
>>  struct perf_cgroup_info {
>>  	struct perf_time_ctx		time;
>> +	struct perf_time_ctx		timeguest;
>>  	int				active;
>>  };
>>  
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index e38c8b5e8086..7a2115b2c5c1 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -163,7 +163,8 @@ enum event_type_t {
>>  	/* see ctx_resched() for details */
>>  	EVENT_CPU	= 0x10,
>>  	EVENT_CGROUP	= 0x20,
>> -	EVENT_FLAGS	= EVENT_CGROUP,
>> +	EVENT_GUEST	= 0x40,
> 
> It's not clear to me if this flag is for events to include guests or
> exclude them.  Can you please add a comment?
> 

	/*
	 * There are guest events. The for_each_epc() iteration can
	 * skip those PMUs which don't support guest events via the
	 * MEDIATED_VPMU. It is also used to indicate the start/end of
	 * guest events to calculate the guest running time.
	 */

Thanks,
Kan

> Thanks,
> Namhyung
> 
> 
>> +	EVENT_FLAGS	= EVENT_CGROUP | EVENT_GUEST,
>>  	/* compound helpers */
>>  	EVENT_ALL	= EVENT_FLEXIBLE | EVENT_PINNED,
>>  	EVENT_TIME_FROZEN = EVENT_TIME | EVENT_FROZEN,
>> @@ -435,6 +436,7 @@ static atomic_t nr_include_guest_events __read_mostly;
>>  
>>  static atomic_t nr_mediated_pmu_vms;
>>  static DEFINE_MUTEX(perf_mediated_pmu_mutex);
>> +static DEFINE_PER_CPU(bool, perf_in_guest);
>>  
>>  /* !exclude_guest event of PMU with PERF_PMU_CAP_MEDIATED_VPMU */
>>  static inline bool is_include_guest_event(struct perf_event *event)
>> @@ -738,6 +740,9 @@ static bool perf_skip_pmu_ctx(struct perf_event_pmu_context *pmu_ctx,
>>  {
>>  	if ((event_type & EVENT_CGROUP) && !pmu_ctx->nr_cgroups)
>>  		return true;
>> +	if ((event_type & EVENT_GUEST) &&
>> +	    !(pmu_ctx->pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU))
>> +		return true;
>>  	return false;
>>  }
>>  
>> @@ -788,6 +793,39 @@ static inline void update_perf_time_ctx(struct perf_time_ctx *time, u64 now, boo
>>  	WRITE_ONCE(time->offset, time->time - time->stamp);
>>  }
>>  
>> +static_assert(offsetof(struct perf_event_context, timeguest) -
>> +	      offsetof(struct perf_event_context, time) ==
>> +	      sizeof(struct perf_time_ctx));
>> +
>> +#define T_TOTAL	0
>> +#define T_GUEST	1
>> +
>> +static inline u64 __perf_event_time_ctx(struct perf_event *event,
>> +					struct perf_time_ctx *times)
>> +{
>> +	u64 time = times[T_TOTAL].time;
>> +
>> +	if (event->attr.exclude_guest)
>> +		time -= times[T_GUEST].time;
>> +
>> +	return time;
>> +}
>> +
>> +static inline u64 __perf_event_time_ctx_now(struct perf_event *event,
>> +					    struct perf_time_ctx *times,
>> +					    u64 now)
>> +{
>> +	if (event->attr.exclude_guest && __this_cpu_read(perf_in_guest)) {
>> +		/*
>> +		 * (now + times[total].offset) - (now + times[guest].offset) :=
>> +		 * times[total].offset - times[guest].offset
>> +		 */
>> +		return READ_ONCE(times[T_TOTAL].offset) - READ_ONCE(times[T_GUEST].offset);
>> +	}
>> +
>> +	return now + READ_ONCE(times[T_TOTAL].offset);
>> +}
>> +
>>  #ifdef CONFIG_CGROUP_PERF
>>  
>>  static inline bool
>> @@ -824,12 +862,16 @@ static inline int is_cgroup_event(struct perf_event *event)
>>  	return event->cgrp != NULL;
>>  }
>>  
>> +static_assert(offsetof(struct perf_cgroup_info, timeguest) -
>> +	      offsetof(struct perf_cgroup_info, time) ==
>> +	      sizeof(struct perf_time_ctx));
>> +
>>  static inline u64 perf_cgroup_event_time(struct perf_event *event)
>>  {
>>  	struct perf_cgroup_info *t;
>>  
>>  	t = per_cpu_ptr(event->cgrp->info, event->cpu);
>> -	return t->time.time;
>> +	return __perf_event_time_ctx(event, &t->time);
>>  }
>>  
>>  static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now)
>> @@ -838,9 +880,21 @@ static inline u64 perf_cgroup_event_time_now(struct perf_event *event, u64 now)
>>  
>>  	t = per_cpu_ptr(event->cgrp->info, event->cpu);
>>  	if (!__load_acquire(&t->active))
>> -		return t->time.time;
>> -	now += READ_ONCE(t->time.offset);
>> -	return now;
>> +		return __perf_event_time_ctx(event, &t->time);
>> +
>> +	return __perf_event_time_ctx_now(event, &t->time, now);
>> +}
>> +
>> +static inline void __update_cgrp_guest_time(struct perf_cgroup_info *info, u64 now, bool adv)
>> +{
>> +	update_perf_time_ctx(&info->timeguest, now, adv);
>> +}
>> +
>> +static inline void update_cgrp_time(struct perf_cgroup_info *info, u64 now)
>> +{
>> +	update_perf_time_ctx(&info->time, now, true);
>> +	if (__this_cpu_read(perf_in_guest))
>> +		__update_cgrp_guest_time(info, now, true);
>>  }
>>  
>>  static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx, bool final)
>> @@ -856,7 +910,7 @@ static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx,
>>  		cgrp = container_of(css, struct perf_cgroup, css);
>>  		info = this_cpu_ptr(cgrp->info);
>>  
>> -		update_perf_time_ctx(&info->time, now, true);
>> +		update_cgrp_time(info, now);
>>  		if (final)
>>  			__store_release(&info->active, 0);
>>  	}
>> @@ -879,11 +933,11 @@ static inline void update_cgrp_time_from_event(struct perf_event *event)
>>  	 * Do not update time when cgroup is not active
>>  	 */
>>  	if (info->active)
>> -		update_perf_time_ctx(&info->time, perf_clock(), true);
>> +		update_cgrp_time(info, perf_clock());
>>  }
>>  
>>  static inline void
>> -perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx)
>> +perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx, bool guest)
>>  {
>>  	struct perf_event_context *ctx = &cpuctx->ctx;
>>  	struct perf_cgroup *cgrp = cpuctx->cgrp;
>> @@ -903,8 +957,12 @@ perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx)
>>  	for (css = &cgrp->css; css; css = css->parent) {
>>  		cgrp = container_of(css, struct perf_cgroup, css);
>>  		info = this_cpu_ptr(cgrp->info);
>> -		update_perf_time_ctx(&info->time, ctx->time.stamp, false);
>> -		__store_release(&info->active, 1);
>> +		if (guest) {
>> +			__update_cgrp_guest_time(info, ctx->time.stamp, false);
>> +		} else {
>> +			update_perf_time_ctx(&info->time, ctx->time.stamp, false);
>> +			__store_release(&info->active, 1);
>> +		}
>>  	}
>>  }
>>  
>> @@ -1104,7 +1162,7 @@ static inline int perf_cgroup_connect(pid_t pid, struct perf_event *event,
>>  }
>>  
>>  static inline void
>> -perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx)
>> +perf_cgroup_set_timestamp(struct perf_cpu_context *cpuctx, bool guest)
>>  {
>>  }
>>  
>> @@ -1514,16 +1572,24 @@ static void perf_unpin_context(struct perf_event_context *ctx)
>>   */
>>  static void __update_context_time(struct perf_event_context *ctx, bool adv)
>>  {
>> -	u64 now = perf_clock();
>> +	lockdep_assert_held(&ctx->lock);
>> +
>> +	update_perf_time_ctx(&ctx->time, perf_clock(), adv);
>> +}
>>  
>> +static void __update_context_guest_time(struct perf_event_context *ctx, bool adv)
>> +{
>>  	lockdep_assert_held(&ctx->lock);
>>  
>> -	update_perf_time_ctx(&ctx->time, now, adv);
>> +	/* must be called after __update_context_time(); */
>> +	update_perf_time_ctx(&ctx->timeguest, ctx->time.stamp, adv);
>>  }
>>  
>>  static void update_context_time(struct perf_event_context *ctx)
>>  {
>>  	__update_context_time(ctx, true);
>> +	if (__this_cpu_read(perf_in_guest))
>> +		__update_context_guest_time(ctx, true);
>>  }
>>  
>>  static u64 perf_event_time(struct perf_event *event)
>> @@ -1536,7 +1602,7 @@ static u64 perf_event_time(struct perf_event *event)
>>  	if (is_cgroup_event(event))
>>  		return perf_cgroup_event_time(event);
>>  
>> -	return ctx->time.time;
>> +	return __perf_event_time_ctx(event, &ctx->time);
>>  }
>>  
>>  static u64 perf_event_time_now(struct perf_event *event, u64 now)
>> @@ -1550,10 +1616,9 @@ static u64 perf_event_time_now(struct perf_event *event, u64 now)
>>  		return perf_cgroup_event_time_now(event, now);
>>  
>>  	if (!(__load_acquire(&ctx->is_active) & EVENT_TIME))
>> -		return ctx->time.time;
>> +		return __perf_event_time_ctx(event, &ctx->time);
>>  
>> -	now += READ_ONCE(ctx->time.offset);
>> -	return now;
>> +	return __perf_event_time_ctx_now(event, &ctx->time, now);
>>  }
>>  
>>  static enum event_type_t get_event_type(struct perf_event *event)
>> @@ -2384,20 +2449,23 @@ group_sched_out(struct perf_event *group_event, struct perf_event_context *ctx)
>>  }
>>  
>>  static inline void
>> -__ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx, bool final)
>> +__ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx,
>> +		  bool final, enum event_type_t event_type)
>>  {
>>  	if (ctx->is_active & EVENT_TIME) {
>>  		if (ctx->is_active & EVENT_FROZEN)
>>  			return;
>> +
>>  		update_context_time(ctx);
>> -		update_cgrp_time_from_cpuctx(cpuctx, final);
>> +		/* vPMU should not stop time */
>> +		update_cgrp_time_from_cpuctx(cpuctx, !(event_type & EVENT_GUEST) && final);
>>  	}
>>  }
>>  
>>  static inline void
>>  ctx_time_update(struct perf_cpu_context *cpuctx, struct perf_event_context *ctx)
>>  {
>> -	__ctx_time_update(cpuctx, ctx, false);
>> +	__ctx_time_update(cpuctx, ctx, false, 0);
>>  }
>>  
>>  /*
>> @@ -3405,7 +3473,7 @@ ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
>>  	 *
>>  	 * would only update time for the pinned events.
>>  	 */
>> -	__ctx_time_update(cpuctx, ctx, ctx == &cpuctx->ctx);
>> +	__ctx_time_update(cpuctx, ctx, ctx == &cpuctx->ctx, event_type);
>>  
>>  	/*
>>  	 * CPU-release for the below ->is_active store,
>> @@ -3431,7 +3499,18 @@ ctx_sched_out(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
>>  		cpuctx->task_ctx = NULL;
>>  	}
>>  
>> -	is_active ^= ctx->is_active; /* changed bits */
>> +	if (event_type & EVENT_GUEST) {
>> +		/*
>> +		 * Schedule out all exclude_guest events of PMU
>> +		 * with PERF_PMU_CAP_MEDIATED_VPMU.
>> +		 */
>> +		is_active = EVENT_ALL;
>> +		__update_context_guest_time(ctx, false);
>> +		perf_cgroup_set_timestamp(cpuctx, true);
>> +		barrier();
>> +	} else {
>> +		is_active ^= ctx->is_active; /* changed bits */
>> +	}
>>  
>>  	for_each_epc(pmu_ctx, ctx, pmu, event_type)
>>  		__pmu_ctx_sched_out(pmu_ctx, is_active);
>> @@ -3926,10 +4005,15 @@ static inline void group_update_userpage(struct perf_event *group_event)
>>  		event_update_userpage(event);
>>  }
>>  
>> +struct merge_sched_data {
>> +	int can_add_hw;
>> +	enum event_type_t event_type;
>> +};
>> +
>>  static int merge_sched_in(struct perf_event *event, void *data)
>>  {
>>  	struct perf_event_context *ctx = event->ctx;
>> -	int *can_add_hw = data;
>> +	struct merge_sched_data *msd = data;
>>  
>>  	if (event->state <= PERF_EVENT_STATE_OFF)
>>  		return 0;
>> @@ -3937,13 +4021,22 @@ static int merge_sched_in(struct perf_event *event, void *data)
>>  	if (!event_filter_match(event))
>>  		return 0;
>>  
>> -	if (group_can_go_on(event, *can_add_hw)) {
>> +	/*
>> +	 * Don't schedule in any host events from PMU with
>> +	 * PERF_PMU_CAP_MEDIATED_VPMU, while a guest is running.
>> +	 */
>> +	if (__this_cpu_read(perf_in_guest) &&
>> +	    event->pmu_ctx->pmu->capabilities & PERF_PMU_CAP_MEDIATED_VPMU &&
>> +	    !(msd->event_type & EVENT_GUEST))
>> +		return 0;
>> +
>> +	if (group_can_go_on(event, msd->can_add_hw)) {
>>  		if (!group_sched_in(event, ctx))
>>  			list_add_tail(&event->active_list, get_event_list(event));
>>  	}
>>  
>>  	if (event->state == PERF_EVENT_STATE_INACTIVE) {
>> -		*can_add_hw = 0;
>> +		msd->can_add_hw = 0;
>>  		if (event->attr.pinned) {
>>  			perf_cgroup_event_disable(event, ctx);
>>  			perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
>> @@ -3962,11 +4055,15 @@ static int merge_sched_in(struct perf_event *event, void *data)
>>  
>>  static void pmu_groups_sched_in(struct perf_event_context *ctx,
>>  				struct perf_event_groups *groups,
>> -				struct pmu *pmu)
>> +				struct pmu *pmu,
>> +				enum event_type_t event_type)
>>  {
>> -	int can_add_hw = 1;
>> +	struct merge_sched_data msd = {
>> +		.can_add_hw = 1,
>> +		.event_type = event_type,
>> +	};
>>  	visit_groups_merge(ctx, groups, smp_processor_id(), pmu,
>> -			   merge_sched_in, &can_add_hw);
>> +			   merge_sched_in, &msd);
>>  }
>>  
>>  static void __pmu_ctx_sched_in(struct perf_event_pmu_context *pmu_ctx,
>> @@ -3975,9 +4072,9 @@ static void __pmu_ctx_sched_in(struct perf_event_pmu_context *pmu_ctx,
>>  	struct perf_event_context *ctx = pmu_ctx->ctx;
>>  
>>  	if (event_type & EVENT_PINNED)
>> -		pmu_groups_sched_in(ctx, &ctx->pinned_groups, pmu_ctx->pmu);
>> +		pmu_groups_sched_in(ctx, &ctx->pinned_groups, pmu_ctx->pmu, event_type);
>>  	if (event_type & EVENT_FLEXIBLE)
>> -		pmu_groups_sched_in(ctx, &ctx->flexible_groups, pmu_ctx->pmu);
>> +		pmu_groups_sched_in(ctx, &ctx->flexible_groups, pmu_ctx->pmu, event_type);
>>  }
>>  
>>  static void
>> @@ -3994,9 +4091,11 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
>>  		return;
>>  
>>  	if (!(is_active & EVENT_TIME)) {
>> +		/* EVENT_TIME should be active while the guest runs */
>> +		WARN_ON_ONCE(event_type & EVENT_GUEST);
>>  		/* start ctx time */
>>  		__update_context_time(ctx, false);
>> -		perf_cgroup_set_timestamp(cpuctx);
>> +		perf_cgroup_set_timestamp(cpuctx, false);
>>  		/*
>>  		 * CPU-release for the below ->is_active store,
>>  		 * see __load_acquire() in perf_event_time_now()
>> @@ -4012,7 +4111,23 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
>>  		WARN_ON_ONCE(cpuctx->task_ctx != ctx);
>>  	}
>>  
>> -	is_active ^= ctx->is_active; /* changed bits */
>> +	if (event_type & EVENT_GUEST) {
>> +		/*
>> +		 * Schedule in the required exclude_guest events of PMU
>> +		 * with PERF_PMU_CAP_MEDIATED_VPMU.
>> +		 */
>> +		is_active = event_type & EVENT_ALL;
>> +
>> +		/*
>> +		 * Update ctx time to set the new start time for
>> +		 * the exclude_guest events.
>> +		 */
>> +		update_context_time(ctx);
>> +		update_cgrp_time_from_cpuctx(cpuctx, false);
>> +		barrier();
>> +	} else {
>> +		is_active ^= ctx->is_active; /* changed bits */
>> +	}
>>  
>>  	/*
>>  	 * First go through the list and put on any pinned groups
>> @@ -4020,13 +4135,13 @@ ctx_sched_in(struct perf_event_context *ctx, struct pmu *pmu, enum event_type_t
>>  	 */
>>  	if (is_active & EVENT_PINNED) {
>>  		for_each_epc(pmu_ctx, ctx, pmu, event_type)
>> -			__pmu_ctx_sched_in(pmu_ctx, EVENT_PINNED);
>> +			__pmu_ctx_sched_in(pmu_ctx, EVENT_PINNED | (event_type & EVENT_GUEST));
>>  	}
>>  
>>  	/* Then walk through the lower prio flexible groups */
>>  	if (is_active & EVENT_FLEXIBLE) {
>>  		for_each_epc(pmu_ctx, ctx, pmu, event_type)
>> -			__pmu_ctx_sched_in(pmu_ctx, EVENT_FLEXIBLE | (event_type & EVENT_GUEST));
>> +			__pmu_ctx_sched_in(pmu_ctx, EVENT_FLEXIBLE | (event_type & EVENT_GUEST));
>>  	}
>>  }
>>  
>> @@ -6285,23 +6400,25 @@ void perf_event_update_userpage(struct perf_event *event)
>>  	if (!rb)
>>  		goto unlock;
>>  
>> -	/*
>> -	 * compute total_time_enabled, total_time_running
>> -	 * based on snapshot values taken when the event
>> -	 * was last scheduled in.
>> -	 *
>> -	 * we cannot simply called update_context_time()
>> -	 * because of locking issue as we can be called in
>> -	 * NMI context
>> -	 */
>> -	calc_timer_values(event, &now, &enabled, &running);
>> -
>> -	userpg = rb->user_page;
>>  	/*
>>  	 * Disable preemption to guarantee consistent time stamps are stored to
>>  	 * the user page.
>>  	 */
>>  	preempt_disable();
>> +
>> +	/*
>> +	 * compute total_time_enabled, total_time_running
>> +	 * based on snapshot values taken when the event
>> +	 * was last scheduled in.
>> +	 *
>> +	 * we cannot simply called update_context_time()
>> +	 * because of locking issue as we can be called in
>> +	 * NMI context
>> +	 */
>> +	calc_timer_values(event, &now, &enabled, &running);
>> +
>> +	userpg = rb->user_page;
>> +
>>  	++userpg->lock;
>>  	barrier();
>>  	userpg->index = perf_event_index(event);
>> -- 
>> 2.49.0.395.g12beb8f557-goog
>> 
> 
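
For the record, a rough userspace model of the two-clock bookkeeping may
make the offset trick in __perf_event_time_ctx_now() easier to see. It
assumes update_perf_time_ctx() advances time by now - stamp when adv is
set, as introduced earlier in this series; the timestamps are made-up
numbers:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	struct perf_time_ctx {
		uint64_t time, stamp, offset;
	};

	static void update_perf_time_ctx(struct perf_time_ctx *t, uint64_t now,
					 bool adv)
	{
		if (adv)
			t->time += now - t->stamp;
		t->stamp = now;
		/* may go "negative"; unsigned wraparound keeps the math right */
		t->offset = t->time - t->stamp;
	}

	int main(void)
	{
		struct perf_time_ctx total = { 0 }, guest = { 0 };

		update_perf_time_ctx(&total, 0, false);		/* ctx time starts at now=0 */

		update_perf_time_ctx(&total, 50, true);		/* guest entry at now=50 */
		update_perf_time_ctx(&guest, 50, false);	/* guest clock starts */

		/*
		 * In-guest reads (now=60, 70, ...): an exclude_guest event sees
		 * total.offset - guest.offset, which stays frozen at 50 however
		 * far "now" advances, matching __perf_event_time_ctx_now().
		 */
		printf("in guest : %llu\n",
		       (unsigned long long)(total.offset - guest.offset));

		update_perf_time_ctx(&total, 90, true);		/* guest exit at now=90 */
		update_perf_time_ctx(&guest, 90, true);		/* guest ran 40 units */

		/* exclude_guest elapsed = ctx time - guest time = 90 - 40 = 50 */
		printf("after exit: %llu\n",
		       (unsigned long long)(total.time - guest.time));
		return 0;
	}

Both prints give 50: time stands still for exclude_guest events while the
guest runs, yet the shared ctx clock (total.time) keeps running for PMUs
such as uncore that cannot stop while a guest is on.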