Re: [PATCH] perf_events: AMD event scheduling (v2)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: eranian@google.com
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, paulus@samba.org,
	davem@davemloft.net, fweisbec@gmail.com, robert.richter@amd.com,
	perfmon2-devel@lists.sf.net, eranian@gmail.com
Subject: Re: [PATCH]  perf_events: AMD event scheduling (v2)
Date: Thu, 04 Feb 2010 15:55:02 +0100	[thread overview]
Message-ID: <1265295302.24455.2265.camel@laptop> (raw)
In-Reply-To: <4b674594.0a04d00a.6005.2567@mx.google.com>

On Mon, 2010-02-01 at 22:15 +0200, Stephane Eranian wrote:
> 	
> 	This patch adds correct AMD Northbridge event scheduling.
> 	It must be applied on top tip-x86 + hw_perf_enable() fix.
> 
> 	NB events are events measuring L3 cache, Hypertransport
> 	traffic. They are identified by an event code  >= 0xe0.
> 	They measure events on the Northbride which is shared
> 	by all cores on a package. NB events are counted on a
> 	shared set of counters. When a NB event is programmed
> 	in a counter, the data actually comes from a shared
> 	counter. Thus, access to those counters needs to be
> 	synchronized.
> 
> 	We implement the synchronization such that no two cores
> 	can be measuring NB events using the same counters. Thus,
> 	we maintain a per-NB * allocation table. The available slot
> 	is propagated using the event_constraint structure.
>  
> 	This 2nd version takes into account the changes on how
> 	constraints are stored by the scheduling code.
> 
> 	The patch also takes care of hotplug CPU.
> 
> 	Signed-off-by: Stephane Eranian <eranian@google.com>

Please run the patch through checkpatch, there's lots of trivial coding
style errors (spaces instead of tabs, for(i=0; etc..)

> @@ -2250,10 +2261,144 @@ intel_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event
>  	return &unconstrained;
>  }
>  
> +/*
> + * AMD64 events are detected based on their event codes.
> + */
> +static inline int amd_is_nb_event(struct hw_perf_event *hwc)
> +{
> +	u64 val = hwc->config;

& K7_EVNTSEL_EVENT_MASK ?

> +	/* event code : bits [35-32] | [7-0] */
> +	val = (val >> 24) | ( val & 0xff);
> +	return val >= 0x0e0;
> +}
> +
> +static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
> +				      struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct perf_event *old;
> +	struct amd_nb *nb;
> +	int i;
> +
> +	/*
> +	 * only care about NB events
> +	 */
> +	if(!amd_is_nb_event(hwc))
> +		return;
> +
> +	/*
> +	 * NB not initialized
> +	 */
> +	nb = cpuc->amd_nb;
> +	if (!nb)
> +		return;
> +
> +	if (hwc->idx == -1)
> +		return;
> +
> +	/*
> +	 * need to scan whole list because event may not have
> +	 * been assigned during scheduling
> +	 */
> +	for(i=0; i < x86_pmu.num_events; i++) {
> +		if (nb->owners[i] == event) {
> +			old = cmpxchg(nb->owners+i, event, NULL);

might want to validate old is indeed event.

> +			return;
> +		}
> +	}
> +}
> +
> + /*
> +  * AMD64 Northbridge events need special treatment because
> +  * counter access needs to be synchronized across all cores
> +  * of a package. Refer to BKDG section 3.12
> +  *
> +  * NB events are events measuring L3 cache, Hypertransport
> +  * traffic. They are identified by an event code  >= 0xe0.
> +  * They measure events on the Northbride which is shared
> +  * by all cores on a package. NB events are counted on a
> +  * shared set of counters. When a NB event is programmed
> +  * in a counter, the data actually comes from a shared
> +  * counter. Thus, access to those counters needs to be
> +  * synchronized.
> +  * We implement the synchronization such that no two cores
> +  * can be measuring NB events using the same counters. Thus,
> +  * we maintain a per-NB * allocation table. The available slot
> +  * is propagated using the event_constraint structure.
> +  *
> +  * We provide only one choice for each NB event based on
> +  * the fact that only NB events have restrictions. Consequently,
> +  * if a counter is available, there is a guarantee the NB event
> +  * will be assigned to it. If no slot is available, an empty
> +  * constraint is returned and scheduling will evnetually fail
> +  * for this event.
> +  *
> +  * Note that all cores attached the same NB compete for the same
> +  * counters to host NB events, this is why we use atomic ops. Some
> +  * multi-chip CPUs may have more than one NB.
> +  *
> +  * Given that resources are allocated (cmpxchg), they must be
> +  * eventually freed for others to use. This is accomplished by
> +  * calling amd_put_event_constraints().
> +  *
> +  * Non NB events are not impacted by this restriction.
> +  */
>  static struct event_constraint *
>  amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
>  {
> -	return &unconstrained;
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct amd_nb *nb = cpuc->amd_nb;
> +	struct perf_event *old = NULL;
> +	int max = x86_pmu.num_events;
> +	int i, j, k = -1;
> +
> +	/*
> +	 * if not NB event or no NB, then no constraints
> +	 */
> +	if (!amd_is_nb_event(hwc) || !nb)
> +		return &unconstrained;
> +
> +	/*
> +	 * detect if already present, if so reuse
> +	 *
> +	 * cannot merge with actual allocation
> +	 * because of possible holes
> +	 *
> +	 * event can already be present yet not assigned (in hwc->idx)
> +	 * because of successive calls to x86_schedule_events() from
> +	 * hw_perf_group_sched_in() without hw_perf_enable()
> +	 */
> +	for(i=0; i < max; i++) {
> +		/*
> +		 * keep track of first free slot
> +		 */
> +		if (k == -1 && !nb->owners[i])
> +			k = i;

break?

> +
> +		/* already present, reuse */
> +		if (nb->owners[i] == event)
> +			goto skip;

s/skip/done/ ?

> +	}
> +	/*
> +	 * not present, so grab a new slot
> +	 *
> +	 * try to alllcate same counter as before if
> +	 * event has already been assigned once. Otherwise,
> +	 * try to use free counter k obtained during the 1st
> +	 * pass above.
> +	 */
> +	i = j = hwc->idx != -1 ? hwc->idx : (k == -1 ? 0 : k);

That's patently unreadable, and I'm not sure what happens if we failed
to find an eligible spot in the above loop, should we not somehow jump
out and return emptyconstraint?

> +	do {
> +		old = cmpxchg(nb->owners+i, NULL, event);
> +		if (!old)
> +			break;
> +		if (++i == x86_pmu.num_events)
> +			i = 0;
> +	} while (i != j);
> +skip:
> +	if (!old)
> +		return &nb->event_constraints[i];
> +	return &emptyconstraint;
>  }
>  
>  static int x86_event_sched_in(struct perf_event *event,

> @@ -2561,6 +2707,96 @@ static __init int intel_pmu_init(void)
>  	return 0;
>  }
>  
> +static struct amd_nb *amd_alloc_nb(int cpu, int nb_id)
> +{
> +        struct amd_nb *nb;
> +	int i;
> +
> +        nb= vmalloc_node(sizeof(struct amd_nb), cpu_to_node(cpu));

$ pahole -C amd_nb build/arch/x86/kernel/cpu/perf_event.o
struct amd_nb {
        int                        nb_id;                /*     0     4 */
        int                        refcnt;               /*     4     4 */
        struct perf_event *        owners[64];           /*     8   512 */
        /* --- cacheline 8 boundary (512 bytes) was 8 bytes ago --- */
        struct event_constraint    event_constraints[64]; /*   520  1536 */
        /* --- cacheline 32 boundary (2048 bytes) was 8 bytes ago --- */

        /* size: 2056, cachelines: 33 */
        /* last cacheline: 8 bytes */
};

Surely we can kmalloc that?

> +        if (!nb)
> +                return NULL;
> +
> +        memset(nb, 0, sizeof(*nb));
> +        nb->nb_id = nb_id;
> +
> +	/*
> +	 * initialize all possible NB constraints
> +	 */
> +	for(i=0; i < x86_pmu.num_events; i++) {
> +		set_bit(i, nb->event_constraints[i].idxmsk);
> +		nb->event_constraints[i].weight = 1;
> +	}
> +        return nb;
> +}

Terrible whilespace damage.

> +
> +static void amd_pmu_cpu_online(int cpu)
> +{
> +	struct cpu_hw_events *cpu1, *cpu2;
> +	struct amd_nb *nb = NULL;
> +	int i, nb_id;
> +
> +	if (boot_cpu_data.x86_max_cores < 2)
> +		return;

So there are no single core AMD chips that have a NorthBridge PMU? What
about the recent 64bit single core laptop chips?

> +
> +	/*
> +	 * function may be called too early in the
> +	 * boot process, in which case nb_id is bogus
> +	 *
> +	 * for BSP, there is an explicit call from
> +	 * amd_pmu_init()
> +	 */

I keep getting flash-backs to doom's graphics engine every time I see
BSP..

> +	nb_id = amd_get_nb_id(cpu);
> +	if (nb_id == BAD_APICID)
> +		return;
> +
> +	cpu1 = &per_cpu(cpu_hw_events, cpu);
> +	cpu1->amd_nb = NULL;
> +
> +	raw_spin_lock(&amd_nb_lock);
> +
> +	for_each_online_cpu(i) {
> +		cpu2 = &per_cpu(cpu_hw_events, i);
> +		nb = cpu2->amd_nb;
> +		if (!nb)
> +			continue;
> +		if (nb->nb_id == nb_id)
> +			goto found;
> +	}
> +
> +	nb = amd_alloc_nb(cpu, nb_id);
> +	if (!nb) {
> +		pr_err("perf_events: failed to allocate NB storage for CPU%d\n", cpu);
> +		raw_spin_unlock(&amd_nb_lock);
> +		return;
> +	}
> +found:
> +	nb->refcnt++;
> +	cpu1->amd_nb = nb;
> +
> +	raw_spin_unlock(&amd_nb_lock);

Can't this be simplified by using the cpu to node mask?

> +	pr_info("CPU%d NB%d ref=%d\n", cpu, nb_id, nb->refcnt);

stray debug stuff?

> +}
> +
> +static void amd_pmu_cpu_offline(int cpu)
> +{
> +	struct cpu_hw_events *cpuhw;
> +
> +	if (boot_cpu_data.x86_max_cores < 2)
> +		return;
> +
> +	cpuhw = &per_cpu(cpu_hw_events, cpu);
> +
> +	raw_spin_lock(&amd_nb_lock);
> +
> +	if (--cpuhw->amd_nb->refcnt == 0)
> +		vfree(cpuhw->amd_nb);
> +
> +	cpuhw->amd_nb = NULL;
> +
> +	raw_spin_unlock(&amd_nb_lock);
> +}
> +
>  static __init int amd_pmu_init(void)
>  {
>  	/* Performance-monitoring supported from K7 and later: */
> @@ -2573,6 +2809,8 @@ static __init int amd_pmu_init(void)
>  	memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
>  	       sizeof(hw_cache_event_ids));
>  
> +	/* initialize BSP */

Binary Space Partitioning again?

> +	amd_pmu_cpu_online(smp_processor_id());
>  	return 0;
>  }


Also, I think this is buggy in that:

  perf_disable();
  event->pmu->disable(event);
  ...
  event->pmu->enable(event);
  perf_enable();

can now fail, I think we need to move the put_event_constraint() from
x86_pmu_disable() into x86_perf_enable() or something.

next prev parent reply	other threads:[~2010-02-04 14:55 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-01 20:15 [PATCH] perf_events: AMD event scheduling (v2) Stephane Eranian
2010-02-04 14:55 ` Peter Zijlstra [this message]
2010-02-04 14:59   ` Peter Zijlstra
2010-02-04 15:04     ` Stephane Eranian
2010-02-04 16:05   ` Stephane Eranian
2010-02-04 16:19     ` Peter Zijlstra
2010-02-04 16:32       ` Stephane Eranian
2010-02-04 16:35         ` Stephane Eranian
2010-02-25 18:29   ` Stephane Eranian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1265295302.24455.2265.camel@laptop \
    --to=peterz@infradead.org \
    --cc=davem@davemloft.net \
    --cc=eranian@gmail.com \
    --cc=eranian@google.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=paulus@samba.org \
    --cc=perfmon2-devel@lists.sf.net \
    --cc=robert.richter@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.