Re: [PATCH v4 4/5] cxl: CXL Performance Monitoring Unit driver

linux-perf-users.vger.kernel.org archive mirror

From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: <linux-cxl@vger.kernel.org>, <peterz@infradead.org>,
	<mingo@redhat.com>, <acme@kernel.org>, <mark.rutland@arm.com>,
	<will@kernel.org>, <dan.j.williams@intel.com>,
	<linuxarm@huawei.com>, <linux-perf-users@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>,
	"Davidlohr Bueso" <dave@stgolabs.net>,
	Dave Jiang <dave.jiang@intel.com>
Subject: Re: [PATCH v4 4/5] cxl: CXL Performance Monitoring Unit driver
Date: Tue, 4 Apr 2023 17:48:41 +0100
Message-ID: <20230404174841.000045a7@Huawei.com>
In-Reply-To: <bcae8e21-0afc-9fc6-7f90-07afca412223@linux.intel.com>

On Mon, 3 Apr 2023 13:32:06 -0400
"Liang, Kan" <kan.liang@linux.intel.com> wrote:

> On 2023-03-30 12:45 p.m., Jonathan Cameron wrote:
> > CXL rev 3.0 introduces a standard performance monitoring hardware
> > block to CXL. Instances are discovered using CXL Register Locator DVSEC
> > entries. Each CXL component may have multiple PMUs.
> > 
> > This initial driver supports only a subset of types of counter.
> > It supports counters that are either fixed or configurable, but requires
> > that they support the ability to freeze and write value whilst frozen.
> > 
> > Development done with QEMU model which will be posted shortly.
> >   
> 
> So the patch series is only tested with QEMU. Is there a real hardware
> which supports the CXL PMON?

In common with a lot of CXL stuff (and new standards-based architecture
features in general), there is nothing that anyone is willing to talk about
except in general terms at this stage (this is a CXL 3.0 feature so
pretty recent).  Much of the existing kernel CXL support has been written
against emulation of one type or another, and we've shaken out a lot of
bugs that aren't yet seen with real hardware. Switches are very
rare for example (I've not yet registered PMUs for those, but that's
on the todo list after we get this merged).

For this particular feature it's likely that initial devices will only
exercise some parts of the driver support. It's much easier to poke
the interesting corner cases with emulation given how extreme the
design space allowed by the CPMU specification is. To that end there
are a lot more features to add to this driver once we have a basic version
in place. One example is that we've had a request for fixed free running counter
support combined with vendor defined events as that's particularly useful
for debugging of new devices.

Note that there is still work to be done on the emulation to make it
easier to vary what is being emulated as right now I end up hacking it
to hit particular interesting cases and that's not going to be great
for CI testing this.

> 
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 

> > +
> > +#define CPMU_ATTR_CONFIG_MASK_MSK	GENMASK_ULL(63, 48)  
> 
> (31, 0)?

Indeed. You'd have thought that having them right next
to the definition would have made that obvious when I messed
up the cut and paste. *sigh*
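
To make the corrected layout concrete, here is a minimal user-space sketch,
assuming the events mask moves to bits 31:0 as suggested above. GENMASK_ULL()
is re-implemented locally as a stand-in for the kernel macro, and
cpmu_pack_config() is a hypothetical helper that is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's GENMASK_ULL(h, l): bits h..l set. */
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* attr.config field layout, with the events mask corrected to 31:0. */
#define CPMU_ATTR_CONFIG_MASK_MSK	GENMASK_ULL(31, 0)
#define CPMU_ATTR_CONFIG_GID_MSK	GENMASK_ULL(47, 32)
#define CPMU_ATTR_CONFIG_VID_MSK	GENMASK_ULL(63, 48)

/* Compile-time checks: the three fields must not overlap. */
static_assert((CPMU_ATTR_CONFIG_MASK_MSK & CPMU_ATTR_CONFIG_GID_MSK) == 0,
	      "mask and gid overlap");
static_assert((CPMU_ATTR_CONFIG_MASK_MSK & CPMU_ATTR_CONFIG_VID_MSK) == 0,
	      "mask and vid overlap");
static_assert((CPMU_ATTR_CONFIG_GID_MSK & CPMU_ATTR_CONFIG_VID_MSK) == 0,
	      "gid and vid overlap");

/* Hypothetical helper (illustration only): build a config value. */
static inline uint64_t cpmu_pack_config(uint16_t vid, uint16_t gid,
					uint32_t mask)
{
	return ((uint64_t)vid << 48) | ((uint64_t)gid << 32) | mask;
}
```

With the (63, 48) definition from the posted patch, the mask/vid check would
fail at build time, which is one cheap way to catch this sort of cut and
paste slip.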

> 
> > +#define CPMU_ATTR_CONFIG_GID_MSK	GENMASK_ULL(47, 32)
> > +#define CPMU_ATTR_CONFIG_VID_MSK	GENMASK_ULL(63, 48)
> > +#define CPMU_ATTR_CONFIG1_THRESHOLD_MSK	GENMASK_ULL(15, 0)
> > +#define CPMU_ATTR_CONFIG1_INVERT_MSK	BIT(16)
> > +#define CPMU_ATTR_CONFIG1_EDGE_MSK	BIT(17)
> > +#define CPMU_ATTR_CONFIG1_FILTER_EN_MSK	BIT(18)
> > +#define CPMU_ATTR_CONFIG2_HDM_MSK	GENMASK(15, 0)

> > +static void cpmu_event_start(struct perf_event *event, int flags)
> > +{
> > +	struct cpmu_info *info = pmu_to_cpmu_info(event->pmu);
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	void __iomem *base = info->base;
> > +	u64 cfg;
> > +
> > +	if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
> > +		return;
> > +
> > +	WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
> > +	hwc->state = 0;
> > +
> > +	/*
> > +	 * Currently only hdm filter control is implemented, this code will
> > +	 * want generalizing when more filters are added.
> > +	 */
> > +	if (info->filter_hdm) {
> > +		if (cpmu_config1_hdm_filter_en(event))
> > +			cfg = cpmu_config2_get_hdm_decoder(event);
> > +		else
> > +			cfg = GENMASK(15, 0); /* No filtering if 0xFFFF_FFFF */
> > +		writeq(cfg, base + CPMU_FILTER_CFG_REG(hwc->idx, 0));
> > +	}
> > +
> > +	cfg = readq(base + CPMU_COUNTER_CFG_REG(hwc->idx));  
> 
> I don't think we need the previous value. Just overwrite it.

Some bits are HWInit or RO.  Whilst we can probably get away with just writing
random garbage (zeros) to those bits, I'd prefer to write their
value back to them.
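
As a rough illustration of that read-modify-write pattern, the sketch below
works on a plain 64-bit value rather than MMIO. All DEMO_* bit positions and
demo_cfg_prepare() are made up for the example and do not match the real
CPMU Counter Configuration register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field positions, chosen only for this example. */
#define DEMO_CFG_ENABLE		(1ULL << 0)
#define DEMO_CFG_THRESHOLD_SHIFT	8
#define DEMO_CFG_THRESHOLD_MSK	(0xFFULL << DEMO_CFG_THRESHOLD_SHIFT)
#define DEMO_CFG_HWINIT_MSK	(0xFFULL << 56)

static_assert((DEMO_CFG_THRESHOLD_MSK & DEMO_CFG_HWINIT_MSK) == 0,
	      "writable and HWInit fields must not overlap");

/*
 * Start from the value read back from the register so HWInit/RO bits
 * are written back unchanged, then update only the writable fields.
 */
static inline uint64_t demo_cfg_prepare(uint64_t current, uint8_t threshold)
{
	uint64_t cfg = current;

	cfg |= DEMO_CFG_ENABLE;
	/* Clear the old threshold before inserting the new one. */
	cfg &= ~DEMO_CFG_THRESHOLD_MSK;
	cfg |= ((uint64_t)threshold << DEMO_CFG_THRESHOLD_SHIFT) &
	       DEMO_CFG_THRESHOLD_MSK;
	return cfg;
}
```

In the driver the starting value would come from readq() and the result
would go back via writeq(); starting from zero instead would write zeros
to the HWInit bits.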

> 
> 
> > +	cfg |= FIELD_PREP(CPMU_COUNTER_CFG_INT_ON_OVRFLW, 1);
> > +	cfg |= FIELD_PREP(CPMU_COUNTER_CFG_ENABLE, 1);
> > +	cfg |= FIELD_PREP(CPMU_COUNTER_CFG_EDGE, cpmu_config1_get_edge(event) ? 1 : 0);
> > +	cfg |= FIELD_PREP(CPMU_COUNTER_CFG_INVERT, cpmu_config1_get_invert(event) ? 1 : 0);
> > +
> > +	/* Fixed purpose counters have next two fields RO */
> > +	if (test_bit(hwc->idx, info->conf_counter_bm)) {
> > +		cfg |= FIELD_PREP(CPMU_COUNTER_CFG_EVENT_GRP_ID_IDX_MSK, hwc->event_base);
> > +		cfg |= FIELD_PREP(CPMU_COUNTER_CFG_EVENTS_MSK, cpmu_config_get_mask(event));
> > +	}
> > +	cfg &= ~CPMU_COUNTER_CFG_THRESHOLD_MSK;
> > +	/*
> > +	 * For events that generate only 1 count per clock the CXL 3.0 spec
> > +	 * states the threshold shall be set to 1 but if set to 0 it will
> > +	 * count the raw value anyway?
> > +	 * There is no definition of what events will count multiple per cycle
> > +	 * and hence to which non 1 values of threshold can apply.
> > +	 * (CXL 3.0 8.2.7.2.1 Counter Configuration - threshold field definition)
> > +	 */
> > +	cfg |= FIELD_PREP(CPMU_COUNTER_CFG_THRESHOLD_MSK,
> > +			  cpmu_config1_get_threshold(event));
> > +	writeq(cfg, base + CPMU_COUNTER_CFG_REG(hwc->idx));
> > +
> > +	local64_set(&hwc->prev_count, 0);
> > +	writeq(0, base + CPMU_COUNTER_REG(hwc->idx));  
> 
> The counter is reset twice. The other is in add()->cpmu_reset_counter().
> I think you can remove the one in add().

After some tracing I think you are correct. start() seems to always be called
immediately after add() anyway, so there is no point in doing it twice.

> 
> > +
> > +	perf_event_update_userpage(event);
> > +}
> > +
...

> > +
> > +/*
> > + * Reset ensures no possibility of any information leaking to wrong
> > + * counter. Note that all fields written during start()
> > + */
> > +static void cpmu_reset_counter(struct cpmu_info *info, int idx)
> > +{
> > +	void __iomem *base = info->base;
> > +
> > +	/* Much of this register is read only */
> > +	writeq(0, base + CPMU_EVENT_CAP_REG(idx));  
> 
> 
> CPMU_COUNTER_CFG_REG?
> 
> I don't think we need to reset the config here, since it will be rewrite
> in cpmu_event_start().

You are correct.  The above write is clearly a no-op as that's a read-only
register and, as you have observed, cpmu_event_start() writes almost all the
bits that are not HWInit. I'll make it write the remaining one as well.

As above this whole function can then go away.

> 
> 
> > +	/* Filters are not per counter, so no reset here */
> > +	writeq(0, base + CPMU_COUNTER_REG(idx));
> > +}

> > +	info->pmu = (struct pmu) {
> > +		.name = dev_name(dev),
> > +		.parent = dev,  
> 
> I don't see the PMU parent is used anywhere.
> Please remove this and drop the Patch 2 from this series.

No.  As per the discussion of v4, without this the device is incorrectly
listed under /sys/devices rather than under
/sys/bus/devices/pcie0000:0c/0000:0c:00.0/0000:0d:00.0/cpmu0/cpmu0

I do not want to introduce more ABI that will then need fixing up later.

This issue is common with a lot of other drivers hence the series
https://lore.kernel.org/all/20230404134225.13408-1-Jonathan.Cameron@huawei.com/
to correct it for all drivers that should have a simple parent set.

> 
> If there will be other followup patches, please add the PMU parent
> support there.
> 
> Thanks,
> Kan

Thanks again for looking at this so quickly.

Jonathan
