From: Kyle Meyer <kyle.meyer@hpe.com>
To: "Zhuo, Qiuxu" <qiuxu.zhuo@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>, "bp@alien8.de" <bp@alien8.de>,
"james.morse@arm.com" <james.morse@arm.com>,
"mchehab@kernel.org" <mchehab@kernel.org>,
"rric@kernel.org" <rric@kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] EDAC/{i10nm,skx,skx_common}: Support multiple clumps
Date: Thu, 5 Dec 2024 20:33:30 -0600 [thread overview]
Message-ID: <Z1Jieg7ACUMZLsF-@hpe.com> (raw)
In-Reply-To: <CY8PR11MB7134E24098C6E16E43C57EAA89312@CY8PR11MB7134.namprd11.prod.outlook.com>
On Fri, Dec 06, 2024 at 01:26:12AM +0000, Zhuo, Qiuxu wrote:
> > From: Kyle Meyer <kyle.meyer@hpe.com>
> > Sent: Friday, December 6, 2024 8:57 AM
> > To: Luck, Tony <tony.luck@intel.com>
> > Cc: bp@alien8.de; james.morse@arm.com; mchehab@kernel.org;
> > rric@kernel.org; linux-edac@vger.kernel.org; linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] EDAC/{i10nm,skx,skx_common}: Support multiple
> > clumps
> >
> > On Thu, Dec 05, 2024 at 03:57:55PM -0800, Luck, Tony wrote:
> > > > +int skx_get_src_id(struct skx_dev *d, int off, u8 *id) { #ifdef
> > > > +CONFIG_NUMA
> > > > + return skx_get_pkg_id(d, id);
> > > > +#else
> > > > + u32 reg;
> > > > +
> > > > + if (pci_read_config_dword(d->util_all, off, ®)) {
> > > > + skx_printk(KERN_ERR, "Failed to read src id\n");
> > > > + return -ENODEV;
> > > > + }
> > > > +
> > > > + *id = GET_BITFIELD(reg, 12, 14);
> > > > + return 0;
> > > > +#endif
> > >
> > > Doh ... I alwasy forget about IS_ENABLED(). This can be written:
> > >
> > >
> > > int skx_get_src_id(struct skx_dev *d, int off, u8 *id) {
> > > u32 reg;
> > >
> > > if (IS_ENABLED(CONFIG_NUMA))
> > > return skx_get_pkg_id(d, id);
> > >
> > > if (pci_read_config_dword(d->util_all, off, ®)) {
> > > skx_printk(KERN_ERR, "Failed to read src id\n");
> > > return -ENODEV;
> > > }
> > >
> > > *id = GET_BITFIELD(reg, 12, 14);
> > > return 0;
> > > }
> >
> > Looks good.
> >
> > > 1) Does this work? I tried on a non-clumpy system that is NUMA.
> >
> > Yes, I just tested this on a Sapphire Rapids system with multiple UPI domains.
> >
> > > 2) Is it better (assuming #fidef factored off into a .h file)?
> >
> > IMO, yes, but there's one subtle difference. EDAC will not load on systems that
> > have a single UPI domain when CONFIG_NUMA is enabled but numa=off,
> > because
> > pcibus_to_node() in skx_get_pkg_id() will return NUMA_NO_NODE (-1). Is
> > that a case that we need to worry about?
>
> I think we need to make the EDAC load/work even in this case.
> Regardless of CONFIG_NUMA or whether numa=off is set or not, could we do it like this:
>
> int skx_get_src_id(struct skx_dev *d, int off, u8 *id)
> {
> u32 reg;
>
> if (!skx_get_pkg_id(d, id))
> return 0;
>
> if (pci_read_config_dword(d->util_all, off, ®)) {
> skx_printk(KERN_ERR, "Failed to read src id\n");
> return -ENODEV;
> }
>
> *id = GET_BITFIELD(reg, 12, 14);
> return 0;
> }
So, we're back to the original issue on systems with multiple UPI/QPI domains
when NUMA is disabled.
Systems with multiple UPI/QPI domains can't use source IDs to map devices
to sockets. skx_get_src_id() will successfully read the source ID from PCI
configuration space registers but it might not map to the correct socket because
each UPI/QPI domain has identical repeating source IDs.
For example, 8 sockets with 2 UPI/QPI domains:
Socket 0 -> Source ID 0
Socket 1 -> Source ID 1
Socket 2 -> Source ID 2
Socket 3 -> Source ID 3
Socket 4 -> Source ID 0
Socket 5 -> Source ID 1
Socket 6 -> Source ID 2
Socket 7 -> Source ID 3
EDAC will successfully load, but it will not find the the corresponding device
for errors on sockets 4 though 7 (for example, see skx_common.c:178).
Looking at my original patch, EDAC will not load when a system has multiple UPI/
QPI domains and NUMA is disabled. We fail early with "Failed to get package ID
from NUMA information" instead of later when an error occurs.
Thanks,
Kyle Meyer
next prev parent reply other threads:[~2024-12-06 2:33 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-05 16:59 [PATCH] EDAC/{i10nm,skx,skx_common}: Support multiple clumps Kyle Meyer
2024-12-05 19:13 ` Luck, Tony
2024-12-05 20:05 ` Kyle Meyer
2024-12-05 22:52 ` Luck, Tony
2024-12-05 23:57 ` Luck, Tony
2024-12-06 0:57 ` Kyle Meyer
2024-12-06 1:26 ` Zhuo, Qiuxu
2024-12-06 2:33 ` Kyle Meyer [this message]
2024-12-06 21:24 ` Luck, Tony
2024-12-06 22:09 ` Luck, Tony
2024-12-10 16:37 ` Bjorn Helgaas
2024-12-10 17:50 ` Luck, Tony
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z1Jieg7ACUMZLsF-@hpe.com \
--to=kyle.meyer@hpe.com \
--cc=bp@alien8.de \
--cc=james.morse@arm.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab@kernel.org \
--cc=qiuxu.zhuo@intel.com \
--cc=rric@kernel.org \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox