From: "Chatradhi, Naveen Krishna" <nchatrad@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-edac@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org, mingo@redhat.com,
mchehab@kernel.org, yazen.ghannam@amd.com,
Muralidhara M K <muralimk@amd.com>
Subject: Re: [PATCH v6 1/5] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Thu, 4 Nov 2021 18:48:29 +0530
Message-ID: <b7f3639a-e46c-25e8-270b-04860074fd3c@amd.com>
In-Reply-To: <YYF9ei59G/OUyZqR@zn.tnic>
Hi Boris,
On 11/2/2021 11:33 PM, Borislav Petkov wrote:
> On Thu, Oct 28, 2021 at 06:31:02PM +0530, Naveen Krishna Chatradhi wrote:
>
> Staring at this more...
Thanks for taking the time.
>
>> +/*
>> + * Newer AMD CPUs and GPUs whose data fabrics can be connected via custom xGMI
>> + * links, comes with registers to gather local and remote node type map info.
>> + *
>> + * "Local Node Type" refers to nodes with the same type as that from which the
>> + * register is read, and "Remote Node Type" refers to nodes with a different type.
>> + *
>> + * This function, reads the registers from GPU DF function 1.
>> + * Hence, local nodes are GPU and remote nodes are CPUs.
>> + */
>> +static int amd_get_node_map(void)
> ... so this is a generic function name...
>
>> +{
>> + struct amd_node_map *nodemap;
>> + struct pci_dev *pdev;
>> + u32 tmp;
>> +
>> + pdev = pci_get_device(PCI_VENDOR_ID_AMD,
>> + PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, NULL);
> ... but this here is trying to get the Aldebaran PCI device function.
I know, this is confusing. We will try to make the definition here more
meaningful.
>
> So what happens if in the future, the GPU is a different one and it
> gets RAS functionality and PCI device functions too? You'd probably need
> to add that new GPU support too.
Yes, that might happen.
>
> And then looking at that patch again, see how this new code is bolted on
> and sure, it all is made to work, but it is strenuous and you have to
> always pay attention to what type of devices you're dealing with.
>
> And the next patch does:
>
> ... if (bank_type == SMCA_UMC_V2) {
>
> /* do UMC v2 special stuff here. */
>
> which begs the question: wouldn't this GPU PCI devices enumeration be a
> lot cleaner if it were separate?
>
> I.e., in amd_nb.c you'd have
>
> init_amd_nbs:
>
> amd_cache_northbridges();
> amd_cache_gart();
> amd_cache_gpu_devices();
Agreed. However, I would like to suggest a slight modification.
Instead of modifying init_amd_nbs(), how about defining a new struct
+struct system_topology {
+ const struct pci_device_id *misc_ids;
+ const struct pci_device_id *link_ids;
+ const struct pci_device_id *root_ids;
+ u16 roots_per_misc;
+ u16 misc_count;
+ u16 root_count;
+};
and modifying amd_cache_northbridges() to
+int amd_cache_northbridges(void)
+{
+ struct system_topology topo;
+ int ret;
+
+ if (amd_northbridges.num)
+ return 0;
+
+ ret = amd_cpu_nbs(&topo);
+ printk("==> misc:%d\n", ret);
+
+ if (look_for_remote_nodes()) {
+ ret = amd_gpu_nbs(&topo);
+ printk("==> gpu_misc:%d\n", ret);
+ }
+
+ get_next_northbridges(&topo);
This way, the appropriate number of MCs can be created under EDAC, and the
existing exported APIs can remain the same.
Let me know your thoughts on this. I can send an updated version with
your comments addressed.
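To illustrate the counting idea behind the proposal, here is a minimal user-space sketch. It is not the kernel code: struct topo_model is a reduced stand-in for the proposed struct (the PCI ID tables are left out), and topo_add_nodes() is a hypothetical helper playing the role that amd_cpu_nbs() and amd_gpu_nbs() would play. It only shows that, if CPU nodes are accounted first and GPU nodes second, the GPU nodes end up numbered directly after the CPU nodes.

```c
#include <stdint.h>

/* Reduced model of the proposed struct: the PCI ID tables are left out. */
struct topo_model {
	uint16_t misc_count;	/* number of nodes (misc devices) found so far */
	uint16_t root_count;	/* number of root devices found so far */
};

/*
 * Hypothetical stand-in for amd_cpu_nbs()/amd_gpu_nbs(): account for
 * 'misc' newly found nodes and 'roots' newly found root devices.
 * Returns the node index assigned to the first node of this batch.
 */
static uint16_t topo_add_nodes(struct topo_model *topo, uint16_t misc,
			       uint16_t roots)
{
	uint16_t first = topo->misc_count;

	topo->misc_count += misc;
	topo->root_count += roots;
	return first;
}
```

Calling the helper once for the CPU nodes and once for the GPU nodes leaves the combined total in misc_count, which is the number a caller would use to size a single northbridge array.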
>
> and in this last one you do your enumeration. Completely separate data
> structures and all. Adding a new device support would then be trivial.
>
> And then looking at the next patch again, you have:
>
> + } else if (bank_type == SMCA_UMC_V2) {
> + /*
> + * SMCA_UMC_V2 exists on GPU nodes, extract the node id
> + * from register MCA_IPID[47:44](InstanceIdHi).
> + * The InstanceIdHi field represents the instance ID of the GPU.
> + * Which needs to be mapped to a value used by Linux,
> + * where GPU nodes are simply numerically after the CPU nodes.
> + */
> + node_id = ((m->ipid >> 44) & 0xF) -
> + amd_gpu_node_start_id() + amd_cpu_node_count();
>
> where instead of exporting those functions and having the caller do the
> calculations, you'd have a function in amd_nb.c which is called
>
> amd_get_gpu_node_id(unsigned long ipid)
>
> which will use those separate data structures mentioned above and give
> you the node id.
Sure, we can modify it this way.
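A rough user-space sketch of such a helper is shown below. Only the MCA_IPID[47:44] extraction and the "subtract the GPU start ID, add the CPU node count" arithmetic come from the quoted patch. The struct gpu_node_map_model, its fields, the range check and the use of uint64_t instead of unsigned long are assumptions made so that the example is self-contained.

```c
#include <stdint.h>

/*
 * Hypothetical cache, filled once at enumeration time. In the quoted patch
 * the two values come from amd_gpu_node_start_id() and amd_cpu_node_count().
 */
struct gpu_node_map_model {
	uint16_t gpu_node_start_id;	/* first hardware instance ID used by GPU nodes */
	uint16_t cpu_node_count;	/* number of CPU nodes in the system */
};

static struct gpu_node_map_model gpu_node_map;

/*
 * Model of amd_get_gpu_node_id(): map an MCA_IPID value to the node ID
 * used by Linux, where GPU nodes are numbered after the CPU nodes.
 * Returns -1 if the instance ID lies below the GPU range (assumed check,
 * not part of the quoted code).
 */
static int amd_get_gpu_node_id(uint64_t ipid)
{
	/* InstanceIdHi is MCA_IPID[47:44]. */
	unsigned int instance_id_hi = (ipid >> 44) & 0xF;

	if (instance_id_hi < gpu_node_map.gpu_node_start_id)
		return -1;

	return (int)(instance_id_hi - gpu_node_map.gpu_node_start_id +
		     gpu_node_map.cpu_node_count);
}
```

With this shape, the caller in the MCE decoding path only passes m->ipid and needs neither of the two exported accessors.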
>
> And those GPU node IDs are placed numerically after the CPU nodes so
> your code doesn't need to do anything special - just read out registers
> and cache them.
>
> And you don't need those exports either - it is all nicely encapsulated
> and a single function is used to get the callers what they wanna know.
Got it, thank you.
>
> Hmmm?
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette