From: Borislav Petkov <bp@alien8.de>
To: Naveen Krishna Chatradhi <nchatrad@amd.com>
Cc: linux-edac@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org, mingo@redhat.com,
mchehab@kernel.org, yazen.ghannam@amd.com,
Muralidhara M K <muralimk@amd.com>
Subject: Re: [PATCH v6 1/5] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Tue, 2 Nov 2021 19:03:38 +0100 [thread overview]
Message-ID: <YYF9ei59G/OUyZqR@zn.tnic> (raw)
In-Reply-To: <20211028130106.15701-2-nchatrad@amd.com>
On Thu, Oct 28, 2021 at 06:31:02PM +0530, Naveen Krishna Chatradhi wrote:
Staring at this more...
> +/*
> + * Newer AMD CPUs and GPUs whose data fabrics can be connected via custom xGMI
> + * links, comes with registers to gather local and remote node type map info.
> + *
> + * "Local Node Type" refers to nodes with the same type as that from which the
> + * register is read, and "Remote Node Type" refers to nodes with a different type.
> + *
> + * This function, reads the registers from GPU DF function 1.
> + * Hence, local nodes are GPU and remote nodes are CPUs.
> + */
> +static int amd_get_node_map(void)
... so this is a generic function name...
> +{
> + struct amd_node_map *nodemap;
> + struct pci_dev *pdev;
> + u32 tmp;
> +
> + pdev = pci_get_device(PCI_VENDOR_ID_AMD,
> + PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, NULL);
... but this here is trying to get the Aldebaran PCI device function.
So what happens if in the future, the GPU is a different one and it
gets RAS functionality and PCI device functions too? You'd probably need
to add that new GPU support too.
And then looking at that patch again, see how this new code is bolted on
and sure, it all is made to work, but it is strenuous and you have to
always pay attention to what type of devices you're dealing with.
And the next patch does:
... if (bank_type == SMCA_UMC_V2) {
/* do UMC v2 special stuff here. */
which begs the question: wouldn't this GPU PCI devices enumeration be a
lot cleaner if it were separate?
I.e., in amd_nb.c you'd have
init_amd_nbs:
amd_cache_northbridges();
amd_cache_gart();
amd_cache_gpu_devices();
and in this last one you do your enumeration. Completely separate data
structures and all. Adding a new device support would then be trivial.
And then looking at the next patch again, you have:
+ } else if (bank_type == SMCA_UMC_V2) {
+ /*
+ * SMCA_UMC_V2 exists on GPU nodes, extract the node id
+ * from register MCA_IPID[47:44](InstanceIdHi).
+ * The InstanceIdHi field represents the instance ID of the GPU.
+ * Which needs to be mapped to a value used by Linux,
+ * where GPU nodes are simply numerically after the CPU nodes.
+ */
+ node_id = ((m->ipid >> 44) & 0xF) -
+ amd_gpu_node_start_id() + amd_cpu_node_count();
where instead of exporting those functions and having the caller do the
calculations, you'd have a function in amd_nb.c which is called
amd_get_gpu_node_id(unsigned long ipid)
which will use those separate data structures mentioned above and give
you the node id.
And those GPU node IDs are placed numerically after the CPU nodes so
your code doesn't need to do anything special - just read out registers
and cache them.
And you don't need those exports either - it is all nicely encapsulated
and a single function is used to get the callers what they wanna know.
Hmmm?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
next prev parent reply other threads:[~2021-11-02 18:03 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-28 13:01 [PATCH v6 0/5] x86/edac/amd64: Add heterogeneous node support Naveen Krishna Chatradhi
2021-10-28 13:01 ` [PATCH v6 1/5] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-11-01 17:28 ` Borislav Petkov
2021-11-02 18:03 ` Borislav Petkov [this message]
2021-11-04 13:18 ` Chatradhi, Naveen Krishna
2021-11-08 13:34 ` Borislav Petkov
2021-11-08 16:53 ` Chatradhi, Naveen Krishna
2021-11-08 19:03 ` Borislav Petkov
2021-11-09 11:30 ` Chatradhi, Naveen Krishna
2021-11-09 20:41 ` Borislav Petkov
2021-11-04 13:21 ` Chatradhi, Naveen Krishna
2021-10-28 13:01 ` [PATCH v6 2/5] EDAC/mce_amd: Extract node id from MCA_IPID Naveen Krishna Chatradhi
2021-11-08 13:37 ` Borislav Petkov
2021-10-28 13:01 ` [PATCH v6 3/5] EDAC/amd64: Extend family ops functions Naveen Krishna Chatradhi
2021-11-10 17:45 ` Borislav Petkov
2021-11-11 16:23 ` Chatradhi, Naveen Krishna
2021-11-11 18:05 ` Borislav Petkov
2021-11-12 20:59 ` Yazen Ghannam
2021-11-13 11:58 ` Borislav Petkov
2021-10-28 13:01 ` [PATCH v6 4/5] EDAC/amd64: Move struct fam_type into amd64_pvt structure Naveen Krishna Chatradhi
2021-11-11 12:39 ` Borislav Petkov
2021-11-11 16:26 ` Chatradhi, Naveen Krishna
2021-10-28 13:01 ` [PATCH v6 5/5] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes Naveen Krishna Chatradhi
2021-11-11 13:12 ` Borislav Petkov
2021-11-15 15:24 ` Chatradhi, Naveen Krishna
2021-11-15 16:04 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YYF9ei59G/OUyZqR@zn.tnic \
--to=bp@alien8.de \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab@kernel.org \
--cc=mingo@redhat.com \
--cc=muralimk@amd.com \
--cc=nchatrad@amd.com \
--cc=x86@kernel.org \
--cc=yazen.ghannam@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox