[net-next v44] mctp pcc: Implement MCTP over PCC Transport

Netdev List
 help / color / mirror / Atom feed

* [net-next v44] mctp pcc: Implement MCTP over PCC Transport
@ 2026-05-22 19:36 Adam Young
  2026-05-28  8:45 ` Paolo Abeni
  0 siblings, 1 reply; 6+ messages in thread
From: Adam Young @ 2026-05-22 19:36 UTC (permalink / raw)
  To: Jeremy Kerr, Matt Johnston, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li

Implementation of network driver for
Management Component Transport Protocol(MCTP)
over Platform Communication Channel(PCC)

DMTF DSP:0292
Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf

The transport mechanism is called Platform Communication Channels (PCC)
is part of the ACPI spec:

Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html

The PCC mechanism is managed via a mailbox implemented at
drivers/mailbox/pcc.c

MCTP devices are specified via ACPI by entries in DSDT/SSDT and
reference channels specified in the PCCT. Messages are sent on a type
3 and received on a type 4 channel.  Communication with other devices
use the PCC based doorbell mechanism; a shared memory segment with a
corresponding interrupt and a memory register used to trigger remote
interrupts.

The shared buffer must be at least 68 bytes long as that is the minimum
MTU as defined by the MCTP specification.

Unlike the existing PCC Type 2 based drivers, the mssg parameter to
mbox_send_msg is actively used. The data section of the struct sk_buff
that contains the outgoing packet is sent to the mailbox, already
properly formatted as a PCC exctended message.

If the mailbox ring buffer is full, the driver stops the incoming
packet queues until a message has been sent, freeing space in the
ring buffer.

When the Type 3 channel outbox receives a txdone response interrupt,
it consumes the outgoing sk_buff, allowing it to be freed.

Bringing up an interface creates the channel between the network driver
and the mailbox driver. This enables communication with the remote
endpoint, to include the receipt of new messages. Bringing down an
interface removes the channel, and no new messages can be delivered.
Stopping the interface will leave any packets that are cached in the
mailbox ringbuffer. They cannot safely be freed until the PCC mailbox
attempts to deliver them and has removed them from the ring buffer.

PCC is based on a shared buffer and a set of I/O mapped memory locations
that the Spec calls registers.  This mechanism exists regardless of the
existence of the driver. If the user has the ability to map these
physical location to virtual locations, they have the ability to drive the
hardware.  Thus, there is a security aspect to this mechanism that extends
beyond the responsibilities of the operating system.

If the hardware does not expose the PCC in the ACPI table, this device
will never be enabled. Thus it is only an issue on hardware that does
support PCC. In that case, it is up to the remote controller to sanitize
communication; MCTP will be exposed as a socket interface, and userland
can send any crafted packet it wants. It would also be incumbent on
the hardware manufacturer to allow the end user to disable MCTP over PCC
communication if they did not want to expose it.

Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html
Signed-off-by: Adam Young <admiyo@os.amperecomputing.com>

---

Previous Version:
https://lore.kernel.org/lkml/20260520213831.118082-1-admiyo@os.amperecomputing.com/

Changes from Previous version

- Removed unnecessarty little endian conversion
- Replaced double call to send_message with pessimistic stop of queue and
single send.
- Set clilent to NULL after close to avoid dangling pointers on reopen
- restart queue after draining the ring buffer
- USE Symbolic constant to check.indicate RBuf is empty
---
 MAINTAINERS                 |   5 +
 drivers/net/mctp/Kconfig    |  16 ++
 drivers/net/mctp/Makefile   |   1 +
 drivers/net/mctp/mctp-pcc.c | 429 ++++++++++++++++++++++++++++++++++++
 4 files changed, 451 insertions(+)
 create mode 100644 drivers/net/mctp/mctp-pcc.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 5db1a2923dd2..f6df8dce41f6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15415,6 +15415,11 @@ F:	include/net/mctpdevice.h
 F:	include/net/netns/mctp.h
 F:	net/mctp/
 
+MANAGEMENT COMPONENT TRANSPORT PROTOCOL (MCTP) over PCC (MCTP-PCC) Driver
+M:	Adam Young <admiyo@os.amperecomputing.com>
+S:	Maintained
+F:	drivers/net/mctp/mctp-pcc.c
+
 MAPLE TREE
 M:	Liam R. Howlett <liam@infradead.org>
 R:	Alice Ryhl <aliceryhl@google.com>
diff --git a/drivers/net/mctp/Kconfig b/drivers/net/mctp/Kconfig
index cf325ab0b1ef..e68d23794a80 100644
--- a/drivers/net/mctp/Kconfig
+++ b/drivers/net/mctp/Kconfig
@@ -47,6 +47,22 @@ config MCTP_TRANSPORT_I3C
 	  A MCTP protocol network device is created for each I3C bus
 	  having a "mctp-controller" devicetree property.
 
+config MCTP_TRANSPORT_PCC
+	tristate "MCTP PCC transport"
+	depends on ACPI
+	depends on PCC
+	depends on 64BIT
+	depends on CPU_LITTLE_ENDIAN
+	help
+	  Provides a driver to access MCTP devices over PCC transport,
+	  A MCTP protocol network device is created via ACPI for each
+	  entry in the DSDT/SSDT that matches the identifier. The Platform
+	  communication channels are selected from the corresponding
+	  entries in the PCCT.
+
+	  Say y here if you need to connect to MCTP endpoints over PCC. To
+	  compile as a module, use m; the module will be called mctp-pcc.
+
 config MCTP_TRANSPORT_USB
 	tristate "MCTP USB transport"
 	depends on USB
diff --git a/drivers/net/mctp/Makefile b/drivers/net/mctp/Makefile
index c36006849a1e..0a591299ffa9 100644
--- a/drivers/net/mctp/Makefile
+++ b/drivers/net/mctp/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_MCTP_SERIAL) += mctp-serial.o
 obj-$(CONFIG_MCTP_TRANSPORT_I2C) += mctp-i2c.o
 obj-$(CONFIG_MCTP_TRANSPORT_I3C) += mctp-i3c.o
+obj-$(CONFIG_MCTP_TRANSPORT_PCC) += mctp-pcc.o
 obj-$(CONFIG_MCTP_TRANSPORT_USB) += mctp-usb.o
diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
new file mode 100644
index 000000000000..f2dd5286efa7
--- /dev/null
+++ b/drivers/net/mctp/mctp-pcc.c
@@ -0,0 +1,429 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mctp-pcc.c - Driver for MCTP over PCC.
+ * Copyright (c) 2024-2026, Ampere Computing LLC
+ *
+ */
+
+/* Implementation of MCTP over PCC DMTF Specification DSP0256
+ * https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
+ */
+
+#include <linux/acpi.h>
+#include <linux/hrtimer.h>
+#include <linux/if_arp.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mailbox_client.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/skbuff.h>
+#include <linux/string.h>
+
+#include <acpi/acpi_bus.h>
+#include <acpi/acpi_drivers.h>
+#include <acpi/acrestyp.h>
+#include <acpi/actbl.h>
+#include <acpi/pcc.h>
+#include <net/mctp.h>
+#include <net/mctpdevice.h>
+
+#define MCTP_SIGNATURE          "MCTP"
+#define MCTP_SIGNATURE_LENGTH   (sizeof(MCTP_SIGNATURE) - 1)
+#define MCTP_MIN_MTU            68
+#define PCC_HEADER_SIZE         sizeof(struct acpi_pcct_ext_pcc_shared_memory)
+#define MCTP_PCC_MIN_SIZE       (PCC_HEADER_SIZE + MCTP_MIN_MTU)
+#define PCC_EXTRA_LEN           (PCC_HEADER_SIZE - sizeof(pcc_header.command))
+struct mctp_pcc_mailbox {
+	u32 index;
+	struct pcc_mbox_chan *chan;
+	struct mbox_client client;
+};
+
+/* The netdev structure. One of these per PCC adapter. */
+struct mctp_pcc_ndev {
+	struct net_device *ndev;
+	struct acpi_device *acpi_device;
+	struct mctp_pcc_mailbox inbox;
+	struct mctp_pcc_mailbox outbox;
+};
+
+static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
+{
+	struct acpi_pcct_ext_pcc_shared_memory pcc_header;
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct mctp_pcc_mailbox *inbox;
+	struct mctp_skb_cb *cb;
+	struct sk_buff *skb;
+	int size;
+
+	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, inbox.client);
+	inbox = &mctp_pcc_ndev->inbox;
+	memcpy_fromio(&pcc_header, inbox->chan->shmem, sizeof(pcc_header));
+
+	// The message must at least have the PCC command indicating it is an MCTP
+	// message followed by the MCTP header, or we have a malformed message.
+	// This may be run on big endian system, but the data in the buffer is
+	// explicitly little endian.
+	if (pcc_header.length < sizeof(pcc_header.command) + sizeof(struct mctp_hdr))
+		goto error;
+
+	// If the reported size is larger than the shared memory minus headers,
+	// something is wrong and treat the buffer as corrupted data.
+	if (pcc_header.length > inbox->chan->shmem_size - PCC_EXTRA_LEN)
+		goto error;
+
+	if (memcmp(&pcc_header.command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH) != 0)
+		goto error;
+
+	size = pcc_header.length + PCC_EXTRA_LEN;
+	skb = netdev_alloc_skb(mctp_pcc_ndev->ndev, size);
+	if (!skb)
+		goto error;
+
+	skb_put(skb, size);
+	skb->protocol = htons(ETH_P_MCTP);
+	memcpy_fromio(skb->data, inbox->chan->shmem, size);
+	dev_dstats_rx_add(mctp_pcc_ndev->ndev, size);
+	skb_pull(skb, sizeof(pcc_header));
+	skb_reset_mac_header(skb);
+	skb_reset_network_header(skb);
+	cb = __mctp_cb(skb);
+	cb->halen = 0;
+	netif_rx(skb);
+	return;
+
+error:
+	dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+}
+
+static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct acpi_pcct_ext_pcc_shared_memory *pcc_header;
+	struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
+	int len = skb->len;
+
+	if (skb_cow_head(skb, sizeof(*pcc_header)))
+		goto error;
+
+	pcc_header = skb_push(skb, sizeof(*pcc_header));
+	pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
+	pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
+	memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
+	pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
+
+	if (skb->len > mpnd->outbox.chan->shmem_size)
+		goto error;
+
+	/*
+	 * There is a possibility that the mailbox can be cleared on
+	 * another thread. If that is the case, and we don't restart
+	 * the queue, it will remain permanently stopped.
+	 * Stopping the queue before attempting to send the message
+	 * allows us to always restart it if mbox_send_message succeeds.
+	 */
+	netif_stop_queue(ndev);
+	if (mbox_send_message(mpnd->outbox.chan->mchan, skb) >= 0) {
+		netif_wake_queue(ndev);
+	} else {
+		// Remove the header in case it gets sent again
+		skb_pull(skb, sizeof(*pcc_header));
+		return NETDEV_TX_BUSY;
+	}
+	return NETDEV_TX_OK;
+
+error:
+	dev_dstats_tx_dropped(ndev);
+	kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static void mctp_pcc_tx_prepare(struct mbox_client *cl, void *mssg)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct mctp_pcc_mailbox *outbox;
+	struct sk_buff *skb = mssg;
+
+	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, outbox.client);
+	outbox = &mctp_pcc_ndev->outbox;
+
+	/* The PCC Mailbox typically does not make use of the mssg pointer
+	 * The mctp-over pcc driver is the only client that uses it.
+	 * This value should always be non-null; it is possible
+	 * that a change in the Mailbox level will break that assumption.
+	 */
+	if (!skb) {
+		netdev_warn_once(mctp_pcc_ndev->ndev,
+				 "%s called with null message.\n", __func__);
+		return;
+	}
+	memcpy_toio(outbox->chan->shmem, skb->data, skb->len);
+}
+
+static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int rc)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct pcpu_dstats *dstats;
+	struct sk_buff *skb = mssg;
+	unsigned long flags;
+
+	/*
+	 * If there is a packet in flight during driver cleanup
+	 * It may have been freed already.
+	 */
+	if (!mssg)
+		return;
+	mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
+
+	/* Use an IRQ safe update as this is called from HARD IRQ instead of
+	 * dev_dstats_tx_add(mctp_pcc_ndev->ndev, skb->len);
+	 */
+	dstats = this_cpu_ptr(mctp_pcc_ndev->ndev->dstats);
+	flags = u64_stats_update_begin_irqsave(&dstats->syncp);
+
+	if (rc) {
+		u64_stats_inc(&dstats->tx_drops);
+	} else {
+		u64_stats_inc(&dstats->tx_packets);
+		u64_stats_add(&dstats->tx_bytes, skb->len);
+	}
+	u64_stats_update_end_irqrestore(&dstats->syncp, flags);
+	dev_consume_skb_any(skb);
+	netif_wake_queue(mctp_pcc_ndev->ndev);
+}
+
+static int mctp_pcc_open(struct net_device *ndev)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
+	struct mctp_pcc_mailbox *outbox, *inbox;
+
+	outbox = &mctp_pcc_ndev->outbox;
+	inbox = &mctp_pcc_ndev->inbox;
+
+	outbox->chan = pcc_mbox_request_channel(&outbox->client, outbox->index);
+	if (IS_ERR(outbox->chan))
+		return PTR_ERR(outbox->chan);
+	if (outbox->chan->shmem_size < MCTP_PCC_MIN_SIZE) {
+		pcc_mbox_free_channel(outbox->chan);
+		return -EINVAL;
+	}
+
+	inbox->client.rx_callback = mctp_pcc_client_rx_callback;
+	inbox->chan = pcc_mbox_request_channel(&inbox->client, inbox->index);
+	if (IS_ERR(inbox->chan)) {
+		pcc_mbox_free_channel(outbox->chan);
+		return PTR_ERR(inbox->chan);
+	}
+	if (inbox->chan->shmem_size < MCTP_PCC_MIN_SIZE) {
+		pcc_mbox_free_channel(outbox->chan);
+		pcc_mbox_free_channel(inbox->chan);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int mctp_pcc_stop(struct net_device *ndev)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	unsigned int count, idx;
+	struct mbox_chan *chan;
+	struct sk_buff *skb;
+
+	mctp_pcc_ndev = netdev_priv(ndev);
+	chan = mctp_pcc_ndev->outbox.chan->mchan;
+	pcc_mbox_free_channel(mctp_pcc_ndev->inbox.chan);
+	mctp_pcc_ndev->inbox.chan = NULL;
+	scoped_guard(spinlock_irqsave, &chan->lock) {
+		if (chan->active_req !=  MBOX_NO_MSG) {
+			skb = chan->active_req;
+			chan->active_req = MBOX_NO_MSG;
+			dev_dstats_tx_dropped(ndev);
+			dev_consume_skb_any(skb);
+		}
+		while (chan->msg_count > 0) {
+			count = chan->msg_count;
+			idx = chan->msg_free;
+			if (idx >= count)
+				idx -= count;
+			else
+				idx += MBOX_TX_QUEUE_LEN - count;
+			skb = chan->msg_data[idx];
+			dev_dstats_tx_dropped(ndev);
+			dev_consume_skb_any(skb);
+			chan->msg_count--;
+		}
+	}
+	pcc_mbox_free_channel(mctp_pcc_ndev->outbox.chan);
+	mctp_pcc_ndev->outbox.chan = NULL;
+	/*
+	 * If the queue was stopped because the ring buffer was full
+	 * we can restart it here as we now know the ring buffer has
+	 * been emptied and the queue can be used again if the
+	 * netdev is re-opened.
+	 */
+	netif_wake_queue(mctp_pcc_ndev->ndev);
+	return 0;
+}
+
+static const struct net_device_ops mctp_pcc_netdev_ops = {
+	.ndo_open = mctp_pcc_open,
+	.ndo_stop = mctp_pcc_stop,
+	.ndo_start_xmit = mctp_pcc_tx,
+};
+
+static void mctp_pcc_setup(struct net_device *ndev)
+{
+	ndev->type = ARPHRD_MCTP;
+	ndev->hard_header_len = sizeof(struct acpi_pcct_ext_pcc_shared_memory);
+	ndev->tx_queue_len = 0;
+	ndev->flags = IFF_NOARP;
+	ndev->netdev_ops = &mctp_pcc_netdev_ops;
+	ndev->needs_free_netdev = true;
+	ndev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
+}
+
+struct mctp_pcc_lookup_context {
+	int index;
+	u32 inbox_index;
+	u32 outbox_index;
+};
+
+static acpi_status lookup_pcct_indices(struct acpi_resource *ares,
+				       void *context)
+{
+	struct mctp_pcc_lookup_context *luc = context;
+	struct acpi_resource_address32 *addr;
+
+	if (ares->type != ACPI_RESOURCE_TYPE_ADDRESS32)
+		return AE_OK;
+
+	addr = ACPI_CAST_PTR(struct acpi_resource_address32, &ares->data);
+	switch (luc->index) {
+	case 0:
+		luc->outbox_index = addr[0].address.minimum;
+		break;
+	case 1:
+		luc->inbox_index = addr[0].address.minimum;
+		break;
+	default:
+		return AE_ERROR;
+	}
+	luc->index++;
+	return AE_OK;
+}
+
+static void mctp_cleanup_netdev(void *data)
+{
+	struct net_device *ndev = data;
+
+	mctp_unregister_netdev(ndev);
+}
+
+static int initialize_mtu(struct net_device *ndev)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct mctp_pcc_mailbox *outbox;
+	struct pcc_mbox_chan *pchan;
+	int mctp_pcc_max_mtu;
+
+	mctp_pcc_ndev = netdev_priv(ndev);
+	outbox = &mctp_pcc_ndev->outbox;
+	pchan = pcc_mbox_request_channel(&outbox->client, outbox->index);
+	if (IS_ERR(pchan))
+		return PTR_ERR(pchan);
+	if (pchan->shmem_size < MCTP_MIN_MTU + sizeof(struct acpi_pcct_ext_pcc_shared_memory)) {
+		pcc_mbox_free_channel(pchan);
+		return -EINVAL;
+	}
+	mctp_pcc_max_mtu = pchan->shmem_size - sizeof(struct acpi_pcct_ext_pcc_shared_memory);
+	pcc_mbox_free_channel(pchan);
+
+	ndev->mtu = MCTP_MIN_MTU;
+	ndev->max_mtu = mctp_pcc_max_mtu;
+	ndev->min_mtu = MCTP_MIN_MTU;
+
+	return 0;
+}
+
+static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
+{
+	struct mctp_pcc_lookup_context context = {0};
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct device *dev = &acpi_dev->dev;
+	struct net_device *ndev;
+	acpi_handle dev_handle;
+	acpi_status status;
+	char name[32];
+	int rc;
+
+	dev_dbg(dev, "Adding mctp_pcc device for HID %s\n",
+		acpi_device_hid(acpi_dev));
+	dev_handle = acpi_device_handle(acpi_dev);
+	status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
+				     &context);
+	if (!ACPI_SUCCESS(status)) {
+		dev_err(dev, "FAILED to lookup PCC indexes from CRS\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Ensure we have exactly 2 channels: an outbox and an inbox.
+	 */
+	if (context.index != 2)
+		return -EINVAL;
+
+	snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
+	ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
+			    mctp_pcc_setup);
+	if (!ndev)
+		return -ENOMEM;
+
+	mctp_pcc_ndev = netdev_priv(ndev);
+
+	mctp_pcc_ndev->inbox.index = context.inbox_index;
+	mctp_pcc_ndev->inbox.client.dev = dev;
+	mctp_pcc_ndev->outbox.index = context.outbox_index;
+	mctp_pcc_ndev->outbox.client.dev = dev;
+
+	mctp_pcc_ndev->outbox.client.tx_prepare = mctp_pcc_tx_prepare;
+	mctp_pcc_ndev->outbox.client.tx_done = mctp_pcc_tx_done;
+	mctp_pcc_ndev->acpi_device = acpi_dev;
+	mctp_pcc_ndev->ndev = ndev;
+	acpi_dev->driver_data = mctp_pcc_ndev;
+
+	rc = initialize_mtu(ndev);
+	if (rc)
+		goto free_netdev;
+
+	rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
+	if (rc)
+		goto free_netdev;
+
+	return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
+free_netdev:
+	free_netdev(ndev);
+	return rc;
+}
+
+static const struct acpi_device_id mctp_pcc_device_ids[] = {
+	{ "DMT0001" },
+	{}
+};
+
+static struct acpi_driver mctp_pcc_driver = {
+	.name = "mctp_pcc",
+	.class = "Unknown",
+	.ids = mctp_pcc_device_ids,
+	.ops = {
+		.add = mctp_pcc_driver_add,
+	},
+};
+
+module_acpi_driver(mctp_pcc_driver);
+
+MODULE_DEVICE_TABLE(acpi, mctp_pcc_device_ids);
+
+MODULE_DESCRIPTION("MCTP PCC ACPI device");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Adam Young <admiyo@os.amperecomputing.com>");
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [net-next v44] mctp pcc: Implement MCTP over PCC Transport
  2026-05-22 19:36 [net-next v44] mctp pcc: Implement MCTP over PCC Transport Adam Young
@ 2026-05-28  8:45 ` Paolo Abeni
  2026-05-28 18:26   ` Adam Young
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2026-05-28  8:45 UTC (permalink / raw)
  To: admiyo
  Cc: jk, matt, andrew+netdev, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel, sudeep.holla, Jonathan.Cameron, lihuisong

From: AI Reviewer <ai@example.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
mctp pcc: Implement MCTP over PCC Transport

This patch adds a new mctp-pcc network driver that implements the DMTF
DSP0292 binding for MCTP over the ACPI Platform Communication Channel.
Type 3 channels are used for transmit and Type 4 channels for receive,
with each interface created from a DMT0001 ACPI device referencing two
PCC subspaces.

Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html

A few questions inline below.

[Low]
The Link tag and the body of the commit message both reference
DSP0292_1.0.0WIP50. Is it intentional to merge an in-tree binding
based on a Work-In-Progress draft of the spec rather than the
published 1.0.0?

Anything userspace-visible that ships in this driver (the netdev name
mctppccN, the ARPHRD type, the literal "MCTP" marker overloaded into
the PCC command field) becomes a de-facto ABI once it lands in a
stable kernel; if the published DSP0292 finalises any of those points
differently, the kernel would have to keep bug-compatibility with the
WIP variant. Could the rationale for picking the WIP baseline be
captured in the commit message, or alternatively could the merge wait
on the published revision?

> diff --git a/drivers/net/mctp/Kconfig b/drivers/net/mctp/Kconfig
> index cf325ab0b1ef..e68d23794a80 100644
> --- a/drivers/net/mctp/Kconfig
> +++ b/drivers/net/mctp/Kconfig
> @@ -47,6 +47,22 @@ config MCTP_TRANSPORT_I3C
>  	  A MCTP protocol network device is created for each I3C bus
>  	  having a "mctp-controller" devicetree property.
>  
> +config MCTP_TRANSPORT_PCC
> +	tristate "MCTP PCC transport"
> +	depends on ACPI
> +	depends on PCC
> +	depends on 64BIT
> +	depends on CPU_LITTLE_ENDIAN

[Medium]
Could the rationale for depends on 64BIT and depends on
CPU_LITTLE_ENDIAN be added to the commit message?

DSP0292 and ACPI PCC are not 64-bit-only, and the PCC shared memory
header is fixed little-endian which can be handled with __le32 fields
and le32_to_cpu()/readl() accessors so the driver runs on any host
endianness. Other PCC consumers such as drivers/acpi/cppc_acpi.c and
drivers/mailbox/pcc.c are not LE-only.

If these depends on lines are present to paper over byte-handling
choices in the driver, would it be preferable to use endian-correct
accessors and drop the dependencies?

> +	help
> +	  Provides a driver to access MCTP devices over PCC transport,
[ ... ]

> diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
> new file mode 100644
> index 000000000000..f2dd5286efa7
> --- /dev/null
> +++ b/drivers/net/mctp/mctp-pcc.c
> @@ -0,0 +1,429 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * mctp-pcc.c - Driver for MCTP over PCC.
> + * Copyright (c) 2024-2026, Ampere Computing LLC
> + *
> + */
> +
> +/* Implementation of MCTP over PCC DMTF Specification DSP0256
> + * https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
> + */

[Low]
The header comment says DSP0256 but the URL and the commit message
both say DSP0292. DSP0256 is a different DMTF document (PLDM for
Monitoring and Control). Should the comment read DSP0292?

[ ... ]

> +#define MCTP_SIGNATURE          "MCTP"
> +#define MCTP_SIGNATURE_LENGTH   (sizeof(MCTP_SIGNATURE) - 1)
> +#define MCTP_MIN_MTU            68
> +#define PCC_HEADER_SIZE         sizeof(struct acpi_pcct_ext_pcc_shared_memory)
> +#define MCTP_PCC_MIN_SIZE       (PCC_HEADER_SIZE + MCTP_MIN_MTU)
> +#define PCC_EXTRA_LEN           (PCC_HEADER_SIZE - sizeof(pcc_header.command))

[Low]
PCC_EXTRA_LEN refers to pcc_header, which is not a global symbol or
type but a local variable inside mctp_pcc_client_rx_callback(). It
expands successfully today only because every caller happens to have
a pcc_header in scope. Would something like the following be more
robust?

  #define PCC_EXTRA_LEN (PCC_HEADER_SIZE - \
      sizeof_field(struct acpi_pcct_ext_pcc_shared_memory, command))

[ ... ]

> +static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
> +{
> +	struct acpi_pcct_ext_pcc_shared_memory pcc_header;
> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
> +	struct mctp_pcc_mailbox *inbox;
> +	struct mctp_skb_cb *cb;
> +	struct sk_buff *skb;
> +	int size;
> +
> +	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, inbox.client);
> +	inbox = &mctp_pcc_ndev->inbox;
> +	memcpy_fromio(&pcc_header, inbox->chan->shmem, sizeof(pcc_header));

[High]
Can this dereference a NULL shmem during teardown?

Looking at pcc_mbox_free_channel() in drivers/mailbox/pcc.c, the
channel's shmem is iounmapped (and the pointer cleared) before the
shutdown path runs free_irq(). Until free_irq() returns, an inbound
doorbell on another CPU can still fire pcc_mbox_irq, which calls
mbox_chan_received_data(), which calls chan->cl->rx_callback (cl is
still set at that moment) — landing here.

Existing PCC clients such as drivers/acpi/acpi_pcc.c::pcc_rx_callback
do not touch shmem from their callback, so this driver appears to be
the first one to expose this ordering.

Should the rx_callback bail out when inbox->chan->shmem is NULL, or
should the ordering in pcc_mbox_free_channel() (free_irq before
iounmap) be addressed in the mailbox layer first?

> +
> +	// The message must at least have the PCC command indicating it is an MCTP
> +	// message followed by the MCTP header, or we have a malformed message.
> +	// This may be run on big endian system, but the data in the buffer is
> +	// explicitly little endian.

[Medium]
This comment claims the callback may run on a big-endian system, but
the Kconfig has depends on CPU_LITTLE_ENDIAN, and the u32 fields
length, command and signature are read from MMIO via memcpy_fromio
without any le32_to_cpu() or readl() byteswap. On big-endian, those
header fields would be misinterpreted.

Is the comment stale, or is the intent eventually to drop the
CPU_LITTLE_ENDIAN dependency? If a future change relaxes the Kconfig
dependency based on this comment, the driver will silently corrupt
multi-byte header fields. Could the comment and the Kconfig be made
consistent (and the code made endian-correct if the goal is to
support BE)?

> +	if (pcc_header.length < sizeof(pcc_header.command) + sizeof(struct mctp_hdr))
> +		goto error;
> +
> +	// If the reported size is larger than the shared memory minus headers,
> +	// something is wrong and treat the buffer as corrupted data.
> +	if (pcc_header.length > inbox->chan->shmem_size - PCC_EXTRA_LEN)
> +		goto error;
> +
> +	if (memcmp(&pcc_header.command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH) != 0)
> +		goto error;

[Low]
The TX path stamps pcc_header.signature with PCC_SIGNATURE |
subspace_id but the RX path validates pcc_header.length and the
"MCTP" command field only — signature is never checked. Would it be
worth adding a signature check here for symmetry with TX and to
reject obviously malformed peer input?

> +
> +	size = pcc_header.length + PCC_EXTRA_LEN;

[Medium]
Is the type of size correct here?

pcc_header.length is u32 and is sourced from peer-writable shared
memory. PCC_EXTRA_LEN is a sizeof() expression of type size_t. The
sum is a size_t that is then narrowed to a signed int.

The only upper bound on pcc_header.length is the earlier check
against inbox->chan->shmem_size - PCC_EXTRA_LEN. shmem_size is u64
from the PCCT and is only checked to be >= MCTP_PCC_MIN_SIZE in
mctp_pcc_open(). If firmware ever exposes a PCC channel whose
shmem_size exceeds INT_MAX, pcc_header.length can be near 4 GiB and
still pass that bound, after which size wraps to a small or negative
int.

The subsequent skb_pull(skb, sizeof(pcc_header)) on a too-small skb
ends up in __skb_pull() which calls BUG(). Would using unsigned int
or size_t for size, validating pcc_header.length against an
INT_MAX-safe bound, and sanity-checking shmem_size in mctp_pcc_open()
close this off?

> +	skb = netdev_alloc_skb(mctp_pcc_ndev->ndev, size);
> +	if (!skb)
> +		goto error;
> +
> +	skb_put(skb, size);
> +	skb->protocol = htons(ETH_P_MCTP);
> +	memcpy_fromio(skb->data, inbox->chan->shmem, size);

[Medium, Low]
Two concerns about reading from shmem here:

First, the header bytes are read twice — once into the local
pcc_header for validation, and again as the leading bytes of this
memcpy_fromio() into skb->data. Between the two reads the remote
endpoint can rewrite the shared region, so the bytes the driver
validated and the bytes the MCTP layer parses are not guaranteed to
match. A common hardening pattern is to snapshot once into a kernel
buffer, validate from the snapshot, and then either copy the
validated header into skb->data and memcpy_fromio() only the
remaining payload, or do a single MMIO copy into the skb and validate
from that copy.

Second, on weakly-ordered architectures (arm64 is included via 64BIT)
memcpy_fromio is a byte/word loop without a final dma_rmb() and the
PCC mailbox layer does not insert a barrier on the consumer side
before invoking the rx callback. Would a dma_rmb() between the IRQ
and consuming buffer contents (and between successive memcpy_fromio()
calls of the same region) be appropriate here?

> +	dev_dstats_rx_add(mctp_pcc_ndev->ndev, size);
> +	skb_pull(skb, sizeof(pcc_header));
> +	skb_reset_mac_header(skb);
> +	skb_reset_network_header(skb);
> +	cb = __mctp_cb(skb);
> +	cb->halen = 0;
> +	netif_rx(skb);
> +	return;
> +
> +error:
> +	dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
> +}
> +
> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct acpi_pcct_ext_pcc_shared_memory *pcc_header;
> +	struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
> +	int len = skb->len;
> +
> +	if (skb_cow_head(skb, sizeof(*pcc_header)))
> +		goto error;
> +
> +	pcc_header = skb_push(skb, sizeof(*pcc_header));
> +	pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
> +	pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
> +	memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
> +	pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
> +
> +	if (skb->len > mpnd->outbox.chan->shmem_size)
> +		goto error;

[Low]
This length check is performed after skb_cow_head() and skb_push()
have already mutated the skb. It also looks unreachable in normal
operation: net core enforces skb->len <= dev->mtu, and
initialize_mtu() sets max_mtu = shmem_size - sizeof(pcc_header), so
after the skb_push the resulting length is at most shmem_size.

Could the check either be moved before the mutation (comparing
len + sizeof(*pcc_header) against shmem_size) or removed?

> +
[ ... ]
> +	netif_stop_queue(ndev);
> +	if (mbox_send_message(mpnd->outbox.chan->mchan, skb) >= 0) {
> +		netif_wake_queue(ndev);
> +	} else {
> +		// Remove the header in case it gets sent again
> +		skb_pull(skb, sizeof(*pcc_header));
> +		return NETDEV_TX_BUSY;
> +	}
> +	return NETDEV_TX_OK;
[ ... ]
> +}
> +
> +static void mctp_pcc_tx_prepare(struct mbox_client *cl, void *mssg)
> +{
> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
> +	struct mctp_pcc_mailbox *outbox;
> +	struct sk_buff *skb = mssg;
> +
> +	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, outbox.client);
> +	outbox = &mctp_pcc_ndev->outbox;
> +
> +	/* The PCC Mailbox typically does not make use of the mssg pointer
> +	 * The mctp-over pcc driver is the only client that uses it.
> +	 * This value should always be non-null; it is possible
> +	 * that a change in the Mailbox level will break that assumption.
> +	 */

[Medium]
The comment effectively concedes that this driver depends on
behaviour that other PCC clients (cppc, acpi_pcc, hisi_uncore) do not
rely on, and that a change in the mailbox layer could break it.
Building a new in-tree driver on an undocumented contract is fragile
— if PCC core ever ignores or sanitises mssg, this driver's TX path
would silently leak SKBs or crash.

Could the mssg semantics be formalised in the mailbox/PCC API, or
could the driver instead track outstanding SKBs in its own state
(for example, a single-slot or queue indexed by send order) so it
does not depend on mssg being passed through?

> +	if (!skb) {
> +		netdev_warn_once(mctp_pcc_ndev->ndev,
> +				 "%s called with null message.\n", __func__);
> +		return;
> +	}
> +	memcpy_toio(outbox->chan->shmem, skb->data, skb->len);
> +}
> +
> +static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int rc)
> +{
> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
> +	struct pcpu_dstats *dstats;
> +	struct sk_buff *skb = mssg;
> +	unsigned long flags;
> +
> +	/*
> +	 * If there is a packet in flight during driver cleanup
> +	 * It may have been freed already.
> +	 */
> +	if (!mssg)
> +		return;

[Medium]
The comment describes a use-after-free hazard, but a NULL check does
not detect a freed pointer — freed memory is not NULL.

Looking at the actual cleanup path, mctp_pcc_stop() takes
chan->lock, swaps active_req to MBOX_NO_MSG and frees the skb, while
tx_tick() in drivers/mailbox/mailbox.c also reads-and-clears
active_req under the same lock before invoking tx_done. So tx_done
should never run with a freed pointer here, and this NULL check is
effectively dead.

Either the documented race is real and this guard does not actually
catch it (the subsequent dev_consume_skb_any(skb) and skb->len reads
would still touch freed memory), or the race cannot occur and the
comment is misleading. Could this be reworked so the comment matches
the actual invariants?

[ ... ]
> +}
> +
> +static int mctp_pcc_open(struct net_device *ndev)
> +{
> +	struct mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
> +	struct mctp_pcc_mailbox *outbox, *inbox;
> +
> +	outbox = &mctp_pcc_ndev->outbox;
> +	inbox = &mctp_pcc_ndev->inbox;
> +
> +	outbox->chan = pcc_mbox_request_channel(&outbox->client, outbox->index);
> +	if (IS_ERR(outbox->chan))
> +		return PTR_ERR(outbox->chan);
> +	if (outbox->chan->shmem_size < MCTP_PCC_MIN_SIZE) {
> +		pcc_mbox_free_channel(outbox->chan);
> +		return -EINVAL;
> +	}
> +
> +	inbox->client.rx_callback = mctp_pcc_client_rx_callback;
> +	inbox->chan = pcc_mbox_request_channel(&inbox->client, inbox->index);
> +	if (IS_ERR(inbox->chan)) {
> +		pcc_mbox_free_channel(outbox->chan);
> +		return PTR_ERR(inbox->chan);
> +	}
> +	if (inbox->chan->shmem_size < MCTP_PCC_MIN_SIZE) {
> +		pcc_mbox_free_channel(outbox->chan);
> +		pcc_mbox_free_channel(inbox->chan);
> +		return -EINVAL;
> +	}

[Medium]
On the shmem_size validation failure paths, outbox->chan and
inbox->chan are not reset to NULL after pcc_mbox_free_channel(),
which is the invariant mctp_pcc_stop() establishes elsewhere (it
sets the pointers to NULL after freeing). After ndo_open returns an
error those fields hold pointers to channels whose shmem has been
unmapped and whose chan->cl has been cleared. Should these be
NULLed for consistency?

> +	return 0;
> +}
> +
> +static int mctp_pcc_stop(struct net_device *ndev)
> +{
> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
> +	unsigned int count, idx;
> +	struct mbox_chan *chan;
> +	struct sk_buff *skb;
> +
> +	mctp_pcc_ndev = netdev_priv(ndev);
> +	chan = mctp_pcc_ndev->outbox.chan->mchan;
> +	pcc_mbox_free_channel(mctp_pcc_ndev->inbox.chan);
> +	mctp_pcc_ndev->inbox.chan = NULL;
> +	scoped_guard(spinlock_irqsave, &chan->lock) {
> +		if (chan->active_req !=  MBOX_NO_MSG) {
> +			skb = chan->active_req;
> +			chan->active_req = MBOX_NO_MSG;
> +			dev_dstats_tx_dropped(ndev);
> +			dev_consume_skb_any(skb);
> +		}
> +		while (chan->msg_count > 0) {
> +			count = chan->msg_count;
> +			idx = chan->msg_free;
> +			if (idx >= count)
> +				idx -= count;
> +			else
> +				idx += MBOX_TX_QUEUE_LEN - count;
> +			skb = chan->msg_data[idx];
> +			dev_dstats_tx_dropped(ndev);
> +			dev_consume_skb_any(skb);
> +			chan->msg_count--;
> +		}
> +	}

[High]
Should this driver be reaching into the private internals of struct
mbox_chan?

This block takes chan->lock (commented in mailbox_controller.h as
"Serialise access to the channel"), reads and clears chan->active_req,
walks chan->msg_data[] using a verbatim copy of msg_submit()'s tail
math:

    count = chan->msg_count;
    idx = chan->msg_free;
    if (idx >= count)
        idx -= count;
    else
        idx += MBOX_TX_QUEUE_LEN - count;

and decrements chan->msg_count. These fields are owned by
drivers/mailbox/mailbox.c and there is no public API for clients to
drain queued TX messages.

Note also that chan->msg_free is never advanced by this loop, leaving
the channel with msg_count=0 but msg_free unchanged — this only
happens to be safe because pcc_mbox_free_channel() runs immediately
afterwards.

Any future change to the mailbox queue layout, locking primitive, or
indexing semantics would silently produce leaks, double-frees, or
corruption in this driver only. Would it be feasible to add a generic
mbox_drain_channel() helper, or have the mailbox call tx_done with a
failure code on free, instead of duplicating this internal logic in a
client?

> +	pcc_mbox_free_channel(mctp_pcc_ndev->outbox.chan);
> +	mctp_pcc_ndev->outbox.chan = NULL;
[ ... ]
> +	return 0;
> +}
[ ... ]
> +static int initialize_mtu(struct net_device *ndev)
> +{
> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
> +	struct mctp_pcc_mailbox *outbox;
> +	struct pcc_mbox_chan *pchan;
> +	int mctp_pcc_max_mtu;
> +
> +	mctp_pcc_ndev = netdev_priv(ndev);
> +	outbox = &mctp_pcc_ndev->outbox;
> +	pchan = pcc_mbox_request_channel(&outbox->client, outbox->index);
> +	if (IS_ERR(pchan))
> +		return PTR_ERR(pchan);
> +	if (pchan->shmem_size < MCTP_MIN_MTU + sizeof(struct acpi_pcct_ext_pcc_shared_memory)) {
> +		pcc_mbox_free_channel(pchan);
> +		return -EINVAL;
> +	}
> +	mctp_pcc_max_mtu = pchan->shmem_size - sizeof(struct acpi_pcct_ext_pcc_shared_memory);
> +	pcc_mbox_free_channel(pchan);

[Medium]
Probe time acquires the outbox channel just to read shmem_size and
then immediately releases it; mctp_pcc_open() requests the same
channel again on first ifup. A few questions:

This maps and unmaps shmem twice. Could shmem_size be obtained
without binding the channel (for example by inspecting the PCCT
entry) or by acquiring the channel once and holding it across probe
and open?

There is also a small TOCTOU window between probe and ifup where
nothing prevents another mailbox client from grabbing the channel —
pcc_mbox_request_channel() returns -EBUSY if chan->cl is set, so a
later ndo_open would silently fail.

Finally, only the outbox shmem_size is validated at probe time; the
inbox is only validated on first open. Should both be checked at the
same point?

> +	ndev->mtu = MCTP_MIN_MTU;
> +	ndev->max_mtu = mctp_pcc_max_mtu;
> +	ndev->min_mtu = MCTP_MIN_MTU;
> +
> +	return 0;
> +}
> +
> +static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
> +{
[ ... ]
> +	mctp_pcc_ndev->acpi_device = acpi_dev;
> +	mctp_pcc_ndev->ndev = ndev;
> +	acpi_dev->driver_data = mctp_pcc_ndev;
> +
> +	rc = initialize_mtu(ndev);
> +	if (rc)
> +		goto free_netdev;
> +
> +	rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
> +	if (rc)
> +		goto free_netdev;
> +
> +	return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
> +free_netdev:
> +	free_netdev(ndev);
> +	return rc;
> +}

[Medium]
acpi_dev->driver_data is set to mctp_pcc_ndev before initialize_mtu()
and mctp_register_netdev() are called. On the goto free_netdev path,
free_netdev(ndev) frees the netdev along with the embedded
mctp_pcc_ndev, but acpi_dev->driver_data still points at that freed
memory. ACPI does not call .remove() when .add() returns non-zero,
so nothing zeroes the pointer afterwards.

Would it be safer to set acpi_dev->driver_data only after all the
fallible operations succeed, or to NULL it on the error path?

> +
> +static const struct acpi_device_id mctp_pcc_device_ids[] = {
> +	{ "DMT0001" },
> +	{}
> +};
> +
> +static struct acpi_driver mctp_pcc_driver = {
> +	.name = "mctp_pcc",
> +	.class = "Unknown",

[Low]
Is .class = "Unknown" intentional? The ACPI subsystem exposes this
string via sysfs and uses it for grouping. A more meaningful value
(for example "mctp_pcc") or omitting the field would seem
preferable for a new in-tree driver.

> +	.ids = mctp_pcc_device_ids,
> +	.ops = {
> +		.add = mctp_pcc_driver_add,
> +	},
> +};
-- 
This is an AI-generated review.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v44] mctp pcc: Implement MCTP over PCC Transport
  2026-05-28  8:45 ` Paolo Abeni
@ 2026-05-28 18:26   ` Adam Young
  2026-05-29  1:44     ` Jeremy Kerr
  0 siblings, 1 reply; 6+ messages in thread
From: Adam Young @ 2026-05-28 18:26 UTC (permalink / raw)
  To: Paolo Abeni, admiyo
  Cc: jk, matt, andrew+netdev, davem, edumazet, kuba, netdev,
	linux-kernel, sudeep.holla, Jonathan.Cameron, lihuisong


On 5/28/26 04:45, Paolo Abeni wrote:
> From: AI Reviewer <ai@example.com>
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> mctp pcc: Implement MCTP over PCC Transport
>
> This patch adds a new mctp-pcc network driver that implements the DMTF
> DSP0292 binding for MCTP over the ACPI Platform Communication Channel.
> Type 3 channels are used for transmit and Type 4 channels for receive,
> with each interface created from a DMT0001 ACPI device referencing two
> PCC subspaces.
>
> Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
> Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html
>
> A few questions inline below.
>
> [Low]
> The Link tag and the body of the commit message both reference
> DSP0292_1.0.0WIP50. Is it intentional to merge an in-tree binding
> based on a Work-In-Progress draft of the spec rather than the
> published 1.0.0?
>
> Anything userspace-visible that ships in this driver (the netdev name
> mctppccN, the ARPHRD type, the literal "MCTP" marker overloaded into
> the PCC command field) becomes a de-facto ABI once it lands in a
> stable kernel; if the published DSP0292 finalises any of those points
> differently, the kernel would have to keep bug-compatibility with the
> WIP variant. Could the rationale for picking the WIP baseline be
> captured in the commit message, or alternatively could the merge wait
> on the published revision?
>
>> diff --git a/drivers/net/mctp/Kconfig b/drivers/net/mctp/Kconfig
>> index cf325ab0b1ef..e68d23794a80 100644
>> --- a/drivers/net/mctp/Kconfig
>> +++ b/drivers/net/mctp/Kconfig
>> @@ -47,6 +47,22 @@ config MCTP_TRANSPORT_I3C
>>   	  A MCTP protocol network device is created for each I3C bus
>>   	  having a "mctp-controller" devicetree property.
>>   
>> +config MCTP_TRANSPORT_PCC
>> +	tristate "MCTP PCC transport"
>> +	depends on ACPI
>> +	depends on PCC
>> +	depends on 64BIT
>> +	depends on CPU_LITTLE_ENDIAN
> [Medium]
> Could the rationale for depends on 64BIT and depends on
> CPU_LITTLE_ENDIAN be added to the commit message?
>
> DSP0292 and ACPI PCC are not 64-bit-only, and the PCC shared memory
> header is fixed little-endian which can be handled with __le32 fields
> and le32_to_cpu()/readl() accessors so the driver runs on any host
> endianness. Other PCC consumers such as drivers/acpi/cppc_acpi.c and
> drivers/mailbox/pcc.c are not LE-only.
>
> If these depends on lines are present to paper over byte-handling
> choices in the driver, would it be preferable to use endian-correct
> accessors and drop the dependencies?

The ACPICA code used to access the PCC region and registers is imported 
an external project, and it hard codes the values to machine, not 
explicitly little endian.  So making this little-endian only ensures 
that the ordering is correct.  The only machines that are going to 
support MCTP over PCC are little endian and 64 bit, so this is strictly 
speaking correct.


>
>> +	help
>> +	  Provides a driver to access MCTP devices over PCC transport,
> [ ... ]
>
>> diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
>> new file mode 100644
>> index 000000000000..f2dd5286efa7
>> --- /dev/null
>> +++ b/drivers/net/mctp/mctp-pcc.c
>> @@ -0,0 +1,429 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * mctp-pcc.c - Driver for MCTP over PCC.
>> + * Copyright (c) 2024-2026, Ampere Computing LLC
>> + *
>> + */
>> +
>> +/* Implementation of MCTP over PCC DMTF Specification DSP0256
>> + * https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
>> + */
> [Low]
> The header comment says DSP0256 but the URL and the commit message
> both say DSP0292. DSP0256 is a different DMTF document (PLDM for
> Monitoring and Control). Should the comment read DSP0292?
>
> [ ... ]

The DMTF DSP0256 is the MCTP Host Interface Specification. It 
standardizes how Management Component Transport Protocol (MCTP) packets 
and messages are communicated between a system's host software and its 
internal management controllers (like a BMC or satellite controller). 
DSP0292 IS specific to the binding. Neither are PLDM. THis comment is 
correct


>
>> +#define MCTP_SIGNATURE          "MCTP"
>> +#define MCTP_SIGNATURE_LENGTH   (sizeof(MCTP_SIGNATURE) - 1)
>> +#define MCTP_MIN_MTU            68
>> +#define PCC_HEADER_SIZE         sizeof(struct acpi_pcct_ext_pcc_shared_memory)
>> +#define MCTP_PCC_MIN_SIZE       (PCC_HEADER_SIZE + MCTP_MIN_MTU)
>> +#define PCC_EXTRA_LEN           (PCC_HEADER_SIZE - sizeof(pcc_header.command))
> [Low]
> PCC_EXTRA_LEN refers to pcc_header, which is not a global symbol or
> type but a local variable inside mctp_pcc_client_rx_callback(). It
> expands successfully today only because every caller happens to have
> a pcc_header in scope. Would something like the following be more
> robust?
>
>    #define PCC_EXTRA_LEN (PCC_HEADER_SIZE - \
>        sizeof_field(struct acpi_pcct_ext_pcc_shared_memory, command))
>
> [ ... ]
>
>> +static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
>> +{
>> +	struct acpi_pcct_ext_pcc_shared_memory pcc_header;
>> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
>> +	struct mctp_pcc_mailbox *inbox;
>> +	struct mctp_skb_cb *cb;
>> +	struct sk_buff *skb;
>> +	int size;
>> +
>> +	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, inbox.client);
>> +	inbox = &mctp_pcc_ndev->inbox;
>> +	memcpy_fromio(&pcc_header, inbox->chan->shmem, sizeof(pcc_header));
> [High]
> Can this dereference a NULL shmem during teardown?
>
> Looking at pcc_mbox_free_channel() in drivers/mailbox/pcc.c, the
> channel's shmem is iounmapped (and the pointer cleared) before the
> shutdown path runs free_irq(). Until free_irq() returns, an inbound
> doorbell on another CPU can still fire pcc_mbox_irq, which calls
> mbox_chan_received_data(), which calls chan->cl->rx_callback (cl is
> still set at that moment) — landing here.
>
> Existing PCC clients such as drivers/acpi/acpi_pcc.c::pcc_rx_callback
> do not touch shmem from their callback, so this driver appears to be
> the first one to expose this ordering.
>
> Should the rx_callback bail out when inbox->chan->shmem is NULL, or
> should the ordering in pcc_mbox_free_channel() (free_irq before
> iounmap) be addressed in the mailbox layer first?

That needs to be fixed in the PCC layer, and is submitted in a different 
patch.  When that patch is accepted, this issue will be fixed.  I do not 
want to hold up this driver for that issue.

See

https://lore.kernel.org/linux-hwmon/20260522205220.237355-1-admiyo@os.amperecomputing.com/


>
>> +
>> +	// The message must at least have the PCC command indicating it is an MCTP
>> +	// message followed by the MCTP header, or we have a malformed message.
>> +	// This may be run on big endian system, but the data in the buffer is
>> +	// explicitly little endian.
> [Medium]
> This comment claims the callback may run on a big-endian system, but
> the Kconfig has depends on CPU_LITTLE_ENDIAN, and the u32 fields
> length, command and signature are read from MMIO via memcpy_fromio
> without any le32_to_cpu() or readl() byteswap. On big-endian, those
> header fields would be misinterpreted.
>
> Is the comment stale, or is the intent eventually to drop the
> CPU_LITTLE_ENDIAN dependency? If a future change relaxes the Kconfig
> dependency based on this comment, the driver will silently corrupt
> multi-byte header fields. Could the comment and the Kconfig be made
> consistent (and the code made endian-correct if the goal is to
> support BE)?

Yeah, the Big Endian reference is spurious.  If there is another 
iteration I will pull it.


>
>> +	if (pcc_header.length < sizeof(pcc_header.command) + sizeof(struct mctp_hdr))
>> +		goto error;
>> +
>> +	// If the reported size is larger than the shared memory minus headers,
>> +	// something is wrong and treat the buffer as corrupted data.
>> +	if (pcc_header.length > inbox->chan->shmem_size - PCC_EXTRA_LEN)
>> +		goto error;
>> +
>> +	if (memcmp(&pcc_header.command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH) != 0)
>> +		goto error;
> [Low]
> The TX path stamps pcc_header.signature with PCC_SIGNATURE |
> subspace_id but the RX path validates pcc_header.length and the
> "MCTP" command field only — signature is never checked. Would it be
> worth adding a signature check here for symmetry with TX and to
> reject obviously malformed peer input?

It couldn't hurt, but it is not essential.


>
>> +
>> +	size = pcc_header.length + PCC_EXTRA_LEN;
> [Medium]
> Is the type of size correct here?
>
> pcc_header.length is u32 and is sourced from peer-writable shared
> memory. PCC_EXTRA_LEN is a sizeof() expression of type size_t. The
> sum is a size_t that is then narrowed to a signed int.
>
> The only upper bound on pcc_header.length is the earlier check
> against inbox->chan->shmem_size - PCC_EXTRA_LEN. shmem_size is u64
> from the PCCT and is only checked to be >= MCTP_PCC_MIN_SIZE in
> mctp_pcc_open(). If firmware ever exposes a PCC channel whose
> shmem_size exceeds INT_MAX, pcc_header.length can be near 4 GiB and
> still pass that bound, after which size wraps to a small or negative
> int.
>
> The subsequent skb_pull(skb, sizeof(pcc_header)) on a too-small skb
> ends up in __skb_pull() which calls BUG(). Would using unsigned int
> or size_t for size, validating pcc_header.length against an
> INT_MAX-safe bound, and sanity-checking shmem_size in mctp_pcc_open()
> close this off?

The calculation is correct, and the different data types do not effect 
that.  Shared buffers are physical memory and they cannot be that large.


>
>> +	skb = netdev_alloc_skb(mctp_pcc_ndev->ndev, size);
>> +	if (!skb)
>> +		goto error;
>> +
>> +	skb_put(skb, size);
>> +	skb->protocol = htons(ETH_P_MCTP);
>> +	memcpy_fromio(skb->data, inbox->chan->shmem, size);
> [Medium, Low]
> Two concerns about reading from shmem here:
>
> First, the header bytes are read twice — once into the local
> pcc_header for validation, and again as the leading bytes of this
> memcpy_fromio() into skb->data. Between the two reads the remote
> endpoint can rewrite the shared region,

They are forbidden to do so by the protocol, which uses flags set in the 
registers.  If they do that, all bets are off, your platform is hosed, 
and there is no recovery.



>   so the bytes the driver
> validated and the bytes the MCTP layer parses are not guaranteed to
> match. A common hardening pattern is to snapshot once into a kernel
> buffer, validate from the snapshot, and then either copy the
> validated header into skb->data and memcpy_fromio() only the
> remaining payload, or do a single MMIO copy into the skb and validate
> from that copy.
Too expensive.  This is done this way explicitly for the sake of 
optimizing and not allocaing more memory than we need.
>
> Second, on weakly-ordered architectures (arm64 is included via 64BIT)
> memcpy_fromio is a byte/word loop without a final dma_rmb() and the
> PCC mailbox layer does not insert a barrier on the consumer side
> before invoking the rx callback. Would a dma_rmb() between the IRQ
> and consuming buffer contents (and between successive memcpy_fromio()
> calls of the same region) be appropriate here?

If so, it needs to be done for the entire PCC Mailbox infrastructure, 
and would not be appropriate in this patch.


>
>> +	dev_dstats_rx_add(mctp_pcc_ndev->ndev, size);
>> +	skb_pull(skb, sizeof(pcc_header));
>> +	skb_reset_mac_header(skb);
>> +	skb_reset_network_header(skb);
>> +	cb = __mctp_cb(skb);
>> +	cb->halen = 0;
>> +	netif_rx(skb);
>> +	return;
>> +
>> +error:
>> +	dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
>> +}
>> +
>> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +	struct acpi_pcct_ext_pcc_shared_memory *pcc_header;
>> +	struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
>> +	int len = skb->len;
>> +
>> +	if (skb_cow_head(skb, sizeof(*pcc_header)))
>> +		goto error;
>> +
>> +	pcc_header = skb_push(skb, sizeof(*pcc_header));
>> +	pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
>> +	pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
>> +	memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
>> +	pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
>> +
>> +	if (skb->len > mpnd->outbox.chan->shmem_size)
>> +		goto error;
> [Low]
> This length check is performed after skb_cow_head() and skb_push()
> have already mutated the skb. It also looks unreachable in normal
> operation: net core enforces skb->len <= dev->mtu, and
> initialize_mtu() sets max_mtu = shmem_size - sizeof(pcc_header), so
> after the skb_push the resulting length is at most shmem_size.
>
> Could the check either be moved before the mutation (comparing
> len + sizeof(*pcc_header) against shmem_size) or removed?
This is just paranoid programming. It is fine here.  Moving it makes no 
difference.
>
>> +
> [ ... ]
>> +	netif_stop_queue(ndev);
>> +	if (mbox_send_message(mpnd->outbox.chan->mchan, skb) >= 0) {
>> +		netif_wake_queue(ndev);
>> +	} else {
>> +		// Remove the header in case it gets sent again
>> +		skb_pull(skb, sizeof(*pcc_header));
>> +		return NETDEV_TX_BUSY;
>> +	}
>> +	return NETDEV_TX_OK;
> [ ... ]
>> +}
>> +
>> +static void mctp_pcc_tx_prepare(struct mbox_client *cl, void *mssg)
>> +{
>> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
>> +	struct mctp_pcc_mailbox *outbox;
>> +	struct sk_buff *skb = mssg;
>> +
>> +	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, outbox.client);
>> +	outbox = &mctp_pcc_ndev->outbox;
>> +
>> +	/* The PCC Mailbox typically does not make use of the mssg pointer
>> +	 * The mctp-over pcc driver is the only client that uses it.
>> +	 * This value should always be non-null; it is possible
>> +	 * that a change in the Mailbox level will break that assumption.
>> +	 */
> [Medium]
> The comment effectively concedes that this driver depends on
> behaviour that other PCC clients (cppc, acpi_pcc, hisi_uncore) do not
> rely on, and that a change in the mailbox layer could break it.
> Building a new in-tree driver on an undocumented contract is fragile
> — if PCC core ever ignores or sanitises mssg, this driver's TX path
> would silently leak SKBs or crash.
>
> Could the mssg semantics be formalised in the mailbox/PCC API, or
> could the driver instead track outstanding SKBs in its own state
> (for example, a single-slot or queue indexed by send order) so it
> does not depend on mssg being passed through?
THat would rewquire a rewrite of the Mailbox code, and I do not want to 
hold up this patch for that. I initially tried somethign along those 
lines but it was rejected by the mailbox maintainer.
>
>> +	if (!skb) {
>> +		netdev_warn_once(mctp_pcc_ndev->ndev,
>> +				 "%s called with null message.\n", __func__);
>> +		return;
>> +	}
>> +	memcpy_toio(outbox->chan->shmem, skb->data, skb->len);
>> +}
>> +
>> +static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int rc)
>> +{
>> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
>> +	struct pcpu_dstats *dstats;
>> +	struct sk_buff *skb = mssg;
>> +	unsigned long flags;
>> +
>> +	/*
>> +	 * If there is a packet in flight during driver cleanup
>> +	 * It may have been freed already.
>> +	 */
>> +	if (!mssg)
>> +		return;
> [Medium]
> The comment describes a use-after-free hazard, but a NULL check does
> not detect a freed pointer — freed memory is not NULL.
But the code that frees the buffer sets the value to MBOX_NO_MS.
>
> Looking at the actual cleanup path, mctp_pcc_stop() takes
> chan->lock, swaps active_req to MBOX_NO_MSG and frees the skb, while
> tx_tick() in drivers/mailbox/mailbox.c also reads-and-clears
> active_req under the same lock before invoking tx_done. So tx_done
> should never run with a freed pointer here, and this NULL check is
> effectively dead.
>
> Either the documented race is real and this guard does not actually
> catch it (the subsequent dev_consume_skb_any(skb) and skb->len reads
> would still touch freed memory), or the race cannot occur and the
> comment is misleading. Could this be reworked so the comment matches
> the actual invariants?

It was written before the change to setting active_req=MBOX_NO_MSG but 
the result is the same.


>
> [ ... ]
>> +}
>> +
>> +static int mctp_pcc_open(struct net_device *ndev)
>> +{
>> +	struct mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
>> +	struct mctp_pcc_mailbox *outbox, *inbox;
>> +
>> +	outbox = &mctp_pcc_ndev->outbox;
>> +	inbox = &mctp_pcc_ndev->inbox;
>> +
>> +	outbox->chan = pcc_mbox_request_channel(&outbox->client, outbox->index);
>> +	if (IS_ERR(outbox->chan))
>> +		return PTR_ERR(outbox->chan);
>> +	if (outbox->chan->shmem_size < MCTP_PCC_MIN_SIZE) {
>> +		pcc_mbox_free_channel(outbox->chan);
>> +		return -EINVAL;
>> +	}
>> +
>> +	inbox->client.rx_callback = mctp_pcc_client_rx_callback;
>> +	inbox->chan = pcc_mbox_request_channel(&inbox->client, inbox->index);
>> +	if (IS_ERR(inbox->chan)) {
>> +		pcc_mbox_free_channel(outbox->chan);
>> +		return PTR_ERR(inbox->chan);
>> +	}
>> +	if (inbox->chan->shmem_size < MCTP_PCC_MIN_SIZE) {
>> +		pcc_mbox_free_channel(outbox->chan);
>> +		pcc_mbox_free_channel(inbox->chan);
>> +		return -EINVAL;
>> +	}
> [Medium]
> On the shmem_size validation failure paths, outbox->chan and
> inbox->chan are not reset to NULL after pcc_mbox_free_channel(),
> which is the invariant mctp_pcc_stop() establishes elsewhere (it
> sets the pointers to NULL after freeing). After ndo_open returns an
> error those fields hold pointers to channels whose shmem has been
> unmapped and whose chan->cl has been cleared. Should these be
> NULLed for consistency?
Yes, for consistency,  but it does not break things if they are not set 
to NULL there.
>
>> +	return 0;
>> +}
>> +
>> +static int mctp_pcc_stop(struct net_device *ndev)
>> +{
>> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
>> +	unsigned int count, idx;
>> +	struct mbox_chan *chan;
>> +	struct sk_buff *skb;
>> +
>> +	mctp_pcc_ndev = netdev_priv(ndev);
>> +	chan = mctp_pcc_ndev->outbox.chan->mchan;
>> +	pcc_mbox_free_channel(mctp_pcc_ndev->inbox.chan);
>> +	mctp_pcc_ndev->inbox.chan = NULL;
>> +	scoped_guard(spinlock_irqsave, &chan->lock) {
>> +		if (chan->active_req !=  MBOX_NO_MSG) {
>> +			skb = chan->active_req;
>> +			chan->active_req = MBOX_NO_MSG;
>> +			dev_dstats_tx_dropped(ndev);
>> +			dev_consume_skb_any(skb);
>> +		}
>> +		while (chan->msg_count > 0) {
>> +			count = chan->msg_count;
>> +			idx = chan->msg_free;
>> +			if (idx >= count)
>> +				idx -= count;
>> +			else
>> +				idx += MBOX_TX_QUEUE_LEN - count;
>> +			skb = chan->msg_data[idx];
>> +			dev_dstats_tx_dropped(ndev);
>> +			dev_consume_skb_any(skb);
>> +			chan->msg_count--;
>> +		}
>> +	}
> [High]
> Should this driver be reaching into the private internals of struct
> mbox_chan?
It has no choice.
>
> This block takes chan->lock (commented in mailbox_controller.h as
> "Serialise access to the channel"), reads and clears chan->active_req,
> walks chan->msg_data[] using a verbatim copy of msg_submit()'s tail
> math:
>
>      count = chan->msg_count;
>      idx = chan->msg_free;
>      if (idx >= count)
>          idx -= count;
>      else
>          idx += MBOX_TX_QUEUE_LEN - count;
>
> and decrements chan->msg_count. These fields are owned by
> drivers/mailbox/mailbox.c and there is no public API for clients to
> drain queued TX messages.
And changes there could break this driver, but there is no documented 
way to free memory in the ring buffer.
>
> Note also that chan->msg_free is never advanced by this loop, leaving
> the channel with msg_count=0 but msg_free unchanged — this only
> happens to be safe because pcc_mbox_free_channel() runs immediately
> afterwards.
>
> Any future change to the mailbox queue layout, locking primitive, or
> indexing semantics would silently produce leaks, double-frees, or
> corruption in this driver only. Would it be feasible to add a generic
> mbox_drain_channel() helper, or have the mailbox call tx_done with a
> failure code on free, instead of duplicating this internal logic in a
> client?
>
>> +	pcc_mbox_free_channel(mctp_pcc_ndev->outbox.chan);
>> +	mctp_pcc_ndev->outbox.chan = NULL;
> [ ... ]
>> +	return 0;
>> +}
> [ ... ]
>> +static int initialize_mtu(struct net_device *ndev)
>> +{
>> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
>> +	struct mctp_pcc_mailbox *outbox;
>> +	struct pcc_mbox_chan *pchan;
>> +	int mctp_pcc_max_mtu;
>> +
>> +	mctp_pcc_ndev = netdev_priv(ndev);
>> +	outbox = &mctp_pcc_ndev->outbox;
>> +	pchan = pcc_mbox_request_channel(&outbox->client, outbox->index);
>> +	if (IS_ERR(pchan))
>> +		return PTR_ERR(pchan);
>> +	if (pchan->shmem_size < MCTP_MIN_MTU + sizeof(struct acpi_pcct_ext_pcc_shared_memory)) {
>> +		pcc_mbox_free_channel(pchan);
>> +		return -EINVAL;
>> +	}
>> +	mctp_pcc_max_mtu = pchan->shmem_size - sizeof(struct acpi_pcct_ext_pcc_shared_memory);
>> +	pcc_mbox_free_channel(pchan);
> [Medium]
> Probe time acquires the outbox channel just to read shmem_size and
> then immediately releases it; mctp_pcc_open() requests the same
> channel again on first ifup. A few questions:

Again, fixing this requires a fix in the PCC layer, and I am not willing 
to hold up the driver for that.

The fix is to provide a query function that does not open the driver.  I 
have implemented that and will submit it after this driver gets merged.


>
> This maps and unmaps shmem twice. Could shmem_size be obtained
> without binding the channel (for example by inspecting the PCCT
> entry) or by acquiring the channel once and holding it across probe
> and open?
>
> There is also a small TOCTOU window between probe and ifup where
> nothing prevents another mailbox client from grabbing the channel —
> pcc_mbox_request_channel() returns -EBUSY if chan->cl is set, so a
> later ndo_open would silently fail.
This is an issue in the mailbox API, and can be fixed by ensuring proper 
seriliazation there.  It cannot be addresssed in this driver.
>
> Finally, only the outbox shmem_size is validated at probe time; the
> inbox is only validated on first open. Should both be checked at the
> same point?
No need.  This is better if done using a query function which removes 
the chance of triggering a remote message send on the inbox channel.
>
>> +	ndev->mtu = MCTP_MIN_MTU;
>> +	ndev->max_mtu = mctp_pcc_max_mtu;
>> +	ndev->min_mtu = MCTP_MIN_MTU;
>> +
>> +	return 0;
>> +}
>> +
>> +static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
>> +{
> [ ... ]
>> +	mctp_pcc_ndev->acpi_device = acpi_dev;
>> +	mctp_pcc_ndev->ndev = ndev;
>> +	acpi_dev->driver_data = mctp_pcc_ndev;
>> +
>> +	rc = initialize_mtu(ndev);
>> +	if (rc)
>> +		goto free_netdev;
>> +
>> +	rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
>> +	if (rc)
>> +		goto free_netdev;
>> +
>> +	return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
>> +free_netdev:
>> +	free_netdev(ndev);
>> +	return rc;
>> +}
> [Medium]
> acpi_dev->driver_data is set to mctp_pcc_ndev before initialize_mtu()
> and mctp_register_netdev() are called. On the goto free_netdev path,
> free_netdev(ndev) frees the netdev along with the embedded
> mctp_pcc_ndev, but acpi_dev->driver_data still points at that freed
> memory. ACPI does not call .remove() when .add() returns non-zero,
> so nothing zeroes the pointer afterwards.
>
> Would it be safer to set acpi_dev->driver_data only after all the
> fallible operations succeed, or to NULL it on the error path?
Possibly, but not essential.  It would make the start up code more 
complex and harder to read and to clean up.
>
>> +
>> +static const struct acpi_device_id mctp_pcc_device_ids[] = {
>> +	{ "DMT0001" },
>> +	{}
>> +};
>> +
>> +static struct acpi_driver mctp_pcc_driver = {
>> +	.name = "mctp_pcc",
>> +	.class = "Unknown",
> [Low]
> Is .class = "Unknown" intentional? The ACPI subsystem exposes this
> string via sysfs and uses it for grouping. A more meaningful value
> (for example "mctp_pcc") or omitting the field would seem
> preferable for a new in-tree driver.
There is no accepted value for this yet.  Unknown is appropriate for now.
>
>> +	.ids = mctp_pcc_device_ids,
>> +	.ops = {
>> +		.add = mctp_pcc_driver_add,
>> +	},
>> +};


I do not see any required changes based on this review.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v44] mctp pcc: Implement MCTP over PCC Transport
  2026-05-28 18:26   ` Adam Young
@ 2026-05-29  1:44     ` Jeremy Kerr
  2026-06-01 16:01       ` Adam Young
  0 siblings, 1 reply; 6+ messages in thread
From: Jeremy Kerr @ 2026-05-29  1:44 UTC (permalink / raw)
  To: Adam Young, Paolo Abeni, admiyo
  Cc: matt, andrew+netdev, davem, edumazet, kuba, netdev, linux-kernel,
	sudeep.holla, Jonathan.Cameron, lihuisong

Hi Adam,

> > [Low]
> > The Link tag and the body of the commit message both reference
> > DSP0292_1.0.0WIP50. Is it intentional to merge an in-tree binding
> > based on a Work-In-Progress draft of the spec rather than the
> > published 1.0.0?

This is probably worth an update to reference the non-WIP spec version.

> The ACPICA code used to access the PCC region and registers is imported 
> an external project, and it hard codes the values to machine, not 
> explicitly little endian.  So making this little-endian only ensures 
> that the ordering is correct.  The only machines that are going to 
> support MCTP over PCC are little endian and 64 bit, so this is strictly 
> speaking correct.

The configuration is less about "which machines you think might use
this", but more "what does the implementation require".

The LE condition makes sense, as the driver does not support BE. But
what is requiring 64-bit?

> > The header comment says DSP0256 but the URL and the commit message
> > both say DSP0292. DSP0256 is a different DMTF document (PLDM for
> > Monitoring and Control). Should the comment read DSP0292?
> > 
> > [ ... ]
> 
> The DMTF DSP0256 is the MCTP Host Interface Specification. It 
> standardizes how Management Component Transport Protocol (MCTP) packets 
> and messages are communicated between a system's host software and its 
> internal management controllers (like a BMC or satellite controller).
> DSP0292 IS specific to the binding. Neither are PLDM. THis comment is
> correct

The comment is:

  Implementation of MCTP over PCC DMTF Specification DSP0256

But DSP0256 is not the "MCTP over PCC" transport binding spec. Either
mention the host interface specifically, or use the correct number for
the PCC spec. I think the latter would be best, as the host interface
spec seems less relevant here.

> > > +static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
> > > +{
> > > +	struct acpi_pcct_ext_pcc_shared_memory pcc_header;
> > > +	struct mctp_pcc_ndev *mctp_pcc_ndev;
> > > +	struct mctp_pcc_mailbox *inbox;
> > > +	struct mctp_skb_cb *cb;
> > > +	struct sk_buff *skb;
> > > +	int size;
> > > +
> > > +	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, inbox.client);
> > > +	inbox = &mctp_pcc_ndev->inbox;
> > > +	memcpy_fromio(&pcc_header, inbox->chan->shmem, sizeof(pcc_header));
> > [High]
> > Can this dereference a NULL shmem during teardown?
> > 
> > Looking at pcc_mbox_free_channel() in drivers/mailbox/pcc.c, the
> > channel's shmem is iounmapped (and the pointer cleared) before the
> > shutdown path runs free_irq(). Until free_irq() returns, an inbound
> > doorbell on another CPU can still fire pcc_mbox_irq, which calls
> > mbox_chan_received_data(), which calls chan->cl->rx_callback (cl is
> > still set at that moment) — landing here.
> > 
> > Existing PCC clients such as drivers/acpi/acpi_pcc.c::pcc_rx_callback
> > do not touch shmem from their callback, so this driver appears to be
> > the first one to expose this ordering.
> > 
> > Should the rx_callback bail out when inbox->chan->shmem is NULL, or
> > should the ordering in pcc_mbox_free_channel() (free_irq before
> > iounmap) be addressed in the mailbox layer first?
> 
> That needs to be fixed in the PCC layer, and is submitted in a different 
> patch.  When that patch is accepted, this issue will be fixed.  I do not 
> want to hold up this driver for that issue.

Understood that there's a fix in development for this, but we can't
introduce a potential null pointer dereference in the interim.

If you want this merged before the fix, how about a guard for a valid
shmem pointer?

> Yeah, the Big Endian reference is spurious.  If there is another 
> iteration I will pull it.

Sounds good.

Cheers,


Jeremy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v44] mctp pcc: Implement MCTP over PCC Transport
  2026-05-29  1:44     ` Jeremy Kerr
@ 2026-06-01 16:01       ` Adam Young
  2026-06-02  4:42         ` Jeremy Kerr
  0 siblings, 1 reply; 6+ messages in thread
From: Adam Young @ 2026-06-01 16:01 UTC (permalink / raw)
  To: Jeremy Kerr, Paolo Abeni, admiyo
  Cc: matt, andrew+netdev, davem, edumazet, kuba, netdev, linux-kernel,
	sudeep.holla, Jonathan.Cameron, lihuisong


On 5/28/26 21:44, Jeremy Kerr wrote:
> Hi Adam,
>
>>> [Low]
>>> The Link tag and the body of the commit message both reference
>>> DSP0292_1.0.0WIP50. Is it intentional to merge an in-tree binding
>>> based on a Work-In-Progress draft of the spec rather than the
>>> published 1.0.0?
> This is probably worth an update to reference the non-WIP spec version.

Wilco

>
>> The ACPICA code used to access the PCC region and registers is imported
>> an external project, and it hard codes the values to machine, not
>> explicitly little endian.  So making this little-endian only ensures
>> that the ordering is correct.  The only machines that are going to
>> support MCTP over PCC are little endian and 64 bit, so this is strictly
>> speaking correct.
> The configuration is less about "which machines you think might use
> this", but more "what does the implementation require".
>
> The LE condition makes sense, as the driver does not support BE. But
> what is requiring 64-bit?
The code itself will not, as written, work on a 32 Bit system, as there 
are 64 bit specific code.  There is no polan to support 64 bit nor a way 
to test it currently, so this is preventative.  Should this change in 
the future, the driver will need to be modified. Limiting the scope.
>
>>> The header comment says DSP0256 but the URL and the commit message
>>> both say DSP0292. DSP0256 is a different DMTF document (PLDM for
>>> Monitoring and Control). Should the comment read DSP0292?
>>>
>>> [ ... ]
>> The DMTF DSP0256 is the MCTP Host Interface Specification. It
>> standardizes how Management Component Transport Protocol (MCTP) packets
>> and messages are communicated between a system's host software and its
>> internal management controllers (like a BMC or satellite controller).
>> DSP0292 IS specific to the binding. Neither are PLDM. THis comment is
>> correct
> The comment is:
>
>    Implementation of MCTP over PCC DMTF Specification DSP0256
>
> But DSP0256 is not the "MCTP over PCC" transport binding spec. Either
> mention the host interface specifically, or use the correct number for
> the PCC spec. I think the latter would be best, as the host interface
> spec seems less relevant here.

Yeah, just going to drop that line as it is redundant with other info in 
the header.

>
>>>> +static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
>>>> +{
>>>> +	struct acpi_pcct_ext_pcc_shared_memory pcc_header;
>>>> +	struct mctp_pcc_ndev *mctp_pcc_ndev;
>>>> +	struct mctp_pcc_mailbox *inbox;
>>>> +	struct mctp_skb_cb *cb;
>>>> +	struct sk_buff *skb;
>>>> +	int size;
>>>> +
>>>> +	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, inbox.client);
>>>> +	inbox = &mctp_pcc_ndev->inbox;
>>>> +	memcpy_fromio(&pcc_header, inbox->chan->shmem, sizeof(pcc_header));
>>> [High]
>>> Can this dereference a NULL shmem during teardown?
>>>
>>> Looking at pcc_mbox_free_channel() in drivers/mailbox/pcc.c, the
>>> channel's shmem is iounmapped (and the pointer cleared) before the
>>> shutdown path runs free_irq(). Until free_irq() returns, an inbound
>>> doorbell on another CPU can still fire pcc_mbox_irq, which calls
>>> mbox_chan_received_data(), which calls chan->cl->rx_callback (cl is
>>> still set at that moment) — landing here.
>>>
>>> Existing PCC clients such as drivers/acpi/acpi_pcc.c::pcc_rx_callback
>>> do not touch shmem from their callback, so this driver appears to be
>>> the first one to expose this ordering.
>>>
>>> Should the rx_callback bail out when inbox->chan->shmem is NULL, or
>>> should the ordering in pcc_mbox_free_channel() (free_irq before
>>> iounmap) be addressed in the mailbox layer first?
>> That needs to be fixed in the PCC layer, and is submitted in a different
>> patch.  When that patch is accepted, this issue will be fixed.  I do not
>> want to hold up this driver for that issue.
> Understood that there's a fix in development for this, but we can't
> introduce a potential null pointer dereference in the interim.
>
> If you want this merged before the fix, how about a guard for a valid
> shmem pointer?

The mutux used to control access to the create-teardown path is static 
to drivers/mailbox/mailbox.c

static DEFINE_MUTEX(con_mutex);

I could run mincore, but we would still have a race condition 
immediately after the mincore call.

The patch that this gates on is:

https://lore.kernel.org/lkml/20260522205220.237355-1-admiyo@os.amperecomputing.com/

Please comment on that one to move it along.


>
>> Yeah, the Big Endian reference is spurious.  If there is another
>> iteration I will pull it.
> Sounds good.
>
> Cheers,
>
>
> Jeremy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [net-next v44] mctp pcc: Implement MCTP over PCC Transport
  2026-06-01 16:01       ` Adam Young
@ 2026-06-02  4:42         ` Jeremy Kerr
  0 siblings, 0 replies; 6+ messages in thread
From: Jeremy Kerr @ 2026-06-02  4:42 UTC (permalink / raw)
  To: Adam Young, Paolo Abeni, admiyo
  Cc: matt, andrew+netdev, davem, edumazet, kuba, netdev, linux-kernel,
	sudeep.holla, Jonathan.Cameron, lihuisong

Hi Adam,

> > The LE condition makes sense, as the driver does not support BE. But
> > what is requiring 64-bit?
> The code itself will not, as written, work on a 32 Bit system, as there 
> are 64 bit specific code.

Yeah, that was more my question - what is 64-bit specific about the
code?

> > The comment is:
> > 
> >    Implementation of MCTP over PCC DMTF Specification DSP0256
> > 
> > But DSP0256 is not the "MCTP over PCC" transport binding spec. Either
> > mention the host interface specifically, or use the correct number for
> > the PCC spec. I think the latter would be best, as the host interface
> > spec seems less relevant here.
> 
> Yeah, just going to drop that line as it is redundant with other info in 
> the header.

OK.

> > > > > +static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
> > > > > +{
> > > > > +       struct acpi_pcct_ext_pcc_shared_memory pcc_header;
> > > > > +       struct mctp_pcc_ndev *mctp_pcc_ndev;
> > > > > +       struct mctp_pcc_mailbox *inbox;
> > > > > +       struct mctp_skb_cb *cb;
> > > > > +       struct sk_buff *skb;
> > > > > +       int size;
> > > > > +
> > > > > +       mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, inbox.client);
> > > > > +       inbox = &mctp_pcc_ndev->inbox;
> > > > > +       memcpy_fromio(&pcc_header, inbox->chan->shmem, sizeof(pcc_header));
> > > > [High]
> > > > Can this dereference a NULL shmem during teardown?
> > > > 
> > > > Looking at pcc_mbox_free_channel() in drivers/mailbox/pcc.c, the
> > > > channel's shmem is iounmapped (and the pointer cleared) before the
> > > > shutdown path runs free_irq(). Until free_irq() returns, an inbound
> > > > doorbell on another CPU can still fire pcc_mbox_irq, which calls
> > > > mbox_chan_received_data(), which calls chan->cl->rx_callback (cl is
> > > > still set at that moment) — landing here.
> > > > 
> > > > Existing PCC clients such as drivers/acpi/acpi_pcc.c::pcc_rx_callback
> > > > do not touch shmem from their callback, so this driver appears to be
> > > > the first one to expose this ordering.
> > > > 
> > > > Should the rx_callback bail out when inbox->chan->shmem is NULL, or
> > > > should the ordering in pcc_mbox_free_channel() (free_irq before
> > > > iounmap) be addressed in the mailbox layer first?
> > > That needs to be fixed in the PCC layer, and is submitted in a different
> > > patch.  When that patch is accepted, this issue will be fixed.  I do not
> > > want to hold up this driver for that issue.
> > Understood that there's a fix in development for this, but we can't
> > introduce a potential null pointer dereference in the interim.
> > 
> > If you want this merged before the fix, how about a guard for a valid
> > shmem pointer?
> 
> The mutux used to control access to the create-teardown path is static 
> to drivers/mailbox/mailbox.c

You can't acquire that mutex here anyway, you're in atomic context.

> I could run mincore

No, that wouldn't work (and would be moderately horrific)

Since you did not answer the guard question, are you implying that the
shmem pointer may be unstable *during* the rx_complete call? (ie.,
rather than being invalidated before the call, and so a simple `if
(chan->shmem)` guard would not suffice?)

The Sashiko review seems to be incorrect in that you are not the first
user of the shmem data; there are other uses of chan->shmem that do not
seem to have any intrinsic exclusion between the rx callback and the
free_channel() paths, nor check the validity of shmem in their rx
callback.

However, the closest existing use - the xgene i2c driver - seems like it
can not free the pcc mailbox while a rx irq might occur, with the i2c
locking in place. You could do something similar here, possibly via a
spinlock acquired in both the ndo_stop handler, and the rx callback.

But if your proposed change to the iounmap() ordering works, that would
be a neater solution here.

> The patch that this gates on is:
> 
> https://lore.kernel.org/lkml/20260522205220.237355-1-admiyo@os.amperecomputing.com/
> 
> Please comment on that one to move it along.

As you can probably tell, the pcc mailbox interface is definitely not my
area of expertise.

Cheers,


Jeremy

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-06-02  4:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-22 19:36 [net-next v44] mctp pcc: Implement MCTP over PCC Transport Adam Young
2026-05-28  8:45 ` Paolo Abeni
2026-05-28 18:26   ` Adam Young
2026-05-29  1:44     ` Jeremy Kerr
2026-06-01 16:01       ` Adam Young
2026-06-02  4:42         ` Jeremy Kerr

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox