* [PATCH net-next v27 0/1] MCTP Over PCC Transport
@ 2025-08-28 4:33 Adam Young
2025-08-28 4:33 ` [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over " Adam Young
0 siblings, 1 reply; 8+ messages in thread
From: Adam Young @ 2025-08-28 4:33 UTC (permalink / raw)
Cc: netdev, linux-kernel, Jeremy Kerr, Matt Johnston,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Sudeep Holla, Jonathan Cameron, Huisong Li
This series adds support for the Management Control Transport Protocol (MCTP)
over the Platform Communication Channel (PCC) mechanism.
DMTF DSP:0292
https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
MCTP defines a communication model intended to
facilitate communication between Management controllers
and other management controllers, and between Management
controllers and management devices
PCC is a mechanism for communication between components within
the Platform. It is a composed of shared memory regions,
interrupt registers, and status registers.
The MCTP over PCC driver makes use of two PCC channels. For
sending messages, it uses a Type 3 channel, and for receiving
messages it uses the paired Type 4 channel. The device
and corresponding channels are specified via ACPI.
The first patch in the series implements a mechanism to allow the driver
to indicate whether an ACK should be sent back to the caller
after processing the interrupt. This is an optional feature in
the PCC code, but has been made explicitly required in another driver.
The implementation here maintains the backwards compatibility of that
driver.
MCTP is a general purpose protocol so it would be impossible to enumerate
all the use cases, but some of the ones that are most topical are attestation
and RAS support. There are a handful of protocols built on top of MCTP, to
include PLDM and SPDM, both specified by the DMTF.
https://www.dmtf.org/sites/default/files/standards/documents/DSP0240_1.0.0.pdf
https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.3.0.pd
SPDM entails various usages, including device identity collection, device
authentication, measurement collection, and device secure session establishment.
PLDM is more likely to be used for hardware support: temperature, voltage, or
fan sensor control.
At least two companies have devices that can make use of the mechanism. One is
Ampere Computing, my employer.
The mechanism it uses is called Platform Communication Channels is part of the
ACPI spec: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html
Since it is a socket interface, the system administrator also has the ability
to ignore an MCTP link that they do not want to enable. This link would be visible
to the end user, but would not be usable.
If MCTP support is disabled in the Kernel, this driver would also be disabled.
PCC is based on a shared buffer and a set of I/O mapped memory locations that the
Spec calls registers. This mechanism exists regardless of the existence of the
driver. Thus, if the user has the ability to map these physical location to
virtual locations, they have the ability to drive the hardware. Thus, there
is a security aspect to this mechanism that extends beyond the responsibilities
of the operating system.
If the hardware does not expose the PCC in the ACPI table, this device will never
be enabled. Thus it is only an issue on hard that does support PCC. In that case,
it is up to the remote controller to sanitize communication; MCTP will be exposed
as a socket interface, and userland can send any crafted packet it wants. It would
thus also be incumbent on the hardware manufacturer to allow the end user to disable
MCTP over PCC communication if they did not want to expose it.
Previous implementations of the pcc version of the mailbox protocol assumed the
driver was directly managing the shared memory region. This lead to duplicated
code and missed steps of the PCC protocol. The first patch in this series makes
it possible for mailbox/pcc to manage the writing of the buffer prior to sending
messages. It also fixes the notification of message transmission completion.
Previous Version:
https://lore.kernel.org/lkml/20250827044810.152775-1-admiyo@os.amperecomputing.com/
Changes in V27:
- Stop and restart packet Queues to deal with a full ring buffer
- drop the 'i' from the middle of the link name
- restore the allocation and freeing of the channel to the driver add/remove functions
leaving only the queue draining in the ndo stop function
Changes in V26:
- Remove the addition net-device spinlock and use the spinlock already present in skb lists
- Use temporary variables to check for success finding the skb in the lists
- Remove comment that is no longer relevant
Changes in V25:
- Use spin lock to control access to queues of sk_buffs
- removed unused constants
- added ndo_open and ndo_stop functions. These two functions do
channel creation and cleanup, to remove packets from the mailbox.
They do queue cleanup as well.
- No longer cleans up the channel from the device.
Changes in V24:
- Removed endianess for PCC header values
- Kept Column width to under 80 chars
- Typo in commit message
- Prereqisite patch for PCC buffer management was merged late in 6.17.
See "mailbox/pcc: support mailbox management of the shared buffer"
Changes in V23:
- Trigger for direct management of shared buffer based on flag in pcc channel
- Only initialize rx_alloc for inbox, not outbox.
- Read value for requested IRQ flag out of channel's current_req
- unqueue an sk_buff that failed to send
- Move error handling for skb resize error inline instead of goto
Changes in V22:
- Direct management of the shared buffer in the mailbox layer.
- Proper checking of command complete flag prior to writing to the buffer.
Changes in V21:
- Use existing constants PCC_SIGNATURE and PCC_CMD_COMPLETION_NOTIFY
- Check return code on call to send_data and drop packet if failed
- use sizeof(*mctp_pcc_header) etc, instead of structs for resizing buffers
- simplify check for ares->type != PCC_DWORD_TYPE
- simply return result devm_add_action_or_reset
- reduce initializer for mctp_pcc_lookup_context context = {};
- move initialization of mbox dev into mctp_pcc_initialize_mailbox
- minor spacing changes
Changes in V20:
- corrected typo in RFC version
- removed spurious space
- tx spin lock only controls access to shared memory buffer
- tx spin lock not eheld on error condition
- tx returns OK if skb can't be expanded
Changes in V19:
- Rebased on changes to PCC mailbox handling
- checks for cloned SKB prior to transmission
- converted doulbe slash comments to C comments
Changes in V18:
- Added Acked-By
- Fix minor spacing issue
Changes in V17:
- No new changes. Rebased on net-next post 6.13 release.
Changes in V16:
- do not duplicate cleanup after devm_add_action_or_reset calls
Changes in V15:
- corrected indentation formatting error
- Corrected TABS issue in MAINTAINER entry
Changes in V14:
- Do not attempt to unregister a netdev that is never registered
- Added MAINTAINER entry
Changes in V13:
- Explicitly Convert PCC header from little endian to machine native
Changes in V12:
- Explicitly use little endian conversion for PCC header signature
- Builds clean with make C=1
Changes in V11:
- Explicitly use little endian types for PCC header
Changes in V11:
- Switch Big Endian data types to machine local for PCC header
- use mctp specific function for registering netdev
Changes in V10:
- sync with net-next branch
- use dstats helper functions
- remove duplicate drop stat
- remove more double spaces
Changes in V9:
- Prerequisite patch for PCC mailbox has been merged
- Stats collection now use helper functions
- many double spaces reduced to single
Changes in V8:
- change 0 to NULL for pointer check of shmem
- add semi for static version of pcc_mbox_ioremap
- convert pcc_mbox_ioremap function to static inline when client code is not being built
- remove shmem comment from struct pcc_chan_info descriptor
- copy rx_dropped in mctp_pcc_net_stats
- removed trailing newline on error message
- removed double space in dev_dbg string
- use big endian for header members
- Fix use full spec ID in description
- Fix typo in file description
- Form the complete outbound message in the sk_buff
Changes in V7:
- Removed the Hardware address as specification is not published.
- Map the shared buffer in the mailbox and share the mapped region with the driver
- Use the sk_buff memory to prepare the message before copying to shared region
Changes in V6:
- Removed patch for ACPICA code that has merged
- Includes the hardware address in the network device
- Converted all device resources to devm resources
- Removed mctp_pcc_driver_remove function
- uses acpi_driver_module for initialization
- created helper structure for in and out mailboxes
- Consolidated code for initializing mailboxes in the add_device function
- Added specification references
- Removed duplicate constant PCC_ACK_FLAG_MASK
- Use the MCTP_SIGNATURE_LENGTH define
- made naming of header structs consistent
- use sizeof local variables for offset calculations
- prefix structure name to avoid potential clash
- removed unnecessary null initialization from acpi_device_id
Changes in V5
- Removed Owner field from ACPI module declaration
- removed unused next field from struct mctp_pcc_ndev
- Corrected logic reading RX ACK flag.
- Added comment for struct pcc_chan_info field shmem_base_addr
- check against current mtu instead of max mtu for packet length\
- removed unnecessary lookups of pnd->mdev.dev
Changes in V4
- Read flags out of shared buffer to trigger ACK for Type 4 RX
- Remove list of netdevs and cleanup from devices only
- tag PCCT protocol headers as little endian
- Remove unused constants
Changes in V3
- removed unused header
- removed spurious space
- removed spurious semis after functiomns
- removed null assignment for init
- remove redundant set of device on skb
- tabify constant declarations
- added rtnl_link_stats64 function
- set MTU to minimum to start
- clean up logic on driver removal
- remove cast on void * assignment
- call cleanup function directly
- check received length before allocating skb
- introduce symbolic constatn for ACK FLAG MASK
- symbolic constant for PCC header flag.
- Add namespace ID to PCC magic
- replaced readls with copy from io of PCC header
- replaced custom modules init and cleanup with ACPI version
Changes in V2
- All Variable Declarations are in reverse Xmass Tree Format
- All Checkpatch Warnings Are Fixed
- Removed Dead code
- Added packet tx/rx stats
- Removed network physical address. This is still in
disucssion in the spec, and will be added once there
is consensus. The protocol can be used with out it.
This also lead to the removal of the Big Endian
conversions.
- Avoided using non volatile pointers in copy to and from io space
- Reorderd the patches to put the ACK check for the PCC Mailbox
as a pre-requisite. The corresponding change for the MCTP
driver has been inlined in the main patch.
- Replaced magic numbers with constants, fixed typos, and other
minor changes from code review.
Adam Young (1):
mctp pcc: Implement MCTP over PCC Transport
MAINTAINERS | 5 +
drivers/net/mctp/Kconfig | 13 ++
drivers/net/mctp/Makefile | 1 +
drivers/net/mctp/mctp-pcc.c | 343 ++++++++++++++++++++++++++++++++++++
4 files changed, 362 insertions(+)
create mode 100644 drivers/net/mctp/mctp-pcc.c
--
2.43.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport
2025-08-28 4:33 [PATCH net-next v27 0/1] MCTP Over PCC Transport Adam Young
@ 2025-08-28 4:33 ` Adam Young
2025-08-29 9:16 ` Jeremy Kerr
0 siblings, 1 reply; 8+ messages in thread
From: Adam Young @ 2025-08-28 4:33 UTC (permalink / raw)
To: Jeremy Kerr, Matt Johnston, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li
Implementation of network driver for
Management Control Transport Protocol(MCTP)
over Platform Communication Channel(PCC)
DMTF DSP:0292
https://www.dmtf.org/sites/default/files/standards/documents/\
DSP0292_1.0.0WIP50.pdf
MCTP devices are specified via ACPI by entries
in DSDT/SDST and reference channels specified
in the PCCT. Messages are sent on a type 3 and
received on a type 4 channel. Communication with
other devices use the PCC based doorbell mechanism;
a shared memory segment with a corresponding
interrupt and a memory register used to trigger
remote interrupts.
This driver takes advantage of PCC mailbox buffer
management. The data section of the struct sk_buff
that contains the outgoing packet is sent to the mailbox,
already properly formatted as a PCC message. The driver
is also responsible for allocating a struct sk_buff that
is then passed to the mailbox and used to record the
data in the shared buffer. It maintains a list of both
outging and incoming sk_buffs to match the data buffers
If the mailbox ring buffer is full, the driver stops the
incoming packet queues until a message has been sent,
freeing space in the ring buffer.
When the Type 3 channel outbox receives a txdone response
interrupt, it consumes the outgoing sk_buff, allowing
it to be freed.
Adding an interface creates and frees
the channel between the network driver and the mailbox
driver.
Stopping the interface also frees any packets that
are cached in the mailbox ringbuffer.
Signed-off-by: Adam Young <admiyo@os.amperecomputing.com>
---
MAINTAINERS | 5 +
drivers/net/mctp/Kconfig | 13 ++
drivers/net/mctp/Makefile | 1 +
drivers/net/mctp/mctp-pcc.c | 343 ++++++++++++++++++++++++++++++++++++
4 files changed, 362 insertions(+)
create mode 100644 drivers/net/mctp/mctp-pcc.c
diff --git a/MAINTAINERS b/MAINTAINERS
index bce96dd254b8..de359bddcb2f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14660,6 +14660,11 @@ F: include/net/mctpdevice.h
F: include/net/netns/mctp.h
F: net/mctp/
+MANAGEMENT COMPONENT TRANSPORT PROTOCOL (MCTP) over PCC (MCTP-PCC) Driver
+M: Adam Young <admiyo@os.amperecomputing.com>
+S: Maintained
+F: drivers/net/mctp/mctp-pcc.c
+
MAPLE TREE
M: Liam R. Howlett <Liam.Howlett@oracle.com>
L: maple-tree@lists.infradead.org
diff --git a/drivers/net/mctp/Kconfig b/drivers/net/mctp/Kconfig
index cf325ab0b1ef..f69d0237f058 100644
--- a/drivers/net/mctp/Kconfig
+++ b/drivers/net/mctp/Kconfig
@@ -57,6 +57,19 @@ config MCTP_TRANSPORT_USB
MCTP-over-USB interfaces are peer-to-peer, so each interface
represents a physical connection to one remote MCTP endpoint.
+config MCTP_TRANSPORT_PCC
+ tristate "MCTP PCC transport"
+ depends on ACPI
+ help
+ Provides a driver to access MCTP devices over PCC transport,
+ A MCTP protocol network device is created via ACPI for each
+ entry in the DST/SDST that matches the identifier. The Platform
+ communication channels are selected from the corresponding
+ entries in the PCCT.
+
+ Say y here if you need to connect to MCTP endpoints over PCC. To
+ compile as a module, use m; the module will be called mctp-pcc.
+
endmenu
endif
diff --git a/drivers/net/mctp/Makefile b/drivers/net/mctp/Makefile
index c36006849a1e..2276f148df7c 100644
--- a/drivers/net/mctp/Makefile
+++ b/drivers/net/mctp/Makefile
@@ -1,3 +1,4 @@
+obj-$(CONFIG_MCTP_TRANSPORT_PCC) += mctp-pcc.o
obj-$(CONFIG_MCTP_SERIAL) += mctp-serial.o
obj-$(CONFIG_MCTP_TRANSPORT_I2C) += mctp-i2c.o
obj-$(CONFIG_MCTP_TRANSPORT_I3C) += mctp-i3c.o
diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
new file mode 100644
index 000000000000..947ac2ed760c
--- /dev/null
+++ b/drivers/net/mctp/mctp-pcc.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mctp-pcc.c - Driver for MCTP over PCC.
+ * Copyright (c) 2024-2025, Ampere Computing LLC
+ *
+ */
+
+/* Implementation of MCTP over PCC DMTF Specification DSP0256
+ * https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
+ */
+
+#include <linux/acpi.h>
+#include <linux/if_arp.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mailbox_client.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/string.h>
+#include <linux/skbuff.h>
+#include <linux/hrtimer.h>
+
+#include <acpi/acpi_bus.h>
+#include <acpi/acpi_drivers.h>
+#include <acpi/acrestyp.h>
+#include <acpi/actbl.h>
+#include <net/mctp.h>
+#include <net/mctpdevice.h>
+#include <acpi/pcc.h>
+
+#define MCTP_SIGNATURE "MCTP"
+#define MCTP_SIGNATURE_LENGTH (sizeof(MCTP_SIGNATURE) - 1)
+#define MCTP_MIN_MTU 68
+#define PCC_DWORD_TYPE 0x0c
+
+struct mctp_pcc_mailbox {
+ u32 index;
+ struct pcc_mbox_chan *chan;
+ struct mbox_client client;
+ struct sk_buff_head packets;
+};
+
+/* The netdev structure. One of these per PCC adapter. */
+struct mctp_pcc_ndev {
+ struct net_device *ndev;
+ struct acpi_device *acpi_device;
+ struct mctp_pcc_mailbox inbox;
+ struct mctp_pcc_mailbox outbox;
+};
+
+static void *mctp_pcc_rx_alloc(struct mbox_client *c, int size)
+{
+ struct mctp_pcc_ndev *mctp_pcc_ndev;
+ struct mctp_pcc_mailbox *box;
+ struct sk_buff *skb;
+
+ mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, inbox.client);
+ box = &mctp_pcc_ndev->inbox;
+
+ if (size > mctp_pcc_ndev->ndev->mtu)
+ return NULL;
+ skb = netdev_alloc_skb(mctp_pcc_ndev->ndev, size);
+ if (!skb)
+ return NULL;
+ skb_put(skb, size);
+ skb->protocol = htons(ETH_P_MCTP);
+ skb_queue_head(&box->packets, skb);
+
+ return skb->data;
+}
+
+static void mctp_pcc_client_rx_callback(struct mbox_client *c, void *buffer)
+{
+ struct mctp_pcc_ndev *mctp_pcc_ndev;
+ struct sk_buff *curr_skb = NULL;
+ struct pcc_header pcc_header;
+ struct sk_buff *skb = NULL;
+ struct mctp_skb_cb *cb;
+
+ mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, inbox.client);
+ if (!buffer) {
+ dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+ return;
+ }
+
+ spin_lock(&mctp_pcc_ndev->inbox.packets.lock);
+ skb_queue_walk(&mctp_pcc_ndev->inbox.packets, curr_skb) {
+ if (curr_skb->data != buffer)
+ continue;
+ skb = curr_skb;
+ __skb_unlink(skb, &mctp_pcc_ndev->inbox.packets);
+ break;
+ }
+ spin_unlock(&mctp_pcc_ndev->inbox.packets.lock);
+
+ if (skb) {
+ dev_dstats_rx_add(mctp_pcc_ndev->ndev, skb->len);
+ skb_reset_mac_header(skb);
+ skb_pull(skb, sizeof(pcc_header));
+ skb_reset_network_header(skb);
+ cb = __mctp_cb(skb);
+ cb->halen = 0;
+ netif_rx(skb);
+ }
+}
+
+static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
+{
+ struct mctp_pcc_ndev *mctp_pcc_ndev;
+ struct mctp_pcc_mailbox *outbox;
+ struct sk_buff *skb = NULL;
+ struct sk_buff *curr_skb;
+
+ mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
+ outbox = container_of(c, struct mctp_pcc_mailbox, client);
+ spin_lock(&outbox->packets.lock);
+ skb_queue_walk(&outbox->packets, curr_skb) {
+ if (curr_skb->data == mssg) {
+ skb = curr_skb;
+ __skb_unlink(skb, &outbox->packets);
+ break;
+ }
+ }
+ spin_unlock(&outbox->packets.lock);
+ if (skb)
+ dev_consume_skb_any(skb);
+ netif_wake_queue(mctp_pcc_ndev->ndev);
+}
+
+static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
+{
+ struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
+ struct pcc_header *pcc_header;
+ int len = skb->len;
+ int rc;
+
+ rc = skb_cow_head(skb, sizeof(*pcc_header));
+ if (rc) {
+ dev_dstats_tx_dropped(ndev);
+ kfree_skb(skb);
+ return NETDEV_TX_OK;
+ }
+
+ pcc_header = skb_push(skb, sizeof(*pcc_header));
+ pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
+ pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
+ memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
+ pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
+
+ skb_queue_head(&mpnd->outbox.packets, skb);
+
+ rc = mbox_send_message(mpnd->outbox.chan->mchan, skb->data);
+
+ if (rc < 0) {
+ netif_stop_queue(ndev);
+ skb_unlink(skb, &mpnd->outbox.packets);
+ return NETDEV_TX_BUSY;
+ }
+
+ dev_dstats_tx_add(ndev, len);
+ return NETDEV_TX_OK;
+}
+
+static void drain_packets(struct sk_buff_head *list)
+{
+ struct sk_buff *skb;
+
+ while (!skb_queue_empty(list)) {
+ skb = skb_dequeue(list);
+ dev_consume_skb_any(skb);
+ }
+}
+
+static int mctp_pcc_ndo_stop(struct net_device *ndev)
+{
+ struct mctp_pcc_ndev *mctp_pcc_ndev =
+ netdev_priv(ndev);
+ drain_packets(&mctp_pcc_ndev->outbox.packets);
+ drain_packets(&mctp_pcc_ndev->inbox.packets);
+ return 0;
+}
+
+static const struct net_device_ops mctp_pcc_netdev_ops = {
+ .ndo_stop = mctp_pcc_ndo_stop,
+ .ndo_start_xmit = mctp_pcc_tx,
+};
+
+static void mctp_pcc_setup(struct net_device *ndev)
+{
+ ndev->type = ARPHRD_MCTP;
+ ndev->hard_header_len = 0;
+ ndev->tx_queue_len = 0;
+ ndev->flags = IFF_NOARP;
+ ndev->netdev_ops = &mctp_pcc_netdev_ops;
+ ndev->needs_free_netdev = true;
+ ndev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
+}
+
+struct mctp_pcc_lookup_context {
+ int index;
+ u32 inbox_index;
+ u32 outbox_index;
+};
+
+static acpi_status lookup_pcct_indices(struct acpi_resource *ares,
+ void *context)
+{
+ struct mctp_pcc_lookup_context *luc = context;
+ struct acpi_resource_address32 *addr;
+
+ if (ares->type != PCC_DWORD_TYPE)
+ return AE_OK;
+
+ addr = ACPI_CAST_PTR(struct acpi_resource_address32, &ares->data);
+ switch (luc->index) {
+ case 0:
+ luc->outbox_index = addr[0].address.minimum;
+ break;
+ case 1:
+ luc->inbox_index = addr[0].address.minimum;
+ break;
+ }
+ luc->index++;
+ return AE_OK;
+}
+
+static void mctp_cleanup_channel(void *data)
+{
+ struct pcc_mbox_chan *chan = data;
+
+ pcc_mbox_free_channel(chan);
+}
+
+static int mctp_pcc_initialize_mailbox(struct device *dev,
+ struct mctp_pcc_mailbox *box, u32 index)
+{
+ box->index = index;
+ skb_queue_head_init(&box->packets);
+ box->client.dev = dev;
+ box->chan = pcc_mbox_request_channel(&box->client, index);
+ if (IS_ERR(box->chan))
+ return PTR_ERR(box->chan);
+ return devm_add_action_or_reset(dev, mctp_cleanup_channel, box->chan);
+}
+
+static void mctp_cleanup_netdev(void *data)
+{
+ struct net_device *ndev = data;
+
+ mctp_unregister_netdev(ndev);
+}
+
+static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
+{
+ struct mctp_pcc_lookup_context context = {0};
+ struct mctp_pcc_ndev *mctp_pcc_ndev;
+ struct device *dev = &acpi_dev->dev;
+ struct net_device *ndev;
+ acpi_handle dev_handle;
+ acpi_status status;
+ int mctp_pcc_mtu;
+ char name[32];
+ int rc;
+
+ dev_dbg(dev, "Adding mctp_pcc device for HID %s\n",
+ acpi_device_hid(acpi_dev));
+ dev_handle = acpi_device_handle(acpi_dev);
+ status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
+ &context);
+ if (!ACPI_SUCCESS(status)) {
+ dev_err(dev, "FAILURE to lookup PCC indexes from CRS\n");
+ return -EINVAL;
+ }
+
+ snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
+ ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
+ mctp_pcc_setup);
+ if (!ndev)
+ return -ENOMEM;
+
+ mctp_pcc_ndev = netdev_priv(ndev);
+
+ /* inbox initialization */
+ rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->inbox,
+ context.inbox_index);
+ if (rc)
+ goto free_netdev;
+
+ mctp_pcc_ndev->inbox.client.rx_callback = mctp_pcc_client_rx_callback;
+ mctp_pcc_ndev->inbox.chan->rx_alloc = mctp_pcc_rx_alloc;
+
+ /* outbox initialization */
+ rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->outbox,
+ context.outbox_index);
+ if (rc)
+ goto free_netdev;
+
+ mctp_pcc_ndev->outbox.client.tx_done = mctp_pcc_tx_done;
+ mctp_pcc_ndev->acpi_device = acpi_dev;
+ mctp_pcc_ndev->ndev = ndev;
+ acpi_dev->driver_data = mctp_pcc_ndev;
+
+ mctp_pcc_ndev->outbox.chan->manage_writes = true;
+
+ /* initialize MTU values */
+ mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size
+ - sizeof(struct pcc_header);
+ ndev->mtu = MCTP_MIN_MTU;
+ ndev->max_mtu = mctp_pcc_mtu;
+ ndev->min_mtu = MCTP_MIN_MTU;
+
+ rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
+ if (rc)
+ goto free_netdev;
+
+ return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
+free_netdev:
+ free_netdev(ndev);
+ return rc;
+}
+
+static const struct acpi_device_id mctp_pcc_device_ids[] = {
+ { "DMT0001" },
+ {}
+};
+
+static struct acpi_driver mctp_pcc_driver = {
+ .name = "mctp_pcc",
+ .class = "Unknown",
+ .ids = mctp_pcc_device_ids,
+ .ops = {
+ .add = mctp_pcc_driver_add,
+ },
+};
+
+module_acpi_driver(mctp_pcc_driver);
+
+MODULE_DEVICE_TABLE(acpi, mctp_pcc_device_ids);
+
+MODULE_DESCRIPTION("MCTP PCC ACPI device");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Adam Young <admiyo@os.amperecomputing.com>");
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport
2025-08-28 4:33 ` [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over " Adam Young
@ 2025-08-29 9:16 ` Jeremy Kerr
2025-08-29 15:59 ` Adam Young
0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Kerr @ 2025-08-29 9:16 UTC (permalink / raw)
To: Adam Young, Matt Johnston, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li
Hi Adam,
Still some issues with skb management / synchronisation in the cleanup
path; comments inline (as well as some minor things, but mostly optional
on addressing those).
You seem to be sending follow-up patches fairly quickly, rather than
responding to queries about the code, or discussing the approach. If you
have no points to discuss, that's fine, but please do feel free to chat
about options, or ask for clarifications, before jumping into the next
patch revision. Even letting us know why you may have rejected a
suggestion is helpful for me in the review process too.
> +static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
> +{
> + struct mctp_pcc_ndev *mctp_pcc_ndev;
> + struct mctp_pcc_mailbox *outbox;
> + struct sk_buff *skb = NULL;
> + struct sk_buff *curr_skb;
> +
> + mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
> + outbox = container_of(c, struct mctp_pcc_mailbox, client);
> + spin_lock(&outbox->packets.lock);
> + skb_queue_walk(&outbox->packets, curr_skb) {
> + if (curr_skb->data == mssg) {
> + skb = curr_skb;
> + __skb_unlink(skb, &outbox->packets);
> + break;
> + }
> + }
> + spin_unlock(&outbox->packets.lock);
> + if (skb)
> + dev_consume_skb_any(skb);
> + netif_wake_queue(mctp_pcc_ndev->ndev);
> +}
> +
> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
> +{
> + struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
> + struct pcc_header *pcc_header;
> + int len = skb->len;
> + int rc;
> +
> + rc = skb_cow_head(skb, sizeof(*pcc_header));
> + if (rc) {
> + dev_dstats_tx_dropped(ndev);
> + kfree_skb(skb);
> + return NETDEV_TX_OK;
> + }
> +
> + pcc_header = skb_push(skb, sizeof(*pcc_header));
> + pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
> + pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
> + memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
> + pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
> +
> + skb_queue_head(&mpnd->outbox.packets, skb);
> +
> + rc = mbox_send_message(mpnd->outbox.chan->mchan, skb->data);
> +
> + if (rc < 0) {
> + netif_stop_queue(ndev);
Nice!
> + skb_unlink(skb, &mpnd->outbox.packets);
> + return NETDEV_TX_BUSY;
> + }
> +
> + dev_dstats_tx_add(ndev, len);
> + return NETDEV_TX_OK;
> +}
> +
> +static void drain_packets(struct sk_buff_head *list)
> +{
> + struct sk_buff *skb;
> +
> + while (!skb_queue_empty(list)) {
> + skb = skb_dequeue(list);
> + dev_consume_skb_any(skb);
> + }
> +}
> +
> +static int mctp_pcc_ndo_stop(struct net_device *ndev)
> +{
> + struct mctp_pcc_ndev *mctp_pcc_ndev =
> + netdev_priv(ndev);
Minor: Unneeded wrapping here, and it seems to be suppressing the
warning about a blank line after declarations.
> + drain_packets(&mctp_pcc_ndev->outbox.packets);
> + drain_packets(&mctp_pcc_ndev->inbox.packets);
Now that you're no longer doing the pcc_mbox_free_channel() in ndo_stop,
nothing has quiesced the pcc channels at this point, right? In which
case you now have a race between the channels' accesses to skb->data and
freeing the skbs here.
Is there a mbox facility to (synchronously) stop processing the inbound
channel, and completing the outbound channel?
> + return 0;
> +}
> +
> +static const struct net_device_ops mctp_pcc_netdev_ops = {
> + .ndo_stop = mctp_pcc_ndo_stop,
> + .ndo_start_xmit = mctp_pcc_tx,
> +};
> +
> +static void mctp_pcc_setup(struct net_device *ndev)
> +{
> + ndev->type = ARPHRD_MCTP;
> + ndev->hard_header_len = 0;
> + ndev->tx_queue_len = 0;
> + ndev->flags = IFF_NOARP;
> + ndev->netdev_ops = &mctp_pcc_netdev_ops;
> + ndev->needs_free_netdev = true;
> + ndev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
> +}
> +
> +struct mctp_pcc_lookup_context {
> + int index;
> + u32 inbox_index;
> + u32 outbox_index;
> +};
> +
> +static acpi_status lookup_pcct_indices(struct acpi_resource *ares,
> + void *context)
> +{
> + struct mctp_pcc_lookup_context *luc = context;
> + struct acpi_resource_address32 *addr;
> +
> + if (ares->type != PCC_DWORD_TYPE)
> + return AE_OK;
> +
> + addr = ACPI_CAST_PTR(struct acpi_resource_address32, &ares->data);
> + switch (luc->index) {
> + case 0:
> + luc->outbox_index = addr[0].address.minimum;
> + break;
> + case 1:
> + luc->inbox_index = addr[0].address.minimum;
> + break;
> + }
> + luc->index++;
> + return AE_OK;
> +}
> +
> +static void mctp_cleanup_channel(void *data)
> +{
> + struct pcc_mbox_chan *chan = data;
> +
> + pcc_mbox_free_channel(chan);
> +}
> +
> +static int mctp_pcc_initialize_mailbox(struct device *dev,
> + struct mctp_pcc_mailbox *box, u32 index)
> +{
> + box->index = index;
> + skb_queue_head_init(&box->packets);
> + box->client.dev = dev;
> + box->chan = pcc_mbox_request_channel(&box->client, index);
> + if (IS_ERR(box->chan))
> + return PTR_ERR(box->chan);
> + return devm_add_action_or_reset(dev, mctp_cleanup_channel, box->chan);
> +}
> +
> +static void mctp_cleanup_netdev(void *data)
> +{
> + struct net_device *ndev = data;
> +
> + mctp_unregister_netdev(ndev);
> +}
> +
> +static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
> +{
> + struct mctp_pcc_lookup_context context = {0};
> + struct mctp_pcc_ndev *mctp_pcc_ndev;
> + struct device *dev = &acpi_dev->dev;
> + struct net_device *ndev;
> + acpi_handle dev_handle;
> + acpi_status status;
> + int mctp_pcc_mtu;
> + char name[32];
> + int rc;
> +
> + dev_dbg(dev, "Adding mctp_pcc device for HID %s\n",
> + acpi_device_hid(acpi_dev));
> + dev_handle = acpi_device_handle(acpi_dev);
> + status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
> + &context);
> + if (!ACPI_SUCCESS(status)) {
> + dev_err(dev, "FAILURE to lookup PCC indexes from CRS\n");
> + return -EINVAL;
> + }
> +
> + snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
> + ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
> + mctp_pcc_setup);
> + if (!ndev)
> + return -ENOMEM;
> +
> + mctp_pcc_ndev = netdev_priv(ndev);
> +
> + /* inbox initialization */
> + rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->inbox,
> + context.inbox_index);
> + if (rc)
> + goto free_netdev;
> +
> + mctp_pcc_ndev->inbox.client.rx_callback = mctp_pcc_client_rx_callback;
> + mctp_pcc_ndev->inbox.chan->rx_alloc = mctp_pcc_rx_alloc;
> +
> + /* outbox initialization */
> + rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->outbox,
> + context.outbox_index);
> + if (rc)
> + goto free_netdev;
> +
> + mctp_pcc_ndev->outbox.client.tx_done = mctp_pcc_tx_done;
> + mctp_pcc_ndev->acpi_device = acpi_dev;
> + mctp_pcc_ndev->ndev = ndev;
> + acpi_dev->driver_data = mctp_pcc_ndev;
> +
> + mctp_pcc_ndev->outbox.chan->manage_writes = true;
> +
> + /* initialize MTU values */
> + mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size
> + - sizeof(struct pcc_header);
Minor: no need for this temporary var.
(Or the comment really - we can see this is initialising MTU values from
the fact that it's initialising values with mtu in their name :) )
> + ndev->mtu = MCTP_MIN_MTU;
> + ndev->max_mtu = mctp_pcc_mtu;
> + ndev->min_mtu = MCTP_MIN_MTU;
> +
> + rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
> + if (rc)
> + goto free_netdev;
> +
> + return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
As has been mentioned elsewhere, using the devm cleanup mechanism is a
bit unconventional here. You have the device remove callback available,
which lets you do the same, and that way you can demonstrate symmetry
between the add and remove implementations.
I'm okay with the approach, but you may want to consider the remove
callback if that gives you an implementation that reads more clearly.
Are you doing much testing here? I'd recommend running with KASAN and
device removal & link status change under active transfers.
Cheers,
Jeremy
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport
2025-08-29 9:16 ` Jeremy Kerr
@ 2025-08-29 15:59 ` Adam Young
2025-09-01 2:31 ` Jeremy Kerr
0 siblings, 1 reply; 8+ messages in thread
From: Adam Young @ 2025-08-29 15:59 UTC (permalink / raw)
To: Jeremy Kerr, Adam Young, Matt Johnston, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li
On 8/29/25 05:16, Jeremy Kerr wrote:
> Hi Adam,
>
> Still some issues with skb management / synchronisation in the cleanup
> path; comments inline (as well as some minor things, but mostly optional
> on addressing those).
>
> You seem to be sending follow-up patches fairly quickly, rather than
> responding to queries about the code, or discussing the approach. If you
> have no points to discuss, that's fine, but please do feel free to chat
> about options, or ask for clarifications, before jumping into the next
> patch revision. Even letting us know why you may have rejected a
> suggestion is helpful for me in the review process too.
I thought I was addressing all issues. Perhaps I do need to slow
down....different work styles: I am used to responding to code reviews
as quickly as possible.
more comments in-line.
>
>> +static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
>> +{
>> + struct mctp_pcc_ndev *mctp_pcc_ndev;
>> + struct mctp_pcc_mailbox *outbox;
>> + struct sk_buff *skb = NULL;
>> + struct sk_buff *curr_skb;
>> +
>> + mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
>> + outbox = container_of(c, struct mctp_pcc_mailbox, client);
>> + spin_lock(&outbox->packets.lock);
>> + skb_queue_walk(&outbox->packets, curr_skb) {
>> + if (curr_skb->data == mssg) {
>> + skb = curr_skb;
>> + __skb_unlink(skb, &outbox->packets);
>> + break;
>> + }
>> + }
>> + spin_unlock(&outbox->packets.lock);
>> + if (skb)
>> + dev_consume_skb_any(skb);
>> + netif_wake_queue(mctp_pcc_ndev->ndev);
>> +}
>> +
>> +static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> + struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
>> + struct pcc_header *pcc_header;
>> + int len = skb->len;
>> + int rc;
>> +
>> + rc = skb_cow_head(skb, sizeof(*pcc_header));
>> + if (rc) {
>> + dev_dstats_tx_dropped(ndev);
>> + kfree_skb(skb);
>> + return NETDEV_TX_OK;
>> + }
>> +
>> + pcc_header = skb_push(skb, sizeof(*pcc_header));
>> + pcc_header->signature = PCC_SIGNATURE | mpnd->outbox.index;
>> + pcc_header->flags = PCC_CMD_COMPLETION_NOTIFY;
>> + memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
>> + pcc_header->length = len + MCTP_SIGNATURE_LENGTH;
>> +
>> + skb_queue_head(&mpnd->outbox.packets, skb);
>> +
>> + rc = mbox_send_message(mpnd->outbox.chan->mchan, skb->data);
>> +
>> + if (rc < 0) {
>> + netif_stop_queue(ndev);
> Nice!
Yeah. Had a bit of an internal discussion about this one. Ideally, we
would stop the queue one packet earlier, and not drop anything. The
mbox_send_message only returns the index of the next slot in the ring
buffer and since it is circular, that does not give us a sense of the
space left.
>
>> + skb_unlink(skb, &mpnd->outbox.packets);
>> + return NETDEV_TX_BUSY;
>> + }
>> +
>> + dev_dstats_tx_add(ndev, len);
>> + return NETDEV_TX_OK;
>> +}
>> +
>> +static void drain_packets(struct sk_buff_head *list)
>> +{
>> + struct sk_buff *skb;
>> +
>> + while (!skb_queue_empty(list)) {
>> + skb = skb_dequeue(list);
>> + dev_consume_skb_any(skb);
>> + }
>> +}
>> +
>> +static int mctp_pcc_ndo_stop(struct net_device *ndev)
>> +{
>> + struct mctp_pcc_ndev *mctp_pcc_ndev =
>> + netdev_priv(ndev);
> Minor: Unneeded wrapping here, and it seems to be suppressing the
> warning about a blank line after declarations.
The Reverse XMasstree format checker I am using seems overly strict. I
will try to unwrap all of these. Is it better to do a separate variable
initialization? Seems a bad coding practice for just a format decision
that is purely aesthetic. But this one is simple to fix.
>
>> + drain_packets(&mctp_pcc_ndev->outbox.packets);
>> + drain_packets(&mctp_pcc_ndev->inbox.packets);
> Now that you're no longer doing the pcc_mbox_free_channel() in ndo_stop,
> nothing has quiesced the pcc channels at this point, right? In which
> case you now have a race between the channels' accesses to skb->data and
> freeing the skbs here.
(I have written and rewritten this section multiple times, so apoliges
if soemthing is unclear or awkward...it might reflect and earlier
thought...)
OK, I think I do need to call pcc_mbox_free_channel here, which means I
need to allocate them in the pairing function. The ring buffer will
still have pointers to the sk_buffs, but they will never be looked at
again: the ring buffer will ger reinitialized if another client binds to
it. Which means that I need the .start function to create the client
again, and all the other changes that come along with that. The removal
was to deal with the setting of the MTU, which requires a channel to
read the size of the shared buffer.
mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size -
sizeof(struct pcc_header);
I could create a channel, read the value, and release it. The Value I
need is in the ACPI PCC-table but I don't have direct access to it.
Perhaps it would be better to initialize the value to -1 and use that to
optionally reset it to the max value on ndo open.
Check me here: The channel has a value ring msg_count that keeps track
of the number of entires in the ring buffer. This needs to be set to0
in order for it to think the buffer is empty. It is initialized in
__mbox_bind_client, called from mbox_bind_client which is in turn called
from mbox_request_channel
The networking infra calls stop_ndo, so it must stop sending packets to
it first. I can netif_stop_queue(ndev) of course, but that seems
counterintuitive? Assume i don't need to do that, but can't find the
calling code.
>
> Is there a mbox facility to (synchronously) stop processing the inbound
> channel, and completing the outbound channel?
There is mbox_free_channel which calls shutdown, and that removed the
IRQ handler, so no more messages will be processed. That should be
sufficient.
>
>> + return 0;
>> +}
>> +
>> +static const struct net_device_ops mctp_pcc_netdev_ops = {
>> + .ndo_stop = mctp_pcc_ndo_stop,
>> + .ndo_start_xmit = mctp_pcc_tx,
>> +};
>> +
>> +static void mctp_pcc_setup(struct net_device *ndev)
>> +{
>> + ndev->type = ARPHRD_MCTP;
>> + ndev->hard_header_len = 0;
>> + ndev->tx_queue_len = 0;
>> + ndev->flags = IFF_NOARP;
>> + ndev->netdev_ops = &mctp_pcc_netdev_ops;
>> + ndev->needs_free_netdev = true;
>> + ndev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
>> +}
>> +
>> +struct mctp_pcc_lookup_context {
>> + int index;
>> + u32 inbox_index;
>> + u32 outbox_index;
>> +};
>> +
>> +static acpi_status lookup_pcct_indices(struct acpi_resource *ares,
>> + void *context)
>> +{
>> + struct mctp_pcc_lookup_context *luc = context;
>> + struct acpi_resource_address32 *addr;
>> +
>> + if (ares->type != PCC_DWORD_TYPE)
>> + return AE_OK;
>> +
>> + addr = ACPI_CAST_PTR(struct acpi_resource_address32, &ares->data);
>> + switch (luc->index) {
>> + case 0:
>> + luc->outbox_index = addr[0].address.minimum;
>> + break;
>> + case 1:
>> + luc->inbox_index = addr[0].address.minimum;
>> + break;
>> + }
>> + luc->index++;
>> + return AE_OK;
>> +}
>> +
>> +static void mctp_cleanup_channel(void *data)
>> +{
>> + struct pcc_mbox_chan *chan = data;
>> +
>> + pcc_mbox_free_channel(chan);
>> +}
>> +
>> +static int mctp_pcc_initialize_mailbox(struct device *dev,
>> + struct mctp_pcc_mailbox *box, u32 index)
>> +{
>> + box->index = index;
>> + skb_queue_head_init(&box->packets);
>> + box->client.dev = dev;
>> + box->chan = pcc_mbox_request_channel(&box->client, index);
>> + if (IS_ERR(box->chan))
>> + return PTR_ERR(box->chan);
>> + return devm_add_action_or_reset(dev, mctp_cleanup_channel, box->chan);
>> +}
>> +
>> +static void mctp_cleanup_netdev(void *data)
>> +{
>> + struct net_device *ndev = data;
>> +
>> + mctp_unregister_netdev(ndev);
>> +}
>> +
>> +static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
>> +{
>> + struct mctp_pcc_lookup_context context = {0};
>> + struct mctp_pcc_ndev *mctp_pcc_ndev;
>> + struct device *dev = &acpi_dev->dev;
>> + struct net_device *ndev;
>> + acpi_handle dev_handle;
>> + acpi_status status;
>> + int mctp_pcc_mtu;
>> + char name[32];
>> + int rc;
>> +
>> + dev_dbg(dev, "Adding mctp_pcc device for HID %s\n",
>> + acpi_device_hid(acpi_dev));
>> + dev_handle = acpi_device_handle(acpi_dev);
>> + status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
>> + &context);
>> + if (!ACPI_SUCCESS(status)) {
>> + dev_err(dev, "FAILURE to lookup PCC indexes from CRS\n");
>> + return -EINVAL;
>> + }
>> +
>> + snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
>> + ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
>> + mctp_pcc_setup);
>> + if (!ndev)
>> + return -ENOMEM;
>> +
>> + mctp_pcc_ndev = netdev_priv(ndev);
>> +
>> + /* inbox initialization */
>> + rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->inbox,
>> + context.inbox_index);
>> + if (rc)
>> + goto free_netdev;
>> +
>> + mctp_pcc_ndev->inbox.client.rx_callback = mctp_pcc_client_rx_callback;
>> + mctp_pcc_ndev->inbox.chan->rx_alloc = mctp_pcc_rx_alloc;
>> +
>> + /* outbox initialization */
>> + rc = mctp_pcc_initialize_mailbox(dev, &mctp_pcc_ndev->outbox,
>> + context.outbox_index);
>> + if (rc)
>> + goto free_netdev;
>> +
>> + mctp_pcc_ndev->outbox.client.tx_done = mctp_pcc_tx_done;
>> + mctp_pcc_ndev->acpi_device = acpi_dev;
>> + mctp_pcc_ndev->ndev = ndev;
>> + acpi_dev->driver_data = mctp_pcc_ndev;
>> +
>> + mctp_pcc_ndev->outbox.chan->manage_writes = true;
>> +
>> + /* initialize MTU values */
>> + mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size
>> + - sizeof(struct pcc_header);
> Minor: no need for this temporary var.
>
> (Or the comment really - we can see this is initialising MTU values from
> the fact that it's initialising values with mtu in their name :) )
I think I am going to set this to -1 here, and then initialize in ndo
open if it is still -1.
>
>> + ndev->mtu = MCTP_MIN_MTU;
>> + ndev->max_mtu = mctp_pcc_mtu;
>> + ndev->min_mtu = MCTP_MIN_MTU;
>> +
>> + rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
>> + if (rc)
>> + goto free_netdev;
>> +
>> + return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
> As has been mentioned elsewhere, using the devm cleanup mechanism is a
> bit unconventional here. You have the device remove callback available,
> which lets you do the same, and that way you can demonstrate symmetry
> between the add and remove implementations.
This has gone through a few iterations and I thought I had it clear.
I was trying to make use of automated cleanup as much as possible.
>
> I'm okay with the approach, but you may want to consider the remove
> callback if that gives you an implementation that reads more clearly.
>
> Are you doing much testing here? I'd recommend running with KASAN and
> device removal & link status change under active transfers.
Will do. I have not done that yet. I was unaware it existed.
>
> Cheers,
THanks so much for the detailed review and explanations.
>
>
> Jeremy
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport
2025-08-29 15:59 ` Adam Young
@ 2025-09-01 2:31 ` Jeremy Kerr
2025-09-02 22:45 ` Adam Young
0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Kerr @ 2025-09-01 2:31 UTC (permalink / raw)
To: Adam Young, Adam Young, Matt Johnston, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li
Hi Adam,
> > Nice!
>
> Yeah. Had a bit of an internal discussion about this one. Ideally, we
> would stop the queue one packet earlier, and not drop anything. The
> mbox_send_message only returns the index of the next slot in the ring
> buffer and since it is circular, that does not give us a sense of the
> space left.
I think that's okay as-is. If you have access to the tail pointer of the
ring buffer too, you may be able to calculate space, but that's
1) optional, and 2) best left to a later change.
> > > +static int mctp_pcc_ndo_stop(struct net_device *ndev)
> > > +{
> > > + struct mctp_pcc_ndev *mctp_pcc_ndev =
> > > + netdev_priv(ndev);
> > Minor: Unneeded wrapping here, and it seems to be suppressing the
> > warning about a blank line after declarations.
>
> The Reverse XMasstree format checker I am using seems overly strict. I
> will try to unwrap all of these. Is it better to do a separate variable
> initialization? Seems a bad coding practice for just a format decision
> that is purely aesthetic. But this one is simple to fix.
That shouldn't be tripping any RCT checks here, as it's just one
variable init?
mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
Keep it in one if possible (as you have done).
> > > + drain_packets(&mctp_pcc_ndev->outbox.packets);
> > > + drain_packets(&mctp_pcc_ndev->inbox.packets);
> > Now that you're no longer doing the pcc_mbox_free_channel() in ndo_stop,
> > nothing has quiesced the pcc channels at this point, right? In which
> > case you now have a race between the channels' accesses to skb->data and
> > freeing the skbs here.
>
> (I have written and rewritten this section multiple times, so apoliges
> if soemthing is unclear or awkward...it might reflect and earlier
> thought...)
>
> OK, I think I do need to call pcc_mbox_free_channel here,
You should ensure that the packet processing has stopped on
ndo_stop (and has not started before ndo_open). Without doing that, you
have two issues:
- the RX packets will continue while the interface is down; and
- you cannot free the lists
If there's a way to keep the channel allocated, but suspend the
processing of messages, that would seem like a good option (and sounds
like it would solve the MTU complexity).
However, on a cursory look through the pcc/mailbox infrastructure, it
seems like the pcc_mbox_request_channel starts processing immediately -
so it looks like you can not have the channel in an allocated-but-idle
state, since the request_channel does the bind_client, which does a
pcc_startup.
So, I figure you have two options:
- only request the channel until the interface is up; or
- implement your own quiescing in the callbacks - keeping the channels
allocated, but check if the netdev is operational (ie, ndo_open has
been called) before processing an RX message
> which means I need to allocate them in the pairing function. The ring
> buffer will still have pointers to the sk_buffs, but they will never
> be looked at again: the ring buffer will ger reinitialized if another
> client binds to it.
OK, but the skbs will remain allocated between those operations, which
has other side-effects (eg, socket mem accounting). You'll want to drain
the queue (as you are doing) if the queue is no longer making progress.
> The removal was to deal with the setting of the MTU, which requires a
> channel to read the size of the shared buffer.
>
> mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size -
> sizeof(struct pcc_header);
>
> I could create a channel, read the value, and release it. The Value I
> need is in the ACPI PCC-table but I don't have direct access to it.
> Perhaps it would be better to initialize the value to -1 and use that to
> optionally reset it to the max value on ndo open.
If you cannot create the channel until ndo_open, I'd you will also need
to check the current mtu value on open, and failing if it exceeds the
limit of the channel.
If you have some way to extract this from the ACPI tables, that may be
preferable. With the -1 approach, you'll also need to ensure that the
*current* MTU is not larger than the channel max, on ndo_open. For
example:
$ mctp link set mctppcc0 mtu 1000
# sets current mtu, no max currently specified
$ mctp link set mctppcc0 up
# driver discovers max is (say) 500, now we have an invalid mtu
So, if you're able to parse the max from ACPI on initial bind, then you
can detect the error at the time of occurrence (the `link set mtu`)
rather than later (the `link set up`, and then ndo_open failing).
> Check me here: The channel has a value ring msg_count that keeps track
> of the number of entires in the ring buffer. This needs to be set to0
> in order for it to think the buffer is empty. It is initialized in
> __mbox_bind_client, called from mbox_bind_client which is in turn called
> from mbox_request_channel
>
> The networking infra calls stop_ndo, so it must stop sending packets to
> it first. I can netif_stop_queue(ndev) of course, but that seems
> counterintuitive? Assume i don't need to do that, but can't find the
> calling code.
You won't have any further ->ndo_start_xmit calls at the point that
->ndo_stop is called.
> > Is there a mbox facility to (synchronously) stop processing the inbound
> > channel, and completing the outbound channel?
>
> There is mbox_free_channel which calls shutdown, and that removed the
> IRQ handler, so no more messages will be processed. That should be
> sufficient.
OK, as above, this will depend on the approach you on allocating and
releasing the channels.
> > > + ndev->mtu = MCTP_MIN_MTU;
> > > + ndev->max_mtu = mctp_pcc_mtu;
> > > + ndev->min_mtu = MCTP_MIN_MTU;
> > > +
> > > + rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
> > > + if (rc)
> > > + goto free_netdev;
> > > +
> > > + return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
> > As has been mentioned elsewhere, using the devm cleanup mechanism is a
> > bit unconventional here. You have the device remove callback available,
> > which lets you do the same, and that way you can demonstrate symmetry
> > between the add and remove implementations.
>
> This has gone through a few iterations and I thought I had it clear.
>
> I was trying to make use of automated cleanup as much as possible.
OK, your call there. Using ->remove allows you to be explicit about the
matching ordering, which would be my approach, but that's certainly not
the only correct one.
Cheers,
Jeremy
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport
2025-09-01 2:31 ` Jeremy Kerr
@ 2025-09-02 22:45 ` Adam Young
2025-09-03 3:06 ` Adam Young
0 siblings, 1 reply; 8+ messages in thread
From: Adam Young @ 2025-09-02 22:45 UTC (permalink / raw)
To: Jeremy Kerr, Adam Young, Matt Johnston, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li
On 8/31/25 22:31, Jeremy Kerr wrote:
> Hi Adam,
>
>>> Nice!
>> Yeah. Had a bit of an internal discussion about this one. Ideally, we
>> would stop the queue one packet earlier, and not drop anything. The
>> mbox_send_message only returns the index of the next slot in the ring
>> buffer and since it is circular, that does not give us a sense of the
>> space left.
> I think that's okay as-is. If you have access to the tail pointer of the
> ring buffer too, you may be able to calculate space, but that's
> 1) optional, and 2) best left to a later change.
>
>>>> +static int mctp_pcc_ndo_stop(struct net_device *ndev)
>>>> +{
>>>> + struct mctp_pcc_ndev *mctp_pcc_ndev =
>>>> + netdev_priv(ndev);
>>> Minor: Unneeded wrapping here, and it seems to be suppressing the
>>> warning about a blank line after declarations.
>> The Reverse XMasstree format checker I am using seems overly strict. I
>> will try to unwrap all of these. Is it better to do a separate variable
>> initialization? Seems a bad coding practice for just a format decision
>> that is purely aesthetic. But this one is simple to fix.
> That shouldn't be tripping any RCT checks here, as it's just one
> variable init?
>
> mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
>
> Keep it in one if possible (as you have done).
>
>>>> + drain_packets(&mctp_pcc_ndev->outbox.packets);
>>>> + drain_packets(&mctp_pcc_ndev->inbox.packets);
>>> Now that you're no longer doing the pcc_mbox_free_channel() in ndo_stop,
>>> nothing has quiesced the pcc channels at this point, right? In which
>>> case you now have a race between the channels' accesses to skb->data and
>>> freeing the skbs here.
>> (I have written and rewritten this section multiple times, so apoliges
>> if soemthing is unclear or awkward...it might reflect and earlier
>> thought...)
>>
>> OK, I think I do need to call pcc_mbox_free_channel here,
> You should ensure that the packet processing has stopped on
> ndo_stop (and has not started before ndo_open). Without doing that, you
> have two issues:
>
> - the RX packets will continue while the interface is down; and
> - you cannot free the lists
>
> If there's a way to keep the channel allocated, but suspend the
> processing of messages, that would seem like a good option (and sounds
> like it would solve the MTU complexity).
>
> However, on a cursory look through the pcc/mailbox infrastructure, it
> seems like the pcc_mbox_request_channel starts processing immediately -
> so it looks like you can not have the channel in an allocated-but-idle
> state, since the request_channel does the bind_client, which does a
> pcc_startup.
>
> So, I figure you have two options:
>
> - only request the channel until the interface is up; or
>
> - implement your own quiescing in the callbacks - keeping the channels
> allocated, but check if the netdev is operational (ie, ndo_open has
> been called) before processing an RX message
This is how I went: during Add, I quickly request the outbound channel,
get the MTU, and free it. There should be no reason to get any messages
during this window. If we do, it is spurious an likely a Firmware error.
The channels are requested during ndo_open and freed during ndo_stop.
After the free, the packets get drained. A restart of the device drops
all messages in the ring buffer.
>
>> which means I need to allocate them in the pairing function. The ring
>> buffer will still have pointers to the sk_buffs, but they will never
>> be looked at again: the ring buffer will ger reinitialized if another
>> client binds to it.
> OK, but the skbs will remain allocated between those operations, which
> has other side-effects (eg, socket mem accounting). You'll want to drain
> the queue (as you are doing) if the queue is no longer making progress.
I think this approach will minimize the lifespan of the sk_buffs: they
will either be waiting to be sent by the mailbox, or they will get freed
immediately upon driver down.
>
>> The removal was to deal with the setting of the MTU, which requires a
>> channel to read the size of the shared buffer.
>>
>> mctp_pcc_mtu = mctp_pcc_ndev->outbox.chan->shmem_size -
>> sizeof(struct pcc_header);
>>
>> I could create a channel, read the value, and release it. The Value I
>> need is in the ACPI PCC-table but I don't have direct access to it.
>> Perhaps it would be better to initialize the value to -1 and use that to
>> optionally reset it to the max value on ndo open.
> If you cannot create the channel until ndo_open, I'd you will also need
> to check the current mtu value on open, and failing if it exceeds the
> limit of the channel.
>
> If you have some way to extract this from the ACPI tables, that may be
> preferable. With the -1 approach, you'll also need to ensure that the
> *current* MTU is not larger than the channel max, on ndo_open. For
> example:
>
> $ mctp link set mctppcc0 mtu 1000
> # sets current mtu, no max currently specified
> $ mctp link set mctppcc0 up
> # driver discovers max is (say) 500, now we have an invalid mtu
Note that I tested this and it now works as expected.
>
> So, if you're able to parse the max from ACPI on initial bind, then you
> can detect the error at the time of occurrence (the `link set mtu`)
> rather than later (the `link set up`, and then ndo_open failing).
Yep. Makes sense. Setting it on driver add is essential.
>
>> Check me here: The channel has a value ring msg_count that keeps track
>> of the number of entires in the ring buffer. This needs to be set to0
>> in order for it to think the buffer is empty. It is initialized in
>> __mbox_bind_client, called from mbox_bind_client which is in turn called
>> from mbox_request_channel
>>
>> The networking infra calls stop_ndo, so it must stop sending packets to
>> it first. I can netif_stop_queue(ndev) of course, but that seems
>> counterintuitive? Assume i don't need to do that, but can't find the
>> calling code.
> You won't have any further ->ndo_start_xmit calls at the point that
> ->ndo_stop is called.
>
>>> Is there a mbox facility to (synchronously) stop processing the inbound
>>> channel, and completing the outbound channel?
>> There is mbox_free_channel which calls shutdown, and that removed the
>> IRQ handler, so no more messages will be processed. That should be
>> sufficient.
> OK, as above, this will depend on the approach you on allocating and
> releasing the channels.
>
>>>> + ndev->mtu = MCTP_MIN_MTU;
>>>> + ndev->max_mtu = mctp_pcc_mtu;
>>>> + ndev->min_mtu = MCTP_MIN_MTU;
>>>> +
>>>> + rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
>>>> + if (rc)
>>>> + goto free_netdev;
>>>> +
>>>> + return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
>>> As has been mentioned elsewhere, using the devm cleanup mechanism is a
>>> bit unconventional here. You have the device remove callback available,
>>> which lets you do the same, and that way you can demonstrate symmetry
>>> between the add and remove implementations.
>> This has gone through a few iterations and I thought I had it clear.
>>
>> I was trying to make use of automated cleanup as much as possible.
> OK, your call there. Using ->remove allows you to be explicit about the
> matching ordering, which would be my approach, but that's certainly not
> the only correct one.
The last of the devm_add_action_or_reset will now go away: I don't have
to free the channel when I clean up the device, as it will be done when
stop is called.
>
> Cheers,
>
>
> Jeremy
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport
2025-09-02 22:45 ` Adam Young
@ 2025-09-03 3:06 ` Adam Young
2025-09-03 3:31 ` Jeremy Kerr
0 siblings, 1 reply; 8+ messages in thread
From: Adam Young @ 2025-09-03 3:06 UTC (permalink / raw)
To: Jeremy Kerr, Adam Young, Matt Johnston, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li
On 9/2/25 18:45, Adam Young wrote:
>>>> +static int mctp_pcc_ndo_stop(struct net_device *ndev)
>>>> +{
>>>> + struct mctp_pcc_ndev *mctp_pcc_ndev =
>>>> + netdev_priv(ndev);
>>> Minor: Unneeded wrapping here, and it seems to be suppressing the
>>> warning about a blank line after declarations.
>> The Reverse XMasstree format checker I am using seems overly strict. I
>> will try to unwrap all of these. Is it better to do a separate variable
>> initialization? Seems a bad coding practice for just a format decision
>> that is purely aesthetic. But this one is simple to fix.
> That shouldn't be tripping any RCT checks here, as it's just one
> variable init?
>
> mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
>
> Keep it in one if possible (as you have done).
The issue is being moved around. After my current set of changes, it
ends up like this:
static int initialize_MTU(struct net_device *ndev)
{
struct mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
struct mctp_pcc_mailbox *outbox
= &mctp_pcc_ndev->outbox;
int mctp_pcc_mtu;
Without the wrapping I get:
WARNING: Violation(s) in mctp-pcc.c
Line 275
struct mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
struct mctp_pcc_mailbox *outbox = &mctp_pcc_ndev->outbox;
I could move the initialization of outbox, but that seems wrong. The
wrapping is the least bad option here.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over PCC Transport
2025-09-03 3:06 ` Adam Young
@ 2025-09-03 3:31 ` Jeremy Kerr
0 siblings, 0 replies; 8+ messages in thread
From: Jeremy Kerr @ 2025-09-03 3:31 UTC (permalink / raw)
To: Adam Young, Adam Young, Matt Johnston, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li
Hi Adam,
> Without the wrapping I get:
> WARNING: Violation(s) in mctp-pcc.c
> Line 275
> struct mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
> struct mctp_pcc_mailbox *outbox = &mctp_pcc_ndev->outbox;
>
> I could move the initialization of outbox, but that seems wrong. The
> wrapping is the least bad option here.
You're kind-of tricking your RCT checker there, by introducing this
unnecessary wrap - and the results are not RCT. I wouldn't call that the
best option.
The netdev docs say this:
If there are dependencies between the variables preventing the ordering
move the initialization out of line.
But I think this is a fairly legitimate case of breaking RCT for better
readability, and preserving variable-initiness. We have exactly this,
intentionally, in existing code, eg.:
static void mctp_usb_out_complete(struct urb *urb)
{
struct sk_buff *skb = urb->context;
struct net_device *netdev = skb->dev;
int status;
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/mctp/mctp-usb.c#n38
- which triggers no checkpatch/NIPA warnings.
The wrapping in the middle of declarations seem odd for a non-80-col
line.
I would suggest either out-of-line, or slightly-broken RCT, over the
checker-workaround format. However, I would not reject a patch on this
choice of style alone :D
Cheers,
Jeremy
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-09-03 3:31 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-28 4:33 [PATCH net-next v27 0/1] MCTP Over PCC Transport Adam Young
2025-08-28 4:33 ` [PATCH net-next v27 1/1] mctp pcc: Implement MCTP over " Adam Young
2025-08-29 9:16 ` Jeremy Kerr
2025-08-29 15:59 ` Adam Young
2025-09-01 2:31 ` Jeremy Kerr
2025-09-02 22:45 ` Adam Young
2025-09-03 3:06 ` Adam Young
2025-09-03 3:31 ` Jeremy Kerr
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).