* RE: [EXTERNAL] Re: [PATCH rdma] RDMA/mana_ib: Disable RX steering on RSS QP destroy
From: Long Li @ 2026-03-23 18:03 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Konstantin Taranov, Jakub Kicinski, David S . Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Haiyang Zhang,
KY Srinivasan, Wei Liu, Dexuan Cui, Simon Horman,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
In-Reply-To: <20260322184848.GC814676@unreal>
> On Fri, Mar 20, 2026 at 05:28:42PM -0700, Long Li wrote:
> > When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss()
> > destroys the RX WQ objects but does not disable vPort RX steering in
> > firmware. This leaves stale steering configuration that still points
> > to the destroyed RX objects.
> >
> > If traffic continues to arrive (e.g. peer VM is still transmitting)
> > and the VF interface is subsequently brought up (mana_open), the
> > firmware may deliver completions using stale CQ IDs from the old RX objects.
> > These CQ IDs can be reused by the ethernet driver for new TX CQs,
> > causing RX completions to land on TX CQs:
> >
> > WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana] (is_sq == false)
> > WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup
> > fails)
> >
> > Fix this by disabling vPort RX steering before destroying RX WQ objects.
> > Note that mana_fence_rqs() cannot be used here because the fence
> > completion is delivered on the CQ, which is polled by user-mode (e.g.
> > DPDK) and not visible to the kernel driver.
> >
> > Refactor the disable logic into a shared mana_disable_vport_rx() in
> > mana_en, exported for use by mana_ib, replacing the duplicate code.
> > The ethernet driver's mana_dealloc_queues() is also updated to call
> > this common function.
> >
> > Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure
> > Network Adapter")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Long Li <longli@microsoft.com>
> > ---
> > drivers/infiniband/hw/mana/qp.c | 17 ++++++++++++++++-
> > drivers/net/ethernet/microsoft/mana/mana_en.c | 11 ++++++++++-
> > include/net/mana/mana.h | 1 +
> > 3 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mana/qp.c
> > b/drivers/infiniband/hw/mana/qp.c index 80cf4ade4b75..b27084c53a14
> > 100644
> > --- a/drivers/infiniband/hw/mana/qp.c
> > +++ b/drivers/infiniband/hw/mana/qp.c
> > @@ -829,11 +829,26 @@ static int mana_ib_destroy_qp_rss(struct
> mana_ib_qp *qp,
> > struct net_device *ndev;
> > struct mana_ib_wq *wq;
> > struct ib_wq *ibwq;
> > - int i;
> > + int i, err;
> >
> > ndev = mana_ib_get_netdev(qp->ibqp.device, qp->port);
> > mpc = netdev_priv(ndev);
> >
> > + /* Disable vPort RX steering before destroying RX WQ objects.
> > + * Otherwise firmware still routes traffic to the destroyed queues,
> > + * which can cause bogus completions on reused CQ IDs when the
> > + * ethernet driver later creates new queues on mana_open().
> > + *
> > + * Unlike the ethernet teardown path, mana_fence_rqs() cannot be
> > + * used here because the fence completion CQE is delivered on the
> > + * CQ which is polled by userspace (e.g. DPDK), so there is no way
> > + * for the kernel to wait for fence completion.
> > + */
> > + err = mana_disable_vport_rx(mpc);
> > + if (err)
> > + ibdev_err(&mdev->ib_dev,
> > + "Failed to disable vPort RX: %d\n", err);
>
> mana_cfg_vport_steering() is already prints in all failure scenarios.
>
> Thanks
I'm sending v2 with this message removed.
Thanks,
Long
>
> > +
> > for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) {
> > ibwq = ind_tbl->ind_tbl[i];
> > wq = container_of(ibwq, struct mana_ib_wq, ibwq); diff --git
> > a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index 22444c7530a5..51719ef1c09b 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -2934,6 +2934,13 @@ static void mana_rss_table_init(struct
> mana_port_context *apc)
> > ethtool_rxfh_indir_default(i, apc->num_queues); }
> >
> > +int mana_disable_vport_rx(struct mana_port_context *apc) {
> > + return mana_cfg_vport_steering(apc, TRI_STATE_FALSE, false, false,
> > + false);
> > +}
> > +EXPORT_SYMBOL_NS(mana_disable_vport_rx, "NET_MANA");
> > +
> > int mana_config_rss(struct mana_port_context *apc, enum TRI_STATE rx,
> > bool update_hash, bool update_tab) { @@ -3339,10 +3346,12
> @@
> > static int mana_dealloc_queues(struct net_device *ndev)
> > */
> >
> > apc->rss_state = TRI_STATE_FALSE;
> > - err = mana_config_rss(apc, TRI_STATE_FALSE, false, false);
> > + err = mana_disable_vport_rx(apc);
> > if (err && mana_en_need_log(apc, err))
> > netdev_err(ndev, "Failed to disable vPort: %d\n", err);
> >
> > + mana_fence_rqs(apc);
> > +
> > /* Even in err case, still need to cleanup the vPort */
> > mana_destroy_rxqs(apc);
> > mana_destroy_txq(apc);
> > diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h index
> > 204c2b612a62..2634e9135eed 100644
> > --- a/include/net/mana/mana.h
> > +++ b/include/net/mana/mana.h
> > @@ -574,6 +574,7 @@ struct mana_port_context { netdev_tx_t
> > mana_start_xmit(struct sk_buff *skb, struct net_device *ndev); int
> > mana_config_rss(struct mana_port_context *ac, enum TRI_STATE rx,
> > bool update_hash, bool update_tab);
> > +int mana_disable_vport_rx(struct mana_port_context *apc);
> >
> > int mana_alloc_queues(struct net_device *ndev); int
> > mana_attach(struct net_device *ndev);
> > --
> > 2.43.0
> >
^ permalink raw reply
* [PATCH net-next v2] net: mana: Set default number of queues to 16
From: Long Li @ 2026-03-23 19:49 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
Set the default number of queues per vPort to MANA_DEF_NUM_QUEUES (16),
as 16 queues can achieve optimal throughput for typical workloads. The
actual number of queues may be lower if it exceeds the hardware reported
limit. Users can increase the number of queues up to max_queues via
ethtool if needed.
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Updated commit message to clarify that the actual number of queues
may be lower if it exceeds the hardware reported limit.
drivers/net/ethernet/microsoft/mana/mana_en.c | 3 ++-
include/net/mana/mana.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 49c65cc1697c..b39e8b920791 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3357,7 +3357,8 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
apc->ac = ac;
apc->ndev = ndev;
apc->max_queues = gc->max_num_queues;
- apc->num_queues = gc->max_num_queues;
+ /* Use MANA_DEF_NUM_QUEUES as default, still honoring the HW limit */
+ apc->num_queues = min(gc->max_num_queues, MANA_DEF_NUM_QUEUES);
apc->tx_queue_size = DEF_TX_BUFFERS_PER_QUEUE;
apc->rx_queue_size = DEF_RX_BUFFERS_PER_QUEUE;
apc->port_handle = INVALID_MANA_HANDLE;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 3336688fed5e..96d21cbbdee2 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -1007,6 +1007,7 @@ struct mana_deregister_filter_resp {
#define STATISTICS_FLAGS_TX_ERRORS_GDMA_ERROR 0x0000000004000000
#define MANA_MAX_NUM_QUEUES 64
+#define MANA_DEF_NUM_QUEUES 16
#define MANA_SHORT_VPORT_OFFSET_MAX ((1U << 8) - 1)
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v5 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management
From: Long Li @ 2026-03-23 19:59 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
This series adds per-vPort Event Queue (EQ) allocation and MSI-X interrupt
management for the MANA driver. Previously, all vPorts shared a single set
of EQs. This change enables dedicated EQs per vPort with support for both
dedicated and shared MSI-X vector allocation modes.
Patch 1 moves EQ ownership from mana_context to per-vPort mana_port_context
and exports create/destroy functions for the RDMA driver.
Patch 2 adds device capability queries to determine whether MSI-X vectors
should be dedicated per-vPort or shared. When the number of available MSI-X
vectors is insufficient for dedicated allocation, the driver enables sharing
mode with bitmap-based vector assignment.
Patch 3 introduces the GIC (GDMA IRQ Context) abstraction with reference
counting, allowing multiple EQs to safely share a single MSI-X vector.
Patch 4 converts the global EQ allocation in probe/resume to use the new
GIC functions.
Patch 5 adds per-vPort GIC lifecycle management, calling get/put on each
EQ creation and destruction during vPort open/close.
Patch 6 extends the same GIC lifecycle management to the RDMA driver's EQ
allocation path.
Changes in v5:
- Rebased on net-next/main
Changes in v4:
- Rebased on net-next/main 7.0-rc4
- Patch 2: Use MANA_DEF_NUM_QUEUES instead of hardcoded 16 for
max_num_queues clamping
- Patch 3: Track dyn_msix in GIC context instead of re-checking
pci_msix_can_alloc_dyn() on each call; improved remove_irqs iteration
to skip unallocated entries
Changes in v3:
- Rebased on net-next/main
- Patch 1: Added NULL check for mpc->eqs in mana_ib_create_qp_rss() to
prevent NULL pointer dereference when RSS QP is created before a raw QP
has configured the vport and allocated EQs
Changes in v2:
- Rebased on net-next/main (adapted to kzalloc_objs/kzalloc_obj macros,
new GDMA_DRV_CAP_FLAG definitions)
- Patch 2: Fixed misleading comment for max_num_queues vs
max_num_queues_vport in gdma.h
- Patch 3: Fixed spelling typo in gdma_main.c ("difference" -> "different")
Long Li (6):
net: mana: Create separate EQs for each vPort
net: mana: Query device capabilities and configure MSI-X sharing for
EQs
net: mana: Introduce GIC context with refcounting for interrupt
management
net: mana: Use GIC functions to allocate global EQs
net: mana: Allocate interrupt context for each EQ when creating vPort
RDMA/mana_ib: Allocate interrupt contexts on EQs
drivers/infiniband/hw/mana/main.c | 47 ++-
drivers/infiniband/hw/mana/qp.c | 16 +-
.../net/ethernet/microsoft/mana/gdma_main.c | 307 +++++++++++++-----
drivers/net/ethernet/microsoft/mana/mana_en.c | 162 +++++----
include/net/mana/gdma.h | 32 +-
include/net/mana/mana.h | 7 +-
6 files changed, 416 insertions(+), 155 deletions(-)
--
2.43.0
^ permalink raw reply
* [PATCH net-next v5 1/6] net: mana: Create separate EQs for each vPort
From: Long Li @ 2026-03-23 19:59 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260323195952.1767304-1-longli@microsoft.com>
To prepare for assigning vPorts to dedicated MSI-X vectors, remove EQ
sharing among the vPorts and create dedicated EQs for each vPort.
Move the EQ definition from struct mana_context to struct mana_port_context
and update related support functions. Export mana_create_eq() and
mana_destroy_eq() for use by the MANA RDMA driver.
Signed-off-by: Long Li <longli@microsoft.com>
---
Changes in v3:
- Added NULL check for mpc->eqs in mana_ib_create_qp_rss() to prevent
kernel crash when RSS QP is created before EQs are allocated
---
drivers/infiniband/hw/mana/main.c | 14 ++-
drivers/infiniband/hw/mana/qp.c | 16 ++-
drivers/net/ethernet/microsoft/mana/mana_en.c | 109 ++++++++++--------
include/net/mana/mana.h | 7 +-
4 files changed, 94 insertions(+), 52 deletions(-)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index 8d99cd00f002..d51dd0ee85f4 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -20,8 +20,10 @@ void mana_ib_uncfg_vport(struct mana_ib_dev *dev, struct mana_ib_pd *pd,
pd->vport_use_count--;
WARN_ON(pd->vport_use_count < 0);
- if (!pd->vport_use_count)
+ if (!pd->vport_use_count) {
+ mana_destroy_eq(mpc);
mana_uncfg_vport(mpc);
+ }
mutex_unlock(&pd->vport_mutex);
}
@@ -55,15 +57,21 @@ int mana_ib_cfg_vport(struct mana_ib_dev *dev, u32 port, struct mana_ib_pd *pd,
return err;
}
- mutex_unlock(&pd->vport_mutex);
pd->tx_shortform_allowed = mpc->tx_shortform_allowed;
pd->tx_vp_offset = mpc->tx_vp_offset;
+ err = mana_create_eq(mpc);
+ if (err) {
+ mana_uncfg_vport(mpc);
+ pd->vport_use_count--;
+ }
+
+ mutex_unlock(&pd->vport_mutex);
ibdev_dbg(&dev->ib_dev, "vport handle %llx pdid %x doorbell_id %x\n",
mpc->port_handle, pd->pdn, doorbell_id);
- return 0;
+ return err;
}
int mana_ib_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 82f84f7ad37a..80cf4ade4b75 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -188,7 +188,15 @@ static int mana_ib_create_qp_rss(struct ib_qp *ibqp, struct ib_pd *pd,
cq_spec.gdma_region = cq->queue.gdma_region;
cq_spec.queue_size = cq->cqe * COMP_ENTRY_SIZE;
cq_spec.modr_ctx_id = 0;
- eq = &mpc->ac->eqs[cq->comp_vector];
+ /* EQs are created when a raw QP configures the vport.
+ * A raw QP must be created before creating rwq_ind_tbl.
+ */
+ if (!mpc->eqs) {
+ ret = -EINVAL;
+ i--;
+ goto fail;
+ }
+ eq = &mpc->eqs[cq->comp_vector % mpc->num_queues];
cq_spec.attached_eq = eq->eq->id;
ret = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_RQ,
@@ -340,7 +348,11 @@ static int mana_ib_create_qp_raw(struct ib_qp *ibqp, struct ib_pd *ibpd,
cq_spec.queue_size = send_cq->cqe * COMP_ENTRY_SIZE;
cq_spec.modr_ctx_id = 0;
eq_vec = send_cq->comp_vector;
- eq = &mpc->ac->eqs[eq_vec];
+ if (!mpc->eqs) {
+ err = -EINVAL;
+ goto err_destroy_queue;
+ }
+ eq = &mpc->eqs[eq_vec % mpc->num_queues];
cq_spec.attached_eq = eq->eq->id;
err = mana_create_wq_obj(mpc, mpc->port_handle, GDMA_SQ, &wq_spec,
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index b39e8b920791..178c583d74b4 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1596,78 +1596,82 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
}
EXPORT_SYMBOL_NS(mana_destroy_wq_obj, "NET_MANA");
-static void mana_destroy_eq(struct mana_context *ac)
+void mana_destroy_eq(struct mana_port_context *apc)
{
+ struct mana_context *ac = apc->ac;
struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct gdma_queue *eq;
int i;
- if (!ac->eqs)
+ if (!apc->eqs)
return;
- debugfs_remove_recursive(ac->mana_eqs_debugfs);
- ac->mana_eqs_debugfs = NULL;
+ debugfs_remove_recursive(apc->mana_eqs_debugfs);
+ apc->mana_eqs_debugfs = NULL;
- for (i = 0; i < gc->max_num_queues; i++) {
- eq = ac->eqs[i].eq;
+ for (i = 0; i < apc->num_queues; i++) {
+ eq = apc->eqs[i].eq;
if (!eq)
continue;
mana_gd_destroy_queue(gc, eq);
}
- kfree(ac->eqs);
- ac->eqs = NULL;
+ kfree(apc->eqs);
+ apc->eqs = NULL;
}
+EXPORT_SYMBOL_NS(mana_destroy_eq, "NET_MANA");
-static void mana_create_eq_debugfs(struct mana_context *ac, int i)
+static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
{
- struct mana_eq eq = ac->eqs[i];
+ struct mana_eq eq = apc->eqs[i];
char eqnum[32];
sprintf(eqnum, "eq%d", i);
- eq.mana_eq_debugfs = debugfs_create_dir(eqnum, ac->mana_eqs_debugfs);
+ eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
}
-static int mana_create_eq(struct mana_context *ac)
+int mana_create_eq(struct mana_port_context *apc)
{
- struct gdma_dev *gd = ac->gdma_dev;
+ struct gdma_dev *gd = apc->ac->gdma_dev;
struct gdma_context *gc = gd->gdma_context;
struct gdma_queue_spec spec = {};
int err;
int i;
- ac->eqs = kzalloc_objs(struct mana_eq, gc->max_num_queues);
- if (!ac->eqs)
+ WARN_ON(apc->eqs);
+ apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
+ if (!apc->eqs)
return -ENOMEM;
spec.type = GDMA_EQ;
spec.monitor_avl_buf = false;
spec.queue_size = EQ_SIZE;
spec.eq.callback = NULL;
- spec.eq.context = ac->eqs;
+ spec.eq.context = apc->eqs;
spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
- ac->mana_eqs_debugfs = debugfs_create_dir("EQs", gc->mana_pci_debugfs);
+ apc->mana_eqs_debugfs = debugfs_create_dir("EQs", apc->mana_port_debugfs);
- for (i = 0; i < gc->max_num_queues; i++) {
+ for (i = 0; i < apc->num_queues; i++) {
spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
- err = mana_gd_create_mana_eq(gd, &spec, &ac->eqs[i].eq);
+ err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
if (err) {
dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
goto out;
}
- mana_create_eq_debugfs(ac, i);
+ mana_create_eq_debugfs(apc, i);
}
return 0;
out:
- mana_destroy_eq(ac);
+ mana_destroy_eq(apc);
return err;
}
+EXPORT_SYMBOL_NS(mana_create_eq, "NET_MANA");
static int mana_fence_rq(struct mana_port_context *apc, struct mana_rxq *rxq)
{
@@ -2421,7 +2425,7 @@ static int mana_create_txq(struct mana_port_context *apc,
spec.monitor_avl_buf = false;
spec.queue_size = cq_size;
spec.cq.callback = mana_schedule_napi;
- spec.cq.parent_eq = ac->eqs[i].eq;
+ spec.cq.parent_eq = apc->eqs[i].eq;
spec.cq.context = cq;
err = mana_gd_create_mana_wq_cq(gd, &spec, &cq->gdma_cq);
if (err)
@@ -2814,13 +2818,12 @@ static void mana_create_rxq_debugfs(struct mana_port_context *apc, int idx)
static int mana_add_rx_queues(struct mana_port_context *apc,
struct net_device *ndev)
{
- struct mana_context *ac = apc->ac;
struct mana_rxq *rxq;
int err = 0;
int i;
for (i = 0; i < apc->num_queues; i++) {
- rxq = mana_create_rxq(apc, i, &ac->eqs[i], ndev);
+ rxq = mana_create_rxq(apc, i, &apc->eqs[i], ndev);
if (!rxq) {
err = -ENOMEM;
netdev_err(ndev, "Failed to create rxq %d : %d\n", i, err);
@@ -2839,9 +2842,8 @@ static int mana_add_rx_queues(struct mana_port_context *apc,
return err;
}
-static void mana_destroy_vport(struct mana_port_context *apc)
+static void mana_destroy_rxqs(struct mana_port_context *apc)
{
- struct gdma_dev *gd = apc->ac->gdma_dev;
struct mana_rxq *rxq;
u32 rxq_idx;
@@ -2853,8 +2855,12 @@ static void mana_destroy_vport(struct mana_port_context *apc)
mana_destroy_rxq(apc, rxq, true);
apc->rxqs[rxq_idx] = NULL;
}
+}
+
+static void mana_destroy_vport(struct mana_port_context *apc)
+{
+ struct gdma_dev *gd = apc->ac->gdma_dev;
- mana_destroy_txq(apc);
mana_uncfg_vport(apc);
if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode)
@@ -2875,11 +2881,7 @@ static int mana_create_vport(struct mana_port_context *apc,
return err;
}
- err = mana_cfg_vport(apc, gd->pdid, gd->doorbell);
- if (err)
- return err;
-
- return mana_create_txq(apc, net);
+ return mana_cfg_vport(apc, gd->pdid, gd->doorbell);
}
static int mana_rss_table_alloc(struct mana_port_context *apc)
@@ -3156,21 +3158,36 @@ int mana_alloc_queues(struct net_device *ndev)
err = mana_create_vport(apc, ndev);
if (err) {
- netdev_err(ndev, "Failed to create vPort %u : %d\n", apc->port_idx, err);
+ netdev_err(ndev, "Failed to create vPort %u : %d\n",
+ apc->port_idx, err);
return err;
}
+ err = mana_create_eq(apc);
+ if (err) {
+ netdev_err(ndev, "Failed to create EQ on vPort %u: %d\n",
+ apc->port_idx, err);
+ goto destroy_vport;
+ }
+
+ err = mana_create_txq(apc, ndev);
+ if (err) {
+ netdev_err(ndev, "Failed to create TXQ on vPort %u: %d\n",
+ apc->port_idx, err);
+ goto destroy_eq;
+ }
+
err = netif_set_real_num_tx_queues(ndev, apc->num_queues);
if (err) {
netdev_err(ndev,
"netif_set_real_num_tx_queues () failed for ndev with num_queues %u : %d\n",
apc->num_queues, err);
- goto destroy_vport;
+ goto destroy_txq;
}
err = mana_add_rx_queues(apc, ndev);
if (err)
- goto destroy_vport;
+ goto destroy_rxq;
apc->rss_state = apc->num_queues > 1 ? TRI_STATE_TRUE : TRI_STATE_FALSE;
@@ -3179,7 +3196,7 @@ int mana_alloc_queues(struct net_device *ndev)
netdev_err(ndev,
"netif_set_real_num_rx_queues () failed for ndev with num_queues %u : %d\n",
apc->num_queues, err);
- goto destroy_vport;
+ goto destroy_rxq;
}
mana_rss_table_init(apc);
@@ -3187,19 +3204,25 @@ int mana_alloc_queues(struct net_device *ndev)
err = mana_config_rss(apc, TRI_STATE_TRUE, true, true);
if (err) {
netdev_err(ndev, "Failed to configure RSS table: %d\n", err);
- goto destroy_vport;
+ goto destroy_rxq;
}
if (gd->gdma_context->is_pf && !apc->ac->bm_hostmode) {
err = mana_pf_register_filter(apc);
if (err)
- goto destroy_vport;
+ goto destroy_rxq;
}
mana_chn_setxdp(apc, mana_xdp_get(apc));
return 0;
+destroy_rxq:
+ mana_destroy_rxqs(apc);
+destroy_txq:
+ mana_destroy_txq(apc);
+destroy_eq:
+ mana_destroy_eq(apc);
destroy_vport:
mana_destroy_vport(apc);
return err;
@@ -3302,6 +3325,9 @@ static int mana_dealloc_queues(struct net_device *ndev)
netdev_err(ndev, "Failed to disable vPort: %d\n", err);
/* Even in err case, still need to cleanup the vPort */
+ mana_destroy_rxqs(apc);
+ mana_destroy_txq(apc);
+ mana_destroy_eq(apc);
mana_destroy_vport(apc);
return 0;
@@ -3618,12 +3644,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
gd->driver_data = ac;
}
- err = mana_create_eq(ac);
- if (err) {
- dev_err(dev, "Failed to create EQs: %d\n", err);
- goto out;
- }
-
err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
if (err)
@@ -3762,7 +3782,6 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
free_netdev(ndev);
}
- mana_destroy_eq(ac);
out:
if (ac->per_port_queue_reset_wq) {
destroy_workqueue(ac->per_port_queue_reset_wq);
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 96d21cbbdee2..204c2b612a62 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -480,8 +480,6 @@ struct mana_context {
u8 bm_hostmode;
struct mana_ethtool_hc_stats hc_stats;
- struct mana_eq *eqs;
- struct dentry *mana_eqs_debugfs;
struct workqueue_struct *per_port_queue_reset_wq;
/* Workqueue for querying hardware stats */
struct delayed_work gf_stats_work;
@@ -501,6 +499,9 @@ struct mana_port_context {
u8 mac_addr[ETH_ALEN];
+ struct mana_eq *eqs;
+ struct dentry *mana_eqs_debugfs;
+
enum TRI_STATE rss_state;
mana_handle_t default_rxobj;
@@ -1033,6 +1034,8 @@ void mana_destroy_wq_obj(struct mana_port_context *apc, u32 wq_type,
int mana_cfg_vport(struct mana_port_context *apc, u32 protection_dom_id,
u32 doorbell_pg_id);
void mana_uncfg_vport(struct mana_port_context *apc);
+int mana_create_eq(struct mana_port_context *apc);
+void mana_destroy_eq(struct mana_port_context *apc);
struct net_device *mana_get_primary_netdev(struct mana_context *ac,
u32 port_index,
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v5 2/6] net: mana: Query device capabilities and configure MSI-X sharing for EQs
From: Long Li @ 2026-03-23 19:59 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260323195952.1767304-1-longli@microsoft.com>
When querying the device, adjust the max number of queues to allow
dedicated MSI-X vectors for each vPort. The number of queues per vPort
is clamped to no less than MANA_DEF_NUM_QUEUES. MSI-X sharing among
vPorts is disabled by default and is only enabled when there are not
enough MSI-X vectors for dedicated allocation.
Rename mana_query_device_cfg() to mana_gd_query_device_cfg() as it is
used at GDMA device probe time for querying device capabilities.
Signed-off-by: Long Li <longli@microsoft.com>
---
Changes in v4:
- Use MANA_DEF_NUM_QUEUES instead of hardcoded 16 for max_num_queues
clamping
Changes in v2:
- Fixed misleading comment for max_num_queues vs max_num_queues_vport
in gdma.h
---
.../net/ethernet/microsoft/mana/gdma_main.c | 66 ++++++++++++++++---
drivers/net/ethernet/microsoft/mana/mana_en.c | 36 +++++-----
include/net/mana/gdma.h | 13 +++-
3 files changed, 91 insertions(+), 24 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 2ba1fa3336f9..ae18b4054a02 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -124,6 +124,9 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
struct gdma_context *gc = pci_get_drvdata(pdev);
struct gdma_query_max_resources_resp resp = {};
struct gdma_general_req req = {};
+ unsigned int max_num_queues;
+ u8 bm_hostmode;
+ u16 num_ports;
int err;
mana_gd_init_req_hdr(&req.hdr, GDMA_QUERY_MAX_RESOURCES,
@@ -169,6 +172,40 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
if (gc->max_num_queues > gc->num_msix_usable - 1)
gc->max_num_queues = gc->num_msix_usable - 1;
+ err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
+ MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
+ if (err)
+ return err;
+
+ if (!num_ports)
+ return -EINVAL;
+
+ /*
+ * Adjust gc->max_num_queues returned from the SOC to allow dedicated
+ * MSIx for each vPort. Clamp to no less than MANA_DEF_NUM_QUEUES.
+ */
+ max_num_queues = (gc->num_msix_usable - 1) / num_ports;
+ max_num_queues = roundup_pow_of_two(max(max_num_queues, 1U));
+ if (max_num_queues < MANA_DEF_NUM_QUEUES)
+ max_num_queues = MANA_DEF_NUM_QUEUES;
+
+ /*
+ * Use dedicated MSIx for EQs whenever possible, use MSIx sharing for
+ * Ethernet EQs when (max_num_queues * num_ports > num_msix_usable - 1)
+ */
+ max_num_queues = min(gc->max_num_queues, max_num_queues);
+ if (max_num_queues * num_ports > gc->num_msix_usable - 1)
+ gc->msi_sharing = true;
+
+ /* If MSI is shared, use max allowed value */
+ if (gc->msi_sharing)
+ gc->max_num_queues_vport = min(gc->num_msix_usable - 1, gc->max_num_queues);
+ else
+ gc->max_num_queues_vport = max_num_queues;
+
+ dev_info(gc->dev, "MSI sharing mode %d max queues %d\n",
+ gc->msi_sharing, gc->max_num_queues);
+
return 0;
}
@@ -1831,6 +1868,7 @@ static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
/* Need 1 interrupt for HWC */
max_irqs = min(num_online_cpus(), MANA_MAX_NUM_QUEUES) + 1;
min_irqs = 2;
+ gc->msi_sharing = true;
}
nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
@@ -1909,6 +1947,8 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev)
pci_free_irq_vectors(pdev);
+ bitmap_free(gc->msi_bitmap);
+ gc->msi_bitmap = NULL;
gc->max_num_msix = 0;
gc->num_msix_usable = 0;
}
@@ -1943,20 +1983,30 @@ static int mana_gd_setup(struct pci_dev *pdev)
if (err)
goto destroy_hwc;
- err = mana_gd_query_max_resources(pdev);
+ err = mana_gd_detect_devices(pdev);
if (err)
goto destroy_hwc;
- err = mana_gd_setup_remaining_irqs(pdev);
- if (err) {
- dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
- goto destroy_hwc;
- }
-
- err = mana_gd_detect_devices(pdev);
+ err = mana_gd_query_max_resources(pdev);
if (err)
goto destroy_hwc;
+ if (!gc->msi_sharing) {
+ gc->msi_bitmap = bitmap_zalloc(gc->num_msix_usable, GFP_KERNEL);
+ if (!gc->msi_bitmap) {
+ err = -ENOMEM;
+ goto destroy_hwc;
+ }
+ /* Set bit for HWC */
+ set_bit(0, gc->msi_bitmap);
+ } else {
+ err = mana_gd_setup_remaining_irqs(pdev);
+ if (err) {
+ dev_err(gc->dev, "Failed to setup remaining IRQs: %d", err);
+ goto destroy_hwc;
+ }
+ }
+
dev_dbg(&pdev->dev, "mana gdma setup successful\n");
return 0;
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 178c583d74b4..004d48bba8aa 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1000,10 +1000,9 @@ static int mana_init_port_context(struct mana_port_context *apc)
return !apc->rxqs ? -ENOMEM : 0;
}
-static int mana_send_request(struct mana_context *ac, void *in_buf,
- u32 in_len, void *out_buf, u32 out_len)
+static int gdma_mana_send_request(struct gdma_context *gc, void *in_buf,
+ u32 in_len, void *out_buf, u32 out_len)
{
- struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct gdma_resp_hdr *resp = out_buf;
struct gdma_req_hdr *req = in_buf;
struct device *dev = gc->dev;
@@ -1037,6 +1036,14 @@ static int mana_send_request(struct mana_context *ac, void *in_buf,
return 0;
}
+static int mana_send_request(struct mana_context *ac, void *in_buf,
+ u32 in_len, void *out_buf, u32 out_len)
+{
+ struct gdma_context *gc = ac->gdma_dev->gdma_context;
+
+ return gdma_mana_send_request(gc, in_buf, in_len, out_buf, out_len);
+}
+
static int mana_verify_resp_hdr(const struct gdma_resp_hdr *resp_hdr,
const enum mana_command_code expected_code,
const u32 min_size)
@@ -1170,11 +1177,10 @@ static void mana_pf_deregister_filter(struct mana_port_context *apc)
err, resp.hdr.status);
}
-static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
- u32 proto_minor_ver, u32 proto_micro_ver,
- u16 *max_num_vports, u8 *bm_hostmode)
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+ u32 proto_minor_ver, u32 proto_micro_ver,
+ u16 *max_num_vports, u8 *bm_hostmode)
{
- struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct mana_query_device_cfg_resp resp = {};
struct mana_query_device_cfg_req req = {};
struct device *dev = gc->dev;
@@ -1189,7 +1195,7 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
req.proto_minor_ver = proto_minor_ver;
req.proto_micro_ver = proto_micro_ver;
- err = mana_send_request(ac, &req, sizeof(req), &resp, sizeof(resp));
+ err = gdma_mana_send_request(gc, &req, sizeof(req), &resp, sizeof(resp));
if (err) {
dev_err(dev, "Failed to query config: %d", err);
return err;
@@ -1217,8 +1223,6 @@ static int mana_query_device_cfg(struct mana_context *ac, u32 proto_major_ver,
else
*bm_hostmode = 0;
- debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapter_mtu);
-
return 0;
}
@@ -3373,7 +3377,7 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
int err;
ndev = alloc_etherdev_mq(sizeof(struct mana_port_context),
- gc->max_num_queues);
+ gc->max_num_queues_vport);
if (!ndev)
return -ENOMEM;
@@ -3382,9 +3386,9 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
apc = netdev_priv(ndev);
apc->ac = ac;
apc->ndev = ndev;
- apc->max_queues = gc->max_num_queues;
+ apc->max_queues = gc->max_num_queues_vport;
/* Use MANA_DEF_NUM_QUEUES as default, still honoring the HW limit */
- apc->num_queues = min(gc->max_num_queues, MANA_DEF_NUM_QUEUES);
+ apc->num_queues = min(gc->max_num_queues_vport, MANA_DEF_NUM_QUEUES);
apc->tx_queue_size = DEF_TX_BUFFERS_PER_QUEUE;
apc->rx_queue_size = DEF_RX_BUFFERS_PER_QUEUE;
apc->port_handle = INVALID_MANA_HANDLE;
@@ -3644,13 +3648,15 @@ int mana_probe(struct gdma_dev *gd, bool resuming)
gd->driver_data = ac;
}
- err = mana_query_device_cfg(ac, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
- MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
+ err = mana_gd_query_device_cfg(gc, MANA_MAJOR_VERSION, MANA_MINOR_VERSION,
+ MANA_MICRO_VERSION, &num_ports, &bm_hostmode);
if (err)
goto out;
ac->bm_hostmode = bm_hostmode;
+ debugfs_create_u16("adapter-MTU", 0400, gc->mana_pci_debugfs, &gc->adapter_mtu);
+
if (!resuming) {
ac->num_ports = num_ports;
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 7fe3a1b61b2d..ecd9949df213 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -399,8 +399,10 @@ struct gdma_context {
struct device *dev;
struct dentry *mana_pci_debugfs;
- /* Per-vPort max number of queues */
+ /* Hardware max number of queues */
unsigned int max_num_queues;
+ /* Per-vPort max number of queues */
+ unsigned int max_num_queues_vport;
unsigned int max_num_msix;
unsigned int num_msix_usable;
struct xarray irq_contexts;
@@ -446,6 +448,12 @@ struct gdma_context {
struct workqueue_struct *service_wq;
unsigned long flags;
+
+ /* Indicate if this device is sharing MSI for EQs on MANA */
+ bool msi_sharing;
+
+ /* Bitmap tracks where MSI is allocated when it is not shared for EQs */
+ unsigned long *msi_bitmap;
};
static inline bool mana_gd_is_mana(struct gdma_dev *gd)
@@ -1013,4 +1021,7 @@ int mana_gd_resume(struct pci_dev *pdev);
bool mana_need_log(struct gdma_context *gc, int err);
+int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
+ u32 proto_minor_ver, u32 proto_micro_ver,
+ u16 *max_num_vports, u8 *bm_hostmode);
#endif /* _GDMA_H */
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v5 3/6] net: mana: Introduce GIC context with refcounting for interrupt management
From: Long Li @ 2026-03-23 19:59 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260323195952.1767304-1-longli@microsoft.com>
To allow Ethernet EQs to use dedicated or shared MSI-X vectors and RDMA
EQs to share the same MSI-X, introduce a GIC (GDMA IRQ Context) with
reference counting. This allows the driver to create an interrupt context
on an assigned or unassigned MSI-X vector and share it across multiple
EQ consumers.
Signed-off-by: Long Li <longli@microsoft.com>
---
Changes in v4:
- Track dyn_msix in GIC context instead of re-checking
pci_msix_can_alloc_dyn() on each call
- Improved remove_irqs iteration to skip unallocated entries
Changes in v2:
- Fixed spelling typo in gdma_main.c ("difference" -> "different")
---
.../net/ethernet/microsoft/mana/gdma_main.c | 159 ++++++++++++++++++
include/net/mana/gdma.h | 11 ++
2 files changed, 170 insertions(+)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index ae18b4054a02..69a4427919f5 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1587,6 +1587,164 @@ static irqreturn_t mana_gd_intr(int irq, void *arg)
return IRQ_HANDLED;
}
+void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi)
+{
+ struct pci_dev *dev = to_pci_dev(gc->dev);
+ struct msi_map irq_map;
+ struct gdma_irq_context *gic;
+ int irq;
+
+ mutex_lock(&gc->gic_mutex);
+
+ gic = xa_load(&gc->irq_contexts, msi);
+ if (WARN_ON(!gic)) {
+ mutex_unlock(&gc->gic_mutex);
+ return;
+ }
+
+ if (use_msi_bitmap)
+ gic->bitmap_refs--;
+
+ if (use_msi_bitmap && gic->bitmap_refs == 0)
+ clear_bit(msi, gc->msi_bitmap);
+
+ if (!refcount_dec_and_test(&gic->refcount))
+ goto out;
+
+ irq = pci_irq_vector(dev, msi);
+
+ irq_update_affinity_hint(irq, NULL);
+ free_irq(irq, gic);
+
+ if (gic->dyn_msix) {
+ irq_map.virq = irq;
+ irq_map.index = msi;
+ pci_msix_free_irq(dev, irq_map);
+ }
+
+ xa_erase(&gc->irq_contexts, msi);
+ kfree(gic);
+
+out:
+ mutex_unlock(&gc->gic_mutex);
+}
+EXPORT_SYMBOL_NS(mana_gd_put_gic, "NET_MANA");
+
+/*
+ * Get a GIC (GDMA IRQ Context) on a MSI vector
+ * a MSI can be shared between different EQs, this function supports setting
+ * up separate MSIs using a bitmap, or directly using the MSI index
+ *
+ * @use_msi_bitmap:
+ * True if MSI is assigned by this function on available slots from bitmap.
+ * False if MSI is passed from *msi_requested
+ */
+struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc,
+ bool use_msi_bitmap,
+ int *msi_requested)
+{
+ struct gdma_irq_context *gic;
+ struct pci_dev *dev = to_pci_dev(gc->dev);
+ struct msi_map irq_map = { };
+ int irq;
+ int msi;
+ int err;
+
+ mutex_lock(&gc->gic_mutex);
+
+ if (use_msi_bitmap) {
+ msi = find_first_zero_bit(gc->msi_bitmap, gc->num_msix_usable);
+ if (msi >= gc->num_msix_usable) {
+ dev_err(gc->dev, "No free MSI vectors available\n");
+ gic = NULL;
+ goto out;
+ }
+ *msi_requested = msi;
+ } else {
+ msi = *msi_requested;
+ }
+
+ gic = xa_load(&gc->irq_contexts, msi);
+ if (gic) {
+ refcount_inc(&gic->refcount);
+ if (use_msi_bitmap) {
+ gic->bitmap_refs++;
+ set_bit(msi, gc->msi_bitmap);
+ }
+ goto out;
+ }
+
+ irq = pci_irq_vector(dev, msi);
+ if (irq == -EINVAL) {
+ irq_map = pci_msix_alloc_irq_at(dev, msi, NULL);
+ if (!irq_map.virq) {
+ err = irq_map.index;
+ dev_err(gc->dev,
+ "Failed to alloc irq_map msi %d err %d\n",
+ msi, err);
+ gic = NULL;
+ goto out;
+ }
+ irq = irq_map.virq;
+ msi = irq_map.index;
+ }
+
+ gic = kzalloc(sizeof(*gic), GFP_KERNEL);
+ if (!gic) {
+ if (irq_map.virq)
+ pci_msix_free_irq(dev, irq_map);
+ goto out;
+ }
+
+ gic->handler = mana_gd_process_eq_events;
+ gic->msi = msi;
+ gic->irq = irq;
+ INIT_LIST_HEAD(&gic->eq_list);
+ spin_lock_init(&gic->lock);
+
+ if (!gic->msi)
+ snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s",
+ pci_name(dev));
+ else
+ snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_msi%d@pci:%s",
+ gic->msi, pci_name(dev));
+
+ err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
+ if (err) {
+ dev_err(gc->dev, "Failed to request irq %d %s\n",
+ irq, gic->name);
+ kfree(gic);
+ gic = NULL;
+ if (irq_map.virq)
+ pci_msix_free_irq(dev, irq_map);
+ goto out;
+ }
+
+ gic->dyn_msix = !!irq_map.virq;
+ refcount_set(&gic->refcount, 1);
+ gic->bitmap_refs = use_msi_bitmap ? 1 : 0;
+
+ err = xa_err(xa_store(&gc->irq_contexts, msi, gic, GFP_KERNEL));
+ if (err) {
+ dev_err(gc->dev, "Failed to store irq context for msi %d: %d\n",
+ msi, err);
+ free_irq(irq, gic);
+ kfree(gic);
+ gic = NULL;
+ if (irq_map.virq)
+ pci_msix_free_irq(dev, irq_map);
+ goto out;
+ }
+
+ if (use_msi_bitmap)
+ set_bit(msi, gc->msi_bitmap);
+
+out:
+ mutex_unlock(&gc->gic_mutex);
+ return gic;
+}
+EXPORT_SYMBOL_NS(mana_gd_get_gic, "NET_MANA");
+
int mana_gd_alloc_res_map(u32 res_avail, struct gdma_resource *r)
{
r->map = bitmap_zalloc(res_avail, GFP_KERNEL);
@@ -2076,6 +2234,7 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
goto release_region;
mutex_init(&gc->eq_test_event_mutex);
+ mutex_init(&gc->gic_mutex);
pci_set_drvdata(pdev, gc);
gc->bar0_pa = pci_resource_start(pdev, 0);
gc->bar0_size = pci_resource_len(pdev, 0);
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index ecd9949df213..4614a6a7271b 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -388,6 +388,11 @@ struct gdma_irq_context {
spinlock_t lock;
struct list_head eq_list;
char name[MANA_IRQ_NAME_SZ];
+ unsigned int msi;
+ unsigned int irq;
+ refcount_t refcount;
+ unsigned int bitmap_refs;
+ bool dyn_msix;
};
enum gdma_context_flags {
@@ -449,6 +454,9 @@ struct gdma_context {
unsigned long flags;
+ /* Protect access to GIC context */
+ struct mutex gic_mutex;
+
/* Indicate if this device is sharing MSI for EQs on MANA */
bool msi_sharing;
@@ -1021,6 +1029,9 @@ int mana_gd_resume(struct pci_dev *pdev);
bool mana_need_log(struct gdma_context *gc, int err);
+struct gdma_irq_context *mana_gd_get_gic(struct gdma_context *gc, bool use_msi_bitmap,
+ int *msi_requested);
+void mana_gd_put_gic(struct gdma_context *gc, bool use_msi_bitmap, int msi);
int mana_gd_query_device_cfg(struct gdma_context *gc, u32 proto_major_ver,
u32 proto_minor_ver, u32 proto_micro_ver,
u16 *max_num_vports, u8 *bm_hostmode);
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v5 4/6] net: mana: Use GIC functions to allocate global EQs
From: Long Li @ 2026-03-23 19:59 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260323195952.1767304-1-longli@microsoft.com>
Replace the GDMA global interrupt setup code with the new GIC allocation
and release functions for managing interrupt contexts.
Signed-off-by: Long Li <longli@microsoft.com>
---
.../net/ethernet/microsoft/mana/gdma_main.c | 80 +++----------------
1 file changed, 10 insertions(+), 70 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 69a4427919f5..e7d5e589a217 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1860,30 +1860,13 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
* further used in irq_setup()
*/
for (i = 1; i <= nvec; i++) {
- gic = kzalloc_obj(*gic);
+ gic = mana_gd_get_gic(gc, false, &i);
if (!gic) {
err = -ENOMEM;
goto free_irq;
}
- gic->handler = mana_gd_process_eq_events;
- INIT_LIST_HEAD(&gic->eq_list);
- spin_lock_init(&gic->lock);
-
- snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
- i - 1, pci_name(pdev));
-
- /* one pci vector is already allocated for HWC */
- irqs[i - 1] = pci_irq_vector(pdev, i);
- if (irqs[i - 1] < 0) {
- err = irqs[i - 1];
- goto free_current_gic;
- }
-
- err = request_irq(irqs[i - 1], mana_gd_intr, 0, gic->name, gic);
- if (err)
- goto free_current_gic;
- xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL);
+ irqs[i - 1] = gic->irq;
}
/*
@@ -1905,19 +1888,11 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
kfree(irqs);
return 0;
-free_current_gic:
- kfree(gic);
free_irq:
for (i -= 1; i > 0; i--) {
irq = pci_irq_vector(pdev, i);
- gic = xa_load(&gc->irq_contexts, i);
- if (WARN_ON(!gic))
- continue;
-
irq_update_affinity_hint(irq, NULL);
- free_irq(irq, gic);
- xa_erase(&gc->irq_contexts, i);
- kfree(gic);
+ mana_gd_put_gic(gc, false, i);
}
kfree(irqs);
return err;
@@ -1938,34 +1913,13 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
start_irqs = irqs;
for (i = 0; i < nvec; i++) {
- gic = kzalloc_obj(*gic);
+ gic = mana_gd_get_gic(gc, false, &i);
if (!gic) {
err = -ENOMEM;
goto free_irq;
}
- gic->handler = mana_gd_process_eq_events;
- INIT_LIST_HEAD(&gic->eq_list);
- spin_lock_init(&gic->lock);
-
- if (!i)
- snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_hwc@pci:%s",
- pci_name(pdev));
- else
- snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
- i - 1, pci_name(pdev));
-
- irqs[i] = pci_irq_vector(pdev, i);
- if (irqs[i] < 0) {
- err = irqs[i];
- goto free_current_gic;
- }
-
- err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic);
- if (err)
- goto free_current_gic;
-
- xa_store(&gc->irq_contexts, i, gic, GFP_KERNEL);
+ irqs[i] = gic->irq;
}
/* If number of IRQ is one extra than number of online CPUs,
@@ -1994,19 +1948,11 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
kfree(start_irqs);
return 0;
-free_current_gic:
- kfree(gic);
free_irq:
for (i -= 1; i >= 0; i--) {
irq = pci_irq_vector(pdev, i);
- gic = xa_load(&gc->irq_contexts, i);
- if (WARN_ON(!gic))
- continue;
-
irq_update_affinity_hint(irq, NULL);
- free_irq(irq, gic);
- xa_erase(&gc->irq_contexts, i);
- kfree(gic);
+ mana_gd_put_gic(gc, false, i);
}
kfree(start_irqs);
@@ -2081,26 +2027,20 @@ static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev)
static void mana_gd_remove_irqs(struct pci_dev *pdev)
{
struct gdma_context *gc = pci_get_drvdata(pdev);
- struct gdma_irq_context *gic;
int irq, i;
if (gc->max_num_msix < 1)
return;
for (i = 0; i < gc->max_num_msix; i++) {
- irq = pci_irq_vector(pdev, i);
- if (irq < 0)
- continue;
-
- gic = xa_load(&gc->irq_contexts, i);
- if (WARN_ON(!gic))
+ if (!xa_load(&gc->irq_contexts, i))
continue;
/* Need to clear the hint before free_irq */
+ irq = pci_irq_vector(pdev, i);
irq_update_affinity_hint(irq, NULL);
- free_irq(irq, gic);
- xa_erase(&gc->irq_contexts, i);
- kfree(gic);
+
+ mana_gd_put_gic(gc, false, i);
}
pci_free_irq_vectors(pdev);
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v5 5/6] net: mana: Allocate interrupt context for each EQ when creating vPort
From: Long Li @ 2026-03-23 19:59 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260323195952.1767304-1-longli@microsoft.com>
Use GIC functions to create a dedicated interrupt context or acquire a
shared interrupt context for each EQ when setting up a vPort.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/net/ethernet/microsoft/mana/gdma_main.c | 2 +-
drivers/net/ethernet/microsoft/mana/mana_en.c | 17 ++++++++++++++++-
include/net/mana/gdma.h | 1 +
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index e7d5e589a217..34b19e0740e1 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -826,7 +826,6 @@ static void mana_gd_deregister_irq(struct gdma_queue *queue)
}
spin_unlock_irqrestore(&gic->lock, flags);
- queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
synchronize_rcu();
}
@@ -941,6 +940,7 @@ static int mana_gd_create_eq(struct gdma_dev *gd,
out:
dev_err(dev, "Failed to create EQ: %d\n", err);
mana_gd_destroy_eq(gc, false, queue);
+ queue->eq.msix_index = INVALID_PCI_MSIX_INDEX;
return err;
}
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index 004d48bba8aa..b3c3a70f733f 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1606,6 +1606,7 @@ void mana_destroy_eq(struct mana_port_context *apc)
struct gdma_context *gc = ac->gdma_dev->gdma_context;
struct gdma_queue *eq;
int i;
+ unsigned int msi;
if (!apc->eqs)
return;
@@ -1618,7 +1619,9 @@ void mana_destroy_eq(struct mana_port_context *apc)
if (!eq)
continue;
+ msi = eq->eq.msix_index;
mana_gd_destroy_queue(gc, eq);
+ mana_gd_put_gic(gc, !gc->msi_sharing, msi);
}
kfree(apc->eqs);
@@ -1635,6 +1638,7 @@ static void mana_create_eq_debugfs(struct mana_port_context *apc, int i)
eq.mana_eq_debugfs = debugfs_create_dir(eqnum, apc->mana_eqs_debugfs);
debugfs_create_u32("head", 0400, eq.mana_eq_debugfs, &eq.eq->head);
debugfs_create_u32("tail", 0400, eq.mana_eq_debugfs, &eq.eq->tail);
+ debugfs_create_u32("irq", 0400, eq.mana_eq_debugfs, &eq.eq->eq.irq);
debugfs_create_file("eq_dump", 0400, eq.mana_eq_debugfs, eq.eq, &mana_dbg_q_fops);
}
@@ -1645,6 +1649,7 @@ int mana_create_eq(struct mana_port_context *apc)
struct gdma_queue_spec spec = {};
int err;
int i;
+ struct gdma_irq_context *gic;
WARN_ON(apc->eqs);
apc->eqs = kzalloc_objs(struct mana_eq, apc->num_queues);
@@ -1661,12 +1666,22 @@ int mana_create_eq(struct mana_port_context *apc)
apc->mana_eqs_debugfs = debugfs_create_dir("EQs", apc->mana_port_debugfs);
for (i = 0; i < apc->num_queues; i++) {
- spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+ if (gc->msi_sharing)
+ spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+
+ gic = mana_gd_get_gic(gc, !gc->msi_sharing, &spec.eq.msix_index);
+ if (!gic) {
+ err = -ENOMEM;
+ goto out;
+ }
+
err = mana_gd_create_mana_eq(gd, &spec, &apc->eqs[i].eq);
if (err) {
dev_err(gc->dev, "Failed to create EQ %d : %d\n", i, err);
+ mana_gd_put_gic(gc, !gc->msi_sharing, spec.eq.msix_index);
goto out;
}
+ apc->eqs[i].eq->eq.irq = gic->irq;
mana_create_eq_debugfs(apc, i);
}
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 4614a6a7271b..84f85b2299b4 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -342,6 +342,7 @@ struct gdma_queue {
void *context;
unsigned int msix_index;
+ unsigned int irq;
u32 log2_throttle_limit;
} eq;
--
2.43.0
^ permalink raw reply related
* [PATCH net-next v5 6/6] RDMA/mana_ib: Allocate interrupt contexts on EQs
From: Long Li @ 2026-03-23 19:59 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel
In-Reply-To: <20260323195952.1767304-1-longli@microsoft.com>
Use the GIC functions to allocate interrupt contexts for RDMA EQs. These
interrupt contexts may be shared with Ethernet EQs when MSI-X vectors
are limited.
The driver now supports allocating dedicated MSI-X for each EQ. Indicate
this capability through driver capability bits.
Signed-off-by: Long Li <longli@microsoft.com>
---
drivers/infiniband/hw/mana/main.c | 33 ++++++++++++++++++++++++++-----
include/net/mana/gdma.h | 7 +++++--
2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/mana/main.c b/drivers/infiniband/hw/mana/main.c
index d51dd0ee85f4..0b74dd093b41 100644
--- a/drivers/infiniband/hw/mana/main.c
+++ b/drivers/infiniband/hw/mana/main.c
@@ -787,6 +787,7 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
{
struct gdma_context *gc = mdev_to_gc(mdev);
struct gdma_queue_spec spec = {};
+ struct gdma_irq_context *gic;
int err, i;
spec.type = GDMA_EQ;
@@ -797,9 +798,15 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
spec.eq.log2_throttle_limit = LOG2_EQ_THROTTLE;
spec.eq.msix_index = 0;
+ gic = mana_gd_get_gic(gc, false, &spec.eq.msix_index);
+ if (!gic)
+ return -ENOMEM;
+
err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->fatal_err_eq);
- if (err)
+ if (err) {
+ mana_gd_put_gic(gc, false, 0);
return err;
+ }
mdev->eqs = kzalloc_objs(struct gdma_queue *,
mdev->ib_dev.num_comp_vectors);
@@ -810,31 +817,47 @@ int mana_ib_create_eqs(struct mana_ib_dev *mdev)
spec.eq.callback = NULL;
for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
spec.eq.msix_index = (i + 1) % gc->num_msix_usable;
+
+ gic = mana_gd_get_gic(gc, false, &spec.eq.msix_index);
+ if (!gic) {
+ err = -ENOMEM;
+ goto destroy_eqs;
+ }
+
err = mana_gd_create_mana_eq(mdev->gdma_dev, &spec, &mdev->eqs[i]);
- if (err)
+ if (err) {
+ mana_gd_put_gic(gc, false, spec.eq.msix_index);
goto destroy_eqs;
+ }
}
return 0;
destroy_eqs:
- while (i-- > 0)
+ while (i-- > 0) {
mana_gd_destroy_queue(gc, mdev->eqs[i]);
+ mana_gd_put_gic(gc, false, (i + 1) % gc->num_msix_usable);
+ }
kfree(mdev->eqs);
destroy_fatal_eq:
mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+ mana_gd_put_gic(gc, false, 0);
return err;
}
void mana_ib_destroy_eqs(struct mana_ib_dev *mdev)
{
struct gdma_context *gc = mdev_to_gc(mdev);
- int i;
+ int i, msi;
mana_gd_destroy_queue(gc, mdev->fatal_err_eq);
+ mana_gd_put_gic(gc, false, 0);
- for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++)
+ for (i = 0; i < mdev->ib_dev.num_comp_vectors; i++) {
mana_gd_destroy_queue(gc, mdev->eqs[i]);
+ msi = (i + 1) % gc->num_msix_usable;
+ mana_gd_put_gic(gc, false, msi);
+ }
kfree(mdev->eqs);
}
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 84f85b2299b4..9faa072e779e 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -615,6 +615,7 @@ enum {
#define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECONFIG BIT(3)
#define GDMA_DRV_CAP_FLAG_1_GDMA_PAGES_4MB_1GB_2GB BIT(4)
#define GDMA_DRV_CAP_FLAG_1_VARIABLE_INDIRECTION_TABLE_SUPPORT BIT(5)
+#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
/* Driver can handle holes (zeros) in the device list */
#define GDMA_DRV_CAP_FLAG_1_DEV_LIST_HOLES_SUP BIT(11)
@@ -631,7 +632,8 @@ enum {
/* Driver detects stalled send queues and recovers them */
#define GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY BIT(18)
-#define GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE BIT(6)
+/* Driver supports separate EQ/MSIs for each vPort */
+#define GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT BIT(19)
/* Driver supports linearizing the skb when num_sge exceeds hardware limit */
#define GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE BIT(20)
@@ -659,7 +661,8 @@ enum {
GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE | \
GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \
GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \
- GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY)
+ GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \
+ GDMA_DRV_CAP_FLAG_1_EQ_MSI_UNSHARE_MULTI_VPORT)
#define GDMA_DRV_CAP_FLAGS2 0
--
2.43.0
^ permalink raw reply related
* RE: [EXTERNAL] Re: [PATCH net-next v4 0/6] net: mana: Per-vPort EQ and MSI-X interrupt management
From: Long Li @ 2026-03-23 20:01 UTC (permalink / raw)
To: Simon Horman
Cc: Konstantin Taranov, Jakub Kicinski, David S . Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Jason Gunthorpe, Leon Romanovsky,
Haiyang Zhang, KY Srinivasan, Wei Liu, Dexuan Cui,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260323134337.GA69756@horms.kernel.org>
> -----Original Message-----
> From: Simon Horman <horms@kernel.org>
> Sent: Monday, March 23, 2026 6:44 AM
> To: Long Li <longli@microsoft.com>
> Cc: Konstantin Taranov <kotaranov@microsoft.com>; Jakub Kicinski
> <kuba@kernel.org>; David S . Miller <davem@davemloft.net>; Paolo Abeni
> <pabeni@redhat.com>; Eric Dumazet <edumazet@google.com>; Andrew Lunn
> <andrew+netdev@lunn.ch>; Jason Gunthorpe <jgg@ziepe.ca>; Leon Romanovsky
> <leon@kernel.org>; Haiyang Zhang <haiyangz@microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Wei Liu <wei.liu@kernel.org>; Dexuan Cui
> <DECUI@microsoft.com>; netdev@vger.kernel.org; linux-rdma@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [EXTERNAL] Re: [PATCH net-next v4 0/6] net: mana: Per-vPort EQ and
> MSI-X interrupt management
>
> On Fri, Mar 20, 2026 at 04:54:13PM -0700, Long Li wrote:
> > This series adds per-vPort Event Queue (EQ) allocation and MSI-X
> > interrupt management for the MANA driver. Previously, all vPorts
> > shared a single set of EQs. This change enables dedicated EQs per
> > vPort with support for both dedicated and shared MSI-X vector allocation
> modes.
>
> ...
>
> Hi Long Li,
>
> Unfortunately this series did not apply to net-next cleanly.
> Which breaks our CI.
>
> Please rebase and repost.
>
> Thanks!
I have sent v5 of the patch set.
Please apply the patch set after this patch "net: mana: Set default number of queues to 16"
Thank you,
Long
^ permalink raw reply
* [PATCH rdma v2] RDMA/mana_ib: Disable RX steering on RSS QP destroy
From: Long Li @ 2026-03-23 20:10 UTC (permalink / raw)
To: Long Li, Konstantin Taranov, Jakub Kicinski, David S . Miller,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Jason Gunthorpe,
Leon Romanovsky, Haiyang Zhang, K . Y . Srinivasan, Wei Liu,
Dexuan Cui
Cc: Simon Horman, netdev, linux-rdma, linux-hyperv, linux-kernel,
stable
When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss()
destroys the RX WQ objects but does not disable vPort RX steering in
firmware. This leaves stale steering configuration that still points to
the destroyed RX objects.
If traffic continues to arrive (e.g. peer VM is still transmitting) and
the VF interface is subsequently brought up (mana_open), the firmware
may deliver completions using stale CQ IDs from the old RX objects.
These CQ IDs can be reused by the ethernet driver for new TX CQs,
causing RX completions to land on TX CQs:
WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana] (is_sq == false)
WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup fails)
Fix this by disabling vPort RX steering before destroying RX WQ objects.
Note that mana_fence_rqs() cannot be used here because the fence
completion is delivered on the CQ, which is polled by user-mode (e.g.
DPDK) and not visible to the kernel driver.
Refactor the disable logic into a shared mana_disable_vport_rx() in
mana_en, exported for use by mana_ib, replacing the duplicate code.
The ethernet driver's mana_dealloc_queues() is also updated to call
this common function.
Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
Cc: stable@vger.kernel.org
Signed-off-by: Long Li <longli@microsoft.com>
---
v2:
- Removed redundant ibdev_err on mana_disable_vport_rx() failure as
mana_cfg_vport_steering() already logs all failure scenarios.
- Added comment clarifying this is best effort.
drivers/infiniband/hw/mana/qp.c | 15 +++++++++++++++
drivers/net/ethernet/microsoft/mana/mana_en.c | 11 ++++++++++-
include/net/mana/mana.h | 1 +
3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mana/qp.c b/drivers/infiniband/hw/mana/qp.c
index 80cf4ade4b75..685e61e8436c 100644
--- a/drivers/infiniband/hw/mana/qp.c
+++ b/drivers/infiniband/hw/mana/qp.c
@@ -834,6 +834,21 @@ static int mana_ib_destroy_qp_rss(struct mana_ib_qp *qp,
ndev = mana_ib_get_netdev(qp->ibqp.device, qp->port);
mpc = netdev_priv(ndev);
+ /* Disable vPort RX steering before destroying RX WQ objects.
+ * Otherwise firmware still routes traffic to the destroyed queues,
+ * which can cause bogus completions on reused CQ IDs when the
+ * ethernet driver later creates new queues on mana_open().
+ *
+ * Unlike the ethernet teardown path, mana_fence_rqs() cannot be
+ * used here because the fence completion CQE is delivered on the
+ * CQ which is polled by userspace (e.g. DPDK), so there is no way
+ * for the kernel to wait for fence completion.
+ *
+ * This is best effort — if it fails there is not much we can do,
+ * and mana_cfg_vport_steering() already logs the error.
+ */
+ mana_disable_vport_rx(mpc);
+
for (i = 0; i < (1 << ind_tbl->log_ind_tbl_size); i++) {
ibwq = ind_tbl->ind_tbl[i];
wq = container_of(ibwq, struct mana_ib_wq, ibwq);
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index b3c3a70f733f..0816279f525e 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -2934,6 +2934,13 @@ static void mana_rss_table_init(struct mana_port_context *apc)
ethtool_rxfh_indir_default(i, apc->num_queues);
}
+int mana_disable_vport_rx(struct mana_port_context *apc)
+{
+ return mana_cfg_vport_steering(apc, TRI_STATE_FALSE, false, false,
+ false);
+}
+EXPORT_SYMBOL_NS(mana_disable_vport_rx, "NET_MANA");
+
int mana_config_rss(struct mana_port_context *apc, enum TRI_STATE rx,
bool update_hash, bool update_tab)
{
@@ -3339,10 +3346,12 @@ static int mana_dealloc_queues(struct net_device *ndev)
*/
apc->rss_state = TRI_STATE_FALSE;
- err = mana_config_rss(apc, TRI_STATE_FALSE, false, false);
+ err = mana_disable_vport_rx(apc);
if (err && mana_en_need_log(apc, err))
netdev_err(ndev, "Failed to disable vPort: %d\n", err);
+ mana_fence_rqs(apc);
+
/* Even in err case, still need to cleanup the vPort */
mana_destroy_rxqs(apc);
mana_destroy_txq(apc);
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 204c2b612a62..2634e9135eed 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -574,6 +574,7 @@ struct mana_port_context {
netdev_tx_t mana_start_xmit(struct sk_buff *skb, struct net_device *ndev);
int mana_config_rss(struct mana_port_context *ac, enum TRI_STATE rx,
bool update_hash, bool update_tab);
+int mana_disable_vport_rx(struct mana_port_context *apc);
int mana_alloc_queues(struct net_device *ndev);
int mana_attach(struct net_device *ndev);
--
2.43.0
^ permalink raw reply related
* Re: [PATCH ethtool-next] netlink: settings: add netlink support for RX CQE Coalescing params
From: patchwork-bot+netdevbpf @ 2026-03-23 22:20 UTC (permalink / raw)
To: Haiyang Zhang; +Cc: mkubecek, linux-hyperv, netdev, haiyangz, paulros
In-Reply-To: <20260320203159.1590235-1-haiyangz@linux.microsoft.com>
Hello:
This patch was applied to ethtool/ethtool.git (next)
by Michal Kubecek <mkubecek@suse.cz>:
On Fri, 20 Mar 2026 13:31:59 -0700 you wrote:
> From: Haiyang Zhang <haiyangz@microsoft.com>
>
> Add support to get/set RX CQE Coalescing parameters, including the max frames
> and time out value in nanoseconds.
>
> (Headers: dc3d720e12f6 "net: ethtool: add ethtool COALESCE_RX_CQE_FRAMES/NSECS")
>
> [...]
Here is the summary with links:
- [ethtool-next] netlink: settings: add netlink support for RX CQE Coalescing params
https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/commit/?id=d35d87fbcda9
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH net-next v4] net: mana: Expose hardware diagnostic info via debugfs
From: Jakub Kicinski @ 2026-03-24 0:44 UTC (permalink / raw)
To: Erni Sri Satya Vennela
Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
edumazet, pabeni, kotaranov, horms, shradhagupta, shirazsaleem,
dipayanroy, yury.norov, kees, ssengar, gargaditya, linux-hyperv,
netdev, linux-kernel, linux-rdma
In-Reply-To: <20260319070926.1459515-1-ernis@linux.microsoft.com>
On Thu, 19 Mar 2026 00:09:13 -0700 Erni Sri Satya Vennela wrote:
> Add debugfs entries to expose hardware configuration and diagnostic
> information that aids in debugging driver initialization and runtime
> operations without adding noise to dmesg.
>
> The debugfs directory creation and removal for each PCI device is
> integrated into mana_gd_setup() and mana_gd_cleanup_device()
> respectively, so that all callers (probe, remove, suspend, resume,
> shutdown) share a single code path.
>
> Device-level entries (under /sys/kernel/debug/mana/<slot>/):
> - num_msix_usable, max_num_queues: Max resources from hardware
> - gdma_protocol_ver, pf_cap_flags1: VF version negotiation results
> - num_vports, bm_hostmode: Device configuration
>
> Per-vPort entries (under /sys/kernel/debug/mana/<slot>/vportN/):
> - port_handle: Hardware vPort handle
> - max_sq, max_rq: Max queues from vPort config
> - indir_table_sz: Indirection table size
> - steer_rx, steer_rss, steer_update_tab, steer_cqe_coalescing:
> Last applied steering configuration parameters
AI says:
> @@ -1918,15 +1930,23 @@ static int mana_gd_setup(struct pci_dev *pdev)
> struct gdma_context *gc = pci_get_drvdata(pdev);
> int err;
>
> + if (gc->is_pf)
> + gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root);
> + else
> + gc->mana_pci_debugfs = debugfs_create_dir(pci_slot_name(pdev->slot),
> + mana_debugfs_root);
If pdev->slot is NULL (which can happen for VFs in environments like generic
VFIO passthrough or nested KVM), will pci_slot_name(pdev->slot) cause a
NULL pointer dereference?
Also, could this naming scheme cause name collisions? If multiple PFs are
present, they would all try to use "0". Similarly, VFs across different
PCI domains or buses might share the same physical slot identifier, leading
to -EEXIST errors. Would it be safer to use the unique PCI BDF via
pci_name(pdev) instead?
> @@ -3141,6 +3149,24 @@ static int mana_init_port(struct net_device *ndev)
> eth_hw_addr_set(ndev, apc->mac_addr);
> sprintf(vport, "vport%d", port_idx);
> apc->mana_port_debugfs = debugfs_create_dir(vport, gc->mana_pci_debugfs);
> +
> + debugfs_create_u64("port_handle", 0400, apc->mana_port_debugfs,
> + &apc->port_handle);
When operations like changing the MTU or setting an XDP program trigger a
detach/attach cycle, mana_detach() invokes mana_cleanup_port_context(),
which recursively removes the apc->mana_port_debugfs directory.
During re-attachment, mana_attach() calls mana_init_port(), which
recreates the directory and the new files added in this patch. However, the
pre-existing current_speed file (created in mana_probe_port()) is not
recreated here.
Does this cause the current_speed file to be permanently lost after a
detach/attach cycle? Should the creation of current_speed be moved to
mana_init_port() so it survives the cycle?
--
pw-bot: cr
^ permalink raw reply
* [PATCH 00/12] treewide: Convert buses to use generic driver_override
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich
This is the follow-up of the driver_override generalization in [1], converting
the remaining 11 busses and removing the now-unused driver_set_override()
helper.
All of them (except AP, which has a different race condition) are prone to the
potential UAF described in [2], caused by accessing the driver_override field
from their corresponding match() callback.
In order to address this, the generalized driver_override field in struct device
is protected with a spinlock. The driver-core provides accessors, such as
device_match_driver_override(), device_has_driver_override() and
device_set_driver_override(), which all ensure proper locking internally.
Additionally, the driver-core provides a driver_override flag in struct
bus_type, which, once enabled, automatically registers generic sysfs callbacks,
allowing userspace to modify the driver_override field.
SPI and AP are a bit special; both print "\n" when driver_override is not set,
whereas all other buses (and thus the driver-core) produce "(null)\n" in this
case.
Hence, SPI and AP do not take advantage of the driver_override flag in struct
bus_type; AP additionally maintains a counter in its custom sysfs store().
Technically, we could support a custom fallback string when driver_override is
unset in struct bus_type, but only SPI would benefit from this, since AP has
additional custom logic in store() anyways.
(I'm not sure if there are userspace programs that strictly rely on this;
driverctl seems to check for both, but I rather not break some userspace tool
I'm not aware of. :)
This series is based on v7.0-rc5 with no additional dependencies, hence those
patches can be picked up by subsystems individually.
[1] https://lore.kernel.org/driver-core/20260303115720.48783-1-dakr@kernel.org/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=220789
[3] https://gitlab.com/driverctl/driverctl/-/blob/0.121/driverctl?ref_type=tags#L99
Danilo Krummrich (12):
amba: use generic driver_override infrastructure
bus: fsl-mc: use generic driver_override infrastructure
cdx: use generic driver_override infrastructure
hv: vmbus: use generic driver_override infrastructure
PCI: use generic driver_override infrastructure
platform/wmi: use generic driver_override infrastructure
rpmsg: use generic driver_override infrastructure
vdpa: use generic driver_override infrastructure
s390/cio: use generic driver_override infrastructure
s390/ap: use generic driver_override infrastructure
spi: use generic driver_override infrastructure
driver core: remove driver_set_override()
drivers/amba/bus.c | 37 +++------------
drivers/base/driver.c | 75 ------------------------------
drivers/bus/fsl-mc/fsl-mc-bus.c | 43 +++--------------
drivers/cdx/cdx.c | 40 ++--------------
drivers/hv/vmbus_drv.c | 36 ++------------
drivers/pci/pci-driver.c | 11 +++--
drivers/pci/pci-sysfs.c | 28 -----------
drivers/pci/probe.c | 1 -
drivers/platform/wmi/core.c | 36 ++------------
drivers/rpmsg/qcom_glink_native.c | 2 -
drivers/rpmsg/rpmsg_core.c | 43 +++--------------
drivers/rpmsg/virtio_rpmsg_bus.c | 1 -
drivers/s390/cio/cio.h | 5 --
drivers/s390/cio/css.c | 34 ++------------
drivers/s390/crypto/ap_bus.c | 34 +++++++-------
drivers/s390/crypto/ap_bus.h | 1 -
drivers/s390/crypto/ap_queue.c | 24 +++-------
drivers/spi/spi.c | 19 +++-----
drivers/vdpa/vdpa.c | 48 ++-----------------
drivers/vfio/fsl-mc/vfio_fsl_mc.c | 4 +-
drivers/vfio/pci/vfio_pci_core.c | 5 +-
drivers/xen/xen-pciback/pci_stub.c | 6 ++-
include/linux/amba/bus.h | 5 --
include/linux/cdx/cdx_bus.h | 4 --
include/linux/device/driver.h | 2 -
include/linux/fsl/mc.h | 4 --
include/linux/hyperv.h | 5 --
include/linux/pci.h | 6 ---
include/linux/rpmsg.h | 4 --
include/linux/spi/spi.h | 5 --
include/linux/vdpa.h | 4 --
include/linux/wmi.h | 4 --
32 files changed, 88 insertions(+), 488 deletions(-)
base-commit: c369299895a591d96745d6492d4888259b004a9e
--
2.53.0
^ permalink raw reply
* [PATCH 01/12] amba: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 3cf385713460 ("ARM: 8256/1: driver coamba: add device binding path 'driver_override'")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/amba/bus.c | 37 ++++++-------------------------------
include/linux/amba/bus.h | 5 -----
2 files changed, 6 insertions(+), 36 deletions(-)
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 6d479caf89cb..d721d64a9858 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -82,33 +82,6 @@ static void amba_put_disable_pclk(struct amba_device *pcdev)
}
-static ssize_t driver_override_show(struct device *_dev,
- struct device_attribute *attr, char *buf)
-{
- struct amba_device *dev = to_amba_device(_dev);
- ssize_t len;
-
- device_lock(_dev);
- len = sprintf(buf, "%s\n", dev->driver_override);
- device_unlock(_dev);
- return len;
-}
-
-static ssize_t driver_override_store(struct device *_dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct amba_device *dev = to_amba_device(_dev);
- int ret;
-
- ret = driver_set_override(_dev, &dev->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-static DEVICE_ATTR_RW(driver_override);
-
#define amba_attr_func(name,fmt,arg...) \
static ssize_t name##_show(struct device *_dev, \
struct device_attribute *attr, char *buf) \
@@ -126,7 +99,6 @@ amba_attr_func(resource, "\t%016llx\t%016llx\t%016lx\n",
static struct attribute *amba_dev_attrs[] = {
&dev_attr_id.attr,
&dev_attr_resource.attr,
- &dev_attr_driver_override.attr,
NULL,
};
ATTRIBUTE_GROUPS(amba_dev);
@@ -209,10 +181,11 @@ static int amba_match(struct device *dev, const struct device_driver *drv)
{
struct amba_device *pcdev = to_amba_device(dev);
const struct amba_driver *pcdrv = to_amba_driver(drv);
+ int ret;
mutex_lock(&pcdev->periphid_lock);
if (!pcdev->periphid) {
- int ret = amba_read_periphid(pcdev);
+ ret = amba_read_periphid(pcdev);
/*
* Returning any error other than -EPROBE_DEFER from bus match
@@ -230,8 +203,9 @@ static int amba_match(struct device *dev, const struct device_driver *drv)
mutex_unlock(&pcdev->periphid_lock);
/* When driver_override is set, only bind to the matching driver */
- if (pcdev->driver_override)
- return !strcmp(pcdev->driver_override, drv->name);
+ ret = device_match_driver_override(dev, drv);
+ if (ret >= 0)
+ return ret;
return amba_lookup(pcdrv->id_table, pcdev) != NULL;
}
@@ -436,6 +410,7 @@ static const struct dev_pm_ops amba_pm = {
const struct bus_type amba_bustype = {
.name = "amba",
.dev_groups = amba_dev_groups,
+ .driver_override = true,
.match = amba_match,
.uevent = amba_uevent,
.probe = amba_probe,
diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index 9946276aff73..6c54d5c0d21f 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -71,11 +71,6 @@ struct amba_device {
unsigned int cid;
struct amba_cs_uci_id uci;
unsigned int irq[AMBA_NR_IRQS];
- /*
- * Driver name to force a match. Do not set directly, because core
- * frees it. Use driver_set_override() to set or clear it.
- */
- const char *driver_override;
};
struct amba_driver {
--
2.53.0
^ permalink raw reply related
* [PATCH 02/12] bus: fsl-mc: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/bus/fsl-mc/fsl-mc-bus.c | 43 +++++--------------------------
drivers/vfio/fsl-mc/vfio_fsl_mc.c | 4 +--
include/linux/fsl/mc.h | 4 ---
3 files changed, 8 insertions(+), 43 deletions(-)
diff --git a/drivers/bus/fsl-mc/fsl-mc-bus.c b/drivers/bus/fsl-mc/fsl-mc-bus.c
index c117745cf206..221146e4860b 100644
--- a/drivers/bus/fsl-mc/fsl-mc-bus.c
+++ b/drivers/bus/fsl-mc/fsl-mc-bus.c
@@ -86,12 +86,16 @@ static int fsl_mc_bus_match(struct device *dev, const struct device_driver *drv)
struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
const struct fsl_mc_driver *mc_drv = to_fsl_mc_driver(drv);
bool found = false;
+ int ret;
/* When driver_override is set, only bind to the matching driver */
- if (mc_dev->driver_override) {
- found = !strcmp(mc_dev->driver_override, mc_drv->driver.name);
+ ret = device_match_driver_override(dev, drv);
+ if (ret > 0) {
+ found = true;
goto out;
}
+ if (ret == 0)
+ goto out;
if (!mc_drv->match_id_table)
goto out;
@@ -210,39 +214,8 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(modalias);
-static ssize_t driver_override_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
- int ret;
-
- if (WARN_ON(dev->bus != &fsl_mc_bus_type))
- return -EINVAL;
-
- ret = driver_set_override(dev, &mc_dev->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-
-static ssize_t driver_override_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", mc_dev->driver_override);
- device_unlock(dev);
- return len;
-}
-static DEVICE_ATTR_RW(driver_override);
-
static struct attribute *fsl_mc_dev_attrs[] = {
&dev_attr_modalias.attr,
- &dev_attr_driver_override.attr,
NULL,
};
@@ -345,6 +318,7 @@ ATTRIBUTE_GROUPS(fsl_mc_bus);
const struct bus_type fsl_mc_bus_type = {
.name = "fsl-mc",
+ .driver_override = true,
.match = fsl_mc_bus_match,
.uevent = fsl_mc_bus_uevent,
.probe = fsl_mc_probe,
@@ -910,9 +884,6 @@ static struct notifier_block fsl_mc_nb;
*/
void fsl_mc_device_remove(struct fsl_mc_device *mc_dev)
{
- kfree(mc_dev->driver_override);
- mc_dev->driver_override = NULL;
-
/*
* The device-specific remove callback will get invoked by device_del()
*/
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index 462fae1aa538..b4c3958201b2 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -424,9 +424,7 @@ static int vfio_fsl_mc_bus_notifier(struct notifier_block *nb,
if (action == BUS_NOTIFY_ADD_DEVICE &&
vdev->mc_dev == mc_cont) {
- mc_dev->driver_override = kasprintf(GFP_KERNEL, "%s",
- vfio_fsl_mc_ops.name);
- if (!mc_dev->driver_override)
+ if (device_set_driver_override(dev, vfio_fsl_mc_ops.name))
dev_warn(dev, "VFIO_FSL_MC: Setting driver override for device in dprc %s failed\n",
dev_name(&mc_cont->dev));
else
diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h
index 897d6211c163..1da63f2d7040 100644
--- a/include/linux/fsl/mc.h
+++ b/include/linux/fsl/mc.h
@@ -178,9 +178,6 @@ struct fsl_mc_obj_desc {
* @regions: pointer to array of MMIO region entries
* @irqs: pointer to array of pointers to interrupts allocated to this device
* @resource: generic resource associated with this MC object device, if any.
- * @driver_override: driver name to force a match; do not set directly,
- * because core frees it; use driver_set_override() to
- * set or clear it.
*
* Generic device object for MC object devices that are "attached" to a
* MC bus.
@@ -214,7 +211,6 @@ struct fsl_mc_device {
struct fsl_mc_device_irq **irqs;
struct fsl_mc_resource *resource;
struct device_link *consumer_link;
- const char *driver_override;
};
#define to_fsl_mc_device(_dev) \
--
2.53.0
^ permalink raw reply related
* [PATCH 03/12] cdx: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 2959ab247061 ("cdx: add the cdx bus driver")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/cdx/cdx.c | 40 +++++--------------------------------
include/linux/cdx/cdx_bus.h | 4 ----
2 files changed, 5 insertions(+), 39 deletions(-)
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index 9196dc50a48d..d3d230247262 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -156,8 +156,6 @@ static int cdx_unregister_device(struct device *dev,
} else {
cdx_destroy_res_attr(cdx_dev, MAX_CDX_DEV_RESOURCES);
debugfs_remove_recursive(cdx_dev->debugfs_dir);
- kfree(cdx_dev->driver_override);
- cdx_dev->driver_override = NULL;
}
/*
@@ -268,6 +266,7 @@ static int cdx_bus_match(struct device *dev, const struct device_driver *drv)
const struct cdx_driver *cdx_drv = to_cdx_driver(drv);
const struct cdx_device_id *found_id = NULL;
const struct cdx_device_id *ids;
+ int ret;
if (cdx_dev->is_bus)
return false;
@@ -275,7 +274,8 @@ static int cdx_bus_match(struct device *dev, const struct device_driver *drv)
ids = cdx_drv->match_id_table;
/* When driver_override is set, only bind to the matching driver */
- if (cdx_dev->driver_override && strcmp(cdx_dev->driver_override, drv->name))
+ ret = device_match_driver_override(dev, drv);
+ if (ret == 0)
return false;
found_id = cdx_match_id(ids, cdx_dev);
@@ -289,7 +289,7 @@ static int cdx_bus_match(struct device *dev, const struct device_driver *drv)
*/
if (!found_id->override_only)
return true;
- if (cdx_dev->driver_override)
+ if (ret > 0)
return true;
ids = found_id + 1;
@@ -453,36 +453,6 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(modalias);
-static ssize_t driver_override_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct cdx_device *cdx_dev = to_cdx_device(dev);
- int ret;
-
- if (WARN_ON(dev->bus != &cdx_bus_type))
- return -EINVAL;
-
- ret = driver_set_override(dev, &cdx_dev->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-
-static ssize_t driver_override_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct cdx_device *cdx_dev = to_cdx_device(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
- device_unlock(dev);
- return len;
-}
-static DEVICE_ATTR_RW(driver_override);
-
static ssize_t enable_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
@@ -552,7 +522,6 @@ static struct attribute *cdx_dev_attrs[] = {
&dev_attr_class.attr,
&dev_attr_revision.attr,
&dev_attr_modalias.attr,
- &dev_attr_driver_override.attr,
NULL,
};
@@ -646,6 +615,7 @@ ATTRIBUTE_GROUPS(cdx_bus);
const struct bus_type cdx_bus_type = {
.name = "cdx",
+ .driver_override = true,
.match = cdx_bus_match,
.probe = cdx_probe,
.remove = cdx_remove,
diff --git a/include/linux/cdx/cdx_bus.h b/include/linux/cdx/cdx_bus.h
index b1ba97f6c9ad..f54770f110bc 100644
--- a/include/linux/cdx/cdx_bus.h
+++ b/include/linux/cdx/cdx_bus.h
@@ -137,9 +137,6 @@ struct cdx_controller {
* @enabled: is this bus enabled
* @msi_dev_id: MSI Device ID associated with CDX device
* @num_msi: Number of MSI's supported by the device
- * @driver_override: driver name to force a match; do not set directly,
- * because core frees it; use driver_set_override() to
- * set or clear it.
* @irqchip_lock: lock to synchronize irq/msi configuration
* @msi_write_pending: MSI write pending for this device
*/
@@ -165,7 +162,6 @@ struct cdx_device {
bool enabled;
u32 msi_dev_id;
u32 num_msi;
- const char *driver_override;
struct mutex irqchip_lock;
bool msi_write_pending;
};
--
2.53.0
^ permalink raw reply related
* [PATCH 04/12] hv: vmbus: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: d765edbb301c ("vmbus: add driver_override support")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/hv/vmbus_drv.c | 36 +++++-------------------------------
include/linux/hyperv.h | 5 -----
2 files changed, 5 insertions(+), 36 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index bc4fc1951ae1..bc8dfd136f3c 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -541,34 +541,6 @@ static ssize_t device_show(struct device *dev,
}
static DEVICE_ATTR_RO(device);
-static ssize_t driver_override_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct hv_device *hv_dev = device_to_hv_device(dev);
- int ret;
-
- ret = driver_set_override(dev, &hv_dev->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-
-static ssize_t driver_override_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct hv_device *hv_dev = device_to_hv_device(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", hv_dev->driver_override);
- device_unlock(dev);
-
- return len;
-}
-static DEVICE_ATTR_RW(driver_override);
-
/* Set up per device attributes in /sys/bus/vmbus/devices/<bus device> */
static struct attribute *vmbus_dev_attrs[] = {
&dev_attr_id.attr,
@@ -599,7 +571,6 @@ static struct attribute *vmbus_dev_attrs[] = {
&dev_attr_channel_vp_mapping.attr,
&dev_attr_vendor.attr,
&dev_attr_device.attr,
- &dev_attr_driver_override.attr,
NULL,
};
@@ -711,9 +682,11 @@ static const struct hv_vmbus_device_id *hv_vmbus_get_id(const struct hv_driver *
{
const guid_t *guid = &dev->dev_type;
const struct hv_vmbus_device_id *id;
+ int ret;
/* When driver_override is set, only bind to the matching driver */
- if (dev->driver_override && strcmp(dev->driver_override, drv->name))
+ ret = device_match_driver_override(&dev->device, &drv->driver);
+ if (ret == 0)
return NULL;
/* Look at the dynamic ids first, before the static ones */
@@ -722,7 +695,7 @@ static const struct hv_vmbus_device_id *hv_vmbus_get_id(const struct hv_driver *
id = hv_vmbus_dev_match(drv->id_table, guid);
/* driver_override will always match, send a dummy id */
- if (!id && dev->driver_override)
+ if (!id && ret > 0)
id = &vmbus_device_null;
return id;
@@ -1024,6 +997,7 @@ static const struct dev_pm_ops vmbus_pm = {
/* The one and only one */
static const struct bus_type hv_bus = {
.name = "vmbus",
+ .driver_override = true,
.match = vmbus_match,
.shutdown = vmbus_shutdown,
.remove = vmbus_remove,
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index dfc516c1c719..bf689d07d750 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1272,11 +1272,6 @@ struct hv_device {
u16 device_id;
struct device device;
- /*
- * Driver name to force a match. Do not set directly, because core
- * frees it. Use driver_set_override() to set or clear it.
- */
- const char *driver_override;
struct vmbus_channel *channel;
struct kset *channels_kset;
--
2.53.0
^ permalink raw reply related
* [PATCH 05/12] PCI: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/pci/pci-driver.c | 11 +++++++----
drivers/pci/pci-sysfs.c | 28 ----------------------------
drivers/pci/probe.c | 1 -
drivers/vfio/pci/vfio_pci_core.c | 5 ++---
drivers/xen/xen-pciback/pci_stub.c | 6 ++++--
include/linux/pci.h | 6 ------
6 files changed, 13 insertions(+), 44 deletions(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index dd9075403987..d10ece0889f0 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -138,9 +138,11 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
{
struct pci_dynid *dynid;
const struct pci_device_id *found_id = NULL, *ids;
+ int ret;
/* When driver_override is set, only bind to the matching driver */
- if (dev->driver_override && strcmp(dev->driver_override, drv->name))
+ ret = device_match_driver_override(&dev->dev, &drv->driver);
+ if (ret == 0)
return NULL;
/* Look at the dynamic ids first, before the static ones */
@@ -164,7 +166,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
* matching.
*/
if (found_id->override_only) {
- if (dev->driver_override)
+ if (ret > 0)
return found_id;
} else {
return found_id;
@@ -172,7 +174,7 @@ static const struct pci_device_id *pci_match_device(struct pci_driver *drv,
}
/* driver_override will always match, send a dummy id */
- if (dev->driver_override)
+ if (ret > 0)
return &pci_device_id_any;
return NULL;
}
@@ -452,7 +454,7 @@ static int __pci_device_probe(struct pci_driver *drv, struct pci_dev *pci_dev)
static inline bool pci_device_can_probe(struct pci_dev *pdev)
{
return (!pdev->is_virtfn || pdev->physfn->sriov->drivers_autoprobe ||
- pdev->driver_override);
+ device_has_driver_override(&pdev->dev));
}
#else
static inline bool pci_device_can_probe(struct pci_dev *pdev)
@@ -1722,6 +1724,7 @@ static const struct cpumask *pci_device_irq_get_affinity(struct device *dev,
const struct bus_type pci_bus_type = {
.name = "pci",
+ .driver_override = true,
.match = pci_bus_match,
.uevent = pci_uevent,
.probe = pci_device_probe,
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 16eaaf749ba9..a9006cf4e9c8 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -615,33 +615,6 @@ static ssize_t devspec_show(struct device *dev,
static DEVICE_ATTR_RO(devspec);
#endif
-static ssize_t driver_override_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct pci_dev *pdev = to_pci_dev(dev);
- int ret;
-
- ret = driver_set_override(dev, &pdev->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-
-static ssize_t driver_override_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct pci_dev *pdev = to_pci_dev(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", pdev->driver_override);
- device_unlock(dev);
- return len;
-}
-static DEVICE_ATTR_RW(driver_override);
-
static struct attribute *pci_dev_attrs[] = {
&dev_attr_power_state.attr,
&dev_attr_resource.attr,
@@ -669,7 +642,6 @@ static struct attribute *pci_dev_attrs[] = {
#ifdef CONFIG_OF
&dev_attr_devspec.attr,
#endif
- &dev_attr_driver_override.attr,
&dev_attr_ari_enabled.attr,
NULL,
};
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bccc7a4bdd79..b4707640e102 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2488,7 +2488,6 @@ static void pci_release_dev(struct device *dev)
pci_release_of_node(pci_dev);
pcibios_release_device(pci_dev);
pci_bus_put(pci_dev->bus);
- kfree(pci_dev->driver_override);
bitmap_free(pci_dev->dma_alias_mask);
dev_dbg(dev, "device released\n");
kfree(pci_dev);
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d43745fe4c84..460852f79f29 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1987,9 +1987,8 @@ static int vfio_pci_bus_notifier(struct notifier_block *nb,
pdev->is_virtfn && physfn == vdev->pdev) {
pci_info(vdev->pdev, "Captured SR-IOV VF %s driver_override\n",
pci_name(pdev));
- pdev->driver_override = kasprintf(GFP_KERNEL, "%s",
- vdev->vdev.ops->name);
- WARN_ON(!pdev->driver_override);
+ WARN_ON(device_set_driver_override(&pdev->dev,
+ vdev->vdev.ops->name));
} else if (action == BUS_NOTIFY_BOUND_DRIVER &&
pdev->is_virtfn && physfn == vdev->pdev) {
struct pci_driver *drv = pci_dev_driver(pdev);
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e4b27aecbf05..79a2b5dfd694 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -598,6 +598,8 @@ static int pcistub_seize(struct pci_dev *dev,
return err;
}
+static struct pci_driver xen_pcibk_pci_driver;
+
/* Called when 'bind'. This means we must _NOT_ call pci_reset_function or
* other functions that take the sysfs lock. */
static int pcistub_probe(struct pci_dev *dev, const struct pci_device_id *id)
@@ -609,8 +611,8 @@ static int pcistub_probe(struct pci_dev *dev, const struct pci_device_id *id)
match = pcistub_match(dev);
- if ((dev->driver_override &&
- !strcmp(dev->driver_override, PCISTUB_DRIVER_NAME)) ||
+ if (device_match_driver_override(&dev->dev,
+ &xen_pcibk_pci_driver.driver) > 0 ||
match) {
if (dev->hdr_type != PCI_HEADER_TYPE_NORMAL
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1c270f1d5123..57e9463e4347 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -575,12 +575,6 @@ struct pci_dev {
u8 supported_speeds; /* Supported Link Speeds Vector */
phys_addr_t rom; /* Physical address if not from BAR */
size_t romlen; /* Length if not from BAR */
- /*
- * Driver name to force a match. Do not set directly, because core
- * frees it. Use driver_set_override() to set or clear it.
- */
- const char *driver_override;
-
unsigned long priv_flags; /* Private flags for the PCI driver */
/* These methods index pci_reset_fn_methods[] */
--
2.53.0
^ permalink raw reply related
* [PATCH 06/12] platform/wmi: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 12046f8c77e0 ("platform/x86: wmi: Add driver_override support")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/platform/wmi/core.c | 36 +++++-------------------------------
include/linux/wmi.h | 4 ----
2 files changed, 5 insertions(+), 35 deletions(-)
diff --git a/drivers/platform/wmi/core.c b/drivers/platform/wmi/core.c
index b8e6b9a421c6..750e3619724e 100644
--- a/drivers/platform/wmi/core.c
+++ b/drivers/platform/wmi/core.c
@@ -842,39 +842,11 @@ static ssize_t expensive_show(struct device *dev,
}
static DEVICE_ATTR_RO(expensive);
-static ssize_t driver_override_show(struct device *dev, struct device_attribute *attr,
- char *buf)
-{
- struct wmi_device *wdev = to_wmi_device(dev);
- ssize_t ret;
-
- device_lock(dev);
- ret = sysfs_emit(buf, "%s\n", wdev->driver_override);
- device_unlock(dev);
-
- return ret;
-}
-
-static ssize_t driver_override_store(struct device *dev, struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct wmi_device *wdev = to_wmi_device(dev);
- int ret;
-
- ret = driver_set_override(dev, &wdev->driver_override, buf, count);
- if (ret < 0)
- return ret;
-
- return count;
-}
-static DEVICE_ATTR_RW(driver_override);
-
static struct attribute *wmi_attrs[] = {
&dev_attr_modalias.attr,
&dev_attr_guid.attr,
&dev_attr_instance_count.attr,
&dev_attr_expensive.attr,
- &dev_attr_driver_override.attr,
NULL
};
ATTRIBUTE_GROUPS(wmi);
@@ -943,7 +915,6 @@ static void wmi_dev_release(struct device *dev)
{
struct wmi_block *wblock = dev_to_wblock(dev);
- kfree(wblock->dev.driver_override);
kfree(wblock);
}
@@ -952,10 +923,12 @@ static int wmi_dev_match(struct device *dev, const struct device_driver *driver)
const struct wmi_driver *wmi_driver = to_wmi_driver(driver);
struct wmi_block *wblock = dev_to_wblock(dev);
const struct wmi_device_id *id = wmi_driver->id_table;
+ int ret;
/* When driver_override is set, only bind to the matching driver */
- if (wblock->dev.driver_override)
- return !strcmp(wblock->dev.driver_override, driver->name);
+ ret = device_match_driver_override(dev, driver);
+ if (ret >= 0)
+ return ret;
if (id == NULL)
return 0;
@@ -1076,6 +1049,7 @@ static struct class wmi_bus_class = {
static const struct bus_type wmi_bus_type = {
.name = "wmi",
.dev_groups = wmi_groups,
+ .driver_override = true,
.match = wmi_dev_match,
.uevent = wmi_dev_uevent,
.probe = wmi_dev_probe,
diff --git a/include/linux/wmi.h b/include/linux/wmi.h
index 75cb0c7cfe57..14fb644e1701 100644
--- a/include/linux/wmi.h
+++ b/include/linux/wmi.h
@@ -18,16 +18,12 @@
* struct wmi_device - WMI device structure
* @dev: Device associated with this WMI device
* @setable: True for devices implementing the Set Control Method
- * @driver_override: Driver name to force a match; do not set directly,
- * because core frees it; use driver_set_override() to
- * set or clear it.
*
* This represents WMI devices discovered by the WMI driver core.
*/
struct wmi_device {
struct device dev;
bool setable;
- const char *driver_override;
};
/**
--
2.53.0
^ permalink raw reply related
* [PATCH 07/12] rpmsg: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: e95060478244 ("rpmsg: Introduce a driver override mechanism")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/rpmsg/qcom_glink_native.c | 2 --
drivers/rpmsg/rpmsg_core.c | 43 +++++--------------------------
drivers/rpmsg/virtio_rpmsg_bus.c | 1 -
include/linux/rpmsg.h | 4 ---
4 files changed, 7 insertions(+), 43 deletions(-)
diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c
index 9ef17c2e45b0..e9d1b2082477 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -1623,7 +1623,6 @@ static void qcom_glink_rpdev_release(struct device *dev)
{
struct rpmsg_device *rpdev = to_rpmsg_device(dev);
- kfree(rpdev->driver_override);
kfree(rpdev);
}
@@ -1859,7 +1858,6 @@ static void qcom_glink_device_release(struct device *dev)
/* Release qcom_glink_alloc_channel() reference */
kref_put(&channel->refcount, qcom_glink_channel_release);
- kfree(rpdev->driver_override);
kfree(rpdev);
}
diff --git a/drivers/rpmsg/rpmsg_core.c b/drivers/rpmsg/rpmsg_core.c
index 96964745065b..2b9f6d5a9a4f 100644
--- a/drivers/rpmsg/rpmsg_core.c
+++ b/drivers/rpmsg/rpmsg_core.c
@@ -358,33 +358,6 @@ rpmsg_show_attr(src, src, "0x%x\n");
rpmsg_show_attr(dst, dst, "0x%x\n");
rpmsg_show_attr(announce, announce ? "true" : "false", "%s\n");
-static ssize_t driver_override_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct rpmsg_device *rpdev = to_rpmsg_device(dev);
- int ret;
-
- ret = driver_set_override(dev, &rpdev->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-
-static ssize_t driver_override_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct rpmsg_device *rpdev = to_rpmsg_device(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", rpdev->driver_override);
- device_unlock(dev);
- return len;
-}
-static DEVICE_ATTR_RW(driver_override);
-
static ssize_t modalias_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -405,7 +378,6 @@ static struct attribute *rpmsg_dev_attrs[] = {
&dev_attr_dst.attr,
&dev_attr_src.attr,
&dev_attr_announce.attr,
- &dev_attr_driver_override.attr,
NULL,
};
ATTRIBUTE_GROUPS(rpmsg_dev);
@@ -424,9 +396,11 @@ static int rpmsg_dev_match(struct device *dev, const struct device_driver *drv)
const struct rpmsg_driver *rpdrv = to_rpmsg_driver(drv);
const struct rpmsg_device_id *ids = rpdrv->id_table;
unsigned int i;
+ int ret;
- if (rpdev->driver_override)
- return !strcmp(rpdev->driver_override, drv->name);
+ ret = device_match_driver_override(dev, drv);
+ if (ret >= 0)
+ return ret;
if (ids)
for (i = 0; ids[i].name[0]; i++)
@@ -535,6 +509,7 @@ static const struct bus_type rpmsg_bus = {
.name = "rpmsg",
.match = rpmsg_dev_match,
.dev_groups = rpmsg_dev_groups,
+ .driver_override = true,
.uevent = rpmsg_uevent,
.probe = rpmsg_dev_probe,
.remove = rpmsg_dev_remove,
@@ -560,11 +535,9 @@ int rpmsg_register_device_override(struct rpmsg_device *rpdev,
device_initialize(dev);
if (driver_override) {
- ret = driver_set_override(dev, &rpdev->driver_override,
- driver_override,
- strlen(driver_override));
+ ret = device_set_driver_override(dev, driver_override);
if (ret) {
- dev_err(dev, "device_set_override failed: %d\n", ret);
+ dev_err(dev, "device_set_driver_override() failed: %d\n", ret);
put_device(dev);
return ret;
}
@@ -573,8 +546,6 @@ int rpmsg_register_device_override(struct rpmsg_device *rpdev,
ret = device_add(dev);
if (ret) {
dev_err(dev, "device_add failed: %d\n", ret);
- kfree(rpdev->driver_override);
- rpdev->driver_override = NULL;
put_device(dev);
}
diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index 8d9e2b4dc7c1..e0dacb736ef9 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -373,7 +373,6 @@ static void virtio_rpmsg_release_device(struct device *dev)
struct rpmsg_device *rpdev = to_rpmsg_device(dev);
struct virtio_rpmsg_channel *vch = to_virtio_rpmsg_channel(rpdev);
- kfree(rpdev->driver_override);
kfree(vch);
}
diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h
index fb7ab9165645..c2e3ef8480d5 100644
--- a/include/linux/rpmsg.h
+++ b/include/linux/rpmsg.h
@@ -41,9 +41,6 @@ struct rpmsg_channel_info {
* rpmsg_device - device that belong to the rpmsg bus
* @dev: the device struct
* @id: device id (used to match between rpmsg drivers and devices)
- * @driver_override: driver name to force a match; do not set directly,
- * because core frees it; use driver_set_override() to
- * set or clear it.
* @src: local address
* @dst: destination address
* @ept: the rpmsg endpoint of this channel
@@ -53,7 +50,6 @@ struct rpmsg_channel_info {
struct rpmsg_device {
struct device dev;
struct rpmsg_device_id id;
- const char *driver_override;
u32 src;
u32 dst;
struct rpmsg_endpoint *ept;
--
2.53.0
^ permalink raw reply related
* [PATCH 08/12] vdpa: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 539fec78edb4 ("vdpa: add driver_override support")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/vdpa/vdpa.c | 48 +++++---------------------------------------
include/linux/vdpa.h | 4 ----
2 files changed, 5 insertions(+), 47 deletions(-)
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 34874beb0152..caf0ee5d6856 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -67,57 +67,20 @@ static void vdpa_dev_remove(struct device *d)
static int vdpa_dev_match(struct device *dev, const struct device_driver *drv)
{
- struct vdpa_device *vdev = dev_to_vdpa(dev);
+ int ret;
/* Check override first, and if set, only use the named driver */
- if (vdev->driver_override)
- return strcmp(vdev->driver_override, drv->name) == 0;
+ ret = device_match_driver_override(dev, drv);
+ if (ret >= 0)
+ return ret;
/* Currently devices must be supported by all vDPA bus drivers */
return 1;
}
-static ssize_t driver_override_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct vdpa_device *vdev = dev_to_vdpa(dev);
- int ret;
-
- ret = driver_set_override(dev, &vdev->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-
-static ssize_t driver_override_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct vdpa_device *vdev = dev_to_vdpa(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", vdev->driver_override);
- device_unlock(dev);
-
- return len;
-}
-static DEVICE_ATTR_RW(driver_override);
-
-static struct attribute *vdpa_dev_attrs[] = {
- &dev_attr_driver_override.attr,
- NULL,
-};
-
-static const struct attribute_group vdpa_dev_group = {
- .attrs = vdpa_dev_attrs,
-};
-__ATTRIBUTE_GROUPS(vdpa_dev);
-
static const struct bus_type vdpa_bus = {
.name = "vdpa",
- .dev_groups = vdpa_dev_groups,
+ .driver_override = true,
.match = vdpa_dev_match,
.probe = vdpa_dev_probe,
.remove = vdpa_dev_remove,
@@ -132,7 +95,6 @@ static void vdpa_release_dev(struct device *d)
ops->free(vdev);
ida_free(&vdpa_index_ida, vdev->index);
- kfree(vdev->driver_override);
kfree(vdev);
}
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 2bfe3baa63f4..782c42d25db1 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -72,9 +72,6 @@ struct vdpa_mgmt_dev;
* struct vdpa_device - representation of a vDPA device
* @dev: underlying device
* @vmap: the metadata passed to upper layer to be used for mapping
- * @driver_override: driver name to force a match; do not set directly,
- * because core frees it; use driver_set_override() to
- * set or clear it.
* @config: the configuration ops for this device.
* @map: the map ops for this device
* @cf_lock: Protects get and set access to configuration layout.
@@ -90,7 +87,6 @@ struct vdpa_mgmt_dev;
struct vdpa_device {
struct device dev;
union virtio_map vmap;
- const char *driver_override;
const struct vdpa_config_ops *config;
const struct virtio_map_ops *map;
struct rw_semaphore cf_lock; /* Protects get/set config */
--
2.53.0
^ permalink raw reply related
* [PATCH 09/12] s390/cio: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: ebc3d1791503 ("s390/cio: introduce driver_override on the css bus")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/s390/cio/cio.h | 5 -----
drivers/s390/cio/css.c | 34 ++++------------------------------
2 files changed, 4 insertions(+), 35 deletions(-)
diff --git a/drivers/s390/cio/cio.h b/drivers/s390/cio/cio.h
index 08a5e9380e75..bad142c536e1 100644
--- a/drivers/s390/cio/cio.h
+++ b/drivers/s390/cio/cio.h
@@ -103,11 +103,6 @@ struct subchannel {
struct work_struct todo_work;
struct schib_config config;
u64 dma_mask;
- /*
- * Driver name to force a match. Do not set directly, because core
- * frees it. Use driver_set_override() to set or clear it.
- */
- const char *driver_override;
} __attribute__ ((aligned(8)));
DECLARE_PER_CPU_ALIGNED(struct irb, cio_irb);
diff --git a/drivers/s390/cio/css.c b/drivers/s390/cio/css.c
index 5ab239f38588..e5a0ec6b4e3e 100644
--- a/drivers/s390/cio/css.c
+++ b/drivers/s390/cio/css.c
@@ -159,7 +159,6 @@ static void css_subchannel_release(struct device *dev)
sch->config.intparm = 0;
cio_commit_config(sch);
- kfree(sch->driver_override);
kfree(sch);
}
@@ -323,37 +322,9 @@ static ssize_t modalias_show(struct device *dev, struct device_attribute *attr,
static DEVICE_ATTR_RO(modalias);
-static ssize_t driver_override_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
-{
- struct subchannel *sch = to_subchannel(dev);
- int ret;
-
- ret = driver_set_override(dev, &sch->driver_override, buf, count);
- if (ret)
- return ret;
-
- return count;
-}
-
-static ssize_t driver_override_show(struct device *dev,
- struct device_attribute *attr, char *buf)
-{
- struct subchannel *sch = to_subchannel(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", sch->driver_override);
- device_unlock(dev);
- return len;
-}
-static DEVICE_ATTR_RW(driver_override);
-
static struct attribute *subch_attrs[] = {
&dev_attr_type.attr,
&dev_attr_modalias.attr,
- &dev_attr_driver_override.attr,
NULL,
};
@@ -1356,9 +1327,11 @@ static int css_bus_match(struct device *dev, const struct device_driver *drv)
struct subchannel *sch = to_subchannel(dev);
const struct css_driver *driver = to_cssdriver(drv);
struct css_device_id *id;
+ int ret;
/* When driver_override is set, only bind to the matching driver */
- if (sch->driver_override && strcmp(sch->driver_override, drv->name))
+ ret = device_match_driver_override(dev, drv);
+ if (ret == 0)
return 0;
for (id = driver->subchannel_type; id->match_flags; id++) {
@@ -1415,6 +1388,7 @@ static int css_uevent(const struct device *dev, struct kobj_uevent_env *env)
static const struct bus_type css_bus_type = {
.name = "css",
+ .driver_override = true,
.match = css_bus_match,
.probe = css_probe,
.remove = css_remove,
--
2.53.0
^ permalink raw reply related
* [PATCH 10/12] s390/ap: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When the AP masks are updated via apmask_store() or aqmask_store(),
ap_bus_revise_bindings() is called after ap_attr_mutex has been
released.
This calls __ap_revise_reserved(), which accesses the driver_override
field without holding any lock, racing against a concurrent
driver_override_store() that may free the old string, resulting in a
potential UAF.
Fix this by using the driver-core driver_override infrastructure, which
protects all accesses with an internal spinlock.
Note that unlike most other buses, the AP bus does not check
driver_override in its match() callback; the override is checked in
ap_device_probe() and __ap_revise_reserved() instead.
Also note that we do not enable the driver_override feature of struct
bus_type, as AP - in contrast to most other buses - passes "" to
sysfs_emit() when the driver_override pointer is NULL. Thus, printing
"\n" instead of "(null)\n".
Additionally, AP has a custom counter that is modified in the
corresponding custom driver_override_store().
Fixes: d38a87d7c064 ("s390/ap: Support driver_override for AP queue devices")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/s390/crypto/ap_bus.c | 34 +++++++++++++++++-----------------
drivers/s390/crypto/ap_bus.h | 1 -
drivers/s390/crypto/ap_queue.c | 24 ++++++------------------
3 files changed, 23 insertions(+), 36 deletions(-)
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index d652df96a507..f24e27add721 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -859,25 +859,24 @@ static int __ap_queue_devices_with_id_unregister(struct device *dev, void *data)
static int __ap_revise_reserved(struct device *dev, void *dummy)
{
- int rc, card, queue, devres, drvres;
+ int rc, card, queue, devres, drvres, ovrd;
if (is_queue_dev(dev)) {
struct ap_driver *ap_drv = to_ap_drv(dev->driver);
struct ap_queue *aq = to_ap_queue(dev);
- struct ap_device *ap_dev = &aq->ap_dev;
card = AP_QID_CARD(aq->qid);
queue = AP_QID_QUEUE(aq->qid);
- if (ap_dev->driver_override) {
- if (strcmp(ap_dev->driver_override,
- ap_drv->driver.name)) {
- pr_debug("reprobing queue=%02x.%04x\n", card, queue);
- rc = device_reprobe(dev);
- if (rc) {
- AP_DBF_WARN("%s reprobing queue=%02x.%04x failed\n",
- __func__, card, queue);
- }
+ ovrd = device_match_driver_override(dev, &ap_drv->driver);
+ if (ovrd > 0) {
+ /* override set and matches, nothing to do */
+ } else if (ovrd == 0) {
+ pr_debug("reprobing queue=%02x.%04x\n", card, queue);
+ rc = device_reprobe(dev);
+ if (rc) {
+ AP_DBF_WARN("%s reprobing queue=%02x.%04x failed\n",
+ __func__, card, queue);
}
} else {
mutex_lock(&ap_attr_mutex);
@@ -928,7 +927,7 @@ int ap_owned_by_def_drv(int card, int queue)
if (aq) {
const struct device_driver *drv = aq->ap_dev.device.driver;
const struct ap_driver *ap_drv = to_ap_drv(drv);
- bool override = !!aq->ap_dev.driver_override;
+ bool override = device_has_driver_override(&aq->ap_dev.device);
if (override && drv && ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
rc = 1;
@@ -977,7 +976,7 @@ static int ap_device_probe(struct device *dev)
{
struct ap_device *ap_dev = to_ap_dev(dev);
struct ap_driver *ap_drv = to_ap_drv(dev->driver);
- int card, queue, devres, drvres, rc = -ENODEV;
+ int card, queue, devres, drvres, rc = -ENODEV, ovrd;
if (!get_device(dev))
return rc;
@@ -991,10 +990,11 @@ static int ap_device_probe(struct device *dev)
*/
card = AP_QID_CARD(to_ap_queue(dev)->qid);
queue = AP_QID_QUEUE(to_ap_queue(dev)->qid);
- if (ap_dev->driver_override) {
- if (strcmp(ap_dev->driver_override,
- ap_drv->driver.name))
- goto out;
+ ovrd = device_match_driver_override(dev, &ap_drv->driver);
+ if (ovrd > 0) {
+ /* override set and matches, nothing to do */
+ } else if (ovrd == 0) {
+ goto out;
} else {
mutex_lock(&ap_attr_mutex);
devres = test_bit_inv(card, ap_perms.apm) &&
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 51e08f27bd75..04ea256ecf91 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -166,7 +166,6 @@ void ap_driver_unregister(struct ap_driver *);
struct ap_device {
struct device device;
int device_type; /* AP device type. */
- const char *driver_override;
};
#define to_ap_dev(x) container_of((x), struct ap_device, device)
diff --git a/drivers/s390/crypto/ap_queue.c b/drivers/s390/crypto/ap_queue.c
index 3fe2e41c5c6b..ca9819e6f7e7 100644
--- a/drivers/s390/crypto/ap_queue.c
+++ b/drivers/s390/crypto/ap_queue.c
@@ -734,26 +734,14 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
- struct ap_queue *aq = to_ap_queue(dev);
- struct ap_device *ap_dev = &aq->ap_dev;
- int rc;
-
- device_lock(dev);
- if (ap_dev->driver_override)
- rc = sysfs_emit(buf, "%s\n", ap_dev->driver_override);
- else
- rc = sysfs_emit(buf, "\n");
- device_unlock(dev);
-
- return rc;
+ guard(spinlock)(&dev->driver_override.lock);
+ return sysfs_emit(buf, "%s\n", dev->driver_override.name ?: "");
}
static ssize_t driver_override_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
- struct ap_queue *aq = to_ap_queue(dev);
- struct ap_device *ap_dev = &aq->ap_dev;
int rc = -EINVAL;
bool old_value;
@@ -764,13 +752,13 @@ static ssize_t driver_override_store(struct device *dev,
if (ap_apmask_aqmask_in_use)
goto out;
- old_value = ap_dev->driver_override ? true : false;
- rc = driver_set_override(dev, &ap_dev->driver_override, buf, count);
+ old_value = device_has_driver_override(dev);
+ rc = __device_set_driver_override(dev, buf, count);
if (rc)
goto out;
- if (old_value && !ap_dev->driver_override)
+ if (old_value && !device_has_driver_override(dev))
--ap_driver_override_ctr;
- else if (!old_value && ap_dev->driver_override)
+ else if (!old_value && device_has_driver_override(dev))
++ap_driver_override_ctr;
rc = count;
--
2.53.0
^ permalink raw reply related
* [PATCH 11/12] spi: use generic driver_override infrastructure
From: Danilo Krummrich @ 2026-03-24 0:59 UTC (permalink / raw)
To: Russell King, Greg Kroah-Hartman, Rafael J. Wysocki,
Ioana Ciornei, Nipun Gupta, Nikhil Agarwal, K. Y. Srinivasan,
Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Bjorn Helgaas,
Armin Wolf, Bjorn Andersson, Mathieu Poirier, Vineeth Vijayan,
Peter Oberparleiter, Heiko Carstens, Vasily Gorbik,
Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
Harald Freudenberger, Holger Dengler, Mark Brown,
Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Alex Williamson, Juergen Gross, Stefano Stabellini,
Oleksandr Tyshchenko, Christophe Leroy (CS GROUP)
Cc: linux-kernel, driver-core, linuxppc-dev, linux-hyperv, linux-pci,
platform-driver-x86, linux-arm-msm, linux-remoteproc, linux-s390,
linux-spi, virtualization, kvm, xen-devel, linux-arm-kernel,
Danilo Krummrich, Gui-Dong Han
In-Reply-To: <20260324005919.2408620-1-dakr@kernel.org>
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Also note that we do not enable the driver_override feature of struct
bus_type, as SPI - in contrast to most other buses - passes "" to
sysfs_emit() when the driver_override pointer is NULL. Thus, printing
"\n" instead of "(null)\n".
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 5039563e7c25 ("spi: Add driver_override SPI device attribute")
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
---
drivers/spi/spi.c | 19 +++++++------------
include/linux/spi/spi.h | 5 -----
2 files changed, 7 insertions(+), 17 deletions(-)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 53dee314d76a..4101c2803eb3 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -50,7 +50,6 @@ static void spidev_release(struct device *dev)
struct spi_device *spi = to_spi_device(dev);
spi_controller_put(spi->controller);
- kfree(spi->driver_override);
free_percpu(spi->pcpu_statistics);
kfree(spi);
}
@@ -73,10 +72,9 @@ static ssize_t driver_override_store(struct device *dev,
struct device_attribute *a,
const char *buf, size_t count)
{
- struct spi_device *spi = to_spi_device(dev);
int ret;
- ret = driver_set_override(dev, &spi->driver_override, buf, count);
+ ret = __device_set_driver_override(dev, buf, count);
if (ret)
return ret;
@@ -86,13 +84,8 @@ static ssize_t driver_override_store(struct device *dev,
static ssize_t driver_override_show(struct device *dev,
struct device_attribute *a, char *buf)
{
- const struct spi_device *spi = to_spi_device(dev);
- ssize_t len;
-
- device_lock(dev);
- len = sysfs_emit(buf, "%s\n", spi->driver_override ? : "");
- device_unlock(dev);
- return len;
+ guard(spinlock)(&dev->driver_override.lock);
+ return sysfs_emit(buf, "%s\n", dev->driver_override.name ?: "");
}
static DEVICE_ATTR_RW(driver_override);
@@ -376,10 +369,12 @@ static int spi_match_device(struct device *dev, const struct device_driver *drv)
{
const struct spi_device *spi = to_spi_device(dev);
const struct spi_driver *sdrv = to_spi_driver(drv);
+ int ret;
/* Check override first, and if set, only use the named driver */
- if (spi->driver_override)
- return strcmp(spi->driver_override, drv->name) == 0;
+ ret = device_match_driver_override(dev, drv);
+ if (ret >= 0)
+ return ret;
/* Attempt an OF style match */
if (of_driver_match_device(dev, drv))
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index af7cfee7b8f6..0dc671c07d3a 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -159,10 +159,6 @@ extern void spi_transfer_cs_change_delay_exec(struct spi_message *msg,
* @modalias: Name of the driver to use with this device, or an alias
* for that name. This appears in the sysfs "modalias" attribute
* for driver coldplugging, and in uevents used for hotplugging
- * @driver_override: If the name of a driver is written to this attribute, then
- * the device will bind to the named driver and only the named driver.
- * Do not set directly, because core frees it; use driver_set_override() to
- * set or clear it.
* @pcpu_statistics: statistics for the spi_device
* @word_delay: delay to be inserted between consecutive
* words of a transfer
@@ -224,7 +220,6 @@ struct spi_device {
void *controller_state;
void *controller_data;
char modalias[SPI_NAME_SIZE];
- const char *driver_override;
/* The statistics */
struct spi_statistics __percpu *pcpu_statistics;
--
2.53.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox