* Re: [Intel-wired-lan] [PATCH iwl-next v1] ixgbe: Implement PCI reset handler
From: Paul Menzel @ 2026-06-17 9:03 UTC (permalink / raw)
To: Sergey Temerkhanov
Cc: intel-wired-lan, netdev, Aleksandr Loktionov, Bjorn Helgaas,
linux-pci
In-Reply-To: <20260617084329.199110-1-sergey.temerkhanov@intel.com>
[Cc: +Aleksandr (as in Reviewed-by:), +PCI subsystem]
Dear Sergey,
Thank you for your patch.
On 17.06.26 at 10:43, Sergey Temerkhanov wrote:
> Implement PCI device reset handler to allow the network device to
> get re-initialized and function after a PCI-level reset.
Please describe the problem in more detail. When does PCI-level reset
occur, and what is the current problematic situation?
Also, what is ixgbe specific compared to a general PCIe implementation?
Please share details on how to test it, and on how you tested it.
> Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 +
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 72 +++++++++++++++++++
> 2 files changed, 73 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 594ccb28da20..c4b0c5bb89c6 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -912,6 +912,7 @@ enum ixgbe_state_t {
> __IXGBE_PTP_TX_IN_PROGRESS,
> __IXGBE_RESET_REQUESTED,
> __IXGBE_PHY_INIT_COMPLETE,
> + __IXGBE_PCIE_RESET_IN_PROGRESS,
> };
>
> struct ixgbe_cb {
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2ac274c73d61..a61ee5fff7be 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -12352,6 +12352,76 @@ static pci_ers_result_t ixgbe_io_slot_reset(struct pci_dev *pdev)
> return result;
> }
>
> +#define IXGBE_PCIE_RESET_RETRIES 1000
Why 1000? Isn’t there a generic PCIe macro? Please extend the commit
message.
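For context on the retry count questioned above: the quoted loop polls the flag up to 1000 times and sleeps 1-2 ms between polls, so the worst-case wait is roughly one to two seconds. The following userspace sketch mirrors the loop shape only; `test_and_set()` and `acquire_with_retries()` are hypothetical stand-ins built on C11 atomics, not kernel APIs, and the sleep is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RESET_RETRIES 1000u /* mirrors IXGBE_PCIE_RESET_RETRIES in the patch */

/* Hypothetical stand-in for test_and_set_bit(): returns the previous value. */
static bool test_and_set(atomic_bool *flag)
{
	return atomic_exchange(flag, true);
}

/*
 * Same shape as `while (test_and_set_bit(...) && --timeout)`, minus the sleep.
 * Returns the remaining retry budget; 0 means the flag was never released.
 * *attempts reports how many times the flag was polled.
 */
static unsigned int acquire_with_retries(atomic_bool *flag,
					 unsigned int *attempts)
{
	unsigned int timeout = RESET_RETRIES;

	*attempts = 0;
	for (;;) {
		(*attempts)++;
		if (!test_and_set(flag))
			break;		/* flag was clear: acquired */
		if (!--timeout)
			break;		/* budget exhausted */
		/* the driver sleeps 1-2 ms here */
	}

	return timeout;
}
```

With a flag that is never released, this polls 1000 times and would sleep 999 times before giving up.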
> +
> +/**
> + * ixgbe_reset_prep - called before the pci bus is reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Prepare the card for a reset, preventing the service task from running.
> + */
> +static void ixgbe_reset_prep(struct pci_dev *pdev)
> +{
> + struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
> + unsigned int timeout = IXGBE_PCIE_RESET_RETRIES;
> +
> + if (!adapter)
> + return;
> +
> + /* Prevent the service task from being requeued in the timer callback
> + * while we're resetting.
> + */
> + if (test_bit(__IXGBE_SERVICE_INITED, &adapter->state)) {
> + timer_delete_sync(&adapter->service_timer);
> + /* Prevent the service task from running while we're resetting. */
One of the two comments seems redundant.
> + cancel_work_sync(&adapter->service_task);
> + }
> +
> + pci_clear_master(pdev);
> +
> + while (test_and_set_bit(__IXGBE_RESETTING, &adapter->state) && --timeout)
> + usleep_range(1000, 2000);
> +
> + if (!timeout) {
> + e_err(drv, "Timed out waiting for __IXGBE_RESETTING to be released. Reset is needed\n");
> + pci_set_master(pdev);
> + return;
> + }
> +
> + set_bit(__IXGBE_PCIE_RESET_IN_PROGRESS, &adapter->state);
> + smp_mb__after_atomic();
> +}
> +
> +/**
> + * ixgbe_reset_done - called after the pci bus has been reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Allow the service task to run and schedule re-initialization.
> + */
> +static void ixgbe_reset_done(struct pci_dev *pdev)
> +{
> + struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
> +
> + smp_mb__before_atomic();
> + if (!test_and_clear_bit(__IXGBE_PCIE_RESET_IN_PROGRESS, &adapter->state)) {
> + e_err(drv, "Reset done called without PCIe reset in progress\n");
How can this happen? What should the user reading this error do?
> + return;
> + }
> +
> + /* Allow the service task to run */
> + if (!test_bit(__IXGBE_REMOVING, &adapter->state)) {
> + clear_bit(__IXGBE_RESETTING, &adapter->state);
> + smp_mb__after_atomic();
> + }
> +
> + /* Schedule re-initialization */
> + if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
> + set_bit(__IXGBE_RESET_REQUESTED, &adapter->state);
> + if (test_bit(__IXGBE_SERVICE_INITED, &adapter->state))
> + mod_timer(&adapter->service_timer, jiffies + 1);
> + }
> +}
> +
> /**
> * ixgbe_io_resume - called when traffic can start flowing again.
> * @pdev: Pointer to PCI device
> @@ -12384,6 +12454,8 @@ static const struct pci_error_handlers ixgbe_err_handler = {
> .error_detected = ixgbe_io_error_detected,
> .slot_reset = ixgbe_io_slot_reset,
> .resume = ixgbe_io_resume,
> + .reset_prepare = ixgbe_reset_prep,
> + .reset_done = ixgbe_reset_done,
> };
>
> static DEFINE_SIMPLE_DEV_PM_OPS(ixgbe_pm_ops, ixgbe_suspend, ixgbe_resume);
Kind regards,
Paul
* Re: [PATCH net v3] rtase: Workaround for TX hang caused by short UDP packets entering hardware PTP parsing
From: Simon Horman @ 2026-06-17 8:59 UTC (permalink / raw)
To: justinlai0215
Cc: Simon Horman, kuba, davem, edumazet, pabeni, andrew+netdev,
linux-kernel, netdev, stable, richardcochran, david.laight.linux,
aleksander.lobakin, pkshih, larry.chiu
In-Reply-To: <20260615131653.15730-1-justinlai0215@realtek.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
rtase: Workaround for TX hang caused by short UDP packets entering hardware PTP parsing
The patch adds a workaround for a hardware bug in rtase where short UDP
packets triggering PTP parsing can cause a TX hang due to out-of-bounds reads.
It pads affected packets so the transport data reaches a minimum length.
Cc: stable@vger.kernel.org
> diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
> index 55105d34bc797..4c295a39c7a0c 100644
> --- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
> +++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
[ ... ]
> @@ -1249,6 +1250,81 @@ static u32 rtase_tx_csum(struct sk_buff *skb, const struct net_device *dev)
> return csum_cmd;
> }
>
> +static bool rtase_get_udp_offset(struct sk_buff *skb, u32 *udp_offset)
> +{
> + int no = skb_network_offset(skb);
> + struct ipv6hdr *i6h, _i6h;
> + struct iphdr *ih, _ih;
> +
> + switch (vlan_get_protocol(skb)) {
> + case htons(ETH_P_IP):
[Severity: High]
Does relying on vlan_get_protocol(skb) allow an unprivileged user to bypass
this workaround?
If an attacker uses an AF_PACKET/SOCK_RAW socket to inject a raw Ethernet
frame containing an IPv4/UDP packet, they could set skb->protocol to an
arbitrary value like htons(ETH_P_ALL). The switch statement will fall
through to the default case and return false, causing rtase_skb_pad() to
skip the padding.
Since the hardware independently parses the raw frame based on the EtherType,
could it still process the unpadded packet and trigger the TX hang?
> + ih = skb_header_pointer(skb, no, sizeof(_ih), &_ih);
> + if (!ih)
> + return false;
[ ... ]
> +static bool rtase_skb_pad(struct sk_buff *skb)
> +{
> + __be16 *dest, _dest;
> + u32 trans_data_len;
> + u32 udp_offset;
> + u16 dest_port;
> + u32 pad_len;
> +
> + if (!rtase_get_udp_offset(skb, &udp_offset))
> + return true;
> +
> + trans_data_len = skb->len - udp_offset;
> + if (trans_data_len < offsetof(struct udphdr, len) ||
> + trans_data_len >= RTASE_MIN_PAD_LEN)
> + return true;
[Severity: High]
Does this logic inadvertently skip padding for underflowed lengths and tiny
payloads?
If the IP header is malformed such that skb->len < udp_offset, the unsigned
subtraction for trans_data_len will underflow to a large positive value.
This causes trans_data_len >= RTASE_MIN_PAD_LEN to be true, bypassing the
padding.
Additionally, if fewer than 4 bytes of transport data remain, meaning
trans_data_len < offsetof(struct udphdr, len), it explicitly returns true.
Since the hardware blindly fetches the UDP destination port at
udp_offset + 2 via DMA, won't transmitting these short or malformed packets
without padding still expose the hardware to out-of-bounds reads?
> +
> + dest = skb_header_pointer(skb,
> + udp_offset + offsetof(struct udphdr, dest),
> + sizeof(_dest), &_dest);
[ ... ]
> @@ -1362,6 +1438,9 @@ static netdev_tx_t rtase_start_xmit(struct sk_buff *skb,
> opts2 |= rtase_tx_csum(skb, dev);
> }
>
> + if (!rtase_skb_pad(skb))
> + goto err_dma_0;
> +
> frags = rtase_xmit_frags(ring, skb, opts1, opts2);
> if (unlikely(frags < 0))
> goto err_dma_0;
[Severity: High]
This is a pre-existing issue, but does the err_dma_1 error path in
rtase_start_xmit() contain a double free of the skb?
When transmitting a fragmented packet, rtase_xmit_frags() maps the
fragments and assigns ring->skbuff[entry] = skb for the last fragment
descriptor. Back in rtase_start_xmit(), if dma_map_single() fails for the
linear part, it jumps to err_dma_1:
err_dma_1:
ring->skbuff[entry] = NULL;
rtase_tx_clear_range(ring, ring->cur_idx + 1, frags);
err_dma_0:
tp->stats.tx_dropped++;
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
rtase_tx_clear_range() iterates over the mapped fragments and calls
dev_kfree_skb_any() when it finds the skb pointer. Execution then falls
through to err_dma_0, which unconditionally calls dev_kfree_skb_any(skb) a
second time on the same skb pointer.
* Re: [PATCH net] net: airoha: Fix TX scheduler queue mask loop upper bound
From: Wayen Yan @ 2026-06-17 8:55 UTC (permalink / raw)
To: lorenzo; +Cc: netdev, nbd, linux-arm-kernel, linux-mediatek
In-Reply-To: <178166704952.2212140.11002626760717132754@gmail.com>
On Tue, Jun 17, 2026, Lorenzo Bianconi wrote:
> Even if the current codebase supports just AIROHA_NUM_QOS_CHANNEL (4), the hw
> exposes 32 hw QoS channels (AIROHA_NUM_TX_RING). Here we are just clearing the
> configuration, so I guess the current implementation is correct.
Hi Lorenzo,
You are right that there is no functional impact, and I agree this
should not go to net. Let me explain the register layout I was worried
about, and you can decide whether it is worth a net-next cleanup or
should just be dropped.
The two macros are:
REG_QUEUE_CLOSE_CFG(_n) = 0x00a0 + ((_n) & 0xfc)
TXQ_DISABLE_CHAN_QUEUE_MASK(_n, _m) = BIT((_m) + (((_n) & 0x3) << 3))
REG_QUEUE_CLOSE_CFG() masks the channel with 0xfc, and the bit macro
folds the channel with & 0x3 (mod 4) shifted by 3. So one 32-bit
register holds 4 channels x 8 queues, 8 queue bits per channel:
channel 0 -> reg 0x00a0, bits 0..7
channel 1 -> reg 0x00a0, bits 8..15
channel 2 -> reg 0x00a0, bits 16..23
channel 3 -> reg 0x00a0, bits 24..31
channel 4 -> reg 0x00a4, bits 0..7
...
In airoha_qdma_set_chan_tx_sched() the loop variable 'i' is passed as
the *queue* argument _m, not as a channel:
for (i = 0; i < AIROHA_NUM_TX_RING; i++) // i = 0..31
airoha_qdma_clear(qdma, REG_QUEUE_CLOSE_CFG(channel),
TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i));
Since each channel only has AIROHA_NUM_QOS_QUEUES (8) queues, the correct
logic is to clear the 8 queue bits belonging to 'channel'. With i running
up to 31 the BIT() shift instead walks past those 8 bits and into the bit
ranges of the other channels folded into the same register. For channel 0
the accumulated mask becomes 0xffffffff, i.e. it touches channels 1..3 as
well.
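The arithmetic above can be checked with a small userspace sketch. The two macros are transcribed from this mail; `accumulated_mask()` is a hypothetical helper, and BIT() is modelled as a 64-bit shift truncated to the 32-bit register width, which is an assumption about a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

/* Register offset and queue bit, transcribed from the two macros above. */
#define REG_QUEUE_CLOSE_CFG(n)            (0x00a0u + ((n) & 0xfcu))
#define TXQ_DISABLE_CHAN_QUEUE_MASK(n, m) (1ULL << ((m) + (((n) & 0x3u) << 3)))

/*
 * Hypothetical helper: OR together every bit the loop would clear for one
 * channel when 'i' runs from 0 to bound - 1, truncated to 32 bits.
 */
static uint32_t accumulated_mask(unsigned int channel, unsigned int bound)
{
	uint64_t mask = 0;
	unsigned int i;

	assert(bound <= 32);
	for (i = 0; i < bound; i++)
		mask |= TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i);

	return (uint32_t)mask;
}
```

With a bound of 8, channel 1 yields 0x0000ff00, i.e. only its own queue bits. With a bound of 32, channel 0 yields 0xffffffff, which also covers channels 1..3.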
This is harmless today only because REG_QUEUE_CLOSE_CFG is written
exclusively here, via airoha_qdma_clear() (RMW clear), and the register
resets to 0 and is never set anywhere -- so clearing extra bits is a
no-op. Functionally the current code is fine, as you say.
The point is just the loop-bound semantics: 'i' is a per-channel queue
index, so the bound should be AIROHA_NUM_QOS_QUEUES (8), not
AIROHA_NUM_TX_RING (32). The two happen to be related (32 == 4 channels *
8 queues) but mean different things.
Since there is no functional change, feel free to drop this if you would
rather not carry a cosmetic patch. If you think the clarity is worth it I
can resend against net-next without the Fixes tag.
Thanks,
Wayen
* Re: [PATCH net v3] rtase: Workaround for TX hang caused by short UDP packets entering hardware PTP parsing
From: Simon Horman @ 2026-06-17 8:54 UTC (permalink / raw)
To: justinlai0215
Cc: Simon Horman, kuba, davem, edumazet, pabeni, andrew+netdev,
linux-kernel, netdev, stable, richardcochran, david.laight.linux,
aleksander.lobakin, pkshih, larry.chiu
In-Reply-To: <20260615131653.15730-1-justinlai0215@realtek.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
rtase: Workaround for TX hang caused by short UDP packets entering hardware PTP parsing
This patch adds a workaround for a hardware TX hang that occurs when
short UDP packets matching destination ports 319/320 enter a PTP parsing
path that reads beyond the packet end. The workaround pads affected
packets so the transport data reaches RTASE_MIN_PAD_LEN before
transmission.
> diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
> index 55105d34bc79..4c295a39c7a0 100644
> --- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
> +++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
[ ... ]
> +static bool rtase_get_udp_offset(struct sk_buff *skb, u32 *udp_offset)
> +{
> + int no = skb_network_offset(skb);
> + struct ipv6hdr *i6h, _i6h;
> + struct iphdr *ih, _ih;
> +
> + switch (vlan_get_protocol(skb)) {
> + case htons(ETH_P_IP):
> + ih = skb_header_pointer(skb, no, sizeof(_ih), &_ih);
> + if (!ih)
> + return false;
> +
> + if (ih->ihl < 5)
> + return false;
> +
> + if (ih->protocol != IPPROTO_UDP)
> + return false;
> +
> + *udp_offset = no + ih->ihl * 4;
[Low]
Should ih->ihl also be checked against the actual data present in the
skb? With ihl up to 15 (60 bytes), if the skb only contains the basic
20-byte IP header, the computed udp_offset can exceed skb->len.
Downstream in rtase_skb_pad():
trans_data_len = skb->len - udp_offset;
trans_data_len is u32, so when udp_offset > skb->len this underflows to
a very large value, which trivially satisfies:
trans_data_len >= RTASE_MIN_PAD_LEN
and rtase_skb_pad() silently skips the padding. A malformed packet from
an AF_PACKET sender with CAP_NET_RAW could craft a mismatched ihl and
bypass the workaround. Would clamping or validating udp_offset against
skb->len before the subtraction be appropriate?
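A minimal userspace sketch of the wrap-around described above, next to one possible checked form. Both helper names are hypothetical, and what the driver should do when the check fails (pad, drop, or pass through) is left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unchecked form, as in the quoted hunk: wraps when udp_offset > skb_len. */
static uint32_t trans_len_unchecked(uint32_t skb_len, uint32_t udp_offset)
{
	return skb_len - udp_offset;
}

/* Checked form: report failure instead of letting the subtraction wrap. */
static bool trans_len_checked(uint32_t skb_len, uint32_t udp_offset,
			      uint32_t *trans_len)
{
	if (udp_offset > skb_len)
		return false;

	*trans_len = skb_len - udp_offset;
	return true;
}
```

For example, a 34-byte frame (14-byte Ethernet header plus a 20-byte IPv4 header) that claims ihl = 15 gives udp_offset = 14 + 60 = 74, and the unchecked subtraction yields 0xffffffd8 rather than a negative length.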
> +
> + return true;
> + case htons(ETH_P_IPV6):
> + i6h = skb_header_pointer(skb, no, sizeof(_i6h), &_i6h);
> + if (!i6h)
> + return false;
> +
> + if (i6h->nexthdr != IPPROTO_UDP)
> + return false;
> +
> + *udp_offset = no + sizeof(*i6h);
> +
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +static bool rtase_skb_pad(struct sk_buff *skb)
> +{
> + __be16 *dest, _dest;
> + u32 trans_data_len;
> + u32 udp_offset;
> + u16 dest_port;
> + u32 pad_len;
> +
> + if (!rtase_get_udp_offset(skb, &udp_offset))
> + return true;
> +
> + trans_data_len = skb->len - udp_offset;
> + if (trans_data_len < offsetof(struct udphdr, len) ||
> + trans_data_len >= RTASE_MIN_PAD_LEN)
> + return true;
[Low]
When trans_data_len is less than offsetof(struct udphdr, len) (i.e., the
transport region is under 4 bytes), this returns true and skips the
padding entirely.
The commit message states the hardware "may access data beyond the end
of the packet" during PTP parsing. If the hardware also reads out of
bounds while classifying the packet, an IPv4 last fragment with 1-3
trailing bytes could still be misclassified as PTP based on whatever
garbage memory the hardware happens to read, and the workaround would
be bypassed.
The implicit assumption here seems to be "if the driver cannot read the
dest port from the skb, the hardware cannot classify it either", which
appears to contradict the premise that the hardware reads beyond the
packet boundary. Should these very short transport-data cases also be
padded to be safe?
> +
> + dest = skb_header_pointer(skb,
> + udp_offset + offsetof(struct udphdr, dest),
> + sizeof(_dest), &_dest);
[ ... ]
* RE: [PATCH net v2] tipc: fix use-after-free of the discoverer in tipc_disc_rcv()
From: Tung Quang Nguyen @ 2026-06-17 8:47 UTC (permalink / raw)
To: Weiming Shi
Cc: jmaloy@redhat.com, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, davem@davemloft.net,
xmei5@asu.edu, netdev@vger.kernel.org,
tipc-discussion@lists.sourceforge.net,
linux-kernel@vger.kernel.org
In-Reply-To: <20260616122246.3136462-2-bestswngs@gmail.com>
>Subject: [PATCH net v2] tipc: fix use-after-free of the discoverer in
>tipc_disc_rcv()
>
>bearer_disable() frees b->disc with tipc_disc_delete()'s plain kfree(), but
>tipc_disc_rcv() still dereferences b->disc in RX softirq under
>rcu_read_lock() (tipc_udp_recv -> tipc_rcv -> tipc_disc_rcv).
>
>L2 bearers are safe thanks to the synchronize_net() in tipc_disable_l2_media(),
>but the UDP bearer defers that call to the
>cleanup_bearer() workqueue, so the discoverer is freed with no grace
>period:
>
> BUG: KASAN: slab-use-after-free in tipc_disc_rcv (net/tipc/discover.c:149)
> Read of size 8 at addr ffff88802348b728 by task poc_tipc/184
> <IRQ>
> tipc_disc_rcv (net/tipc/discover.c:149)
> tipc_rcv (net/tipc/node.c:2126)
> tipc_udp_recv (net/tipc/udp_media.c:391)
> udp_rcv (net/ipv4/udp.c:2643)
> ip_local_deliver_finish (net/ipv4/ip_input.c:241)
> </IRQ>
> Freed by task 181:
> kfree (mm/slub.c:6565)
> bearer_disable (net/tipc/bearer.c:418)
> tipc_nl_bearer_disable (net/tipc/bearer.c:1001)
>
>The bearer is freed with kfree_rcu(); free the discoverer the same way.
>Add an rcu_head to struct tipc_discoverer and free it and its skb from an RCU
>callback.
>
>Because the RCU callback (tipc_disc_free_rcu) lives in module text, a
>call_rcu() that is still pending when the tipc module is unloaded would invoke a
>freed function. Add an rcu_barrier() to tipc_exit() after the bearer subsystem
>has been torn down, so all pending discoverer callbacks have run before the
>module text goes away.
>
>Reachable from an unprivileged user namespace: the TIPCv2 genl family is
>netnsok and its bearer commands have no GENL_ADMIN_PERM. Needs
>CONFIG_TIPC and CONFIG_TIPC_MEDIA_UDP.
>
>Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values")
>Reported-by: Xiang Mei <xmei5@asu.edu>
>Assisted-by: Claude:claude-opus-4-8
>Signed-off-by: Weiming Shi <bestswngs@gmail.com>
>---
>v2:
> - split the over-80-column container_of() line (Tung Quang Nguyen)
> - add rcu_barrier() to tipc_exit() so a pending call_rcu() cannot fire
> into freed module text after rmmod (Eric Dumazet)
>
> net/tipc/core.c | 3 +++
> net/tipc/discover.c | 14 ++++++++++++--
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
>diff --git a/net/tipc/core.c b/net/tipc/core.c
>index 434e70eabe08..747328e58d30 100644
>--- a/net/tipc/core.c
>+++ b/net/tipc/core.c
>@@ -218,6 +218,9 @@ static void __exit tipc_exit(void)
> unregister_pernet_device(&tipc_net_ops);
> tipc_unregister_sysctl();
>
>+ /* Wait for tipc_disc_free_rcu() callbacks queued from module text. */
Please change above comment to: /* TODO: Wait for all timers that called call_rcu() to finish before calling rcu_barrier() */
Note that call_rcu() is used in both discover.c and node.c, so the TODO comment will help us add more checking code later in a separate patch.
>+ rcu_barrier();
>+
> pr_info("Deactivated\n");
> }
>
>diff --git a/net/tipc/discover.c b/net/tipc/discover.c
>index 3e54d2df5683..696b7a8ed54d 100644
>--- a/net/tipc/discover.c
>+++ b/net/tipc/discover.c
>@@ -58,6 +58,7 @@
> * @skb: request message to be (repeatedly) sent
> * @timer: timer governing period between requests
> * @timer_intv: current interval between requests (in ms)
>+ * @rcu: RCU head for deferred freeing
> */
> struct tipc_discoverer {
> u32 bearer_id;
>@@ -69,6 +70,7 @@ struct tipc_discoverer {
> struct sk_buff *skb;
> struct timer_list timer;
> unsigned long timer_intv;
>+ struct rcu_head rcu;
> };
>
> /**
>@@ -382,6 +384,15 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
> return 0;
> }
>
>+static void tipc_disc_free_rcu(struct rcu_head *rp)
>+{
>+ struct tipc_discoverer *d =
>+ container_of(rp, struct tipc_discoverer, rcu);
>+
>+ kfree_skb(d->skb);
>+ kfree(d);
>+}
>+
> /**
> * tipc_disc_delete - destroy object sending periodic link setup requests
> * @d: ptr to link dest structure
>@@ -389,8 +400,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
> void tipc_disc_delete(struct tipc_discoverer *d)
> {
> timer_shutdown_sync(&d->timer);
>- kfree_skb(d->skb);
>- kfree(d);
>+ call_rcu(&d->rcu, tipc_disc_free_rcu);
> }
>
> /**
>--
>2.43.0
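The deferred-free pattern in the patch above can be illustrated in userspace. This is only a simulation under stated assumptions: `struct cb_head`, `defer_call()` and `run_deferred()` are hypothetical stand-ins for `struct rcu_head`, `call_rcu()` and the end of a grace period, and `freed_count` exists purely for observation. It shows how the callback recovers the enclosing object with container_of() and why the object stays valid until the deferred callbacks have run.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for struct rcu_head; the kernel's definition differs. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct cb_head *pending;	/* callbacks awaiting the "grace period" */
static int freed_count;		/* observation only */

/* Stand-in for call_rcu(): queue the callback instead of freeing now. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for the end of a grace period: run every queued callback. */
static void run_deferred(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

struct discoverer {
	int bearer_id;
	struct cb_head rcu;	/* embedded head, as in the patch */
};

static void discoverer_free_cb(struct cb_head *head)
{
	struct discoverer *d = container_of(head, struct discoverer, rcu);

	free(d);
	freed_count++;
}

static struct discoverer *discoverer_create(int bearer_id)
{
	struct discoverer *d = calloc(1, sizeof(*d));

	if (d)
		d->bearer_id = bearer_id;
	return d;
}

static void discoverer_delete(struct discoverer *d)
{
	defer_call(&d->rcu, discoverer_free_cb);
}
```

Between discoverer_delete() and run_deferred() the object is still allocated, which corresponds to the window in which a concurrent reader may still dereference it safely.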
* Re: [PATCH net] ice: eswitch: fix use-after-free of metadata_dst in repr release
From: Simon Horman @ 2026-06-17 8:47 UTC (permalink / raw)
To: Doruk Tan Ozturk
Cc: anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev, davem,
edumazet, kuba, pabeni, piotr.raczynski, michal.swiatkowski,
wojciech.drewek, intel-wired-lan, netdev, linux-kernel, stable
In-Reply-To: <20260615140532.52676-1-doruk@0sec.ai>
On Mon, Jun 15, 2026 at 04:05:32PM +0200, Doruk Tan Ozturk wrote:
> ice_eswitch_release_repr() frees the port representor metadata_dst via
> metadata_dst_free(), which directly kfree()s the object and ignores the
> dst_entry refcount. The eswitch slow-path TX routine
> ice_eswitch_port_start_xmit() takes a reference on this dst with
> dst_hold() and attaches it to the skb via skb_dst_set(). If such an skb
> is still in flight (e.g. queued in a qdisc) when the representor is torn
> down, the metadata_dst is freed while the skb still points at it. When
> the skb is later freed, dst_release() operates on already-freed memory.
>
> Replace metadata_dst_free() with dst_release() so the metadata_dst is
> freed only after the last reference is dropped. The dst subsystem frees
> metadata_dst objects from dst_destroy() once the refcount reaches zero
> (DST_METADATA is set by metadata_dst_alloc()).
>
> Same class of bug and fix as commit c32b26aaa2f9 ("netfilter:
> nft_tunnel: fix use-after-free on object destroy").
I think that the commit cited above moves the code in question around
but did not introduce the call to dst_release. And I think that this
bug goes back to when switchdev support was added.
I would suggest:
Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment")
> Cc: stable@vger.kernel.org
> Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Otherwise, this looks good to me.
Reviewed-by: Simon Horman <horms@kernel.org>
* [PATCH iwl-next v1] ixgbe: Implement PCI reset handler
From: Sergey Temerkhanov @ 2026-06-17 8:43 UTC (permalink / raw)
To: intel-wired-lan; +Cc: netdev
Implement PCI device reset handler to allow the network device to
get re-initialized and function after a PCI-level reset.
Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 72 +++++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 594ccb28da20..c4b0c5bb89c6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -912,6 +912,7 @@ enum ixgbe_state_t {
__IXGBE_PTP_TX_IN_PROGRESS,
__IXGBE_RESET_REQUESTED,
__IXGBE_PHY_INIT_COMPLETE,
+ __IXGBE_PCIE_RESET_IN_PROGRESS,
};
struct ixgbe_cb {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2ac274c73d61..a61ee5fff7be 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -12352,6 +12352,76 @@ static pci_ers_result_t ixgbe_io_slot_reset(struct pci_dev *pdev)
return result;
}
+#define IXGBE_PCIE_RESET_RETRIES 1000
+
+/**
+ * ixgbe_reset_prep - called before the pci bus is reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Prepare the card for a reset, preventing the service task from running.
+ */
+static void ixgbe_reset_prep(struct pci_dev *pdev)
+{
+ struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
+ unsigned int timeout = IXGBE_PCIE_RESET_RETRIES;
+
+ if (!adapter)
+ return;
+
+ /* Prevent the service task from being requeued in the timer callback
+ * while we're resetting.
+ */
+ if (test_bit(__IXGBE_SERVICE_INITED, &adapter->state)) {
+ timer_delete_sync(&adapter->service_timer);
+ /* Prevent the service task from running while we're resetting. */
+ cancel_work_sync(&adapter->service_task);
+ }
+
+ pci_clear_master(pdev);
+
+ while (test_and_set_bit(__IXGBE_RESETTING, &adapter->state) && --timeout)
+ usleep_range(1000, 2000);
+
+ if (!timeout) {
+ e_err(drv, "Timed out waiting for __IXGBE_RESETTING to be released. Reset is needed\n");
+ pci_set_master(pdev);
+ return;
+ }
+
+ set_bit(__IXGBE_PCIE_RESET_IN_PROGRESS, &adapter->state);
+ smp_mb__after_atomic();
+}
+
+/**
+ * ixgbe_reset_done - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Allow the service task to run and schedule re-initialization.
+ */
+static void ixgbe_reset_done(struct pci_dev *pdev)
+{
+ struct ixgbe_adapter *adapter = pci_get_drvdata(pdev);
+
+ smp_mb__before_atomic();
+ if (!test_and_clear_bit(__IXGBE_PCIE_RESET_IN_PROGRESS, &adapter->state)) {
+ e_err(drv, "Reset done called without PCIe reset in progress\n");
+ return;
+ }
+
+ /* Allow the service task to run */
+ if (!test_bit(__IXGBE_REMOVING, &adapter->state)) {
+ clear_bit(__IXGBE_RESETTING, &adapter->state);
+ smp_mb__after_atomic();
+ }
+
+ /* Schedule re-initialization */
+ if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
+ set_bit(__IXGBE_RESET_REQUESTED, &adapter->state);
+ if (test_bit(__IXGBE_SERVICE_INITED, &adapter->state))
+ mod_timer(&adapter->service_timer, jiffies + 1);
+ }
+}
+
/**
* ixgbe_io_resume - called when traffic can start flowing again.
* @pdev: Pointer to PCI device
@@ -12384,6 +12454,8 @@ static const struct pci_error_handlers ixgbe_err_handler = {
.error_detected = ixgbe_io_error_detected,
.slot_reset = ixgbe_io_slot_reset,
.resume = ixgbe_io_resume,
+ .reset_prepare = ixgbe_reset_prep,
+ .reset_done = ixgbe_reset_done,
};
static DEFINE_SIMPLE_DEV_PM_OPS(ixgbe_pm_ops, ixgbe_suspend, ixgbe_resume);
base-commit: c50bfa9768ff3a5163746c6362a8a910a0b4dca0
--
2.53.0
* Re: [GIT PULL] Networking for 7.2
From: pr-tracker-bot @ 2026-06-17 8:41 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: torvalds, kuba, davem, netdev, linux-kernel, pabeni
In-Reply-To: <20260617000705.931602-1-kuba@kernel.org>
The pull request you sent on Tue, 16 Jun 2026 17:07:05 -0700:
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git tags/net-next-7.2
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/b85966adbf5de0668a815c6e3527f87e0c387fb4
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
* Re: [PATCH v18 net-next 01/11] net/nebula-matrix: add minimum nbl build framework
From: Uwe Kleine-König @ 2026-06-17 8:40 UTC (permalink / raw)
To: illusion.wang
Cc: dimon.zhao, alvin.wang, sam.chen, netdev, andrew+netdev, corbet,
kuba, horms, linux-doc, pabeni, vadim.fedorenko, lukas.bulwahn,
edumazet, enelsonmoore, skhan, hkallweit1, open list
In-Reply-To: <20260611044916.2383-2-illusion.wang@nebula-matrix.com>
On Thu, Jun 11, 2026 at 12:49:00PM +0800, illusion.wang wrote:
> +static int nbl_probe(struct pci_dev *pdev,
> + const struct pci_device_id *id)
> +{
> + return 0;
> +}
> +
> +static void nbl_remove(struct pci_dev *pdev)
> +{
> +}
> [...]
> +static const struct pci_device_id nbl_id_table[] = {
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110_LX),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110_BASE_T),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110_LX_BASE_T),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110_LX_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110_BASE_T_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18110_LX_BASE_T_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000_LX),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000_BASE_T),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000_LX_BASE_T),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000_LX_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000_BASE_T_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + { PCI_DEVICE(NBL_VENDOR_ID, NBL_DEVICE_ID_M18000_LX_BASE_T_OCP),
> + .driver_data = BIT(NBL_CAP_HAS_NET_BIT) | BIT(NBL_CAP_IS_NIC_BIT) |
> + BIT(NBL_CAP_IS_LEONIS_BIT) },
> + /* required as sentinel */
> + {
> + 0,
Please drop this zero. The most usual style is `{ }`.
> + }
> +};
> +MODULE_DEVICE_TABLE(pci, nbl_id_table);
> +
> +static struct pci_driver nbl_driver = {
> + .name = NBL_DRIVER_NAME,
> + .id_table = nbl_id_table,
> + .probe = nbl_probe,
> + .remove = nbl_remove,
> +};
The pci bus probe function has (pci_device_probe() ->
__pci_device_probe()):
int error = 0;
if (drv->probe) {
...
}
return error;
So given that the probe function does nothing apart from returning zero,
you can just drop .probe(). (There is an additional check against
.id_table, but I'm pretty sure that isn't relevant because
pci_bus_match() already makes sure that there is a match.) The same is
true for .remove().
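The point about the optional callback can be sketched in userspace. The types and names below are hypothetical and only mirror the shape of the excerpt quoted above; this is not the PCI core's code.

```c
#include <stddef.h>

struct device;	/* opaque in this sketch */

struct driver {
	int (*probe)(struct device *dev);	/* may be NULL */
};

/* Mirrors the quoted excerpt: a missing probe callback counts as success. */
static int bus_probe(const struct driver *drv, struct device *dev)
{
	int error = 0;

	if (drv->probe)
		error = drv->probe(dev);

	return error;
}

/* A stub that only returns 0 is indistinguishable from no callback. */
static int stub_probe(struct device *dev)
{
	(void)dev;
	return 0;
}

/* Hypothetical failing callback, to show that errors still propagate. */
static int failing_probe(struct device *dev)
{
	(void)dev;
	return -19;
}
```

Under these assumptions, a driver with `.probe = NULL` and one with `.probe = stub_probe` give the same result, which is why the stub adds nothing.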
Best regards
Uwe
* [PATCH net] net: rnpgbe: fix mailbox endianness handling
From: Dong Yibo @ 2026-06-17 8:35 UTC (permalink / raw)
To: andrew+netdev, davem, edumazet, kuba, pabeni, vadim.fedorenko
Cc: netdev, linux-kernel, dong100, yaojun
Mailbox data is exchanged through 32-bit MMIO accesses but the
mailbox payload is defined using little-endian FW structures with
__le16 and __le32 fields.
The mailbox read/write helpers previously operated on raw u32
buffers without performing endian conversion. On big-endian
systems this causes mailbox payload fields to be byte-swapped in
memory, resulting in corrupted FW command and reply structures.
Convert mailbox data between CPU-endian MMIO values and the
little-endian mailbox wire format using cpu_to_le32() on reads and
le32_to_cpu() on writes.
Also switch the helper interfaces to use void */const void * since
the mailbox transport layer operates on opaque payload buffers
rather than native-endian u32 arrays.
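As a rough illustration of the conversion described above, the following
userspace sketch models the read and write directions with portable
byte-wise helpers. Everything here is an assumption for illustration:
model_store_le32() and model_load_le32() stand in for cpu_to_le32() and
le32_to_cpu() combined with the buffer access, and the regs array stands
in for the MMIO window; none of these names exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a CPU-endian value as four little-endian bytes (any host). */
static void model_store_le32(uint8_t *dst, uint32_t cpu_val)
{
	dst[0] = (uint8_t)(cpu_val & 0xff);
	dst[1] = (uint8_t)((cpu_val >> 8) & 0xff);
	dst[2] = (uint8_t)((cpu_val >> 16) & 0xff);
	dst[3] = (uint8_t)((cpu_val >> 24) & 0xff);
}

/* Load four little-endian bytes into a CPU-endian value (any host). */
static uint32_t model_load_le32(const uint8_t *src)
{
	return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
	       ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}

/* Read path: CPU-endian register values become a little-endian payload. */
static void model_read_mbx(const uint32_t *regs, void *msg, size_t size)
{
	uint8_t *out = msg;
	size_t words = size / sizeof(uint32_t);

	assert(msg != NULL);
	for (size_t i = 0; i < words; i++)
		model_store_le32(out + 4 * i, regs[i]);
}

/* Write path: little-endian payload becomes CPU-endian register values. */
static void model_write_mbx(uint32_t *regs, const void *msg, size_t size)
{
	const uint8_t *in = msg;
	size_t words = size / sizeof(uint32_t);

	assert(msg != NULL);
	for (size_t i = 0; i < words; i++)
		regs[i] = model_load_le32(in + 4 * i);
}
```

With the payload kept as opaque bytes, a little-endian 16-bit field at
offset 0 always sits in the first two bytes of the buffer, independent
of host byte order.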
Fixes: 4543534c3ef5 ("net: rnpgbe: Add basic mbx ops support")
Signed-off-by: Dong Yibo <dong100@mucse.com>
---
drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c | 16 ++++++++++------
drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h | 5 +++--
.../net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c | 7 +++----
3 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c
index de5e29230b3c..0fccfc49ffc7 100644
--- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c
+++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.c
@@ -166,10 +166,12 @@ static void mucse_mbx_inc_pf_ack(struct mucse_hw *hw)
*
* Return: 0 on success, negative errno on failure
**/
-static int mucse_read_mbx_pf(struct mucse_hw *hw, u32 *msg, u16 size)
+static int mucse_read_mbx_pf(struct mucse_hw *hw, void *msg, u16 size)
{
const int size_in_words = size / sizeof(u32);
struct mucse_mbx_info *mbx = &hw->mbx;
+ int off = MUCSE_MBX_FWPF_SHM;
+ __le32 *msg_le32 = msg;
int err;
err = mucse_obtain_mbx_lock_pf(hw);
@@ -177,7 +179,7 @@ static int mucse_read_mbx_pf(struct mucse_hw *hw, u32 *msg, u16 size)
return err;
for (int i = 0; i < size_in_words; i++)
- msg[i] = mbx_data_rd32(mbx, MUCSE_MBX_FWPF_SHM + 4 * i);
+ msg_le32[i] = cpu_to_le32(mbx_data_rd32(mbx, off + 4 * i));
/* Hw needs write data_reg at last */
mbx_data_wr32(mbx, MUCSE_MBX_FWPF_SHM, 0);
/* flush reqs as we have read this request data */
@@ -236,7 +238,7 @@ static int mucse_poll_for_msg(struct mucse_hw *hw)
* Return: 0 if it successfully received a message notification and
* copied it into the receive buffer, negative errno on failure
**/
-int mucse_poll_and_read_mbx(struct mucse_hw *hw, u32 *msg, u16 size)
+int mucse_poll_and_read_mbx(struct mucse_hw *hw, void *msg, u16 size)
{
int err;
@@ -290,10 +292,11 @@ static void mucse_mbx_inc_pf_req(struct mucse_hw *hw)
* Return: 0 if it successfully copied message into the buffer,
* negative errno on failure
**/
-static int mucse_write_mbx_pf(struct mucse_hw *hw, u32 *msg, u16 size)
+static int mucse_write_mbx_pf(struct mucse_hw *hw, const void *msg, u16 size)
{
const int size_in_words = size / sizeof(u32);
struct mucse_mbx_info *mbx = &hw->mbx;
+ const __le32 *msg_le32 = msg;
int err;
err = mucse_obtain_mbx_lock_pf(hw);
@@ -301,7 +304,8 @@ static int mucse_write_mbx_pf(struct mucse_hw *hw, u32 *msg, u16 size)
return err;
for (int i = 0; i < size_in_words; i++)
- mbx_data_wr32(mbx, MUCSE_MBX_FWPF_SHM + i * 4, msg[i]);
+ mbx_data_wr32(mbx, MUCSE_MBX_FWPF_SHM + i * 4,
+ le32_to_cpu(msg_le32[i]));
/* flush acks as we are overwriting the message buffer */
hw->mbx.fw_ack = mucse_mbx_get_fwack(mbx);
@@ -360,7 +364,7 @@ static int mucse_poll_for_ack(struct mucse_hw *hw)
* Return: 0 if it successfully copied message into the buffer and
* received an ack to that message within delay * timeout_cnt period
**/
-int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw, u32 *msg, u16 size)
+int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw, const void *msg, u16 size)
{
int err;
diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h
index e6fcc8d1d3ca..25bfc97c24c0 100644
--- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h
+++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx.h
@@ -14,7 +14,8 @@
#define MUCSE_MBX_REQ BIT(0) /* Request a req to mailbox */
#define MUCSE_MBX_PFU BIT(3) /* PF owns the mailbox buffer */
-int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw, u32 *msg, u16 size);
+int mucse_write_and_wait_ack_mbx(struct mucse_hw *hw,
+ const void *msg, u16 size);
void mucse_init_mbx_params_pf(struct mucse_hw *hw);
-int mucse_poll_and_read_mbx(struct mucse_hw *hw, u32 *msg, u16 size);
+int mucse_poll_and_read_mbx(struct mucse_hw *hw, void *msg, u16 size);
#endif /* _RNPGBE_MBX_H */
diff --git a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c
index 8c8bd5e8e1db..2ac97915a098 100644
--- a/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c
+++ b/drivers/net/ethernet/mucse/rnpgbe/rnpgbe_mbx_fw.c
@@ -28,12 +28,11 @@ static int mucse_fw_send_cmd_wait_resp(struct mucse_hw *hw,
int err;
mutex_lock(&hw->mbx.lock);
- err = mucse_write_and_wait_ack_mbx(hw, (u32 *)req, len);
+ err = mucse_write_and_wait_ack_mbx(hw, req, len);
if (err)
goto out;
do {
- err = mucse_poll_and_read_mbx(hw, (u32 *)reply,
- sizeof(*reply));
+ err = mucse_poll_and_read_mbx(hw, reply, sizeof(*reply));
if (err)
goto out;
/* mucse_write_and_wait_ack_mbx return 0 means fw has
@@ -125,7 +124,7 @@ int mucse_mbx_powerup(struct mucse_hw *hw, bool is_powerup)
len = le16_to_cpu(req.datalen);
mutex_lock(&hw->mbx.lock);
- err = mucse_write_and_wait_ack_mbx(hw, (u32 *)&req, len);
+ err = mucse_write_and_wait_ack_mbx(hw, &req, len);
mutex_unlock(&hw->mbx.lock);
return err;
--
2.25.1
^ permalink raw reply related
* Re: [PATCH net] ipv6: ndisc: fix NULL deref in accept_untracked_na()
From: Jiayuan Chen @ 2026-06-17 8:32 UTC (permalink / raw)
To: Weiming Shi, David S . Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, netdev, linux-kernel, Xiang Mei
In-Reply-To: <20260617065512.2529757-2-bestswngs@gmail.com>
On 6/17/26 2:55 PM, Weiming Shi wrote:
> accept_untracked_na() re-fetches the inet6_dev with __in6_dev_get(dev)
> and dereferences idev->cnf.accept_untracked_na without a NULL check,
Does ipv6_rpl_srh_rcv() have the same problem?
^ permalink raw reply
* Re: [REGRESSION 6.16] r8169 RTL8168h/8111h fails to probe — "Unable to change power state from D3cold to D0" — bisected to 4d4c10f763d7
From: Thorsten Leemhuis @ 2026-06-17 8:32 UTC (permalink / raw)
To: Josh Perry, mario.limonciello, bhelgaas
Cc: hkallweit1, nic_swsd, rafael, linux-pci, netdev, regressions
In-Reply-To: <d4aaa5e8-7366-461c-94b1-ccf3631c8bf9@6bit.com>
On 6/12/26 03:07, Josh Perry wrote:
> #regzbot introduced: 4d4c10f763d7
>
> Since v6.16 one of two onboard RTL8168h/8111h NICs on this board fails
> to probe on boot; the device drops to D3cold and the driver can't bring
> it back:
FWIW, that commit is 4d4c10f763d780 ("PCI: Explicitly put devices into
D0 when initializing") [v6.16-rc1] from Mario, who is already CCed, but
it looks like he might be on holiday or otherwise away, given his
inactivity on the lists in recent days. So it might take a few days
before this moves on.
Josh, this is not my area of expertise, but there are two things I guess
might be helpful:
* retry with 7.1
* upload "dmesg" and "sudo lspci -vvv" output from working and broken
kernels somewhere (like bugzilla.kernel.org).
Ciao, Thorsten
> r8169 0000:02:00.0 eth0: RTL8168h/8111h, 00:2b:67:48:40:01, XID 541,
> IRQ 137
> r8169 0000:04:00.0: Unable to change power state from D3cold to D0,
> device inaccessible
> r8169 0000:04:00.0: Mem-Wr-Inval unavailable
> r8169 0000:04:00.0: error -EIO: PCI read failed
> r8169 0000:04:00.0: probe with driver r8169 failed with error -5
>
> The board has two identical RTL8168h NICs (both XID 541): 0000:02:00.0
> and 0000:04:00.0. Only 04:00.0 fails — its sibling 02:00.0, on a
> different root port, probes and works normally on the very same kernel
> and boot. The failing NIC then does not appear (no enp4s0), taking the
> machine's WAN offline. This strongly suggests the problem is port/
> topology-specific rather than device- or driver-specific: the upstream
> port behind 04:00.0 is placed in D3cold and the endpoint cannot be
> resumed to D0.
>
> Hardware: RTL8168h/8111h, XID 541, PCI 04:00.0 (onboard 1GbE).
> Platform: Lenovo ThinkCentre M90n-1 (11AHS0B200), BIOS M2AKT49A
> (2026-03-25, latest available). Firmware is current, so this is not a
> platform-firmware issue.
>
> Bisection: v6.15 good, v6.16 bad (verified by booting both). I then
> reverted 4d4c10f763d7 ("PCI: Explicitly put devices into D0 when
> initializing") together with its follow-up 907a7a2e5bf4 ("PCI/PM: Set up
> runtime PM even for devices without PCI PM") on top of 6.16.7: the NIC
> probes and links at 1Gbps/Full normally, with no workaround:
>
> r8169 0000:04:00.0 eth1: RTL8168h/8111h, 00:2b:67:48:40:02, XID 541,
> IRQ 138
> r8169 0000:04:00.0 enp4s0: Link is Up - 1Gbps/Full - flow control rx/tx
>
> Workaround: booting an unmodified v6.16+ kernel with pcie_port_pm=off
> also restores the NIC, which is consistent with the upstream port being
> placed in D3cold and the device failing to resume to D0 after the
> explicit-D0 init change.
>
> The follow-up 907a7a2e5bf4 does not fix this resume case: v6.18.33 is
> still affected (retested today on current firmware).
>
> Happy to test patches or provide full dmesg / lspci.
>
^ permalink raw reply
* Re: [PATCH net-next v6 1/2] dinghai: add ZTE network driver support
From: Uwe Kleine-König @ 2026-06-17 8:30 UTC (permalink / raw)
To: han.junyang
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, linux-kernel,
netdev, ran.ming, han.chengfei, zhang.yanze
In-Reply-To: <20260616213057452I2KLm3mVgWYl_SUTy_YYS@zte.com.cn>
Hello,
On Tue, Jun 16, 2026 at 09:30:57PM +0800, han.junyang@zte.com.cn wrote:
> +static const struct pci_device_id dh_pf_pci_table[] = {
> + { PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_PF_DEVICE_ID), 0 },
> + { PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_VF_DEVICE_ID), 0 },
> + { 0, }
> +};
Please make this:
+static const struct pci_device_id dh_pf_pci_table[] = {
+ { PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_PF_DEVICE_ID) },
+ { PCI_DEVICE(ZXDH_PF_VENDOR_ID, ZXDH_VF_DEVICE_ID) },
+ { }
+};
(because the assignment to .driver_data is superfluous and initializing
it using a list expression is in the way for one of my patch quests).
Best regards
Uwe
^ permalink raw reply
* [PATCH 5.10.y] net: add missing ns_capable check for peer netns
From: Maximilian Heyne @ 2026-06-17 8:27 UTC (permalink / raw)
To: stable
Cc: Maximilian Heyne, Wolfgang Grandegger, Marc Kleine-Budde,
David S. Miller, Jakub Kicinski, Eric W. Biederman, Eric Dumazet,
linux-can, netdev, linux-kernel
The upstream commit 7b735ef81286 ("rtnetlink: add missing
netlink_ns_capable() check for peer netns") doesn't apply on older
stable kernels due to refactoring. Therefore, this patch is an attempt
to implement the same capability check just directly in the respective
interface types.
Approximate the netlink_ns_capable check with an ns_capable check. As
the newlink operation is synchronous this should result in the same
behavior.
Without this commit, for example, the following command creating a veth
device in network namespace of pid 1 succeeds:
$ unshare -U -r -n -- bash -c '
ip link add veth0 type veth peer name foobar netns 1
sleep 60' &
$ ip link show foobar
13: foobar@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:09:69:92:92:cc brd ff:ff:ff:ff:ff:ff link-netnsid 1
With this patch, it's returning -EPERM.
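The shape of the added check can be sketched in userspace as follows.
This is only a model under stated assumptions: model_net,
model_get_net(), model_put_net() and model_newlink() are hypothetical
names, and the caller_has_net_admin flag stands in for the result of
ns_capable(net->user_ns, CAP_NET_ADMIN). The point it shows is that the
reference taken on the peer netns has to be dropped again on the -EPERM
path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net with a plain reference count. */
struct model_net {
	int refcnt;
	bool caller_has_net_admin;	/* models the ns_capable() result */
};

static struct model_net *model_get_net(struct model_net *net)
{
	net->refcnt++;
	return net;
}

static void model_put_net(struct model_net *net)
{
	assert(net->refcnt > 0);
	net->refcnt--;
}

/* Models the newlink path: take the peer netns, check, then proceed. */
static int model_newlink(struct model_net *peer_net)
{
	struct model_net *net = model_get_net(peer_net);

	if (!net->caller_has_net_admin) {
		/* Drop the reference before bailing out. */
		model_put_net(net);
		return -EPERM;
	}

	/* Peer creation would happen here; the model just drops the ref. */
	model_put_net(net);
	return 0;
}
```

In the model the reference count is unchanged after either outcome, so
the rejected request leaves no leaked reference behind.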
This fixes CVE-2026-31692
Cc: stable@vger.kernel.org
Fixes: 81adee47dfb6 ("net: Support specifying the network namespace upon device creation.")
Assisted-by: Kiro:claude
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
drivers/net/can/vxcan.c | 5 +++++
drivers/net/veth.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 1bfede407270d..05fcbfacc3433 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -198,6 +198,11 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
if (IS_ERR(peer_net))
return PTR_ERR(peer_net);
+ if (!ns_capable(peer_net->user_ns, CAP_NET_ADMIN)) {
+ put_net(peer_net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(peer_net, ifname, name_assign_type,
&vxcan_link_ops, tbp, extack);
if (IS_ERR(peer)) {
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 743716ebebdb9..bda3add65c76e 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1341,6 +1341,11 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
if (IS_ERR(net))
return PTR_ERR(net);
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+ put_net(net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(net, ifname, name_assign_type,
&veth_link_ops, tbp, extack);
if (IS_ERR(peer)) {
--
2.50.1
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
^ permalink raw reply related
* [PATCH 5.15.y] net: add missing ns_capable check for peer netns
From: Maximilian Heyne @ 2026-06-17 8:27 UTC (permalink / raw)
To: stable
Cc: Maximilian Heyne, Wolfgang Grandegger, Marc Kleine-Budde,
David S. Miller, Jakub Kicinski, Eric Dumazet, Eric W. Biederman,
linux-can, netdev, linux-kernel
The upstream commit 7b735ef81286 ("rtnetlink: add missing
netlink_ns_capable() check for peer netns") doesn't apply on older
stable kernels due to refactoring. Therefore, this patch is an attempt
to implement the same capability check just directly in the respective
interface types.
Approximate the netlink_ns_capable check with an ns_capable check. As
the newlink operation is synchronous this should result in the same
behavior.
Without this commit, for example, the following command creating a veth
device in network namespace of pid 1 succeeds:
$ unshare -U -r -n -- bash -c '
ip link add veth0 type veth peer name foobar netns 1
sleep 60' &
$ ip link show foobar
13: foobar@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:09:69:92:92:cc brd ff:ff:ff:ff:ff:ff link-netnsid 1
With this patch, it's returning -EPERM.
This fixes CVE-2026-31692
Cc: stable@vger.kernel.org
Fixes: 81adee47dfb6 ("net: Support specifying the network namespace upon device creation.")
Assisted-by: Kiro:claude
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
drivers/net/can/vxcan.c | 5 +++++
drivers/net/veth.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index afd9060c5421c..8a61011fdaeef 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -198,6 +198,11 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
if (IS_ERR(peer_net))
return PTR_ERR(peer_net);
+ if (!ns_capable(peer_net->user_ns, CAP_NET_ADMIN)) {
+ put_net(peer_net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(peer_net, ifname, name_assign_type,
&vxcan_link_ops, tbp, extack);
if (IS_ERR(peer)) {
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index cfacf8965bc59..c644d59d70900 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1664,6 +1664,11 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
if (IS_ERR(net))
return PTR_ERR(net);
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+ put_net(net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(net, ifname, name_assign_type,
&veth_link_ops, tbp, extack);
if (IS_ERR(peer)) {
--
2.50.1
^ permalink raw reply related
* [PATCH 6.1.y] net: add missing ns_capable check for peer netns
From: Maximilian Heyne @ 2026-06-17 8:27 UTC (permalink / raw)
To: stable
Cc: Maximilian Heyne, Wolfgang Grandegger, Marc Kleine-Budde,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Eric W. Biederman, linux-can, netdev, linux-kernel
The upstream commit 7b735ef81286 ("rtnetlink: add missing
netlink_ns_capable() check for peer netns") doesn't apply on older
stable kernels due to refactoring. Therefore, this patch is an attempt
to implement the same capability check just directly in the respective
interface types.
Approximate the netlink_ns_capable check with an ns_capable check. As
the newlink operation is synchronous this should result in the same
behavior.
Without this commit, for example, the following command creating a veth
device in network namespace of pid 1 succeeds:
$ unshare -U -r -n -- bash -c '
ip link add veth0 type veth peer name foobar netns 1
sleep 60' &
$ ip link show foobar
13: foobar@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:09:69:92:92:cc brd ff:ff:ff:ff:ff:ff link-netnsid 1
With this patch, it's returning -EPERM.
This fixes CVE-2026-31692
Cc: stable@vger.kernel.org
Fixes: 81adee47dfb6 ("net: Support specifying the network namespace upon device creation.")
Assisted-by: Kiro:claude
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
drivers/net/can/vxcan.c | 5 +++++
drivers/net/veth.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 98c669ad51414..da4affff65476 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -211,6 +211,11 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
if (IS_ERR(peer_net))
return PTR_ERR(peer_net);
+ if (!ns_capable(peer_net->user_ns, CAP_NET_ADMIN)) {
+ put_net(peer_net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(peer_net, ifname, name_assign_type,
&vxcan_link_ops, tbp, extack);
if (IS_ERR(peer)) {
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index e1e8c825483aa..dac8cc5a79f5a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1707,6 +1707,11 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
if (IS_ERR(net))
return PTR_ERR(net);
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+ put_net(net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(net, ifname, name_assign_type,
&veth_link_ops, tbp, extack);
if (IS_ERR(peer)) {
--
2.50.1
^ permalink raw reply related
* [PATCH 6.6.y] net: add missing ns_capable check for peer netns
From: Maximilian Heyne @ 2026-06-17 8:26 UTC (permalink / raw)
To: stable
Cc: Maximilian Heyne, Wolfgang Grandegger, Marc Kleine-Budde,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Eric W. Biederman, linux-can, netdev, linux-kernel
The upstream commit 7b735ef81286 ("rtnetlink: add missing
netlink_ns_capable() check for peer netns") doesn't apply on older
stable kernels due to refactoring. Therefore, this patch is an attempt
to implement the same capability check just directly in the respective
interface types.
Approximate the netlink_ns_capable check with an ns_capable check. As
the newlink operation is synchronous this should result in the same
behavior.
Without this commit, for example, the following command creating a veth
device in network namespace of pid 1 succeeds:
$ unshare -U -r -n -- bash -c '
ip link add veth0 type veth peer name foobar netns 1
sleep 60' &
$ ip link show foobar
13: foobar@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:09:69:92:92:cc brd ff:ff:ff:ff:ff:ff link-netnsid 1
With this patch, it's returning -EPERM.
This fixes CVE-2026-31692
Cc: stable@vger.kernel.org
Fixes: 81adee47dfb6 ("net: Support specifying the network namespace upon device creation.")
Assisted-by: Kiro:claude
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
drivers/net/can/vxcan.c | 5 +++++
drivers/net/veth.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 98c669ad51414..da4affff65476 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -211,6 +211,11 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
if (IS_ERR(peer_net))
return PTR_ERR(peer_net);
+ if (!ns_capable(peer_net->user_ns, CAP_NET_ADMIN)) {
+ put_net(peer_net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(peer_net, ifname, name_assign_type,
&vxcan_link_ops, tbp, extack);
if (IS_ERR(peer)) {
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 2b3b0beb55c88..ba4ca6c6bc9d8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1857,6 +1857,11 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
if (IS_ERR(net))
return PTR_ERR(net);
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+ put_net(net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(net, ifname, name_assign_type,
&veth_link_ops, tbp, extack);
if (IS_ERR(peer)) {
--
2.50.1
^ permalink raw reply related
* [PATCH 6.12.y] net: add missing ns_capable check for peer netns
From: Maximilian Heyne @ 2026-06-17 8:25 UTC (permalink / raw)
To: stable
Cc: Maximilian Heyne, Marc Kleine-Budde, Vincent Mailhol, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Daniel Borkmann, Nikolay Aleksandrov, Eric W. Biederman,
linux-can, netdev, linux-kernel, bpf
The upstream commit 7b735ef81286 ("rtnetlink: add missing
netlink_ns_capable() check for peer netns") doesn't apply on older
stable kernels due to refactoring. Therefore, this patch is an attempt
to implement the same capability check just directly in the respective
interface types.
Approximate the netlink_ns_capable check with an ns_capable check. As
the newlink operation is synchronous this should result in the same
behavior.
Without this commit, for example, the following command creating a veth
device in network namespace of pid 1 succeeds:
$ unshare -U -r -n -- bash -c '
ip link add veth0 type veth peer name foobar netns 1
sleep 60' &
$ ip link show foobar
13: foobar@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:09:69:92:92:cc brd ff:ff:ff:ff:ff:ff link-netnsid 1
With this patch, it's returning -EPERM.
This fixes CVE-2026-31692
Cc: stable@vger.kernel.org
Fixes: 81adee47dfb6 ("net: Support specifying the network namespace upon device creation.")
Assisted-by: Kiro:claude
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
---
drivers/net/can/vxcan.c | 5 +++++
drivers/net/netkit.c | 5 +++++
drivers/net/veth.c | 5 +++++
3 files changed, 15 insertions(+)
diff --git a/drivers/net/can/vxcan.c b/drivers/net/can/vxcan.c
index 9e1b7d41005f8..851c93bf0b310 100644
--- a/drivers/net/can/vxcan.c
+++ b/drivers/net/can/vxcan.c
@@ -211,6 +211,11 @@ static int vxcan_newlink(struct net *net, struct net_device *dev,
if (IS_ERR(peer_net))
return PTR_ERR(peer_net);
+ if (!ns_capable(peer_net->user_ns, CAP_NET_ADMIN)) {
+ put_net(peer_net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(peer_net, ifname, name_assign_type,
&vxcan_link_ops, tbp, extack);
if (IS_ERR(peer)) {
diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c
index fba2c734f0ec7..e0c42fa0c835c 100644
--- a/drivers/net/netkit.c
+++ b/drivers/net/netkit.c
@@ -413,6 +413,11 @@ static int netkit_new_link(struct net *src_net, struct net_device *dev,
if (IS_ERR(net))
return PTR_ERR(net);
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+ put_net(net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(net, ifname, ifname_assign_type,
&netkit_link_ops, tbp, extack);
if (IS_ERR(peer)) {
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 77e4b0d1ca557..6ffde7ee2119d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1854,6 +1854,11 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
if (IS_ERR(net))
return PTR_ERR(net);
+ if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
+ put_net(net);
+ return -EPERM;
+ }
+
peer = rtnl_create_link(net, ifname, name_assign_type,
&veth_link_ops, tbp, extack);
if (IS_ERR(peer)) {
--
2.50.1
^ permalink raw reply related
* Re: [PATCH net v3] net: pch_gbe: handle TX skb allocation failure
From: Simon Horman @ 2026-06-17 8:24 UTC (permalink / raw)
To: Ruoyu Wang
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Masayuki Ohtake, netdev, linux-kernel
In-Reply-To: <20260615125043.3537046-1-ruoyuw560@gmail.com>
On Mon, Jun 15, 2026 at 08:50:42PM +0800, Ruoyu Wang wrote:
> pch_gbe_alloc_tx_buffers() allocates an skb for each TX descriptor and
> then passes the returned pointer to skb_reserve(). If netdev_alloc_skb()
> fails, skb_reserve() dereferences NULL.
>
> Make pch_gbe_alloc_tx_buffers() return an error when an skb allocation
> fails. On failure, let pch_gbe_alloc_tx_buffers() clean the partially
> allocated TX ring before returning the error. While bringing the device
> up, release the RX buffer pool through a shared cleanup helper before
> unwinding the IRQ setup.
>
> Fixes: 77555ee72282 ("net: Add Gigabit Ethernet driver of Topcliff PCH")
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
> ---
> Changes in v3:
> - Move the partial TX ring cleanup into pch_gbe_alloc_tx_buffers(), as
> suggested by Simon Horman.
>
> Changes in v2:
> - Add the kernel-doc return value description for
> pch_gbe_alloc_tx_buffers().
Thanks for the updates.
Reviewed-by: Simon Horman <horms@kernel.org>
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH v2] ice: retry reading NVM if admin queue returns EBUSY
From: Robert Malz @ 2026-06-17 8:11 UTC (permalink / raw)
To: Loktionov, Aleksandr
Cc: Nguyen, Anthony L, Kitszel, Przemyslaw,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
In-Reply-To: <IA3PR11MB8986729EE79F3F3FBAAC68C9E5E42@IA3PR11MB8986.namprd11.prod.outlook.com>
(resend)
Hey Aleksandr,
Thanks for taking a look at this.
The loop exit, just like in the OOT driver, happens here:
> if (hw->adminq.sq_last_status != LIBIE_AQ_RC_EBUSY ||
> retry_cnt > ICE_SQ_SEND_MAX_EXECUTE)
> break;
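To make the control flow easier to follow, here is a small userspace
model of the loop. It is a sketch, not the driver code: model_hw,
model_read_chunk() and model_read_all() are hypothetical, the NVM
release/sleep/re-acquire step is left out, and MODEL_MAX_EXECUTE is an
assumed stand-in value for ICE_SQ_SEND_MAX_EXECUTE.

```c
#include <assert.h>

#define MODEL_MAX_EXECUTE 3	/* assumed stand-in, not the driver's value */

enum model_rc { MODEL_OK = 0, MODEL_EBUSY, MODEL_EIO };

/* Hypothetical device: reports EBUSY for the next busy_left attempts. */
struct model_hw {
	int busy_left;
	int attempts;
	enum model_rc last_status;
};

static int model_read_chunk(struct model_hw *hw)
{
	hw->attempts++;
	if (hw->busy_left > 0) {
		hw->busy_left--;
		hw->last_status = MODEL_EBUSY;
		return -1;
	}
	hw->last_status = MODEL_OK;
	return 0;
}

/* Same exit condition as the patch: stop on non-EBUSY or too many retries. */
static int model_read_all(struct model_hw *hw, int chunks)
{
	int retry_cnt = 0;
	int done = 0;
	int status;

	assert(chunks > 0);
	do {
		status = model_read_chunk(hw);
		if (status) {
			if (hw->last_status != MODEL_EBUSY ||
			    retry_cnt > MODEL_MAX_EXECUTE)
				break;
			retry_cnt++;
		} else {
			done++;
			retry_cnt = 0;	/* a successful chunk resets the budget */
		}
	} while (done < chunks);

	return status;
}
```

In this model retry_cnt does bound the loop: with a device that stays
busy forever, the read is attempted MODEL_MAX_EXECUTE + 2 times before
the break is taken, because the comparison is ">" and the counter starts
at zero.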
By the way, I have v3 ready, which I plan to send 24 hours after the
initial submission. It doesn't change any code, but I want to keep the
netdev bots happy.
Thanks,
Robert
On Wed, Jun 17, 2026 at 9:47 AM Loktionov, Aleksandr
<aleksandr.loktionov@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> > Of Robert Malz via Intel-wired-lan
> > Sent: Wednesday, June 17, 2026 12:08 AM
> > To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> > Przemyslaw <przemyslaw.kitszel@intel.com>
> > Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org
> > Subject: [Intel-wired-lan] [PATCH v2] ice: retry reading NVM if
> > admin queue returns EBUSY
> >
> > When the admin queue command to read NVM returns EBUSY, the driver
> > currently treats it as a fatal error and aborts the entire read
> > operation. This can cause spurious NVM read failures during periods
> > of high firmware activity.
> >
> > Add retry logic to ice_read_flat_nvm() that handles EBUSY responses
> > from the admin queue. When an EBUSY error is encountered, release
> > the NVM resource lock, wait for ICE_SQ_SEND_DELAY_TIME_MS, re-
> > acquire it, and retry the failed read. The retry is attempted up to
> > ICE_SQ_SEND_MAX_EXECUTE times before giving up.
> >
> > Code was extracted from OOT ice driver 1.15.4 release. Additional
> > change was made to reset last_cmd in case of retry to make sure that
> > all commands are retried properly.
> >
> > Fixes: e94509906d6b ("ice: create function to read a section of the
> > NVM and Shadow RAM")
> > Signed-off-by: Robert Malz <robert.malz@canonical.com>
> > ---
> > Changes in v2:
> > - change ICE_AQ_RC_EBUSY -> LIBIE_AQ_RC_EBUSY
> >
> > drivers/net/ethernet/intel/ice/ice_nvm.c | 25 +++++++++++++++++++--
> > ---
> > 1 file changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c
> > b/drivers/net/ethernet/intel/ice/ice_nvm.c
> > index 7e187a804dfa..b3120605d66f 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_nvm.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
> > @@ -67,6 +67,7 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset,
> > u32 *length, u8 *data, {
> > u32 inlen = *length;
> > u32 bytes_read = 0;
> > + int retry_cnt = 0;
> > bool last_cmd;
> > int status;
> >
> > @@ -96,11 +97,25 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset,
> > u32 *length, u8 *data,
> > offset, read_size,
> > data + bytes_read, last_cmd,
> > read_shadow_ram, NULL);
> > - if (status)
> > - break;
> > -
> > - bytes_read += read_size;
> > - offset += read_size;
> > + if (status) {
> > + if (hw->adminq.sq_last_status !=
> > LIBIE_AQ_RC_EBUSY ||
> > + retry_cnt > ICE_SQ_SEND_MAX_EXECUTE)
> > + break;
> > + ice_debug(hw, ICE_DBG_NVM,
> > + "NVM read EBUSY error, retry %d\n",
> > + retry_cnt + 1);
> > + last_cmd = false;
> > + ice_release_nvm(hw);
> > + msleep(ICE_SQ_SEND_DELAY_TIME_MS);
> > + status = ice_acquire_nvm(hw, ICE_RES_READ);
> > + if (status)
> > + break;
> > + retry_cnt++;
> It looks like you added the retry_cnt increment but you didn't add it into the loop exit condition.
>
>
> > + } else {
> > + bytes_read += read_size;
> > + offset += read_size;
> > + retry_cnt = 0;
> > + }
> > } while (!last_cmd);
> >
> > *length = bytes_read;
> > --
> > 2.34.1
>
^ permalink raw reply
* Re: [PATCH] e1000: Remove redundant else after return
From: Lovekesh Solanki @ 2026-06-17 7:58 UTC (permalink / raw)
To: andrew
Cc: andrew+netdev, anthony.l.nguyen, davem, edumazet, kuba,
lovekeshsolanki00, netdev, pabeni, przemyslaw.kitszel
In-Reply-To: <ead7bfc9-3978-4442-9cd1-23c2182b36b3@lunn.ch>
Hi Andrew,
I read the documentation you linked and understand simple standalone
cleanups are discouraged.
Thanks for the review, I will drop this patch.
Regards,
Lovekesh
^ permalink raw reply
* [PATCH net v4] tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done
From: Doruk Tan Ozturk @ 2026-06-17 7:58 UTC (permalink / raw)
To: jmaloy
Cc: davem, edumazet, kuba, pabeni, horms, aleksander.lobakin,
tung.quang.nguyen, tipc-discussion, netdev, linux-kernel,
Doruk Tan Ozturk, stable
tipc_aead_decrypt() goes straight from tipc_bearer_hold(b) to
crypto_aead_decrypt(req) without taking a reference on the netns, unlike
the encrypt path. When crypto_aead_decrypt() is offloaded asynchronously
(e.g. the SIMD aead wrapper queuing to cryptd), the cryptd worker runs
tipc_aead_decrypt_done() later. If the bearer's netns is torn down in the
meantime, cleanup_net() -> tipc_exit_net() -> tipc_crypto_stop() frees the
per-netns tipc_crypto, and the completion then reads it:
tipc_aead_decrypt_done() dereferences aead->crypto->stats and
aead->crypto->net, and tipc_crypto_rcv_complete() dereferences
aead->crypto->aead[] and the node table -- reading freed memory.
Decoded KASAN splat (v7.1-rc7, CONFIG_KASAN_INLINE + TIPC + TIPC_CRYPTO):
BUG: KASAN: slab-use-after-free in tipc_aead_decrypt_done (net/tipc/crypto.c:999)
Read of size 8 at addr ffff8881056258a8 by task kworker/u16:2/51
Workqueue: events_unbound
Call Trace:
tipc_aead_decrypt_done (net/tipc/crypto.c:999)
process_one_work (kernel/workqueue.c:3314)
worker_thread (kernel/workqueue.c:3397 kernel/workqueue.c:3478)
kthread (kernel/kthread.c:436)
ret_from_fork (arch/x86/kernel/process.c:158)
ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
Allocated by task 169:
__kasan_kmalloc (mm/kasan/common.c:398 mm/kasan/common.c:415)
tipc_crypto_start (net/tipc/crypto.c:1502)
tipc_init_net (net/tipc/core.c:72)
ops_init (net/core/net_namespace.c:137)
setup_net (net/core/net_namespace.c:446)
copy_net_ns (net/core/net_namespace.c:579)
create_new_namespaces (kernel/nsproxy.c:132)
__x64_sys_unshare (kernel/fork.c:3316)
do_syscall_64 (arch/x86/entry/syscall_64.c:63)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
Freed by task 8:
kfree (mm/slub.c:6566)
tipc_exit_net (net/tipc/core.c:119)
cleanup_net (net/core/net_namespace.c:704)
process_one_work (kernel/workqueue.c:3314)
kthread (kernel/kthread.c:436)
This is the same class of bug that commit e279024617134 ("net/tipc: fix
slab-use-after-free Read in tipc_aead_encrypt_done") fixed for the encrypt
side. The encrypt path takes maybe_get_net(aead->crypto->net) before
crypto_aead_encrypt() and drops it with put_net() on the synchronous
return paths and in tipc_aead_encrypt_done(); the -EINPROGRESS/-EBUSY
return keeps the reference for the async callback to release. The decrypt
path was left without the equivalent guard.
Mirror the encrypt-side fix on the decrypt path: take a net reference
before crypto_aead_decrypt() (failing with -ENODEV and the matching
bearer put if it cannot be acquired), keep it across the
-EINPROGRESS/-EBUSY async return, and drop it with put_net() on the
synchronous success/error return and at the end of
tipc_aead_decrypt_done().
Reproduced under KASAN on v7.1-rc7: a UDP bearer with a cluster key is
flooded with crafted encrypted frames from an unknown peer (driving the
cluster-key decrypt path) while the bearer's netns is repeatedly torn
down. The completion must run asynchronously to outlive
tipc_crypto_stop(); on x86 the stock aesni gcm(aes) now decrypts
synchronously, so the async path was exercised via cryptd offload. The
unguarded aead->crypto dereference in tipc_aead_decrypt_done() is the
unpatched upstream path; tipc_aead_decrypt() still lacks
maybe_get_net(aead->crypto->net), so the completion can outlive the free
on any config where crypto_aead_decrypt() goes async.
Found by 0sec automated security-research tooling (https://0sec.ai).
Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
Cc: stable@vger.kernel.org
Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
---
v4:
- Use the net parameter for maybe_get_net()/put_net() instead of
dereferencing aead->crypto->net, which is the per-netns structure at
risk during teardown (per the automated review forwarded by Simon
Horman). net == aead->crypto->net here; no functional change.
v3:
- Rewrite the changelog with the decoded stack trace and frame the
reproduction on the current tree (v7.1-rc7); drop the v6.12.92
references (Tung Quang Nguyen).
v2:
- Add Cc: stable@vger.kernel.org and Alexander Lobakin's Reviewed-by.
No functional change.
net/tipc/crypto.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/tipc/crypto.c b/net/tipc/crypto.c
index 6d3b6b89b1d1..16f1ed1f6b1b 100644
--- a/net/tipc/crypto.c
+++ b/net/tipc/crypto.c
@@ -941,12 +941,20 @@ static int tipc_aead_decrypt(struct net *net, struct tipc_aead *aead,
goto exit;
}
+ /* Get net to avoid freed tipc_crypto when delete namespace */
+ if (!maybe_get_net(net)) {
+ tipc_bearer_put(b);
+ rc = -ENODEV;
+ goto exit;
+ }
+
/* Now, do decrypt */
rc = crypto_aead_decrypt(req);
if (rc == -EINPROGRESS || rc == -EBUSY)
return rc;
tipc_bearer_put(b);
+ put_net(net);
exit:
kfree(ctx);
@@ -984,6 +992,7 @@ static void tipc_aead_decrypt_done(void *data, int err)
}
tipc_bearer_put(b);
+ put_net(net);
}
static inline int tipc_ehdr_size(struct tipc_ehdr *ehdr)
--
2.43.0
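
The reference handling described in the changelog above can be modelled outside the kernel. The following is a minimal single-threaded sketch, not the TIPC code: struct owner, owner_tryget(), owner_put(), struct request, backend_submit() and submit() are hypothetical names standing in for struct net, maybe_get_net(), put_net(), the AEAD request, crypto_aead_decrypt() and tipc_aead_decrypt(). The counter is a plain int, whereas the kernel uses atomic reference counts.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted owner such as struct net. */
struct owner {
	int refs;	/* 0 means teardown has already begun */
};

/* Take a reference only if the owner is still alive. */
static bool owner_tryget(struct owner *o)
{
	if (o->refs == 0)
		return false;
	o->refs++;
	return true;
}

static void owner_put(struct owner *o)
{
	o->refs--;
}

/* Hypothetical request; go_async selects the deferred-completion path. */
struct request {
	struct owner *owner;
	bool go_async;
	bool completed;
};

/* Completion callback: runs later on the async path and drops the ref. */
static void request_done(struct request *req)
{
	req->completed = true;
	owner_put(req->owner);
}

/* Models the backend: either finishes now or reports -EINPROGRESS. */
static int backend_submit(struct request *req)
{
	return req->go_async ? -EINPROGRESS : 0;
}

static int submit(struct request *req)
{
	int rc;

	/* Pin the owner before handing the request to the backend. */
	if (!owner_tryget(req->owner))
		return -ENODEV;

	rc = backend_submit(req);
	if (rc == -EINPROGRESS || rc == -EBUSY)
		return rc;	/* reference now belongs to request_done() */

	/* Synchronous success or error: nobody else will drop it. */
	req->completed = true;
	owner_put(req->owner);
	return rc;
}
```

In this model every successful owner_tryget() is paired with exactly one owner_put(), either in submit() on the synchronous return or in request_done() on the asynchronous one, which is the invariant the patch restores for the decrypt path.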
* RE: [Intel-wired-lan] [PATCH net v2] iavf: validate num_vsis in VIRTCHNL_OP_GET_VF_RESOURCES response
From: Romanowski, Rafal @ 2026-06-17 7:49 UTC (permalink / raw)
To: Simon Horman, Junrui Luo
Cc: Nguyen, Anthony L, Kitszel, Przemyslaw, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Mitch Williams, Greg Rose, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Yuhao Jiang,
stable@vger.kernel.org
In-Reply-To: <20260518185611.GF98116@horms.kernel.org>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Simon
> Horman
> Sent: Monday, May 18, 2026 8:56 PM
> To: Junrui Luo <moonafterrain@outlook.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Andrew Lunn <andrew+netdev@lunn.ch>;
> David S. Miller <davem@davemloft.net>; Eric Dumazet
> <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> <pabeni@redhat.com>; Mitch Williams <mitch.a.williams@intel.com>; Greg Rose
> <gregory.v.rose@intel.com>; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Yuhao Jiang
> <danisjiang@gmail.com>; stable@vger.kernel.org
> Subject: Re: [Intel-wired-lan] [PATCH net v2] iavf: validate num_vsis in
> VIRTCHNL_OP_GET_VF_RESOURCES response
>
> On Thu, May 14, 2026 at 02:55:04PM +0800, Junrui Luo wrote:
> > The VF allocates a fixed-size buffer for IAVF_MAX_VF_VSI (3) VSI
> > entries when processing a VIRTCHNL_OP_GET_VF_RESOURCES response from
> > the PF. However, num_vsis from the PF response is used unchecked as
> > the loop bound when iterating over vsi_res[] in multiple functions.
> >
> > A PF sending num_vsis greater than IAVF_MAX_VF_VSI, or the received
> > message is shorter than num_vsis claims leads to out-of-bounds
> > accesses on the vsi_res[] array.
> >
> > Clamp num_vsis based on the actual bytes copied from the PF response.
> >
> > Fixes: 5eae00c57f5e ("i40evf: main driver core")
> > Reported-by: Yuhao Jiang <danisjiang@gmail.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
> > ---
> > Changes in v2:
> > - Clamp num_vsis based on actual received message length instead of
> > IAVF_MAX_VF_VSI suggested by Przemek
> > - Link to v1:
> >   https://lore.kernel.org/r/SYBPR01MB7881AF11C45AEDC0D4CA89C1AF062@SYBPR01MB7881.ausprd01.prod.outlook.com
>
> Reviewed-by: Simon Horman <horms@kernel.org>
>
> There is an AI-generated review of this patchset available on sashiko.dev.
> However, I believe that the issues raised there can be considered in the context of
> possible follow-up. I do not believe they should block progress of this patch.
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
* RE: [Intel-wired-lan] [PATCH v2] ice: retry reading NVM if admin queue returns EBUSY
From: Loktionov, Aleksandr @ 2026-06-17 7:47 UTC (permalink / raw)
To: Robert Malz, Nguyen, Anthony L, Kitszel, Przemyslaw
Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
In-Reply-To: <20260616220827.1647052-1-robert.malz@canonical.com>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Robert Malz via Intel-wired-lan
> Sent: Wednesday, June 17, 2026 12:08 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH v2] ice: retry reading NVM if
> admin queue returns EBUSY
>
> When the admin queue command to read NVM returns EBUSY, the driver
> currently treats it as a fatal error and aborts the entire read
> operation. This can cause spurious NVM read failures during periods
> of high firmware activity.
>
> Add retry logic to ice_read_flat_nvm() that handles EBUSY responses
> from the admin queue. When an EBUSY error is encountered, release
> the NVM resource lock, wait for ICE_SQ_SEND_DELAY_TIME_MS, re-
> acquire it, and retry the failed read. The retry is attempted up to
> ICE_SQ_SEND_MAX_EXECUTE times before giving up.
>
> Code was extracted from OOT ice driver 1.15.4 release. Additional
> change was made to reset last_cmd in case of retry to make sure that
> all commands are retried properly.
>
> Fixes: e94509906d6b ("ice: create function to read a section of the
> NVM and Shadow RAM")
> Signed-off-by: Robert Malz <robert.malz@canonical.com>
> ---
> Changes in v2:
> - change ICE_AQ_RC_EBUSY -> LIBIE_AQ_RC_EBUSY
>
> drivers/net/ethernet/intel/ice/ice_nvm.c | 25 +++++++++++++++++++--
> ---
> 1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c
> b/drivers/net/ethernet/intel/ice/ice_nvm.c
> index 7e187a804dfa..b3120605d66f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_nvm.c
> +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
> @@ -67,6 +67,7 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset,
> u32 *length, u8 *data, {
> u32 inlen = *length;
> u32 bytes_read = 0;
> + int retry_cnt = 0;
> bool last_cmd;
> int status;
>
> @@ -96,11 +97,25 @@ ice_read_flat_nvm(struct ice_hw *hw, u32 offset,
> u32 *length, u8 *data,
> offset, read_size,
> data + bytes_read, last_cmd,
> read_shadow_ram, NULL);
> - if (status)
> - break;
> -
> - bytes_read += read_size;
> - offset += read_size;
> + if (status) {
> + if (hw->adminq.sq_last_status !=
> LIBIE_AQ_RC_EBUSY ||
> + retry_cnt > ICE_SQ_SEND_MAX_EXECUTE)
> + break;
> + ice_debug(hw, ICE_DBG_NVM,
> + "NVM read EBUSY error, retry %d\n",
> + retry_cnt + 1);
> + last_cmd = false;
> + ice_release_nvm(hw);
> + msleep(ICE_SQ_SEND_DELAY_TIME_MS);
> + status = ice_acquire_nvm(hw, ICE_RES_READ);
> + if (status)
> + break;
> + retry_cnt++;
It looks like you added the retry_cnt increment but you didn't add it into the loop exit condition.
> + } else {
> + bytes_read += read_size;
> + offset += read_size;
> + retry_cnt = 0;
> + }
> } while (!last_cmd);
>
> *length = bytes_read;
> --
> 2.34.1
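
For reference, a bounded retry on EBUSY can be sketched in isolation as below. This is a standalone model under stated assumptions, not the ice driver code: MAX_RETRIES, struct fake_dev, fake_read() and read_chunk_with_retry() are hypothetical names, and the release/sleep/re-acquire step from the patch is reduced to a comment. The bound is checked with >= so that at most MAX_RETRIES retries follow the first attempt; a check written as retry_cnt > MAX_RETRIES would allow one more.

```c
#include <errno.h>

#define MAX_RETRIES 3	/* hypothetical bound for this sketch */

/* Hypothetical device: answers -EBUSY for the first busy_left calls. */
struct fake_dev {
	int busy_left;
	int calls;
};

static int fake_read(struct fake_dev *dev)
{
	dev->calls++;
	if (dev->busy_left > 0) {
		dev->busy_left--;
		return -EBUSY;
	}
	return 0;
}

/* Read one chunk, retrying at most MAX_RETRIES times on -EBUSY. */
static int read_chunk_with_retry(struct fake_dev *dev)
{
	int retry_cnt = 0;
	int status;

	for (;;) {
		status = fake_read(dev);
		if (status != -EBUSY)
			return status;	/* success or a non-retryable error */
		if (retry_cnt >= MAX_RETRIES)
			return status;	/* retry budget exhausted */
		/* Real code would release the lock, sleep and re-acquire here. */
		retry_cnt++;
	}
}
```

With a device that is busy twice, the call succeeds on the third attempt. With a device that stays busy, it gives up after one initial attempt plus MAX_RETRIES retries and returns -EBUSY.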
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: John Ogness @ 2026-06-17 7:42 UTC (permalink / raw)
To: Breno Leitao, Sebastian Andrzej Siewior, pmladek
Cc: Jakub Kicinski, Petr Mladek, Sergey Senozhatsky, Peter Zijlstra,
Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Clark Williams,
Steven Rostedt, linux-rt-devel, linux-kernel, stable,
Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajF5S0uY-8F0jzoh@gmail.com>
On 2026-06-16, Breno Leitao <leitao@debian.org> wrote:
>> So this is not an issue since commit 7eab73b18630e ("netconsole: convert
>> to NBCON console infrastructure"). Because from here now on writes are
>> deferred to the nbcon thread. So this purely about -stable in this case.
>
> Does the nbcon thread handle defer even for consoles that support atomic
> operations?
All of the "printk deferred" variants have zero effect on nbcon
drivers. The "printk deferred" variants exist purely as duct tape for
legacy console drivers.
If nbcon drivers provide a safe write_atomic(), they will _always_ write
synchronously when the CPU is in an emergency state. Otherwise nbcon
drivers _always_ defer to their dedicated console printing kthread and
there they use the write_thread() callback.
> netconsole is marked with CON_NBCON_ATOMIC_UNSAFE, which means it rarely
> performs inline/direct printk and instead pushes to the thread, which
> flushes in a safe context.
CON_NBCON_ATOMIC_UNSAFE means it _never_ performs inline/direct printk
console writing. That flag means that in panic, at the _very_ end, just
before going into an infinite nop loop, the CON_NBCON_ATOMIC_UNSAFE
consoles will be flushed directly from the panic context.
> For drivers that behave correctly, I'd like to be able to drop
> CON_NBCON_ATOMIC_UNSAFE, potentially setting it at runtime based on the
> underlying driver capabilities. If netconsole is backed by a well-behaving
> network driver, we could eventually remove the flag (!?)
>
> Would that approach cause any issues?
Removing the flag means the driver can safely write from _any_ context
(including scheduler and NMI), regardless of what locks that context may
be holding.
Note that the nbcon framework allows console drivers to mark unsafe
regions in themselves, where atomic writing would not be possible. In
such scenarios, it defers to the dedicated printing kthread (except
during panic, where more aggressive tactics are used).
John Ogness
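
The rules in the reply above can be summarised as a small decision function. This is a toy userspace model, not the nbcon implementation: enum write_path, struct console_model, struct cpu_state and pick_path() are hypothetical names. Two points are assumptions of the sketch rather than statements from the thread: a console with a safe write_atomic() is modelled as writing inline during panic, and the "more aggressive tactics" are modelled as panic overriding a driver-marked unsafe region.

```c
#include <stdbool.h>

enum write_path {
	PATH_KTHREAD,		/* deferred to the dedicated printing kthread */
	PATH_ATOMIC_INLINE,	/* written synchronously from the caller */
	PATH_PANIC_FINAL_FLUSH,	/* flushed directly at the very end of panic */
};

struct console_model {
	bool atomic_unsafe;	/* models CON_NBCON_ATOMIC_UNSAFE */
	bool in_unsafe_region;	/* driver has currently marked itself unsafe */
};

struct cpu_state {
	bool emergency;
	bool panic;
};

static enum write_path pick_path(const struct console_model *con,
				 const struct cpu_state *cpu)
{
	/* Unsafe consoles never write inline; panic flushes them last. */
	if (con->atomic_unsafe)
		return cpu->panic ? PATH_PANIC_FINAL_FLUSH : PATH_KTHREAD;

	/* Assumption of this sketch: panic overrides unsafe regions. */
	if (cpu->panic)
		return PATH_ATOMIC_INLINE;

	/* Safe write_atomic(): synchronous only in an emergency state. */
	if (cpu->emergency && !con->in_unsafe_region)
		return PATH_ATOMIC_INLINE;

	return PATH_KTHREAD;
}
```

Dropping the flag in this model corresponds to clearing atomic_unsafe, after which an emergency on any CPU selects the inline path, so the underlying driver would have to tolerate being called from whatever context raised it.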