* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
From: Zhou, Yun @ 2026-06-18 9:14 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
clrkwllms, rostedt, netdev, linux-kernel, linux-rt-devel
In-Reply-To: <20260618083952.IbGzrvJL@linutronix.de>
On 6/18/26 16:39, Sebastian Andrzej Siewior wrote:
>
> On 2026-06-17 17:20:28 [+0800], Yun Zhou wrote:
>> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
> There is also the `threadirqs' option.
>
>> network traffic, the IRQ can enter suspend with desc->depth == 1
>> (masked by the oneshot mechanism between handler invocations).
> That would be irq_desc::depth.
>
>> During suspend, the kernel increments depth to 2 and masks the
>> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
>> due to IRQCHIP_MASK_ON_SUSPEND).
> The interrupt should be masked while the depth counter goes 0->1, no?
>
>> On resume, depth is decremented
>> back to 1, but since it does not reach 0, the unmask is never
>> called. The MPIC CPU routing remains cleared, permanently disabling
>> interrupt delivery.
> But why not? In my naive assumption, we get into suspend with
> irq_desc::depth = 2 and the threaded handler should be woken up. Once the
> threaded handler is done the counter should decrement by one. Then again
> during resume, reaching 0 and leading to the unmask. If the thread handler is
> frozen and defrosted on resume then it should still happen but in a
> different order.
>
> Something is missing here based on my naive assumption.
>
>> Fix by freeing the IRQ in suspend and re-requesting it in resume.
>> This ensures a clean IRQ state (depth=0, proper hardware routing)
>> on every resume cycle, regardless of the pre-suspend depth. This
>> follows the approach used by other drivers (e.g. igb).
> igb shuts down the device entirely, it does not just free the IRQ.
You are right. The original analysis was wrong — mvneta uses
request_percpu_irq() which sets IRQF_NO_SUSPEND, so the PM framework
never touches this IRQ. The depth never changes from 1.
The actual root cause is simpler: mvneta_percpu_isr() calls
disable_percpu_irq() before scheduling NAPI, and enable_percpu_irq()
is called in napi_complete_done(). If suspend hits during active NAPI
polling, the MPIC percpu IRQ stays masked after resume because
mvneta_start_dev() doesn't restore it.
Will send a v3 with the correct one-liner fix (enable_percpu_irq in
the resume path). Apologies for the incorrect analysis.
BR,
Yun
^ permalink raw reply
* Re: [PATCH bpf] bpf: zero-initialize the fib lookup flow struct
From: Toke Høiland-Jørgensen @ 2026-06-18 9:13 UTC (permalink / raw)
To: Avinash Duduskar, ast, daniel, andrii
Cc: bpf, davem, dsahern, eddyz87, edumazet, emil, horms,
john.fastabend, jolsa, kuba, linux-kernel, martin.lau, memxor,
netdev, pabeni, sdf, song, yonghong.song
In-Reply-To: <20260617224719.1428599-1-avinash.duduskar@gmail.com>
Avinash Duduskar <avinash.duduskar@gmail.com> writes:
> bpf_ipv4_fib_lookup() and bpf_ipv6_fib_lookup() build the flow key on
> the stack with a bare "struct flowi4 fl4;" / "struct flowi6 fl6;" and
> fill it field by field, but never set flowi4_l3mdev / flowi6_l3mdev.
>
> On the non-DIRECT path the lookup goes through the fib rules whenever the
> netns has custom rules, which a VRF installs:
>
> bpf_ipv4_fib_lookup() -> fib_lookup() -> __fib_lookup()
> -> l3mdev_update_flow() reads !fl->flowi_l3mdev
> -> fib_rules_lookup() -> fib_rule_match()
> -> l3mdev_fib_rule_match() uses fl->flowi_l3mdev
>
> l3mdev_update_flow() resolves the l3mdev master from the ingress device
> only while the field is still zero. Left at a nonzero stack value the
> resolution is skipped, and l3mdev_fib_rule_match() then tests that value
> as an ifindex, so the VRF master is not resolved and the rule fails to
> match: an ingress enslaved to a VRF can fail to select its table. FIB
> rules matching on an L3 master device (l3mdev_fib_rule_iif_match()/
> _oif_match()) read the same value, so an "ip rule iif/oif <vrf>"
> mismatches the same way.
>
> Zero-initialize the whole flow struct rather than adding one more
> field assignment, so any flowi field added later is covered too.
> ip_route_input_slow() likewise zeroes the field before its input lookup.
>
> CONFIG_INIT_STACK_ALL_ZERO masks this by default, but it depends on
> compiler support (CC_HAS_AUTO_VAR_INIT_ZERO), so INIT_STACK_NONE builds,
> including older toolchains that fall back to it, are exposed. Built with
> INIT_STACK_ALL_PATTERN, a plain bpf_fib_lookup (no VLAN, no DIRECT) over a
> VRF slave whose destination is routed only in the VRF table returns
> BPF_FIB_LKUP_RET_NOT_FWDED, and resolves with this patch. On the default
> config the lookup succeeds either way, so ordinary testing does not catch
> the bug.
>
> Fixes: 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
> Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
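
For readers following along, the "resolve only while the field is zero" behaviour described in the quoted commit message can be sketched in plain userspace C. This is only an illustration: struct demo_flow, build_flow() and resolve_l3mdev() are hypothetical stand-ins, not the kernel's struct flowi4 or l3mdev_update_flow().

```c
#include <assert.h>

/* Hypothetical stand-in for a flow key; NOT the kernel's struct flowi4. */
struct demo_flow {
	int oif;
	int iif;
	int l3mdev;		/* never assigned explicitly by build_flow() */
	unsigned int daddr;
};

/* Zero-initialize the whole struct first, then fill in known fields. */
static struct demo_flow build_flow(int iif, unsigned int daddr)
{
	struct demo_flow fl = {0};

	fl.iif = iif;
	fl.daddr = daddr;
	return fl;
}

/*
 * Mirrors the check described in the commit message: the master is
 * resolved only while the field is still zero, so a stale nonzero
 * value is kept as-is.
 */
static int resolve_l3mdev(struct demo_flow *fl, int master_ifindex)
{
	if (!fl->l3mdev)
		fl->l3mdev = master_ifindex;
	return fl->l3mdev;
}
```

With the zeroed struct the resolver picks up the master ifindex; with a leftover nonzero value (simulated by assigning the field by hand) the resolution is skipped, which is the mismatch the patch avoids.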
^ permalink raw reply
* Re: [PATCH net] netconsole: don't drop the last byte of a full-sized message
From: Simon Horman @ 2026-06-18 9:13 UTC (permalink / raw)
To: Breno Leitao
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, linux-kernel, asantostc, gustavold,
kernel-team
In-Reply-To: <20260616-max_print_chunk-v1-1-8dc125d67083@debian.org>
On Tue, Jun 16, 2026 at 09:09:52AM -0700, Breno Leitao wrote:
> nt->buf is exactly MAX_PRINT_CHUNK bytes, but scnprintf() reserves one
> byte for its NUL terminator, so a non-fragmented payload of exactly
> MAX_PRINT_CHUNK loses its last byte (emitted as a stray NUL in the
> release path). Grow nt->buf to MAX_PRINT_CHUNK + 1 and bound the
> scnprintf() calls with sizeof(nt->buf); the transmitted length stays
> capped at MAX_PRINT_CHUNK.
>
> Alternatively, nt->buf could be left at MAX_PRINT_CHUNK and the NUL byte
> reserved by routing exactly-MAX_PRINT_CHUNK payloads to fragmentation
> ('len < MAX_PRINT_CHUNK'), at the cost of fragmenting those messages.
> But it would look less sane, thus the current approach.
>
> Fixes: c62c0a17f9b7 ("netconsole: Append kernel version to message")
> Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
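
The truncation described in the quoted commit message can be reproduced in userspace, assuming snprintf() as a stand-in for the kernel's scnprintf() (both reserve one byte of the given size for the NUL terminator). CHUNK and stored_len() are illustrative names only, not netconsole code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK 8	/* illustrative stand-in for MAX_PRINT_CHUNK */

/*
 * Format payload into a buffer while telling snprintf() the buffer is
 * buf_size bytes long, and return how many payload bytes were stored.
 */
static size_t stored_len(size_t buf_size, const char *payload)
{
	char buf[CHUNK + 1] = "";

	if (buf_size > sizeof(buf))
		buf_size = sizeof(buf);
	/* snprintf() writes at most buf_size - 1 characters plus a NUL. */
	snprintf(buf, buf_size, "%s", payload);
	return strlen(buf);
}
```

A payload of exactly CHUNK bytes keeps only CHUNK - 1 bytes when the bound is CHUNK, and all CHUNK bytes when the bound is CHUNK + 1, which is the effect of growing the buffer by one.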
^ permalink raw reply
* Re: [PATCH net] net: ethernet: ti: icssg: guard PA stat lookups
From: Simon Horman @ 2026-06-18 9:10 UTC (permalink / raw)
To: Philippe Schenker
Cc: netdev, Philippe Schenker, danishanwar, rogerq, linux-arm-kernel,
stable, Andrew Lunn, David Carlier, David S. Miller, Eric Dumazet,
Jacob Keller, Jakub Kicinski, Kevin Hao, Meghana Malladi,
Paolo Abeni, Vadim Fedorenko, linux-kernel
In-Reply-To: <20260616143642.1972071-1-dev@pschenker.ch>
On Tue, Jun 16, 2026 at 04:35:34PM +0200, Philippe Schenker wrote:
> From: Philippe Schenker <philippe.schenker@impulsing.ch>
>
> icssg_ndo_get_stats64() unconditionally calls emac_get_stat_by_name()
> with FW PA stat names regardless of whether the PA stats block is
> present on the hardware. emac_get_stat_by_name() already guards the
> PA stats lookup with `if (emac->prueth->pa_stats)`; when that pointer
> is NULL the lookup falls through to netdev_err() and returns -EINVAL.
> Because ndo_get_stats64 is polled regularly by the networking stack
> this produces thousands of log entries of the form:
>
> icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR
>
> A secondary consequence is that the int(-EINVAL) return value is
> implicitly widened to a near-ULLONG_MAX unsigned value when accumulated
> into the __u64 fields of rtnl_link_stats64, silently corrupting the
> rx_errors, rx_dropped and tx_dropped counters reported by `ip -s link`.
>
> Every other PA-aware code path in the driver is already guarded with
> the same `if (emac->prueth->pa_stats)` check. Apply the same guard
> here.
>
> Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats")
nit: no blank line between tags
>
> Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch>
>
> Cc: danishanwar@ti.com
> Cc: rogerq@kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
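
The counter corruption mentioned in the quoted commit message follows from ordinary C integer conversion and can be shown in userspace. The helpers below are hypothetical and only mimic the shape of the problem; they are not the driver's emac_get_stat_by_name().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical lookup: returns a count, or -EINVAL when stats are absent. */
static int lookup_stat_or_error(int present)
{
	return present ? 5 : -EINVAL;
}

/* Unguarded: the int result is converted to u64 before the addition. */
static uint64_t accumulate_unguarded(uint64_t counter, int present)
{
	counter += lookup_stat_or_error(present);
	return counter;
}

/* Guarded: skip the lookup entirely when the stats block is absent. */
static uint64_t accumulate_guarded(uint64_t counter, int present)
{
	if (present)
		counter += lookup_stat_or_error(present);
	return counter;
}
```

Starting from zero, the unguarded variant ends up at 2^64 - EINVAL, while the guarded variant leaves the counter untouched.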
^ permalink raw reply
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
From: Zhou, Yun @ 2026-06-18 9:03 UTC (permalink / raw)
To: Maxime Chevallier, marcin.s.wojtas, andrew+netdev, davem,
edumazet, kuba, pabeni, bigeasy, clrkwllms, rostedt
Cc: netdev, linux-kernel, linux-rt-devel
In-Reply-To: <95249596-5f05-421c-9c8a-693c7b26c4f6@bootlin.com>
On 6/17/26 20:49, Maxime Chevallier wrote:
>
> Hi,
>
> On 6/17/26 11:20, Yun Zhou wrote:
>> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
>> network traffic, the IRQ can enter suspend with desc->depth == 1
>> (masked by the oneshot mechanism between handler invocations).
>>
>> During suspend, the kernel increments depth to 2 and masks the
>> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
>> due to IRQCHIP_MASK_ON_SUSPEND). On resume, depth is decremented
>> back to 1, but since it does not reach 0, the unmask is never
>> called. The MPIC CPU routing remains cleared, permanently disabling
>> interrupt delivery.
>>
>> Fix by freeing the IRQ in suspend and re-requesting it in resume.
>> This ensures a clean IRQ state (depth=0, proper hardware routing)
>> on every resume cycle, regardless of the pre-suspend depth. This
>> follows the approach used by other drivers (e.g. igb).
> This description makes it sound like it's not really a mvneta problem,
> but rather a broader effect from preempt-rt / irq management / suspend
> interactions.
>
> Is this the expected way to deal with that ?
>
You were right to question this. After deeper investigation, I found
that the original analysis was incorrect.
The real root cause is entirely within the mvneta driver:
mvneta_percpu_isr() calls disable_percpu_irq() to mask the MPIC percpu
IRQ before scheduling NAPI. The corresponding enable_percpu_irq() is
called in napi_complete_done(). If suspend occurs during active NAPI
polling (between disable and enable), the MPIC percpu IRQ remains
masked after resume — mvneta_start_dev() only restores the NIC-level
INTR_NEW_MASK register, not the irqchip-level per-CPU mask.
The fix is a one-liner: call on_each_cpu(mvneta_percpu_enable) in the
resume path to ensure the MPIC percpu IRQ is unmasked. I will send a
v3 with the correct fix and updated description.
The previous free_irq/request_irq approach happened to work as a
side-effect (request_percpu_irq → enable_percpu_irq restores the mask),
but it was fixing the symptom rather than the actual cause.
Thank you very much for your rigorous review,
Yun
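
As a rough illustration of the sequence described above, here is a toy userspace model. It assumes nothing about the real driver beyond what the mail states; struct toy_irq and all toy_* helpers are hypothetical, and the boolean merely stands in for the per-CPU mask at the irqchip level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one per-CPU interrupt line. */
struct toy_irq {
	bool masked;
};

/* ISR masks the line before handing work to the poller. */
static void toy_isr(struct toy_irq *irq)
{
	irq->masked = true;
}

/* Poll completion unmasks the line again. */
static void toy_poll_complete(struct toy_irq *irq)
{
	irq->masked = false;
}

/* Resume path; restore_mask models the proposed explicit re-enable. */
static void toy_resume(struct toy_irq *irq, bool restore_mask)
{
	if (restore_mask)
		irq->masked = false;
}

/* Run one ISR/poll/suspend/resume cycle and report the final state. */
static bool masked_after_cycle(bool suspend_during_poll, bool restore_mask)
{
	struct toy_irq irq = { .masked = false };

	toy_isr(&irq);
	if (!suspend_during_poll)
		toy_poll_complete(&irq);
	toy_resume(&irq, restore_mask);
	return irq.masked;
}
```

In this model the line stays masked only when suspend lands between the mask and the unmask and the resume path does not restore it.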
^ permalink raw reply
* [PATCH v6.6-v6.1] netfilter: nf_tables: always walk all pending catchall elements
From: Shivani Agarwal @ 2026-06-18 8:34 UTC (permalink / raw)
To: stable, gregkh
Cc: pablo, fw, phil, davem, edumazet, kuba, pabeni, horms,
netfilter-devel, coreteam, netdev, linux-kernel, ajay.kaher,
alexey.makhalov, vamsi-krishna.brahmajosyula, yin.ding,
tapas.kundu, Yiming Qian, Sasha Levin, Shivani Agarwal
From: Florian Westphal <fw@strlen.de>
[ Upstream commit 7cb9a23d7ae40a702577d3d8bacb7026f04ac2a9 ]
During transaction processing we might have more than one catchall element:
1 live catchall element and 1 pending element that is coming as part of the
new batch.
If the map holding the catchall elements is also going away, it is
required to toggle all catchall elements and not just the first viable
candidate.
Otherwise, we get:
WARNING: ./include/net/netfilter/nf_tables.h:1281 at nft_data_release+0xb7/0xe0 [nf_tables], CPU#2: nft/1404
RIP: 0010:nft_data_release+0xb7/0xe0 [nf_tables]
[..]
__nft_set_elem_destroy+0x106/0x380 [nf_tables]
nf_tables_abort_release+0x348/0x8d0 [nf_tables]
nf_tables_abort+0xcf2/0x3ac0 [nf_tables]
nfnetlink_rcv_batch+0x9c9/0x20e0 [..]
Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
Reported-by: Yiming Qian <yimingqian591@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Shivani: Modified to apply on v6.6.y-v6.1.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
net/netfilter/nf_tables_api.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 196ac4e76..0581f6479 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -620,7 +620,6 @@ static void nft_map_catchall_deactivate(const struct nft_ctx *ctx,
elem.priv = catchall->elem;
nft_setelem_data_deactivate(ctx->net, set, &elem);
- break;
}
}
@@ -5241,7 +5240,6 @@ static void nft_map_catchall_activate(const struct nft_ctx *ctx,
elem.priv = catchall->elem;
nft_setelem_data_activate(ctx->net, set, &elem);
- break;
}
}
--
2.53.0
^ permalink raw reply related
* RE: [EXTERNAL] [PATCH net v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
From: Elad Nachman @ 2026-06-18 8:55 UTC (permalink / raw)
To: Ruoyu Wang, Taras Chornyi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King,
Oleksandr Mazur, Yevhen Orlov, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20260617193228.1653582-1-ruoyuw560@gmail.com>
>
>
> From: Ruoyu Wang <ruoyuw560@gmail.com>
> Sent: Wednesday, June 17, 2026 10:32 PM
> To: Taras Chornyi <taras.chornyi@plvision.eu>; Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Russell King <linux@armlinux.org.uk>; Oleksandr Mazur <oleksandr.mazur@plvision.eu>; Yevhen Orlov <yevhen.orlov@plvision.eu>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [EXTERNAL] [PATCH net v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
>
> prestera_port_sfp_bind() returns err after walking the ports node. If no
> child node matches the port's front-panel id, err is never assigned.
>
> Initialize err to 0 because absence of a matching optional port device
> tree node is not an error. In that case no phylink is created and port
> creation should continue with port->phy_link left NULL. Errors from
> malformed matched nodes and phylink_create() still propagate.
>
> Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support")
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
> ---
> v2:
> - Add net tree target to the subject.
> - Explain why the no-match path returns 0 instead of -ENODEV.
>
> drivers/net/ethernet/marvell/prestera/prestera_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_main.c b/drivers/net/ethernet/marvell/prestera/prestera_main.c
> index 41e19e9ad28d4..a82e7a8029851 100644
> --- a/drivers/net/ethernet/marvell/prestera/prestera_main.c
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_main.c
> @@ -373,7 +373,7 @@ static int prestera_port_sfp_bind(struct prestera_port *port)
> struct device_node *ports, *node;
> struct fwnode_handle *fwnode;
> struct phylink *phy_link;
> - int err;
> + int err = 0;
>
> if (!sw->np)
> return 0;
> --
> 2.51.0
>
prestera_port_sfp_bind() iterates only over SFP ports.
Although all currently existing switch boards have at least one SFP uplink port,
in theory a manufacturer might produce a switch board without any SFP ports,
and this function call would then fail unnecessarily. To cover that case, err
should indeed be initialized to zero so that the function returns 0 and not an error.
Acked-by: Elad Nachman <enachman@marvell.com>
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough after reset
From: Ruinskiy, Dima @ 2026-06-18 8:51 UTC (permalink / raw)
To: Loktionov, Aleksandr, kao, acelan, Nguyen, Anthony L,
Kitszel, Przemyslaw
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <IA3PR11MB8986B77F49DF672178FEE4BCE5E32@IA3PR11MB8986.namprd11.prod.outlook.com>
On 18/06/2026 10:55, Loktionov, Aleksandr wrote:
>
>
>> -----Original Message-----
>> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
>> Of Chia-Lin Kao (AceLan) via Intel-wired-lan
>> Sent: Thursday, June 18, 2026 9:33 AM
>> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
>> Przemyslaw <przemyslaw.kitszel@intel.com>
>> Cc: Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller
>> <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub
>> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; intel-
>> wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
>> kernel@vger.kernel.org
>> Subject: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough
>> after reset
>>
>> Some systems support MAC passthrough for dock Ethernet controllers by
>> having firmware rewrite the receive address registers after the
>> controller reset completes.
>>
>> igc resets the controller before reading RAL0/RAH0, so that reset can
>> restore the controller native MAC address temporarily. If the driver
>> reads the registers immediately, it can race the firmware rewrite and
>> keep the native dock MAC instead of the host passthrough MAC.
>>
>> For LMVP devices, poll RAL0/RAH0 after reset and before reading the
>> MAC address. Stop once the address registers change to another valid
>> Ethernet address, allowing firmware a bounded window to complete the
>> passthrough update.
>>
> Good day, Chia-Lin
>
> It'd be great if you could share more details on how to reproduce the issue.
>
> What exact hardware setup is affected (dock model, NIC, system)?
> Which firmware/BIOS version?
> How often does the race trigger?
> Do you have a way to reliably reproduce it?
>
> Also, what is the observed behavior vs. expected behavior? For example,
> which MAC address is seen and which one should be used?
>
In addition to that, I would ask: when the race triggers, how much
wait time do you need to reliably resolve it (i.e., for the FW to have
completed the MAC update)?
Because 100 iterations of 100 msec each translates to up to 10
seconds, no?
The weak spot here is what if you are on an LMvP system where MAC
passthrough has not been enabled. You will always wait for the full 10
seconds after every reset until you give up and just continue with the
default MAC. Hardly desirable behavior.
We've implemented something like this in another driver at one point,
and the default polling timeout there is 1 second (which does not affect
the UX too much).
A better way may be using a FW interrupt to notify the driver when the
MAC address has been updated. The usability of this approach depends on
whether it is possible to update the MAC address up the stack after the
device has already been initialized. Does the framework support this?
Thanks,
Dima.
>
>> Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
>> ---
>> drivers/net/ethernet/intel/igc/igc_main.c | 48
>> +++++++++++++++++++++++
>> 1 file changed, 48 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>> b/drivers/net/ethernet/intel/igc/igc_main.c
>> index 2c9e2dfd8499..fa9752ed8bc5 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -11,6 +11,7 @@
>> #include <net/pkt_sched.h>
>> #include <linux/bpf_trace.h>
>> #include <net/xdp_sock_drv.h>
>> +#include <linux/etherdevice.h>
>> #include <linux/pci.h>
>> #include <linux/mdio.h>
>>
>> @@ -69,6 +70,52 @@ static const struct pci_device_id igc_pci_tbl[] = {
>>
>> MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
>>
>> +static void igc_read_rar0(struct igc_hw *hw, u8 *addr, u32 *ral, u32
>> +*rah) {
>> + *ral = rd32(IGC_RAL(0));
>> + *rah = rd32(IGC_RAH(0));
>> +
>> + addr[0] = *ral & 0xff;
>> + addr[1] = (*ral >> 8) & 0xff;
>> + addr[2] = (*ral >> 16) & 0xff;
>> + addr[3] = (*ral >> 24) & 0xff;
>> + addr[4] = *rah & 0xff;
>> + addr[5] = (*rah >> 8) & 0xff;
>> +}
>> +
>> +static bool igc_is_lmvp_device(struct pci_dev *pdev) {
>> + switch (pdev->device) {
>> + case IGC_DEV_ID_I225_LMVP:
>> + case IGC_DEV_ID_I226_LMVP:
>> + return true;
>> + default:
>> + return false;
>> + }
>> +}
>> +
>> +static void igc_wait_for_lmvp_mac_passthrough(struct pci_dev *pdev,
>> + struct igc_hw *hw)
>> +{
>> + u8 addr[ETH_ALEN] __aligned(2);
>> + u32 orig_ral, orig_rah;
>> + u32 ral, rah;
>> + int i;
>> +
>> + if (!igc_is_lmvp_device(pdev))
>> + return;
>> +
>> + igc_read_rar0(hw, addr, &orig_ral, &orig_rah);
>> +
>> + for (i = 0; i < 100; i++) {
>> + msleep(100);
>> + igc_read_rar0(hw, addr, &ral, &rah);
>> + if ((ral != orig_ral || rah != orig_rah) &&
>> + is_valid_ether_addr(addr))
>> + return;
>> + }
>> +}
>> +
>> enum latency_range {
>> lowest_latency = 0,
>> low_latency = 1,
>> @@ -7259,6 +7306,7 @@ static int igc_probe(struct pci_dev *pdev,
>> * known good starting state
>> */
>> hw->mac.ops.reset_hw(hw);
>> + igc_wait_for_lmvp_mac_passthrough(pdev, hw);
>>
>> if (igc_get_flash_presence_i225(hw)) {
>> if (hw->nvm.ops.validate(hw) < 0) {
>> --
>> 2.53.0
>
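
The worst-case figure raised in the reply above follows directly from the loop bounds in the quoted patch (100 iterations, msleep(100) each). A small userspace sketch of the arithmetic, with hypothetical helper names, also shows how many iterations a 1 second budget would allow at the same interval:

```c
#include <assert.h>

/* Worst-case wait, in ms, of a poll loop that sleeps once per iteration. */
static unsigned int worst_case_wait_ms(unsigned int iterations,
				       unsigned int interval_ms)
{
	return iterations * interval_ms;
}

/* Number of iterations that fit in a given time budget. */
static unsigned int iterations_for_budget(unsigned int budget_ms,
					  unsigned int interval_ms)
{
	return interval_ms ? budget_ms / interval_ms : 0;
}
```

This ignores the register read time and any msleep() overshoot, so real waits would be somewhat longer.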
^ permalink raw reply
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Sebastian Andrzej Siewior @ 2026-06-18 8:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Petr Mladek, Jakub Kicinski, John Ogness, Sergey Senozhatsky,
Vlad Poenaru, Thomas Gleixner, netdev, David S . Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Breno Leitao,
Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260617111504.GK49951@noisy.programming.kicks-ass.net>
On 2026-06-17 13:15:04 [+0200], Peter Zijlstra wrote:
>
> Can't we push all the legacy consoles into a single legacy kthread? I
> mean, converting all consoles is of course awesome, but should we really
> wait for that?
That would be
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 85fbf1801cbe0..c72f8d7027aee 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -27,11 +27,7 @@ int devkmsg_sysctl_set_loglvl(const struct ctl_table *table, int write,
* nbcon consoles have had their chance to print the panic messages
* first.
*/
-#ifdef CONFIG_PREEMPT_RT
# define force_legacy_kthread() (true)
-#else
-# define force_legacy_kthread() (false)
-#endif
#ifdef CONFIG_PRINTK
and if I remember correctly it was due to delayed CI output limited to
RT. But this does not fix stable down to 5.10 LTS.
Sebastian
^ permalink raw reply related
* RE: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough after reset
From: Kwapulinski, Piotr @ 2026-06-18 8:49 UTC (permalink / raw)
To: kao, acelan, Nguyen, Anthony L, Kitszel, Przemyslaw
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260618073324.1843310-1-acelan.kao@canonical.com>
>-----Original Message-----
>From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Chia-Lin Kao (AceLan) via Intel-wired-lan
>Sent: Thursday, June 18, 2026 9:33 AM
>To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>
>Cc: Andrew Lunn <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
>Subject: [Intel-wired-lan] [PATCH 1/2] igc: Wait for MAC passthrough after reset
>
>Some systems support MAC passthrough for dock Ethernet controllers by having firmware rewrite the receive address registers after the controller reset completes.
>
>igc resets the controller before reading RAL0/RAH0, so that reset can restore the controller native MAC address temporarily. If the driver reads the registers immediately, it can race the firmware rewrite and keep the native dock MAC instead of the host passthrough MAC.
>
>For LMVP devices, poll RAL0/RAH0 after reset and before reading the MAC address. Stop once the address registers change to another valid Ethernet address, allowing firmware a bounded window to complete the passthrough update.
>
>Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
>---
> drivers/net/ethernet/intel/igc/igc_main.c | 48 +++++++++++++++++++++++
> 1 file changed, 48 insertions(+)
>
>diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
>index 2c9e2dfd8499..fa9752ed8bc5 100644
>--- a/drivers/net/ethernet/intel/igc/igc_main.c
>+++ b/drivers/net/ethernet/intel/igc/igc_main.c
>@@ -11,6 +11,7 @@
> #include <net/pkt_sched.h>
> #include <linux/bpf_trace.h>
> #include <net/xdp_sock_drv.h>
>+#include <linux/etherdevice.h>
> #include <linux/pci.h>
> #include <linux/mdio.h>
>
>@@ -69,6 +70,52 @@ static const struct pci_device_id igc_pci_tbl[] = {
>
> MODULE_DEVICE_TABLE(pci, igc_pci_tbl);
>
>+static void igc_read_rar0(struct igc_hw *hw, u8 *addr, u32 *ral, u32
>+*rah) {
>+ *ral = rd32(IGC_RAL(0));
>+ *rah = rd32(IGC_RAH(0));
>+
>+ addr[0] = *ral & 0xff;
>+ addr[1] = (*ral >> 8) & 0xff;
>+ addr[2] = (*ral >> 16) & 0xff;
>+ addr[3] = (*ral >> 24) & 0xff;
>+ addr[4] = *rah & 0xff;
>+ addr[5] = (*rah >> 8) & 0xff;
>+}
>+
>+static bool igc_is_lmvp_device(struct pci_dev *pdev) {
>+ switch (pdev->device) {
>+ case IGC_DEV_ID_I225_LMVP:
>+ case IGC_DEV_ID_I226_LMVP:
>+ return true;
>+ default:
>+ return false;
>+ }
>+}
>+
>+static void igc_wait_for_lmvp_mac_passthrough(struct pci_dev *pdev,
>+ struct igc_hw *hw)
>+{
>+ u8 addr[ETH_ALEN] __aligned(2);
>+ u32 orig_ral, orig_rah;
>+ u32 ral, rah;
>+ int i;
Hello AceLan
Please move ral, rah and 'i' right into the loop.
Thank you.
Piotr
>+
>+ if (!igc_is_lmvp_device(pdev))
>+ return;
>+
>+ igc_read_rar0(hw, addr, &orig_ral, &orig_rah);
>+
>+ for (i = 0; i < 100; i++) {
>+ msleep(100);
>+ igc_read_rar0(hw, addr, &ral, &rah);
>+ if ((ral != orig_ral || rah != orig_rah) &&
>+ is_valid_ether_addr(addr))
>+ return;
>+ }
>+}
>+
> enum latency_range {
> lowest_latency = 0,
> low_latency = 1,
>@@ -7259,6 +7306,7 @@ static int igc_probe(struct pci_dev *pdev,
> * known good starting state
> */
> hw->mac.ops.reset_hw(hw);
>+ igc_wait_for_lmvp_mac_passthrough(pdev, hw);
>
> if (igc_get_flash_presence_i225(hw)) {
> if (hw->nvm.ops.validate(hw) < 0) {
>--
>2.53.0
>
>
^ permalink raw reply
* Re: [PATCH bpf v3 1/2] bpf: Fix partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-18 8:46 UTC (permalink / raw)
To: Sun Jian
Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, davem,
edumazet, kuba, pabeni, horms, shuah, hawk, john.fastabend, sdf,
toke, lorenzo
In-Reply-To: <20260617093557.63880-2-sun.jian.kdev@gmail.com>
On Wed, Jun 17, 2026 at 05:35:56PM +0800, Sun Jian wrote:
> For non-linear test_run output, bpf_test_finish() derives the linear
> data copy length from copy_size - frag_size. This only matches the
> linear data length when copy_size is the full packet size.
>
> When userspace provides a short data_out buffer, copy_size is clamped to
> that buffer size. If copy_size is smaller than frag_size, the computed
> length becomes negative and bpf_test_finish() returns -ENOSPC before
> copying the packet prefix or updating data_size_out.
>
> Compute the linear data length from the packet layout instead, and clamp
> the linear copy length to copy_size. This preserves the expected
> partial-copy semantics: return -ENOSPC, copy the packet prefix that fits
> in data_out, and report the full packet length through data_size_out.
>
> Fixes: 7855e0db150ad ("bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature")
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
> net/bpf/test_run.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 2bc04feadfab..f15c613aaa4e 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -453,12 +453,8 @@ static int bpf_test_finish(const union bpf_attr *kattr,
> }
>
> if (data_out) {
> - int len = sinfo ? copy_size - frag_size : copy_size;
> -
> - if (len < 0) {
> - err = -ENOSPC;
> - goto out;
> - }
> + u32 head_len = size - frag_size;
> + u32 len = min(copy_size, head_len);
>
> if (copy_to_user(data_out, data, len))
> goto out;
Acked-by: Paul Chaignon <paul.chaignon@gmail.com>
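
To make the difference between the two length computations in the quoted hunk concrete, here is a userspace sketch. The function names are hypothetical and the kernel's min() is replaced by a plain conditional; only the arithmetic is taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Old computation: linear length derived from the clamped copy size. */
static int old_linear_len(uint32_t copy_size, uint32_t frag_size)
{
	return (int)(copy_size - frag_size);
}

/* New computation: linear length from the packet layout, then clamped. */
static uint32_t new_linear_len(uint32_t size, uint32_t frag_size,
			       uint32_t copy_size)
{
	uint32_t head_len = size - frag_size;

	return copy_size < head_len ? copy_size : head_len;
}
```

For a 5000-byte packet with 4000 bytes in frags and a 100-byte data_out buffer, the old value is negative (the early -ENOSPC path), while the new value is 100, so the prefix that fits can still be copied. With a full-size buffer both give the 1000-byte linear length.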
^ permalink raw reply
* RE: [PATCH net v3] tipc: fix use-after-free of the discoverer in tipc_disc_rcv()
From: Tung Quang Nguyen @ 2026-06-18 8:45 UTC (permalink / raw)
To: Weiming Shi
Cc: Simon Horman, netdev@vger.kernel.org,
tipc-discussion@lists.sourceforge.net,
linux-kernel@vger.kernel.org, Xiang Mei, Jon Maloy,
David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
In-Reply-To: <20260617135744.3383175-3-bestswngs@gmail.com>
>Subject: [PATCH net v3] tipc: fix use-after-free of the discoverer in
>tipc_disc_rcv()
>
>bearer_disable() frees b->disc with tipc_disc_delete()'s plain kfree(), but
>tipc_disc_rcv() still dereferences b->disc in RX softirq under
>rcu_read_lock() (tipc_udp_recv -> tipc_rcv -> tipc_disc_rcv).
>
>L2 bearers are safe thanks to the synchronize_net() in tipc_disable_l2_media(),
>but the UDP bearer defers that call to the
>cleanup_bearer() workqueue, so the discoverer is freed with no grace
>period:
>
> BUG: KASAN: slab-use-after-free in tipc_disc_rcv (net/tipc/discover.c:149)
> Read of size 8 at addr ffff88802348b728 by task poc_tipc/184
> <IRQ>
> tipc_disc_rcv (net/tipc/discover.c:149)
> tipc_rcv (net/tipc/node.c:2126)
> tipc_udp_recv (net/tipc/udp_media.c:391)
> udp_rcv (net/ipv4/udp.c:2643)
> ip_local_deliver_finish (net/ipv4/ip_input.c:241)
> </IRQ>
> Freed by task 181:
> kfree (mm/slub.c:6565)
> bearer_disable (net/tipc/bearer.c:418)
> tipc_nl_bearer_disable (net/tipc/bearer.c:1001)
>
>The bearer is freed with kfree_rcu(); free the discoverer the same way.
>Add an rcu_head to struct tipc_discoverer and free it and its skb from an RCU
>callback.
>
>Because the RCU callback (tipc_disc_free_rcu) lives in module text, a
>call_rcu() that is still pending when the tipc module is unloaded would invoke a
>freed function. Add an rcu_barrier() to tipc_exit() after the bearer subsystem
>has been torn down, so all pending discoverer callbacks have run before the
>module text goes away.
>
>Reachable from an unprivileged user namespace: the TIPCv2 genl family is
>netnsok and its bearer commands have no GENL_ADMIN_PERM. Needs
>CONFIG_TIPC and CONFIG_TIPC_MEDIA_UDP.
>
>Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash
>values")
>Reported-by: Xiang Mei <xmei5@asu.edu>
>Assisted-by: Claude:claude-opus-4-8
>Signed-off-by: Weiming Shi <bestswngs@gmail.com>
>---
>v3:
> - Reword the rcu_barrier() comment as a TODO (Tung Quang Nguyen).
>v2:
> - split the over-80-column container_of() line (Tung Quang Nguyen)
> - add rcu_barrier() to tipc_exit() so a pending call_rcu() cannot fire
> into freed module text after rmmod (Eric Dumazet)
>
> net/tipc/core.c | 5 +++++
> net/tipc/discover.c | 14 ++++++++++++--
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
>diff --git a/net/tipc/core.c b/net/tipc/core.c
>index 434e70eabe08..1ddecea1df6e 100644
>--- a/net/tipc/core.c
>+++ b/net/tipc/core.c
>@@ -218,6 +218,11 @@ static void __exit tipc_exit(void)
> unregister_pernet_device(&tipc_net_ops);
> tipc_unregister_sysctl();
>
>+ /* TODO: Wait for all timers that called call_rcu() to finish before
>+ * calling rcu_barrier().
>+ */
>+ rcu_barrier();
>+
> pr_info("Deactivated\n");
> }
>
>diff --git a/net/tipc/discover.c b/net/tipc/discover.c
>index 3e54d2df5683..b9d06595b067 100644
>--- a/net/tipc/discover.c
>+++ b/net/tipc/discover.c
>@@ -58,6 +58,7 @@
> * @skb: request message to be (repeatedly) sent
> * @timer: timer governing period between requests
> * @timer_intv: current interval between requests (in ms)
>+ * @rcu: RCU head for deferred freeing
> */
> struct tipc_discoverer {
> u32 bearer_id;
>@@ -69,6 +70,7 @@ struct tipc_discoverer {
> struct sk_buff *skb;
> struct timer_list timer;
> unsigned long timer_intv;
>+ struct rcu_head rcu;
> };
>
> /**
>@@ -382,6 +384,15 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
> return 0;
> }
>
>+static void tipc_disc_free_rcu(struct rcu_head *rp)
>+{
>+ struct tipc_discoverer *d = container_of(rp, struct tipc_discoverer,
>+ rcu);
>+
>+ kfree_skb(d->skb);
>+ kfree(d);
>+}
>+
> /**
> * tipc_disc_delete - destroy object sending periodic link setup requests
> * @d: ptr to link dest structure
>@@ -389,8 +400,7 @@ int tipc_disc_create(struct net *net, struct tipc_bearer *b,
> void tipc_disc_delete(struct tipc_discoverer *d)
> {
> timer_shutdown_sync(&d->timer);
>- kfree_skb(d->skb);
>- kfree(d);
>+ call_rcu(&d->rcu, tipc_disc_free_rcu);
> }
>
> /**
>--
>2.43.0
>
Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
^ permalink raw reply
* Re: [PATCH bpf v3 2/2] selftests/bpf: Cover partial copy of non-linear test_run output
From: Paul Chaignon @ 2026-06-18 8:44 UTC (permalink / raw)
To: sun jian
Cc: bot+bpf-ci, bpf, netdev, linux-kselftest, linux-kernel, ast,
daniel, andrii, martin.lau, eddyz87, memxor, song, yonghong.song,
jolsa, davem, edumazet, kuba, pabeni, horms, shuah, hawk,
john.fastabend, sdf, toke, lorenzo, martin.lau, clm,
ihor.solodrai
In-Reply-To: <CABFUUZFeh18OjQ6EhjD17ZwQKb1aiVNkYKv-hAkVGrHhPpjE4Q@mail.gmail.com>
On Wed, Jun 17, 2026 at 10:19:52PM +0800, sun jian wrote:
> On Wed, Jun 17, 2026 at 6:31 PM <bot+bpf-ci@kernel.org> wrote:
> >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > > index 01f1d1b6715a..9cc898e6a9f7 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/prog_run_opts.c
> > > @@ -4,6 +4,10 @@
> > >
> > > #include "test_pkt_access.skel.h"
> > >
> > > +#define NONLINEAR_PKT_LEN 9000
> > > +#define NONLINEAR_LINEAR_DATA_LEN 64
> > > +#define SHORT_OUT_LEN 100
> > > +
> >
> > [ ... ]
> >
> > > @@ -20,6 +24,69 @@ static void check_run_cnt(int prog_fd, __u64 run_cnt)
> > > "incorrect number of repetitions, want %llu have %llu\n", run_cnt, info.run_cnt);
> > > }
> > >
> > > +static void init_pkt(__u8 *pkt, size_t len)
> > > +{
> > > + size_t i;
> > > +
> > > + for (i = 0; i < len; i++)
> > > + pkt[i] = i & 0xff;
> > > +}
> >
> > A question was raised on v2 about whether pkt_v4 could be reused by
> > reducing the linear area to ETH_HLEN, rather than introducing a custom
> > init_pkt() with a 9000-byte stack packet.
> >
> > Can't we reuse pkt_v4 here by reducing the linear area to ETH_HLEN?
> > The v3 still adds init_pkt() and the NONLINEAR_PKT_LEN packet, so this
> > doesn't seem to have been picked up.
> >
> > > +
> > > +static void test_skb_nonlinear_data_out_partial(struct test_pkt_access *skel)
> > > +{
> > > + LIBBPF_OPTS(bpf_test_run_opts, topts);
> > > + __u8 pkt[NONLINEAR_PKT_LEN];
> > > + __u8 out[SHORT_OUT_LEN];
> > > + struct __sk_buff skb = {};
> > > + int prog_fd, err;
> > > +
> > > + init_pkt(pkt, sizeof(pkt));
> > > +
> > > + skb.data_end = NONLINEAR_LINEAR_DATA_LEN;
> > > +
> > > + topts.data_in = pkt;
> > > + topts.data_size_in = sizeof(pkt);
> > > + topts.data_out = out;
> > > + topts.data_size_out = sizeof(out);
> > > + topts.ctx_in = &skb;
> > > + topts.ctx_size_in = sizeof(skb);
> > > +
> > > + prog_fd = bpf_program__fd(skel->progs.tc_pass_prog);
> >
> > [ ... ]
> >
> > > diff --git a/tools/testing/selftests/bpf/progs/test_pkt_access.c b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > > index bce7173152c6..cd284401eebd 100644
> > > --- a/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > > +++ b/tools/testing/selftests/bpf/progs/test_pkt_access.c
> > > @@ -150,3 +150,15 @@ int test_pkt_access(struct __sk_buff *skb)
> > >
> > > return TC_ACT_UNSPEC;
> > > }
> > > +
> > > +SEC("tc")
> > > +int tc_pass_prog(struct __sk_buff *skb)
> > > +{
> > > + return TC_ACT_OK;
> > > +}
> > > +
> > > +SEC("xdp.frags")
> > > +int xdp_frags_pass_prog(struct xdp_md *ctx)
> > > +{
> > > + return XDP_PASS;
> > > +}
> >
> > A related suggestion on v2 was that, once pkt_v4 is reused, the existing
> > BPF program could be reused instead of adding new pass-through programs.
> >
> > Could tc_pass_prog and xdp_frags_pass_prog be dropped in favour of the
> > existing program? The v3 still adds both of these, so this point also
> > seems to be open.
> >
> >
> > ---
> > AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> > See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> >
> > CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27680511802
>
> Hi,
>
> Thanks for checking this.
Hi Sun Jian,
It would help if you could reply inline instead of at the end of the
messages, especially when there are multiple comments. See [1] for an
explanation of how that works.
1: https://kernelnewbies.org/FirstKernelPatch#Responding_inline
>
> I tried reusing pkt_v4 and the existing TC program, but they do not fit
> the skb case this test is trying to cover.
>
> For skb test_run, IPv4/IPv6 inputs with a too-short L3 header in the
> linear area are rejected before bpf_test_finish(). With pkt_v4 and a
> linear area of ETH_HLEN, the test fails with -EINVAL before reaching the
> partial copy-out path. If the linear area is increased enough to pass the
> IPv4 check, pkt_v4 is too small to both trigger the old
> copy_size - frag_size path and verify that the copied prefix spans the
> linear data and the first fragment. pkt_v6 has the same issue: after
> making the IPv6 header linear, only 20 bytes remain in frags.
>
> The existing test_pkt_access program has its own packet-access coverage
> goals and is not just a pass-through carrier. With such a short linear
> area or small packet fixture, it can fail before the test hits
> bpf_test_finish()'s partial copy-out path. A pass-through TC program is
> therefore a better fit, because it keeps the test focused on the
> bpf_test_finish() copy-out semantics.
If we're keeping tc_pass_prog() then can't we use pkt_v4 and get rid of
init_pkt?
>
> For XDP, this object does not have an existing xdp.frags pass-through
> program, so the small XDP frags program is needed to cover the other
> caller of the shared bpf_test_finish() path.
>
> Thanks,
> Sun Jian
^ permalink raw reply
* Re: [PATCH v2] net: mvneta: free/request IRQ across suspend/resume
From: Sebastian Andrzej Siewior @ 2026-06-18 8:39 UTC (permalink / raw)
To: Yun Zhou
Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, kuba, pabeni,
clrkwllms, rostedt, netdev, linux-kernel, linux-rt-devel
In-Reply-To: <20260617092028.1722407-1-yun.zhou@windriver.com>
On 2026-06-17 17:20:28 [+0800], Yun Zhou wrote:
> On PREEMPT_RT, the mvneta IRQ handler is force-threaded. Under high
There is also the `threadirqs' option.
> network traffic, the IRQ can enter suspend with desc->depth == 1
> (masked by the oneshot mechanism between handler invocations).
That would be irq_desc::depth.
> During suspend, the kernel increments depth to 2 and masks the
> interrupt at the MPIC level (clearing the SRC_CTL CPU routing bit,
> due to IRQCHIP_MASK_ON_SUSPEND).
The interrupt should be masked while the depth counter goes 0->1, no?
> On resume, depth is decremented
> back to 1, but since it does not reach 0, the unmask is never
> called. The MPIC CPU routing remains cleared, permanently disabling
> interrupt delivery.
But why not? In my naive assumption, we get into suspend with
irq_desc::depth = 2 and the threaded handler should be woken up. Once the
threaded handler is done, the counter should decrement by one. Then again
during resume, reaching 0 and leading to the unmask. If the threaded handler
is frozen and defrosted on resume, then it should still happen but in a
different order.
Something is missing here based on my naive assumption.
> Fix by freeing the IRQ in suspend and re-requesting it in resume.
> This ensures a clean IRQ state (depth=0, proper hardware routing)
> on every resume cycle, regardless of the pre-suspend depth. This
> follows the approach used by other drivers (e.g. igb).
igb shuts the device down entirely; it does not just free the IRQ.
> Fixes: 9768b45ceb0b ("net: mvneta: support suspend and resume")
> Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Sebastian
^ permalink raw reply
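The nesting behaviour argued about in the message above can be sketched with a toy userspace model. This is only an illustration of a disable-depth counter, assuming the simple rule "mask on 0->1, unmask on 1->0"; `toy_irq`, `toy_disable()` and `toy_enable()` are hypothetical names and this is not the genirq implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a nested disable counter; not the kernel's struct irq_desc. */
struct toy_irq {
	unsigned int depth;	/* number of outstanding disables */
	bool masked;		/* modelled hardware mask state */
};

/* Each disable nests; only the 0->1 transition masks the line. */
static void toy_disable(struct toy_irq *irq)
{
	if (irq->depth++ == 0)
		irq->masked = true;
}

/* Each enable undoes one disable; only the 1->0 transition unmasks. */
static void toy_enable(struct toy_irq *irq)
{
	assert(irq->depth > 0);	/* unbalanced enable would be a bug */
	if (--irq->depth == 0)
		irq->masked = false;
}
```

In this model, a balanced sequence (two disables, two enables) always ends unmasked, which matches the expectation stated in the reply. The line only stays masked if one disable is never paired with an enable, which is the open question in the thread.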
* [PATCH v5.10 2/2] net/sched: cls_u32: use skb_header_pointer_careful()
From: Shivani Agarwal @ 2026-06-18 8:08 UTC (permalink / raw)
To: stable, gregkh
Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
xiaosuo, iri, jhs, ajay.kaher, alexey.makhalov,
vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu, GangMin Kim,
Bin Lan, Shivani Agarwal
In-Reply-To: <20260618080807.1269070-1-shivani.agarwal@broadcom.com>
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit cabd1a976375780dabab888784e356f574bbaed8 ]
skb_header_pointer() does not fully validate negative @offset values.
Use skb_header_pointer_careful() instead.
GangMin Kim provided a report and a repro fooling u32_classify():
BUG: KASAN: slab-out-of-bounds in u32_classify+0x1180/0x11b0
net/sched/cls_u32.c:221
Fixes: fbc2e7d9cf49 ("cls_u32: use skb_header_pointer() to dereference data safely")
Reported-by: GangMin Kim <km.kim1503@gmail.com>
Closes: https://lore.kernel.org/netdev/CANn89iJkyUZ=mAzLzC4GdcAgLuPnUoivdLaOs6B9rq5_erj76w@mail.gmail.com/T/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260128141539.3404400-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Bin Lan <lanbincn@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Shivani: Modified to apply on 5.10.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
net/sched/cls_u32.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index f2a0c1068..e501390cc 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -149,10 +149,8 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp,
int toff = off + key->off + (off2 & key->offmask);
__be32 *data, hdata;
- if (skb_headroom(skb) + toff > INT_MAX)
- goto out;
-
- data = skb_header_pointer(skb, toff, 4, &hdata);
+ data = skb_header_pointer_careful(skb, toff, 4,
+ &hdata);
if (!data)
goto out;
if ((*data ^ key->val) & key->mask) {
@@ -202,8 +200,9 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp,
if (ht->divisor) {
__be32 *data, hdata;
- data = skb_header_pointer(skb, off + n->sel.hoff, 4,
- &hdata);
+ data = skb_header_pointer_careful(skb,
+ off + n->sel.hoff,
+ 4, &hdata);
if (!data)
goto out;
sel = ht->divisor & u32_hash_fold(*data, &n->sel,
@@ -217,7 +216,7 @@ static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp,
if (n->sel.flags & TC_U32_VAROFFSET) {
__be16 *data, hdata;
- data = skb_header_pointer(skb,
+ data = skb_header_pointer_careful(skb,
off + n->sel.offoff,
2, &hdata);
if (!data)
--
2.53.0
^ permalink raw reply related
* [PATCH v5.10 1/2] net: add skb_header_pointer_careful() helper
From: Shivani Agarwal @ 2026-06-18 8:08 UTC (permalink / raw)
To: stable, gregkh
Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
xiaosuo, iri, jhs, ajay.kaher, alexey.makhalov,
vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu, Bin Lan,
Shivani Agarwal
In-Reply-To: <20260618080807.1269070-1-shivani.agarwal@broadcom.com>
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 13e00fdc9236bd4d0bff4109d2983171fbcb74c4 ]
This variant of skb_header_pointer() should be used in contexts
where @offset argument is user-controlled and could be negative.
Negative offsets are supported, as long as the zone starts
between skb->head and skb->data.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260128141539.3404400-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Adjust context ]
Signed-off-by: Bin Lan <lanbincn@139.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Shivani: Modified to apply on 5.10.y ]
Signed-off-by: Shivani Agarwal <shivani.agarwal@broadcom.com>
---
include/linux/skbuff.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8abbb64bd..a2daeba8b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3686,6 +3686,18 @@ skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer)
skb_headlen(skb), buffer);
}
+/* Variant of skb_header_pointer() where @offset is user-controlled
+ * and potentially negative.
+ */
+static inline void * __must_check
+skb_header_pointer_careful(const struct sk_buff *skb, int offset,
+ int len, void *buffer)
+{
+ if (unlikely(offset < 0 && -offset > skb_headroom(skb)))
+ return NULL;
+ return skb_header_pointer(skb, offset, len, buffer);
+}
+
/**
* skb_needs_linearize - check if we need to linearize a given skb
* depending on the given device features.
--
2.53.0
^ permalink raw reply related
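As a rough illustration of the bound that the helper in the patch above enforces, here is a userspace sketch. It assumes a flat buffer with no fragments; `toy_skb`, `toy_headroom()` and `toy_header_pointer_careful()` are hypothetical stand-ins, not kernel APIs, and the copy-into-`buffer` fallback of the real helper is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: one flat allocation, no fragments. */
struct toy_skb {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of packet data, head <= data */
	int headlen;		/* linear bytes available from data */
};

static int toy_headroom(const struct toy_skb *skb)
{
	return (int)(skb->data - skb->head);
}

/*
 * Same shape as the check in the patch: a negative offset is accepted
 * only if it still lands at or after head. The buffer argument is kept
 * to mirror the signature but is unused in this toy.
 */
static void *toy_header_pointer_careful(const struct toy_skb *skb,
					int offset, int len, void *buffer)
{
	(void)buffer;
	assert(len >= 0);

	/* Widen before negating so that INT_MIN cannot overflow. */
	if (offset < 0 && -(long long)offset > toy_headroom(skb))
		return NULL;

	/* The requested zone must end inside the linear area. */
	if ((long long)skb->headlen - offset < len)
		return NULL;

	return skb->data + offset;
}
```

With 16 bytes of headroom, an offset of -16 resolves to `head`, while -17 is rejected instead of pointing before the allocation.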
* [PATCH v5.10 0/2] Fix CVE-2026-23204
From: Shivani Agarwal @ 2026-06-18 8:08 UTC (permalink / raw)
To: stable, gregkh
Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
xiaosuo, iri, jhs, ajay.kaher, alexey.makhalov,
vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu,
Shivani Agarwal
To fix CVE-2026-23204, commit cabd1a976375 is required; however,
it depends on commit 13e00fdc9236. Therefore, both patches
have been backported to v5.10.
Eric Dumazet (2):
net: add skb_header_pointer_careful() helper
net/sched: cls_u32: use skb_header_pointer_careful()
include/linux/skbuff.h | 12 ++++++++++++
net/sched/cls_u32.c | 13 ++++++-------
2 files changed, 18 insertions(+), 7 deletions(-)
--
2.53.0
^ permalink raw reply
* Re: [PATCH net-next 0/2] appletalk: move the protocol out of tree
From: Finn Thain @ 2026-06-18 8:31 UTC (permalink / raw)
To: Andrew Lunn
Cc: Carsten Strotmann, Jakub Kicinski, Carsten Strotmann,
John Paul Adrian Glaubitz, davem, netdev, edumazet, pabeni,
andrew+netdev, horms, geert, chleroy, npiggin, mpe, maddy,
linux-mips, linux-m68k, linuxppc-dev
In-Reply-To: <2791b2e3-bf58-4dce-9262-4f1d8d3241fb@lunn.ch>
On Thu, 18 Jun 2026, Andrew Lunn wrote:
> appletalk is just one of many, many drivers where the listed Maintainers
> do not respond to patches, or there is no Maintainer at all. So a lot
> of work falls on the top level netdev Maintainers.
It goes with the territory. If that messes up their performance reviews, I
am okay with that. We all make our own choices.
> In fact, a lot of the AI driven bug fixes tend to fall into this
> category of old drivers with no active Maintainers, since that tends to
> be where the poorer quality code is.
That has not been my experience. I rarely see a review from sashiko-bot on
the scsi mailing list that doesn't list as many pre-existing bugs as new
bugs. This is almost always actively developed code, not mature code.
In any case, quality is irrelevant here. I'm happy to see fixes for any
code base whatever its level of quality and whatever the quality metric.
What matters more to me than quality is utility.
> So top level netdev Maintainers are having to do a lot more work, on old
> drivers which very few people care about. That is a poor use of their
> talent, when we actually want them working on drivers for modern
> hardware with a lot of users.
>
Again, that has not been my experience. Linux often gets installed because
the hardware is not modern enough so the vendor has abandoned it and so
there's no better alternative than Linux.
As for wasted talent, this industry discards skillsets just as fast as you
discard e-waste. It goes with the territory. Moreover, if maintainers are
not using AI to make themselves more effective then they should admit to
retro-computing.
^ permalink raw reply
* Re: [PATCH net] net: sit: require CAP_NET_ADMIN in the device netns for changelink
From: Nicolas Dichtel @ 2026-06-18 8:25 UTC (permalink / raw)
To: Maoyi Xie, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Simon Horman, Kuniyuki Iwashima, Xiao Liang, Kees Cook, netdev,
linux-kernel, stable
In-Reply-To: <20260618070817.3378283-1-maoyixie.tju@gmail.com>
Le 18/06/2026 à 09:08, Maoyi Xie a écrit :
> ipip6_changelink() operates on at most two netns, dev_net(dev) and the
> tunnel link netns t->net. They differ once the device is created in or
> moved to a netns other than the one the request runs in. The rtnl
> changelink path checks CAP_NET_ADMIN only against dev_net(dev), so a
> caller privileged there but not in t->net can rewrite a tunnel that
> lives in t->net.
>
> Gate ipip6_changelink() on rtnl_dev_link_net_capable() at its top,
> before any attribute is parsed. sit was the one tunnel type not covered
> by the recent series that added this check to the other changelink()
> handlers.
>
> Fixes: 5e6700b3bf98 ("sit: add support of x-netns")
> Link: https://lore.kernel.org/netdev/20260612085941.3158249-1-maoyixie.tju@gmail.com/
> Cc: stable@vger.kernel.org
> Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
^ permalink raw reply
* Re: [PATCH net v2] net: marvell: prestera: initialize err in prestera_port_sfp_bind
From: Maxime Chevallier @ 2026-06-18 8:18 UTC (permalink / raw)
To: Ruoyu Wang, Taras Chornyi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Russell King,
Oleksandr Mazur, Yevhen Orlov, netdev, linux-kernel
In-Reply-To: <20260617193228.1653582-1-ruoyuw560@gmail.com>
Hi,
On 6/17/26 21:32, Ruoyu Wang wrote:
> prestera_port_sfp_bind() returns err after walking the ports node. If no
> child node matches the port's front-panel id, err is never assigned.
>
> Initialize err to 0 because absence of a matching optional port device
> tree node is not an error. In that case no phylink is created and port
> creation should continue with port->phy_link left NULL. Errors from
> malformed matched nodes and phylink_create() still propagate.
>
> Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support")
> Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Sorry for the grumpiness, but Andrew did ask you to read :
https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html
and one of the first things it says is to wait 24h before a repost :/
The patch in itself LGTM, so :
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Please wait 24h next time,
Maxime
^ permalink raw reply
* neigh: poor scalability of forced GC when neighbour count exceeds gc_thresh3
From: Vimal Agrawal @ 2026-06-18 8:17 UTC (permalink / raw)
To: netdev; +Cc: David Ahern, Jakub Kicinski, Vimal Agrawal
While investigating a soft lockup observed during neighbour table
growth, I noticed that neighbour allocation latency increases
significantly once the number of entries exceeds gc_thresh3.
Test setup:
net.ipv6.neigh.default.gc_thresh1 = 16384
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 32768
I created approximately 50,000 reachable neighbour entries and
measured time spent in __neigh_create(). Once the table size exceeds
gc_thresh3, neighbour creation latency increases dramatically (in my
testing, individual allocations can take >16 ms). Profiling shows that
most of the time is spent waiting on tbl->lock, typically held by
neigh_forced_gc().
The relevant path is:
static int neigh_forced_gc(struct neigh_table *tbl)
{
...
write_lock_bh(&tbl->lock);
list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) {
if (refcount_read(&n->refcnt) == 1) {
...
In my workload, most entries are active/reachable and have refcnt > 1,
so the GC walk scans a large portion of the neighbour table without
reclaiming entries. As a result, the lock can be held for a long
period while traversing the GC list.
Another observation is that once gc_thresh3 is exceeded, every new
neighbour allocation attempts a forced GC:
entries = atomic_inc_return(&tbl->gc_entries) - 1;
if (entries >= gc_thresh3 ||
(entries >= READ_ONCE(tbl->gc_thresh2) &&
time_after(now, READ_ONCE(tbl->last_flush) + 5 * HZ))) {
if (!neigh_forced_gc(tbl) && entries >= gc_thresh3) {
...
Unlike the gc_thresh2 case, there is no rate limiting once the table
is already above gc_thresh3. Under sustained neighbour creation this
results in repeated full GC scans, further increasing contention on
tbl->lock.
Has this scalability issue been discussed previously, or is there a
reason why forced GC above gc_thresh3 is intentionally not
rate-limited?
I would be interested in feedback before working on a patch.
Thanks,
Vimal Agrawal
^ permalink raw reply
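To make the rate-limiting question in the report above concrete, here is a small userspace sketch of one possible gate. It is purely hypothetical and not a proposed patch: `toy_neigh_tbl`, `toy_should_force_gc()` and the `min_interval` parameter are invented for illustration, and time is modelled as a plain tick counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy table state; not the kernel's struct neigh_table. */
struct toy_neigh_tbl {
	int entries;		/* current number of entries */
	int gc_thresh3;		/* hard limit that triggers forced GC */
	unsigned long last_scan;/* tick of the last forced scan */
	unsigned long scans;	/* forced scans performed so far */
};

/*
 * Decide whether an allocation should attempt a full forced scan.
 * Above gc_thresh3, at most one scan is allowed per min_interval ticks,
 * so back-to-back allocations do not each walk the whole list.
 */
static bool toy_should_force_gc(struct toy_neigh_tbl *tbl,
				unsigned long now,
				unsigned long min_interval)
{
	assert(tbl->gc_thresh3 > 0);

	if (tbl->entries < tbl->gc_thresh3)
		return false;
	if (tbl->scans > 0 && now - tbl->last_scan < min_interval)
		return false;

	tbl->last_scan = now;
	tbl->scans++;
	return true;
}
```

The sketch says nothing about what an allocation should do when the scan is skipped, which would be the harder design question for a real change.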
* Re: [PATCH net-next 0/2] appletalk: move the protocol out of tree
From: Geert Uytterhoeven @ 2026-06-18 8:13 UTC (permalink / raw)
To: Andrew Lunn
Cc: Finn Thain, Carsten Strotmann, Jakub Kicinski,
John Paul Adrian Glaubitz, davem, netdev, edumazet, pabeni,
andrew+netdev, horms, chleroy, npiggin, mpe, maddy, linux-mips,
linux-m68k, linuxppc-dev
In-Reply-To: <2791b2e3-bf58-4dce-9262-4f1d8d3241fb@lunn.ch>
Hi Andrew,
On Thu, 18 Jun 2026 at 10:01, Andrew Lunn <andrew@lunn.ch> wrote:
> If the appletalk community can take the workload off the top level
> maintainers, respond to all patches within 2 to 3 days, give
> Reviewed-by, or make change requests, it can probably stay in the
> Mainline kernel. Otherwise it will move out of tree.
"2 or 3 days" is rather short. If we would have to move all code
maintained by people who cannot respond to all patches within 2 to
3 days out of the mainline kernel, you'd end up with a networking
subsystem without a supporting OS ;-)
git grep three.*week -- Documentation/
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* RE: [PATCH] net: fman: fix clock and device node leak in probe error paths
From: Madalin Bucur @ 2026-06-18 8:08 UTC (permalink / raw)
To: ZhaoJinming
Cc: Sean Anderson, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20260618075435.1262533-1-zhaojinming@uniontech.com>
> -----Original Message-----
> From: ZhaoJinming <zhaojinming@uniontech.com>
> Sent: Thursday, June 18, 2026 10:55 AM
> To: Madalin Bucur <madalin.bucur@nxp.com>
> Cc: Sean Anderson <sean.anderson@linux.dev>; Andrew Lunn
> <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; ZhaoJinming <zhaojinming@uniontech.com>
> Subject: [PATCH] net: fman: fix clock and device node leak in probe error paths
>
> In read_dts_node(), fm_node is acquired via of_node_get() and clk is
> acquired via of_clk_get(). After a successful of_clk_get() call, the
> error paths for clk_get_rate failure and of_property_read_u32_array
> failure correctly goto fman_node_put (releasing fm_node) but miss
> clk_put(clk).
>
> Worse, all error paths from the MURAM node lookup onward goto
> fman_free directly, skipping both fman_node_put and clk_put, leaking
> both the fm_node and clk references.
>
> of_clk_get() eventually calls __of_clk_get() -> clk_hw_create_clk() ->
> alloc_clk() -> kzalloc_obj() to allocate the clk struct, so clk_put()
> must be called to release this memory. Without it, the allocated clk
> struct is leaked on every probe failure after of_clk_get() succeeds.
>
> Introduce a clk_put label between the success return and fman_node_put
> in the unwind chain, and redirect all error paths after of_clk_get()
> to this new label. Since no goto target remains for fman_free, fold
> it into fman_node_put and remove the now-unused label.
>
> Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
> ---
> drivers/net/ethernet/freescale/fman/fman.c | 19 ++++++++++---------
> 1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
> index 013273a2de32..734cbe8efd7e 100644
> --- a/drivers/net/ethernet/freescale/fman/fman.c
> +++ b/drivers/net/ethernet/freescale/fman/fman.c
> @@ -2736,7 +2736,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> err = -EINVAL;
> dev_err(&of_dev->dev, "%s: Failed to determine FM%d clock rate\n",
> __func__, fman->dts_params.id);
> - goto fman_node_put;
> + goto clk_put;
> }
> /* Rounding to MHz */
> fman->dts_params.clk_freq = DIV_ROUND_UP(clk_rate, 1000000);
> @@ -2746,7 +2746,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> if (err) {
> dev_err(&of_dev->dev, "%s: failed to read fsl,qman-channel-range for %pOF\n",
> __func__, fm_node);
> - goto fman_node_put;
> + goto clk_put;
> }
> fman->dts_params.qman_channel_base = range[0];
> fman->dts_params.num_of_qman_channels = range[1];
> @@ -2757,7 +2757,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> err = -EINVAL;
> dev_err(&of_dev->dev, "%s: could not find MURAM node\n",
> __func__);
> - goto fman_free;
> + goto clk_put;
> }
>
> err = of_address_to_resource(muram_node, 0,
> @@ -2766,7 +2766,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> of_node_put(muram_node);
> dev_err(&of_dev->dev, "%s: of_address_to_resource() = %d\n",
> __func__, err);
> - goto fman_free;
> + goto clk_put;
> }
>
> of_node_put(muram_node);
> @@ -2776,7 +2776,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> if (err < 0) {
> dev_err(&of_dev->dev, "%s: irq %d allocation failed (error = %d)\n",
> __func__, irq, err);
> - goto fman_free;
> + goto clk_put;
> }
>
> if (fman->dts_params.err_irq != 0) {
> @@ -2786,7 +2786,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> if (err < 0) {
> dev_err(&of_dev->dev, "%s: irq %d allocation failed (error = %d)\n",
> __func__, fman->dts_params.err_irq, err);
> - goto fman_free;
> + goto clk_put;
> }
> }
>
> @@ -2794,7 +2794,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> if (IS_ERR(base_addr)) {
> err = PTR_ERR(base_addr);
> dev_err(&of_dev->dev, "%s: devm_ioremap() failed\n", __func__);
> - goto fman_free;
> + goto clk_put;
> }
>
> fman->dts_params.base_addr = base_addr;
> @@ -2806,7 +2806,7 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
> if (err) {
> dev_err(&of_dev->dev, "%s: of_platform_populate() failed\n",
> __func__);
> - goto fman_free;
> + goto clk_put;
> }
>
> #ifdef CONFIG_DPAA_ERRATUM_A050385
> @@ -2816,9 +2816,10 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
>
> return fman;
>
> +clk_put:
> + clk_put(clk);
> fman_node_put:
> of_node_put(fm_node);
> -fman_free:
> kfree(fman);
> return ERR_PTR(err);
> }
> --
> 2.20.1
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Thanks
^ permalink raw reply
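The unwind ordering described in the patch above (release in reverse order of acquisition, with each error path jumping to the label for the last resource it holds) can be sketched in userspace. This is a toy under stated assumptions: `toy_get()`, `toy_put()`, `toy_probe()` and the `fail_at` parameter are hypothetical, and unlike read_dts_node() the success path here also releases everything so that leaks are easy to count.

```c
#include <assert.h>
#include <stdlib.h>

static int live_refs;	/* resources acquired and not yet released */

static void *toy_get(void)
{
	live_refs++;
	return malloc(1);
}

static void toy_put(void *p)
{
	assert(live_refs > 0);	/* catches a double release */
	free(p);
	live_refs--;
}

/*
 * fail_at: 0 = no failure, 1 = fail after taking the node,
 * 2 = fail after taking the clock. Returns 0 or -1.
 */
static int toy_probe(int fail_at)
{
	void *obj, *node, *clk = NULL;
	int err = -1;

	obj = toy_get();
	node = toy_get();
	if (fail_at == 1)
		goto node_put;	/* clock not taken yet */

	clk = toy_get();
	if (fail_at == 2)
		goto clk_put;	/* everything after this point holds the clock */

	err = 0;
clk_put:
	toy_put(clk);
node_put:
	toy_put(node);
	toy_put(obj);
	return err;
}
```

Jumping to a label that sits below `clk_put` after the clock has been taken is the kind of leak the patch removes; in this toy it would leave `live_refs` non-zero.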
* Re: [PATCH 1/2] igc: Wait for MAC passthrough after reset
From: Andrew Lunn @ 2026-06-18 8:08 UTC (permalink / raw)
To: Chia-Lin Kao (AceLan)
Cc: Tony Nguyen, Przemek Kitszel, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, intel-wired-lan,
netdev, linux-kernel
In-Reply-To: <20260618073324.1843310-1-acelan.kao@canonical.com>
> +static void igc_wait_for_lmvp_mac_passthrough(struct pci_dev *pdev,
> + struct igc_hw *hw)
> +{
> + u8 addr[ETH_ALEN] __aligned(2);
> + u32 orig_ral, orig_rah;
> + u32 ral, rah;
> + int i;
> +
> + if (!igc_is_lmvp_device(pdev))
> + return;
> +
> + igc_read_rar0(hw, addr, &orig_ral, &orig_rah);
> +
> + for (i = 0; i < 100; i++) {
> + msleep(100);
> + igc_read_rar0(hw, addr, &ral, &rah);
> + if ((ral != orig_ral || rah != orig_rah) &&
> + is_valid_ether_addr(addr))
> + return;
> + }
Please use one of the helpers from iopoll.h
Andrew
^ permalink raw reply
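For readers unfamiliar with the suggestion above: the kernel's <linux/iopoll.h> provides macros such as read_poll_timeout() that replace open-coded "read, test, sleep" loops. The following is only a userspace analogue of that pattern, with the sleep replaced by an attempt counter; `toy_poll_timeout()`, `toy_reg` and the callbacks are hypothetical names and not the kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*toy_read_fn)(void *ctx);
typedef bool (*toy_cond_fn)(uint32_t val, void *ctx);

/*
 * Call read_fn until cond_fn accepts the value or max_polls reads have
 * been made. Returns 0 on success and -1 on timeout; the last value
 * read is stored in *val either way.
 */
static int toy_poll_timeout(toy_read_fn read_fn, toy_cond_fn cond_fn,
			    void *ctx, unsigned int max_polls, uint32_t *val)
{
	unsigned int i;

	assert(max_polls > 0);
	for (i = 0; i < max_polls; i++) {
		*val = read_fn(ctx);
		if (cond_fn(*val, ctx))
			return 0;
	}
	return -1;
}

/* Fake register that reports a new value after a number of reads. */
struct toy_reg {
	uint32_t orig;
	uint32_t next;
	unsigned int reads_until_change;
};

static uint32_t toy_reg_read(void *ctx)
{
	struct toy_reg *r = ctx;

	if (r->reads_until_change > 0) {
		r->reads_until_change--;
		return r->orig;
	}
	return r->next;
}

static bool toy_reg_changed(uint32_t val, void *ctx)
{
	const struct toy_reg *r = ctx;

	return val != r->orig;
}
```

The point of the helper form is that the exit condition and the timeout are stated once, rather than spread across a hand-written loop.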
* Re: [PATCHv2] net: emac: Fix NULL pointer dereference in emac_probe
From: Andrew Lunn @ 2026-06-18 8:03 UTC (permalink / raw)
To: Rosen Penev
Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, open list
In-Reply-To: <20260618023405.415644-1-rosenp@gmail.com>
On Wed, Jun 17, 2026 at 07:34:05PM -0700, Rosen Penev wrote:
> Move devm_request_irq() after devm_platform_ioremap_resource() so that
> dev->emacp is mapped before the interrupt handler can fire. An early
> interrupt hitting emac_irq() would dereference the NULL dev->emacp and
> crash.
>
> Also remove the redundant error message. devm_platform_ioremap_resource()
> already prints an error message with dev_err_probe().
I still think there is a bigger problem that interrupts can fire
early, but:
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply