* RE: Ethtool : PRBS feature
From: Das, Shubham @ 2026-06-22 15:10 UTC (permalink / raw)
To: Andrew Lunn, Maxime Chevallier
Cc: Alexander H Duyck, lee@trager.us, netdev@vger.kernel.org,
mkubecek@suse.cz, D H, Siddaraju, Chintalapalle, Balaji,
Lindberg, Magnus, niklas.damberg@ericsson.com
In-Reply-To: <b61ee484-f9f5-4504-9e88-aeac701cd4e2@lunn.ch>
> Do you at least have the functionality of the standard C45 registers, even if the addresses and bit fields are messed up?
> If you do, maybe we should actually start with a C45 conforming implementation, and then you can do a translation layer to whatever oddball implementation you have?
The PHY supports the equivalent functionality (PRBS TX, PRBS RX/checker, BER testing, error injection, and symbol/error counters read),
but these are not exposed through standard Clause 45 PRBS registers. Instead, all operations are implemented by PHY firmware
and accessed through a command/control register interface.
If we want to support both driver-owned NIC architectures and use cases where the PHY driver directly manages the PRBS functionality,
Should this be exposed through a common phylib abstraction/API or a different approach ?
- Shubham D
> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 21 June 2026 00:51
> To: Maxime Chevallier <maxime.chevallier@bootlin.com>
> Cc: Das, Shubham <shubham.das@intel.com>; Alexander H Duyck
> <alexander.duyck@gmail.com>; lee@trager.us; netdev@vger.kernel.org;
> mkubecek@suse.cz; D H, Siddaraju <siddaraju.dh@intel.com>; Chintalapalle,
> Balaji <balaji.chintalapalle@intel.com>; Lindberg, Magnus
> <magnus.k.lindberg@ericsson.com>; niklas.damberg@ericsson.com
> Subject: Re: Ethtool : PRBS feature
>
> On Sat, Jun 20, 2026 at 04:39:06PM +0200, Maxime Chevallier wrote:
> > Hi,
> >
> > On 6/20/26 15:48, Das, Shubham wrote:
> > >> Can you change the firmware to expose the 802.3 registers for PRBS?
> > >> You can then write a library which both plylib and your driver can use.
> > >
> > > Andrew,
> > >
> > > No, exposing the PRBS registers to drivers is not possible in our design (the
> registers are buried deep within the Accelerator/NIC/PHY/Analog IP hierarchy).
> > >
> > > Additionally, the PHY PRBS registers are not in accordance with the IEEE
> Clause 45 definitions. For instance, the PRBS registers are paged and 32-bit wide.
> > >
>
> Hi Shubham
>
> Do you at least have the functionality of the standard C45 registers, even if the
> addresses and bit fields are messed up?
>
> If you do, maybe we should actually start with a C45 conforming implementation,
> and then you can do a translation layer to whatever oddball implementation you
> have?
>
> > > Given these constraints, we think ethtool --phy-test is a reasonable
> > > starting point for exposing the long-established Ethernet PRBS
> > > functionality to Linux userspace, as it aligns well with the
> > > driver-owned NIC architecture model.
>
> I agree an ethtool --phy-test makes sense, but we need to ensure standard based
> C45 functionality is covered, not just your oddball vendor functionality.
>
> Andrew
^ permalink raw reply
* RE: [PATCH 1/2] Protect skb pointer used by two different kernel instances
From: Selvamani Rajagopal @ 2026-06-22 15:03 UTC (permalink / raw)
To: Eric Dumazet
Cc: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
David S. Miller, Jakub Kicinski, Paolo Abeni,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Lunn
In-Reply-To: <CANn89iLLrOZ4Ep=dBgs0sC0tWsrZHy8DnCPLw_Em+7v2UtE9tA@mail.gmail.com>
> -----Original Message-----
> From: Eric Dumazet <edumazet@google.com>
> Subject: Re: [PATCH 1/2] Protect skb pointer used by two different kernel instances
>
>
> >
> > Fixes: b542d13fab0f ("net: ethernet: oa_tc6: Interrupt is active low, level triggered.")
> > Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> > ---
>
> OK but please use "net: ethernet: oa_tc6:" prefix in the patch title.
Yes. Another miss. Sorry about that.
^ permalink raw reply
* [PATCH net] netpoll: fix a use-after-free on shutdown path
From: Breno Leitao @ 2026-06-22 15:01 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Amerigo Wang
Cc: netdev, linux-kernel, vlad.wing, asantostc, kernel-team, stable,
Breno Leitao
There is a use-after-free error on netpoll, which is clearly detected by
KASAN.
BUG: KASAN: slab-use-after-free in _raw_spin_lock_irqsave+0x3b/0x80
Read of size 1 at addr ... by task kworker/9:1
Workqueue: events queue_process
Call Trace:
skb_dequeue+0x1e/0xb0
queue_process+0x2c/0x600
process_scheduled_works+0x4b6/0x850
worker_thread+0x414/0x5a0
Allocated by task 242:
__netpoll_setup+0x201/0x4a0
netpoll_setup+0x249/0x550
enabled_store+0x32f/0x380
Freed by task 0:
kfree+0x1b7/0x540
rcu_core+0x3f8/0x7a0
The problem happens when there is a pending TX worker running in
parallel with the cleanup path.
This is what happens on netpoll shutdown path:
1) __netpoll_cleanup() is called
2) set dev->npinfo to NULL
3) call_rcu() with rcu_cleanup_netpoll_info()
3.1) rcu_cleanup_netpoll_info() tries to cancel all workers with
cancel_delayed_work(), but doesn't wait for the worker to finish
4) and kfree(npinfo);
Because 3.1) doesn't really cancel the work, as the comment says "we
can't call cancel_delayed_work_sync here, as we are in softirq", the TX
worker can run after 4).
Tl;DR: queue_process() is not an RCU reader, it reaches npinfo through
the work item via container_of().
In reality, we can improve this cleanup path by a lot, but, given that
this is targeting net, just do the sane path:
1) set dev->npinfo to NULL
2) synchronize net / RCU
3) cancel_delayed_work_sync() any new worker (that potentially showed up
after the grace period -- and should exit soon given they will see
dev->npinfo = NULL)
4) then rcu_cleanup_netpoll_info() -> kfree() npinfo
In the future, we can do the cleanup inline here, and don't need
npinfo->rcu rcu_head, but that is net-next material.
Cc: stable@vger.kernel.org
Fixes: 38e6bc185d95 ("netpoll: make __netpoll_cleanup non-block")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
net/core/netpoll.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 229dde818ab33..5765015b40720 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -634,9 +634,6 @@ static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
skb_queue_purge(&npinfo->txq);
- /* we can't call cancel_delayed_work_sync here, as we are in softirq */
- cancel_delayed_work(&npinfo->tx_work);
-
/* clean after last, unfinished work */
__skb_queue_purge(&npinfo->txq);
/* now cancel it again */
@@ -664,6 +661,14 @@ static void __netpoll_cleanup(struct netpoll *np)
ops->ndo_netpoll_cleanup(np->dev);
RCU_INIT_POINTER(np->dev->npinfo, NULL);
+ /*
+ * synchronize_net() does not protect the worker
+ * (queue_process() is not an RCU reader). It fences the
+ * senders -- the real RCU readers -- so they cannot re-arm
+ * tx_work after the np->dev->npinfo was set to NULL.
+ */
+ synchronize_net();
+ cancel_delayed_work_sync(&npinfo->tx_work);
call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
}
---
base-commit: d07d80b6a129a44538cda1549b7acf95154fb197
change-id: 20260622-netpoll_rcu_fix-def7bce1207a
Best regards,
--
Breno Leitao <leitao@debian.org>
^ permalink raw reply related
* Re: [PATCH iproute2-next] "ip help" wrong output, exit code.
From: Stephen Hemminger @ 2026-06-22 14:57 UTC (permalink / raw)
To: Dmitri Seletski; +Cc: netdev
In-Reply-To: <069b13e1-f689-410b-bd40-b5e5831b67e7@gmail.com>
On Sun, 21 Jun 2026 22:48:59 +0100
Dmitri Seletski <drjoms@gmail.com> wrote:
> From 0805e07105cd15c5b94271a4706e50e3c65dbde5 Mon Sep 17 00:00:00 2001
> From: Dmitri Seletski <drjoms@gmail.com>
> Date: Sun, 21 Jun 2026 22:12:43 +0100
> Subject: [PATCH iproute2-next] "ip help" wrong output, exit code.
>
> Changed output of "ip help" from standard error to standard output. And
> Exit is now 0 instead of -1. "ip help|grep bridge" - now gives bridge
> syntax instead of flooding user with everything from "ip help".
> ---
> ip/ip.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/ip/ip.c b/ip/ip.c
> index e4b71bde..4627b61c 100644
> --- a/ip/ip.c
> +++ b/ip/ip.c
> @@ -56,7 +56,7 @@ static void usage(void) __attribute__((noreturn));
>
> static void usage(void)
> {
> -fprintf(stderr,
> +fprintf(stdout,
> "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
> " ip [ -force ] -batch filename\n"
> "where OBJECT := { address | addrlabel | fou | help | ila | ioam | l2tp
> | link |\n"
> @@ -72,7 +72,7 @@ static void usage(void)
> " -o[neline] | -t[imestamp] | -ts[hort] | -b[atch]
> [filename] |\n"
> " -rc[vbuf] [size] | -n[etns] name | -N[umeric] |
> -a[ll] |\n"
> " -c[olor]}\n");
> -exit(-1);
> +exit(0);
> }
Your mailer damages white space.
^ permalink raw reply
* [PATCH net] nfc: nci: fix out-of-bounds write in nci_target_auto_activated()
From: Samuel Page @ 2026-06-22 14:52 UTC (permalink / raw)
To: David Heidelberg
Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, oe-linux-nfc, netdev, linux-kernel, stable,
Samuel Page
nci_target_auto_activated() appends a target to the fixed-size array
ndev->targets[NCI_MAX_DISCOVERED_TARGETS] and increments ndev->n_targets
without first checking the array is full; unlike its sibling
nci_add_new_target(), which bails out when n_targets already equals
NCI_MAX_DISCOVERED_TARGETS.
ndev->n_targets is only cleared by nci_clear_target_list(), so an NFCC
that repeatedly re-runs discovery (RF_DISCOVER_RSP, which re-enters
NCI_DISCOVERY without clearing the target list) and reports an
auto-activated target (RF_INTF_ACTIVATED_NTF) drives n_targets past the
limit. The append then writes a struct nfc_target past the end of the
array (a slab out-of-bounds write), and nfc_targets_found() goes on to
walk the array with the inflated count:
BUG: KASAN: slab-out-of-bounds in nci_add_new_protocol+0x94/0x2ac [nci]
Write of size 2 at addr ffff0000c7299a18 by task kworker/u8:0/12
Workqueue: nfc0_nci_rx_wq nci_rx_work [nci]
Call trace:
nci_add_new_protocol+0x94/0x2ac [nci]
nci_ntf_packet+0xddc/0x11a0 [nci]
nci_rx_work+0x15c/0x1e0 [nci]
process_one_work+0x2dc/0x500
worker_thread+0x240/0x460
kthread+0x1c0/0x1d0
ret_from_fork+0x10/0x20
The buggy address belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1024 bytes to the right of
allocated 1560-byte region [ffff0000c7299000, ffff0000c7299618)
Guard nci_target_auto_activated() with the same check used by
nci_add_new_target().
Fixes: 019c4fbaa790 ("NFC: Add NCI multiple targets support")
Cc: stable@vger.kernel.org
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
net/nfc/nci/ntf.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index c96512bb8653..566ca839fa48 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -603,6 +603,12 @@ static void nci_target_auto_activated(struct nci_dev *ndev,
struct nfc_target *target;
int rc;
+ /* This is a new target, check if we've enough room */
+ if (ndev->n_targets == NCI_MAX_DISCOVERED_TARGETS) {
+ pr_debug("not enough room, ignoring new target...\n");
+ return;
+ }
+
target = &ndev->targets[ndev->n_targets];
rc = nci_add_new_protocol(ndev, target, ntf->rf_protocol,
base-commit: 47186409c092cd7dd70350999186c700233e854d
--
2.54.0
^ permalink raw reply related
* RE: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Kubalewski, Arkadiusz @ 2026-06-22 14:47 UTC (permalink / raw)
To: Vecera, Ivan, Jiri Pirko, Vadim Fedorenko, Jakub Kicinski
Cc: netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
Donald Hunter, Eric Dumazet, Schmidt, Michal, Paolo Abeni,
Vaananen, Pasi, Oros, Petr, Prathosh Satish, Simon Horman,
linux-kernel@vger.kernel.org
In-Reply-To: <23e47140-f69f-451d-9154-29071130c11c@redhat.com>
>From: Ivan Vecera <ivecera@redhat.com>
>Sent: Friday, June 19, 2026 7:08 PM
>
>On 6/17/26 1:59 PM, Kubalewski, Arkadiusz wrote:
>>> From: Ivan Vecera <ivecera@redhat.com>
>>> Sent: Monday, June 15, 2026 2:00 PM
>>>
>>> On 6/11/26 2:09 PM, Jiri Pirko wrote:
>>>> Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>>>>> On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>> Sent: Tuesday, June 9, 2026 4:59 PM
>>>>>>>
>>>>>>> On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>>>> Sent: Tuesday, June 9, 2026 10:51 AM
>>>>>>>>>
>>>>>>>>> Mon, Jun 08, 2026 at 07:03:46PM +0200,
>>>>>>>>> arkadiusz.kubalewski@intel.com
>>>>>>>>> wrote:
>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>> Sent: Monday, June 8, 2026 5:48 PM
>>>>>>>>>>>
>>>>>>>>>>> On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>>>> Sent: Sunday, May 31, 2026 9:44 PM ...
>>>>>>>>>>>>> -
>>>>>>>>>>>>> name: gnss
>>>>>>>>>>>>> doc: GNSS recovered clock
>>>>>>>>>>>>> + -
>>>>>>>>>>>>> + name: int-nco
>>>>>>>>>>>>> + doc: |
>>>>>>>>>>>>> + Device internal numerically controlled oscillator.
>>>>>>>>>>>>> + When connected as a DPLL input, the DPLL enters
>>>>>>>>>>>>> NCO
>>>>>>>>>>>>> mode
>>>>>>>>>>>>> + where the output frequency is adjusted by the host
>>>>>>>>>>>>> via
>>>>>>>>>>>>> + the PTP clock interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ivan!
>>>>>>>>>>>>
>>>>>>>>>>>> How would you control this in case of automatic mode dpll?
>>>>>>>>>>>> Automatic mode DPLL shall be controlled on HW level, such pin
>>>>>>>>>>>> brakes that rule and requires some driver magic to show it is
>>>>>>>>>>>> higher priority then the rest of the pins?
>>>>>>>>>>>
>>>>>>>>>>> The NCO pin can be connected only in manual mode. In other
>>>>>>>>>>> words
>>>>>>>>>>> a
>>>>>>>>>>> DPLL in automatic mode cannot select NCO pin (switch to NCO
>>>>>>>>>>> mode)
>>>>>>>>>>> by
>>>>>>>>>>> its own.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Being picky on DPLL_MODE for enabling feature is not something
>>>>>>>>>> we
>>>>>>>>>> can allow if it is not related to HW limitation, is it?
>>>>>>>>>> Could you please elaborate why it is not possible for AUTOMATIC
>>>>>>>>>> mode?
>>>>>>>>>
>>>>>>>>> In automatic mode, the pin selection logic is defined upon prio.
>>>>>>>>> I
>>>>>>>>> can imagine that if NCO pin has the highest prio of the available
>>>>>>>>> ones, it gets picked. I would be aligned 100% with automatic mode
>>>>>>>>> behaviour.
>>>>>>>>> Is there a real usecase for it?
>>>>>>>>>
>>>>>>>>> [..]
>>>>>>>>
>>>>>>>> This is not true. AUTOMATIC mode is HW solution, SW driver ONLY
>>>>>>>> configures priorities on the inputs, not manages the active
>>>>>>>> inputs.
>>>>>>>> This brakes that behavior, the SW driver would have to manually
>>>>>>>> override the AUTMATIC mode to be fed from such NCO pin as it
>>>>>>>> doesn't
>>>>>>>> exists on it's priority list, HW cannot pick or use it.
>>>>>>>
>>>>>>> Correct, AUTO mode is hardware feature and it should not be
>>>>>>> emulated
>>>>>>> by a
>>>>>>> driver. If the hardware does not support it then the switching
>>>>>>> between
>>>>>>> input references should be done by userspace (by monitoring ffo,
>>>>>>> phase_offset, operstate).
>>>>>>>
>>>>>>
>>>>>> Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>>>>>> create
>>>>>> such pin, which means that NCO pin would serve only a MANUAL mode
>>>>>> implementation.
>>>>>> Basically this is something we shall not allow to happen. DPLL API
>>>>>> should be designed to cover the case where AUTO mode is able to
>>>>>> implement
>>>>>> all features consistently.
>>>>>
>>>>> If you don't like the proposal from Jiri (NCO switch driven by NCO
>>>>> pin
>>>>> priority -> highest==enter_nco else leave_nco) then it could be
>>>>> possible
>>>>> to handle the switching by allowing the state 'connected' in AUTO
>>>>> mode
>>>>> for the NCO pin type. Then the implementation will be the same for
>>>>> both
>>>>> selection modes.
>>>>>
>>>>> Only difference would be that a user does not need to switch the
>>>>> device
>>>> >from the AUTO to MANUAL mode.
>>>>>
>>>>>>>> The real use case is that any DPLL can switch the mode to this one
>>>>>>>> instead of implementing MANUAL mode just to use the feature with a
>>>>>>>> 'virtual' pin.
>>>>>>>
>>>>>>> I don't expect this... but it is up to a driver. I don't plan such
>>>>>>> functionality in zl3073x as the NCO pin does not expose prio_get()
>>>>>>> and
>>>>>>> prio_set() callbacks - so it is clear that this pin cannot be part
>>>>>>> of
>>>>>>> the
>>>>>>> automatic selection.
>>>>>>>
>>>>>>> Ivan
>>>>>>
>>>>>> There is a difference between particular HW and API capabilities,
>>>>>> with
>>>>>> the
>>>>>> proposed API we would disallow the possibility of such
>>>>>> implementation
>>>>>> for
>>>>>> existing HW variants.
>>>>>>
>>>>>> DPLL NCO MODE would allow that but as pointed here by Ivan and by
>>>>>> Jiri
>>>>>> in
>>>>>> the other email it would also require the extra implementation for
>>>>>> some
>>>>>> configuration - device level phase/ffo handling.
>>>>>>
>>>>>> To summarize it all, I don't have such simple solution for it.
>>>>>>
>>>>>> First thing that comes to my mind is to combine both approaches.
>>>>>> Make it possible for AUTMATIC mode to also set "CONNECTED" state
>>>>>> on certain kind of "OVERRIDE" pins, where it could be determined by
>>>>>> the type of PIN and embed that logic into the DPLL subsystem.
>>>>>
>>>>> The possible states for particual pins are now handled at a driver
>>>>> level
>>>>> so the driver decides if the requested state is correct or not. So it
>>>>> could be easy to implement this.
>>>>>
>>>>> For auto mode allowed states:
>>>>> - input references: selectable / disconnected
>>>>> - nco pin: connected / disconnected
>>>>>
>>>>>> Basically, if driver registers such NCO pin it would be always
>>>>>> selected
>>>>>> manually, and in such case all the other pins are going to
>>>>>> disconnected
>>>>>> state while DPLL mode is also a "OVERRIDE" or something like it.
>>>>>
>>>>> I would leave this decision on the driver level... Imagine the
>>>>> potential
>>>>> HW that would allow to switch NCO mode if there is no valid input
>>>>> reference.
>>>>>
>>>>> Example:
>>>>>
>>>>> REF0 (prio 0) -> +------+ -> OUT0
>>>>> REF1 (prio 1) -> | DPLL | -> ...
>>>>> NCO (prio 2) -> +------+ -> OUTn
>>>>>
>>>>> Such HW would prefer REF0 or REF1 and lock to one of them if they are
>>>>> qualified. But if they are NOT, then it switches to NCO mode.
>>
>> Now you said yourself "NCO mode" ... I agree that it would be a mode in
>> that case. Where instead of running on regular/built in XO dpll would
>> run
>> on NCO and user could select it, and this would be addition to regular
>> behavior.
>>
>> I also agree that the pin approach might be better/easier to use,
>> assuming
>> frequency offset for all the outputs given dpll drives, it makes more
>> sense
>> to have it configurable on input side.
>
>+1
>
>>>>>
>>>>> In this situation the relevant driver would allow to configure
>>>>> priority
>>>>> and state 'selectable' for this NCO pin.
>>>>>
>>>>>> Perhaps the pin type could include OVERRIDE in it's name to make it
>>>>>> less
>>>>>> confusing and needs some extra documentation.
>>>>>>
>>>>>> Thoughts?
>>>>> I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>>>>> obvious that it is not a standard input reference.
>>>>>
>>>>> Jiri, Vadim, Arek, thoughts?
>>>>
>>>> I agree with you, the driver should have the flexibility to implement
>>>> this according to his/hw's needs/capabilities. If it implements prio
>>>> selection in AUTO mode, let it have it. If it implements manual NCO
>>>> pin
>>>> selection in AUTO mode using connected/disconnected override, let it
>>>> have it.
>>
>> I don't know 'current' HW that is capable of using AUTO mode as a part
>> of
>> HW-based priority source selection and use such NCO input..
>> But as already explained above, this is special mode of regular XO,
>> which
>> allows DPLL's output frequency offset configuration.
>
>Lets keep this available for potential future HW. I can imagine a
>situation where a user will prefer an automatic switch to NCO mode
>if there is no qualified input reference - automatic switch means
>that HW will support this (not emulated by the driver).
>
>>>>
>>>> Moreover, I actually like the "override" capability for pins in AUTO
>>>> mode in general. It may be handy for other usecases as well.
>>>>
>>> Arek? Vadim?
>>>
>>> Thanks,
>>> Ivan
>>
>> Agree, 'override' capability of a pin would be the way to go for this
>> and
>> other similar further cases.
>>
>> I believe a single approach on this would be best, I mean if AUTO mode
>> needs a capability, to switch from regular behavior to 'OVERRIDE', and
>> 'OVERRIDE' is only pin capability that allows such behavior for AUTO
>> mode, then similar approach should be used on MANUAL mode, to make
>> userspace know that such pin is always available to set "CONNECTED"
>> and make the userspace implementation consistent on enabling it no
>> matter
>> if AUTO or MANUAL mode dpll.
>
>Proposal:
>1) new pin capability
> - name: state-connected-override
> - doc: pin state can be changed to connected in any DPLL mode
>
>2) new NCO pin type to switch the DPLL to NCO mode when connected
>
>3) automatic-only DPLL
> - should expose NCO pin with state-connected-override capability
>
>4) manual-only DPLL
> - does not need to expose NCO pin with state-connected-override cap
>
It makes sense only if such AUTO mode DPLL can have both OVERRIDE and
non-OVERRIDE possibilities. So further HW design could implement it
without OVERRIDE and current AUTO mode could OVERRIDE and use it.
Having it this way, requires different user space implementation,
but don't think we will be able to ease it anyhow.
Thank you!
Arkadiusz
>5) dual-mode DPLL (supporting mode switching)
> - if it exposes NCO pin with the override cap then it has to support
> switching to NCO mode directly from AUTO mode
> - if does not expose NCO pin with the override cap then a user MUST
> switch the DPLL mode from AUTO to MANUAL to be able to make NCO
> pin connected to the DPLL
>
>Vadim, Jiri, Arek - thoughts?
>
>Thanks,
>Ivan
^ permalink raw reply
* [PATCH] net: stmmac: fix missed le32_to_cpu()
From: Ben Dooks @ 2026-06-22 14:37 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
Russell King (Oracle), Ben Dooks, Maxime Chevallier, netdev,
linux-stm32, linux-arm-kernel, linux-kernel
The print in ndesc_display_ring() sends the des2 and des3
to the pr_info() without passing them through the relevant
conversion to cpu order.
Fix the (prototype) sparse warnings by using le32_to_cpu():
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 6 (different base types)
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: expected unsigned int
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: got restricted __le32 [usertype] des2
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 7 (different base types)
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: expected unsigned int
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: got restricted __le32 [usertype] des3
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
drivers/net/ethernet/stmicro/stmmac/norm_desc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index c4b613564f87..74c9b7b1fe8f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -258,7 +258,7 @@ static void ndesc_display_ring(void *head, unsigned int size, bool rx,
pr_info("%03d [%pad]: 0x%x 0x%x 0x%x 0x%x",
i, &dma_addr,
(unsigned int)x, (unsigned int)(x >> 32),
- p->des2, p->des3);
+ le32_to_cpu(p->des2), le32_to_cpu(p->des3));
p++;
}
pr_info("\n");
--
2.37.2.352.g3c44437643
^ permalink raw reply related
* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Toke Høiland-Jørgensen @ 2026-06-22 14:36 UTC (permalink / raw)
To: Ralf Lici
Cc: netdev, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Beniamino Galvani
In-Reply-To: <20260622133452.432257-1-ralf@mandelbit.com>
[ skipping some of the netfilter-related context until we hear from the
netfilter devs ]
>> > My second concern is that the SIIT boundary would be a property of
>> > rule and hook placement. That gives flexibility, but it also means the
>> > translation point has to be constrained and documented very carefully
>> > to avoid ambiguous TTL/Hop Limit, PMTU/ICMP, and hook-order behavior.
>> > For this use case I would rather have the route that matches the
>> > translation prefix also be the object that says: leave this family
>> > here and continue in the other one.
>>
>> Yeah, with flexibility comes the ability to shoot yourself in the foot.
>> But that's not really different from much of the other functionality we
>> have in the kernel today, is it? For netfilter in particular it's
>> certainly possible to configure a broken NAT configuration that leads to
>> packet drops (or just invalid packets being sent out on a network
>> device).
>>
>
> True, misconfiguration is always possible and that alone is not an
> argument against the netfilter model. But what do we actually gain in
> capability from that flexibility? I agree on the UX argument (an admin
> would look in nft first), but in terms of what the feature can do, I
> can't yet see what the nft model unlocks. More on this just below.
>
>> > After looking at the available kernel mechanisms again, I think the
>> > better model is probably LWT: routes carry an ipxlat encap referencing a
>> > named translator domain configured over netlink. That should represent
>> > the stateless, prefix-based and symmetric nature of ipxlat.
>>
>> I think this description actually hits the nail on the head: What are we
>> implementing here? Is it a product feature, or a building block for one?
>> The properties you mention wrt consistency, symmetry etc are properties
>> of the high-level feature (which is also generally the level things are
>> specified in RFCs). Whereas other packet mangling features in the kernel
>> are more in the "building block" category, where it's possible to
>> configure things to implement a particular feature set / compliance with
>> a particular RFC, but it's also possible to do things that are outside
>> of that.
>>
>> I think this relates to the "mechanism, not policy" approach that we
>> take to most things in the kernel: implement the building blocks to do
>> something in the most general way we can, and then leave it up to
>> userspace to configure things in a way that results in a consistent
>> high-level system behaviour.
>>
>
> That's a good point, and I agree that we should not bake a high-level
> product policy into the kernel if what we need is a reusable mechanism
> (the LWT idea was my attempt at exactly that). What I am still trying to
> understand is whether there is a useful generic trigger for stateless
> cross-family translation beyond the route/prefix/policy-routing cases.
>
> Routes and policy routing already cover the selectors I can make
> coherent for a stateless, per-packet translator: destination/source
> prefix, iif/oif/VRF, mark, TOS/DSCP, and so on. nft can of course match
> much more than that, but the additional selectors that would materially
> change the translation decision seem to be selectors such as L4 fields,
> payload state, or conntrack state. Those are exactly the selectors I am
> struggling to make correct for a stateless translator:
>
> - non-first fragments carry no L4 header at all, yet the translator must
> rewrite every fragment (an nft ... tcp dport trigger cannot fire on
> them);
>
> - ICMP errors must be translated too, but the flow identity lives in the
> quoted inner header (reversed), not in anything an L4/ct match on the
> error packet can see and there is no conntrack to associate them,
> since this is stateless.
True in principle, but if (say) you deploy this on a network that is
configured so it will never fragment packets, this won't be an issue in
practice.
I.e., you're quite right that arbitrary matching criteria cannot be
guaranteed to result in coherent translation. But I think that goes into
the "use it wrong, get wrong results" bin. E.g., if you match on
something that results in only a subset of the packets of a flow being
translated, well, only that subset of the packets will make it to the
destination. The SIIT translator itself should not try to fix this, but
neither should it prevent it; that's what I mean by "building block" -
it's up to the builder using the blocks to make sure the building
doesn't collapse, that's out of scope for the block manufacturer to
worry about :)
> So an L4-conditional trigger does not look like a good primitive for
> correct stateless SIIT unless the action also defragments/refragments or
> uses conntrack-like state. Those may be valid mechanisms, but they move
> the design away from the stateless per-packet SIIT boundary this RFC is
> trying to model.
>
> So my first question is: is there a useful nft configuration this should
> enable that is not naturally expressible as route selection, while still
> remaining stateless SIIT rather than a NAT64-like stateful feature?
> Maybe there is a real use case there, but I cannot construct one yet.
So the poster child for "match on arbitrary criteria" is of course BPF.
You can write BPF programs that match on arbitrary parts of the packet
header, custom encapsulation headers,or even on out of band things like
system state, phase of the moon, or what have you. And we should
certainly allow a BPF program to make the decision on whether to perform
the SIIT translation.
Which... maybe is an argument to keep it as a device like you do in this
RFC series? Redirecting to a device is trivially supported from TC-BPF,
which also makes it possible to use the translation mechanism without
going through the routing subsystem at all, saving a bit of overhead.
Whereas making it a route action ties it very closely to the routing
subsystem.
WDYT?
-Toke
^ permalink raw reply
* Re: [PATCH net 0/2] nfc: llcp: fix OOB reads and integer bugs in TLV parsers
From: Muhammad Bilal @ 2026-06-22 13:58 UTC (permalink / raw)
To: david; +Cc: netdev, linux-kernel, oe-linux-nfc, horms, stable
In-Reply-To: <78e283f0-1493-4d72-92fe-e6444458fb91@ixit.cz>
On Sun, Jun 21, 2026 at 09:52 PM David Heidelberg wrote:
> could I ask for the patches rebase against for-next?
Hi David,
Rebased onto nfc/for-next.
v2 is now sent and available here:
https://lore.kernel.org/netdev/20260622131802.239035-1-meatuni001@gmail.com/
Patch 2/2 was dropped because the equivalent fixes for nfc_llcp_recv_snl()
were already merged (ed85d4cbbfaa), so only llcp_commands.c remains.
Thanks,
Muhammad Bilal
^ permalink raw reply
* Re: [Intel-wired-lan] [PATCH iwl-net] ice: clear the default forwarding VSI rule when releasing a VSI
From: Marcin Szycik @ 2026-06-22 13:52 UTC (permalink / raw)
To: Petr Oros, netdev
Cc: Przemek Kitszel, Eric Dumazet, linux-kernel, Andrew Lunn,
Tony Nguyen, Michal Swiatkowski, Jacob Keller, Jakub Kicinski,
Paolo Abeni, David S. Miller, intel-wired-lan
In-Reply-To: <20260622081030.2312129-1-poros@redhat.com>
On 22/06/2026 10:10, Petr Oros wrote:
> When a VSI is configured as the switch's default forwarding VSI
> (ICE_SW_LKUP_DFLT) and is then torn down, the rule is left behind in
> the switch. ice_vsi_release() no longer removes it, and the SR-IOV VF
> free path (ice_free_vfs() -> ice_free_vf_res() -> ice_vf_vsi_release()
> -> ice_vsi_release()) does not disable promiscuous mode either, which
> only happens on VF reset in ice_vf_clear_all_promisc_modes().
>
> A trusted VF that enters unicast promiscuous mode becomes the default
> forwarding VSI (this is the default mode, when the PF does not have VF
> true-promiscuous mode enabled). If the VFs are then destroyed without
> the VF first leaving promiscuous mode, the ICE_SW_LKUP_DFLT rule for
> the now-freed VSI is leaked. When VFs are recreated, a VSI reuses the
> freed hw_vsi_id. If it is assigned a different VSI handle than the
> leaked rule holds, ice_set_dflt_vsi() does not recognize it as
> already-default, and ice_add_update_vsi_list() folds the dangling
> (freed) handle into a VSI list, which the firmware rejects. The VSI
> handle assigned on re-creation varies, so the failure is intermittent
> rather than every cycle.
>
> Reproduce by repeatedly running the cycle below on the two ports of the
> same card, where $VF0 and $VF1 are the netdevs of vf 15 once they
> appear. The VF must be brought up so iavf actually pushes the unicast
> promiscuous request, and the rule must settle before the VFs are torn
> down again:
>
> echo 16 > /sys/class/net/$PF0/device/sriov_numvfs
> echo 16 > /sys/class/net/$PF1/device/sriov_numvfs
> ip link set $PF0 vf 15 trust on
> ip link set $PF1 vf 15 trust on
> ip link set $VF0 up
> ip link set $VF1 up
> ip link set $VF0 promisc on
> ip link set $VF1 promisc on
> sleep 1
> echo 0 > /sys/class/net/$PF0/device/sriov_numvfs
> echo 0 > /sys/class/net/$PF1/device/sriov_numvfs
>
> Within a few cycles the ice PF and iavf VF log:
>
> Failed to set VSI 25 as the default forwarding VSI, error -22
> Turning on/off promiscuous mode for VF 63 failed, error: -22
> PF returned error -53 (IAVF_ERR_ADMIN_QUEUE_ERROR) to our request 14
>
> This cleanup used to live in ice_vsi_release() but was dropped by the
> referenced refactor. Restore it. Clear the default forwarding VSI rule
> in ice_vsi_release() when this VSI owns it, which covers every teardown
> path.
>
> Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
> drivers/net/ethernet/intel/ice/ice_lib.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index 2717cc31bff8fe..408464434506ef 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2872,6 +2872,9 @@ int ice_vsi_release(struct ice_vsi *vsi)
> return -ENODEV;
> pf = vsi->back;
>
> + if (ice_is_vsi_dflt_vsi(vsi))
> + ice_clear_dflt_vsi(vsi);
In the referenced commit, the chunk of code that contained these missing 2 lines
was moved to ice_vsi_decfg(). It also sounds like a good place for them and will
be called from ice_vsi_release(). Are you sure we should place them directly in
ice_vsi_release() instead?
Thanks,
Marcin
> +
> if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
> ice_rss_clean(vsi);
>
^ permalink raw reply
* Re: [PATCH net 3/3] net/mlx5e: Reject unsupported CB Shaper TSA in ETS validation
From: Pavan Chebbi @ 2026-06-22 13:50 UTC (permalink / raw)
To: Tariq Toukan
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
netdev, Paolo Abeni, Alexei Lazar, Carolina Jubran,
Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
Saeed Mahameed, Gal Pressman
In-Reply-To: <20260622112925.624795-4-tariqt@nvidia.com>
[-- Attachment #1: Type: text/plain, Size: 622 bytes --]
On Mon, Jun 22, 2026 at 5:02 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Alexei Lazar <alazar@nvidia.com>
>
> Credit Based (CB) TSA is not supported by the mlx5 driver, so reject
> any configurations that specify it.
>
> Fixes: 08fb1dacdd76 ("net/mlx5e: Support DCBNL IEEE ETS")
> Signed-off-by: Alexei Lazar <alazar@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]
^ permalink raw reply
* Re: [PATCH net 2/3] net/mlx5e: Validate bandwidth for non-ETS traffic classes
From: Pavan Chebbi @ 2026-06-22 13:49 UTC (permalink / raw)
To: Tariq Toukan
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
netdev, Paolo Abeni, Alexei Lazar, Carolina Jubran,
Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
Saeed Mahameed, Gal Pressman
In-Reply-To: <20260622112925.624795-3-tariqt@nvidia.com>
[-- Attachment #1: Type: text/plain, Size: 720 bytes --]
On Mon, Jun 22, 2026 at 5:00 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Alexei Lazar <alazar@nvidia.com>
>
> The IEEE 802.1Qaz standard defines that bandwidth allocation percentages
> only apply to ETS traffic classes.
>
> Reject ETS configurations that specify non-zero bandwidth for traffic
> classes.
>
> Fixes: 08fb1dacdd76 ("net/mlx5e: Support DCBNL IEEE ETS")
> Signed-off-by: Alexei Lazar <alazar@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]
^ permalink raw reply
* Re: [PATCH] net: liquidio: fix BAR resource leak on PF number failure
From: Simon Horman @ 2026-06-22 13:48 UTC (permalink / raw)
To: Haoxiang Li
Cc: andrew+netdev, davem, kuba, pabeni, felix.manlunas,
ricardo.farrington, netdev, linux-kernel, stable
In-Reply-To: <20260620083728.2722895-1-haoxiang_li2024@163.com>
On Sat, Jun 20, 2026 at 04:37:28PM +0800, Haoxiang Li wrote:
> If cn23xx_get_pf_num() fails, the function returns without
> unmapping either BAR. Unmap both BARs before returning from
> the error path.
I think it would be worth noting how this problem was found,
and if a publicly available tool was used, naming it.
>
> Fixes: 0c45d7fe12c7 ("liquidio: fix use of pf in pass-through mode in a virtual machine")
> Cc: stable@vger.kernel.org
> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
There is an AI-generated review of this patch available on sashiko.dev.
I don't think the issue raised there directly affects this patch.
But you may want to consider looking into that in the context of
a separate follow-up.
> ---
> drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> index 75f22f74774c..a1548ca81ecd 100644
> --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> @@ -1167,8 +1167,11 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
> return 1;
> }
>
> - if (cn23xx_get_pf_num(oct) != 0)
> + if (cn23xx_get_pf_num(oct) != 0) {
> + octeon_unmap_pci_barx(oct, 0);
> + octeon_unmap_pci_barx(oct, 1);
> return 1;
> + }
>
> if (cn23xx_sriov_config(oct)) {
> octeon_unmap_pci_barx(oct, 0);
I think this would be best handled by introducing an idiomatic goto unwind
ladder to this function. Something like this (compile tested only!):
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 75f22f74774c..73362b92d0fd 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -1163,18 +1163,14 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
if (octeon_map_pci_barx(oct, 1, MAX_BAR1_IOREMAP_SIZE)) {
dev_err(&oct->pci_dev->dev, "%s CN23XX BAR1 map failed\n",
__func__);
- octeon_unmap_pci_barx(oct, 0);
- return 1;
+ goto err_free_barx_0;
}
if (cn23xx_get_pf_num(oct) != 0)
- return 1;
+ goto err_free_barx_1;
- if (cn23xx_sriov_config(oct)) {
- octeon_unmap_pci_barx(oct, 0);
- octeon_unmap_pci_barx(oct, 1);
- return 1;
- }
+ if (cn23xx_sriov_config(oct))
+ goto err_free_barx_1;
octeon_write_csr64(oct, CN23XX_SLI_MAC_CREDIT_CNT, 0x3F802080802080ULL);
@@ -1205,6 +1201,12 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
oct->coproc_clock_rate = 1000000ULL * cn23xx_coprocessor_clock(oct);
return 0;
+
+err_free_barx_0:
+ octeon_unmap_pci_barx(oct, 0);
+err_free_barx_1:
+ octeon_unmap_pci_barx(oct, 1);
+ return 1;
}
EXPORT_SYMBOL_GPL(setup_cn23xx_octeon_pf_device);
--
pw-bot: changes-requested
^ permalink raw reply related
* Re: [PATCH net 1/3] net/mlx5e: Report zero bandwidth for non-ETS traffic classes
From: Pavan Chebbi @ 2026-06-22 13:48 UTC (permalink / raw)
To: Tariq Toukan
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
netdev, Paolo Abeni, Alexei Lazar, Carolina Jubran,
Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
Saeed Mahameed, Gal Pressman
In-Reply-To: <20260622112925.624795-2-tariqt@nvidia.com>
[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]
On Mon, Jun 22, 2026 at 5:00 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Alexei Lazar <alazar@nvidia.com>
>
> The IEEE 802.1Qaz standard defines that bandwidth allocation percentages
> only apply to Enhanced Transmission Selection (ETS) traffic classes.
> For STRICT and VENDOR transmission selection algorithms, bandwidth
> percentage values are not applicable.
>
> Currently for non-ETS 100 bandwidth is being reported for all traffic
> classes in the get operation due to hardware limitation, regardless of
> their TSA type.
>
> Fix this by reporting 0 for non-ETS traffic classes.
>
> Fixes: 820c2c5e773d ("net/mlx5e: Read ETS settings directly from firmware")
> Signed-off-by: Alexei Lazar <alazar@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
LGTM. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]
^ permalink raw reply
* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Ralf Lici @ 2026-06-22 13:34 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: netdev, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
linux-kernel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
Beniamino Galvani
In-Reply-To: <87tsr4gcag.fsf@toke.dk>
On Mon, 15 Jun 2026 15:31:51 +0200, Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> >> >> I think a better model is to treat the device as basically a loopback
> >> >> device that translates packets before looping them back (so when they
> >> >> come back they appear to be coming from that device).
> >> >>
> >> >> Any reason why that wouldn't work?
> >> >>
> >> >
> >> > That's indeed the intended model for the ipxlat netdevice: route packets
> >> > to it, translate them, then loop them back into the stack as packets
> >> > received from that same device. That seemed like the simplest model and
> >> > the one that exposes the translation point most clearly.
> >>
> >> Right. I think this could be made a bit more explicit in the
> >> documentation as well, since it's a bit of an unusual model.
> >>
> >> And, well, taking a step back: is it really the right model? Regular NAT
> >> lives in netfilter, why can't this be a netfilter module as well? Seems
> >> to me you could have something like:
> >>
> >> table ip xlat4 {
> >> chain postrouting {
> >> type nat hook postrouting priority srcnat; policy accept;
> >> ip daddr 0.0.0.0/0 oifname "eth0" xlat to 64:ff9b::/96
> >> }
> >> }
> >> table ip6 xlat6 {
> >> chain prerouting {
> >> type nat hook prerouting priority dstnat; policy accept;
> >> ip6 saddr 64::ff0b::/96 iifname "eth0" xlat from 64::ff9b::/96
> >> }
> >> }
> >>
> >> and that would provide the functionality without having to implement a
> >> new interface type and the associated multiple traversals through the
> >> stack? Did you consider this as an alternative to the new device type?
> >>
> >
> > We did consider netfilter, and your example is syntactically attractive,
> > but I am no longer convinced it is the cleanest model for SIIT.
> >
> > An nft expression cannot simply rewrite ETH_P_IP <-> ETH_P_IPV6 and
> > return ACCEPT as if this were normal NAT because the current hook
> > invocation, dst, and conntrack-related state were established for the
> > packet as it entered that hook. A cross-family translator would need to
> > consume the skb, clear or rebuild route and ct metadata as appropriate,
> > do an other-family route lookup, and resume at a well-defined point in
> > that family. That seems possible, but it would be a new stateless
> > cross-family action, not just a new mode of the existing nft nat
> > expression (which is built around nf_nat_setup_info and assumes the
> > packet's L3 family does not change AFAICT).
>
> Right, I did not expect it would be possible to actually share code with
> the existing NAT functionality, but conceptually they're similar. I.e.,
> if I was an admin trying to figure out if my system supported SIIT
> translation, my chain of thought would be something along the line of:
> "SIIT is a variant of NAT, and I know NAT is a long-standing feature of
> netfilter, so I wonder if SIIT exists there as well".
>
> Adding the netfilter folks to Cc to try to get their attention and an
> opinion on this :)
>
That's the right move, it would be great to have their opinion on these
architectural questions.
For the netfilter folks just added: the context is a stateless
IPv6<>IPv4 SIIT translator (RFC 7915). The open design question is
whether the stateless cross-family translation should live as an nft
action, or as a route/LWT action (ILA / seg6_local lineage).
> > My second concern is that the SIIT boundary would be a property of
> > rule and hook placement. That gives flexibility, but it also means the
> > translation point has to be constrained and documented very carefully
> > to avoid ambiguous TTL/Hop Limit, PMTU/ICMP, and hook-order behavior.
> > For this use case I would rather have the route that matches the
> > translation prefix also be the object that says: leave this family
> > here and continue in the other one.
>
> Yeah, with flexibility comes the ability to shoot yourself in the foot.
> But that's not really different from much of the other functionality we
> have in the kernel today, is it? For netfilter in particular it's
> certainly possible to configure a broken NAT configuration that leads to
> packet drops (or just invalid packets being sent out on a network
> device).
>
True, misconfiguration is always possible and that alone is not an
argument against the netfilter model. But what do we actually gain in
capability from that flexibility? I agree on the UX argument (an admin
would look in nft first), but in terms of what the feature can do, I
can't yet see what the nft model unlocks. More on this just below.
> > After looking at the available kernel mechanisms again, I think the
> > better model is probably LWT: routes carry an ipxlat encap referencing a
> > named translator domain configured over netlink. That should represent
> > the stateless, prefix-based and symmetric nature of ipxlat.
>
> I think this description actually hits the nail on the head: What are we
> implementing here? Is it a product feature, or a building block for one?
> The properties you mention wrt consistency, symmetry etc are properties
> of the high-level feature (which is also generally the level things are
> specified in RFCs). Whereas other packet mangling features in the kernel
> are more in the "building block" category, where it's possible to
> configure things to implement a particular feature set / compliance with
> a particular RFC, but it's also possible to do things that are outside
> of that.
>
> I think this relates to the "mechanism, not policy" approach that we
> take to most things in the kernel: implement the building blocks to do
> something in the most general way we can, and then leave it up to
> userspace to configure things in a way that results in a consistent
> high-level system behaviour.
>
That's a good point, and I agree that we should not bake a high-level
product policy into the kernel if what we need is a reusable mechanism
(the LWT idea was my attempt at exactly that). What I am still trying to
understand is whether there is a useful generic trigger for stateless
cross-family translation beyond the route/prefix/policy-routing cases.
Routes and policy routing already cover the selectors I can make
coherent for a stateless, per-packet translator: destination/source
prefix, iif/oif/VRF, mark, TOS/DSCP, and so on. nft can of course match
much more than that, but the additional selectors that would materially
change the translation decision seem to be selectors such as L4 fields,
payload state, or conntrack state. Those are exactly the selectors I am
struggling to make correct for a stateless translator:
- non-first fragments carry no L4 header at all, yet the translator must
rewrite every fragment (an nft ... tcp dport trigger cannot fire on
them);
- ICMP errors must be translated too, but the flow identity lives in the
quoted inner header (reversed), not in anything an L4/ct match on the
error packet can see and there is no conntrack to associate them,
since this is stateless.
So an L4-conditional trigger does not look like a good primitive for
correct stateless SIIT unless the action also defragments/refragments or
uses conntrack-like state. Those may be valid mechanisms, but they move
the design away from the stateless per-packet SIIT boundary this RFC is
trying to model.
So my first question is: is there a useful nft configuration this should
enable that is not naturally expressible as route selection, while still
remaining stateless SIIT rather than a NAT64-like stateful feature?
Maybe there is a real use case there, but I cannot construct one yet.
> That being said:
>
> > Very roughly, userspace could look like:
> >
> > ip xlat add siit0 prefix6 64:ff9b::/96
> > ip route add ... encap ipxlat id siit0
> > ip -6 route add ... encap ipxlat id siit0
> >
> > There are some useful precedents for this: ILA is stateless address
> > translation as LWT, seg6_local already has cross-family LWT actions, and
> > ioam6 has a similar split between separately configured objects and
> > route attachments.
> >
> > The invariant I would like v2 to follow is that the original-family
> > route lookup selects translation as its terminal route action. The
> > translated skb then gets a fresh lookup in the other family. From that
> > point on, TTL/Hop Limit where applicable, PMTU, ICMP errors, and
> > netfilter visibility belong to the translated family.
> >
> > So I think your question addresses the core design issue in this RFC. My
> > current preference is to rework the next version around an LWT/domain
> > model instead of the virtual netdevice model, unless prototyping shows a
> > fundamental problem with that approach.
> >
> > Does that model make sense to you?
>
> I did consider this as well before suggesting netfilter as the right
> place to hook things, and I do think the route object model has some
> appeal. I agree it's a better model than the magical loopback interface,
> certainly.
>
> I think in the end this comes down to whether flexibility in how to use
> this translation mechanism is a bug or a feature, as outlined above. I'm
> leaning towards "feature", but could probably be persuaded otherwise :)
>
I agree this is the tradeoff to settle now. The reason I suggested LWT
is that making the family-A route lookup the handoff point gives the
TTL/PMTU/ICMP-and-reroute semantics a natural place to live.
So my other question, where the netfilter maintainers' input would be
very useful, is what semantics would be acceptable for changing address
family inside a netfilter hook. Concretely:
- What verdict should such an expression return? My assumption is it
has to consume the skb and reinject into the other family (NF_STOLEN
+ reinject), since ACCEPT would resume traversal in a family whose
dst/chain no longer apply.
- What becomes of the original-family conntrack entry, which is now
orphaned (and is pure overhead in the stateless case)?
- Where do TTL/Hop-Limit, PMTU/ICMP generation, and netfilter
visibility end up belonging after the family changes?
All that said, the netfilter model has its own strengths, and once we
settle the architectural questions it could turn into a very clean
design. Curious to hear your thoughts on the points above.
--
Ralf Lici
Mandelbit Srl
^ permalink raw reply
* Re: [PATCH] netdevsim: fix use-after-free in __nsim_dev_port_del
From: Simon Horman @ 2026-06-22 13:35 UTC (permalink / raw)
To: hrushirajg23
Cc: Simon Horman, kuba, andrew+netdev, davem, edumazet, pabeni, jiri,
netdev, linux-kernel, syzbot+6c25f4750230faf70be9
In-Reply-To: <20260620064937.16301-1-hrushirajg23@gmail.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
Subject: netdevsim: fix use-after-free in __nsim_dev_port_del
The patch removes the per-port debugfs directory in nsim_destroy()
before free_netdev() so that debugfs files referencing fields inside
the netdevsim struct cannot be reached after the slab is released.
Link: https://syzkaller.appspot.com/bug?extid=6c25f4750230faf70be9
> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> index 27e5f109f933..08136e7990cb 100644
> --- a/drivers/net/netdevsim/netdev.c
> +++ b/drivers/net/netdevsim/netdev.c
> @@ -1214,6 +1214,13 @@ void nsim_destroy(struct netdevsim *ns)
> ns->page = NULL;
> }
>
> + /*
> + * Remove per-port debugfs files before free_netdev() releases the
> + * netdevsim struct to prevent use-after-free in concurrent readers.
> + */
> + debugfs_remove_recursive(ns->nsim_dev_port->ddir);
> + ns->nsim_dev_port->ddir = NULL;
> +
> free_netdev(dev);
> }
[High]
Does the same use-after-free window remain on the nsim_create() error
path?
nsim_create() unconditionally calls nsim_ethtool_init(ns) before the
init step that can fail:
SET_NETDEV_DEVLINK_PORT(dev, &nsim_dev_port->devlink_port);
nsim_ethtool_init(ns);
if (nsim_dev_port_is_pf(nsim_dev_port))
err = nsim_init_netdevsim(ns);
else
err = nsim_init_netdevsim_vf(ns);
if (err)
goto err_free_netdev;
...
err_free_netdev:
free_netdev(dev);
return ERR_PTR(err);
nsim_ethtool_init() populates files under nsim_dev_port->ddir whose
data pointers reference fields inside ns, for example:
ethtool = debugfs_create_dir("ethtool", ns->nsim_dev_port->ddir);
debugfs_create_u32("get_err", 0600, ethtool, &ns->ethtool.get_err);
debugfs_create_u32("set_err", 0600, ethtool, &ns->ethtool.set_err);
...
dir = debugfs_create_dir("ring", ethtool);
debugfs_create_u32("rx_max_pending", 0600, dir,
&ns->ethtool.ring.rx_max_pending);
If nsim_init_netdevsim() (or nsim_init_netdevsim_vf()) fails, the
err_free_netdev label runs free_netdev(dev) directly and returns
ERR_PTR(err). The caller then calls nsim_dev_port_debugfs_exit() to
tear down ddir, but in the meantime the netdevsim slab has already
been freed while these debugfs files are still live with data pointers
into it.
Should the same debugfs_remove_recursive(nsim_dev_port->ddir) be done
on the err_free_netdev path before free_netdev(dev), or should the
debugfs file creation be moved past the point of no return so the
fuzzer-reachable failure paths (alloc pressure on mock_phc_create,
register_netdevice, nsim_psp_init, etc.) cannot leave dangling
debugfs entries?
The Fixes: tag e05b2d141fef appears to cover this path as well.
^ permalink raw reply
* Re: [PATCH net] veth: fix NAPI leak in XDP enable error path
From: Pavan Chebbi @ 2026-06-22 13:31 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
netdev, eric.dumazet, Guenter Roeck, Björn Töpel,
Daniel Borkmann, Ilias Apalodimas, Michael S. Tsirkin,
Tariq Toukan
In-Reply-To: <20260622111825.88337-1-edumazet@google.com>
[-- Attachment #1: Type: text/plain, Size: 1610 bytes --]
On Mon, Jun 22, 2026 at 4:48 PM Eric Dumazet <edumazet@google.com> wrote:
>
> During XDP enablement in veth, if xdp_rxq_info_reg() or
> xdp_rxq_info_reg_mem_model() fails, the driver rolls back the changes.
>
> However, the rollback loop:
> for (i--; i >= start; i--) {
>
> decrements the loop index 'i' before the first iteration. This
> correctly skips unregistering the rxq for the failed index 'i' (as
> registration failed or was already cleaned up), but it also
> erroneously skips calling netif_napi_deli() for rq[i].xdp_napi.
>
> Since netif_napi_add() was already called for index 'i', this leaves
> a dangling napi_struct in the device's napi_list. When the veth
> device is later destroyed, the freed queue memory (which contains the
> leaked NAPI structure) can be reused.
>
> The subsequent device teardown iterates the NAPI list and
> corrupts the reallocated memory, leading to UAF.
>
> Fix this by explicitly deleting the NAPI association for the failed
> index 'i' before rolling back the successfully configured queues.
>
> Fixes: b02e5a0ebb17 ("xsk: Propagate napi_id to XDP socket Rx path")
> Reported-by: Guenter Roeck <groeck@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Björn Töpel <bjorn.topel@intel.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Tariq Toukan <tariqt@nvidia.com>
> ---
> drivers/net/veth.c | 2 ++
> 1 file changed, 2 insertions(+)
>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]
^ permalink raw reply
* Re: [PATCH net 1/5] net: hns3: unify copper port ksettings configuration path
From: Jijie Shao @ 2026-06-22 13:30 UTC (permalink / raw)
To: Simon Horman
Cc: shaojijie, davem, edumazet, kuba, pabeni, andrew+netdev,
shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
netdev, linux-kernel
In-Reply-To: <20260618153356.909862-1-horms@kernel.org>
on 2026/6/18 23:33, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> net: hns3: unify copper port ksettings configuration path
>
> Refactor hns3_set_link_ksettings() and hclge_set_phy_link_ksettings()
> to unify the configuration path for copper ports.
>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
> [ ... ]
>> @@ -863,6 +859,8 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
>> struct hnae3_handle *handle = hns3_get_handle(netdev);
>> struct hnae3_ae_dev *ae_dev = hns3_get_ae_dev(handle);
>> const struct hnae3_ae_ops *ops = hns3_get_ops(handle);
>> + u8 module_type = HNAE3_MODULE_TYPE_UNKNOWN;
>> + u8 media_type = HNAE3_MEDIA_TYPE_UNKNOWN;
>> int ret;
>>
>> /* Chip don't support this mode. */
>> @@ -878,22 +876,20 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
>> cmd->base.autoneg, cmd->base.speed, cmd->base.duplex,
>> cmd->lanes);
>>
>> - /* Only support ksettings_set for netdev with phy attached for now */
>> - if (netdev->phydev) {
>> - if (cmd->base.speed == SPEED_1000 &&
>> - cmd->base.autoneg == AUTONEG_DISABLE)
>> - return -EINVAL;
> [Severity: Medium]
> Does removing this validation allow invalid configurations for native copper
> PHYs?
>
> The previous code explicitly rejected SPEED_1000 with AUTONEG_DISABLE because
> IEEE 802.3 1000BASE-T strictly requires autonegotiation. By delegating
> directly to phy_ethtool_ksettings_set(), which permits this for 1000BASE-X,
> the driver might now silently accept an invalid configuration.
[Accepted] Yes, this is indeed an effective bug.
>
>> - return phy_ethtool_ksettings_set(netdev->phydev, cmd);
>> - } else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
>> - ops->set_phy_link_ksettings) {
>> + if (!ops->get_media_type)
>> + return -EOPNOTSUPP;
>> + ops->get_media_type(handle, &media_type, &module_type);
>>
>> - return ops->set_phy_link_ksettings(handle, cmd);
>> - }
>> + if (media_type == HNAE3_MEDIA_TYPE_COPPER) {
>> + if (!ops->set_phy_link_ksettings)
>> + return -EOPNOTSUPP;
>> + return ops->set_phy_link_ksettings(handle, cmd);
>> + }
> [ ... ]
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> [ ... ]
>> @@ -3400,10 +3400,28 @@ hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
>> return ret;
>> }
>>
>> + linkmode_copy(hdev->hw.mac.advertising, cmd->link_modes.advertising);
>> + return 0;
>> +}
>> +
>> +static int
>> +hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
>> + const struct ethtool_link_ksettings *cmd)
>> +{
>> + struct hclge_vport *vport = hclge_get_vport(handle);
>> + struct hclge_dev *hdev = vport->back;
>> + int ret;
>> +
>> + if (hnae3_dev_phy_imp_supported(hdev))
>> + ret = hclge_ethtool_ksettings_set(handle, cmd);
>> + else
>> + ret = phy_ethtool_ksettings_set(handle->netdev->phydev, cmd);
> [Severity: Critical]
> Can handle->netdev->phydev be NULL here, leading to a kernel panic when called
> from userspace?
>
> A copper port can legitimately have a NULL phydev if the port's PHY address
> is configured as PHY_INEXISTENT in the firmware.
>
> Because the updated hns3_set_link_ksettings() now routes all
> HNAE3_MEDIA_TYPE_COPPER configurations here without the previous
> if (netdev->phydev) guard, phy_ethtool_ksettings_set() will unconditionally
> dereference phydev->mdio.addr.
[Accepted] Yes, this is indeed an effective bug.
Thanks,
Jijie Shao
^ permalink raw reply
* Re: [PATCH] fsl/fman: Free init resources on KeyGen failure in fman_init()
From: Pavan Chebbi @ 2026-06-22 13:27 UTC (permalink / raw)
To: Haoxiang Li
Cc: madalin.bucur, sean.anderson, andrew+netdev, davem, edumazet,
kuba, pabeni, florinel.iordache, netdev, linux-kernel, stable
In-Reply-To: <20260622111638.997251-1-haoxiang_li2024@163.com>
[-- Attachment #1: Type: text/plain, Size: 1668 bytes --]
On Mon, Jun 22, 2026 at 4:47 PM Haoxiang Li <haoxiang_li2024@163.com> wrote:
>
> fman_muram_alloc() allocates initialization resources before
> initializing the KeyGen block. If keygen_init() fails, the
> function returns -EINVAL directly and leaves those resources
> allocated. Free the initialization resources before returning
> from the KeyGen failure path.
>
> While at it, drop the unused error check around enable(), which
> always returns 0.
>
> Fixes: 7472f4f281d0 ("fsl/fman: enable FMan Keygen")
> Cc: stable@kernel.org
> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
> ---
> drivers/net/ethernet/freescale/fman/fman.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
LGTM. Note that you should add "net" to the fixes patch titles.
Otherwise you see patch automation has guessed this one for net-next.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
> diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
> index 013273a2de32..3a2a57207e55 100644
> --- a/drivers/net/ethernet/freescale/fman/fman.c
> +++ b/drivers/net/ethernet/freescale/fman/fman.c
> @@ -1995,12 +1995,12 @@ static int fman_init(struct fman *fman)
>
> /* Init KeyGen */
> fman->keygen = keygen_init(fman->kg_regs);
> - if (!fman->keygen)
> + if (!fman->keygen) {
> + free_init_resources(fman);
> return -EINVAL;
> + }
>
> - err = enable(fman, cfg);
> - if (err != 0)
> - return err;
> + enable(fman, cfg);
>
> enable_time_stamp(fman);
>
> --
> 2.25.1
>
>
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v3] virtio-net: xsk: support tx wake up
From: Michael S. Tsirkin @ 2026-06-22 13:24 UTC (permalink / raw)
To: Menglong Dong
Cc: Menglong Dong, xuanzhuo, eperezma, jasowang, andrew+netdev, davem,
edumazet, kuba, pabeni, netdev, virtualization, linux-kernel
In-Reply-To: <BMgCMK8nRYC94QK7N1SXpQ@linux.dev>
On Mon, Jun 22, 2026 at 08:27:12PM +0800, Menglong Dong wrote:
> On 2026/6/22 06:31 Michael S. Tsirkin <mst@redhat.com> write:
> > On Tue, Jun 16, 2026 at 07:59:12PM +0800, Menglong Dong wrote:
> [...]
> > >
> > > + vring_size = virtqueue_get_vring_size(sq->vq);
> > > + need_wakeup = xsk_uses_need_wakeup(pool);
> > > +
> > > + if (need_wakeup && vring_size == sq->vq->num_free)
> > > + xsk_set_tx_need_wakeup(pool);
> > > +
> >
> > why are we doing this here?
> > the check after virtnet_xsk_xmit_batch not enough?
> > I vaguely think it's some kind of race we are closing?
> > Pls add a comment to explain.
>
> Hi, Michael. Thanks for your review.
>
> Yeah, it's for a race condition between user space and kernel
> space. I added a comment in V2, which is too confusing, and
> I removed it 😢. I'll make it more clear and add it in the V4. The
> origin comment is:
>
> * If the sq->vq is empty, and the tx ring is empty, and the user
> * submit an entry to the tx ring after virtnet_xsk_xmit_batch() and
> * before xsk_set_tx_need_wakeup(), we will lose the chance to wake
> * up the tx napi, so we have to set the need_wakeup flag here.
>
> And the logic is like this:
>
> Kernel: tx NAPI is waked up from skb_xmit_done() ->
> Kernel: sq->vq and xsk->tx_ring are both empty ->
> Kernel: call virtnet_xsk_xmit_batch()
>
> User: submit a entry to the xsk->tx_ring
> User: check the wakeup flag
> User: wakeup flag is not set, skip send()
>
> Kernel: call xsk_set_tx_need_wakeup(), because sq->vq is empty
>
> If we don't send more data, the data in the xsk->tx_ring will
> not be sent forever.
I'm not 100% sure I understand, but when someone fixes cross-CPU races
with no synchronization or CPU memory barriers just with extra checks,
this always gives me pause.
AI helped write this for me, for example:
1. Kernel: xsk_set_tx_need_wakeup stores NEED_WAKEUP (sits in store buffer)
2. Kernel: xsk_tx_peek_release_desc_batch - load, sees empty (reordered before the store is globally visible)
3. Kernel: peek finds nothing, returns 0
4. Userspace: stores entry + producer
5. Userspace: loads flags - doesn't see NEED_WAKEUP yet (still in kernel's store buffer)
6. Userspeace: skips send()
7. Kernel: NEED_WAKEUP store finally becomes visible - too late
Seems legit?
> >
> > > sent = virtnet_xsk_xmit_batch(sq, pool, budget, &kicks);
> > >
> > > + if (need_wakeup) {
> > > + if (vring_size == sq->vq->num_free)
> > > + /* we can't wake up by ourself, and it should be done
> > > + * by the user.
> > > + */
> > > + xsk_set_tx_need_wakeup(pool);
> > > + else
> > > + /* we can wake up from skb_xmit_done() */
> > > + xsk_clear_tx_need_wakeup(pool);
> >
> > But what if we don't have get tx napi so no wakeup in skb_xmit_done?
>
> Sorry that I'm not sure what "get tx napi" means here ;(
>
> There are entry in sq->vq, so skb_xmit_done() will be called after
> the entries in the ring is consumed by the HOST, right?
> Then, the corresponding sq->napi will be scheduled, as we ensure
> that tx napi is always enabled, which means napi->weight is not
> zero, in this commit:
> 1df5116a41a8 ("virtio_net: xsk: prevent disable tx napi")
Oh I forgot we did that. But can xsk bind when tx napi has already
been disabled previously?
> Right?
>
> Thanks!
> Menglong Dong
>
> >
> >
> > > + }
> > > +
> > > if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq))
> > > check_sq_full_and_disable(vi, vi->dev, sq);
> > >
> > > @@ -1470,9 +1488,6 @@ static bool virtnet_xsk_xmit(struct send_queue *sq, struct xsk_buff_pool *pool,
> > > u64_stats_add(&sq->stats.xdp_tx, sent);
> > > u64_stats_update_end(&sq->stats.syncp);
> > >
> > > - if (xsk_uses_need_wakeup(pool))
> > > - xsk_set_tx_need_wakeup(pool);
> > > -
> > > return sent;
> > > }
> > >
> > > --
> > > 2.54.0
> >
> >
> >
>
>
>
^ permalink raw reply
* Re: [PATCH net 2/5] net: hns3: refactor MAC autoneg and speed configuration
From: Jijie Shao @ 2026-06-22 13:18 UTC (permalink / raw)
To: Simon Horman
Cc: shaojijie, davem, edumazet, kuba, pabeni, andrew+netdev,
shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
netdev, linux-kernel
In-Reply-To: <20260618153419.909913-1-horms@kernel.org>
on 2026/6/18 23:34, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> net: hns3: refactor MAC autoneg and speed configuration
>
> This refactor extracts the MAC autoneg and speed/duplex/lane configuration
> out of hclge_mac_init() into a new helper hclge_set_autoneg_speed_dup(),
> and invokes it from both hclge_init_ae_dev() and hclge_reset_ae_dev().
>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> index 5a00797d9252..2c74675b149f 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> @@ -2957,20 +2957,6 @@ static int hclge_mac_init(struct hclge_dev *hdev)
>> if (!test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
>> hdev->hw.mac.duplex = HCLGE_MAC_FULL;
>>
>> - if (hdev->hw.mac.support_autoneg) {
>> - ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
>> - if (ret)
>> - return ret;
>> - }
>> -
>> - if (!hdev->hw.mac.autoneg) {
>> - ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
>> - hdev->hw.mac.req_duplex,
>> - hdev->hw.mac.lane_num);
>> - if (ret)
>> - return ret;
>> - }
>> -
>> mac->link = 0;
> [ ... ]
>
>> @@ -11901,6 +11908,13 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
>> if (ret)
>> goto err_ptp_uninit;
>>
>> + ret = hclge_set_autoneg_speed_dup(hdev);
>> + if (ret) {
>> + dev_err(&pdev->dev,
>> + "failed to set autoneg speed duplex, ret = %d\n", ret);
>> + goto err_ptp_uninit;
>> + }
>> +
> [High]
>
> Has the relocation of the autoneg/speed configuration in hclge_init_ae_dev()
> been considered against the surrounding init order?
>
> Previously the equivalent calls ran inside hclge_mac_init() before
> hclge_update_port_info(). After this patch the helper runs after
> hclge_update_port_info(), and hclge_update_port_info() ->
> hclge_get_sfp_info() unconditionally overwrites hdev->hw.mac.autoneg,
> hdev->hw.mac.support_autoneg and hdev->hw.mac.lane_num from firmware
> whenever resp->speed_ability is non-zero.
>
> The new helper then feeds the firmware-supplied lane_num back through:
>
> if (!hdev->hw.mac.autoneg) {
> ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
> hdev->hw.mac.req_duplex,
> hdev->hw.mac.lane_num);
>
> On a non-copper port where firmware reports forced mode (autoneg=0) and a
> stale lane_num inherited from a previous link lifecycle (for example
> lane_num=4 from a prior 100G state), the (req_speed, lane_num) pair given
> to hclge_cfg_mac_speed_dup_hw() can be inconsistent (for example 25G with
> 4 lanes), and firmware rejects it with -EINVAL, which propagates back to
> hclge_init_ae_dev() and unwinds via err_ptp_uninit, failing the entire
> probe.
>
> The follow-up commit a7e52c16ecbf "net: hns3: fix init failure caused by
> lane_num contamination" later in this series fixes this by introducing a
> separate req_lane_num=0 and passing that to hclge_cfg_mac_speed_dup_hw()
> instead of mac.lane_num. Should the fix be folded into this patch so the
> series does not introduce a probe-failure regression at this point and
> then repair it?
[Accepted] This issue has been fixed in the patch "net: hns3: Fix initialization failure caused by lane_num pollution."
It is recommended that I merge that patch with the current patch. Sure.
>
> [Medium]
>
> After this change, the last write to MAC speed/duplex on the init path is
> hclge_cfg_mac_speed_dup_hw() inside the new helper, which programs
> hardware but does not update the cached hdev->hw.mac.speed and
> hdev->hw.mac.duplex. Previously the last writer was
> hclge_cfg_mac_speed_dup() inside hclge_update_port_info(), which updates
> both hardware and the cached fields.
>
> When req_speed differs from the firmware-reported mac.speed (which can
> happen if the active SFP speed differs from cfg.default_speed parsed in
> hclge_configure()), is hardware left programmed with req_speed while
> mac.speed retains the firmware-reported value, leaving the cache out of
> sync with the actual hardware configuration until the periodic service
> task refreshes mac.speed?
[Reject] Yes, Therefore, this issue is not a valid question.
>
>> @@ -12231,6 +12245,13 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
>> return ret;
>> }
>>
>> + ret = hclge_set_autoneg_speed_dup(hdev);
>> + if (ret) {
>> + dev_err(&pdev->dev,
>> + "failed to set autoneg speed duplex, ret = %d\n", ret);
>> + return ret;
>> + }
>> +
>> ret = hclge_tp_port_init(hdev);
>> if (ret) {
>> dev_err(&pdev->dev, "failed to init tp port, ret = %d\n",
> [Low]
>
> Could the commit message be tightened to match the resulting code?
>
> The message says the helper "will also be invoked during the hardware
> reset recovery path to re-apply link settings without repeating
> unnecessary buffer or MTU initializations." However hclge_reset_ae_dev()
> still calls hclge_mac_init() immediately before hclge_set_autoneg_speed_dup(),
> and hclge_mac_init() still performs hclge_set_mac_mtu(),
> hclge_set_default_loopback() and hclge_buffer_alloc(), so the avoidance
> of MTU/buffer re-initialization described in the changelog is not
> realized here.
>
> The commit message is also framed as a pure readability/cyclomatic-
> complexity refactor, but the autoneg/speed configuration is not just
> extracted: in hclge_init_ae_dev() it is moved to a materially later
> point (after hclge_update_port_info(), TSO, GRO, VLAN, TM, RSS, FD and
> PTP setup), versus the original code where it was the very first
> hardware-touching step inside hclge_mac_init(). The reset path keeps the
> original relative ordering (helper runs immediately after
> hclge_mac_init()), so the init and reset paths are now asymmetric in
> placement, and the rationale for the late placement on the init path is
> not stated.
[Accepted] ok
^ permalink raw reply
* [PATCH net v2] nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers
From: Muhammad Bilal @ 2026-06-22 13:18 UTC (permalink / raw)
To: David Heidelberg, netdev
Cc: davem, edumazet, kuba, pabeni, horms, krzk, oe-linux-nfc,
linux-kernel, Muhammad Bilal, stable
nfc_llcp_parse_gb_tlv() and nfc_llcp_parse_connection_tlv() contain
three related bugs in their TLV parsing loops:
1. 'offset' is declared u8 but tlv_array_len is u16. When TLV data
advances offset past 255 it silently wraps to zero, causing
infinite loops or double-processing of buffer data.
2. Before reading tlv[0] (type) and tlv[1] (length) there is no
check that offset+2 <= tlv_array_len. A truncated TLV causes
an OOB read of one byte past the buffer end.
3. After reading the length field, the value bytes are accessed
without checking offset+2+length <= tlv_array_len. A crafted
length=0xFF on a short buffer causes up to 255 bytes of OOB
read past the buffer end.
Both functions are reachable without authentication via
nfc_llcp_set_remote_gb() which feeds remote LLCP general bytes
directly into nfc_llcp_parse_gb_tlv() with no additional
validation.
Fix all three issues by widening offset from u8 to u16 and adding
bounds checks for both the TLV header and value field before each
access.
Fixes: 3df40eb3a2ea ("nfc: constify several pointers to u8, char and sk_buff")
Cc: stable@vger.kernel.org
Signed-off-by: Muhammad Bilal <meatuni001@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Notes:
v2:
- Rebased onto current nfc/for-next.
- Dropped the previous nfc_llcp_recv_snl() fix since equivalent checks
were merged by commit ed85d4cbbfaa ("nfc: llcp: bound SNL TLV parsing
to the skb and add length checks").
- Retain only the fixes for u8 offset wraparound and missing TLV bounds
checks in nfc_llcp_parse_gb_tlv() and nfc_llcp_parse_connection_tlv().
- Reject invalid TLVs silently with -EINVAL; dropped the v1 pr_err()
logging, which was reachable from a remote peer.
Link: https://lore.kernel.org/netdev/20260519011937.12903-1-meatuni001@gmail.com/
net/nfc/llcp_commands.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c
index 291f26facbf3a..ca89fe967d6a2 100644
--- a/net/nfc/llcp_commands.c
+++ b/net/nfc/llcp_commands.c
@@ -193,7 +193,8 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
const u8 *tlv_array, u16 tlv_array_len)
{
const u8 *tlv = tlv_array;
- u8 type, length, offset = 0;
+ u8 type, length;
+ u16 offset = 0;
pr_debug("TLV array length %d\n", tlv_array_len);
@@ -201,9 +202,15 @@ int nfc_llcp_parse_gb_tlv(struct nfc_llcp_local *local,
return -ENODEV;
while (offset < tlv_array_len) {
+ if (offset + 2 > tlv_array_len)
+ return -EINVAL;
+
type = tlv[0];
length = tlv[1];
+ if (offset + 2 + length > tlv_array_len)
+ return -EINVAL;
+
pr_debug("type 0x%x length %d\n", type, length);
switch (type) {
@@ -243,7 +250,8 @@ int nfc_llcp_parse_connection_tlv(struct nfc_llcp_sock *sock,
const u8 *tlv_array, u16 tlv_array_len)
{
const u8 *tlv = tlv_array;
- u8 type, length, offset = 0;
+ u8 type, length;
+ u16 offset = 0;
pr_debug("TLV array length %d\n", tlv_array_len);
@@ -251,9 +259,15 @@ int nfc_llcp_parse_connection_tlv(struct nfc_llcp_sock *sock,
return -ENOTCONN;
while (offset < tlv_array_len) {
+ if (offset + 2 > tlv_array_len)
+ return -EINVAL;
+
type = tlv[0];
length = tlv[1];
+ if (offset + 2 + length > tlv_array_len)
+ return -EINVAL;
+
pr_debug("type 0x%x length %d\n", type, length);
switch (type) {
base-commit: ed85d4cbbfaa4e630c5aa0d607348b42620d976b
--
2.54.0
^ permalink raw reply related
* Re: [RFC v2] Enabling CONFIG_NTP_PPS for NOHZ by adding ntp_error to system_time_snapshot
From: David Woodhouse @ 2026-06-22 13:11 UTC (permalink / raw)
To: Thomas Gleixner, John Stultz, Stephen Boyd, Miroslav Lichvar,
Richard Cochran, linux-kernel, netdev
Cc: Rodolfo Giometti, Alexander Gordeev
In-Reply-To: <fea6f61960132ffafc94a8b74adc37d8dd2f79be.camel@infradead.org>
[-- Attachment #1: Type: text/plain, Size: 4094 bytes --]
On Mon, 2026-06-22 at 10:04 +0100, David Woodhouse wrote:
> Hm, I'm leaning towards adding it unconditionally in
> ktime_get_snapshot_id() and get_device_system_crosststamp(), and not
> adding the extra field to the system_time_snapshot at all...
That works.
https://git.infradead.org/?p=users/dwmw2/linux.git;a=shortlog;h=refs/heads/pps
David Woodhouse (4):
timekeeping: Apply extrapolated ntp_error to clock snapshots
pps: Drop the !NO_HZ_COMMON dependency from NTP_PPS
pps: Always use ktime_get_snapshot_id() for pps_get_ts()
ptp: ptp_vmclock: Add simulated 1PPS support
Booted with NOHZ_FULL in QEMU with a test program as /init that sets
the clock coarsely from vmclock using clock_settime(), then binds the
simulated 1PPS signal to the kernel's hardpps support.
Then uses PTP_SYS_OFFSET_PRECISE() every second to see the difference
between the kernel's timekeeping, and the vmclock that it's supposed to
be converging to.
The only "problem" is that if there's occasionally a nanosecond of
jitter after ntpdata->pps_jitter has literally reached zero,
hardpps_update_phase() complains:
[ 33.917103] hardpps: PPSJITTER: jitter=1, limit=0
That's *probably* only an issue for the vmclock hack; even with
upcoming PTM-based PPS devices that literally latch the counter value
on the pulse, I don't think it'll literally get to *zero*.
When chronyd on the host adjusts the time, we see it propagate to the
guest and the guest's timekeeping quickly converge from +36ns to +0ns
again...
[ 0.653414] Run /init as init process
PPS: coarse-set CLOCK_REALTIME from vmclock
[ 1.655824] pps pps0: bound kernel consumer: edge=0x1
PPS: enabled, bound to hardpps, STA_PPSTIME|STA_PPSFREQ set
PRECISE: dev-sys(adj)=+5609ns OK
EXT[ 0] diff=+5608ns
EXT[ 1] diff=+5262ns
[ 2.917076] hardpps: PPSJITTER: jitter=5173, limit=0
EXT[ 2] diff=+4914ns
EXT[ 3] diff=+1197ns
EXT[ 4] diff=-297ns
EXT[ 5] diff=-103ns
EXT[ 6] diff=+0ns
EXT[ 7] diff=+0ns
EXT[ 8] diff=-1ns
EXT[ 9] diff=+0ns
EXT[10] diff=+0ns
EXT[11] diff=+0ns
EXT[12] diff=+0ns
EXT[13] diff=+0ns
EXT[14] diff=+0ns
EXT[15] diff=+0ns
EXT[16] diff=+1ns
EXT[17] diff=+0ns
EXT[18] diff=+0ns
EXT[19] diff=-1ns
EXT[20] diff=-1ns
EXT[21] diff=+0ns
EXT[22] diff=+0ns
EXT[23] diff=-1ns
EXT[24] diff=-1ns
EXT[25] diff=-1ns
EXT[26] diff=+0ns
EXT[27] diff=+0ns
EXT[28] diff=+0ns
EXT[29] diff=+0ns
EXT[30] diff=+0ns
EXT[31] diff=+0ns
EXT[32] diff=+1ns
[ 33.917103] hardpps: PPSJITTER: jitter=1, limit=0
EXT[33] diff=+1ns
[ 34.917102] hardpps: PPSJITTER: jitter=1, limit=0
EXT[34] diff=+0ns
EXT[35] diff=+0ns
EXT[36] diff=+0ns
[ 37.917105] hardpps: PPSJITTER: jitter=1, limit=0
EXT[37] diff=+0ns
[ 38.917010] hardpps: PPSJITTER: jitter=1, limit=0
EXT[38] diff=-1ns
EXT[39] diff=+0ns
EXT[40] diff=+0ns
EXT[41] diff=-1ns
EXT[42] diff=-1ns
EXT[43] diff=+0ns
[ 44.917107] hardpps: PPSJITTER: jitter=1, limit=0
EXT[44] diff=+0ns
[ 45.917105] hardpps: PPSJITTER: jitter=1, limit=0
EXT[45] diff=-1ns
EXT[46] diff=+0ns
EXT[47] diff=-1ns
EXT[48] diff=+0ns
EXT[49] diff=-1ns
EXT[50] diff=-1ns
EXT[51] diff=+0ns
EXT[52] diff=-1ns
[ 53.917104] hardpps: PPSJITTER: jitter=1, limit=0
EXT[53] diff=+0ns
[ 54.917099] hardpps: PPSJITTER: jitter=1, limit=0
EXT[54] diff=+0ns
EXT[55] diff=+0ns
EXT[56] diff=+0ns
vmclock: host_cv=0x998433b116699 offset=0xfff667dd86c7d42d guest_cv=0x20c1d93ac6 tsc_khz=2400000
EXT[57] diff=+0ns
[ 58.917101] hardpps: PPSJITTER: jitter=18, limit=0
EXT[58] diff=+36ns
EXT[59] diff=+26ns
EXT[60] diff=+41ns
EXT[61] diff=+44ns
EXT[62] diff=+26ns
EXT[63] diff=+40ns
EXT[64] diff=+31ns
EXT[65] diff=+33ns
EXT[66] diff=+17ns
EXT[67] diff=+21ns
EXT[68] diff=+23ns
EXT[69] diff=+14ns
EXT[70] diff=+10ns
EXT[71] diff=+21ns
EXT[72] diff=+23ns
EXT[73] diff=+24ns
EXT[74] diff=+14ns
EXT[75] diff=+21ns
EXT[76] diff=+23ns
EXT[77] diff=+14ns
EXT[78] diff=+20ns
EXT[79] diff=+23ns
EXT[80] diff=+24ns
EXT[81] diff=+14ns
EXT[82] diff=+2ns
EXT[83] diff=+0ns
EXT[84] diff=+0ns
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]
^ permalink raw reply
* Re: [PATCH net v2 2/2] net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
From: Lorenzo Bianconi @ 2026-06-22 13:11 UTC (permalink / raw)
To: Simon Horman
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Wayen Yan, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260622123136.GC827683@horms.kernel.org>
[-- Attachment #1: Type: text/plain, Size: 1689 bytes --]
> On Fri, Jun 19, 2026 at 01:37:14PM +0200, Lorenzo Bianconi wrote:
> > airoha_tc_htb_alloc_leaf_queue() assigns queue IDs based on the channel
> > index (opt->qid = AIROHA_NUM_TX_RING + channel), but updates
> > real_num_tx_queues with a simple increment (num_tx_queues + 1). When QoS
> > channels are allocated sparsely (e.g., channels 0 and 3 without 1 and
> > 2), the returned qid can exceed real_num_tx_queues, causing out-of-bounds
> > accesses in the networking stack.
> > For example, allocating channel 0 then channel 3 results in
> > real_num_tx_queues = 34 but qid = 35, which is out of range [0, 34).
> > Fix this by computing real_num_tx_queues based on the highest active
> > channel index rather than using a simple counter, in both the allocation
> > and deletion paths.
> >
> > Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>
> Thanks for the update since v1.
>
> Reviewed-by: Simon Horman <horms@kernel.org>
Hi Simon,
thx for the review.
>
> FTR, there is an AI-generated review of this patch on sashiko.dev.
> I do not think that should impede the progress of this patch but
> you may want to consider it in the context of follow-up.
Even if it is not introduced by this patch, I do not think what is reported
by Sashiko is a real issue since airoha_eth driver implements
ndo_select_queue() callback and the selected queue is always in the range
[0, AIROHA_NUM_TX_RING[. HTB queues (in the range
[AIROHA_NUM_TX_RING, AIROHA_NUM_TX_RING + AIROHA_NUM_QOS_CHANNELS[) are just
'offloaded' and never used in the TC sw path. Agree?
Regards,
Lorenzo
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply
* Re: [PATCH net 4/5] net: hns3: differentiate autoneg default values between copper and fiber
From: Jijie Shao @ 2026-06-22 13:11 UTC (permalink / raw)
To: Simon Horman
Cc: shaojijie, davem, edumazet, kuba, pabeni, andrew+netdev,
shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
netdev, linux-kernel
In-Reply-To: <20260618153450.909964-1-horms@kernel.org>
on 2026/6/18 23:34, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> net: hns3: differentiate autoneg default values between copper and fiber
>
> This commit addresses a link loss issue during driver initialization on
> optical ports connected to forced-mode remote switches. It modifies
> hclge_init_ae_dev to set autonegotiation based on media type. Copper ports
> retain the AUTONEG_ENABLE default, while optical ports inherit the firmware
> pre-configured autoneg setting.
>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> @@ -11916,6 +11916,9 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
>> if (ret)
>> goto err_ptp_uninit;
>>
>> + if (hdev->hw.mac.media_type != HNAE3_MEDIA_TYPE_COPPER)
>> + hdev->hw.mac.req_autoneg = hdev->hw.mac.autoneg;
>> +
> [Severity: High]
> Does this apply the wrong speed when autonegotiation is disabled on fiber
> ports?
>
> During driver initialization, hclge_update_port_info() dynamically probes
> the SFP module and updates mac->speed, but does not update mac->req_speed,
> which retains its statically configured default capability.
>
> When hdev->hw.mac.autoneg is 0 (forced-mode), req_autoneg becomes 0, and
> then hclge_set_autoneg_speed_dup() uses req_speed:
>
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:hclge_set_autoneg_speed_dup() {
> ...
> if (!hdev->hw.mac.req_autoneg) {
> ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
> ...
> }
>
> Could this override the dynamically probed SFP speed with the static default
> speed?
[Reject] This is initializing the path. mac->speed has been parsed in hclge_configure()
and an initial value has been assigned to req_speed.
>
> [Severity: High]
> Does this unconditionally disable autonegotiation for V1 optical hardware?
>
> Looking at hclge_update_port_info(), it only queries SFP information via
> hclge_get_sfp_info() for V2 hardware. For V1 hardware, it falls back to
> hclge_get_sfp_speed(), which leaves mac->autoneg uninitialized (0):
>
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:hclge_update_port_info() {
> ...
> if (hdev->ae_dev->dev_version >= HNAE3_DEVICE_VERSION_V2) {
> speed = mac->speed;
> ret = hclge_get_sfp_info(hdev, mac);
> } else {
> speed = HCLGE_MAC_SPEED_UNKNOWN;
> ret = hclge_get_sfp_speed(hdev, &speed);
> }
> ...
> }
>
> If mac->autoneg is left as 0, wouldn't req_autoneg unconditionally be set
> to 0 here, overriding the speed and breaking connectivity for V1 devices?
[Reject] No, this process is the same as before the modification.
Before the modification, mac->autoneg was not actively assigned a value (it was initialized to 0 by default).
Thanks,
Jijie Shao
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox