Netdev List
* Re: [PATCH net-next 1/3] net: busy-poll: introduce sk_tx_busy_loop()
From: Menglong Dong @ 2026-06-17 12:00 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Menglong Dong, Jakub Kicinski, jasowang, mst, xuanzhuo, eperezma,
	andrew+netdev, davem, edumazet, pabeni, magnus.karlsson, sdf,
	horms, ast, daniel, hawk, john.fastabend, bjorn, kerneljasonxing,
	netdev, virtualization, linux-kernel, bpf
In-Reply-To: <ajJrckiXEUztBQDz@boxer>

On Wed, Jun 17, 2026 at 5:40 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Sun, Jun 14, 2026 at 06:12:46PM +0800, Menglong Dong wrote:
> > On 2026/6/14 02:21, Jakub Kicinski wrote:
> > > On Thu, 11 Jun 2026 15:12:40 +0800 menglong8.dong@gmail.com wrote:
[...]
> >
> > I'm not sure if it is a good idea to introduce the sk_tx_busy_loop().
> > Maybe we can modify the driver instead by using the same NAPI
> > for both data sending and receiving, just like others do. The
> > advantage of introduce sk_tx_busy_loop() is that we can split the
> > data sending and receiving, which maybe more efficient.
>
> Would be good if you back your changes by any performance numbers. I
> believe that drivers do tx processing via rx napi as before AF_XDP it was
> only about cleaning up writebacks, AF_XDP added more weight via actual tx
> descriptors submission.
>
> Maybe you can vibe-code virtio-net to work only with rx napi and see what
> are the results.

Hi, Maciej. I have not done such performance testing yet. It's a good
and interesting idea to run such a test on virtio-net, and I'll do it. If
there is no obvious performance difference, I'll modify virtio-net to send
data via the rx napi instead.

>
> Side note/question - Do you have a tx-only use case for AF_XDP ? I am
> planning (for a long time actually) to implement asymmetric AF_XDP
> sockets. Currently for ZC scenarios xsk socket occupies both rx and tx
> queues even when you do rx or tx only.

I think this is an interesting idea, and will be helpful in some cases.
I'm improving the performance of MySQL with AF_XDP. For this case,
tx-only is not suitable, as data reading and writing are both needed.

But for other cases, such as Redis, the workload is mostly reads. In
that case, I think it's a good idea to use such a "tx-only" ZC AF_XDP
socket.

In my case, I don't want AF_XDP to occupy the whole NIC or a whole queue,
so that other users can use the NIC too. However, ZC AF_XDP adds a little
overhead to the skb rx path, as there is an extra data copy.

If such "tx-only" ZC is supported, AF_XDP still performs well in the
read-mostly case and adds no overhead for the other users either.

I haven't used AF_XDP for such a "reading mostly" case yet, so I'm not
sure if I'm right ;)

Thanks!
Menglong Dong

>
> >
> > >
> > > Third, this series does not apply.
> >
> > Ah, I'll rebase this series if a V2 is acceptable.
> >
> > Thanks!
> > Menglong Dong
> >
> > >
> > >
> >
> >
> >
> >


* RE: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Kubalewski, Arkadiusz @ 2026-06-17 11:59 UTC (permalink / raw)
  To: Vecera, Ivan, Jiri Pirko, Vadim Fedorenko, Jakub Kicinski
  Cc: netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
	Donald Hunter, Eric Dumazet, Jakub Kicinski, Schmidt, Michal,
	Paolo Abeni, Vaananen, Pasi, Oros, Petr, Prathosh Satish,
	Simon Horman, Vadim Fedorenko, linux-kernel@vger.kernel.org
In-Reply-To: <ca33b9b8-aafa-40f0-9943-a6b6736af4e4@redhat.com>

>From: Ivan Vecera <ivecera@redhat.com>
>Sent: Monday, June 15, 2026 2:00 PM
>
>On 6/11/26 2:09 PM, Jiri Pirko wrote:
>> Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>>> On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>> Sent: Tuesday, June 9, 2026 4:59 PM
>>>>>
>>>>> On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>> Sent: Tuesday, June 9, 2026 10:51 AM
>>>>>>>
>>>>>>> Mon, Jun 08, 2026 at 07:03:46PM +0200,
>>>>>>> arkadiusz.kubalewski@intel.com
>>>>>>> wrote:
>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>> Sent: Monday, June 8, 2026 5:48 PM
>>>>>>>>>
>>>>>>>>> On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>> Sent: Sunday, May 31, 2026 9:44 PM ...
>>>>>>>>>>>           -
>>>>>>>>>>>             name: gnss
>>>>>>>>>>>             doc: GNSS recovered clock
>>>>>>>>>>> +      -
>>>>>>>>>>> +        name: int-nco
>>>>>>>>>>> +        doc: |
>>>>>>>>>>> +          Device internal numerically controlled oscillator.
>>>>>>>>>>> +          When connected as a DPLL input, the DPLL enters NCO mode
>>>>>>>>>>> +          where the output frequency is adjusted by the host via
>>>>>>>>>>> +          the PTP clock interface.
>>>>>>>>>>
>>>>>>>>>> Hi Ivan!
>>>>>>>>>>
>>>>>>>>>> How would you control this in case of automatic mode dpll?
>>>>>>>>>> Automatic mode DPLL shall be controlled on HW level, such pin
>>>>>>>>>> breaks that rule and requires some driver magic to show it is
>>>>>>>>>> higher priority than the rest of the pins?
>>>>>>>>>
>>>>>>>>> The NCO pin can be connected only in manual mode. In other words
>>>>>>>>> a
>>>>>>>>> DPLL in automatic mode cannot select NCO pin (switch to NCO mode)
>>>>>>>>> by
>>>>>>>>> its own.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Being picky on DPLL_MODE for enabling feature is not something we
>>>>>>>> can allow if it is not related to HW limitation, is it?
>>>>>>>> Could you please elaborate why it is not possible for AUTOMATIC
>>>>>>>> mode?
>>>>>>>
>>>>>>> In automatic mode, the pin selection logic is defined upon prio. I
>>>>>>> can imagine that if NCO pin has the highest prio of the available
>>>>>>> ones, it gets picked. I would be aligned 100% with automatic mode
>>>>>>> behaviour.
>>>>>>> Is there a real usecase for it?
>>>>>>>
>>>>>>> [..]
>>>>>>
>>>>>> This is not true. AUTOMATIC mode is HW solution, SW driver ONLY
>>>>>> configures priorities on the inputs, not manages the active inputs.
>>>>>> This breaks that behavior, the SW driver would have to manually
>>>>>> override the AUTOMATIC mode to be fed from such NCO pin as it doesn't
>>>>>> exist on its priority list, HW cannot pick or use it.
>>>>>
>>>>> Correct, AUTO mode is hardware feature and it should not be emulated
>>>>> by a
>>>>> driver. If the hardware does not support it then the switching
>>>>> between
>>>>> input references should be done by userspace (by monitoring ffo,
>>>>> phase_offset, operstate).
>>>>>
>>>>
>>>> Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>>>> create
>>>> such pin, which means that NCO pin would serve only a MANUAL mode
>>>> implementation.
>>>> Basically this is something we shall not allow to happen. DPLL API
>>>> should be designed to cover the case where AUTO mode is able to
>>>> implement
>>>> all features consistently.
>>>
>>> If you don't like the proposal from Jiri (NCO switch driven by NCO pin
>>> priority -> highest==enter_nco else leave_nco) then it could be
>>> possible
>>> to handle the switching by allowing the state 'connected' in AUTO mode
>>> for the NCO pin type. Then the implementation will be the same for both
>>> selection modes.
>>>
>>> Only difference would be that a user does not need to switch the device
>>> from the AUTO to MANUAL mode.
>>>
>>>>>> The real use case is that any DPLL can switch the mode to this one
>>>>>> instead of implementing MANUAL mode just to use the feature with a
>>>>>> 'virtual' pin.
>>>>>
>>>>> I don't expect this... but it is up to a driver. I don't plan such
>>>>> functionality in zl3073x as the NCO pin does not expose prio_get()
>>>>> and
>>>>> prio_set() callbacks - so it is clear that this pin cannot be part of
>>>>> the
>>>>> automatic selection.
>>>>>
>>>>> Ivan
>>>>
>>>> There is a difference between particular HW and API capabilities, with
>>>> the
>>>> proposed API we would disallow the possibility of such implementation
>>>> for
>>>> existing HW variants.
>>>>
>>>> DPLL NCO MODE would allow that but as pointed here by Ivan and by Jiri
>>>> in
>>>> the other email it would also require the extra implementation for
>>>> some
>>>> configuration - device level phase/ffo handling.
>>>>
>>>> To summarize it all, I don't have such simple solution for it.
>>>>
>>>> First thing that comes to my mind is to combine both approaches.
>>>> Make it possible for AUTOMATIC mode to also set "CONNECTED" state
>>>> on certain kind of "OVERRIDE" pins, where it could be determined by
>>>> the type of PIN and embed that logic into the DPLL subsystem.
>>>
>>> The possible states for particular pins are now handled at a driver
>>> level
>>> so the driver decides if the requested state is correct or not. So it
>>> could be easy to implement this.
>>>
>>> For auto mode allowed states:
>>> - input references: selectable / disconnected
>>> - nco pin: connected / disconnected
>>>
>>>> Basically, if driver registers such NCO pin it would be always
>>>> selected
>>>> manually, and in such case all the other pins are going to
>>>> disconnected
>>>> state while DPLL mode is also a "OVERRIDE" or something like it.
>>>
>>> I would leave this decision on the driver level... Imagine the
>>> potential
>>> HW that would allow to switch NCO mode if there is no valid input
>>> reference.
>>>
>>> Example:
>>>
>>> REF0 (prio 0) -> +------+ -> OUT0
>>> REF1 (prio 1) -> | DPLL | -> ...
>>> NCO  (prio 2) -> +------+ -> OUTn
>>>
>>> Such HW would prefer REF0 or REF1 and lock to one of them if they are
>>> qualified. But if they are NOT, then it switches to NCO mode.

Now you said it yourself: "NCO mode". I agree that it would be a mode in
that case, where instead of running on the regular/built-in XO the dpll
would run on the NCO, the user could select it, and this would be an
addition to the regular behavior.

I also agree that the pin approach might be better/easier to use: assuming
the frequency offset applies to all the outputs a given dpll drives, it
makes more sense to have it configurable on the input side.

>>>
>>> In this situation the relevant driver would allow to configure priority
>>> and state 'selectable' for this NCO pin.
>>>
>>>> Perhaps the pin type could include OVERRIDE in it's name to make it
>>>> less
>>>> confusing and needs some extra documentation.
>>>>
>>>> Thoughts?
>>> I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>>> obvious that it is not a standard input reference.
>>>
>>> Jiri, Vadim, Arek, thoughts?
>>
>> I agree with you, the driver should have the flexibility to implement
>> this according to his/hw's needs/capabilities. If it implements prio
>> selection in AUTO mode, let it have it. If it implements manual NCO pin
>> selection in AUTO mode using connected/disconnected override, let it
>> have it.

I don't know of any 'current' HW that is capable of using such an NCO input
as part of the HW-based priority source selection in AUTO mode.
But as already explained above, this is a special mode of the regular XO,
which allows configuring the DPLL's output frequency offset.

>>
>> Moreover, I actually like the "override" capability for pins in AUTO
>> mode in general. It may be handy for other usecases as well.
>>
>Arek? Vadim?
>
>Thanks,
>Ivan

Agreed, an 'override' capability of a pin would be the way to go for this
and other similar future cases.

I believe a single approach would be best. If AUTO mode needs a capability
to switch from regular behavior to 'OVERRIDE', and 'OVERRIDE' is the only
pin capability that allows such behavior in AUTO mode, then a similar
approach should be used in MANUAL mode, so that userspace knows such a pin
can always be set to "CONNECTED" and the userspace implementation for
enabling it is consistent, whether the dpll is in AUTO or MANUAL mode.
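
As a rough illustration of the state rules discussed in this thread (input references: selectable / disconnected in AUTO mode; NCO pin: connected / disconnected in either mode), here is a minimal userspace sketch. It is not the DPLL subsystem API: every `sk_`/`SK_` name is hypothetical, and it models only the 'override' variant, not the variant where a driver exposes a priority for the NCO pin.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the DPLL mode, pin type and pin state enums. */
enum sk_mode { SK_MODE_MANUAL, SK_MODE_AUTOMATIC };
enum sk_pin_kind { SK_PIN_INPUT_REF, SK_PIN_NCO };
enum sk_pin_state {
	SK_STATE_DISCONNECTED,
	SK_STATE_CONNECTED,
	SK_STATE_SELECTABLE,
};

struct sk_pin {
	enum sk_pin_kind kind;
	enum sk_pin_state state;
};

/*
 * Accept or reject a requested pin state.
 * - "disconnected" is always allowed.
 * - MANUAL mode: any pin may be "connected".
 * - AUTOMATIC mode: input references may only be "selectable", while the
 *   NCO (override) pin may only be "connected".
 */
static int sk_pin_set_state(enum sk_mode mode, struct sk_pin *pin,
			    enum sk_pin_state state)
{
	int allowed;

	assert(pin);

	if (state == SK_STATE_DISCONNECTED)
		allowed = 1;
	else if (mode == SK_MODE_MANUAL || pin->kind == SK_PIN_NCO)
		allowed = (state == SK_STATE_CONNECTED);
	else
		allowed = (state == SK_STATE_SELECTABLE);

	if (!allowed)
		return -EINVAL;

	pin->state = state;
	return 0;
}
```

With these rules, userspace enables the NCO pin with the same "connected" request in both modes, which is the consistency argued for above.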

Thank you!
Arkadiusz 


* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Petr Mladek @ 2026-06-17 11:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sebastian Andrzej Siewior, Jakub Kicinski, John Ogness,
	Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
	linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <20260617111504.GK49951@noisy.programming.kicks-ass.net>

On Wed 2026-06-17 13:15:04, Peter Zijlstra wrote:
> On Wed, Jun 17, 2026 at 12:12:07PM +0200, Petr Mladek wrote:
> > On Tue 2026-06-16 17:31:22, Sebastian Andrzej Siewior wrote:
> > > On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> > > > > 
> > > > > Adding sched and printk folks for opinions while eyeballing
> > > > > WARN_ON_DEFERRED().
> > > > 
> > > > Thanks a lot for looking into this! To be clear - the printk_deferred /
> > > > WARN_DEFERRED would be just for stable? Or there's still some
> > > > sensitivity even with nbcon?
> > > 
> > > We already have printk_deferred(). WARN_DEFERRED() would be new. I
> > > *think* this is not limited netpoll/ netconsole but all console drivers
> > > not using CON_NBCON if the printk (via WARN) occurs with the rq held.
> > > I don't remember all the details but printk_deferred() was introduced to
> > > circumvent this until printk is fixed.
> > 
> > Just to make it clear. The problem with the legacy consoles is that
> > they are called under console_lock() which is a semaphore. And it
> > calls wake_up_process() in console_unlock() when there is another
> > waiter on the lock.
> > 
> > > Once we get rid of those legacy drivers and NBCON is the default we can
> > > get rid of printk_deferred() :)
> > 
> > Yup.
> 
> Can't we push all the legacy consoles into a single legacy kthread? I
> mean, converting all consoles is of course awesome, but should we really
> wait for that?

I am afraid that converting the consoles one by one is the deal with
Linus. I could imagine moving the last few sinners into the kthread
once the majority is converted. But we are far from there :-/

Best Regards,
Petr


* Re: [PATCH] net: airoha: Fix MODULE_LICENSE to match SPDX GPL-2.0-only identifier
From: Leon Romanovsky @ 2026-06-17 11:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Wayen Yan, netdev, lorenzo, horms, pabeni, kuba, edumazet,
	andrew+netdev, angelogioacchino.delregno, matthias.bgg,
	linux-arm-kernel, linux-mediatek
In-Reply-To: <178156440888.329386.11011872053824456703.git-patchwork-notify@kernel.org>

On Mon, Jun 15, 2026 at 11:00:08PM +0000, patchwork-bot+netdevbpf@kernel.org wrote:
> Hello:
> 
> This patch was applied to netdev/net-next.git (main)
> by Jakub Kicinski <kuba@kernel.org>:
> 
> On Sun, 14 Jun 2026 07:52:39 +0800 you wrote:
> > Both airoha_eth.c and airoha_npu.c declare SPDX-License-Identifier:
> > GPL-2.0-only but use MODULE_LICENSE("GPL"), which the kernel module
> > loader interprets as GPL-2.0+ (any GPL version). This mismatch causes
> > license compliance tools (FOSSology, ScanCode, etc.) to misidentify
> > the effective license as more permissive than intended.
> > 
> > Replace MODULE_LICENSE("GPL") with MODULE_LICENSE("GPL v2") to
> > align with the GPL-2.0-only SPDX identifier. Per include/linux/module.h,
> > "GPL v2" maps to GPL-2.0-only, matching the source files' declared
> > license.
> > 
> > [...]
> 
> Here is the summary with links:
>   - net: airoha: Fix MODULE_LICENSE to match SPDX GPL-2.0-only identifier
>     https://git.kernel.org/netdev/net-next/c/b0d62ed16424

Jakub,

This patch doesn't fix anything. License rules are pretty clear.

Documentation/process/license-rules.rst
    "GPL"                         Module is licensed under GPL version 2. This
                                  does not express any distinction between
                                  GPL-2.0-only or GPL-2.0-or-later. The exact
                                  license information can only be determined
                                  via the license information in the
                                  corresponding source files.

    "GPL v2"                      Same as "GPL". It exists for historic
                                  reasons.

> 
> You are awesome, thank you!
> -- 
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
> 
> 
> 


* Re: [PATCH net] selftests: vlan_bridge_binding: Fix flaky operational state check
From: Petr Machata @ 2026-06-17 11:46 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, kuba, pabeni, edumazet, petrm, horms, razor
In-Reply-To: <20260617104323.1069457-1-idosch@nvidia.com>


Ido Schimmel <idosch@nvidia.com> writes:

> check_operstate() busy waits for up to one second for the operational
> state to change to the expected state. This is not enough since carrier
> loss events can be delayed by the kernel for up to one second (see
> __linkwatch_run_queue()), leading to sporadic failures.
>
> Fix by increasing the busy wait period to two seconds.
>
> Fixes: dca12e9ab760 ("selftests: net: Add a VLAN bridge binding selftest")
> Reported-by: Jakub Kicinski <kuba@kernel.org>
> Closes: https://lore.kernel.org/netdev/20260616092733.3a31be4d@kernel.org/
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>

Reviewed-by: Petr Machata <petrm@nvidia.com>


* Re: [PATCH net] net: rnpgbe: fix mailbox endianness handling
From: Yibo Dong @ 2026-06-17 11:46 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, vadim.fedorenko,
	netdev, linux-kernel, yaojun
In-Reply-To: <bf5366c5-88ae-418c-9c5d-b2249a7f43fc@lunn.ch>

On Wed, Jun 17, 2026 at 11:40:42AM +0200, Andrew Lunn wrote:

Hi Andrew:
> On Wed, Jun 17, 2026 at 04:35:31PM +0800, Dong Yibo wrote:
> > Mailbox data is exchanged through 32-bit MMIO accesses but the
> > mailbox payload is defined using little-endian FW structures with
> > __le16 and __le32 fields.
> 
> Given you are using __le16 and __le32, why did sparse not find these
> issues? It would be good to understand this, because if sparse missed
> this, what else has sparse missed which is also broken?
> 
> 	Andrew
> 

My understanding is as follows:
The firmware structures are defined with __le16 / __le32 for wire format,
but the original code cast these struct pointers to u32 * before passing
them to the mailbox read/write routines:
- Send path: (u32 *)&req -> msg buffer -> writel()
- Receive path: readl() -> msg buffer -> (u32 *)&reply
Sparse only sees pure u32 = u32 assignments here, so no type mismatch is
reported. In fact, readl()/writel() operate on 'native CPU-ordered u32
values', not little-endian values.
The __le annotations correctly describe the firmware wire format, but
the original mailbox transport using plain u32 * buffers erased all endian
type information at the MMIO boundary, hiding this mismatch from sparse.

I have also checked the rest of the rnpgbe driver: all __le types are
confined strictly to mailbox firmware structures, and this fix covers all
MMIO <-> structure data transfer paths. Comparisons between two __le fields
(e.g., reply->opcode != req->opcode) are safe, as both values share the
same byte order.
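
To illustrate the type-erasure point, here is a minimal userspace sketch, not the rnpgbe code. `struct fw_reply`, `mbx_read_le()` and the byte helpers are hypothetical names; `put_le32()` and the `get_le*()` helpers stand in for the kernel's `cpu_to_le32()` and `le16_to_cpu()`/`le32_to_cpu()`. Each mailbox word is treated as a CPU-order value, as `readl()` would return it, and is written into the little-endian structure explicitly instead of through a `(u32 *)` cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical firmware reply; byte arrays stand in for __le16/__le32. */
struct fw_reply {
	uint8_t opcode_le[2];
	uint8_t flags_le[2];
	uint8_t value_le[4];
};

/* Store a CPU-order value as little-endian bytes (like cpu_to_le32()). */
static void put_le32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v & 0xff);
	dst[1] = (uint8_t)((v >> 8) & 0xff);
	dst[2] = (uint8_t)((v >> 16) & 0xff);
	dst[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read little-endian fields back into CPU order (like le16/32_to_cpu()). */
static uint16_t get_le16(const uint8_t *src)
{
	return (uint16_t)(src[0] | (src[1] << 8));
}

static uint32_t get_le32(const uint8_t *src)
{
	return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
	       ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}

/*
 * Copy mailbox words into the reply structure. regs[] models the values
 * returned by readl(): native CPU-order integers, not little-endian data.
 */
static void mbx_read_le(struct fw_reply *reply, const uint32_t *regs,
			size_t nwords)
{
	uint8_t *dst = (uint8_t *)reply;
	size_t i;

	assert(nwords * 4 <= sizeof(*reply));

	for (i = 0; i < nwords; i++)
		put_le32(dst + 4 * i, regs[i]);
}
```

Because the conversion is spelled out per word, the result is the same on little-endian and big-endian hosts, and in kernel code the equivalent typed conversion is something sparse can check.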

Thanks for your feedback!


* Re: [PATCH net] dpaa2-switch: fix VLAN upper check not rejecting bridge join
From: Simon Horman @ 2026-06-17 11:45 UTC (permalink / raw)
  To: ioana.ciornei
  Cc: Simon Horman, andrew+netdev, davem, edumazet, kuba, pabeni,
	netdev, f.fainelli, vladimir.oltean, linux-kernel
In-Reply-To: <20260616105430.3725910-1-ioana.ciornei@nxp.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
dpaa2-switch: fix VLAN upper check not rejecting bridge join

The blamed commit refactored the prechangeupper event handling but
failed to actually return an error in case
dpaa2_switch_prevent_bridging_with_8021q_upper() detected a 802.1q upper
on a port which tries to join a bridge. Fix this by returning err
instead of 0.

> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> index 52c1cb9cb7e02..46ae81c2fa013 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c
> @@ -2177,7 +2177,7 @@ dpaa2_switch_prechangeupper_sanity_checks(struct net_device *netdev,
>  	if (err) {
>  		NL_SET_ERR_MSG_MOD(extack,
>  				   "Cannot join a bridge while VLAN uppers are present");
> -		return 0;
> +		return err;

[Severity: High]
If a user adds an 802.1q VLAN upper device to a port after it has already
successfully joined a bridge, won't this check trigger when the port leaves
the bridge?

In drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c,
dpaa2_switch_prechangeupper_sanity_checks() is called for both bridge joins
and unlinks from dpaa2_switch_port_prechangeupper():

    err = dpaa2_switch_prechangeupper_sanity_checks(netdev, upper_dev, extack);
    if (err)
        return err;

    if (!info->linking)
        dpaa2_switch_port_pre_bridge_leave(netdev);

Since returning an error aborts the notifier chain, an unlink notification
would fail this check, preventing dpaa2_switch_port_pre_bridge_leave() from
running. Does this lead to state inconsistency and skipped switchdev
unoffloads?

Could this check be guarded so it only runs when info->linking is true?
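
A minimal userspace sketch of the guard suggested above, under stated assumptions: `struct port_model`, `sanity_checks()` and `prechangeupper()` are hypothetical stand-ins for the driver functions, and `-EBUSY` is an arbitrary error code chosen for the model, not necessarily what the driver returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced model of the port state relevant to the check. */
struct port_model {
	bool has_vlan_upper;	/* an 802.1q upper exists on the port */
	bool left_bridge;	/* set once the pre-leave handler has run */
};

/* Models the sanity check: refuse while VLAN uppers are present. */
static int sanity_checks(const struct port_model *port)
{
	return port->has_vlan_upper ? -EBUSY : 0;
}

/* Models the prechangeupper handler with the check applied only on join. */
static int prechangeupper(struct port_model *port, bool linking)
{
	assert(port);

	if (linking)
		return sanity_checks(port);

	/* Unlink: always run the leave path so state stays consistent. */
	port->left_bridge = true;
	return 0;
}
```

In this model a port that gained a VLAN upper after joining can still leave the bridge, while a new join is rejected.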

>  	}
>  
>  	netdev_for_each_lower_dev(upper_dev, other_dev, iter) {


* Re: [PATCH net-next] docs: exclude driver and netdevsim bugs
From: Leon Romanovsky @ 2026-06-17 11:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, johannes,
	corbet, skhan, workflows, linux-doc
In-Reply-To: <20260615091909.78ad2b03@kernel.org>

On Mon, Jun 15, 2026 at 09:19:09AM -0700, Jakub Kicinski wrote:
> On Mon, 15 Jun 2026 12:14:36 +0300 Leon Romanovsky wrote:
> > > +Unless explicitly excluded all bug fixes should be targeting the ``net``
> > > +tree and contain an appropriate Fixes tag.
> > > +
> > > +Obvious exclusions:
> > > +
> > > + - fixes for bugs which only exist in ``net-next`` should target ``net-next``
> > > +   (please still include the Fixes tag in the commit message)
> > > + - bugs which cannot be reached, e.g. in code paths not executed given
> > > +   current in-tree callers
> > > + - fixes for compiler warnings and typos  
> > 
> > If you decide to resubmit this patch, could you please remove "fixes for
> > compiler warnings" from the exclusion list?
> > 
> > It is quite frustrating to receive a compiler warning originating from a
> > different subsystem after the merge window, knowing it will not be
> > addressed until the next merge window (around eight weeks later).
> 
> Agreed, FWIW, but not planning to resubmit.
> I think people misunderstood that I'm __documenting what I already do__
> rather than trying to have a discussion :/

I'm pretty sure that people aren't aware of it.

Thanks


* Re: [PATCH net 4/4] net: ti: icssg: Fix XSK zero copy TX during application wakeup
From: Meghana Malladi @ 2026-06-17 11:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: diogo.ivo, haokexin, vadim.fedorenko, devnexen, horms,
	jacob.e.keller, sdf, john.fastabend, hawk, daniel, ast, pabeni,
	edumazet, davem, andrew+netdev, bpf, linux-kernel, netdev,
	linux-arm-kernel, srk, Vignesh Raghavendra, Roger Quadros,
	danishanwar
In-Reply-To: <20260616081954.0d12aa13@kernel.org>

On 6/16/26 20:49, Jakub Kicinski wrote:
> On Tue, 16 Jun 2026 16:41:00 +0530 Meghana Malladi wrote:
>> On 6/16/26 04:51, Jakub Kicinski wrote:
>>> On Fri, 12 Jun 2026 00:27:44 +0530 Meghana Malladi wrote:
>>>> @@ -169,9 +169,6 @@ static int emac_xsk_xmit_zc(struct prueth_emac *emac,
>>>>    
>>>>    		num_tx++;
>>>>    	}
>>>> -
>>>> -	xsk_tx_release(tx_chn->xsk_pool);
>>>> -	return num_tx;
>>>
>>> Why are you deleting this?
>>>    
>>
>> xsk_sendmsg() also calls this without an rcu-lock when transmitting the
>> packets if the xmit was successful, so I was assuming it is not required
>> and I removed this.
> 
> I think you still need it. Besides, seems like a separate cleanup.
> 

Okay, I will add it back then.

>>>>    void prueth_xmit_free(struct prueth_tx_chn *tx_chn,
>>>> @@ -279,9 +276,6 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>>>>    		num_tx++;
>>>>    	}
>>>>    
>>>> -	if (!num_tx)
>>>> -		return 0;
>>>
>>> Does something prevent us from running all this code if budget is 0?
>>> If budget is 0 we can complete normal Tx with skbs but we must
>>> not touch any AF-XDP related state.
>>
>> Can you elaborate more, I couldn't interpret your comment here
> 
> netpoll may call napi from any context, including from IRQ.
> It uses budget of 0 to indicate that it's trying to only reap tx
> completions, without doing any Rx or XDP work. XDPs can't be called
> from IRQ context.
> 

Ah, I wasn't aware of this. I will add a check to ensure AF_XDP runs only
when budget > 0 then.

>>>>    	netif_txq = netdev_get_tx_queue(ndev, chn);
>>>>    	netdev_tx_completed_queue(netif_txq, num_tx, total_bytes);
>>>>    
>>>> @@ -306,7 +300,9 @@ int emac_tx_complete_packets(struct prueth_emac *emac, int chn,
>>>>    
>>>>    		netif_txq = netdev_get_tx_queue(ndev, chn);
>>>>    		txq_trans_cond_update(netif_txq);
>>>
>>> This looks misplaced, now we will hit it even if we didn't complete
>>> or submit any Tx.
>>
>> This code needs to be hit for packet transmission in zero copy mode.
>> emac_xsk_xmit_zc() submits the packets to the DMA in NAPI context,
>> when application wakes up the driver and triggers NAPI. Once DMA
>> transfer is done, irq gets triggered NAPI gets called which will handle
>> the tx packet completion + submit next Tx batch packets to the DMA.
>>
>> if (tx_chn->xsk_pool) -> check ensure this hits and runs for zero copy
>> only. Also above check (!num_tx) returns early during the application
>> wakeup (where budget is zero), hence it is removed.
> 
> I'm commenting on txq_trans_cond_update(), you're calling it
> effectively on every NAPI call when XSK is bound, whether
> Tx is making progress or not.

Ok, got it, but I wonder if it would hurt in any way to call this even when
there are no Tx completions.
Nonetheless, I will move this inside the xsk_frames_done check.
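
The two changes agreed on above can be sketched in a simplified userspace model. This is not the icssg driver code: `struct tx_chn_model` and `tx_complete()` are hypothetical, and the counters only stand in for reaping completions, calling `txq_trans_cond_update()` and submitting the next zero-copy batch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one Tx channel. */
struct tx_chn_model {
	bool xsk_bound;		/* an XSK pool is attached to this channel */
	int pending_skb;	/* completed skb descriptors not yet reaped */
	int pending_xsk;	/* completed XSK descriptors not yet reaped */
	int xsk_submitted;	/* zero-copy batches submitted so far */
	int trans_updates;	/* times the queue trans_start was refreshed */
};

/* Returns the number of Tx completions handled in this call. */
static int tx_complete(struct tx_chn_model *chn, int budget)
{
	int num_tx, xsk_frames_done;

	assert(chn && budget >= 0);

	/* Normal skb completions may be reaped even with budget == 0. */
	num_tx = chn->pending_skb;
	chn->pending_skb = 0;

	/*
	 * budget == 0 models a netpoll-style call, possibly from IRQ
	 * context: leave all AF_XDP state untouched.
	 */
	if (budget == 0 || !chn->xsk_bound)
		return num_tx;

	xsk_frames_done = chn->pending_xsk;
	chn->pending_xsk = 0;
	num_tx += xsk_frames_done;

	/* Refresh trans_start only when XSK Tx actually made progress. */
	if (xsk_frames_done)
		chn->trans_updates++;

	/* Stands in for submitting the next zero-copy batch. */
	chn->xsk_submitted++;
	return num_tx;
}
```

A call with budget 0 then reaps only the skb completions, and a later call with a non-zero budget picks up the XSK work.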


* Re: [PATCH net-next 0/2] appletalk: move the protocol out of tree
From: Carsten Strotmann @ 2026-06-17 11:15 UTC (permalink / raw)
  To: Jakub Kicinski, Carsten Strotmann
  Cc: John Paul Adrian Glaubitz, davem, netdev, edumazet, pabeni,
	andrew+netdev, horms, geert, chleroy, npiggin, mpe, maddy,
	linux-mips, linux-m68k, linuxppc-dev
In-Reply-To: <20260616084901.3319d82e@kernel.org>

Hi Jakub,

On Tuesday 16 June 2026 05:49:01 PM (+02:00), Jakub Kicinski wrote:

 > > the solution, as Adrian pointed out, is to leave these features in
 > > the Linux kernel but have them disabled by default.
 > 
 > I think y'all need to internalize that "just leave it in" means work.
 > _Someone_ has to handle the reports and patches. And since nobody is
 > doing that the code is going to GitHub, where it can continue to "just
 > be left" or whatever, without racking up CVEs for the Linux kernel
 > and leading to maintainer burn out :/
 > 

That's a good point. The large influx of reports is a problem,
and maintainer burnout is too high a cost.

 > > Maybe put a warning message in the kernel config tools that people
 > > should only enable these if they know what they are doing.
 > > 
 > > These "retro"-features should not pose any security risk if they are
 > > not compiled into a kernel.
 > 
 > Nobody is stopping you from using this code! It's perfectly suitable 
 > to be an out of tree module. Maybe it'd be harder if someone wanted to
 > remove a CPU architecture you want to use, but protocols are perfectly
 > fine as loadable modules. You can continue to use the code from:
 >  https://github.com/linux-netdev/mod-orphan
 > 
 > Presumably you could get Debian to package that and you wouldn't even
 > know the sources no longer live in the kernel tree.
 > 

It seems the current situation is the price of success (of Linux, which is
good).

I guess the way to go would be to move these old drivers to userspace in
order to reduce dependencies on the Linux kernel. But that is a task not
for the Linux maintainers, but for the retro community.

Thanks for your work and the background information

Carsten

-- 
https://strotmann.de


* [PATCH net 4/5] net: hns3: differentiate autoneg default values between copper and fiber
From: Jijie Shao @ 2026-06-17 11:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
	netdev, linux-kernel, shaojijie
In-Reply-To: <20260617112721.75186-1-shaojijie@huawei.com>

From: Shuaisong Yang <yangshuaisong@h-partners.com>

Fix a link loss issue during driver initialization on optical ports
connected to forced-mode (non-autoneg) remote switches.

Previously, during driver probe or initialization, hclge_configure()
blindly hardcoded hdev->hw.mac.req_autoneg to AUTONEG_ENABLE for all
media types. While this is necessary for copper (BASE-T) ports to
establish a link, many high-speed optical (fiber) ports in data
centers are connected to switches running in forced mode (fixed speed,
autoneg disabled). Forcing autoneg on these optical ports during
initialization causes a permanent link failure since the remote end
refuses to respond to autoneg pulses.

Fix this by implementing media-type differentiated initialization in
hclge_init_ae_dev(). Copper ports continue to default to
AUTONEG_ENABLE, while optical ports strictly inherit the preset
autoneg status pre-configured by the firmware (hdev->hw.mac.autoneg),
preserving native compatibility with forced-mode network environments.

Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 63e7b7458de0..853e97b0b6ff 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -11916,6 +11916,9 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	if (ret)
 		goto err_ptp_uninit;
 
+	if (hdev->hw.mac.media_type != HNAE3_MEDIA_TYPE_COPPER)
+		hdev->hw.mac.req_autoneg = hdev->hw.mac.autoneg;
+
 	ret = hclge_set_autoneg_speed_dup(hdev);
 	if (ret) {
 		dev_err(&pdev->dev,
-- 
2.33.0



* [PATCH net 5/5] net: hns3: fix init failure caused by lane_num contamination
From: Jijie Shao @ 2026-06-17 11:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
	netdev, linux-kernel, shaojijie
In-Reply-To: <20260617112721.75186-1-shaojijie@huawei.com>

From: Shuaisong Yang <yangshuaisong@h-partners.com>

Fix an initialization (probe) failure introduced when the driver
attempts to pre-query port settings from the firmware before link
setup.

To accurately implement the media-type differentiated autoneg
initialization (introduced in the previous patch), the driver queries
the firmware for preset autoneg configurations via
hclge_update_port_info() before invoking link setup. However, this
query also inadvertently pulls the stale 'lane_num' value from the last
active lifecycle (e.g., lane_num = 4 from a previous 100G link up state)
and overwrites the driver's runtime storage.

When the driver later tries to initialize the MAC with its default
speed (e.g., 25G, which requires 1 lane) but passes the stale
lane_num=4, the firmware rejects this invalid hardware parameter
combination (25G with 4 lanes) with -EINVAL, causing the entire driver
probe/initialization sequence to fail.

Fix this by introducing a new user-intent tracking variable
`req_lane_num`. Initialize `req_lane_num = 0` during probe, and pass it
into hclge_cfg_mac_speed_dup_hw(). According to the firmware
specification, passing a lane_num of 0 triggers the firmware's
automatic fallback mechanism to select the correct number of lanes
matching the speed, effectively neutralizing any cross-lifecycle
parameter contamination.

Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 8 +++++++-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 853e97b0b6ff..50837d2c7998 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1577,6 +1577,11 @@ static int hclge_configure(struct hclge_dev *hdev)
 	hdev->hw.mac.req_autoneg = AUTONEG_ENABLE;
 	hdev->hw.mac.req_duplex = DUPLEX_FULL;
 
+	/* When lane_num is 0, the firmware will automatically
+	 * select the appropriate lane_num based on the speed.
+	 */
+	hdev->hw.mac.req_lane_num = 0;
+
 	hclge_parse_link_mode(hdev, cfg.speed_ability);
 
 	hdev->hw.mac.max_speed = hclge_get_max_speed(cfg.speed_ability);
@@ -2652,6 +2657,7 @@ static int hclge_cfg_mac_speed_dup_h(struct hnae3_handle *handle, int speed,
 	if (ret)
 		return ret;
 
+	hdev->hw.mac.req_lane_num = lane_num;
 	if (speed != SPEED_UNKNOWN)
 		hdev->hw.mac.req_speed = (u32)speed;
 	if (duplex != DUPLEX_UNKNOWN)
@@ -11747,7 +11753,7 @@ static int hclge_set_autoneg_speed_dup(struct hclge_dev *hdev)
 	if (!hdev->hw.mac.req_autoneg) {
 		ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
 						 hdev->hw.mac.req_duplex,
-						 hdev->hw.mac.lane_num);
+						 hdev->hw.mac.req_lane_num);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 032b472d2368..4ca6458625a9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -287,6 +287,7 @@ struct hclge_mac {
 	u8 support_autoneg;
 	u8 speed_type;	/* 0: sfp speed, 1: active speed */
 	u8 lane_num;
+	u8 req_lane_num;
 	u32 speed;
 	u32 req_speed;
 	u32 max_speed;
-- 
2.33.0


^ permalink raw reply related

* [PATCH net 1/5] net: hns3: unify copper port ksettings configuration path
From: Jijie Shao @ 2026-06-17 11:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
	netdev, linux-kernel, shaojijie
In-Reply-To: <20260617112721.75186-1-shaojijie@huawei.com>

From: Shuaisong Yang <yangshuaisong@h-partners.com>

Refactor hns3_set_link_ksettings() and hclge_set_phy_link_ksettings()
to unify the configuration path for copper ports.

Previously, netdevs with a native kernel PHY attached bypassed the main
MAC parameter caching logic and returned early via
phy_ethtool_ksettings_set(). This prevented the driver from updating
hdev->hw.mac.req_xxx variables for kernel PHY setups, leaving them
out-of-sync during reset recovery.

Clean this up by routing all copper port configurations through
ops->set_phy_link_ksettings(), and performing driver-level or kernel-level
PHY arbitration inside hclge_set_phy_link_ksettings() via
hnae3_dev_phy_imp_supported(). This ensures that the user's intended link
profiles (req_speed, req_duplex, req_autoneg) are uniformly recorded
across all copper and fiber deployment topologies, laying the groundwork
for stable reset recovery.

Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3_ethtool.c    | 26 ++++++++-----------
 .../hisilicon/hns3/hns3pf/hclge_main.c        | 24 ++++++++++++++---
 2 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 9cb7ce9fd311..0c215f5c6a9b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -811,12 +811,11 @@ static int hns3_get_link_ksettings(struct net_device *netdev,
 }
 
 static int hns3_check_ksettings_param(const struct net_device *netdev,
-				      const struct ethtool_link_ksettings *cmd)
+				      const struct ethtool_link_ksettings *cmd,
+				      u8 media_type)
 {
 	struct hnae3_handle *handle = hns3_get_handle(netdev);
 	const struct hnae3_ae_ops *ops = hns3_get_ops(handle);
-	u8 module_type = HNAE3_MODULE_TYPE_UNKNOWN;
-	u8 media_type = HNAE3_MEDIA_TYPE_UNKNOWN;
 	u32 lane_num;
 	u8 autoneg;
 	u32 speed;
@@ -836,9 +835,6 @@ static int hns3_check_ksettings_param(const struct net_device *netdev,
 			return 0;
 	}
 
-	if (ops->get_media_type)
-		ops->get_media_type(handle, &media_type, &module_type);
-
 	if (cmd->base.duplex == DUPLEX_HALF &&
 	    media_type != HNAE3_MEDIA_TYPE_COPPER) {
 		netdev_err(netdev,
@@ -863,6 +859,8 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 	struct hnae3_handle *handle = hns3_get_handle(netdev);
 	struct hnae3_ae_dev *ae_dev = hns3_get_ae_dev(handle);
 	const struct hnae3_ae_ops *ops = hns3_get_ops(handle);
+	u8 module_type = HNAE3_MODULE_TYPE_UNKNOWN;
+	u8 media_type = HNAE3_MEDIA_TYPE_UNKNOWN;
 	int ret;
 
 	/* Chip don't support this mode. */
@@ -878,22 +876,20 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
 		  cmd->base.autoneg, cmd->base.speed, cmd->base.duplex,
 		  cmd->lanes);
 
-	/* Only support ksettings_set for netdev with phy attached for now */
-	if (netdev->phydev) {
-		if (cmd->base.speed == SPEED_1000 &&
-		    cmd->base.autoneg == AUTONEG_DISABLE)
-			return -EINVAL;
+	if (!ops->get_media_type)
+		return -EOPNOTSUPP;
+	ops->get_media_type(handle, &media_type, &module_type);
 
-		return phy_ethtool_ksettings_set(netdev->phydev, cmd);
-	} else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
-		   ops->set_phy_link_ksettings) {
+	if (media_type == HNAE3_MEDIA_TYPE_COPPER) {
+		if (!ops->set_phy_link_ksettings)
+			return -EOPNOTSUPP;
 		return ops->set_phy_link_ksettings(handle, cmd);
 	}
 
 	if (ae_dev->dev_version < HNAE3_DEVICE_VERSION_V2)
 		return -EOPNOTSUPP;
 
-	ret = hns3_check_ksettings_param(netdev, cmd);
+	ret = hns3_check_ksettings_param(netdev, cmd, media_type);
 	if (ret)
 		return ret;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index dd4045c773d4..5a00797d9252 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3358,8 +3358,8 @@ static int hclge_get_phy_link_ksettings(struct hnae3_handle *handle,
 }
 
 static int
-hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
-			     const struct ethtool_link_ksettings *cmd)
+hclge_ethtool_ksettings_set(struct hnae3_handle *handle,
+			    const struct ethtool_link_ksettings *cmd)
 {
 	struct hclge_desc desc[HCLGE_PHY_LINK_SETTING_BD_NUM];
 	struct hclge_vport *vport = hclge_get_vport(handle);
@@ -3400,10 +3400,28 @@ hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
 		return ret;
 	}
 
+	linkmode_copy(hdev->hw.mac.advertising, cmd->link_modes.advertising);
+	return 0;
+}
+
+static int
+hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
+			     const struct ethtool_link_ksettings *cmd)
+{
+	struct hclge_vport *vport = hclge_get_vport(handle);
+	struct hclge_dev *hdev = vport->back;
+	int ret;
+
+	if (hnae3_dev_phy_imp_supported(hdev))
+		ret = hclge_ethtool_ksettings_set(handle, cmd);
+	else
+		ret = phy_ethtool_ksettings_set(handle->netdev->phydev, cmd);
+	if (ret)
+		return ret;
+
 	hdev->hw.mac.req_autoneg = cmd->base.autoneg;
 	hdev->hw.mac.req_speed = cmd->base.speed;
 	hdev->hw.mac.req_duplex = cmd->base.duplex;
-	linkmode_copy(hdev->hw.mac.advertising, cmd->link_modes.advertising);
 
 	return 0;
 }
-- 
2.33.0


^ permalink raw reply related

* [PATCH net 3/5] net: hns3: fix permanent link down deadlock after reset
From: Jijie Shao @ 2026-06-17 11:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
	netdev, linux-kernel, shaojijie
In-Reply-To: <20260617112721.75186-1-shaojijie@huawei.com>

From: Shuaisong Yang <yangshuaisong@h-partners.com>

Fix a critical race condition deadlock where the network interface
remains permanently Link Down after a hardware reset under specific
ethtool sequences.

This issue exclusively manifests in firmware-controlled PHY topologies
where the driver relies on the IMP firmware to arbitrate link parameters.
Standard devices driven by the kernel's native PHY_LIB are unaffected.

The deadlock occurs via the following path:
1. User disables autoneg and forces an unmatched speed, forcing link
   down: `ethtool -s ethx autoneg off speed 10 duplex full`
2. User re-enables autoneg: `ethtool -s ethx autoneg on`. The netdev
   stack passes cmd->base.speed as SPEED_UNKNOWN (0xffffffff).
3. Driver saves req_autoneg=1, but before the interface can link up,
   a hardware reset is triggered.
4. During reset recovery, MAC init reads the un-synchronized runtime
   state mac.autoneg (which is still 0/OFF), misinterprets it as
   forced mode, and pushes the cached SPEED_UNKNOWN into the hardware
   registers, causing the MAC firmware state machine to freeze.
   Meanwhile, PHY init reads req_autoneg=1 and enables PHY autoneg.

Since the MAC is frozen with 0xffffffff and PHY is running autoneg,
they mismatch permanently.

Fix this by:
1. Intercepting SPEED_UNKNOWN/DUPLEX_UNKNOWN in
   hclge_set_phy_link_ksettings() and hclge_cfg_mac_speed_dup_h() to
   prevent them from corrupting the driver's cached valid configuration.
2. Saving req_autoneg in hclge_set_autoneg().
3. Aligning the state judgment in hclge_set_autoneg_speed_dup() to use
   req_autoneg instead of the un-synchronized runtime mac.autoneg,
   ensuring both MAC and PHY consistently enter the autoneg branch to
   eliminate configuration discrepancies during reset recovery.

Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 .../hisilicon/hns3/hns3pf/hclge_main.c        | 22 +++++++++++++------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 2c74675b149f..63e7b7458de0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2652,8 +2652,10 @@ static int hclge_cfg_mac_speed_dup_h(struct hnae3_handle *handle, int speed,
 	if (ret)
 		return ret;
 
-	hdev->hw.mac.req_speed = (u32)speed;
-	hdev->hw.mac.req_duplex = duplex;
+	if (speed != SPEED_UNKNOWN)
+		hdev->hw.mac.req_speed = (u32)speed;
+	if (duplex != DUPLEX_UNKNOWN)
+		hdev->hw.mac.req_duplex = duplex;
 
 	return 0;
 }
@@ -2684,6 +2686,7 @@ static int hclge_set_autoneg(struct hnae3_handle *handle, bool enable)
 {
 	struct hclge_vport *vport = hclge_get_vport(handle);
 	struct hclge_dev *hdev = vport->back;
+	int ret;
 
 	if (!hdev->hw.mac.support_autoneg) {
 		if (enable) {
@@ -2695,7 +2698,10 @@ static int hclge_set_autoneg(struct hnae3_handle *handle, bool enable)
 		}
 	}
 
-	return hclge_set_autoneg_en(hdev, enable);
+	ret = hclge_set_autoneg_en(hdev, enable);
+	if (!ret)
+		hdev->hw.mac.req_autoneg = enable;
+	return ret;
 }
 
 static int hclge_get_autoneg(struct hnae3_handle *handle)
@@ -3406,8 +3412,10 @@ hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
 		return ret;
 
 	hdev->hw.mac.req_autoneg = cmd->base.autoneg;
-	hdev->hw.mac.req_speed = cmd->base.speed;
-	hdev->hw.mac.req_duplex = cmd->base.duplex;
+	if (cmd->base.speed != SPEED_UNKNOWN)
+		hdev->hw.mac.req_speed = cmd->base.speed;
+	if (cmd->base.duplex != DUPLEX_UNKNOWN)
+		hdev->hw.mac.req_duplex = cmd->base.duplex;
 
 	return 0;
 }
@@ -11731,12 +11739,12 @@ static int hclge_set_autoneg_speed_dup(struct hclge_dev *hdev)
 	int ret;
 
 	if (hdev->hw.mac.support_autoneg) {
-		ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
+		ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.req_autoneg);
 		if (ret)
 			return ret;
 	}
 
-	if (!hdev->hw.mac.autoneg) {
+	if (!hdev->hw.mac.req_autoneg) {
 		ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
 						 hdev->hw.mac.req_duplex,
 						 hdev->hw.mac.lane_num);
-- 
2.33.0


^ permalink raw reply related

* [PATCH net 2/5] net: hns3: refactor MAC autoneg and speed configuration
From: Jijie Shao @ 2026-06-17 11:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
	netdev, linux-kernel, shaojijie
In-Reply-To: <20260617112721.75186-1-shaojijie@huawei.com>

From: Shuaisong Yang <yangshuaisong@h-partners.com>

Extract the MAC autoneg and speed/duplex/lane configuration logic out
of hclge_mac_init() and encapsulate it into a new dedicated helper
function hclge_set_autoneg_speed_dup().

Currently, hclge_mac_init() handles various heterogeneous operations
including MTU settings, buffer allocation, and loopback initialization.
Stripping the complex link state machine configuration improves code
readability and reduces cyclomatic complexity. This helper function
will also be invoked during the hardware reset recovery path to
re-apply link settings without repeating unnecessary buffer or MTU
initializations.

Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 .../hisilicon/hns3/hns3pf/hclge_main.c        | 49 +++++++++++++------
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 5a00797d9252..2c74675b149f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2957,20 +2957,6 @@ static int hclge_mac_init(struct hclge_dev *hdev)
 	if (!test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state))
 		hdev->hw.mac.duplex = HCLGE_MAC_FULL;
 
-	if (hdev->hw.mac.support_autoneg) {
-		ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
-		if (ret)
-			return ret;
-	}
-
-	if (!hdev->hw.mac.autoneg) {
-		ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
-						 hdev->hw.mac.req_duplex,
-						 hdev->hw.mac.lane_num);
-		if (ret)
-			return ret;
-	}
-
 	mac->link = 0;
 
 	if (mac->user_fec_mode & BIT(HNAE3_FEC_USER_DEF)) {
@@ -11740,6 +11726,27 @@ static int hclge_set_wol(struct hnae3_handle *handle,
 	return ret;
 }
 
+static int hclge_set_autoneg_speed_dup(struct hclge_dev *hdev)
+{
+	int ret;
+
+	if (hdev->hw.mac.support_autoneg) {
+		ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg);
+		if (ret)
+			return ret;
+	}
+
+	if (!hdev->hw.mac.autoneg) {
+		ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed,
+						 hdev->hw.mac.req_duplex,
+						 hdev->hw.mac.lane_num);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 {
 	struct pci_dev *pdev = ae_dev->pdev;
@@ -11901,6 +11908,13 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 	if (ret)
 		goto err_ptp_uninit;
 
+	ret = hclge_set_autoneg_speed_dup(hdev);
+	if (ret) {
+		dev_err(&pdev->dev,
+			"failed to set autoneg speed duplex, ret = %d\n", ret);
+		goto err_ptp_uninit;
+	}
+
 	INIT_KFIFO(hdev->mac_tnl_log);
 
 	hclge_dcb_ops_set(hdev);
@@ -12231,6 +12245,13 @@ static int hclge_reset_ae_dev(struct hnae3_ae_dev *ae_dev)
 		return ret;
 	}
 
+	ret = hclge_set_autoneg_speed_dup(hdev);
+	if (ret) {
+		dev_err(&pdev->dev,
+			"failed to set autoneg speed duplex, ret = %d\n", ret);
+		return ret;
+	}
+
 	ret = hclge_tp_port_init(hdev);
 	if (ret) {
 		dev_err(&pdev->dev, "failed to init tp port, ret = %d\n",
-- 
2.33.0


^ permalink raw reply related

* [PATCH net 0/5] net: hns3: fix configuration deadlocks and refactor link setup
From: Jijie Shao @ 2026-06-17 11:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
	netdev, linux-kernel, shaojijie

This patch series addresses a sequence of link configuration deadlocks
and parameter contamination issues in the hns3 network driver, which
typically occur during hardware resets or driver initialization under
specific user-configured scenarios.

The bugs stem from asynchronous discrepancies between the MAC state machine
and cached user requests during sudden hardware resets, leading to invalid
parameter combos or frozen registers.

The series is organized as follows:
- Patch 1 refactors the ethtool link settings entry path to unify copper
  port handling (both native kernel PHY_LIB and firmware-controlled PHY)
  and ensures req_xxx configurations are uniformly saved across all modes.
- Patch 2 refactors the MAC initialization by extracting the autoneg and
  speed configuration logic out of hclge_mac_init() into a dedicated
  helper function.
- Patch 3 fixes a permanent link-down deadlock after a reset by
  ensuring that the driver caches and uses the user's intended
  autoneg/speed settings (req_xxx) rather than un-synchronized runtime
  states or SPEED_UNKNOWN tokens.
- Patch 4 fixes a link loss issue on optical ports during
  initialization by differentiating autoneg default values between
  copper and fiber media types.
- Patch 5 fixes an initialization (probe) failure caused by lane_num
  contamination from a previous active lifecycle by introducing
  req_lane_num=0, which leverages firmware automatic lane matching.

Shuaisong Yang (5):
  net: hns3: unify copper port ksettings configuration path
  net: hns3: refactor MAC autoneg and speed configuration
  net: hns3: fix permanent link down deadlock after reset
  net: hns3: differentiate autoneg default values between copper and
    fiber
  net: hns3: fix init failure caused by lane_num contamination

 .../ethernet/hisilicon/hns3/hns3_ethtool.c    |  26 ++---
 .../hisilicon/hns3/hns3pf/hclge_main.c        | 100 ++++++++++++++----
 .../hisilicon/hns3/hns3pf/hclge_main.h        |   1 +
 3 files changed, 90 insertions(+), 37 deletions(-)


base-commit: 406e8a651a7b854c41fecd5117bb282b3a6c2c6b
-- 
2.33.0


^ permalink raw reply

* Re: [PATCH net-next v7 2/2] net: ti: icssg-prueth: Add ethtool ops for Frame Preemption MAC Merge
From: Meghana Malladi @ 2026-06-17 11:25 UTC (permalink / raw)
  To: MD Danish Anwar, Jakub Kicinski
  Cc: elfring, haokexin, vadim.fedorenko, devnexen, horms,
	jacob.e.keller, arnd, basharath, afd, parvathi, vladimir.oltean,
	rogerq, pabeni, edumazet, davem, andrew+netdev, linux-arm-kernel,
	netdev, linux-kernel, srk, vigneshr
In-Reply-To: <a62d5243-d641-48e7-a1f5-88150513be48@ti.com>

On 6/17/26 10:58, MD Danish Anwar wrote:
> Meghana,
> 
> On 16/06/26 6:24 pm, Meghana Malladi wrote:
>> Hi Jakub,
>>
>> On 6/16/26 05:09, Jakub Kicinski wrote:
>>> On Mon, 15 Jun 2026 16:10:41 -0700 Jakub Kicinski wrote:
>>>>> diff --git a/drivers/net/ethernet/ti/icssg/icssg_stats.h b/drivers/
>>>>> net/ethernet/ti/icssg/icssg_stats.h
>>>>> index 5ec0b38e0c67..8073deac35c3 100644
>>>>> --- a/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>>>> +++ b/drivers/net/ethernet/ti/icssg/icssg_stats.h
>>>>> @@ -189,6 +187,11 @@ static const struct icssg_pa_stats
>>>>> icssg_all_pa_stats[] = {
>>>>>        ICSSG_PA_STATS(FW_INF_DROP_PRIOTAGGED),
>>>>>        ICSSG_PA_STATS(FW_INF_DROP_NOTAG),
>>>>>        ICSSG_PA_STATS(FW_INF_DROP_NOTMEMBER),
>>>>> +    ICSSG_PA_STATS(FW_PREEMPT_BAD_FRAG),
>>>>> +    ICSSG_PA_STATS(FW_PREEMPT_ASSEMBLY_ERR),
>>>>> +    ICSSG_PA_STATS(FW_PREEMPT_FRAG_CNT_TX),
>>>>> +    ICSSG_PA_STATS(FW_PREEMPT_ASSEMBLY_OK),
>>>>> +    ICSSG_PA_STATS(FW_PREEMPT_FRAG_CNT_RX),
>>>>>        ICSSG_PA_STATS(FW_RX_EOF_SHORT_FRMERR),
>>>>>        ICSSG_PA_STATS(FW_RX_B0_DROP_EARLY_EOF),
>>>>>        ICSSG_PA_STATS(FW_TX_JUMBO_FRM_CUTOFF),
>>>>
>>>> [Medium]
>>>> Are these five new entries duplicating values that already have a
>>>> standard uAPI?
>>>>
>>>> The same five firmware counters are exposed through the new
>>>> .get_mm_stats callback as the standardized MAC Merge stats
>>>> (MACMergeFrameAssOkCount, MACMergeFrameAssErrorCount,
>>>> MACMergeFragCountRx,
>>>> MACMergeFragCountTx, MACMergeFrameSmdErrorCount in struct
>>>> ethtool_mm_stats), and adding them to icssg_all_pa_stats[] also
>>>> publishes them via emac_get_strings() / emac_get_ethtool_stats() as
>>>> ethtool -S strings.
>>>>
>>>> Documentation/networking/statistics.rst describes ethtool -S as the
>>>> private-driver-stats interface; counters that have a standard uAPI are
>>>> expected to flow only through that uAPI.
>>>>
>>>> Could the firmware-register lookup table used by emac_get_stat_by_name()
>>>> be separated from the ethtool -S string table, so the new preemption
>>>> counters feed get_mm_stats without also showing up under ethtool -S?
>>>
>>> This -- not sure about the other complaints but this one looks legit.
>>
>> I agree that this is legit, but right now there is no placeholder
>> other than pa stats to put the mac merge firmware counters. I believe
> 
> You can put a boolean is_standard_stats. Only those where
> is_standard_stats=false will be populated via ethtool. Others will be
> populated via the standard interface.
> 
> Look at icssg_miig_stats for reference.
> 

Sure, since you were already doing some refactoring w.r.t HSR standard 
stats I thought this could also be covered there.

I will send out another version addressing this then.
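
A minimal sketch of the is_standard_stats idea discussed above, as a
userspace model. The table layout, struct model_stat and
model_fill_strings() are hypothetical and do not reflect the actual
icssg_stats.h definitions; only the counter names are taken from the quoted
diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table entry: one firmware counter plus a visibility flag. */
struct model_stat {
	const char *name;
	bool is_standard_stats;	/* true: reported only via the standard uAPI */
};

static const struct model_stat model_stats[] = {
	{ "FW_INF_DROP_NOTMEMBER", false },
	{ "FW_PREEMPT_BAD_FRAG", true },
	{ "FW_PREEMPT_ASSEMBLY_ERR", true },
	{ "FW_RX_EOF_SHORT_FRMERR", false },
};

#define MODEL_NUM_STATS (sizeof(model_stats) / sizeof(model_stats[0]))

/*
 * Fill at most cap entries of out[] with the names that should appear as
 * private ethtool -S strings. Returns the number of entries written.
 */
static size_t model_fill_strings(const char **out, size_t cap)
{
	size_t i, n = 0;

	assert(out != NULL);

	for (i = 0; i < MODEL_NUM_STATS && n < cap; i++) {
		if (model_stats[i].is_standard_stats)
			continue;
		out[n++] = model_stats[i].name;
	}

	return n;
}
```

With this split, a lookup by name can still walk the full table, while the
string table only exposes the entries that have no standard counterpart.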

>> the effort needs to go in re-structuring the hardware and firmware stats
>> implementation to address this issue.
>>
> 


^ permalink raw reply

* Re: [PATCH v2 4/4] drm/xe/hw_error: Use HW_ERR prefix in log
From: Raag Jadav @ 2026-06-17 11:21 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: intel-xe, dri-devel, netdev, rodrigo.vivi, riana.tauro, dev,
	airlied, simona, kuba
In-Reply-To: <ah16Hfwq7goxBm27@black.igk.intel.com>

On Mon, Jun 01, 2026 at 02:25:06PM +0200, Raag Jadav wrote:
> On Mon, Jun 01, 2026 at 01:13:12PM +0200, Michal Wajdeczko wrote:
> > On 5/23/2026 7:00 AM, Raag Jadav wrote:
> > > Hardware errors should be logged with HW_ERR prefix. Make them
> > > consistent with existing logs.
> > > 
> > > Fixes: 01aab7e1c9d4 ("drm/xe/xe_hw_error: Add support for PVC SoC errors")
> > > Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_hw_error.c | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
> > > index 5135e8e4093f..4b72959b2276 100644
> > > --- a/drivers/gpu/drm/xe/xe_hw_error.c
> > > +++ b/drivers/gpu/drm/xe/xe_hw_error.c
> > > @@ -223,9 +223,9 @@ static void log_hw_error(struct xe_tile *tile, const char *name,
> > >  	struct xe_device *xe = tile_to_xe(tile);
> > >  
> > >  	if (severity == DRM_XE_RAS_ERR_SEV_CORRECTABLE)
> > > -		drm_warn(&xe->drm, "%s %s detected\n", name, severity_str);
> > > +		drm_warn(&xe->drm, HW_ERR "%s %s detected\n", name, severity_str);
> > 
> > function is per-tile, so shouldn't we use tile-oriented logs instead?
> 
> Agree, but then it needs to be done file-wide which I'll pursue once
> > the fix lands and is propagated.

Even better, there's a driver-wide refactor[1] incoming.

Raag

[1] https://patchwork.freedesktop.org/series/168333/

> > 		xe_tile_warn(tile, HW_ERR ...)
> > 
> > >  	else
> > > -		drm_err_ratelimited(&xe->drm, "%s %s detected\n", name, severity_str);
> > > +		drm_err_ratelimited(&xe->drm, HW_ERR "%s %s detected\n", name, severity_str);
> > 
> > 		xe_tile_err_ratelimited(tile, HW_ERR ...)
> > 
> > >  }
> > >  
> > >  static void log_gt_err(struct xe_tile *tile, const char *name, int i, u32 err,
> > > @@ -235,10 +235,10 @@ static void log_gt_err(struct xe_tile *tile, const char *name, int i, u32 err,
> > >  	struct xe_device *xe = tile_to_xe(tile);
> > >  
> > >  	if (severity == DRM_XE_RAS_ERR_SEV_CORRECTABLE)
> > > -		drm_warn(&xe->drm, "%s %s detected, ERROR_STAT_GT_VECTOR%d:0x%08x\n",
> > > +		drm_warn(&xe->drm, HW_ERR "%s %s detected, ERROR_STAT_GT_VECTOR%d:0x%08x\n",
> > >  			 name, severity_str, i, err);
> > >  	else
> > > -		drm_err_ratelimited(&xe->drm, "%s %s detected, ERROR_STAT_GT_VECTOR%d:0x%08x\n",
> > > +		drm_err_ratelimited(&xe->drm, HW_ERR "%s %s detected, ERROR_STAT_GT_VECTOR%d:0x%08x\n",
> > >  				    name, severity_str, i, err);
> > >  }
> > >  
> > > @@ -255,9 +255,9 @@ static void log_soc_error(struct xe_tile *tile, const char * const *reg_info,
> > >  
> > >  	if (strcmp(name, "Undefined")) {
> > >  		if (severity == DRM_XE_RAS_ERR_SEV_CORRECTABLE)
> > > -			drm_warn(&xe->drm, "%s SOC %s detected", name, severity_str);
> > > +			drm_warn(&xe->drm, HW_ERR "%s SOC %s detected", name, severity_str);
> > >  		else
> > > -			drm_err_ratelimited(&xe->drm, "%s SOC %s detected", name, severity_str);
> > > +			drm_err_ratelimited(&xe->drm, HW_ERR "%s SOC %s detected", name, severity_str);
> > >  		atomic_inc(&info[index].counter);
> > >  	}
> > >  }
> > 

^ permalink raw reply

* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Peter Zijlstra @ 2026-06-17 11:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jakub Kicinski, Sebastian Andrzej Siewior, John Ogness,
	Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
	linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajJ46o4fomfxY5CX@pathway.suse.cz>

On Wed, Jun 17, 2026 at 12:37:30PM +0200, Petr Mladek wrote:
> On Tue 2026-06-16 14:17:19, Jakub Kicinski wrote:
> > On Tue, 16 Jun 2026 19:02:57 +0200 Peter Zijlstra wrote:
> > > > So this is not an issue since commit 7eab73b18630e ("netconsole: convert
> > > > to NBCON console infrastructure"). Because from here on, writes are
> > > > deferred to the nbcon thread. So this is purely about -stable in this case.
> > > 
> > > Hmm, I thought netconsole had some reserved skbs and could do writes
> > > 'atomic' like? That said, it was 2.6 era the last time I looked at
> > > netconsole.
> > 
> > Yes, that part is fine. The problem is that netconsole tries
> > to reap Tx completions if the Tx queue is full. We can't call
> > skb destructor in irq context so we put the completed skbs on
> > a queue and try to arm softirq to get to them later.
> > Arming softirq causes a ksoftirq wake up.
> > 
> > We already skip the completion polling if we detect getting called
> > from the same networking driver. It's best effort, anyway.
> > Networking-side fix would be to toss another OR condition into
> > the skip. But we don't have one that'd work cleanly :S
> 
> Alternative solution might be to offload the ksoftirq wake up
> to an irq_work. It might make this part safe for the
> console->write_atomic() call.
> 
> Well, my understanding is that there are more problems.
> AFAIK, some drivers do not use IRQ-safe locking, see
> https://lore.kernel.org/all/oth5t27z6acp7qxut7u45ekyil7djirg2ny3bnsvnzeqasavxb@nhwdxahvcosh/

But anything using locking is not ->write_atomic() and should be driven
from a kthread, no?


^ permalink raw reply

* [PATCH 2/2] selftests/bpf: Add test for bpf_sock_read_xattr() kfunc
From: Christian Brauner @ 2026-06-17 11:18 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann
  Cc: Alexander Viro, Jan Kara, Simon Horman, Kuniyuki Iwashima,
	Willem de Bruijn, linux-fsdevel, netdev, bpf, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Christian Brauner (Amutable)
In-Reply-To: <20260617-work-bpf-sock-xattr-v1-0-a1276f7c9da3@kernel.org>

Add a selftest that loads the kfunc in sleepable and non-sleepable
lsm/socket_connect programs and checks that a value set via fsetxattr()
on a socket is read back.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 tools/testing/selftests/bpf/bpf_experimental.h     |  3 +
 .../testing/selftests/bpf/prog_tests/sock_xattr.c  | 67 ++++++++++++++++++++++
 .../testing/selftests/bpf/progs/sock_read_xattr.c  | 54 +++++++++++++++++
 3 files changed, 124 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 2234bd6bc9d3..5b825157b125 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -446,6 +446,9 @@ extern void bpf_iter_dmabuf_destroy(struct bpf_iter_dmabuf *it) __weak __ksym;
 extern int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__str,
 				 struct bpf_dynptr *value_p) __weak __ksym;
 
+extern int bpf_sock_read_xattr(struct socket *sock, const char *name__str,
+			       struct bpf_dynptr *value_p) __weak __ksym;
+
 #define PREEMPT_BITS	8
 #define SOFTIRQ_BITS	8
 #define HARDIRQ_BITS	4
diff --git a/tools/testing/selftests/bpf/prog_tests/sock_xattr.c b/tools/testing/selftests/bpf/prog_tests/sock_xattr.c
new file mode 100644
index 000000000000..b5816e90f01a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sock_xattr.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2026 Christian Brauner */
+
+#include <errno.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/xattr.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <test_progs.h>
+
+#include "sock_read_xattr.skel.h"
+
+static const char xattr_value[] = "bpf_sock_value";
+static const char xattr_name[] = "user.bpf_test";
+
+static void test_read_sock_xattr(void)
+{
+	struct sockaddr_in addr = {};
+	struct sock_read_xattr *skel = NULL;
+	struct bpf_link *link = NULL;
+	int sock_fd = -1, err;
+
+	sock_fd = socket(AF_INET, SOCK_STREAM, 0);
+	if (!ASSERT_OK_FD(sock_fd, "socket"))
+		return;
+
+	err = fsetxattr(sock_fd, xattr_name, xattr_value, sizeof(xattr_value), 0);
+	if (!ASSERT_OK(err, "fsetxattr"))
+		goto out;
+
+	skel = sock_read_xattr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "sock_read_xattr__open_and_load"))
+		goto out;
+
+	skel->bss->monitored_pid = sys_gettid();
+
+	/* Only attach the functional program; the verifier-only programs
+	 * above are not pid-gated and would clobber the shared globals.
+	 */
+	link = bpf_program__attach(skel->progs.read_sock_xattr);
+	if (!ASSERT_OK_PTR(link, "attach read_sock_xattr"))
+		goto out;
+
+	addr.sin_family = AF_INET;
+	addr.sin_port = htons(1234);
+	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+	/* Only the lsm/socket_connect hook matters; the connect may fail. */
+	connect(sock_fd, (struct sockaddr *)&addr, sizeof(addr));
+
+	ASSERT_EQ(skel->data->read_ret, sizeof(xattr_value), "read_ret");
+	ASSERT_STREQ(skel->bss->value, xattr_value, "value");
+
+out:
+	bpf_link__destroy(link);
+	if (sock_fd >= 0)
+		close(sock_fd);
+	sock_read_xattr__destroy(skel);
+}
+
+void test_sock_xattr(void)
+{
+	RUN_TESTS(sock_read_xattr);
+
+	if (test__start_subtest("read_sock_xattr"))
+		test_read_sock_xattr();
+}
diff --git a/tools/testing/selftests/bpf/progs/sock_read_xattr.c b/tools/testing/selftests/bpf/progs/sock_read_xattr.c
new file mode 100644
index 000000000000..c4a8eae8cc3c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/sock_read_xattr.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Christian Brauner */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+char value[16];
+int read_ret = -1;
+__u32 monitored_pid = 0;
+
+static __always_inline void read_xattr(struct socket *sock)
+{
+	struct bpf_dynptr value_ptr;
+
+	bpf_dynptr_from_mem(value, sizeof(value), 0, &value_ptr);
+	bpf_sock_read_xattr(sock, "user.bpf_test", &value_ptr);
+}
+
+SEC("lsm.s/socket_connect")
+__success
+int BPF_PROG(trusted_sock_ptr_sleepable, struct socket *sock)
+{
+	read_xattr(sock);
+	return 0;
+}
+
+SEC("lsm/socket_connect")
+__success
+int BPF_PROG(trusted_sock_ptr_non_sleepable, struct socket *sock)
+{
+	read_xattr(sock);
+	return 0;
+}
+
+SEC("lsm.s/socket_connect")
+__success
+int BPF_PROG(read_sock_xattr, struct socket *sock)
+{
+	struct bpf_dynptr value_ptr;
+	__u32 pid = bpf_get_current_pid_tgid() >> 32;
+
+	if (pid != monitored_pid)
+		return 0;
+
+	bpf_dynptr_from_mem(value, sizeof(value), 0, &value_ptr);
+	read_ret = bpf_sock_read_xattr(sock, "user.bpf_test", &value_ptr);
+	return 0;
+}

-- 
2.47.3



* [PATCH 1/2] fs: Add bpf_sock_read_xattr() kfunc to read socket xattrs
From: Christian Brauner @ 2026-06-17 11:18 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann
  Cc: Alexander Viro, Jan Kara, Simon Horman, Kuniyuki Iwashima,
	Willem de Bruijn, linux-fsdevel, netdev, bpf, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Christian Brauner (Amutable)
In-Reply-To: <20260617-work-bpf-sock-xattr-v1-0-a1276f7c9da3@kernel.org>

In c8db08110cbe ("Merge tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
we added support for extended attributes for sockets. This comes in two
flavors: sockfs and non-sockfs/filesystem sockets. Filesystem sockets
are actual filesystem objects so reading xattrs must use dedicated fs
helpers such as bpf_get_dentry_xattr() and bpf_get_file_xattr(). Those
are inherently sleeping operations. Sockfs sockets on the other hand
don't need to use sleeping operations as the underlying data structure
is lockless. In addition, retrieval of sockfs extended attributes often
happens from LSM hooks that only provide struct socket and it's
completely nonsensical to grab a reference to a file, then force a
sleeping operation to retrieve the xattr and drop the reference. We know
that the sockfs file cannot go away while the LSM hook runs.

This series adds a bpf_sock_read_xattr() kfunc that, given a struct
socket, reads a user.* extended attribute from the socket's sockfs inode
into a bpf_dynptr. Together with fsetxattr() from userspace this lets a
process label a socket with a user.* xattr and have a BPF LSM program
retrieve that label locklessly. The kfunc mirrors the existing
bpf_cgroup_read_xattr(), including the restriction to the user.*
namespace.

systemd uses user.* xattrs on sockets to implement socket rate limiting
and to tag sockets for other purposes [1] such as implementing a varlink
registry. There is currently no efficient way for a BPF program to read
those labels back. The new helper lets a BPF program read back the
extended attribute on a marked listening socket during bind/connect and
then act on the connect()ing socket. Extended attributes allow an
unprivileged user manager such as systemd --user to mark sockets from
userspace and then rediscover them or implement policies.

The kfunc is registered KF_RCU and only for BPF LSM programs. A struct
socket is only guaranteed to live in sockfs when an LSM socket hook hands
it out, which is what keeps SOCK_INODE() valid. Sockets that embed struct
socket outside sockfs (tun, tap) are only reachable from tracing programs
and are excluded by the registration. (Btw, for consistency it would
be nice to force allocation of struct socket from sockfs instead of
simply embedding it in e.g., struct tun_file which makes the SOCKFS_I()
pattern a hazard - at least outside of sockfs functions.)

The read never sleeps and takes no lock. For sockfs the value lives in
the inode's in-memory xattr store and simple_xattr_get() resolves it
with an RCU-protected rhashtable lookup, taking neither the inode lock
nor any xattr lock. The kfunc is therefore usable from both sleepable
and non-sleepable LSM hooks.

Link: https://github.com/systemd/systemd/pull/40559 [1]
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/bpf_fs_kfuncs.c  | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/net.h |  1 +
 net/socket.c        | 25 +++++++++++++++++++++++++
 3 files changed, 63 insertions(+)

diff --git a/fs/bpf_fs_kfuncs.c b/fs/bpf_fs_kfuncs.c
index 11841c3d4260..85fc9519d1ff 100644
--- a/fs/bpf_fs_kfuncs.c
+++ b/fs/bpf_fs_kfuncs.c
@@ -11,6 +11,7 @@
 #include <linux/file.h>
 #include <linux/kernfs.h>
 #include <linux/mm.h>
+#include <linux/net.h>
 #include <linux/xattr.h>
 
 __bpf_kfunc_start_defs();
@@ -359,6 +360,39 @@ __bpf_kfunc int bpf_cgroup_read_xattr(struct cgroup *cgroup, const char *name__s
 }
 #endif /* CONFIG_CGROUPS */
 
+#ifdef CONFIG_NET
+/**
+ * bpf_sock_read_xattr - read xattr of a socket's inode in sockfs
+ * @sock: socket to get xattr from
+ * @name__str: name of the xattr
+ * @value_p: output buffer of the xattr value
+ *
+ * Get xattr *name__str* of *sock* and store the output in *value_p*.
+ *
+ * For security reasons, only *name__str* with prefix "user." is allowed.
+ *
+ * Return: length of the xattr value on success, a negative value on error.
+ */
+__bpf_kfunc int bpf_sock_read_xattr(struct socket *sock, const char *name__str,
+				    struct bpf_dynptr *value_p)
+{
+	struct bpf_dynptr_kern *value_ptr = (struct bpf_dynptr_kern *)value_p;
+	u32 value_len;
+	void *value;
+
+	/* Only allow reading "user.*" xattrs */
+	if (strncmp(name__str, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN))
+		return -EPERM;
+
+	value_len = __bpf_dynptr_size(value_ptr);
+	value = __bpf_dynptr_data_rw(value_ptr, value_len);
+	if (!value)
+		return -EINVAL;
+
+	return sock_read_xattr(sock, name__str, value, value_len);
+}
+#endif /* CONFIG_NET */
+
 /**
  * bpf_real_inode - get the real inode backing a dentry
  * @dentry: dentry to resolve
@@ -385,6 +419,9 @@ BTF_ID_FLAGS(func, bpf_get_file_xattr, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_set_dentry_xattr, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_remove_dentry_xattr, KF_SLEEPABLE)
 BTF_ID_FLAGS(func, bpf_real_inode, KF_SLEEPABLE | KF_RET_NULL)
+#ifdef CONFIG_NET
+BTF_ID_FLAGS(func, bpf_sock_read_xattr, KF_RCU)
+#endif
 BTF_KFUNCS_END(bpf_fs_kfunc_set_ids)
 
 static int bpf_fs_kfuncs_filter(const struct bpf_prog *prog, u32 kfunc_id)
diff --git a/include/linux/net.h b/include/linux/net.h
index f268f395ce47..fdcf9956805c 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -285,6 +285,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char *dname);
 struct socket *sockfd_lookup(int fd, int *err);
 struct socket *sock_from_file(struct file *file);
+int sock_read_xattr(struct socket *sock, const char *name, void *value, size_t size);
 #define		     sockfd_put(sock) fput(sock->file)
 int net_ratelimit(void);
 
diff --git a/net/socket.c b/net/socket.c
index 9e8dc769ff7a..3566f8c8ea3f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -465,6 +465,31 @@ static const struct xattr_handler sockfs_user_xattr_handler = {
 	.set = sockfs_user_xattr_set,
 };
 
+/**
+ * sock_read_xattr - read a user.* xattr from a socket's sockfs inode
+ * @sock: socket whose inode holds the xattr
+ * @name: full xattr name, e.g. "user.bpf_test"
+ * @value: output buffer
+ * @size: size of @value in bytes
+ *
+ * SOCK_INODE() is valid only for sockfs sockets; sock_from_file() rejects
+ * anything else (e.g. tun, tap).
+ * Lockless: simple_xattr_get() looks up the value under RCU, no inode lock.
+ *
+ * Return: length of the value on success, a negative errno on error.
+ */
+int sock_read_xattr(struct socket *sock, const char *name, void *value, size_t size)
+{
+	struct file *file = sock->file;
+	struct sockfs_inode *si;
+
+	if (!file || sock_from_file(file) != sock)
+		return -EOPNOTSUPP;
+
+	si = SOCKFS_I(SOCK_INODE(sock));
+	return simple_xattr_get(&sockfs_xa_cache, &si->xattrs, name, value, size);
+}
+
 static const struct xattr_handler * const sockfs_xattr_handlers[] = {
 	&sockfs_xattr_handler,
 	&sockfs_security_xattr_handler,

-- 
2.47.3



* [PATCH 0/2] Add bpf_sock_read_xattr() kfunc to read socket xattrs
From: Christian Brauner @ 2026-06-17 11:18 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann
  Cc: Alexander Viro, Jan Kara, Simon Horman, Kuniyuki Iwashima,
	Willem de Bruijn, linux-fsdevel, netdev, bpf, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	Song Liu, Yonghong Song, Jiri Olsa, Christian Brauner (Amutable)

In c8db08110cbe ("Merge tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
we added support for extended attributes for sockets. This comes in two
flavors: sockfs and non-sockfs/filesystem sockets. Filesystem sockets
are actual filesystem objects so reading xattrs must use dedicated fs
helpers such as bpf_get_dentry_xattr() and bpf_get_file_xattr(). Those
are inherently sleeping operations. Sockfs sockets on the other hand
don't need to use sleeping operations as the underlying data structure
is lockless. In addition, retrieval of sockfs extended attributes often
happens from LSM hooks that only provide struct socket and it's
completely nonsensical to grab a reference to a file, then force a
sleeping operation to retrieve the xattr and drop the reference. We know
that the sockfs file cannot go away while the LSM hook runs.

This series adds a bpf_sock_read_xattr() kfunc that, given a struct
socket, reads a user.* extended attribute from the socket's sockfs inode
into a bpf_dynptr. Together with fsetxattr() from userspace this lets a
process label a socket with a user.* xattr and have a BPF LSM program
retrieve that label locklessly. The kfunc mirrors the existing
bpf_cgroup_read_xattr(), including the restriction to the user.*
namespace.
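
For illustration, the userspace half of that flow could look roughly
like the sketch below. The helper names label_socket() and
read_socket_label() are made up for this example, and the "user." check
merely mirrors the restriction the kfunc applies on the read side.
fsetxattr() on a socket is only expected to succeed on kernels that
carry the socket xattr support mentioned above; elsewhere the helper
just reports the errno.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: attach a user.* label to a socket fd.
 * Returns 0 on success or a negative errno.
 */
static int label_socket(int fd, const char *name, const char *label)
{
	/* Mirror the kfunc's policy: only user.* names are accepted. */
	if (strncmp(name, "user.", 5) != 0)
		return -EPERM;

	/* Store the label including its terminating NUL. */
	if (fsetxattr(fd, name, label, strlen(label) + 1, 0) < 0)
		return -errno;

	return 0;
}

/* Hypothetical helper: read the label back from userspace.
 * Returns the value length on success or a negative errno.
 */
static ssize_t read_socket_label(int fd, const char *name, char *buf,
				 size_t size)
{
	ssize_t n = fgetxattr(fd, name, buf, size);

	return n < 0 ? -errno : n;
}
```

A BPF LSM program would then read the same name through
bpf_sock_read_xattr() instead of fgetxattr(), as the selftest in patch
2/2 does.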

systemd uses user.* xattrs on sockets to implement socket rate limiting
and to tag sockets for other purposes [1] such as implementing a varlink
registry. There is currently no efficient way for a BPF program to read
those labels back. The new helper lets a BPF program read back the
extended attribute on a marked listening socket during bind/connect and
then act on the connect()ing socket. Extended attributes allow an
unprivileged user manager such as systemd --user to mark sockets from
userspace and then rediscover them or implement policies.

The kfunc is registered KF_RCU and only for BPF LSM programs. A struct
socket is only guaranteed to live in sockfs when an LSM socket hook hands
it out, which is what keeps SOCK_INODE() valid. Sockets that embed struct
socket outside sockfs (tun, tap) are only reachable from tracing programs
and are excluded by the registration. (Btw, for consistency it would
be nice to force allocation of struct socket from sockfs instead of
simply embedding it in e.g., struct tun_file which makes the SOCKFS_I()
pattern a hazard - at least outside of sockfs functions.)

The read never sleeps and takes no lock. For sockfs the value lives in
the inode's in-memory xattr store and simple_xattr_get() resolves it
with an RCU-protected rhashtable lookup, taking neither the inode lock
nor any xattr lock. The kfunc is therefore usable from both sleepable
and non-sleepable LSM hooks.

Link: https://github.com/systemd/systemd/pull/40559 [1]

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Christian Brauner (2):
      fs: Add bpf_sock_read_xattr() kfunc to read socket xattrs
      selftests/bpf: Add test for bpf_sock_read_xattr() kfunc

 fs/bpf_fs_kfuncs.c                                 | 37 ++++++++++++
 include/linux/net.h                                |  1 +
 net/socket.c                                       | 25 ++++++++
 tools/testing/selftests/bpf/bpf_experimental.h     |  3 +
 .../testing/selftests/bpf/prog_tests/sock_xattr.c  | 67 ++++++++++++++++++++++
 .../testing/selftests/bpf/progs/sock_read_xattr.c  | 54 +++++++++++++++++
 6 files changed, 187 insertions(+)
---
base-commit: 6b5a2b7d9bc156e505f09e698d85d6a1547c1206
change-id: 20260617-work-bpf-sock-xattr-37ec4c991886



* Re: [RESEND PATCH v1] net: dsa: motorcomm: add yt92xx dsa driver
From: David Yang @ 2026-06-17 11:15 UTC (permalink / raw)
  To: Kyle Switch
  Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, ming.xu, xiaolin.xu, jianmin.wang, de.ge
In-Reply-To: <88f726d5-1617-4d2e-8fbb-d3da9478b386@motor-comm.com>

On Wed, Jun 17, 2026 at 10:37 AM Kyle Switch <kyle.switch@motor-comm.com> wrote:
> >> +/* To define the from cpu tag format 8 bytes:
> >> + *
> >> + * 0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7
> >> + *|<----------TPID 0x9988---------->|
> >> + *|<--RESERVE-->|<-----DST PORT---->|
> >> + *|-|<---------RESERVE------------->|
> >> + *|<------------------------------->|
> >> + */
> >> +#define YT922X_TAG_FORMAT2_NAME "yt922x-8b"
> >> +#define YT922X_FORMAT2_TAG_LEN                  8
> >> +#define YT922X_PKT_TYPE          GENMASK(15, 14)
> >> +#define YT922X_8B_CPUTAG_PKT_FROM_CPU      0x1
> >> +#define YT922X_8B_CPUTAG_SRC_PORT          GENMASK(6, 2)
> >> +#define YT922X_8B_CPUTAG_DST_PORTMASK      GENMASK(8, 0)
> >> +#define YT922X_8B_CPUTAG_DST_PORTMASK_0      BIT(15)
> >> +#define YT922X_8B_CPUTAG_DST_PORTMASK_0_EN      0x1
> >> +#define YT922X_8B_CPUTAG_FORCE_DST         BIT(9)
> >> +#define YT922X_8B_CPUTAG_FORCE_DST_EN      0x1
> >
> > If the yt922x tag format has nothing in common with yt921x, make a new tag driver.
>
> Ans: thank you for your suggestion, we will consider whether to create a new driver in the new file.

I'm not an expert in this, but if the yt922x tag does support CPU codes
and priority, please consider updating the yt921x tagger to support them,
even if you don't use or test these features for now.

> >
> >> +static struct dsa_tag_driver *dsa_tag_driver_array[] = {
> >> +       &DSA_TAG_DRIVER_NAME(yt921x_netdev_ops),
> >> +       &DSA_TAG_DRIVER_NAME(yt922x_4b_netdev_ops),
> >> +       &DSA_TAG_DRIVER_NAME(yt922x_8b_netdev_ops),
> >> +};
> >
> > If both are supported by the chip and 4b does nothing more than 8b
> > does, do not bother with it.
>
> Ans: The 4b and 8b DSA tags may have different application scenarios. In my opinion,
>      1. the 4b DSA tag saves 4 bytes of payload
>      2. the 8b DSA tag carries more packet info.

We do not support every tag protocol. For DSA switches, either
  - the conduit interface supports jumbo frames, so there is room for
    the DSA header, or
  - you end up with an MTU less than 1500 anyway.
A 4-byte reduction does not make a practical difference here. An
alternative protocol means twice the work for everyone else, and
unnecessarily exposes your driver to interoperability issues, as
pointed out by Andrew.
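
To put rough numbers on the 4-byte difference, here is a small sketch.
It assumes a conduit interface stuck at a 1500-byte MTU, with the tag
carried inside that budget; user_port_mtu() is a made-up helper for
this example, not a DSA API.

```c
#define CONDUIT_MTU	1500	/* assumed: conduit without jumbo frame support */
#define TAG_LEN_4B	4	/* the 4-byte tag variant */
#define TAG_LEN_8B	8	/* YT922X_FORMAT2_TAG_LEN in the quoted patch */

/* Hypothetical helper: MTU left for a user port once the tag header
 * has been accounted for on the conduit.
 */
static int user_port_mtu(int conduit_mtu, int tag_len)
{
	return conduit_mtu - tag_len;
}
```

Under those assumptions the user port gets 1496 bytes with the 4-byte
tag and 1492 bytes with the 8-byte tag, a difference of about 0.27% of
the frame, and both are below 1500 either way.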

As I've commented before, even if there is a particular reason to add
the 4-byte protocol, leave it aside for the moment and focus on a
minimal yt922x_dsa_switch_ops + yt922x_netdev_ops for your first
patchset, without any offloading support. This way, others can easily
see your changes and move the work forward efficiently.


* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Peter Zijlstra @ 2026-06-17 11:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sebastian Andrzej Siewior, Jakub Kicinski, John Ogness,
	Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Breno Leitao, Clark Williams, Steven Rostedt, linux-rt-devel,
	linux-kernel, stable, Frederic Weisbecker, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajJy92ES-Q8ro97A@pathway.suse.cz>

On Wed, Jun 17, 2026 at 12:12:07PM +0200, Petr Mladek wrote:
> On Tue 2026-06-16 17:31:22, Sebastian Andrzej Siewior wrote:
> > On 2026-06-16 08:11:28 [-0700], Jakub Kicinski wrote:
> > > > 
> > > > Adding sched and printk folks for opinions while eyeballing
> > > > WARN_ON_DEFERRED().
> > > 
> > > Thanks a lot for looking into this! To be clear - the printk_deferred /
> > > WARN_DEFERRED would be just for stable? Or there's still some
> > > sensitivity even with nbcon?
> > 
> > We already have printk_deferred(). WARN_DEFERRED() would be new. I
> > *think* this is not limited to netpoll/netconsole but all console drivers
> > not using CON_NBCON if the printk (via WARN) occurs with the rq held.
> > I don't remember all the details but printk_deferred() was introduced to
> > circumvent this until printk is fixed.
> 
> Just to make it clear. The problem with the legacy consoles is that
> they are called under console_lock() which is a semaphore. And it
> calls wake_up_process() in console_unlock() when there is another
> waiter on the lock.
> 
> > Once we get rid of those legacy drivers and NBCON is the default we can
> > get rid of printk_deferred() :)
> 
> Yup.

Can't we push all the legacy consoles into a single legacy kthread? I
mean, converting all consoles is of course awesome, but should we really
wait for that?


* Re: [PATCH net-next v6 7/7] net: macb: introduce ndo_xdp_xmit support
From: Koehrer Mathias (ETAS-ICA/XPC-Fe1) @ 2026-06-17 11:08 UTC (permalink / raw)
  To: Paolo Valerio, netdev@vger.kernel.org
  Cc: Nicolas Ferre, Claudiu Beznea, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Lorenzo Bianconi,
	Théo Lebrun, Nicolai Buchwitz
In-Reply-To: <GVXPR10MB8360DB5D2F374574CE33C5779F1B2@GVXPR10MB8360.EURPRD10.PROD.OUTLOOK.COM>

Hi,
> within the function gem_xdp_xmit, there should be a call to "xdp_return_frame" for each successful processing in
> "macb_xdp_submit_frame".
> Otherwise, this driver does not work properly with Ethernet drivers that XDP_REDIRECT to this driver and use 
> page_pools. In this case, the pages from the pools are not returned to the pool.
Please ignore the complaint.
I think the existing code is fine; I found a bug on my side. Sorry for the confusion.

Best regards

Mathias


