Netdev List
* Re: [PATCH 2/4] net: ionic: Add PHC state page for user space access
From: Allen Hubbe @ 2026-04-10 13:10 UTC (permalink / raw)
  To: Jakub Kicinski, Abhijit Gangurde
  Cc: jgg, leon, brett.creeley, andrew+netdev, davem, edumazet, pabeni,
	nikhil.agarwal, linux-rdma, netdev, linux-kernel
In-Reply-To: <20260401170600.312a23d1@kernel.org>

On 4/1/2026 8:06 PM, Jakub Kicinski wrote:
> On Wed, 1 Apr 2026 15:54:59 +0530 Abhijit Gangurde wrote:
>> diff --git a/include/uapi/rdma/ionic-abi.h b/include/uapi/rdma/ionic-abi.h
>> index 7b589d3e9728..97f695510380 100644
>> --- a/include/uapi/rdma/ionic-abi.h
>> +++ b/include/uapi/rdma/ionic-abi.h
>> @@ -112,4 +112,15 @@ struct ionic_srq_resp {
>>        __aligned_u64 rq_cmb_offset;
>>   };
>>
>> +struct ionic_phc_state {
>> +     __u32 seq;
>> +     __u32 rsvd;
>> +     __aligned_u64 mask;
>> +     __aligned_u64 tick;
>> +     __aligned_u64 nsec;
>> +     __aligned_u64 frac;
>> +     __u32 mult;
>> +     __u32 shift;
>> +};
> 
> You're just exposing kernel timecounter internals.
> Why is this ionic uAPI and not something reusable by other drivers?

The simple answer is that it follows the same approach as an existing
implementation.  See struct mlx5_ib_clock_info and
mlx5_update_clock_info_page().

Making this common risks presuming that other implementations will
follow a similar design.  Compare these to the sfc driver: its clock is
quite different from ionic and mlx5 and does not use timecounter, because
instead of a free-running cycle counter, the hardware itself provides an
adjustable clock for timestamping.
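For readers unfamiliar with the mlx5-style clock page, a minimal user-space sketch of how a consumer might read such a page and convert a raw hardware timestamp follows. It assumes (the patch does not say so) that `seq` is odd while the kernel updates the page, that `mask` is the counter-width mask, and that `tick`, `nsec`, `frac`, `mult` and `shift` carry the timecounter's last cycle value, nanoseconds, carried fraction and the cyclecounter's mult/shift. `phc_state_snapshot()` and `phc_cycles_to_ns()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* User-space mirror of the proposed struct ionic_phc_state (uint64_t stands
 * in for __aligned_u64; the layout matches on 64-bit ABIs). */
struct ionic_phc_state {
	uint32_t seq;
	uint32_t rsvd;
	uint64_t mask;
	uint64_t tick;
	uint64_t nsec;
	uint64_t frac;
	uint32_t mult;
	uint32_t shift;
};

/* Copy a consistent snapshot, retrying while the (assumed) writer holds
 * seq odd or has bumped it while we were reading. */
static struct ionic_phc_state
phc_state_snapshot(const volatile struct ionic_phc_state *page)
{
	struct ionic_phc_state snap;
	uint32_t seq;

	do {
		seq = page->seq;
		atomic_thread_fence(memory_order_acquire);
		snap.mask = page->mask;
		snap.tick = page->tick;
		snap.nsec = page->nsec;
		snap.frac = page->frac;
		snap.mult = page->mult;
		snap.shift = page->shift;
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) || seq != page->seq);

	snap.seq = seq;
	snap.rsvd = 0;
	return snap;
}

/* Same arithmetic as the kernel's cyclecounter_cyc2ns(): scale the masked
 * cycle delta, add the carried sub-nanosecond fraction, then shift. Assumes
 * the page is refreshed often enough that delta * mult fits in 64 bits. */
static uint64_t phc_cycles_to_ns(const struct ionic_phc_state *st,
				 uint64_t cycles)
{
	uint64_t delta = (cycles - st->tick) & st->mask;

	return st->nsec + ((delta * st->mult + st->frac) >> st->shift);
}
```

This sketch only handles timestamps taken at or after `tick`; the masked subtraction is what makes a wrapped narrow counter (for example 16 or 48 bits) still yield a small positive delta.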

^ permalink raw reply

* Re: [PATCH net-next v3 01/12] dt-bindings: net: airoha: Add EN7581 ethernet-ports properties
From: Lorenzo Bianconi @ 2026-04-10 13:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: andrew+netdev, davem, edumazet, pabeni, robh, krzk+dt, conor+dt,
	ansuelsmth, benjamin.larsson, linux-arm-kernel, linux-mediatek,
	netdev, devicetree
In-Reply-To: <20260410025717.1809904-1-kuba@kernel.org>

> This is an AI-generated review of your patch. The human sending this
> email says "For your consideration."
> ---
> dt-bindings: net: airoha: Add EN7581 ethernet-ports properties
> 
> This commit introduces the ethernet-port property to model a net_device
> connected via an external arbiter to the GDM3 or GDM4 ports on EN7581
> and AN7583 SoCs.
> 
> > diff --git a/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml b/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
> > index fbe2ddcdd909c..ebbd433e9c9fb 100644
> > --- a/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
> > +++ b/Documentation/devicetree/bindings/net/airoha,en7581-eth.yaml
> > @@ -130,6 +130,30 @@ patternProperties:
> >          maximum: 4
> >          description: GMAC port identifier
> >  
> > +      '#address-cells':
> > +        const: 1
> > +      '#size-cells':
> > +        const: 0
> > +
> > +    patternProperties:
> > +      "^ethernet-port@[0-5]$":
> 
> The commit message states that GDM1 and GDM2 do not support connection
> with the external arbiter. However, since this pattern property appears to
> be placed inside the generic ^ethernet@[1-4]$ block, does this allow a
> device tree to incorrectly configure ethernet-port subnodes on ethernet@1
> or ethernet@2 and still pass schema validation?

ack, I will fix it in v4

> 
> Could this be restricted to GDM3 and GDM4, perhaps by splitting the
> patternProperties or using an if/then block based on the reg property?
> 
> > +        type: object
> > +        unevaluatedProperties: false
> > +        $ref: ethernet-controller.yaml#
> 
> Does referencing ethernet-controller.yaml cause a validation conflict here?
> 
> The ethernet-controller.yaml schema enforces a strict nodename pattern
> of ^ethernet(@.*)?$. Since these new nodes use the -port suffix and are
> named ethernet-port@X, will they unconditionally fail the node name
> validation enforced by the referenced schema during dt_binding_check?

ack, I will fix it in v4

Regards,
Lorenzo

> 
> [ ... ]
> -- 
> pw-bot: cr

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net v2 3/4] iavf: send MAC change request synchronously
From: Przemek Kitszel @ 2026-04-10 13:19 UTC (permalink / raw)
  To: Jose Ignacio Tornos Martinez
  Cc: netdev, stable, edumazet, anthony.l.nguyen, kuba, Jacob Keller,
	intel-wired-lan, pabeni, davem, kohei.enju@gmail.com
In-Reply-To: <89bfd605-1877-4d40-95e1-bfeae6624168@intel.com>

I believe this is a move in the right direction;
please also find some more comments below

>> +/**
>> + * iavf_poll_virtchnl_response - Poll admin queue for virtchnl response
>> + * @adapter: board private structure
>> + * @condition: callback to check if desired response received
>> + * @cond_data: context data passed to condition callback
>> + * @timeout_ms: maximum time to wait in milliseconds
>> + *
>> + * Polls admin queue and processes all messages until condition returns true
>> + * or timeout expires. Caller must hold netdev_lock. This can sleep for up to
>> + * timeout_ms while polling hardware.
>> + *
>> + * Returns 0 on success (condition met), -EAGAIN on timeout or error

kdoc requires "Return:"

>> + */
>> +static int iavf_poll_virtchnl_response(struct iavf_adapter *adapter,

please move it to iavf_virtchnl.[ch]; it could be useful for others too,
and then there is no need to move it later and lose your authorship in the default blame view

one more thing: this could perhaps be integrated with the existing
iavf_poll_virtchnl_msg(), which does not call
iavf_virtchnl_completion(), but likely could

>> +                       bool (*condition)(struct iavf_adapter *, void *),
>> +                       void *cond_data,
> 
> this could be const, then no cast is needed at the call site
> 
>> +                       unsigned int timeout_ms)
>> +{
>> +    struct iavf_hw *hw = &adapter->hw;
>> +    struct iavf_arq_event_info event;
>> +    enum virtchnl_ops v_op;
>> +    enum iavf_status v_ret;
>> +    unsigned long timeout;
>> +    int ret;
>> +
>> +    netdev_assert_locked(adapter->netdev);
>> +
>> +    event.buf_len = IAVF_MAX_AQ_BUF_SIZE;
>> +    event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
>> +    if (!event.msg_buf)
>> +        return -ENOMEM;
>> +
>> +    timeout = jiffies + msecs_to_jiffies(timeout_ms);
>> +    while (time_before(jiffies, timeout)) {

please consider a do-while loop (after the rest of the changes it could read better)

>> +        if (condition(adapter, cond_data)) {
> 
> if the condition is met but the timeout has expired, there should be no error
> 
>> +            ret = 0;
>> +            goto out;
>> +        }
>> +
>> +        ret = iavf_clean_arq_element(hw, &event, NULL);

instead of NULL, pass the "pending" param; if it is non-zero you could
omit the next sleep

>> +        if (!ret) {
>> +            v_op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high);
>> +            v_ret = (enum iavf_status)le32_to_cpu(event.desc.cookie_low);
>> +
>> +            iavf_virtchnl_completion(adapter, v_op, v_ret,
>> +                         event.msg_buf, event.msg_len);
>> +
>> +            memset(event.msg_buf, 0, IAVF_MAX_AQ_BUF_SIZE);
>> +        }
>> +
>> +        usleep_range(1000, 2000);

very old commit 9e3f23f44f32 ("i40e: reduce wait time for adminq command
completion") said that 50 usec is the right amount to sleep between checks

> 
> no sleep after a message is received (ok to sleep on an empty queue)
> 
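Taken together, the loop comments above (do-while, condition checked even once the deadline has passed, no sleep after a processed message, about 50 usec between empty polls) might be shaped roughly like the user-space sketch below. `struct fake_arq`, `fake_clean_arq_element()` and `poll_response()` are hypothetical stand-ins for the admin queue, iavf_clean_arq_element() and the patch's poll helper, and CLOCK_MONOTONIC replaces jiffies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the admin receive queue. */
struct fake_arq {
	const uint32_t *ops;	/* opcodes that will "arrive", in order */
	size_t count;
	size_t head;
	uint32_t last_op;	/* last opcode handed to the completion path */
};

/* Models iavf_clean_arq_element(): 0 when a message was processed (with
 * *pending = messages still queued), nonzero when the queue was empty. */
static int fake_clean_arq_element(struct fake_arq *q, uint32_t *pending)
{
	if (q->head >= q->count) {
		*pending = 0;
		return -1;
	}
	q->last_op = q->ops[q->head++];
	*pending = (uint32_t)(q->count - q->head);
	return 0;
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static int poll_response(struct fake_arq *q,
			 bool (*condition)(const struct fake_arq *, const void *),
			 const void *cond_data, unsigned int timeout_ms)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 50 * 1000 };
	const uint64_t deadline = now_ns() + (uint64_t)timeout_ms * 1000000ull;
	uint32_t pending;
	bool expired;

	do {
		/* Sample the clock before doing this round's work. */
		expired = now_ns() >= deadline;

		if (condition(q, cond_data))
			return 0;

		/* Sleep only when there was nothing to process; after a message
		 * go straight back and re-check the condition. */
		if (fake_clean_arq_element(q, &pending) != 0)
			nanosleep(&nap, NULL);
	} while (!expired);

	/* A response processed in the final round still counts as success. */
	return condition(q, cond_data) ? 0 : -EAGAIN;
}

static bool op_seen(const struct fake_arq *q, const void *data)
{
	return q->last_op == *(const uint32_t *)data;
}
```

In this shape the `pending` hint ends up unnecessary, since any processed message already skips the sleep; it would matter in a variant that sleeps after every iteration.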
>> +    }
>> +
>> +    ret = -EAGAIN;
>> +out:
>> +    kfree(event.msg_buf);
>> +    return ret;
>> +}
>> +
>> +/**
>> + * iavf_mac_change_done - Check if MAC change completed
>> + * @adapter: board private structure
>> + * @data: MAC address being checked (as void *)
>> + *
>> + * Callback for iavf_poll_virtchnl_response() to check if MAC change completed.
>> + *
>> + * Returns true if MAC change completed, false otherwise
>> + */
>> +static bool iavf_mac_change_done(struct iavf_adapter *adapter, void *data)
>> +{
>> +    const u8 *addr = data;
>> +
>> +    return iavf_is_mac_set_handled(adapter->netdev, addr);
>> +}
>> +
>> +/**
>> + * iavf_set_mac_sync - Synchronously change MAC address
>> + * @adapter: board private structure
>> + * @addr: MAC address to set
>> + *
>> + * Sends MAC change request to PF and polls admin queue for response.
>> + * Caller must hold netdev_lock. This can sleep for up to 2.5 seconds.
>> + *
>> + * Returns 0 on success or error
>> + */
>> +static int iavf_set_mac_sync(struct iavf_adapter *adapter, const u8 *addr)
>> +{
>> +    int ret;
>> +
>> +    netdev_assert_locked(adapter->netdev);
>> +
>> +    ret = iavf_add_ether_addrs(adapter);
>> +    if (ret)
>> +        return ret;
>> +
>> +    return iavf_poll_virtchnl_response(adapter, iavf_mac_change_done,
>> +                       (void *)addr, 2500);
> 
> this function looks elegant, thank you
> 
> I'm a little afraid that this model (if applied to things other than
> setting the MAC) will skip some of our "much needed" logic in the watchdog.
> 
> I have not thought about it much yet.
> 
> unrelated: the callback looks elegant, but for virtchnl it is almost always
> the case that we wait for some VC OPCODE to come back, and that is just
> a number. It could easily be coded as a callback too, passing the wanted
> value cast into the pointer, but I would say that just passing a normal u32
> param would be cleanest
> 
> 
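The "plain u32" alternative mentioned above could look roughly like this: the waiter arms a slot with the virtchnl opcode it expects, and the completion path records the result when that opcode arrives, so the poll loop only tests a flag. `struct vc_wait` and its helpers are hypothetical names, and the opcode values used with them are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-adapter wait slot keyed by a plain opcode number. */
struct vc_wait {
	uint32_t wanted_op;
	bool done;
	int32_t v_retval;	/* status reported by the PF for that opcode */
};

/* Called by the waiter before sending the request. */
static void vc_wait_arm(struct vc_wait *w, uint32_t op)
{
	w->wanted_op = op;
	w->done = false;
	w->v_retval = 0;
}

/* Called from the completion path for every received message; only the
 * first matching opcode is recorded. */
static void vc_wait_note(struct vc_wait *w, uint32_t op, int32_t v_retval)
{
	if (!w->done && op == w->wanted_op) {
		w->done = true;
		w->v_retval = v_retval;
	}
}

/* What the poll loop would test instead of invoking a callback. */
static bool vc_wait_done(const struct vc_wait *w)
{
	return w->done;
}
```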


^ permalink raw reply

* RE: [PATCH iwl-next] i40e: PTP: set supported flags in ptp_clock_info
From: Korba, Przemyslaw @ 2026-04-10 13:20 UTC (permalink / raw)
  To: Keller, Jacob E, Simon Horman
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	Nguyen, Anthony L, Kitszel, Przemyslaw
In-Reply-To: <dce634db-bd85-4c39-ae01-4272c432f017@intel.com>

> -----Original Message-----
> From: Keller, Jacob E <jacob.e.keller@intel.com>
> Sent: Thursday, March 26, 2026 1:15 AM
> To: Korba, Przemyslaw <przemyslaw.korba@intel.com>; Simon Horman <horms@kernel.org>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>
> Subject: Re: [PATCH iwl-next] i40e: PTP: set supported flags in ptp_clock_info
> 
> On 3/13/2026 6:47 AM, Korba, Przemyslaw wrote:
> >> -----Original Message-----
> >> From: Simon Horman <horms@kernel.org>
> >> Sent: Friday, March 13, 2026 2:35 PM
> >> To: Korba, Przemyslaw <przemyslaw.korba@intel.com>
> >> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> >> <przemyslaw.kitszel@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>
> >> Subject: Re: [PATCH iwl-next] i40e: PTP: set supported flags in ptp_clock_info
> >>
> >> On Wed, Mar 11, 2026 at 12:42:10PM +0000, Korba, Przemyslaw wrote:
> >>>> -----Original Message-----
> >>>> From: Simon Horman <horms@kernel.org>
> >>>> Sent: Tuesday, March 10, 2026 7:25 PM
> >>>> To: Korba, Przemyslaw <przemyslaw.korba@intel.com>
> >>>> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> >>>> <przemyslaw.kitszel@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>
> >>>> Subject: Re: [PATCH iwl-next] i40e: PTP: set supported flags in ptp_clock_info
> >>>>
> >>>> + Jacob
> >>>>
> >>>> On Mon, Mar 09, 2026 at 03:11:51PM +0100, Przemyslaw Korba wrote:
> >>>>> Since upstream commit d9f3e9ecc456 ("net: ptp: introduce
> >>>>> .supported_perout_flags to ptp_clock_info") and commit 7c571ac57d9d ("net:
> >>>>> ptp: introduce .supported_extts_flags to ptp_clock_info"), kernel core
> >>>>> now requires that the driver set the .supported_perout_flags and
> >>>>> .supported_extts_flags fields in PTP clock info. Otherwise, the
> >>>>> additional flags will be rejected by the kernel automatically.
> >>>>>
> >>>>> i40e does not support perout flags, so reject any request with perout
> >>>>> flags.
> >>>>>
> >>>>> Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
> >>>>> ---
> >>>>>  drivers/net/ethernet/intel/i40e/i40e_ptp.c | 12 +++++++++++-
> >>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
> >>>>> b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
> >>>>> index 7bcea7d9720f..8d7958692235 100644
> >>>>> --- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
> >>>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
> >>>>> @@ -601,10 +601,18 @@ static int i40e_ptp_feature_enable(struct ptp_clock_info *ptp,
> >>>>>  	/* TODO: Implement flags handling for EXTTS and PEROUT */
> >>>>>  	switch (rq->type) {
> >>>>>  	case PTP_CLK_REQ_EXTTS:
> >>>>> +		if (rq->extts.flags & ~(PTP_ENABLE_FEATURE |
> >>>>> +					PTP_RISING_EDGE |
> >>>>> +					PTP_FALLING_EDGE |
> >>>>> +					PTP_STRICT_FLAGS))
> >>>>> +			return -EOPNOTSUPP;
> >>>>> +
> >>>>>  		func = PTP_PF_EXTTS;
> >>>>>  		chan = rq->extts.index;
> >>>>>  		break;
> >>>>>  	case PTP_CLK_REQ_PEROUT:
> >>>>> +		if (rq->perout.flags)
> >>>>> +			return -EOPNOTSUPP;
> >>>>>  		func = PTP_PF_PEROUT;
> >>>>>  		chan = rq->perout.index;
> >>>>>  		break;
> >>>>
> >>>> I am a little confused.
> >>>>
> >>>> My understanding of the cited patches is that they add checking of flags to the code. So code like the above isn't needed in drivers.
> >>>
> >>> Hi Simon, thank you very much for the review. My understanding is that the driver needs to set the supported flags field, otherwise requests
> >>> won't go through the kernel. The tests I've been doing confirm my theory. Here is also an example patch that adds supported flags to drivers:
> >>> https://lore.kernel.org/intel-wired-lan/20250414-jk-supported-perout-flags-v2-1-f6b17d15475c@intel.com/
> >>
> >> Sorry for the slow response.
> >>
> >> My understanding is that the hunk above is not required.
> >> But the hunk below is.
> >>
> >
> > Well, you are very correct. Thank you so much for the thorough review, and let me send a new version!
> >
> Yes, Simon is correct, but we do have to be certain that the driver
> actually implements the flags correctly, i.e. that it will actually
> honor the RISING or FALLING edge, before you actually add the flags to
> the supported flags list.
> 
> I don't see any mention of PTP_RISING_EDGE nor PTP_FALLING_EDGE in the
> driver. Thus, I can't confirm which edge is actually timestamped.
> 
> Thus I would NACK this patch until you can confirm whether the hardware
> either a) timestamps one edge, in which case you should set only that
> flag as allowed, b) timestamps both edges, in which case you should set
> all flags and then explicitly reject the case where only one flag is
> set, or c) can be configured based on which flag is set, in which case
> you should set all the flags and then check the flags when programming
> to enable the appropriate edge.
> 
> This patch does none of these, and is therefore incorrect. Applying it
> will "allow" the userspace to work but they will not get the strict
> behavior of timestamping the desired edge, which completely negates the
> point of the strict mode!
> 
> As an example, look at the ice driver:
> 
> #define GLTSYN_AUX_IN_0_EVNTLVL_RISING_EDGE     BIT(0)
> #define GLTSYN_AUX_IN_0_EVNTLVL_FALLING_EDGE    BIT(1)
> 
>                 /* set event level to requested edge */
>                 if (rq->flags & PTP_FALLING_EDGE)
>                         aux_reg |= GLTSYN_AUX_IN_0_EVNTLVL_FALLING_EDGE;
>                 if (rq->flags & PTP_RISING_EDGE)
>                         aux_reg |= GLTSYN_AUX_IN_0_EVNTLVL_RISING_EDGE;
> 
> 
> It sets the appropriate register values to ensure the correct edges are
> timestamped as requested.
> 
> Thanks,
> Jake

Hi, thank you for your review, and sorry for the late response.
The original point of this patch was to fix an issue where ts2phc fails because it does not see any supported flags
(thinking about it now, iwl-net would be a better target for this patch).
According to our documentation, FVL supports timestamping on the rising edge, the falling edge, or both edges,
but i40e_ptp_set_timestamp_mode() hardcodes the EVNTLVL register to rising edge only.
Implementing the other edges would require DCR, and I couldn't find anything like that.
I think for now setting only the rising edge as a supported flag would be the way to go. Do you agree?
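As a rough illustration of option (a) (one hardwired edge), the sketch below advertises only the rising edge and shows which requests a simple "reject unadvertised bits" check would pass. The flag values are copied from include/uapi/linux/ptp_clock.h; `i40e_extts_flags_sketch` and `extts_request_check()` are hypothetical, and the check is a simplified model that does not reproduce every rule in the PTP core.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copies of the flag bits in include/uapi/linux/ptp_clock.h. */
#define PTP_ENABLE_FEATURE (1u << 0)
#define PTP_RISING_EDGE    (1u << 1)
#define PTP_FALLING_EDGE   (1u << 2)
#define PTP_STRICT_FLAGS   (1u << 3)

/* Option (a): EVNTLVL is hardcoded to the rising edge, so advertise only
 * that edge (sketch of a value for .supported_extts_flags). */
static const uint32_t i40e_extts_flags_sketch =
	PTP_RISING_EDGE | PTP_STRICT_FLAGS;

/* Simplified model of a "reject anything not advertised" check;
 * PTP_ENABLE_FEATURE is treated as always allowed here. */
static int extts_request_check(uint32_t supported, uint32_t flags)
{
	if (flags & ~(supported | PTP_ENABLE_FEATURE))
		return -EOPNOTSUPP;
	return 0;
}
```

With option (b) or (c), the advertised set would include both edges, and the driver itself would then have to reject or program the single-edge cases, as described in the review above.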

^ permalink raw reply

* Re: [PATCH v12 5/6] tls: add hardware offload key update support
From: Sabrina Dubroca @ 2026-04-10 13:25 UTC (permalink / raw)
  To: Rishikesh Jethwani
  Cc: netdev, saeedm, tariqt, mbloch, borisp, john.fastabend, kuba,
	davem, pabeni, edumazet, leon
In-Reply-To: <CAKaoeS3pNe-DH2vA1E7t9NOn3H_ZUGk+cM0LgoYogkWEU0aqLQ@mail.gmail.com>

2026-04-09, 10:46:40 -0700, Rishikesh Jethwani wrote:
> On Mon, Apr 6, 2026 at 1:59 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> >
> > 2026-04-02, 17:55:10 -0600, Rishikesh Jethwani wrote:
> > > During a TLS 1.3 KeyUpdate the NIC key cannot be replaced immediately
> > > if previously encrypted HW records are awaiting ACK. start_rekey sets
> > > up a temporary SW context with the new key and redirects sendmsg through
> > > tls_sw_sendmsg_locked. When no records are pending, complete_rekey runs
> > > inline during setsockopt. Otherwise, clean_acked sets REKEY_READY once
> > > all old-key records are ACKed, and the next sendmsg calls complete_rekey.
> > > complete_rekey flushes remaining SW records, reinstalls HW offload at
> > > the current write_seq, and frees the temporary context.
> > >
> > > If another KeyUpdate arrives while a rekey is already pending,
> > > start_rekey just re-keys the existing SW AEAD in place.
> > >
> > > If complete_rekey fails (tls_dev_add or crypto_aead_setkey),
> > > we stay in SW mode (REKEY_FAILED) until a subsequent rekey
> > > succeeds, while maintaining TLS_HW configuration.
> > >
> > > Tested on Mellanox ConnectX-6 Dx (Crypto Enabled) with multiple
> > > TLS 1.3 key update cycles.
> >
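To make the TX flow in the commit message easier to follow, here is a small, hypothetical model of its state transitions. The mode names loosely follow the description; the counters and the `hw_add_fails` knob are invented for illustration. It tracks only which path sendmsg takes, not records, sequence numbers, or end-of-record handling.

```c
#include <assert.h>
#include <stdbool.h>

enum tx_mode { TX_HW, TX_REKEY_PENDING, TX_REKEY_READY, TX_REKEY_FAILED };

struct tx_state {
	enum tx_mode mode;
	unsigned int unacked_old_key_records;	/* HW records awaiting ACK */
	unsigned int sw_key_gen;		/* times the SW AEAD was keyed */
	bool hw_add_fails;			/* test knob for the failure path */
};

/* Flush SW records and reinstall HW offload at write_seq (modeled as a flag). */
static void complete_rekey(struct tx_state *s)
{
	s->mode = s->hw_add_fails ? TX_REKEY_FAILED : TX_HW;
}

/* setsockopt(TLS_TX) with a new key. */
static void start_rekey(struct tx_state *s)
{
	s->sw_key_gen++;	/* new temporary SW context, or rekey it in place */
	if (s->unacked_old_key_records == 0)
		complete_rekey(s);	/* nothing in flight: finish inline */
	else
		s->mode = TX_REKEY_PENDING;
}

/* ACK processing for old-key HW records. */
static void clean_acked(struct tx_state *s, unsigned int acked)
{
	if (acked >= s->unacked_old_key_records)
		s->unacked_old_key_records = 0;
	else
		s->unacked_old_key_records -= acked;

	if (s->mode == TX_REKEY_PENDING && s->unacked_old_key_records == 0)
		s->mode = TX_REKEY_READY;
}

/* Returns true if this sendmsg goes through the SW path. */
static bool sendmsg_uses_sw(struct tx_state *s)
{
	if (s->mode == TX_REKEY_READY)
		complete_rekey(s);
	return s->mode != TX_HW;
}
```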
> > Something here doesn't seem to work. I have a very simple
> > client/server pair where one side just loops doing large send()s and
> > does a rekey (send keyupdate + change key) every N iterations (I've
> > set N large enough that it goes about 5 seconds between rekeys), and
> > the other receives all the data and changes its RX key when it sees a
> > keyupdate. If both sides are doing SW, it works. If I configure either
> > side to use offload, decrypt fails after the rekey unless I add a
> > small sleep() just after changing keys on the TX side.
> >
> 
> Found 2 issues with this test case:
> 
> TX HW Rekey: The start_rekey function is missing the EOR (End of
> Record) marker. Fix will be included in the next code submission.

For this, pay attention to the "[ANN] net-next is CLOSED" messages on
netdev, we're approaching the next merge window.

> RX HW Rekey: When the receiver tries to rekey, multiple new records
> have already been decrypted by the NIC with the old key and are present in
> the receive queue. To handle these records, we need to keep the old key so
> we can re-encrypt them with the old key and then decrypt them with the new
> key. We also need to consider back-to-back rekeys if we store the old key.
> 
> Any suggestions on how to address this?

I'm sorry, I can't spend this much time on solving this problem. I've
already allocated a significant amount of time to reviewing/testing
this submission (including pointing out some very basic things). And
it seems you've already found a solution (store and use the old
key(s)).

-- 
Sabrina

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net] ice: fix double free in ice_sf_eth_activate() error path
From: Paul Menzel @ 2026-04-10 13:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: intel-wired-lan, netdev, linux-kernel, Tony Nguyen,
	Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Raczynski, Jiri Pirko,
	Simon Horman, Michal Swiatkowski, stable
In-Reply-To: <2026040919-junior-glue-10d0@gregkh>

Dear Greg,


Thank you for the patch.

Am 09.04.26 um 17:11 schrieb Greg Kroah-Hartman:
> When auxiliary_device_add() fails, the aux_dev_uninit label calls
> auxiliary_device_uninit() and falls through to sf_dev_free and xa_erase.
> The uninit invokes ice_sf_dev_release(), which already frees sf_dev via
> kfree() and erases the entry from ice_sf_aux_id.  The fall-through then
> double-frees sf_dev and double-erases the id.
> 
> This is reachable from userspace via the devlink port function state-set
> netlink command.
> 
> Fix this by returning right after uninit because the release callback
> handles all cleanup correctly.
> 
> Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
> Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Cc: Andrew Lunn <andrew+netdev@lunn.ch>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Piotr Raczynski <piotr.raczynski@intel.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: Simon Horman <horms@kernel.org>
> Cc: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> Fixes: 177ef7f1e2a0 ("ice: base subfunction aux driver")
> Cc: stable <stable@kernel.org>
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
>   drivers/net/ethernet/intel/ice/ice_sf_eth.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_sf_eth.c b/drivers/net/ethernet/intel/ice/ice_sf_eth.c
> index 2cf04bc6edce..6bc8aa896762 100644
> --- a/drivers/net/ethernet/intel/ice/ice_sf_eth.c
> +++ b/drivers/net/ethernet/intel/ice/ice_sf_eth.c
> @@ -304,7 +304,9 @@ ice_sf_eth_activate(struct ice_dynamic_port *dyn_port,
>   	return 0;
>   
>   aux_dev_uninit:
> +	/* ice_sf_dev_release() frees sf_dev and erases the xa entry */
>   	auxiliary_device_uninit(&sf_dev->adev);
> +	return err;
>   sf_dev_free:
>   	kfree(sf_dev);
>   xa_erase:

Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>


Kind regards,

Paul

^ permalink raw reply

* Re: [PATCH net-next v2 02/14] libie: add PCI device initialization helpers to libie
From: Larysa Zaremba @ 2026-04-10 13:32 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev,
	Phani R Burra, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, madhu.chittim, joshua.a.hay,
	jacob.e.keller, jayaprakash.shanmugam, jiri, horms, corbet,
	richardcochran, linux-doc, bhelgaas, linux-pci, Bharath R,
	Samuel Salin, Aleksandr Loktionov
In-Reply-To: <2e618260-2153-4c36-be61-d2329c9da13f@redhat.com>

On Thu, Apr 09, 2026 at 10:56:26AM +0200, Paolo Abeni wrote:
> On 4/3/26 9:49 PM, Tony Nguyen wrote:
> > +	mr = libie_find_mmio_region(&mmio_info->mmio_list, offset, size,
> > +				    bar_idx);
> > +	if (mr) {
> > +		pci_warn(pdev,
> > +			 "Mapping of BAR%u (offset=%llu, size=%llu) intersecting region (offset=%llu, size=%llu) already exists\n",
> > +			 bar_idx, (unsigned long long)mr->offset,
> > +			 (unsigned long long)mr->size,
> > +			 (unsigned long long)offset, (unsigned long long)size);
> > +		return mr->offset <= offset &&
> > +		       mr->offset + mr->size >= offset + size;
> 
> Sashiko says:
> 
> ---
> Does returning true here without creating a new tracking object leave
> the new mapping tied to the original mapping's lifetime?
> If the driver unmaps the original region, iounmap() is called and the
> tracking object is freed. Any cached virtual address pointers to the
> sub-region would then become a use-after-free, and subsequent queries
> for the sub-region would fail.
> ---

Current users map and unmap region groups in a 'map 1-2-3 and unmap 3-2-1'
fashion, and this is not expected to change, so it should be fine as-is.
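The geometry in the quoted hunk can be checked in isolation. The sketch below assumes (the lookup code is not shown) that libie_find_mmio_region() returns the first tracked region that intersects the request; the hunk then reuses it only when it fully contains the new range. `struct mmio_region`, `enum map_result` and `classify_request()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mmio_region {
	uint64_t offset;
	uint64_t size;
};

enum map_result { MAP_NEW, MAP_REUSED, MAP_CONFLICT };

/* Half-open ranges [r->offset, +size) and [offset, +size) overlap. */
static bool regions_intersect(const struct mmio_region *r,
			      uint64_t offset, uint64_t size)
{
	return offset < r->offset + r->size && r->offset < offset + size;
}

/* The containment test from the quoted hunk. */
static bool region_contains(const struct mmio_region *r,
			    uint64_t offset, uint64_t size)
{
	return r->offset <= offset && r->offset + r->size >= offset + size;
}

/* First intersecting region decides: reuse it if it covers the request,
 * otherwise the request conflicts with an existing mapping. */
static enum map_result classify_request(const struct mmio_region *maps,
					size_t n, uint64_t offset,
					uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (!regions_intersect(&maps[i], offset, size))
			continue;
		return region_contains(&maps[i], offset, size) ?
		       MAP_REUSED : MAP_CONFLICT;
	}
	return MAP_NEW;
}
```

A MAP_REUSED result gets no tracking object of its own, which is why the unmap order discussed above matters.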

> 
> /P
> 

^ permalink raw reply

* Re: [patch 19/38] kcsan: Replace get_cycles() usage
From: Marco Elver @ 2026-04-10 13:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Dmitry Vyukov, kasan-dev, Arnd Bergmann, x86, Lu Baolu,
	iommu, Michael Grzeschik, netdev, linux-wireless, Herbert Xu,
	linux-crypto, Vlastimil Babka, linux-mm, David Woodhouse,
	Bernie Thompson, linux-fbdev, Theodore Tso, linux-ext4,
	Andrew Morton, Uladzislau Rezki, Andrey Ryabinin, Thomas Sailer,
	linux-hams, Jason A. Donenfeld, Richard Henderson, linux-alpha,
	Russell King, linux-arm-kernel, Catalin Marinas, Huacai Chen,
	loongarch, Geert Uytterhoeven, linux-m68k, Dinh Nguyen,
	Jonas Bonn, linux-openrisc, Helge Deller, linux-parisc,
	Michael Ellerman, linuxppc-dev, Paul Walmsley, linux-riscv,
	Heiko Carstens, linux-s390, David S. Miller, sparclinux
In-Reply-To: <20260410120318.862164111@kernel.org>

On Fri, 10 Apr 2026 at 14:20, Thomas Gleixner <tglx@kernel.org> wrote:
>
> KCSAN uses get_cycles() for two purposes:
>
>   1) Seeding the random state with get_cycles() is a historical leftover.
>
>   2) The microbenchmark uses get_cycles(), which provides a unitless
>      counter value and is not guaranteed to be functional on all
>      systems/platforms.
>
> Use random_get_entropy() for seeding the random state and ktime_get(), which
> is universally functional and provides at least a comprehensible unit.
>
> This is part of a larger effort to remove get_cycles() usage from
> non-architecture code.
>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Marco Elver <elver@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: kasan-dev@googlegroups.com

Reviewed-by: Marco Elver <elver@google.com>

> ---
>  kernel/kcsan/core.c    |    2 +-
>  kernel/kcsan/debugfs.c |    8 ++++----
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> --- a/kernel/kcsan/core.c
> +++ b/kernel/kcsan/core.c
> @@ -798,7 +798,7 @@ void __init kcsan_init(void)
>         BUG_ON(!in_task());
>
>         for_each_possible_cpu(cpu)
> -               per_cpu(kcsan_rand_state, cpu) = (u32)get_cycles();
> +               per_cpu(kcsan_rand_state, cpu) = (u32)random_get_entropy();
>
>         /*
>          * We are in the init task, and no other tasks should be running;
> --- a/kernel/kcsan/debugfs.c
> +++ b/kernel/kcsan/debugfs.c
> @@ -58,7 +58,7 @@ static noinline void microbenchmark(unsi
>  {
>         const struct kcsan_ctx ctx_save = current->kcsan_ctx;
>         const bool was_enabled = READ_ONCE(kcsan_enabled);
> -       u64 cycles;
> +       ktime_t nsecs;
>
>         /* We may have been called from an atomic region; reset context. */
>         memset(&current->kcsan_ctx, 0, sizeof(current->kcsan_ctx));
> @@ -70,16 +70,16 @@ static noinline void microbenchmark(unsi
>
>         pr_info("%s begin | iters: %lu\n", __func__, iters);
>
> -       cycles = get_cycles();
> +       nsecs = ktime_get();
>         while (iters--) {
>                 unsigned long addr = iters & ((PAGE_SIZE << 8) - 1);
>                 int type = !(iters & 0x7f) ? KCSAN_ACCESS_ATOMIC :
>                                 (!(iters & 0xf) ? KCSAN_ACCESS_WRITE : 0);
>                 __kcsan_check_access((void *)addr, sizeof(long), type);
>         }
> -       cycles = get_cycles() - cycles;
> +       nsecs = ktime_get() - nsecs;
>
> -       pr_info("%s end   | cycles: %llu\n", __func__, cycles);
> +       pr_info("%s end   | nsecs: %llu\n", __func__, nsecs);
>
>         WRITE_ONCE(kcsan_enabled, was_enabled);
>         /* restore context */
>
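For comparison, the same measure-a-loop pattern in user space, with CLOCK_MONOTONIC standing in for ktime_get(): the result is a signed nanosecond count (in the kernel, ktime_t is an s64), so it has a unit regardless of the CPU's cycle counter. `mono_ns()` and `time_loop_ns()` are hypothetical, and the loop body is only a placeholder for __kcsan_check_access().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* User-space analogue of ktime_get(): monotonic time in nanoseconds. */
static int64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static volatile unsigned long sink;	/* keeps the loop from being optimized out */

static int64_t time_loop_ns(unsigned long iters)
{
	const int64_t start = mono_ns();

	while (iters--)
		sink += iters & 0x7f;	/* placeholder for the checked access */

	return mono_ns() - start;
}
```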

^ permalink raw reply

* Re: [patch 09/38] iommu/vt-d: Use sched_clock() instead of get_cycles()
From: Baolu Lu @ 2026-04-10 13:45 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: baolu.lu, x86, iommu, Arnd Bergmann, Michael Grzeschik, netdev,
	linux-wireless, Herbert Xu, linux-crypto, Vlastimil Babka,
	linux-mm, David Woodhouse, Bernie Thompson, linux-fbdev,
	Theodore Tso, linux-ext4, Andrew Morton, Uladzislau Rezki,
	Marco Elver, Dmitry Vyukov, kasan-dev, Andrey Ryabinin,
	Thomas Sailer, linux-hams, Jason A. Donenfeld, Richard Henderson,
	linux-alpha, Russell King, linux-arm-kernel, Catalin Marinas,
	Huacai Chen, loongarch, Geert Uytterhoeven, linux-m68k,
	Dinh Nguyen, Jonas Bonn, linux-openrisc, Helge Deller,
	linux-parisc, Michael Ellerman, linuxppc-dev, Paul Walmsley,
	linux-riscv, Heiko Carstens, linux-s390, David S. Miller,
	sparclinux
In-Reply-To: <20260410120318.187521447@kernel.org>

On 4/10/2026 8:19 PM, Thomas Gleixner wrote:
> Calculating the timeout from get_cycles() is a historical leftover without
> any functional requirement.
> 
> Use ktime_get() instead.

The subject line says "Use sched_clock() ...", but the implementation
actually uses ktime_get(). Is that a typo, or have I misunderstood something?

Other parts look good to me,

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: x86@kernel.org
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Cc: iommu@lists.linux.dev
> ---
>   arch/x86/include/asm/iommu.h |    3 ---
>   drivers/iommu/intel/dmar.c   |    4 ++--
>   drivers/iommu/intel/iommu.h  |    8 ++++++--
>   3 files changed, 8 insertions(+), 7 deletions(-)

Thanks,
baolu

^ permalink raw reply

* Re: [PATCH net v3 0/5] bonding: 3ad: fix carrier state with no valid slaves
From: Jay Vosburgh @ 2026-04-10 14:02 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Louis Scalbert, Jonas Gorski, netdev, andrew+netdev, edumazet,
	pabeni, fbl, andy, shemminger, maheshb
In-Reply-To: <20260409193813.249061e4@kernel.org>

Jakub Kicinski <kuba@kernel.org> wrote:

>On Thu, 9 Apr 2026 13:49:06 +0200 Louis Scalbert wrote:
>> > Signalling link up too early can cause issues for some protocols that
>> > may change behavior in the absence of PDUs from a link partner.  
>> 
>> I agree with your point. I have observed issues with
>> keepalived VRRP when it is configured on top of a bonding interface.
>> 
>> When the bond reports carrier as up while no slave is actually able to
>> receive traffic (due to the partner not being ready, as indicated by the
>> absence of LACP negotiation), the VRRP process interprets the interface
>> as operational. At the same time, the absence of received VRRP
>> advertisements is interpreted as if it were the only router on the
>> segment. As a result, it transitions to the MASTER state.
>> 
>> In reality, another VRRP router may already be MASTER and actively
>> sending advertisements, but those packets are not received due to the
>> bonding state. This leads to a split-brain condition with multiple
>> masters on the network.
>> 
>> Such a situation breaks the assumptions of
>> VRRP, where a single MASTER is expected to handle traffic,
>> and can result in traffic inconsistency or loss when upper-layer
>> processes rely on this behavior.
>
>It's been like this for what, 15 years?
>We have to draw the line between fix and improvement somewhere.
>In Linux we generally draw the line at regressions+crashes/security
>bugs. If a use case never worked correctly it's not getting fixed.
>It's getting enabled.
>
>That said, if Jay wants it as a fix I'm not going to argue.

	My apologies for not responding sooner, I was under the weather
for a few days and am just now catching up.

	My general position is that, first, the current bonding behavior
is compliant to the standard, which gives latitude in how individual
ports (those not partnered with a LACP peer) are managed.  The stated
Cisco, et al, behavior of denying use of such ports is also compliant.
One thing we cannot do is run such ports together as a logical link
aggregation, for both standards compliance reasons as well as the
practical issues of creating topology loops or duplicate frames.

	Second, the minutiae of the standard are not the real issue at
hand.  The real issue is that bonding's behavior of enabling an
un-partnered port and setting the bond to carrier up based on carrier
state does cause communication issues with peers that behave
differently and deny use of un-partnered ports.

	As such, in principle I'm not opposed to an option that would
essentially tell bonding to only use LACP-partnered ports for the active
aggregator.  I'll have some time over the weekend to review the patch
set in detail and respond to the specifics.
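A minimal sketch of the opt-in behavior described above, assuming a simplified per-port state: with the option off, any port with carrier is eligible (the current behavior); with it on, only LACP-partnered ports are. The struct, the `require_partner` flag, and both functions are hypothetical illustrations, not bonding driver code or the posted patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-port state; not the bonding driver's structs. */
struct port_state {
	bool carrier_up;     /* physical link is up */
	bool lacp_partnered; /* an LACP partner has been detected on this port */
};

/*
 * Hypothetical eligibility check for the active aggregator.
 * require_partner == false: a port with carrier may be used even without
 * an LACP partner (the current behavior described above).
 * require_partner == true: only LACP-partnered ports are eligible (the
 * proposed opt-in option).
 */
static bool port_eligible(const struct port_state *p, bool require_partner)
{
	assert(p != NULL);
	if (!p->carrier_up)
		return false;
	return require_partner ? p->lacp_partnered : true;
}

/* The bond reports carrier up only if at least one port is eligible. */
static bool bond_carrier_up(const struct port_state *ports, size_t n,
			    bool require_partner)
{
	for (size_t i = 0; i < n; i++)
		if (port_eligible(&ports[i], require_partner))
			return true;
	return false;
}
```

In the VRRP scenario quoted earlier, a single un-partnered port with carrier would keep the bond carrier up with the option off, and leave it down with the option on.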

	-J

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply

* Re: [PATCH net] atm: mpoa: fix mpc->dev refcount across mpoad restart
From: Simon Horman @ 2026-04-10 14:03 UTC (permalink / raw)
  To: Shuvam Pandey
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel, syzbot+5ec223ccb83b24ef982f
In-Reply-To: <177555091252.59118.13093904987038690781@gmail.com>

On Tue, Apr 07, 2026 at 02:20:12PM +0545, Shuvam Pandey wrote:
> atm: mpoa: fix mpc->dev refcount across mpoad restart
> 
> mpoad_close() drops the reference held in mpc->dev with dev_put(), but
> the mpoa_client stays alive and keeps the same device pointer.
> 
> A later mpoad attach reuses the existing mpoa_client without
> reacquiring that reference, so the next close can hit the netdevice
> refcount warning. Keep the LEC device reference with the mpoa_client
> until the device unregisters or the client is torn down.

Hi Shuvam,

Including the stack trace would be useful, IMHO.

> 
> Reported-by: syzbot+5ec223ccb83b24ef982f@syzkaller.appspotmail.com
> Link: https://groups.google.com/g/syzkaller-bugs/c/qhZ5MJfLBOE/m/UnotmgRdAQAJ

A Fixes tag should go here, indicating the commit which introduced
the bug - typically the commit where it first manifested.

> Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
> ---
>  net/atm/mpc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/atm/mpc.c b/net/atm/mpc.c
> index ce8e9780373b..1e9b9c633e8b 100644
> --- a/net/atm/mpc.c
> +++ b/net/atm/mpc.c
> @@ -886,7 +886,6 @@ static void mpoad_close(struct atm_vcc *vcc)
>  		struct lec_priv *priv = netdev_priv(mpc->dev);
>  		priv->lane2_ops->associate_indicator = NULL;
>  		stop_mpc(mpc);
> -		dev_put(mpc->dev);
>  	}
>  
>  	mpc->in_ops->destroy_cache(mpc);

I'm not really familiar with the object life cycle here.

But it strikes me that the purpose of dev_put() in a close callback
is to indicate the device no longer needs to be held for the connection
being closed. And, if so, I wonder if the problem here is that
there is no corresponding dev_hold() in the (unimplemented) open callback.
(I am assuming there is a 1:1 symmetry between open and close.)
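A userspace sketch of the open/close symmetry suggested above, assuming a plain integer stands in for the netdevice refcount. `fake_dev_hold()`, `fake_dev_put()`, `client_open()` and `client_close()` are hypothetical names; they are not the kernel's dev_hold()/dev_put() or code from net/atm/mpc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a netdevice refcount; not the kernel API. */
struct fake_dev { int refcnt; };

static void fake_dev_hold(struct fake_dev *d) { d->refcnt++; }

static void fake_dev_put(struct fake_dev *d)
{
	assert(d->refcnt > 0); /* an underflow here models the refcount warning */
	d->refcnt--;
}

/* Hypothetical long-lived client that survives daemon restarts. */
struct client { struct fake_dev *dev; int attached; };

/* open: take the reference that the matching close will drop. */
static void client_open(struct client *c)
{
	fake_dev_hold(c->dev);
	c->attached = 1;
}

/* close: drop exactly the reference taken by the matching open. */
static void client_close(struct client *c)
{
	if (!c->attached)
		return;
	c->attached = 0;
	fake_dev_put(c->dev);
}
```

With this pairing, a client that outlives several open/close cycles ends each cycle with the count it started with, so a restart cannot drop a reference it never took.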

> @@ -1508,6 +1507,8 @@ static void __exit atm_mpoa_cleanup(void)
>  			priv = netdev_priv(mpc->dev);
>  			if (priv->lane2_ops != NULL)
>  				priv->lane2_ops->associate_indicator = NULL;
> +			dev_put(mpc->dev);
> +			mpc->dev = NULL;
>  		}
>  		ddprintk("about to clear caches\n");
>  		mpc->in_ops->destroy_cache(mpc);

An AI-generated review flags that atm_mpoa_cleanup() already calls
unregister_netdevice_notifier(), which will trigger the
NETDEV_UNREGISTER handler in mpoa_event_listener(), which already calls
dev_put().

This seems to duplicate that, which I expect is undesirable.
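To illustrate the double-drop concern, one common pattern is to clear the pointer in the same place the reference is dropped, so a second path finds NULL and does nothing. This is a hypothetical userspace sketch (`client_release_dev()`, `on_unregister()` and `on_cleanup()` are invented names); it ignores locking, which real code would need if the two paths could race.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a netdevice refcount; not the kernel API. */
struct fake_dev { int refcnt; };

static void fake_dev_put(struct fake_dev *d)
{
	assert(d->refcnt > 0); /* catches a second, unbalanced put */
	d->refcnt--;
}

struct client { struct fake_dev *dev; };

/* Drop the client's reference at most once: put, then clear the pointer,
 * so any later path sees NULL and skips the put. */
static void client_release_dev(struct client *c)
{
	if (c->dev == NULL)
		return;
	fake_dev_put(c->dev);
	c->dev = NULL;
}

/* Hypothetical unregister-notifier path. */
static void on_unregister(struct client *c) { client_release_dev(c); }

/* Hypothetical module-cleanup path, which may run after on_unregister(). */
static void on_cleanup(struct client *c) { client_release_dev(c); }
```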

^ permalink raw reply

* Re: [PATCH net-next v3 0/4] net: move .getsockopt away from __user buffers
From: David Laight @ 2026-04-10 14:15 UTC (permalink / raw)
  To: Breno Leitao
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, metze, axboe,
	Stanislav Fomichev, io-uring, bpf, netdev, Linus Torvalds,
	linux-kernel, kernel-team
In-Reply-To: <adjkn7p4U13WBs2o@gmail.com>

On Fri, 10 Apr 2026 05:29:37 -0700
Breno Leitao <leitao@debian.org> wrote:

...
> Since these legacy protocols represent less than 1% of the cases, I'd prefer to
> optimize for the common path and handle the exceptional cases as exceptions.

Optimising for the common path would pass a fixed size on-stack buffer
(say 64 bytes - or perhaps enough for an IPv6 address) through to the protocol
code, update it, and do the copy_to_user() in the system call stub.
Writes to small constant offsets into the buffer could then just be direct
assignments through a pointer.

The 'buffer descriptor' would need to contain any associated user address for
the unusual cases (also look at the sctp socket options).

But I still think you need functions that read/write at offsets into
the buffer rather than using iov_iter - which is only really designed
for sequential access. It isn't as though you need scatter-gather.
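A rough userspace sketch of that scheme, under stated assumptions: a 64-byte inline buffer, a hypothetical `struct sockopt_buf` descriptor, and a hypothetical `sockopt_write_at()` helper that writes at an offset with bounds checks rather than advancing an iov_iter-style cursor. `memcpy()` stands in for copy_to_user(), and `example_getsockopt()` is an invented protocol handler; none of this is an existing kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SOCKOPT_INLINE_SIZE 64 /* assumed size, per "say 64 bytes" */

/* Hypothetical 'buffer descriptor' handed to protocol code. */
struct sockopt_buf {
	unsigned char data[SOCKOPT_INLINE_SIZE]; /* on the syscall stub's stack */
	size_t optlen;   /* user-supplied length, clamped to sizeof(data) */
	size_t used;     /* highest byte the protocol wrote */
	void *user_addr; /* kept for unusual options needing the user pointer */
};

/* Random-access write at a fixed offset; no sequential cursor. */
static int sockopt_write_at(struct sockopt_buf *b, size_t off,
			    const void *src, size_t len)
{
	if (off > b->optlen || len > b->optlen - off)
		return -EINVAL;
	memcpy(b->data + off, src, len);
	if (off + len > b->used)
		b->used = off + len;
	return 0;
}

/* Invented protocol handler: stores an int at offset 0. */
static int example_getsockopt(struct sockopt_buf *b)
{
	int val = 1;

	return sockopt_write_at(b, 0, &val, sizeof(val));
}

/* Stub side: a single copy out at the end (memcpy models copy_to_user). */
static int sockopt_stub(void *user_buf, size_t user_len, size_t *out_len)
{
	struct sockopt_buf b = {
		.optlen = user_len < SOCKOPT_INLINE_SIZE ? user_len
							 : SOCKOPT_INLINE_SIZE,
		.user_addr = user_buf,
	};
	int err = example_getsockopt(&b);

	if (err)
		return err;
	assert(b.used <= b.optlen);
	memcpy(user_buf, b.data, b.used);
	*out_len = b.used;
	return 0;
}
```

Because the buffer has a fixed size, a compiler can often reduce small constant-offset writes to plain stores, which is the "direct assignments" point above.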

	David

^ permalink raw reply

* Re: [PATCH net-next] net: Rename ifq_idx to rxq_idx in netif_mp_* helpers
From: Nikolay Aleksandrov @ 2026-04-10 14:17 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev, kuba, dw, pabeni
In-Reply-To: <20260410130602.552600-1-daniel@iogearbox.net>

On Fri, Apr 10, 2026 at 03:06:02PM +0200, Daniel Borkmann wrote:
> Rename the leftover ifq_idx parameter naming to rxq_idx to be
> consistent with the rest of the file and the header declaration.
> Back then this was taken out of the queue leasing series given
> the cleanup is independent. No functional change.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Link: https://lore.kernel.org/netdev/20260131160237.07789674@kernel.org
> ---
>  net/core/netdev_rx_queue.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>

> diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
> index 8771e06a0afe..de4dac4c88b3 100644
> --- a/net/core/netdev_rx_queue.c
> +++ b/net/core/netdev_rx_queue.c
> @@ -275,14 +275,14 @@ int netif_mp_open_rxq(struct net_device *dev, unsigned int rxq_idx,
>  	return ret;
>  }
>  
> -static void __netif_mp_close_rxq(struct net_device *dev, unsigned int ifq_idx,
> +static void __netif_mp_close_rxq(struct net_device *dev, unsigned int rxq_idx,
>  				 const struct pp_memory_provider_params *old_p)
>  {
>  	struct netdev_queue_config qcfg[2];
>  	struct netdev_rx_queue *rxq;
>  	int err;
>  
> -	rxq = __netif_get_rx_queue(dev, ifq_idx);
> +	rxq = __netif_get_rx_queue(dev, rxq_idx);
>  
>  	/* Callers holding a netdev ref may get here after we already
>  	 * went thru shutdown via dev_memory_provider_uninstall().
> @@ -295,28 +295,28 @@ static void __netif_mp_close_rxq(struct net_device *dev, unsigned int ifq_idx,
>  			 rxq->mp_params.mp_priv != old_p->mp_priv))
>  		return;
>  
> -	netdev_queue_config(dev, ifq_idx, &qcfg[0]);
> +	netdev_queue_config(dev, rxq_idx, &qcfg[0]);
>  	memset(&rxq->mp_params, 0, sizeof(rxq->mp_params));
> -	netdev_queue_config(dev, ifq_idx, &qcfg[1]);
> +	netdev_queue_config(dev, rxq_idx, &qcfg[1]);
>  
> -	err = netdev_rx_queue_reconfig(dev, ifq_idx, &qcfg[0], &qcfg[1]);
> +	err = netdev_rx_queue_reconfig(dev, rxq_idx, &qcfg[0], &qcfg[1]);
>  	WARN_ON(err && err != -ENETDOWN);
>  }
>  
> -void netif_mp_close_rxq(struct net_device *dev, unsigned int ifq_idx,
> +void netif_mp_close_rxq(struct net_device *dev, unsigned int rxq_idx,
>  			const struct pp_memory_provider_params *old_p)
>  {
> -	if (WARN_ON_ONCE(ifq_idx >= dev->real_num_rx_queues))
> +	if (WARN_ON_ONCE(rxq_idx >= dev->real_num_rx_queues))
>  		return;
> -	if (!netif_rxq_is_leased(dev, ifq_idx))
> -		return __netif_mp_close_rxq(dev, ifq_idx, old_p);
> +	if (!netif_rxq_is_leased(dev, rxq_idx))
> +		return __netif_mp_close_rxq(dev, rxq_idx, old_p);
>  
> -	if (!__netif_get_rx_queue_lease(&dev, &ifq_idx, NETIF_VIRT_TO_PHYS)) {
> +	if (!__netif_get_rx_queue_lease(&dev, &rxq_idx, NETIF_VIRT_TO_PHYS)) {
>  		WARN_ON_ONCE(1);
>  		return;
>  	}
>  	netdev_lock(dev);
> -	__netif_mp_close_rxq(dev, ifq_idx, old_p);
> +	__netif_mp_close_rxq(dev, rxq_idx, old_p);
>  	netdev_unlock(dev);
>  }
>  
> @@ -339,11 +339,11 @@ void netif_rxq_cleanup_unlease(struct netdev_rx_queue *phys_rxq,
>  			       struct netdev_rx_queue *virt_rxq)
>  {
>  	struct pp_memory_provider_params *p = &phys_rxq->mp_params;
> -	unsigned int ifq_idx = get_netdev_rx_queue_index(phys_rxq);
> +	unsigned int rxq_idx = get_netdev_rx_queue_index(phys_rxq);
>  
>  	if (!p->mp_ops)
>  		return;
>  
>  	__netif_mp_uninstall_rxq(virt_rxq, p);
> -	__netif_mp_close_rxq(phys_rxq->dev, ifq_idx, p);
> +	__netif_mp_close_rxq(phys_rxq->dev, rxq_idx, p);
>  }
> -- 
> 2.43.0
> 

^ permalink raw reply

* RE: [PATCH v5 net-next 0/8] dpll/ice: Add TXC DPLL type and full TX reference clock control for E825
From: Nitka, Grzegorz @ 2026-04-10 14:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org, Oros, Petr,
	richardcochran@gmail.com, andrew+netdev@lunn.ch,
	Kitszel, Przemyslaw, Nguyen, Anthony L,
	Prathosh.Satish@microchip.com, Vecera, Ivan, jiri@resnulli.us,
	Kubalewski, Arkadiusz, vadim.fedorenko@linux.dev,
	donald.hunter@gmail.com, horms@kernel.org, pabeni@redhat.com,
	davem@davemloft.net, edumazet@google.com
In-Reply-To: <20260409181041.395a0c37@kernel.org>



> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, April 10, 2026 3:11 AM
> To: Nitka, Grzegorz <grzegorz.nitka@intel.com>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; intel-wired-
> lan@lists.osuosl.org; Oros, Petr <poros@redhat.com>;
> richardcochran@gmail.com; andrew+netdev@lunn.ch; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Prathosh.Satish@microchip.com; Vecera,
> Ivan <ivecera@redhat.com>; jiri@resnulli.us; Kubalewski, Arkadiusz
> <arkadiusz.kubalewski@intel.com>; vadim.fedorenko@linux.dev;
> donald.hunter@gmail.com; horms@kernel.org; pabeni@redhat.com;
> davem@davemloft.net; edumazet@google.com
> Subject: Re: [PATCH v5 net-next 0/8] dpll/ice: Add TXC DPLL type and full TX
> reference clock control for E825
> 
> On Thu, 9 Apr 2026 11:21:35 +0000 Nitka, Grzegorz wrote:
> > > On Fri,  3 Apr 2026 01:06:18 +0200 Grzegorz Nitka wrote:
> > > > This series adds TX reference clock support for E825 devices and exposes
> > > > TX clock selection and synchronization status via the Linux DPLL
> > > > subsystem.
> > > > E825 hardware contains a dedicated Tx clock (TXC) domain that is distinct
> > > > from PPS and EEC. TX reference clock selection is device-wide, shared
> > > > across ports, and mediated by firmware as part of the link bring-up
> > > > process. As a result, TX clock selection intent may differ from the
> > > > effective hardware configuration, and software must verify the outcome
> > > > after link-up.
> > > > To support this, the series introduces TXC support incrementally across
> > > > the DPLL core and the ice driver:
> > > >
> > > > - add a new DPLL type (TXC) to represent transmit clock generators;
> > >
> > > I'm not grasping why this is needed, isn't it part of any EEC system
> > > that the DPLL can drive the TXC? Is your system going to expose multiple
> > > DPLLs now for one NIC?
> >
> > Hello Jakub,
> > For E825 device, the short answer is yes. We have platform EEC now and
> > we want to add:
> > - TXC DPLLs per port, and
> > - PPS DPLL for TSPLL config purposes (in the near future)
> >
> > EEC (Ethernet Equipment Clock) type DPLL is designed to control multiple
> > source signals (internal-NIC or external), one of which drives the DPLL
> > device; multiple outputs are possible, each of which could drive various
> > components as well as propagate the signal to external devices.
> > TXC is a specific DPLL device associated with a single Ethernet port to
> > control its source; there is no need to declare any outputs, as the single
> > output is already determined.
> > Basically, having a TXC DPLL indicates per-port control over the SyncE (or
> > some external) clock source.
> 
> Could you share a diagram of how things are wired up?
> DPLL can have multiple outputs and multiple inputs. I'm not getting why
> a single device would have to have multiple actual DPLLs (which makes
> me worried this is just some "convenient use of the uAPI")

Hello Jakub,

Here is the high-level connection diagram for E825 device. I hope you find it helpful:

  +------------------------------------------------------------------+        
  |                                                                  |        
  |                           +-----------------------------+        |        
  |                           |                             |        |        
  |                           |         MAC                 |        |        
  |                           |+------------+-----+         |        |        
  |                           ||RX/1588 |PHC|tspll<----\    |        |        
+---+----+                    ||MUX     +---+-^---|    |    |        |        
| E | RX >--------------------->              |   >--\ |    |        |        
| T |    |    /---------------->              |   >-\| |    |        |        
| H |----+    |               |+---------+----^---+ || |    |        |        
| 1 | TX <----|----------------+TX MUX   < OCXO   | || |    |        |        
|   |PLL |    |               ||         |--------| || |    |        |        
+---+----+    |           /----+         <-ext_ref<-||-|----|---------ext_ref 
| E | RX >----/           |   ||         |--------+ || |    |        |        
| T |    |                |   ||         <  SyncE | || |    |        |        
| H |----+                |   |+-----------^------+ || |    |        |        
| 2 | TX <----------------/   |            | /------||-/    |        |        
|   |PLL |                    +------------|-|------||------+        |        
+---+----+                              /--/ |      ||               |        
| . | RX >---                           |    |      ||               |        
| . |    |                   +----------|----|------||--+            |        
| . |----+                   |        +-^-+--^+     ||  |            |        
|   | TX <---                |        |EEC|PPS|     ||  |            | 
|   |PLL |                   |        +-------+     ||  |            |        
+---+----+                   |        |       <-CLK0/|  |            |        
| E | RX >---                |        |  DPLL |      |  |            |        
| T |    |                   |        |       <-CLK1-/  |            |        
| H |----+                   |        |       |         |            |        
| X | TX <---                |        |       <---SMA---<            |        
|   |PLL |                   |        |       |         |            |        
+---+----+                   |        |       <---GPS---<            |        
  |                          |        |       |         |            |      
  |                          |        |       <---...---<            |        
  |                          |        |       |         |            |        
  |                          |        +-------+         |            |        
  |                          | External timing module   |            |        
  |                          +--------------------------+            |        
  +-------------------------------------------------------------------+        
 
Before this series, we tried different approaches.
One of them was to create a MUX pin associated with the netdev interface.
EXT_REF and SYNCE pins were registered with this MUX pin.
However, I recall there were at least two issues with this solution:
- when using the DPLL subsystem, not all connections/relations were visible
  from the DPLL pin-get perspective; rtnetlink was required
- due to mixing pins from different modules (like the fwnode-based pin from
  the zl driver and the pins from ice), we were not able to safely clean up
  the references between pins and the DPLL (basically, we observed crashes)

The proposed solution seems clean and fully reflects the current
connection topology.

What's actually your biggest concern?
The fact that we introduce a new DPLL type? Or multiple DPLL instances? Or both?
Would you prefer to see "one big" DPLL with 16 pins in our case (8 ports x 2 tx-clk pins)?
Each pin would have a name like, for example, PF0-SyncE/PF0-eRef, etc.

^ permalink raw reply

* [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
From: Biju @ 2026-04-10 14:29 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Ovidiu Panait, Russell King, netdev, linux-kernel,
	Geert Uytterhoeven, Prabhakar Mahadev Lad, Biju Das,
	linux-renesas-soc, Biju Das

From: Ovidiu Panait <ovidiu.panait.rb@renesas.com>

When mac_managed_pm flag is set, mdio_bus_phy_resume() is skipped, so
phy_init_hw(), which performs soft_reset and config_init, is not called
during resume.

This is inconsistent with the non-mac_managed_pm path, where
mdio_bus_phy_resume() calls phy_init_hw() before phy_resume() on every
resume.

To align both paths, add a phy_init_hw() call at the top of
__phy_resume(), before invoking the driver's resume callback. This
guarantees the PHY undergoes soft reset and re-initialization regardless
of whether PM is managed by the MAC or the MDIO bus.
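A userspace model of the call order the patch aims for, assuming `fake_init_hw()` stands in for phy_init_hw() (soft reset plus config_init) and a function pointer stands in for the driver's resume callback. All names are hypothetical; the model only records the sequence and does not exercise real phylib behavior. Like the diff below, it skips re-initialization when the driver has no resume callback.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the resume path; these are not phylib's types. */
struct fake_phy {
	char log[64];                      /* records the order of operations */
	int (*resume)(struct fake_phy *);  /* models phydrv->resume */
	int suspended;
};

static void log_step(struct fake_phy *p, const char *s)
{
	strncat(p->log, s, sizeof(p->log) - strlen(p->log) - 1);
}

/* Stands in for phy_init_hw(): soft reset, then config_init. */
static int fake_init_hw(struct fake_phy *p)
{
	log_step(p, "reset,");
	log_step(p, "config_init,");
	return 0;
}

static int fake_drv_resume(struct fake_phy *p)
{
	log_step(p, "resume");
	return 0;
}

/* Mirrors the patched __phy_resume(): re-init first, then the driver's
 * resume callback, so the MAC-managed PM path gets the same sequence. */
static int fake_phy_resume(struct fake_phy *p)
{
	int ret;

	if (!p->resume)
		return 0;
	ret = fake_init_hw(p);
	if (ret)
		return ret;
	ret = p->resume(p);
	if (!ret)
		p->suspended = 0;
	return ret;
}
```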

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
---
 drivers/net/phy/phy_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0edff47478c2..8255f4208d66 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2008,6 +2008,10 @@ int __phy_resume(struct phy_device *phydev)
 	if (!phydrv || !phydrv->resume)
 		return 0;
 
+	ret = phy_init_hw(phydev);
+	if (ret)
+		return ret;
+
 	ret = phydrv->resume(phydev);
 	if (!ret)
 		phydev->suspended = false;
-- 
2.43.0


^ permalink raw reply related

* [syzbot] [mptcp?] possible deadlock in mptcp_pm_nl_set_flags
From: syzbot @ 2026-04-10 14:41 UTC (permalink / raw)
  To: davem, edumazet, geliang, horms, kuba, linux-kernel, martineau,
	matttbe, mptcp, netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    591cd656a1bf Linux 7.0-rc7
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1779ce06580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5a3e5e8c17cc174e
dashboard link: https://syzkaller.appspot.com/bug?extid=dfa28bb6444d8f169cbb
compiler:       gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/d900f083ada3/non_bootable_disk-591cd656.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/3e99d0e29df0/vmlinux-591cd656.xz
kernel image: https://storage.googleapis.com/syzbot-assets/5877fee0a056/bzImage-591cd656.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dfa28bb6444d8f169cbb@syzkaller.appspotmail.com

netlink: 8 bytes leftover after parsing attributes in process `syz.8.6915'.
netlink: 8 bytes leftover after parsing attributes in process `syz.8.6915'.
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G             L     
------------------------------------------------------
syz.8.6915/29783 is trying to acquire lock:
ffffffff8e9ab8a0 (fs_reclaim){+.+.}-{0:0}, at: might_alloc include/linux/sched/mm.h:317 [inline]
ffffffff8e9ab8a0 (fs_reclaim){+.+.}-{0:0}, at: slab_pre_alloc_hook mm/slub.c:4489 [inline]
ffffffff8e9ab8a0 (fs_reclaim){+.+.}-{0:0}, at: slab_alloc_node mm/slub.c:4843 [inline]
ffffffff8e9ab8a0 (fs_reclaim){+.+.}-{0:0}, at: kmem_cache_alloc_lru_noprof+0x51/0x6e0 mm/slub.c:4885

but task is already holding lock:
ffff888034761ae0 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
ffff888034761ae0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1482 [inline]
ffff888034761ae0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags+0x605/0xd30 net/mptcp/pm_kernel.c:1551

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #6 (sk_lock-AF_INET){+.+.}-{0:0}:
       lock_sock_nested+0x41/0xf0 net/core/sock.c:3780
       lock_sock include/net/sock.h:1709 [inline]
       inet_shutdown+0x67/0x410 net/ipv4/af_inet.c:919
       nbd_mark_nsock_dead+0xae/0x5c0 drivers/block/nbd.c:318
       recv_work+0x5fb/0x8c0 drivers/block/nbd.c:1021
       process_one_work+0xa23/0x19a0 kernel/workqueue.c:3276
       process_scheduled_works kernel/workqueue.c:3359 [inline]
       worker_thread+0x5ef/0xe50 kernel/workqueue.c:3440
       kthread+0x370/0x450 kernel/kthread.c:436
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #5 (&nsock->tx_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_handle_cmd drivers/block/nbd.c:1143 [inline]
       nbd_queue_rq+0x428/0x1080 drivers/block/nbd.c:1207
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x4c8/0x8e0 fs/buffer.c:2458
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1017 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4677 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4836
       do_file_open+0x20e/0x430 fs/namei.c:4865
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #4 (&cmd->lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       nbd_queue_rq+0xba/0x1080 drivers/block/nbd.c:1199
       blk_mq_dispatch_rq_list+0x422/0x1e70 block/blk-mq.c:2148
       __blk_mq_do_dispatch_sched block/blk-mq-sched.c:168 [inline]
       blk_mq_do_dispatch_sched block/blk-mq-sched.c:182 [inline]
       __blk_mq_sched_dispatch_requests+0xcea/0x1620 block/blk-mq-sched.c:307
       blk_mq_sched_dispatch_requests+0xd7/0x1c0 block/blk-mq-sched.c:329
       blk_mq_run_hw_queue+0x23c/0x670 block/blk-mq.c:2386
       blk_mq_dispatch_list+0x51d/0x1360 block/blk-mq.c:2949
       blk_mq_flush_plug_list block/blk-mq.c:2997 [inline]
       blk_mq_flush_plug_list+0x130/0x600 block/blk-mq.c:2969
       __blk_flush_plug+0x2c4/0x4b0 block/blk-core.c:1230
       blk_finish_plug block/blk-core.c:1257 [inline]
       __submit_bio+0x584/0x6c0 block/blk-core.c:649
       __submit_bio_noacct_mq block/blk-core.c:722 [inline]
       submit_bio_noacct_nocheck+0x562/0xc10 block/blk-core.c:753
       submit_bio_noacct+0xd17/0x2010 block/blk-core.c:884
       blk_crypto_submit_bio include/linux/blk-crypto.h:203 [inline]
       submit_bh_wbc+0x59c/0x770 fs/buffer.c:2821
       submit_bh fs/buffer.c:2826 [inline]
       block_read_full_folio+0x4c8/0x8e0 fs/buffer.c:2458
       filemap_read_folio+0xfc/0x3b0 mm/filemap.c:2501
       do_read_cache_folio+0x2d7/0x6b0 mm/filemap.c:4101
       read_mapping_folio include/linux/pagemap.h:1017 [inline]
       read_part_sector+0xd1/0x370 block/partitions/core.c:723
       adfspart_check_ICS+0x93/0x910 block/partitions/acorn.c:360
       check_partition block/partitions/core.c:142 [inline]
       blk_add_partitions block/partitions/core.c:590 [inline]
       bdev_disk_changed+0x7f8/0xc80 block/partitions/core.c:694
       blkdev_get_whole+0x187/0x290 block/bdev.c:764
       bdev_open+0x2c7/0xe40 block/bdev.c:973
       blkdev_open+0x34e/0x4f0 block/fops.c:697
       do_dentry_open+0x6d8/0x1660 fs/open.c:949
       vfs_open+0x82/0x3f0 fs/open.c:1081
       do_open fs/namei.c:4677 [inline]
       path_openat+0x208c/0x31a0 fs/namei.c:4836
       do_file_open+0x20e/0x430 fs/namei.c:4865
       do_sys_openat2+0x10d/0x1e0 fs/open.c:1366
       do_sys_open fs/open.c:1372 [inline]
       __do_sys_openat fs/open.c:1388 [inline]
       __se_sys_openat fs/open.c:1383 [inline]
       __x64_sys_openat+0x12d/0x210 fs/open.c:1383
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #3 (set->srcu){.+.+}-{0:0}:
       srcu_lock_sync include/linux/srcu.h:199 [inline]
       __synchronize_srcu+0xa2/0x300 kernel/rcu/srcutree.c:1481
       blk_mq_wait_quiesce_done block/blk-mq.c:284 [inline]
       blk_mq_wait_quiesce_done block/blk-mq.c:281 [inline]
       blk_mq_quiesce_queue block/blk-mq.c:304 [inline]
       blk_mq_quiesce_queue+0x149/0x1c0 block/blk-mq.c:299
       elevator_switch+0x17b/0x7e0 block/elevator.c:576
       elevator_change+0x352/0x530 block/elevator.c:681
       elevator_set_default+0x29e/0x360 block/elevator.c:754
       blk_register_queue+0x412/0x590 block/blk-sysfs.c:946
       __add_disk+0x73f/0xe40 block/genhd.c:528
       add_disk_fwnode+0x118/0x5c0 block/genhd.c:597
       add_disk include/linux/blkdev.h:785 [inline]
       nbd_dev_add+0x77a/0xb10 drivers/block/nbd.c:1984
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #2 (&q->elevator_lock){+.+.}-{4:4}:
       __mutex_lock_common kernel/locking/mutex.c:614 [inline]
       __mutex_lock+0x1a2/0x1b90 kernel/locking/mutex.c:776
       elevator_change+0x1bc/0x530 block/elevator.c:679
       elevator_set_none+0x92/0xf0 block/elevator.c:769
       blk_mq_elv_switch_none block/blk-mq.c:5110 [inline]
       __blk_mq_update_nr_hw_queues block/blk-mq.c:5155 [inline]
       blk_mq_update_nr_hw_queues+0x4c1/0x15f0 block/blk-mq.c:5220
       nbd_start_device+0x1a6/0xbd0 drivers/block/nbd.c:1489
       nbd_genl_connect+0xff2/0x1a40 drivers/block/nbd.c:2239
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

-> #1 (&q->q_usage_counter(io)#51){++++}-{0:0}:
       blk_alloc_queue+0x610/0x790 block/blk-core.c:461
       blk_mq_alloc_queue+0x174/0x290 block/blk-mq.c:4429
       __blk_mq_alloc_disk+0x29/0x120 block/blk-mq.c:4476
       nbd_dev_add+0x492/0xb10 drivers/block/nbd.c:1954
       nbd_init+0x291/0x2b0 drivers/block/nbd.c:2692
       do_one_initcall+0x11d/0x760 init/main.c:1382
       do_initcall_level init/main.c:1444 [inline]
       do_initcalls init/main.c:1460 [inline]
       do_basic_setup init/main.c:1479 [inline]
       kernel_init_freeable+0x6e5/0x7a0 init/main.c:1692
       kernel_init+0x1f/0x1e0 init/main.c:1582
       ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

-> #0 (fs_reclaim){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3165 [inline]
       check_prevs_add kernel/locking/lockdep.c:3284 [inline]
       validate_chain kernel/locking/lockdep.c:3908 [inline]
       __lock_acquire+0x14b8/0x2630 kernel/locking/lockdep.c:5237
       lock_acquire kernel/locking/lockdep.c:5868 [inline]
       lock_acquire+0x1cf/0x380 kernel/locking/lockdep.c:5825
       __fs_reclaim_acquire mm/page_alloc.c:4348 [inline]
       fs_reclaim_acquire+0xc4/0x100 mm/page_alloc.c:4362
       might_alloc include/linux/sched/mm.h:317 [inline]
       slab_pre_alloc_hook mm/slub.c:4489 [inline]
       slab_alloc_node mm/slub.c:4843 [inline]
       kmem_cache_alloc_lru_noprof+0x51/0x6e0 mm/slub.c:4885
       sock_alloc_inode+0x25/0x1c0 net/socket.c:322
       alloc_inode+0x68/0x250 fs/inode.c:347
       new_inode_pseudo include/linux/fs.h:3003 [inline]
       sock_alloc+0x44/0x280 net/socket.c:637
       __sock_create+0xc2/0x860 net/socket.c:1569
       mptcp_subflow_create_socket+0xec/0xa30 net/mptcp/subflow.c:1790
       __mptcp_subflow_connect+0x3c6/0x1480 net/mptcp/subflow.c:1631
       mptcp_pm_create_subflow_or_signal_addr+0xc3e/0x18d0 net/mptcp/pm_kernel.c:416
       mptcp_pm_nl_fullmesh net/mptcp/pm_kernel.c:1460 [inline]
       mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1487 [inline]
       mptcp_pm_nl_set_flags+0x814/0xd30 net/mptcp/pm_kernel.c:1551
       mptcp_pm_set_flags net/mptcp/pm_netlink.c:277 [inline]
       mptcp_pm_nl_set_flags_doit+0x1b0/0x290 net/mptcp/pm_netlink.c:282
       genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
       genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
       genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
       netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
       netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
       netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
       netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
       sock_sendmsg_nosec net/socket.c:727 [inline]
       __sock_sendmsg net/socket.c:742 [inline]
       ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
       ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
       __sys_sendmsg+0x170/0x220 net/socket.c:2678
       do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
       do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  fs_reclaim --> &nsock->tx_lock --> sk_lock-AF_INET

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_INET);
                               lock(&nsock->tx_lock);
                               lock(sk_lock-AF_INET);
  lock(fs_reclaim);

 *** DEADLOCK ***
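The circular dependency above can be modelled with a toy lock-order graph. This is a userspace sketch, not lockdep's implementation, and the enum names are hypothetical. In lockdep's notation A --> B means B was acquired while A was held. The existing chain therefore gives fs_reclaim --> &nsock->tx_lock --> sk_lock-AF_INET. The allocation in sock_alloc_inode(), made while lock_sock() is held, tries to add sk_lock-AF_INET --> fs_reclaim, and that edge closes a cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock-order graph (hypothetical; not lockdep's code).
 * edge[a][b] means "b was acquired while a was held". */
enum lock_id { FS_RECLAIM, NSOCK_TX_LOCK, SK_LOCK_AF_INET, NR_LOCKS };

static bool edge[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is 'to' reachable from 'from'? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (edge[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Record "acquire 'next' while holding 'held'". Refuse (return false)
 * if 'held' is already reachable from 'next', because the new edge would
 * then close a cycle, i.e. a possible deadlock. */
static bool add_dependency(int held, int next)
{
	bool seen[NR_LOCKS] = { false };

	if (reachable(next, held, seen))
		return false;
	edge[held][next] = true;
	return true;
}
```

In this model the first two dependencies are accepted. The third, sk_lock-AF_INET --> fs_reclaim, is rejected, which matches the unsafe locking scenario printed above.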

3 locks held by syz.8.6915/29783:
 #0: ffffffff906c06f0 (cb_lock){++++}-{4:4}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1217
 #1: ffffffff906c07a8 (genl_mutex){+.+.}-{4:4}, at: genl_lock net/netlink/genetlink.c:35 [inline]
 #1: ffffffff906c07a8 (genl_mutex){+.+.}-{4:4}, at: genl_op_lock net/netlink/genetlink.c:60 [inline]
 #1: ffffffff906c07a8 (genl_mutex){+.+.}-{4:4}, at: genl_op_lock net/netlink/genetlink.c:57 [inline]
 #1: ffffffff906c07a8 (genl_mutex){+.+.}-{4:4}, at: genl_rcv_msg+0x57b/0x800 net/netlink/genetlink.c:1208
 #2: ffff888034761ae0 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1709 [inline]
 #2: ffff888034761ae0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1482 [inline]
 #2: ffff888034761ae0 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_pm_nl_set_flags+0x605/0xd30 net/mptcp/pm_kernel.c:1551

stack backtrace:
CPU: 3 UID: 0 PID: 29783 Comm: syz.8.6915 Tainted: G             L      syzkaller #0 PREEMPT(full) 
Tainted: [L]=SOFTLOCKUP
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
 print_circular_bug.cold+0x178/0x1c7 kernel/locking/lockdep.c:2043
 check_noncircular+0x146/0x160 kernel/locking/lockdep.c:2175
 check_prev_add kernel/locking/lockdep.c:3165 [inline]
 check_prevs_add kernel/locking/lockdep.c:3284 [inline]
 validate_chain kernel/locking/lockdep.c:3908 [inline]
 __lock_acquire+0x14b8/0x2630 kernel/locking/lockdep.c:5237
 lock_acquire kernel/locking/lockdep.c:5868 [inline]
 lock_acquire+0x1cf/0x380 kernel/locking/lockdep.c:5825
 __fs_reclaim_acquire mm/page_alloc.c:4348 [inline]
 fs_reclaim_acquire+0xc4/0x100 mm/page_alloc.c:4362
 might_alloc include/linux/sched/mm.h:317 [inline]
 slab_pre_alloc_hook mm/slub.c:4489 [inline]
 slab_alloc_node mm/slub.c:4843 [inline]
 kmem_cache_alloc_lru_noprof+0x51/0x6e0 mm/slub.c:4885
 sock_alloc_inode+0x25/0x1c0 net/socket.c:322
 alloc_inode+0x68/0x250 fs/inode.c:347
 new_inode_pseudo include/linux/fs.h:3003 [inline]
 sock_alloc+0x44/0x280 net/socket.c:637
 __sock_create+0xc2/0x860 net/socket.c:1569
 mptcp_subflow_create_socket+0xec/0xa30 net/mptcp/subflow.c:1790
 __mptcp_subflow_connect+0x3c6/0x1480 net/mptcp/subflow.c:1631
 mptcp_pm_create_subflow_or_signal_addr+0xc3e/0x18d0 net/mptcp/pm_kernel.c:416
 mptcp_pm_nl_fullmesh net/mptcp/pm_kernel.c:1460 [inline]
 mptcp_pm_nl_set_flags_all net/mptcp/pm_kernel.c:1487 [inline]
 mptcp_pm_nl_set_flags+0x814/0xd30 net/mptcp/pm_kernel.c:1551
 mptcp_pm_set_flags net/mptcp/pm_netlink.c:277 [inline]
 mptcp_pm_nl_set_flags_doit+0x1b0/0x290 net/mptcp/pm_netlink.c:282
 genl_family_rcv_msg_doit+0x214/0x300 net/netlink/genetlink.c:1114
 genl_family_rcv_msg net/netlink/genetlink.c:1194 [inline]
 genl_rcv_msg+0x560/0x800 net/netlink/genetlink.c:1209
 netlink_rcv_skb+0x159/0x420 net/netlink/af_netlink.c:2550
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1218
 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
 netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
 netlink_sendmsg+0x8b0/0xda0 net/netlink/af_netlink.c:1894
 sock_sendmsg_nosec net/socket.c:727 [inline]
 __sock_sendmsg net/socket.c:742 [inline]
 ____sys_sendmsg+0x9e1/0xb70 net/socket.c:2592
 ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646
 __sys_sendmsg+0x170/0x220 net/socket.c:2678
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f989e39c819
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f989f2e9028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f989e615fa0 RCX: 00007f989e39c819
RDX: 0000000000000000 RSI: 0000200000000400 RDI: 000000000000000a
RBP: 00007f989e432c91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f989e616038 R14: 00007f989e615fa0 R15: 00007ffe13f4f538
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply

* [PATCH] net: Optimize flush calculation in inet_gro_receive()
From: Helge Deller @ 2026-04-10 14:43 UTC (permalink / raw)
  To: netdev, linux-kernel, David S. Miller, David Ahern; +Cc: linux-parisc

For the calculation of the flush variable, use the get_unaligned_xxx() helpers
to access only relevant bits of the IP header.

Note: Since I don't know the network details, I'm not sure if "& ~IP_DF"
(& ~0x4000) is correct, or if "& IP_OFFSET" (& 0x1FFF) should be used instead
(which I believe would be more correct). Instead of possibly breaking things I
left it as is, but maybe some expert can check?
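The userspace check below (not kernel code) confirms that the old and new expressions produce the same 16-bit flush value. Only the low 16 bits of each ntohl() word survive the (u16) cast, and those bits are exactly tot_len and frag_off. load_be16() is a local stand-in for get_unaligned_be16(), and the IP_* masks are redefined locally with their usual values. The check also bears on the masking question in the note. A first fragment (MF set, offset 0) contributes IP_MF to flush under & ~IP_DF, but would contribute nothing under & IP_OFFSET.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IP_DF     0x4000 /* don't fragment */
#define IP_MF     0x2000 /* more fragments */
#define IP_OFFSET 0x1FFF /* fragment offset mask */

/* Local stand-in for get_unaligned_be16(). */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* 32-bit big-endian load via memcpy, avoiding an unaligned access. */
static uint32_t load_be32_ntohl(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return ntohl(v);
}

/* Old form: 32-bit loads of bytes 0-3 (ver/ihl, tos, tot_len) and
 * bytes 4-7 (id, frag_off); the cast keeps only tot_len and frag_off. */
static uint16_t flush_old(const uint8_t *iph, unsigned int gro_len)
{
	return (uint16_t)((load_be32_ntohl(iph) ^ gro_len) |
			  (load_be32_ntohl(iph + 4) & ~IP_DF));
}

/* New form: load only tot_len (offset 2) and frag_off (offset 6). */
static uint16_t flush_new(const uint8_t *iph, unsigned int gro_len)
{
	return (uint16_t)((load_be16(iph + 2) ^ gro_len) |
			  (load_be16(iph + 6) & ~IP_DF));
}
```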

Signed-off-by: Helge Deller <deller@gmx.de>

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index c7731e300a44..58cad2687c2c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1479,7 +1479,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 	struct sk_buff *p;
 	unsigned int hlen;
 	unsigned int off;
-	int flush = 1;
+	u16 flush = 1;
 	int proto;
 
 	off = skb_gro_offset(skb);
@@ -1504,7 +1504,8 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 		goto out;
 
 	NAPI_GRO_CB(skb)->proto = proto;
-	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (ntohl(*(__be32 *)&iph->id) & ~IP_DF));
+	flush = (get_unaligned_be16(&iph->tot_len) ^ skb_gro_len(skb)) |
+	        (get_unaligned_be16(&iph->frag_off) & ~IP_DF);
 
 	list_for_each_entry(p, head, list) {
 		struct iphdr *iph2;

^ permalink raw reply related

* Re: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
From: Russell King (Oracle) @ 2026-04-10 14:51 UTC (permalink / raw)
  To: Biju
  Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Ovidiu Panait, netdev, linux-kernel,
	Geert Uytterhoeven, Prabhakar Mahadev Lad, linux-renesas-soc,
	Biju Das
In-Reply-To: <20260410142904.439666-1-biju.das.jz@bp.renesas.com>

On Fri, Apr 10, 2026 at 03:29:01PM +0100, Biju wrote:
> From: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> 
> When mac_managed_pm flag is set, mdio_bus_phy_resume() is skipped, so
> phy_init_hw(), which performs soft_reset and config_init, is not called
> during resume.
> 
> This is inconsistent with the non-mac_managed_pm path, where
> mdio_bus_phy_resume() calls phy_init_hw() before phy_resume() on every
> resume.
> 
> To align both paths, add a phy_init_hw() call at the top of
> __phy_resume(), before invoking the driver's resume callback. This
> guarantees the PHY undergoes soft reset and re-initialization regardless
> of whether PM is managed by the MAC or the MDIO bus.
> 
> Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
> ---
>  drivers/net/phy/phy_device.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 0edff47478c2..8255f4208d66 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2008,6 +2008,10 @@ int __phy_resume(struct phy_device *phydev)
>  	if (!phydrv || !phydrv->resume)
>  		return 0;
>  
> +	ret = phy_init_hw(phydev);
> +	if (ret)
> +		return ret;

Do we want to do this even when phydrv->resume is NULL?

Apart from that, looks fine to me - it seems phy_init_hw() can be
called with or without phydev->lock held, and this one will call it
with the lock held, which seems to be okay.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH net-next] net: phy: call phy_init_hw() in phy resume path
From: Russell King (Oracle) @ 2026-04-10 14:55 UTC (permalink / raw)
  To: Biju
  Cc: Andrew Lunn, Heiner Kallweit, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Ovidiu Panait, netdev, linux-kernel,
	Geert Uytterhoeven, Prabhakar Mahadev Lad, linux-renesas-soc,
	Biju Das
In-Reply-To: <adkOZl4gt5UoGv-0@shell.armlinux.org.uk>

On Fri, Apr 10, 2026 at 03:51:18PM +0100, Russell King (Oracle) wrote:
> On Fri, Apr 10, 2026 at 03:29:01PM +0100, Biju wrote:
> > From: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> > 
> > When mac_managed_pm flag is set, mdio_bus_phy_resume() is skipped, so
> > phy_init_hw(), which performs soft_reset and config_init, is not called
> > during resume.
> > 
> > This is inconsistent with the non-mac_managed_pm path, where
> > mdio_bus_phy_resume() calls phy_init_hw() before phy_resume() on every
> > resume.
> > 
> > To align both paths, add a phy_init_hw() call at the top of
> > __phy_resume(), before invoking the driver's resume callback. This
> > guarantees the PHY undergoes soft reset and re-initialization regardless
> > of whether PM is managed by the MAC or the MDIO bus.
> > 
> > Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
> > Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
> > ---
> >  drivers/net/phy/phy_device.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index 0edff47478c2..8255f4208d66 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -2008,6 +2008,10 @@ int __phy_resume(struct phy_device *phydev)
> >  	if (!phydrv || !phydrv->resume)
> >  		return 0;
> >  
> > +	ret = phy_init_hw(phydev);
> > +	if (ret)
> > +		return ret;
> 
> Do we want to do this even when phydrv->resume is NULL?

I should've also added (sorry, busy packing) - with it always being
called even when phydrv->resume is NULL, it means that the call sites
to phy_resume() in phylib which are preceded by a call to
phy_init_hw() should have that call removed, otherwise we're going to
be calling phy_init_hw() twice.

As the patch currently stands, that's the case when phydrv->resume
is populated, and I think we should avoid that.
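The userspace model below shows the ordering under discussion. It uses hypothetical stand-ins (phy_device_sketch, init_hw_sketch() and so on) rather than phylib's real types, and it does not model locking. Here the re-initialisation runs before the ->resume NULL check. A caller that still performs the init step itself before resuming ends up doing it twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for phylib types. */
struct phy_device_sketch;

struct phy_driver_sketch {
	int (*resume)(struct phy_device_sketch *phydev); /* may be NULL */
};

struct phy_device_sketch {
	const struct phy_driver_sketch *drv;
	int init_calls;   /* times the "phy_init_hw" step ran */
	int resume_calls; /* times the driver's resume hook ran */
};

/* Stands in for phy_init_hw(): soft reset plus config_init. */
static int init_hw_sketch(struct phy_device_sketch *phydev)
{
	phydev->init_calls++;
	return 0;
}

static int count_resume(struct phy_device_sketch *phydev)
{
	phydev->resume_calls++;
	return 0;
}

static const struct phy_driver_sketch drv_with_resume = { .resume = count_resume };
static const struct phy_driver_sketch drv_without_resume = { .resume = NULL };

/* One possible ordering: re-initialise before checking ->resume, so PHYs
 * whose driver has no resume hook are re-initialised as well. */
static int phy_resume_sketch(struct phy_device_sketch *phydev)
{
	const struct phy_driver_sketch *drv = phydev->drv;
	int ret;

	if (!drv)
		return 0;

	ret = init_hw_sketch(phydev);
	if (ret)
		return ret;

	return drv->resume ? drv->resume(phydev) : 0;
}

/* A caller that still runs the init step itself before resuming: once
 * the init is folded into phy_resume_sketch(), it runs twice. */
static int caller_with_explicit_init(struct phy_device_sketch *phydev)
{
	int ret = init_hw_sketch(phydev);

	return ret ? ret : phy_resume_sketch(phydev);
}
```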

> Apart from that, looks fine to me - it seems phy_init_hw() can be
> called with or without phydev->lock held, and this one will call it
> with the lock held, which seems to be okay.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH net-next v6 00/14] net: sleepable ndo_set_rx_mode
From: Stanislav Fomichev @ 2026-04-10 15:09 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Stanislav Fomichev, netdev, davem, edumazet, pabeni
In-Reply-To: <20260409204443.37c57907@kernel.org>

On 04/09, Jakub Kicinski wrote:
> On Tue,  7 Apr 2026 08:30:47 -0700 Stanislav Fomichev wrote:
> > This series adds a new ndo_set_rx_mode_async callback that enables
> > drivers to handle address list updates in a sleepable context. The
> > current ndo_set_rx_mode is called under the netif_addr_lock spinlock
> > with BHs disabled, which prevents drivers from sleeping. This is
> > problematic for ops-locked drivers that need to sleep.
> 
> I merged the queue leasing, now netkit warns :(

No worries, will rebase/test/fix and repost!
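The quoted cover letter contrasts a callback that runs under a BH-disabled spinlock with one that may sleep. One common way to bridge the two is to snapshot the address list while the lock is held and pass the copy to a sleepable callback only after the lock is dropped. The userspace sketch below models that general pattern and nothing more. It is not necessarily how the series implements ndo_set_rx_mode_async, and addr_list_sketch, in_atomic_ctx and the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ADDRS 8
#define ADDR_LEN  6

struct addr_list_sketch {
	unsigned char addr[MAX_ADDRS][ADDR_LEN];
	int count;
};

/* Stand-in for "netif_addr_lock_bh() is held": sleeping is forbidden. */
static bool in_atomic_ctx;

static void lock_sketch(void)   { in_atomic_ctx = true; }
static void unlock_sketch(void) { in_atomic_ctx = false; }

/* A driver callback that may sleep (e.g. waits for a firmware reply). */
static int synced_count;
static void sleepable_sync(const struct addr_list_sketch *snap)
{
	assert(!in_atomic_ctx); /* would be "sleeping in atomic context" */
	synced_count = snap->count;
}

/* Copy the list under the lock, then call the sleepable callback. */
static void set_rx_mode_deferred(const struct addr_list_sketch *live,
				 void (*sync)(const struct addr_list_sketch *))
{
	struct addr_list_sketch snap;

	lock_sketch();
	memcpy(&snap, live, sizeof(snap));
	unlock_sketch();

	sync(&snap);
}
```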

^ permalink raw reply

* [PATCH net-next v3 00/12] BIG TCP for UDP tunnels
From: Alice Mikityanska @ 2026-04-10 15:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xin Long, Willem de Bruijn, David Ahern,
	Nikolay Aleksandrov
  Cc: Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
	Florian Westphal, netdev, Alice Mikityanska

From: Alice Mikityanska <alice@isovalent.com>

This series is a follow-up to "BIG TCP without HBH in IPv6", and it adds
support for BIG TCP IPv4/IPv6 workloads in vxlan and geneve. Now that
IPv6 BIG TCP doesn't require stripping the HBH in all the various
combinations of tunneled traffic, adding BIG TCP becomes feasible.

Patches 01-03 are small fixups to some related code that I'm changing in
the series.

Patch 04 adds accessors for the length field in the UDP header, as
suggested by Paolo in review. The usage of udp_set_len is then added in
the following patches that start using length=0 in BIG TCP UDP packets.

Patches 05-07 close the gaps that prevent BIG TCP packets from going
through UDP tunnel code.

Patch 08 re-adds proper validation of malformed packets that arrive with
length=0 from the wire.

Patch 09 is for proper formatting in tcpdump (set UDP len to 0 rather
than a trimmed value on overflow).

Patches 10-11 bump up tso_max_size for VXLAN and GENEVE.

Patch 12 adds selftests.

Thanks all!

v3 changes: Fixed the redirect in the selftest, rebased over my L2TP fix
[1] for the syzbot report [2].

[1]: https://lore.kernel.org/netdev/20260403174949.843941-1-alice.kernel@fastmail.im/
[2]: https://lore.kernel.org/netdev/69a1dfba.050a0220.3a55be.0026.GAE@google.com/

v2: https://lore.kernel.org/all/20260226201600.222044-1-alice.kernel@fastmail.im/

v2 changes: Addressed the review comments: added UDP len helpers,
consolidated UDP len sanity checks in patch 08 into one, added
selftests. Added fixups to related code (patches 01-03).

v1: https://lore.kernel.org/netdev/20250923134742.1399800-1-maxtram95@gmail.com/

Alice Mikityanska (11):
  net/sched: act_csum: don't mangle UDP tunnel GSO packets
  udp: gso: Simplify handling length in GSO_PARTIAL
  geneve: Fix off-by-one comparing with GRO_LEGACY_MAX_SIZE
  net: Use helpers to get/set UDP len tree-wide
  net: Enable BIG TCP with partial GSO
  udp: Support gro_ipv4_max_size > 65536
  udp: Support BIG TCP GSO packets where they can occur
  udp: Validate UDP length in udp_gro_receive
  udp: Set length in UDP header to 0 for big GSO packets
  vxlan: Enable BIG TCP packets
  selftests: net: Add a test for BIG TCP in UDP tunnels

Daniel Borkmann (1):
  geneve: Enable BIG TCP packets

 drivers/infiniband/core/lag.c                 |   2 +-
 drivers/infiniband/sw/rxe/rxe_net.c           |   4 +-
 drivers/net/amt.c                             |   6 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   2 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   |   2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     |   2 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.c   |   2 +-
 .../marvell/octeontx2/nic/otx2_txrx.c         |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |   4 +-
 .../ethernet/mellanox/mlx5/core/en_selftest.c |   2 +-
 drivers/net/ethernet/sfc/falcon/selftest.c    |   4 +-
 drivers/net/ethernet/sfc/selftest.c           |   4 +-
 drivers/net/ethernet/sfc/siena/selftest.c     |   4 +-
 drivers/net/ethernet/sfc/tc_encap_actions.c   |   2 +-
 .../stmicro/stmmac/stmmac_selftests.c         |   4 +-
 drivers/net/geneve.c                          |   6 +-
 drivers/net/netdevsim/dev.c                   |   2 +-
 drivers/net/netdevsim/psample.c               |   2 +-
 drivers/net/netdevsim/psp.c                   |   8 +-
 drivers/net/vxlan/vxlan_core.c                |   2 +
 drivers/net/wireguard/receive.c               |   2 +-
 include/linux/udp.h                           |  16 ++
 include/trace/events/icmp.h                   |   2 +-
 lib/tests/blackhole_dev_kunit.c               |   2 +-
 net/6lowpan/nhc_udp.c                         |  10 +-
 net/core/netpoll.c                            |   2 +-
 net/core/pktgen.c                             |   4 +-
 net/core/selftests.c                          |   4 +-
 net/core/skbuff.c                             |  10 +-
 net/core/tso.c                                |   3 +-
 net/ipv4/esp4.c                               |   2 +-
 net/ipv4/fou_core.c                           |   2 +-
 net/ipv4/ipconfig.c                           |   6 +-
 net/ipv4/netfilter/nf_nat_snmp_basic_main.c   |   4 +-
 net/ipv4/route.c                              |   2 +-
 net/ipv4/udp.c                                |   8 +-
 net/ipv4/udp_offload.c                        |  58 +++----
 net/ipv4/udp_tunnel_core.c                    |   2 +-
 net/ipv6/esp6.c                               |   5 +-
 net/ipv6/fou6.c                               |   2 +-
 net/ipv6/ip6_udp_tunnel.c                     |   2 +-
 net/ipv6/udp.c                                |   3 +-
 net/ipv6/udp_offload.c                        |   2 +-
 net/l2tp/l2tp_core.c                          |   2 +-
 net/netfilter/ipvs/ip_vs_xmit.c               |   2 +-
 net/netfilter/nf_conntrack_proto_udp.c        |  19 ++-
 net/netfilter/nf_log_syslog.c                 |   2 +-
 net/netfilter/nf_nat_helper.c                 |   2 +-
 net/psp/psp_main.c                            |   2 +-
 net/sched/act_csum.c                          |  12 +-
 net/xfrm/xfrm_nat_keepalive.c                 |   2 +-
 tools/testing/selftests/net/Makefile          |   1 +
 .../testing/selftests/net/big_tcp_tunnels.sh  | 145 ++++++++++++++++++
 53 files changed, 296 insertions(+), 112 deletions(-)
 create mode 100755 tools/testing/selftests/net/big_tcp_tunnels.sh

-- 
2.53.0


^ permalink raw reply

* [PATCH net-next v3 01/12] net/sched: act_csum: don't mangle UDP tunnel GSO packets
From: Alice Mikityanska @ 2026-04-10 15:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xin Long, Willem de Bruijn, David Ahern,
	Nikolay Aleksandrov
  Cc: Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
	Florian Westphal, netdev, Alice Mikityanska
In-Reply-To: <20260410150943.993350-1-alice.kernel@fastmail.im>

From: Alice Mikityanska <alice@isovalent.com>

Similar to commit add641e7dee3 ("sched: act_csum: don't mangle TCP and
UDP GSO packets"), UDP tunnel GSO packets going through act_csum
shouldn't have their checksum calculated at this point, because it will
be done after segmentation. Setting the checksum in act_csum modifies
skb->ip_summed and prevents inner IP csum offload from kicking in,
resulting in a packet with a bad checksum.

Add UDP tunnel GSO packets to the exceptions, and also add UDP GSO
(SKB_GSO_UDP_L4), as the same logic as in the commit mentioned above
applies to UDP GSO too.
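The patched predicate can be modelled in userspace as follows. The flag names and bit values are illustrative stand-ins, not the kernel's SKB_GSO_* constants, and gso_size != 0 takes the place of skb_is_gso().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bits only; these are not the kernel's SKB_GSO_* values. */
enum {
	GSO_UDP_SKETCH             = 1u << 0,
	GSO_UDP_L4_SKETCH          = 1u << 1,
	GSO_UDP_TUNNEL_SKETCH      = 1u << 2,
	GSO_UDP_TUNNEL_CSUM_SKETCH = 1u << 3,
	GSO_TCPV4_SKETCH           = 1u << 4,
};

struct skb_sketch {
	unsigned int gso_size; /* 0 means "not a GSO packet" */
	unsigned int gso_type;
};

/* Same shape as the check in tcf_csum_ipv{4,6}_udp() after the patch:
 * leave the checksum alone for any UDP-based GSO packet, because it will
 * be computed after segmentation. */
static bool csum_should_skip(const struct skb_sketch *skb)
{
	return skb->gso_size &&
	       (skb->gso_type & (GSO_UDP_SKETCH | GSO_UDP_L4_SKETCH |
				 GSO_UDP_TUNNEL_SKETCH |
				 GSO_UDP_TUNNEL_CSUM_SKETCH));
}
```

A TCP stream in a VXLAN tunnel would carry both a TCP and a UDP tunnel GSO bit. Before the patch it was not matched by the SKB_GSO_UDP-only test; in this model it is.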

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
---
 net/sched/act_csum.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index a9e4635d899e..078d3a27130b 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -259,7 +259,9 @@ static int tcf_csum_ipv4_udp(struct sk_buff *skb, unsigned int ihl,
 	const struct iphdr *iph;
 	u16 ul;
 
-	if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
+	if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type &
+	    (SKB_GSO_UDP | SKB_GSO_UDP_L4 |
+	     SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))
 		return 1;
 
 	/*
@@ -315,7 +317,9 @@ static int tcf_csum_ipv6_udp(struct sk_buff *skb, unsigned int ihl,
 	const struct ipv6hdr *ip6h;
 	u16 ul;
 
-	if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
+	if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type &
+	    (SKB_GSO_UDP | SKB_GSO_UDP_L4 |
+	     SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))
 		return 1;
 
 	/*
-- 
2.53.0


^ permalink raw reply related

* [PATCH net-next v3 02/12] udp: gso: Simplify handling length in GSO_PARTIAL
From: Alice Mikityanska @ 2026-04-10 15:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xin Long, Willem de Bruijn, David Ahern,
	Nikolay Aleksandrov
  Cc: Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
	Florian Westphal, netdev, Alice Mikityanska, Gal Pressman
In-Reply-To: <20260410150943.993350-1-alice.kernel@fastmail.im>

From: Alice Mikityanska <alice@isovalent.com>

Taking further the idea of commit b10b446ce7ad ("udp: gso: Use single
MSS length in UDP header for GSO_PARTIAL"), simplify the implementation
and fix the checksum (apparently ignored by hardware anyway).

The mentioned commit started using msslen for uh->len, but still uses
newlen to adjust uh->check. Once the check formula is fixed, newlen is
assigned but never used before the loop, and it is overwritten after
the loop. This makes msslen unnecessary: we can reuse newlen, as long as
we don't adjust mss beforehand. The adjustment of mss can simply be
dropped, because mss is not used anywhere else below.

This brings us back to one variable, drops unneeded arithmetic on
mss, and fixes the UDP checksum.
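The checksum part of this fix follows the incremental update rule of RFC 1624, HC' = ~(~HC + ~m + m'). The value folded into uh->check must be the same value that is later written to uh->len. The sketch below is plain userspace C with hypothetical helper names, not the kernel's checksum helpers. It compares an incrementally updated checksum with a full recomputation over a few 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of 16-bit words, folded to 16 bits. */
static uint16_t ones_sum(const uint16_t *w, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += w[i];
	while (s >> 16)
		s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Full checksum: complement of the one's-complement sum. */
static uint16_t full_csum(const uint16_t *w, size_t n)
{
	return (uint16_t)~ones_sum(w, n);
}

/* RFC 1624 eqn. 3: update checksum hc when one field changes old -> new_. */
static uint16_t csum_update16_sketch(uint16_t hc, uint16_t old, uint16_t new_)
{
	uint32_t s = (uint16_t)~hc;

	s += (uint16_t)~old;
	s += new_;
	while (s >> 16)
		s = (s & 0xffff) + (s >> 16);
	return (uint16_t)~s;
}
```

Adjusting the checksum with one length and then writing a different length into the header leaves a checksum that no longer matches the header. That is the mismatch the patch removes by using newlen for both.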

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Cc: Gal Pressman <gal@nvidia.com>
---
 net/ipv4/udp_offload.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index a0813d425b71..2578aa7f9ff9 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -482,11 +482,11 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	struct sock *sk = gso_skb->sk;
 	unsigned int sum_truesize = 0;
 	struct sk_buff *segs, *seg;
-	__be16 newlen, msslen;
 	struct udphdr *uh;
 	unsigned int mss;
 	bool copy_dtor;
 	__sum16 check;
+	__be16 newlen;
 	int ret = 0;
 
 	mss = skb_shinfo(gso_skb)->gso_size;
@@ -555,15 +555,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 		return segs;
 	}
 
-	msslen = htons(sizeof(*uh) + mss);
-
-	/* GSO partial and frag_list segmentation only requires splitting
-	 * the frame into an MSS multiple and possibly a remainder, both
-	 * cases return a GSO skb. So update the mss now.
-	 */
-	if (skb_is_gso(segs))
-		mss *= skb_shinfo(segs)->gso_segs;
-
 	seg = segs;
 	uh = udp_hdr(seg);
 
@@ -586,7 +577,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 		if (!seg->next)
 			break;
 
-		uh->len = msslen;
+		uh->len = newlen;
 		uh->check = check;
 
 		if (seg->ip_summed == CHECKSUM_PARTIAL)
-- 
2.53.0


^ permalink raw reply related

* [PATCH net-next v3 03/12] geneve: Fix off-by-one comparing with GRO_LEGACY_MAX_SIZE
From: Alice Mikityanska @ 2026-04-10 15:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xin Long, Willem de Bruijn, David Ahern,
	Nikolay Aleksandrov
  Cc: Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
	Florian Westphal, netdev, Alice Mikityanska
In-Reply-To: <20260410150943.993350-1-alice.kernel@fastmail.im>

From: Alice Mikityanska <alice@isovalent.com>

GRO_LEGACY_MAX_SIZE is 65536, and a total_len of 65536 is too big to fit
into a u16. As can be seen in skb_gro_receive, packets greater than or
equal to gro_max_size (or GRO_LEGACY_MAX_SIZE) are dropped with -E2BIG.
Apply the same boundary in geneve_post_decap_hint to avoid overflowing
the 16-bit iph->tot_len field with a value of 65536.
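The boundary can be illustrated in userspace. store_tot_len_sketch() below is a hypothetical helper, not the geneve code. It writes a length into a 16-bit field only if the length fits, and it uses the same >= comparison as the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* GRO_LEGACY_MAX_SIZE as stated in the patch description. */
#define LEGACY_MAX_SIZE_SKETCH 65536u

/* With '>' instead of '>=', 65536 would be accepted and then silently
 * truncated to 0 by the conversion to uint16_t. */
static int store_tot_len_sketch(unsigned int total_len, uint16_t *tot_len)
{
	if (total_len >= LEGACY_MAX_SIZE_SKETCH)
		return -E2BIG;
	*tot_len = (uint16_t)total_len;
	return 0;
}
```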

Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
---
 drivers/net/geneve.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index c6563367d382..84e8d6c69172 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -603,7 +603,7 @@ static int geneve_post_decap_hint(const struct sock *sk, struct sk_buff *skb,
 	ipv6h = (void *)skb->data + gro_hint->nested_nh_offset;
 	iph = (struct iphdr *)ipv6h;
 	total_len = skb->len - gro_hint->nested_nh_offset;
-	if (total_len > GRO_LEGACY_MAX_SIZE)
+	if (total_len >= GRO_LEGACY_MAX_SIZE)
 		return -E2BIG;
 
 	/*
-- 
2.53.0


^ permalink raw reply related

* [PATCH net-next v3 04/12] net: Use helpers to get/set UDP len tree-wide
From: Alice Mikityanska @ 2026-04-10 15:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Xin Long, Willem de Bruijn, David Ahern,
	Nikolay Aleksandrov
  Cc: Shuah Khan, Stanislav Fomichev, Andrew Lunn, Simon Horman,
	Florian Westphal, netdev, Alice Mikityanska
In-Reply-To: <20260410150943.993350-1-alice.kernel@fastmail.im>

From: Alice Mikityanska <alice@isovalent.com>

Since BIG TCP for UDP tunnels will start using len=0 in the UDP header
as an indicator of a GSO packet bigger than 65535 bytes, this commit
introduces the following getter and setters to use tree-wide, in order
to explicitly mark places where len=0 may be expected, and handle them
properly:

1. udp_get_len_short() returns len in host byte order: to be used on the
RX side to deal with non-aggregated packets, or to access the raw value
of the len field.

2. udp_set_len() sets uh->len to its real value if it's not bigger than
65535, and to 0 otherwise: to be used in GSO context with aggregated
packets.

3. udp_set_len_short() is to be used when the length is known to fit in
16 bits. It WARNs when the caller tries to assign a bigger value if
CONFIG_DEBUG_NET=y.

At the moment udp_set_len() is not used; a following commit will start
using it after enabling len>65535 for GSO.

Raw uh->len (in network byte order) is still accessed in a few places
for checksum calculation purposes, and in __udp6_lib_rcv and nsim_do_psp
to decode len=0 (__udp4_lib_rcv will be modified to parse len=0 in the
corresponding commit).
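The userspace sketch below implements the three helpers as described above, together with len=0 decoding on receive. Everything in it is hypothetical: struct udphdr_sketch, the _sketch suffixes, and the fprintf() used in place of WARN. It is based only on this description, not on the actual include/linux/udp.h changes, and it assumes that len=0 on receive means the length should be taken from the packet itself.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal stand-in for struct udphdr (fields in network byte order). */
struct udphdr_sketch {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* Raw len field in host byte order; may be 0 for a BIG TCP packet. */
static unsigned int udp_get_len_short_sketch(const struct udphdr_sketch *uh)
{
	return ntohs(uh->len);
}

/* GSO context: lengths above 65535 are encoded as 0. */
static void udp_set_len_sketch(struct udphdr_sketch *uh, unsigned int len)
{
	uh->len = len <= 0xffff ? htons((uint16_t)len) : 0;
}

/* Caller promises len fits in 16 bits; complain (instead of WARN) if not. */
static void udp_set_len_short_sketch(struct udphdr_sketch *uh, unsigned int len)
{
	if (len > 0xffff)
		fprintf(stderr, "udp_set_len_short: %u does not fit\n", len);
	uh->len = htons((uint16_t)len);
}

/* Receive side: len == 0 means "use the bytes actually present". */
static unsigned int udp_decode_len_sketch(const struct udphdr_sketch *uh,
					  unsigned int bytes_from_udp_hdr)
{
	unsigned int len = udp_get_len_short_sketch(uh);

	return len ? len : bytes_from_udp_hdr;
}
```

Without the 0 encoding, storing 70000 directly in the 16-bit field would leave 4464 (70000 - 65536). The cover letter's patch 09 avoids exactly this kind of trimmed value.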

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
---
 drivers/infiniband/core/lag.c                 |  2 +-
 drivers/infiniband/sw/rxe/rxe_net.c           |  4 +-
 drivers/net/amt.c                             |  6 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  2 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.c   |  2 +-
 .../marvell/octeontx2/nic/otx2_txrx.c         |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  4 +-
 .../ethernet/mellanox/mlx5/core/en_selftest.c |  2 +-
 drivers/net/ethernet/sfc/falcon/selftest.c    |  4 +-
 drivers/net/ethernet/sfc/selftest.c           |  4 +-
 drivers/net/ethernet/sfc/siena/selftest.c     |  4 +-
 drivers/net/ethernet/sfc/tc_encap_actions.c   |  2 +-
 .../stmicro/stmmac/stmmac_selftests.c         |  4 +-
 drivers/net/geneve.c                          |  2 +-
 drivers/net/netdevsim/dev.c                   |  2 +-
 drivers/net/netdevsim/psample.c               |  2 +-
 drivers/net/netdevsim/psp.c                   |  8 ++--
 drivers/net/wireguard/receive.c               |  2 +-
 include/linux/udp.h                           | 16 ++++++++
 include/trace/events/icmp.h                   |  2 +-
 lib/tests/blackhole_dev_kunit.c               |  2 +-
 net/6lowpan/nhc_udp.c                         | 10 ++---
 net/core/netpoll.c                            |  2 +-
 net/core/pktgen.c                             |  4 +-
 net/core/selftests.c                          |  4 +-
 net/core/tso.c                                |  3 +-
 net/ipv4/esp4.c                               |  2 +-
 net/ipv4/fou_core.c                           |  2 +-
 net/ipv4/ipconfig.c                           |  6 +--
 net/ipv4/netfilter/nf_nat_snmp_basic_main.c   |  4 +-
 net/ipv4/route.c                              |  2 +-
 net/ipv4/udp.c                                |  3 +-
 net/ipv4/udp_offload.c                        | 37 +++++++++----------
 net/ipv4/udp_tunnel_core.c                    |  2 +-
 net/ipv6/esp6.c                               |  5 ++-
 net/ipv6/fou6.c                               |  2 +-
 net/ipv6/ip6_udp_tunnel.c                     |  2 +-
 net/ipv6/udp.c                                |  3 +-
 net/ipv6/udp_offload.c                        |  2 +-
 net/l2tp/l2tp_core.c                          |  2 +-
 net/netfilter/ipvs/ip_vs_xmit.c               |  2 +-
 net/netfilter/nf_conntrack_proto_udp.c        | 17 +++++++--
 net/netfilter/nf_log_syslog.c                 |  2 +-
 net/netfilter/nf_nat_helper.c                 |  2 +-
 net/psp/psp_main.c                            |  2 +-
 net/sched/act_csum.c                          |  4 +-
 net/xfrm/xfrm_nat_keepalive.c                 |  2 +-
 49 files changed, 121 insertions(+), 89 deletions(-)

diff --git a/drivers/infiniband/core/lag.c b/drivers/infiniband/core/lag.c
index 8fd80adfe833..00fe241737ff 100644
--- a/drivers/infiniband/core/lag.c
+++ b/drivers/infiniband/core/lag.c
@@ -36,7 +36,7 @@ static struct sk_buff *rdma_build_skb(struct net_device *netdev,
 	uh->source =
 		htons(rdma_flow_label_to_udp_sport(ah_attr->grh.flow_label));
 	uh->dest = htons(ROCE_V2_UDP_DPORT);
-	uh->len = htons(sizeof(struct udphdr));
+	udp_set_len_short(uh, sizeof(struct udphdr));
 
 	if (is_ipv4) {
 		skb_push(skb, sizeof(struct iphdr));
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index cbc646a30003..e9fbfa6af3ba 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -237,7 +237,7 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	pkt->port_num = 1;
 	pkt->hdr = (u8 *)(udph + 1);
 	pkt->mask = RXE_GRH_MASK;
-	pkt->paylen = be16_to_cpu(udph->len) - sizeof(*udph);
+	pkt->paylen = udp_get_len_short(udph) - sizeof(*udph);
 
 	/* remove udp header */
 	skb_pull(skb, sizeof(struct udphdr));
@@ -300,7 +300,7 @@ static void prepare_udp_hdr(struct sk_buff *skb, __be16 src_port,
 
 	udph->dest = dst_port;
 	udph->source = src_port;
-	udph->len = htons(skb->len);
+	udp_set_len_short(udph, skb->len);
 	udph->check = 0;
 }
 
diff --git a/drivers/net/amt.c b/drivers/net/amt.c
index f2f3139e38a5..01511eca7d84 100644
--- a/drivers/net/amt.c
+++ b/drivers/net/amt.c
@@ -667,7 +667,7 @@ static void amt_send_discovery(struct amt_dev *amt)
 	udph		= udp_hdr(skb);
 	udph->source	= amt->gw_port;
 	udph->dest	= amt->relay_port;
-	udph->len	= htons(sizeof(*udph) + sizeof(*amtd));
+	udp_set_len_short(udph, sizeof(*udph) + sizeof(*amtd));
 	udph->check	= 0;
 	offset = skb_transport_offset(skb);
 	skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
@@ -758,7 +758,7 @@ static void amt_send_request(struct amt_dev *amt, bool v6)
 	udph		= udp_hdr(skb);
 	udph->source	= amt->gw_port;
 	udph->dest	= amt->relay_port;
-	udph->len	= htons(sizeof(*amtrh) + sizeof(*udph));
+	udp_set_len_short(udph, sizeof(*amtrh) + sizeof(*udph));
 	udph->check	= 0;
 	offset = skb_transport_offset(skb);
 	skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
@@ -2608,7 +2608,7 @@ static void amt_send_advertisement(struct amt_dev *amt, __be32 nonce,
 	udph		= udp_hdr(skb);
 	udph->source	= amt->relay_port;
 	udph->dest	= dport;
-	udph->len	= htons(sizeof(*amta) + sizeof(*udph));
+	udp_set_len_short(udph, sizeof(*amta) + sizeof(*udph));
 	udph->check	= 0;
 	offset = skb_transport_offset(skb);
 	skb->csum = skb_checksum(skb, offset, skb->len - offset, 0);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 894f2d06d39d..ef5e657816f0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -3129,7 +3129,7 @@ static int i40e_tso(struct i40e_tx_buffer *first, u8 *hdr_len,
 					 SKB_GSO_UDP_TUNNEL_CSUM)) {
 		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL) &&
 		    (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)) {
-			l4.udp->len = 0;
+			udp_set_len_short(l4.udp, 0);
 
 			/* determine offset of outer transport header */
 			l4_offset = l4.hdr - skb->data;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 363c42bf3dcf..c30abf17cf5d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1774,7 +1774,7 @@ static int iavf_tso(struct iavf_tx_buffer *first, u8 *hdr_len,
 					 SKB_GSO_UDP_TUNNEL_CSUM)) {
 		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL) &&
 		    (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)) {
-			l4.udp->len = 0;
+			udp_set_len_short(l4.udp, 0);
 
 			/* determine offset of outer transport header */
 			l4_offset = l4.hdr - skb->data;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index a2cd4cf37734..bb74e9f567ec 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1884,7 +1884,7 @@ int ice_tso(struct ice_tx_buf *first, struct ice_tx_offload_params *off)
 					 SKB_GSO_UDP_TUNNEL_CSUM)) {
 		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL) &&
 		    (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)) {
-			l4.udp->len = 0;
+			udp_set_len_short(l4.udp, 0);
 
 			/* determine offset of outer transport header */
 			l4_start = (u8)(l4.hdr - skb->data);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index f6b3b15364ff..276296e321ed 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2871,7 +2871,7 @@ int idpf_tso(struct sk_buff *skb, struct idpf_tx_offload_params *off)
 				     (__force __wsum)htonl(paylen));
 		/* compute length of segmentation header */
 		off->tso_hdr_len = sizeof(struct udphdr) + l4_start;
-		l4.udp->len = htons(shinfo->gso_size + sizeof(struct udphdr));
+		udp_set_len_short(l4.udp, shinfo->gso_size + sizeof(struct udphdr));
 		break;
 	default:
 		return -EINVAL;
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
index 625bb5a05344..8d2d607bc92f 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
@@ -750,7 +750,7 @@ static void otx2_sqe_add_ext(struct otx2_nic *pfvf, struct otx2_snd_queue *sq,
 				ext->lso_format = pfvf->hw.lso_udpv6_idx;
 			}
 
-			udph->len = htons(sizeof(struct udphdr));
+			udp_set_len_short(udph, sizeof(struct udphdr));
 		}
 	} else if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) {
 		ext->tstmp = 1;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5b60aa47c75b..fdd5f35bac73 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1081,7 +1081,7 @@ static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *
 	struct udphdr *uh;
 
 	uh = (struct udphdr *)(skb->data + udp_off);
-	uh->len = htons(skb->len - udp_off);
+	udp_set_len_short(uh, skb->len - udp_off);
 
 	if (uh->check)
 		uh->check = ~udp_v4_check(skb->len - udp_off, ipv4->saddr,
@@ -1100,7 +1100,7 @@ static void mlx5e_shampo_update_ipv6_udp_hdr(struct mlx5e_rq *rq, struct ipv6hdr
 	struct udphdr *uh;
 
 	uh = (struct udphdr *)(skb->data + udp_off);
-	uh->len = htons(skb->len - udp_off);
+	udp_set_len_short(uh, skb->len - udp_off);
 
 	if (uh->check)
 		uh->check = ~udp_v6_check(skb->len - udp_off, &ipv6->saddr,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index accc26d1a872..1dcdb86690bb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -113,7 +113,7 @@ static struct sk_buff *mlx5e_test_get_udp_skb(struct mlx5e_priv *priv)
 	/* Fill UDP header */
 	udph->source = htons(9);
 	udph->dest = htons(9); /* Discard Protocol */
-	udph->len = htons(sizeof(struct mlx5ehdr) + sizeof(struct udphdr));
+	udp_set_len_short(udph, sizeof(struct mlx5ehdr) + sizeof(struct udphdr));
 	udph->check = 0;
 
 	/* Fill IP header */
diff --git a/drivers/net/ethernet/sfc/falcon/selftest.c b/drivers/net/ethernet/sfc/falcon/selftest.c
index db4dd7fb77f5..4d29e0baf2eb 100644
--- a/drivers/net/ethernet/sfc/falcon/selftest.c
+++ b/drivers/net/ethernet/sfc/falcon/selftest.c
@@ -401,8 +401,8 @@ static void ef4_iterate_state(struct ef4_nic *efx)
 
 	/* Initialise udp header */
 	payload->udp.source = 0;
-	payload->udp.len = htons(sizeof(*payload) -
-				 offsetof(struct ef4_loopback_payload, udp));
+	udp_set_len_short(&payload->udp, sizeof(*payload) -
+			  offsetof(struct ef4_loopback_payload, udp));
 	payload->udp.check = 0;	/* checksum ignored */
 
 	/* Fill out payload */
diff --git a/drivers/net/ethernet/sfc/selftest.c b/drivers/net/ethernet/sfc/selftest.c
index 8ec76329237a..dc716feb79cb 100644
--- a/drivers/net/ethernet/sfc/selftest.c
+++ b/drivers/net/ethernet/sfc/selftest.c
@@ -398,8 +398,8 @@ static void efx_iterate_state(struct efx_nic *efx)
 
 	/* Initialise udp header */
 	payload->udp.source = 0;
-	payload->udp.len = htons(sizeof(*payload) -
-				 offsetof(struct efx_loopback_payload, udp));
+	udp_set_len_short(&payload->udp, sizeof(*payload) -
+			  offsetof(struct efx_loopback_payload, udp));
 	payload->udp.check = 0;	/* checksum ignored */
 
 	/* Fill out payload */
diff --git a/drivers/net/ethernet/sfc/siena/selftest.c b/drivers/net/ethernet/sfc/siena/selftest.c
index 930643612df5..c74cf5131364 100644
--- a/drivers/net/ethernet/sfc/siena/selftest.c
+++ b/drivers/net/ethernet/sfc/siena/selftest.c
@@ -399,8 +399,8 @@ static void efx_iterate_state(struct efx_nic *efx)
 
 	/* Initialise udp header */
 	payload->udp.source = 0;
-	payload->udp.len = htons(sizeof(*payload) -
-				 offsetof(struct efx_loopback_payload, udp));
+	udp_set_len_short(&payload->udp, sizeof(*payload) -
+			  offsetof(struct efx_loopback_payload, udp));
 	payload->udp.check = 0;	/* checksum ignored */
 
 	/* Fill out payload */
diff --git a/drivers/net/ethernet/sfc/tc_encap_actions.c b/drivers/net/ethernet/sfc/tc_encap_actions.c
index db222abef53b..c2ad3a358d20 100644
--- a/drivers/net/ethernet/sfc/tc_encap_actions.c
+++ b/drivers/net/ethernet/sfc/tc_encap_actions.c
@@ -311,7 +311,7 @@ static void efx_gen_tun_header_udp(struct efx_tc_encap_action *encap, u8 len)
 	encap->encap_hdr_len += sizeof(*udp);
 
 	udp->dest = key->tp_dst;
-	udp->len = cpu_to_be16(sizeof(*udp) + len);
+	udp_set_len_short(udp, sizeof(*udp) + len);
 }
 
 static void efx_gen_tun_header_vxlan(struct efx_tc_encap_action *encap)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
index a0c75886587c..29e824bd90ca 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_selftests.c
@@ -154,9 +154,9 @@ static struct sk_buff *stmmac_test_get_udp_skb(struct stmmac_priv *priv,
 	} else {
 		uhdr->source = htons(attr->sport);
 		uhdr->dest = htons(attr->dport);
-		uhdr->len = htons(sizeof(*shdr) + sizeof(*uhdr) + attr->size);
+		udp_set_len_short(uhdr, sizeof(*shdr) + sizeof(*uhdr) + attr->size);
 		if (attr->max_size)
-			uhdr->len = htons(attr->max_size -
+			udp_set_len_short(uhdr, attr->max_size -
 					  (sizeof(*ihdr) + sizeof(*ehdr)));
 		uhdr->check = 0;
 	}
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 84e8d6c69172..dc3a405e0e0c 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -630,7 +630,7 @@ static int geneve_post_decap_hint(const struct sock *sk, struct sk_buff *skb,
 
 	/* Adjust the nested UDP header len and checksum. */
 	uh = udp_hdr(skb);
-	uh->len = htons(skb->len - gro_hint->nested_tp_offset);
+	udp_set_len_short(uh, skb->len - gro_hint->nested_tp_offset);
 	if (uh->check) {
 		len = skb->len - gro_hint->nested_nh_offset;
 		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 1e06e781c835..1e0b50fdbf74 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -845,7 +845,7 @@ static struct sk_buff *nsim_dev_trap_skb_build(void)
 	udph = skb_put_zero(skb, sizeof(struct udphdr) + data_len);
 	get_random_bytes(&udph->source, sizeof(u16));
 	get_random_bytes(&udph->dest, sizeof(u16));
-	udph->len = htons(sizeof(struct udphdr) + data_len);
+	udp_set_len_short(udph, sizeof(struct udphdr) + data_len);
 
 	return skb;
 }
diff --git a/drivers/net/netdevsim/psample.c b/drivers/net/netdevsim/psample.c
index 717d157c3ae2..1e71c3da4def 100644
--- a/drivers/net/netdevsim/psample.c
+++ b/drivers/net/netdevsim/psample.c
@@ -73,7 +73,7 @@ static struct sk_buff *nsim_dev_psample_skb_build(void)
 	udph = skb_put_zero(skb, sizeof(struct udphdr) + data_len);
 	get_random_bytes(&udph->source, sizeof(u16));
 	get_random_bytes(&udph->dest, sizeof(u16));
-	udph->len = htons(sizeof(struct udphdr) + data_len);
+	udp_set_len_short(udph, sizeof(struct udphdr) + data_len);
 
 	return skb;
 }
diff --git a/drivers/net/netdevsim/psp.c b/drivers/net/netdevsim/psp.c
index 0b4d717253b0..e81b69d6a577 100644
--- a/drivers/net/netdevsim/psp.c
+++ b/drivers/net/netdevsim/psp.c
@@ -84,6 +84,7 @@ nsim_do_psp(struct sk_buff *skb, struct netdevsim *ns,
 		struct iphdr *iph;
 		struct udphdr *uh;
 		__wsum csum;
+		u16 udplen;
 
 		/* Do not decapsulate. Receive the skb with the udp and psp
 		 * headers still there as if this is a normal udp packet.
@@ -91,19 +92,20 @@ nsim_do_psp(struct sk_buff *skb, struct netdevsim *ns,
 		 * provide a valid checksum here, so the skb isn't dropped.
 		 */
 		uh = udp_hdr(skb);
+		udplen = ntohs(uh->len) ?: skb->len;
 		csum = skb_checksum(skb, skb_transport_offset(skb),
-				    ntohs(uh->len), 0);
+				    udplen, 0);
 
 		switch (skb->protocol) {
 		case htons(ETH_P_IP):
 			iph = ip_hdr(skb);
-			uh->check = udp_v4_check(ntohs(uh->len), iph->saddr,
+			uh->check = udp_v4_check(udplen, iph->saddr,
 						 iph->daddr, csum);
 			break;
 #if IS_ENABLED(CONFIG_IPV6)
 		case htons(ETH_P_IPV6):
 			ip6h = ipv6_hdr(skb);
-			uh->check = udp_v6_check(ntohs(uh->len), &ip6h->saddr,
+			uh->check = udp_v6_check(udplen, &ip6h->saddr,
 						 &ip6h->daddr, csum);
 			break;
 #endif
diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
index eb8851113654..275fe1bc994c 100644
--- a/drivers/net/wireguard/receive.c
+++ b/drivers/net/wireguard/receive.c
@@ -62,7 +62,7 @@ static int prepare_skb_header(struct sk_buff *skb, struct wg_device *wg)
 		 * to have UDP fields.
 		 */
 		return -EINVAL;
-	data_len = ntohs(udp->len);
+	data_len = udp_get_len_short(udp); /* GRO not expected here. */
 	if (unlikely(data_len < sizeof(struct udphdr) ||
 		     data_len > skb->len - data_offset))
 		/* UDP packet is reporting too small of a size or lying about
diff --git a/include/linux/udp.h b/include/linux/udp.h
index ce56ebcee5cb..fe3abbec2cb5 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -23,6 +23,22 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb)
 	return (struct udphdr *)skb_transport_header(skb);
 }
 
+static inline unsigned int udp_get_len_short(const struct udphdr *uh)
+{
+	return ntohs(uh->len);
+}
+
+static inline void udp_set_len(struct udphdr *uh, unsigned int len)
+{
+	uh->len = len < GRO_LEGACY_MAX_SIZE ? htons(len) : 0;
+}
+
+static inline void udp_set_len_short(struct udphdr *uh, unsigned int len)
+{
+	DEBUG_NET_WARN_ON_ONCE(len >= GRO_LEGACY_MAX_SIZE);
+	uh->len = htons(len);
+}
+
 #define UDP_HTABLE_SIZE_MIN_PERNET	128
 #define UDP_HTABLE_SIZE_MIN		(IS_ENABLED(CONFIG_BASE_SMALL) ? 128 : 256)
 #define UDP_HTABLE_SIZE_MAX		65536
diff --git a/include/trace/events/icmp.h b/include/trace/events/icmp.h
index 31559796949a..09ae115099df 100644
--- a/include/trace/events/icmp.h
+++ b/include/trace/events/icmp.h
@@ -44,7 +44,7 @@ TRACE_EVENT(icmp_send,
 			} else {
 				__entry->sport = ntohs(uh->source);
 				__entry->dport = ntohs(uh->dest);
-				__entry->ulen = ntohs(uh->len);
+				__entry->ulen = udp_get_len_short(uh);
 			}
 
 			p32 = (__be32 *) __entry->saddr;
diff --git a/lib/tests/blackhole_dev_kunit.c b/lib/tests/blackhole_dev_kunit.c
index 06834ab35f43..fa3e0533038d 100644
--- a/lib/tests/blackhole_dev_kunit.c
+++ b/lib/tests/blackhole_dev_kunit.c
@@ -46,7 +46,7 @@ static void test_blackholedev(struct kunit *test)
 	uh = (struct udphdr *)skb_push(skb, sizeof(struct udphdr));
 	skb_set_transport_header(skb, 0);
 	uh->source = uh->dest = htons(UDP_PORT);
-	uh->len = htons(data_len);
+	udp_set_len_short(uh, data_len);
 	uh->check = 0;
 	/* (Network) IPv6 */
 	ip6h = (struct ipv6hdr *)skb_push(skb, sizeof(struct ipv6hdr));
diff --git a/net/6lowpan/nhc_udp.c b/net/6lowpan/nhc_udp.c
index 0a506c77283d..ed4227e6db74 100644
--- a/net/6lowpan/nhc_udp.c
+++ b/net/6lowpan/nhc_udp.c
@@ -88,16 +88,16 @@ static int udp_uncompress(struct sk_buff *skb, size_t needed)
 	switch (lowpan_dev(skb->dev)->lltype) {
 	case LOWPAN_LLTYPE_IEEE802154:
 		if (lowpan_802154_cb(skb)->d_size)
-			uh.len = htons(lowpan_802154_cb(skb)->d_size -
-				       sizeof(struct ipv6hdr));
+			udp_set_len_short(&uh, lowpan_802154_cb(skb)->d_size -
+					  sizeof(struct ipv6hdr));
 		else
-			uh.len = htons(skb->len + sizeof(struct udphdr));
+			udp_set_len_short(&uh, skb->len + sizeof(struct udphdr));
 		break;
 	default:
-		uh.len = htons(skb->len + sizeof(struct udphdr));
+		udp_set_len_short(&uh, skb->len + sizeof(struct udphdr));
 		break;
 	}
-	pr_debug("uncompressed UDP length: src = %d", ntohs(uh.len));
+	pr_debug("uncompressed UDP length: src = %d", udp_get_len_short(&uh));
 
 	/* replace the compressed UDP head by the uncompressed UDP
 	 * header
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index cd74beffd209..b6ea6975b55b 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -474,7 +474,7 @@ static void push_udp(struct netpoll *np, struct sk_buff *skb, int len)
 	udph = udp_hdr(skb);
 	udph->source = htons(np->local_port);
 	udph->dest = htons(np->remote_port);
-	udph->len = htons(udp_len);
+	udp_set_len_short(udph, udp_len);
 
 	netpoll_udp_checksum(np, skb, len);
 }
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 8e185b318288..5b4dd04d6124 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3005,7 +3005,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 
 	udph->source = htons(pkt_dev->cur_udp_src);
 	udph->dest = htons(pkt_dev->cur_udp_dst);
-	udph->len = htons(datalen + 8);	/* DATA + udphdr */
+	udp_set_len_short(udph, datalen + 8);	/* DATA + udphdr */
 	udph->check = 0;
 
 	iph->ihl = 5;
@@ -3138,7 +3138,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	udplen = datalen + sizeof(struct udphdr);
 	udph->source = htons(pkt_dev->cur_udp_src);
 	udph->dest = htons(pkt_dev->cur_udp_dst);
-	udph->len = htons(udplen);
+	udp_set_len_short(udph, udplen);
 	udph->check = 0;
 
 	*(__be32 *) iph = htonl(0x60000000);	/* Version + flow */
diff --git a/net/core/selftests.c b/net/core/selftests.c
index 0a203d3fb9dc..36b949ae520b 100644
--- a/net/core/selftests.c
+++ b/net/core/selftests.c
@@ -72,9 +72,9 @@ struct sk_buff *net_test_get_skb(struct net_device *ndev, u8 id,
 	} else {
 		uhdr->source = htons(attr->sport);
 		uhdr->dest = htons(attr->dport);
-		uhdr->len = htons(sizeof(*shdr) + sizeof(*uhdr) + attr->size);
+		udp_set_len_short(uhdr, sizeof(*shdr) + sizeof(*uhdr) + attr->size);
 		if (attr->max_size)
-			uhdr->len = htons(attr->max_size -
+			udp_set_len_short(uhdr, attr->max_size -
 					  (sizeof(*ihdr) + sizeof(*ehdr)));
 		uhdr->check = 0;
 	}
diff --git a/net/core/tso.c b/net/core/tso.c
index 6df997b9076e..3cc5a03e7a12 100644
--- a/net/core/tso.c
+++ b/net/core/tso.c
@@ -38,7 +38,8 @@ void tso_build_hdr(const struct sk_buff *skb, char *hdr, struct tso_t *tso,
 	} else {
 		struct udphdr *uh = (struct udphdr *)hdr;
 
-		uh->len = htons(sizeof(*uh) + size);
+		/* size is after segmentation. */
+		udp_set_len_short(uh, sizeof(*uh) + size);
 	}
 }
 EXPORT_SYMBOL(tso_build_hdr);
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 6dfc0bcdef65..df04a407e778 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -320,7 +320,7 @@ static struct ip_esp_hdr *esp_output_udp_encap(struct sk_buff *skb,
 	uh = (struct udphdr *)esp->esph;
 	uh->source = sport;
 	uh->dest = dport;
-	uh->len = htons(len);
+	udp_set_len_short(uh, len);
 	uh->check = 0;
 
 	/* For IPv4 ESP with UDP encapsulation, if xo is not null, the skb is in the crypto offload
diff --git a/net/ipv4/fou_core.c b/net/ipv4/fou_core.c
index 5bae3cf7fe76..e66e10a2c33f 100644
--- a/net/ipv4/fou_core.c
+++ b/net/ipv4/fou_core.c
@@ -1043,7 +1043,7 @@ static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
 
 	uh->dest = e->dport;
 	uh->source = sport;
-	uh->len = htons(skb->len);
+	udp_set_len_short(uh, skb->len);
 	udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
 		     fl4->saddr, fl4->daddr, skb->len);
 
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index a35ffedacc7c..155db067eaec 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -847,7 +847,7 @@ static void __init ic_bootp_send_if(struct ic_device *d, unsigned long jiffies_d
 	/* Construct UDP header */
 	b->udph.source = htons(68);
 	b->udph.dest = htons(67);
-	b->udph.len = htons(sizeof(struct bootp_pkt) - sizeof(struct iphdr));
+	udp_set_len_short(&b->udph, sizeof(struct bootp_pkt) - sizeof(struct iphdr));
 	/* UDP checksum not calculated -- explicitly allowed in BOOTP RFC */
 
 	/* Construct DHCP/BOOTP header */
@@ -1025,10 +1025,10 @@ static int __init ic_bootp_recv(struct sk_buff *skb, struct net_device *dev, str
 	if (b->udph.source != htons(67) || b->udph.dest != htons(68))
 		goto drop;
 
-	if (ntohs(h->tot_len) < ntohs(b->udph.len) + sizeof(struct iphdr))
+	if (ntohs(h->tot_len) < udp_get_len_short(&b->udph) + sizeof(struct iphdr))
 		goto drop;
 
-	len = ntohs(b->udph.len) - sizeof(struct udphdr);
+	len = udp_get_len_short(&b->udph) - sizeof(struct udphdr);
 	ext_len = len - (sizeof(*b) -
 			 sizeof(struct iphdr) -
 			 sizeof(struct udphdr) -
diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
index 717b726504fe..afe0f4a328d0 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic_main.c
@@ -127,7 +127,7 @@ static int snmp_translate(struct nf_conn *ct, int dir, struct sk_buff *skb)
 {
 	struct iphdr *iph = ip_hdr(skb);
 	struct udphdr *udph = (struct udphdr *)((__be32 *)iph + iph->ihl);
-	u16 datalen = ntohs(udph->len) - sizeof(struct udphdr);
+	u16 datalen = udp_get_len_short(udph) - sizeof(struct udphdr);
 	char *data = (unsigned char *)udph + sizeof(struct udphdr);
 	struct snmp_ctx ctx;
 	int ret;
@@ -181,7 +181,7 @@ static int help(struct sk_buff *skb, unsigned int protoff,
 	 * enough room for a UDP header.  Just verify the UDP length field so we
 	 * can mess around with the payload.
 	 */
-	if (ntohs(udph->len) != skb->len - (iph->ihl << 2)) {
+	if (udp_get_len_short(udph) != skb->len - (iph->ihl << 2)) {
 		nf_ct_helper_log(skb, ct, "dropping malformed packet\n");
 		return NF_DROP;
 	}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index bc1296f0ea69..9fa130a847df 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3190,7 +3190,7 @@ static struct sk_buff *inet_rtm_getroute_build_skb(__be32 src, __be32 dst,
 		udph = skb_put_zero(skb, sizeof(struct udphdr));
 		udph->source = sport;
 		udph->dest = dport;
-		udph->len = htons(sizeof(struct udphdr));
+		udp_set_len_short(udph, sizeof(struct udphdr));
 		udph->check = 0;
 		break;
 	}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ab415de32443..43e1cf8d32e3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1107,7 +1107,8 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 	uh = udp_hdr(skb);
 	uh->source = inet_sk(sk)->inet_sport;
 	uh->dest = fl4->fl4_dport;
-	uh->len = htons(len);
+	/* Datagram length checked in udp_sendmsg. */
+	udp_set_len_short(uh, len);
 	uh->check = 0;
 
 	if (cork->gso_size) {
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 2578aa7f9ff9..22acc80b12a4 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -279,11 +279,11 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 		 * segment instead of the entire frame.
 		 */
 		if (gso_partial && skb_is_gso(skb)) {
-			uh->len = htons(skb_shinfo(skb)->gso_size +
-					SKB_GSO_CB(skb)->data_offset +
-					skb->head - (unsigned char *)uh);
+			udp_set_len_short(uh, skb_shinfo(skb)->gso_size +
+					  SKB_GSO_CB(skb)->data_offset +
+					  skb->head - (unsigned char *)uh);
 		} else {
-			uh->len = htons(len);
+			udp_set_len_short(uh, len);
 		}
 
 		if (!need_csum)
@@ -468,7 +468,7 @@ static struct sk_buff *__udp_gso_segment_list(struct sk_buff *skb,
 	if (IS_ERR(skb))
 		return skb;
 
-	udp_hdr(skb)->len = htons(sizeof(struct udphdr) + mss);
+	udp_set_len_short(udp_hdr(skb), sizeof(struct udphdr) + mss);
 
 	if (is_ipv6)
 		return __udpv6_gso_segment_list_csum(skb);
@@ -486,8 +486,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	unsigned int mss;
 	bool copy_dtor;
 	__sum16 check;
-	__be16 newlen;
 	int ret = 0;
+	u16 newlen;
 
 	mss = skb_shinfo(gso_skb)->gso_size;
 	if (gso_skb->len <= sizeof(*uh) + mss)
@@ -564,8 +564,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 			(skb_shinfo(gso_skb)->tx_flags & SKBTX_ANY_TSTAMP);
 
 	/* compute checksum adjustment based on old length versus new */
-	newlen = htons(sizeof(*uh) + mss);
-	check = csum16_add(csum16_sub(uh->check, uh->len), newlen);
+	newlen = sizeof(*uh) + mss;
+	check = csum16_add(csum16_sub(uh->check, uh->len), htons(newlen));
 
 	for (;;) {
 		if (copy_dtor) {
@@ -577,7 +577,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 		if (!seg->next)
 			break;
 
-		uh->len = newlen;
+		udp_set_len_short(uh, newlen);
 		uh->check = check;
 
 		if (seg->ip_summed == CHECKSUM_PARTIAL)
@@ -591,11 +591,10 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	}
 
 	/* last packet can be partial gso_size, account for that in checksum */
-	newlen = htons(skb_tail_pointer(seg) - skb_transport_header(seg) +
-		       seg->data_len);
-	check = csum16_add(csum16_sub(uh->check, uh->len), newlen);
+	newlen = skb_tail_pointer(seg) - skb_transport_header(seg) + seg->data_len;
+	check = csum16_add(csum16_sub(uh->check, uh->len), htons(newlen));
 
-	uh->len = newlen;
+	udp_set_len_short(uh, newlen);
 	uh->check = check;
 
 	if (seg->ip_summed == CHECKSUM_PARTIAL)
@@ -706,7 +705,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	}
 
 	/* Do not deal with padded or malicious packets, sorry ! */
-	ulen = ntohs(uh->len);
+	ulen = udp_get_len_short(uh);
 	if (ulen <= sizeof(*uh) || ulen != skb_gro_len(skb)) {
 		NAPI_GRO_CB(skb)->flush = 1;
 		return NULL;
@@ -739,7 +738,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 		 * On len mismatch merge the first packet shorter than gso_size,
 		 * otherwise complete the GRO packet.
 		 */
-		if (ulen > ntohs(uh2->len) || flush) {
+		if (ulen > udp_get_len_short(uh2) || flush) {
 			pp = p;
 		} else {
 			if (NAPI_GRO_CB(skb)->is_flist) {
@@ -762,7 +761,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 			}
 		}
 
-		if (ret || ulen != ntohs(uh2->len) ||
+		if (ret || ulen != udp_get_len_short(uh2) ||
 		    NAPI_GRO_CB(p)->count >= UDP_GRO_CNT_MAX)
 			pp = p;
 
@@ -912,12 +911,12 @@ static int udp_gro_complete_segment(struct sk_buff *skb)
 int udp_gro_complete(struct sk_buff *skb, int nhoff,
 		     udp_lookup_t lookup)
 {
-	__be16 newlen = htons(skb->len - nhoff);
+	unsigned int newlen = skb->len - nhoff;
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 	struct sock *sk;
 	int err;
 
-	uh->len = newlen;
+	udp_set_len_short(uh, newlen);
 
 	sk = INDIRECT_CALL_INET(lookup, udp6_lib_lookup_skb,
 				udp4_lib_lookup_skb, skb, uh->source, uh->dest);
@@ -954,7 +953,7 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 
 	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
 	if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
-		uh->len = htons(skb->len - nhoff);
+		udp_set_len_short(uh, skb->len - nhoff);
 
 		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
 		skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
diff --git a/net/ipv4/udp_tunnel_core.c b/net/ipv4/udp_tunnel_core.c
index b1f667c52cb2..18f789d9383e 100644
--- a/net/ipv4/udp_tunnel_core.c
+++ b/net/ipv4/udp_tunnel_core.c
@@ -184,7 +184,7 @@ void udp_tunnel_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb
 
 	uh->dest = dst_port;
 	uh->source = src_port;
-	uh->len = htons(skb->len);
+	udp_set_len_short(uh, skb->len);
 
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
 
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 9f75313734f8..1d71a95d48b8 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -227,7 +227,8 @@ static void esp_output_encap_csum(struct sk_buff *skb)
 	if (*skb_mac_header(skb) == IPPROTO_UDP) {
 		struct udphdr *uh = udp_hdr(skb);
 		struct ipv6hdr *ip6h = ipv6_hdr(skb);
-		int len = ntohs(uh->len);
+		/* esp6_output_udp_encap limits len to U16_MAX. */
+		int len = udp_get_len_short(uh);
 		unsigned int offset = skb_transport_offset(skb);
 		__wsum csum = skb_checksum(skb, offset, skb->len - offset, 0);
 
@@ -355,7 +356,7 @@ static struct ip_esp_hdr *esp6_output_udp_encap(struct sk_buff *skb,
 	uh = (struct udphdr *)esp->esph;
 	uh->source = sport;
 	uh->dest = dport;
-	uh->len = htons(len);
+	udp_set_len_short(uh, len);
 	uh->check = 0;
 
 	*skb_mac_header(skb) = IPPROTO_UDP;
diff --git a/net/ipv6/fou6.c b/net/ipv6/fou6.c
index 157765259e2f..588929409241 100644
--- a/net/ipv6/fou6.c
+++ b/net/ipv6/fou6.c
@@ -30,7 +30,7 @@ static void fou6_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
 
 	uh->dest = e->dport;
 	uh->source = sport;
-	uh->len = htons(skb->len);
+	udp_set_len_short(uh, skb->len);
 	udp6_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM6), skb,
 		      &fl6->saddr, &fl6->daddr, skb->len);
 
diff --git a/net/ipv6/ip6_udp_tunnel.c b/net/ipv6/ip6_udp_tunnel.c
index 405ef1cb8864..43e94a3efb26 100644
--- a/net/ipv6/ip6_udp_tunnel.c
+++ b/net/ipv6/ip6_udp_tunnel.c
@@ -93,7 +93,7 @@ void udp_tunnel6_xmit_skb(struct dst_entry *dst, struct sock *sk,
 	uh->dest = dst_port;
 	uh->source = src_port;
 
-	uh->len = htons(skb->len);
+	udp_set_len_short(uh, skb->len);
 
 	skb_dst_set(skb, dst);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d7cf4c9508b2..04c4adeb6688 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1367,7 +1367,8 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 	uh = udp_hdr(skb);
 	uh->source = fl6->fl6_sport;
 	uh->dest = fl6->fl6_dport;
-	uh->len = htons(len);
+	/* Datagram length checked in udpv6_sendmsg. */
+	udp_set_len_short(uh, len);
 	uh->check = 0;
 
 	if (cork->gso_size) {
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 778afc7453ce..c92cf5ee3e6a 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -171,7 +171,7 @@ int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 
 	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
 	if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
-		uh->len = htons(skb->len - nhoff);
+		udp_set_len_short(uh, skb->len - nhoff);
 
 		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
 		skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 157fc23ce4e1..0ed18164bfb7 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1295,7 +1295,7 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb, uns
 			ret = NET_XMIT_DROP;
 			goto out_unlock;
 		}
-		uh->len = htons(udp_len);
+		udp_set_len_short(uh, udp_len);
 
 		/* Calculate UDP checksum if configured to do so */
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 0fb5162992e5..b460998e348e 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -1089,7 +1089,7 @@ ipvs_gue_encap(struct net *net, struct sk_buff *skb,
 	dport = cp->dest->tun_port;
 	udph->dest = dport;
 	udph->source = sport;
-	udph->len = htons(skb->len);
+	udp_set_len_short(udph, skb->len);
 	udph->check = 0;
 
 	*next_protocol = IPPROTO_UDP;
diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
index 0030fbe8885c..e9bd1632304f 100644
--- a/net/netfilter/nf_conntrack_proto_udp.c
+++ b/net/netfilter/nf_conntrack_proto_udp.c
@@ -41,11 +41,22 @@ static void udp_error_log(const struct sk_buff *skb,
 	nf_l4proto_log_invalid(skb, state, IPPROTO_UDP, "%s", msg);
 }
 
+static bool udp_validate_len(struct sk_buff *skb,
+			     const struct udphdr *hdr,
+			     unsigned int dataoff)
+{
+	unsigned int udplen = udp_get_len_short(hdr);
+	unsigned int skblen = skb->len - dataoff;
+
+	if (udplen > skblen || udplen < sizeof(*hdr))
+		return false;
+	return true;
+}
+
 static bool udp_error(struct sk_buff *skb,
 		      unsigned int dataoff,
 		      const struct nf_hook_state *state)
 {
-	unsigned int udplen = skb->len - dataoff;
 	const struct udphdr *hdr;
 	struct udphdr _hdr;
 
@@ -57,7 +68,7 @@ static bool udp_error(struct sk_buff *skb,
 	}
 
 	/* Truncated/malformed packets */
-	if (ntohs(hdr->len) > udplen || ntohs(hdr->len) < sizeof(*hdr)) {
+	if (!udp_validate_len(skb, hdr, dataoff)) {
 		udp_error_log(skb, state, "truncated/malformed packet");
 		return true;
 	}
@@ -153,7 +164,7 @@ static bool udplite_error(struct sk_buff *skb,
 		return true;
 	}
 
-	cscov = ntohs(hdr->len);
+	cscov = udp_get_len_short(hdr);
 	if (cscov == 0) {
 		cscov = udplen;
 	} else if (cscov < sizeof(*hdr) || cscov > udplen) {
diff --git a/net/netfilter/nf_log_syslog.c b/net/netfilter/nf_log_syslog.c
index 0507d67cad27..da990e3b30f4 100644
--- a/net/netfilter/nf_log_syslog.c
+++ b/net/netfilter/nf_log_syslog.c
@@ -298,7 +298,7 @@ nf_log_dump_udp_header(struct nf_log_buf *m,
 
 	/* Max length: 20 "SPT=65535 DPT=65535 " */
 	nf_log_buf_add(m, "SPT=%u DPT=%u LEN=%u ",
-		       ntohs(uh->source), ntohs(uh->dest), ntohs(uh->len));
+		       ntohs(uh->source), ntohs(uh->dest), udp_get_len_short(uh));
 
 out:
 	return 0;
diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c
index bf591e6af005..3853f41db499 100644
--- a/net/netfilter/nf_nat_helper.c
+++ b/net/netfilter/nf_nat_helper.c
@@ -161,7 +161,7 @@ nf_nat_mangle_udp_packet(struct sk_buff *skb,
 
 	/* update the length of the UDP packet */
 	datalen = skb->len - protoff;
-	udph->len = htons(datalen);
+	udp_set_len_short(udph, datalen);
 
 	/* fix udp checksum if udp checksum was previously calculated */
 	if (!udph->check && skb->ip_summed != CHECKSUM_PARTIAL)
diff --git a/net/psp/psp_main.c b/net/psp/psp_main.c
index 9508b6c38003..47491b0ce4c9 100644
--- a/net/psp/psp_main.c
+++ b/net/psp/psp_main.c
@@ -207,7 +207,7 @@ static void psp_write_headers(struct net *net, struct sk_buff *skb, __be32 spi,
 		uh->source = udp_flow_src_port(net, skb, 0, 0, false);
 	}
 	uh->check = 0;
-	uh->len = htons(udp_len);
+	udp_set_len_short(uh, udp_len);
 
 	psph->nexthdr = IPPROTO_TCP;
 	psph->hdrlen = PSP_HDRLEN_NOOPT;
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 078d3a27130b..5fff52a8ca90 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -276,7 +276,7 @@ static int tcf_csum_ipv4_udp(struct sk_buff *skb, unsigned int ihl,
 		return 0;
 
 	iph = ip_hdr(skb);
-	ul = ntohs(udph->len);
+	ul = udp_get_len_short(udph);
 
 	if (udplite || udph->check) {
 
@@ -334,7 +334,7 @@ static int tcf_csum_ipv6_udp(struct sk_buff *skb, unsigned int ihl,
 		return 0;
 
 	ip6h = ipv6_hdr(skb);
-	ul = ntohs(udph->len);
+	ul = udp_get_len_short(udph);
 
 	udph->check = 0;
 
diff --git a/net/xfrm/xfrm_nat_keepalive.c b/net/xfrm/xfrm_nat_keepalive.c
index 458931062a04..906458f3d8c5 100644
--- a/net/xfrm/xfrm_nat_keepalive.c
+++ b/net/xfrm/xfrm_nat_keepalive.c
@@ -133,7 +133,7 @@ static void nat_keepalive_send(struct nat_keepalive *ka)
 	uh = skb_push(skb, sizeof(*uh));
 	uh->source = ka->encap_sport;
 	uh->dest = ka->encap_dport;
-	uh->len = htons(skb->len);
+	udp_set_len_short(uh, skb->len);
 	uh->check = 0;
 
 	skb->mark = ka->smark;
-- 
2.53.0

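The three helpers added to include/linux/udp.h can be exercised outside the kernel. What follows is a minimal userspace sketch, not kernel code. The following names are hypothetical: struct udphdr_sketch, the `warned` flag (standing in for DEBUG_NET_WARN_ON_ONCE()) and udp_len_or(). GRO_LEGACY_MAX_SIZE is assumed to be 65536, as in include/linux/netdevice.h. udp_len_or() mirrors the `ntohs(uh->len) ?: skb->len` fallback in the netdevsim psp.c hunk, where a zero length field means "derive the length from the buffer".

```c
#include <arpa/inet.h> /* htons(), ntohs() */
#include <assert.h>    /* static_assert (C11) */
#include <stdint.h>
#include <stdio.h>

/* Assumed value, taken from include/linux/netdevice.h. */
#define GRO_LEGACY_MAX_SIZE 65536u

/* Hypothetical stand-in for struct udphdr; all fields in network byte order. */
struct udphdr_sketch {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};
static_assert(sizeof(struct udphdr_sketch) == 8, "UDP header is 8 bytes");

/* Hypothetical stand-in for DEBUG_NET_WARN_ON_ONCE(): set on the first violation. */
static int warned;

static unsigned int udp_get_len_short(const struct udphdr_sketch *uh)
{
	return ntohs(uh->len);
}

/* Lengths that do not fit in 16 bits are encoded as 0. */
static void udp_set_len(struct udphdr_sketch *uh, unsigned int len)
{
	uh->len = len < GRO_LEGACY_MAX_SIZE ? htons((uint16_t)len) : 0;
}

/* Caller promises len fits in 16 bits; a violation is reported once
 * and the value is then truncated to its low 16 bits. */
static void udp_set_len_short(struct udphdr_sketch *uh, unsigned int len)
{
	if (len >= GRO_LEGACY_MAX_SIZE && !warned) {
		warned = 1;
		fprintf(stderr, "udp_set_len_short: len %u too large\n", len);
	}
	uh->len = htons((uint16_t)len);
}

/* Hypothetical reader-side helper: a zero length field falls back to buf_len. */
static unsigned int udp_len_or(const struct udphdr_sketch *uh, unsigned int buf_len)
{
	unsigned int len = udp_get_len_short(uh);

	return len ? len : buf_len;
}
```

With these definitions, udp_set_len(&uh, 1208) stores htons(1208). udp_set_len(&uh, 100000) stores 0, so udp_len_or() falls back to the buffer length. udp_set_len_short(&uh, 65540) reports the violation once and keeps only the low 16 bits (4), as the (uint16_t) cast does here.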
