Netdev List
* Re: [PATCH net] netpoll: run NAPI poll in softirq context to avoid rq->lock self-deadlock
From: Jakub Kicinski @ 2026-06-17 20:21 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Peter Zijlstra, Petr Mladek, Sebastian Andrzej Siewior,
	John Ogness, Sergey Senozhatsky, Vlad Poenaru, Thomas Gleixner,
	netdev, David S . Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Clark Williams, Steven Rostedt, linux-rt-devel, linux-kernel,
	stable, Frederic Weisbecker, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, K Prateek Nayak
In-Reply-To: <ajKi4wtA8U1iZkMD@gmail.com>

On Wed, 17 Jun 2026 07:56:50 -0700 Breno Leitao wrote:
> As far as I can tell, there isn't a network driver today whose transmit
> path is completely lockless, so, even if we make netpoll lockless.
> 
> It's unlikely any NIC will ever achieve this, given that NIC TX
> fundamentally relies on a shared DMA ring and doorbell register, which
> inherently cannot be made lockless.

The lock which protects the queue is maintained by the stack,
and we trylock it. Maybe I lost the thread but if you're saying
that writes to netconsole are impossible from arbitrary context,
that is _not_ true, AFAIU. We can queue a packet and kick off 
the transfer on well-behaved drivers.

Main problem is the opportunistic freeing up of the queue space.
If we could avoid that in atomic context I think we'd be good.


* Re: [PATCH 6.6.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
From: Jakub Kicinski @ 2026-06-17 20:27 UTC (permalink / raw)
  To: Sasha Levin
  Cc: stable, David Howells, Michael Bommarito, Marc Dionne,
	Jeffrey Altman, Eric Dumazet, David S. Miller, Paolo Abeni,
	Simon Horman, linux-afs, netdev, stable
In-Reply-To: <20260617180410.271223-1-sashal@kernel.org>

On Wed, 17 Jun 2026 14:04:10 -0400 Sasha Levin wrote:
> Subject: [PATCH 6.6.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
> Date: Wed, 17 Jun 2026 14:04:10 -0400
> X-Mailer: git-send-email 2.53.0
> 
> From: David Howells <dhowells@redhat.com>
> 
> [ Upstream commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 ]

nit: you missed the "skip patchwork" header on this?


* Re: [PATCH net] net: ena: clean up XDP TX queues when regular TX setup fails
From: Arthur Kiyanovski @ 2026-06-17 20:41 UTC (permalink / raw)
  To: Dawei Feng
  Cc: akiyano, darinzon, andrew+netdev, davem, edumazet, kuba, pabeni,
	ast, daniel, hawk, john.fastabend, sdf, sameehj, netdev,
	linux-kernel, bpf, jianhao.xu, stable
In-Reply-To: <20260616142424.4005130-1-dawei.feng@seu.edu.cn>

On Tue, 16 Jun 2026 22:24:24 +0800, Dawei Feng <dawei.feng@seu.edu.cn> wrote:
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 92d149d4f091..5d05020a6d05 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -2078,14 +2090,21 @@ static int create_queues_with_size_backoff(struct ena_adapter *adapter)
> [ ... skip 17 lines ... ]
> +			ena_destroy_xdp_tx_queues(adapter);
>  			goto err_create_tx_queues;
> +		}
>  
>  		rc = ena_setup_all_rx_resources(adapter);
>  		if (rc)

Thank you for submitting the fix.

I verified it on AWS.

The inline cleanup before goto is slightly non-idiomatic — kernel
style typically prefers label-based unwinding. Splitting
ena_destroy_all_tx_queues() into regular-only and XDP-only variants
would allow a clean label chain without special-case code at each call
site. But that's a larger refactor better suited for net-next; for a
targeted bug fix this is fine.

Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>

-- 
Arthur Kiyanovski <akiyano@amazon.com>


* Re: [PATCH net] net: rnpgbe: fix mailbox endianness handling
From: Jakub Kicinski @ 2026-06-17 20:45 UTC (permalink / raw)
  To: Yibo Dong
  Cc: Andrew Lunn, andrew+netdev, davem, edumazet, pabeni,
	vadim.fedorenko, netdev, linux-kernel, yaojun
In-Reply-To: <5249B5A1F6CD9ACB+20260617140530.GA278329@nic-Precision-5820-Tower>

On Wed, 17 Jun 2026 22:05:30 +0800 Yibo Dong wrote:
> On Wed, Jun 17, 2026 at 02:09:00PM +0200, Andrew Lunn wrote:
> > > My understanding is as follows:
> > > The firmware structures are defined with __le16 / __le32 for wire format,
> > > but the original code cast these struct pointers to u32 * before passing
> > > them to the mailbox read/write routines:
> > > - Send path: (u32 *)&req -> msg buffer -> writel()
> > > - Receive path: readl() -> msg buffer -> (u32 *)&reply
> > > Sparse only sees pure u32 = u32 assignments here, so no type mismatch is
> > > reported.  
> > 
> > Can the code be changed so that it does not need the cast? Casts are
> > bad, as you have just shown. This is something i try to push back on,
> > it makes you think about types and avoid issues like this.
> > 
> > 	Andrew
> >   
> Thinking... Yes. A few possibilities:
> 
> 1. Make all fields __le32, then extract via shifts:
>    struct mbx_fw_cmd_req {
>        __le32 word0;  // [15:0]=flags  [31:16]=opcode
>        __le32 word1;  // [15:0]=datalen [31:16]=ret_value
>        ...
>    };
>    But that's painful — le32_to_cpu(req.word0) >> 16 vs req.opcode.
> 
> 2. Use a union to keep named fields while also exposing __le32[] access:
>    union mbx_fw_cmd_req_u {
>        struct mbx_fw_cmd_req req;
>        __le32 dwords[sizeof(struct mbx_fw_cmd_req) / sizeof(__le32)];
>    };
>    union mbx_fw_cmd_reply_u {
>        struct mbx_fw_cmd_reply reply;
>        __le32 dwords[sizeof(struct mbx_fw_cmd_reply) / sizeof(__le32)];
>    };
> 
>    The transport interface becomes:
>    int mucse_write_mbx_pf(struct mucse_hw *hw, const __le32 *msg, u16 size);
>    int mucse_read_mbx_pf(struct mucse_hw *hw, __le32 *msg, u16 size);
> 
>    Callers would use:
>    union mbx_fw_cmd_req_u cmd = {};
>    cmd.req.opcode = cpu_to_le16(...);
>    cmd.req.flags  = cpu_to_le16(...);
>    mucse_write_mbx_pf(hw, cmd.dwords, sizeof(cmd.req));
> 
>    If the transport layer forgets le32_to_cpu(), sparse would catch it
>    because msg is __le32 * and mbx_data_rd32() returns u32.
> 
>    The downside is an extra union wrapper and an extra level in field
>    access (cmd.req.opcode vs req.opcode) — a minor inconvenience.
> 
> Do you have a preference between these, or another approach?
> 
> Thanks for the feedback.

3. Maybe use memcpy_toio() to transfer the data without any byteswaps?
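
A minimal userspace sketch of the byte-copy idea, for illustration only.
Everything here is an assumption rather than rnpgbe code: struct fw_req,
to_le16(), mbx_write_bytes() and build_and_send() are hypothetical names,
the field order is taken from the word0/word1 comments quoted above, and
a plain memcpy() into a byte array stands in for memcpy_toio() into the
mailbox window. The point it tries to show is that once the fields are
stored in wire order, copying bytes needs no cast to u32 * and no
per-word byteswap. Whether the device tolerates the access widths that
memcpy_toio() generates is a separate question the sketch does not cover.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's __le16: the value is kept in wire order. */
typedef uint16_t le16;

/* Store v as little-endian bytes regardless of host endianness. */
static le16 to_le16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	le16 out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical request header: word0 = flags/opcode, word1 = datalen/ret. */
struct fw_req {
	le16 flags;
	le16 opcode;
	le16 datalen;
	le16 ret_value;
};

/* Stand-in for memcpy_toio(): copy the struct byte for byte, no swapping. */
static void mbx_write_bytes(uint8_t *mbx, const struct fw_req *req)
{
	memcpy(mbx, req, sizeof(*req));
}

/* Fill the named fields once, in wire order, then hand off the raw bytes. */
static void build_and_send(uint8_t *mbx, uint16_t opcode, uint16_t flags,
			   uint16_t datalen)
{
	struct fw_req req = { 0 };

	req.flags = to_le16(flags);
	req.opcode = to_le16(opcode);
	req.datalen = to_le16(datalen);
	mbx_write_bytes(mbx, &req);
}
```

With this shape the callers keep named field access (req.opcode) and the
transport never sees a u32, so there is nothing for a cast to hide.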


* Re: [PATCH v19 net-next 00/11] nbl driver for Nebulamatrix NICs
From: Jakub Kicinski @ 2026-06-17 20:46 UTC (permalink / raw)
  To: illusion.wang
  Cc: dimon.zhao, alvin.wang, sam.chen, netdev, andrew+netdev, corbet,
	horms, linux-doc, pabeni, vadim.fedorenko, lukas.bulwahn,
	edumazet, enelsonmoore, skhan, hkallweit1, open list
In-Reply-To: <20260617044702.2439-1-illusion.wang@nebula-matrix.com>

On Wed, 17 Jun 2026 12:46:45 +0800 illusion.wang wrote:
> This patch series represents the first phase. We plan to integrate it in
> two phases: the first phase covers mailbox and chip configuration,
> while the second phase involves net dev configuration.
> Together, they will provide basic PF-based Ethernet port transmission and
> reception capabilities.

## Form letter - net-next-closed

We have already submitted our pull request with net-next material for v7.2,
and therefore net-next is closed for new drivers, features, code refactoring
and optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after June 29th.

RFC patches sent for review only are obviously welcome at any time.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
-- 
pw-bot: defer
pv-bot: closed


* Re: [PATCH] net: stmmac: loongson1: Use dev_err_probe()
From: Jakub Kicinski @ 2026-06-17 20:54 UTC (permalink / raw)
  To: Jacob Keller
  Cc: keguang.zhang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, linux-mips,
	netdev, linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <31630db0-85cb-421b-8ebe-bbae07521533@intel.com>

On Tue, 16 Jun 2026 16:42:18 -0700 Jacob Keller wrote:
> I'd probably also argue this may go against the desired goals of
> net-next with only wanting such cleanups when in the context of other
> larger work. Of course that decision ultimately belongs to the maintainers.

Yes, feeding const EINVAL into dev_err_probe() is pretty pointless
so if this helps it's just by "saving" 2 LoC. I'm not sure it's worth
it even in context of larger work, let alone by itself.
-- 
pw-bot: reject


* Re: [PATCH v1 bpf-next 0/2] bpf: bpf_redirect_peer egress redirection
From: Jordan Rife @ 2026-06-17 21:10 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: Paul Chaignon, bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Stanislav Fomichev
In-Reply-To: <9da94ca2-a479-42e2-8941-b38c1a08566b@linux.dev>

> Agree.
>
>
> For the existing bpf_redirect_peer(ifindex, 0), there are two ways to
> read what 0 means:
>
> 1. If we consider the operated object to be the peer of ifindex, then 0
> means the peer does ingress.
> 2. If we consider the operated object to be ifindex itself, then 0 means
> ifindex does egress
>     (which results in its peer doing ingress).
>
> This patch's new mode operates on the peer — on the host side, we want
> to "write" to the dev inside the pod to
> make the packet look like it leaves the pod. That fits reading (1), where
> the flag describes the peer's direction: 0 is peer ingress, and this new
> mode is peer egress.
> So BPF_F_EGRESS would be the clearer name; reusing BPF_F_INGRESS for
> what is really a
> peer-egress action is what creates the ambiguity.

(2) is my original interpretation that makes BPF_F_INGRESS make sense;
you're operating on ifindex, so the flag matches the direction
relative to ifindex. Under that interpretation, 0 for "egress" (really
ingress on the peer side) and BPF_F_INGRESS for "ingress" (really
egress on the peer side) makes sense.

That said, I agree BPF_F_EGRESS is probably clearer, so I'll go with
that in the next version of the series.


* Re: [syzbot] [net?] WARNING in tls_err_abort
From: Jakub Kicinski @ 2026-06-17 21:14 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: syzbot, davem, edumazet, horms, john.fastabend, linux-kernel,
	netdev, pabeni, syzkaller-bugs
In-Reply-To: <ajJwP14SWmdwwFYg@krikkit>

On Wed, 17 Jun 2026 12:00:31 +0200 Sabrina Dubroca wrote:
> 2026-06-16, 23:46:28 +0200, Sabrina Dubroca wrote:
> > 2026-06-16, 14:23:59 -0700, Jakub Kicinski wrote:  
> > > In which case the question is whether we should try to remove 
> > > the sock_error() instead? (stating the obvious I guess)  
> > 
> > That would make sense, but we can't prevent sock_error() being called
> > from some helper.  
> 
> Actually, getsockopt(SO_ERROR) will also clear sk_err. If we want to
> prevent further state transitions, we'll have to use something else
> (probably a flag in tls_context set by tls_err_abort()).
> 
> So I'd go with 2 separate patches. The 2nd one will be a change in
> userspace-visible behavior, but hopefully not one they'd be upset
> about.

Seems reasonable, FWIW


* Re: [PATCH net-next v4 0/2] net: lan743x: add RMII support for PCI11x1x
From: Jakub Kicinski @ 2026-06-17 21:17 UTC (permalink / raw)
  To: Thangaraj Samynathan
  Cc: netdev, andrew+netdev, davem, edumazet, pabeni, horms,
	bryan.whitehead, UNGLinuxDriver, linux-kernel
In-Reply-To: <20260617053241.157932-1-thangaraj.s@microchip.com>

On Wed, 17 Jun 2026 11:02:39 +0530 Thangaraj Samynathan wrote:
> This series adds RMII interface support for the Microchip PCI11x1x
> Ethernet controller.
> 
> The PCI11x1x device supports RMII as an alternative MAC-PHY interface,
> selected via the STRAP_READ software strap register. Patch 1 reads the
> RMII strap bits from this register and sets the is_rmii_en flag. Patch 2
> uses this flag to configure the PHY interface mode, phylink supported
> interfaces, and enables RMII in hardware via the RMII_CTL register.

## Form letter - net-next-closed

We have already submitted our pull request with net-next material for v7.2,
and therefore net-next is closed for new drivers, features, code refactoring
and optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after June 29th.

RFC patches sent for review only are obviously welcome at any time.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
-- 
pw-bot: defer
pv-bot: closed



* Re: [PATCH v27 4/5] sfc: obtain and map cxl range using devm_cxl_probe_mem
From: Alejandro Lucero Palau @ 2026-06-17 21:18 UTC (permalink / raw)
  To: Dan Williams (nvidia), alejandro.lucero-palau, linux-cxl, netdev,
	edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
In-Reply-To: <a423702c-05f6-4b4b-9ad4-fcaec2e07957@amd.com>


On 6/17/26 10:42, Alejandro Lucero Palau wrote:
>
> On 6/16/26 20:51, Dan Williams (nvidia) wrote:
>> Alejandro Lucero Palau wrote:
>>> On 6/10/26 14:56, Alejandro Lucero Palau wrote:
>>>> On 6/10/26 07:10, Alejandro Lucero Palau wrote:
>>>>> On 6/10/26 00:30, Dan Williams (nvidia) wrote:
>>>>>> alejandro.lucero-palau@ wrote:
>>>>>>> From: Alejandro Lucero <alucerop@amd.com>
>>>>>>>
>>>>>>> Use core API to safely obtain the CXL range linked to an HDM
>>>>>>> committed
>>>>>>> by the BIOS. Map such a range for being used as the ctpio buffer.
>>>>>>>
>>>>>>> A potential user space action through sysfs unbinding or core cxl
>>>>>>> modules remove will trigger sfc driver device detachment, with that
>>>>>>> case
>>>>>>> not racing with this mapping as this is done during driver probe 
>>>>>>> and
>>>>>>> therefore protected with device lock against those user space 
>>>>>>> actions.
>>>>>>>
>>>>>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>>>>>> ---
>>>>>>>    drivers/net/ethernet/sfc/efx.c     |  1 +
>>>>>>>    drivers/net/ethernet/sfc/efx_cxl.c | 24 ++++++++++++++++++++++++
>>>>>>>    drivers/net/ethernet/sfc/efx_cxl.h |  3 +++
>>>>>>>    3 files changed, 28 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/sfc/efx.c
>>>>>>> b/drivers/net/ethernet/sfc/efx.c
>>>>>>> index 90ccbe310386..578054c21e79 100644
>>>>>>> --- a/drivers/net/ethernet/sfc/efx.c
>>>>>>> +++ b/drivers/net/ethernet/sfc/efx.c
>>>>>>> @@ -984,6 +984,7 @@ static void efx_pci_remove(struct pci_dev
>>>>>>> *pci_dev)
>>>>>>>        efx_fini_io(efx);
>>>>>>>          probe_data = container_of(efx, struct efx_probe_data, 
>>>>>>> efx);
>>>>>>> +    efx_cxl_exit(probe_data);
>>>>>>>          pci_dbg(efx->pci_dev, "shutdown successful\n");
>>>>>>>    diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
>>>>>>> b/drivers/net/ethernet/sfc/efx_cxl.c
>>>>>>> index 4d55c08cf2a1..d5766a40e2cf 100644
>>>>>>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>>>>>>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>>>>>>> @@ -18,6 +18,7 @@ int efx_cxl_init(struct efx_probe_data 
>>>>>>> *probe_data)
>>>>>>>    {
>>>>>>>        struct efx_nic *efx = &probe_data->efx;
>>>>>>>        struct pci_dev *pci_dev = efx->pci_dev;
>>>>>>> +    struct range cxl_pio_range;
>>>>>>>        struct efx_cxl *cxl;
>>>>>>>        u16 dvsec;
>>>>>>>        int rc;
>>>>>>> @@ -75,9 +76,32 @@ int efx_cxl_init(struct efx_probe_data 
>>>>>>> *probe_data)
>>>>>>>            return -ENODEV;
>>>>>>>        }
>>>>>>>    +    cxl->cxlmd = devm_cxl_probe_mem(&cxl->cxlds, 
>>>>>>> &cxl_pio_range);
>>>>>>> +    if (IS_ERR(cxl->cxlmd)) {
>>>>>>> +        pci_err(pci_dev, "CXL accel memdev creation failed\n");
>>>>>>> +        return PTR_ERR(cxl->cxlmd);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    cxl->ctpio_cxl = ioremap_wc(cxl_pio_range.start,
>>>>>>> +                    range_len(&cxl_pio_range));
>>>>>>> +    if (!cxl->ctpio_cxl) {
>>>>>>> +        pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
>>>>>>> +            &cxl_pio_range);
>>>>>>> +        return -ENOMEM;
>>>>>> Dave caught the iounmap leak, but another concern is since you 
>>>>>> want to
>>>>>> continue operation if efx_cxl_init() fails then you probably also 
>>>>>> want
>>>>>> to release the successful attachment to the CXL domain if this 
>>>>>> happens.
>>>>>
>>>>> I will do that.
>>>>>
>>>> Looking at this issue, I think an error when creating the memdev or
>>>> during the region attach triggers the memdev removal, but ...
>>>>
>>>>
>>>>>> Minor since something else is likely to fail if ioremap is not
>>>>>> reliable.
>>>>
>>>> .. if we want to specifically do that with an unlikely (but possible)
>>>> ioremap error something else needs to be exported like
>>>> cxl_memdev_unregister(). Are you happy with that approach?
>>>>
>>> I have just tested with this:
>>>
>>> +void cxl_memdev_remove(void *_cxlmd)
>>> +{
>>> +       struct cxl_memdev *cxlmd = _cxlmd;
>>> +       struct device *dev = &cxlmd->dev;
>>> +
>>> +       devm_remove_action_nowarn(cxlmd->cxlds->dev, cxl_memdev_unregister,
>>> +                                 cxlmd);
>>> +
>>> +       cdev_device_del(&cxlmd->cdev, dev);
>>> +       cxl_memdev_shutdown(dev);
>>> +       put_device(dev);
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_remove, "CXL");
>>>
>>>
>>> only called if the ioremap fails.
>>>
>>>
>>> Please, let me know if you like this approach before sending another
>>> version.
>> A devres group can automatically cleanup after devm_cxl_memdev_probe()
>> in the error path with no new exports needed from the CXL core.
>> Something like:
>>
>>          void *group = devres_open_group(cxl->cxlds.dev, NULL, GFP_KERNEL);
>>          int rc = 0;
>>
>>          if (!group)
>>                  return -ENOMEM;
>>
>>          cxl->cxlmd = devm_cxl_probe_mem(&cxl->cxlds, &cxl_pio_range);
>>          if (IS_ERR(cxl->cxlmd)) {
>>                  pci_err(pci_dev, "CXL accel memdev creation failed\n");
>>                  rc = PTR_ERR(cxl->cxlmd);
>>                  goto out;
>>          }
>>
>>          cxl->ctpio_cxl =
>>                  ioremap_wc(cxl_pio_range.start, range_len(&cxl_pio_range));
>>          if (!cxl->ctpio_cxl) {
>>                  pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
>>                          &cxl_pio_range);
>>                  rc = -ENOMEM;
>>          }
>>
>> out:
>>          if (rc)
>>                  devres_release_group(group);
>>          else
>>                  devres_remove_group(group);
>>          return rc;
>
>
> OK. I will use this in v28 instead of that export.
>

Adding this implies the full driver is detached from the device. The
same can be achieved by changing the check of the efx_cxl_init() call
and making the probe fail if an error is returned.


I prefer to do this after discussing this internally. After all, if 
CXL.mem is there and the related initialization fails, this is not 
expected and it should be fixed.


Sending v28 shortly.



>
> Thanks
>


* Re: [PATCH] net: stmmac: loongson1: Use dev_err_probe()
From: Jacob Keller @ 2026-06-17 21:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: keguang.zhang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, linux-mips,
	netdev, linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <20260617135407.6ff54e27@kernel.org>

On 6/17/2026 1:54 PM, Jakub Kicinski wrote:
> On Tue, 16 Jun 2026 16:42:18 -0700 Jacob Keller wrote:
>> I'd probably also argue this may go against the desired goals of
>> net-next with only wanting such cleanups when in the context of other
>> larger work. Of course that decision ultimately belongs to the maintainers.
> 
> Yes, feeding const EINVAL into dev_err_probe() is pretty pointless
> so if this helps it's just by "saving" 2 LoC. I'm not sure it's worth
> it even in context of larger work, let alone by itself.

It does claim that it has benefit since you get the error code emitted
symbolically. But we have %pe for that. I wonder if dev_err_probe
predates %pe?

Per commit: 532888a59505 ("driver core: Better advertise dev_err_probe()"):

>     Describing the usage of dev_err_probe() as being (only?) "deemed
>     acceptable" has a bad connotation. In fact dev_err_probe() fulfills
>     three tasks:
> 
>      - handling of EPROBE_DEFER (even more than degrading to dev_dbg())
>      - symbolic output of the error code
>      - return err for compact error code paths

This was in 2023.. %pe was introduced in 2019, so I guess %pe is even older.

I personally find dev_err_probe acceptable and might find it nice when
writing new code, but I agree it's not really a meaningful gain to refactor
existing legacy code.

Anyways, all this to say in too many words: this patch doesn't seem to
have much value for netdev.

Thanks,
Jake


* Re: [PATCH v3] net: mvneta_bm: add suspend/resume support to prevent crash after resume
From: Jakub Kicinski @ 2026-06-17 21:35 UTC (permalink / raw)
  To: Yun Zhou
  Cc: marcin.s.wojtas, andrew+netdev, davem, edumazet, pabeni, netdev,
	linux-kernel
In-Reply-To: <20260616112540.4181231-1-yun.zhou@windriver.com>

On Tue, 16 Jun 2026 19:25:40 +0800 Yun Zhou wrote:
> Add a device_link (DL_FLAG_AUTOREMOVE_CONSUMER) in mvneta_probe to
> guarantee BM resumes before mvneta and suspends after mvneta.

You say "guarantee" but you pretty much ignore the failure to add it:

+				if (!device_link_add(&pdev->dev,
+						     &pp->bm_priv->pdev->dev,
+						     DL_FLAG_AUTOREMOVE_CONSUMER))
+					dev_warn(&pdev->dev,
+						 "failed to create device link to BM\n");
 			}

Why not handle this error properly?
-- 
pw-bot: cr


* Re: [PATCH 6.6.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
From: Sasha Levin @ 2026-06-17 21:52 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: stable, David Howells, Michael Bommarito, Marc Dionne,
	Jeffrey Altman, Eric Dumazet, David S. Miller, Paolo Abeni,
	Simon Horman, linux-afs, netdev, stable
In-Reply-To: <20260617132704.0e1fe56b@kernel.org>

On Wed, Jun 17, 2026 at 01:27:04PM -0700, Jakub Kicinski wrote:
>On Wed, 17 Jun 2026 14:04:10 -0400 Sasha Levin wrote:
>> Subject: [PATCH 6.6.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
>> Date: Wed, 17 Jun 2026 14:04:10 -0400
>> X-Mailer: git-send-email 2.53.0
>>
>> From: David Howells <dhowells@redhat.com>
>>
>> [ Upstream commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 ]
>
>nit: you missed the "skip patchwork" header on this?

Hey Jakub,

This one is a backport crafted in response to a failed backport of a stable
tagged commit.

I followed Greg's template for sending those backports to him, but I also think
that I do want folks to review the actual backport itself.

Do you think it makes sense to add a skip patchwork header on these too?

-- 
Thanks,
Sasha


* [PATCH net] net/sched: act_ct: fix nf_connlabels leak on two error paths
From: Michael Bommarito @ 2026-06-17 21:57 UTC (permalink / raw)
  To: Jamal Hadi Salim, Jiri Pirko
  Cc: Pablo Neira Ayuso, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, linux-kernel

tcf_ct_fill_params() calls nf_connlabels_get() (setting put_labels) when
TCA_CT_LABELS is present, but two later error sites use a bare return
instead of "goto err", skipping the err: nf_connlabels_put() cleanup.
They also precede the "p->put_labels = put_labels" assignment, so the
tcf_ct_params_free() fallback does not release the count either. Each
failed RTM_NEWACTION on these paths leaks one nf_connlabels reference:
net->ct.labels_used is incremented and never released. The action is
reachable with CAP_NET_ADMIN over the netns, i.e. from an unprivileged
user namespace on default-userns kernels.

Impact: an unprivileged user with CAP_NET_ADMIN over a network namespace
(e.g. via user namespaces) leaks one nf_connlabels reference per failed
RTM_NEWACTION on the two error paths; net->ct.labels_used is never
released.

The err: label is safe to reach from both sites: p->tmpl is still NULL
there (kzalloc'd, not yet assigned) and nf_ct_put(NULL) is a no-op, so
no inline release is needed.

Fixes: 70f06c115bcc ("sched: act_ct: switch to per-action label counting")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Testing: refcount/counter leak (CWE-772); no sanitizer for this class, so
the oracle is the nf_connlabels accounting counter net->ct.labels_used.

Reproduction (UML, before/after, same trigger): CONFIG_NET_ACT_CT=y,
NF_CONNTRACK_LABELS=y, NF_CONNTRACK_ZONES=n (forces the zone-disabled
path). A raw RTM_NEWACTION trigger adds "action ct label 0x1/0x1 zone 1"
20 times; each returns -EOPNOTSUPP.
  stock:   net->ct.labels_used climbs 1,2,...,20 (get, then bare return,
           no put) -- 20 leaked counts, never recovered.
  patched: counter stays balanced (get then goto err -> put); baseline.
Control: the same loop without "label" (no nf_connlabels_get) leaves the
counter unchanged on both trees -- the trigger reached the labels path and
the synthesis is not itself the cause.

Conditions: reachable via RTM_NEWACTION with CAP_NET_ADMIN over the netns,
i.e. an unprivileged user in a fresh user+net namespace on default-userns
distros. The easy path needs CONFIG_NF_CONNTRACK_ZONES=n; the
nf_ct_tmpl_alloc ENOMEM path leaks on any config under memory pressure.

Mitigations: restrict unprivileged user namespaces; otherwise none short of
the fix. Harness (trigger.c, init) available on request.

 net/sched/act_ct.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 6158e13c98d35..f5866a364a74a 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1295,7 +1295,8 @@ static int tcf_ct_fill_params(struct net *net,
 	if (tb[TCA_CT_ZONE]) {
 		if (!IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES)) {
 			NL_SET_ERR_MSG_MOD(extack, "Conntrack zones isn't enabled.");
-			return -EOPNOTSUPP;
+			err = -EOPNOTSUPP;
+			goto err;
 		}
 
 		tcf_ct_set_key_val(tb,
@@ -1308,7 +1309,8 @@ static int tcf_ct_fill_params(struct net *net,
 	tmpl = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
 	if (!tmpl) {
 		NL_SET_ERR_MSG_MOD(extack, "Failed to allocate conntrack template");
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto err;
 	}
 	p->tmpl = tmpl;
 	if (tb[TCA_CT_HELPER_NAME]) {
-- 
2.53.0
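
For readers following along, the leak shape described in the commit
message can be sketched in plain userspace C. This is an illustration
under stated assumptions, not the act_ct code: labels_used, labels_get(),
labels_put(), fill_params_leaky() and fill_params_fixed() are
hypothetical stand-ins for net->ct.labels_used, nf_connlabels_get(),
nf_connlabels_put() and the two versions of tcf_ct_fill_params(), and
only the zone-disabled error path is modeled.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the per-netns counter net->ct.labels_used. */
static int labels_used;

static void labels_get(void) { labels_used++; }
static void labels_put(void) { labels_used--; }

/* Leaky shape: the error site returns directly after the get. */
static int fill_params_leaky(bool want_labels, bool zone_requested,
			     bool zones_enabled)
{
	if (want_labels)
		labels_get();

	if (zone_requested && !zones_enabled)
		return -EOPNOTSUPP;	/* bare return: the put is skipped */

	/* Success: ownership passes to the params object (not modeled). */
	return 0;
}

/* Fixed shape: the same error site unwinds through the err label. */
static int fill_params_fixed(bool want_labels, bool zone_requested,
			     bool zones_enabled)
{
	bool put_labels = false;
	int err;

	if (want_labels) {
		labels_get();
		put_labels = true;
	}

	if (zone_requested && !zones_enabled) {
		err = -EOPNOTSUPP;
		goto err;
	}

	return 0;

err:
	if (put_labels)
		labels_put();
	return err;
}
```

Calling the leaky variant repeatedly with labels requested and zones
disabled grows the counter by one per call, while the fixed variant
leaves it balanced, which mirrors the before/after behaviour reported in
the testing notes.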



* [PATCH] mptcp: only honor zero-length DATA_FIN when a mapping is present
From: Michael Bommarito @ 2026-06-17 21:57 UTC (permalink / raw)
  To: Matthieu Baerts, Mat Martineau
  Cc: Geliang Tang, Paolo Abeni, Eric Dumazet, Jakub Kicinski, mptcp,
	netdev, linux-kernel

mptcp_get_options() initializes only the status group of struct
mptcp_options_received; data_seq, subflow_seq and data_len are set by
mptcp_parse_option() only inside the DSS mapping block, which runs when
the DSS M (mapping present) bit is set.

A peer can send a DSS option with DATA_FIN set but the mapping bit clear.
The parser then sets mp_opt.data_fin while leaving data_len and data_seq
uninitialized, and for a zero-length segment mptcp_incoming_options()
reads them; KMSAN reports an uninit-value in mptcp_incoming_options().

Impact: a remote peer that has completed the MPTCP handshake makes
mptcp_incoming_options() read uninitialized data_len and data_seq (KMSAN
uninit-value) by sending a DSS option with DATA_FIN set and the mapping
bit clear.

A DATA_FIN is always sent with a mapping (mptcp_write_data_fin()), so
gating this path on the mapping bit drops only the malformed no-map case
and leaves valid DATA_FIN handling unchanged.

Fixes: 43b54c6ee382 ("mptcp: Use full MPTCP-level disconnect state machine")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
The stale data_seq then reaches mptcp_update_rcv_data_fin(); no further
consequence is demonstrated, so this is a robustness fix. The read site
for a zero-length segment is:

	if (mp_opt.data_fin && mp_opt.data_len == 1 &&
	    mptcp_update_rcv_data_fin(msk, mp_opt.data_seq, mp_opt.dsn64))

Reproduced under KMSAN: a no-map DSS DATA_FIN crafted on the wire and
injected over a TUN device into a real MPTCP listener produces twelve
"uninit-value in mptcp_incoming_options" reports on stock and none on the
patched build (the unrelated mm-side KMSAN boot noise present on both
trees is not affected); a well-formed mapped DATA_FIN (control) drives the
same branch with no report. Harness on request.

 net/mptcp/options.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index dff3fd5d3b559..e40efa26a6694 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1227,7 +1227,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	 * present, needs to be updated here before the skb is freed.
 	 */
 	if (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) {
-		if (mp_opt.data_fin && mp_opt.data_len == 1 &&
+		if (mp_opt.use_map && mp_opt.data_fin && mp_opt.data_len == 1 &&
 		    mptcp_update_rcv_data_fin(msk, mp_opt.data_seq, mp_opt.dsn64))
 			mptcp_schedule_work((struct sock *)msk);
 
-- 
2.53.0
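
The guard can be illustrated with a small userspace sketch. It is a
simplified model under stated assumptions, not the MPTCP code: struct
opts, parse_dss() and zero_len_data_fin() are hypothetical names, the
flag macros are illustrative, and the parser is reduced to the one
property that matters here, namely that data_seq and data_len are written
only when the mapping bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative DSS flag bits for this sketch. */
#define DSS_FLAG_MAP		0x04
#define DSS_FLAG_DATA_FIN	0x10

/* Reduced model of the received-options state. */
struct opts {
	bool use_map;
	bool data_fin;
	uint16_t data_len;
	uint64_t data_seq;
};

/*
 * The flag bits are always written; the mapping fields are written only
 * when the mapping bit is present, so they keep whatever was there before.
 */
static void parse_dss(struct opts *o, uint8_t flags, uint64_t seq,
		      uint16_t len)
{
	o->data_fin = (flags & DSS_FLAG_DATA_FIN) != 0;
	o->use_map = (flags & DSS_FLAG_MAP) != 0;
	if (o->use_map) {
		o->data_seq = seq;
		o->data_len = len;
	}
}

/* Patched check: use_map short-circuits before the mapping fields are read. */
static bool zero_len_data_fin(const struct opts *o)
{
	return o->use_map && o->data_fin && o->data_len == 1;
}
```

Because && evaluates left to right and stops at the first false operand,
a DATA_FIN without a mapping never reaches the data_len comparison, even
if stale memory happens to hold the value 1.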



* [PATCH] idpf: bound interrupt-vector register fill to the allocated array
From: Michael Bommarito @ 2026-06-17 21:57 UTC (permalink / raw)
  To: Tony Nguyen, Przemek Kitszel, Joshua Hay, Pavan Kumar Linga,
	Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: intel-wired-lan, netdev, linux-kernel

idpf_get_reg_intr_vecs() fills the caller-allocated reg_vals[] array from
the VIRTCHNL2_OP_ALLOC_VECTORS reply in adapter->req_vec_chunks, bounding
its inner loop only by the per-chunk num_vectors. The array is sized
separately: idpf_intr_reg_init() allocates
kzalloc_objs(struct idpf_vec_regs, total_vecs) from
caps.num_allocated_vectors and only checks the returned count after the
fill. The sum of per-chunk num_vectors is never reconciled against
total_vecs, so a reply with a small num_allocated_vectors but chunks
summing higher writes past the end of reg_vals[].

Impact: a control plane (a PF or hypervisor device model) that returns a
VIRTCHNL2_OP_ALLOC_VECTORS reply whose per-chunk num_vectors sum exceeds
num_allocated_vectors writes struct idpf_vec_regs entries past the end of
the reg_vals kmalloc allocation (KASAN slab-out-of-bounds write).

Bound the fill loop to the array capacity passed in by the callers,
mirroring the sibling idpf_vport_get_q_reg(). The existing
num_regs < num_vecs check then rejects an undersized reply without the
out-of-bounds write happening first.

Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport")
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
The reply originates from the control plane (a PF or hypervisor device
model), which is trusted in a standard deployment, so this is a
defense-in-depth / robustness fix: it bounds a malformed or internally
inconsistent ALLOC_VECTORS reply. It is a genuine trust-boundary crossing
only where the guest distrusts the control plane (a confidential VM or an
Intel IPU posture) or the control plane is simply buggy. It is not
remotely or unprivileged-reachable.

Reproduced with a KUnit harness that calls the unmodified
idpf_get_reg_intr_vecs() against a crafted req_vec_chunks reply
(num_allocated_vectors = 1, four chunks of sixteen vectors) under KASAN:
stock reports a slab-out-of-bounds write 0 bytes past a 12-byte kmalloc-16
object and the test fails; the patched build is KASAN-clean; a well-formed
64-vector reply still fills 64 entries on both. The KUnit wiring is
repro-only scaffolding, not part of this patch; harness on request.

 drivers/net/ethernet/intel/idpf/idpf_dev.c      | 2 +-
 drivers/net/ethernet/intel/idpf/idpf_vf_dev.c   | 2 +-
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 5 +++--
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.h | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_dev.c
index 1a0c71c95ef12..4079a787657f1 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_dev.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_dev.c
@@ -87,7 +87,7 @@ static int idpf_intr_reg_init(struct idpf_vport *vport,
 	if (!reg_vals)
 		return -ENOMEM;
 
-	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals);
+	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals, total_vecs);
 	if (num_regs < num_vecs) {
 		err = -EINVAL;
 		goto free_reg_vals;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
index a07d7e808ca9b..6726084f6cfa0 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
@@ -86,7 +86,7 @@ static int idpf_vf_intr_reg_init(struct idpf_vport *vport,
 	if (!reg_vals)
 		return -ENOMEM;
 
-	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals);
+	num_regs = idpf_get_reg_intr_vecs(adapter, reg_vals, total_vecs);
 	if (num_regs < num_vecs) {
 		err = -EINVAL;
 		goto free_reg_vals;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index be66f9b2e101c..ec7330603ff84 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -1318,11 +1318,12 @@ idpf_vport_init_queue_reg_chunks(struct idpf_vport_config *vport_config,
  * idpf_get_reg_intr_vecs - Get vector queue register offset
  * @adapter: adapter structure to get the vector chunks
  * @reg_vals: Register offsets to store in
+ * @num_vecs: number of entries the @reg_vals array can hold
  *
  * Return: number of registers that got populated
  */
 int idpf_get_reg_intr_vecs(struct idpf_adapter *adapter,
-			   struct idpf_vec_regs *reg_vals)
+			   struct idpf_vec_regs *reg_vals, int num_vecs)
 {
 	struct virtchnl2_vector_chunks *chunks;
 	struct idpf_vec_regs reg_val;
@@ -1346,7 +1347,7 @@ int idpf_get_reg_intr_vecs(struct idpf_adapter *adapter,
 		dynctl_reg_spacing = le32_to_cpu(chunk->dynctl_reg_spacing);
 		itrn_reg_spacing = le32_to_cpu(chunk->itrn_reg_spacing);
 
-		for (i = 0; i < num_vec; i++) {
+		for (i = 0; i < num_vec && num_regs < num_vecs; i++) {
 			reg_vals[num_regs].dyn_ctl_reg = reg_val.dyn_ctl_reg;
 			reg_vals[num_regs].itrn_reg = reg_val.itrn_reg;
 			reg_vals[num_regs].itrn_index_spacing =
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
index 6876e3ed9d1be..9b1c9c86f6eac 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
@@ -104,7 +104,7 @@ int idpf_vc_core_init(struct idpf_adapter *adapter);
 void idpf_vc_core_deinit(struct idpf_adapter *adapter);
 
 int idpf_get_reg_intr_vecs(struct idpf_adapter *adapter,
-			   struct idpf_vec_regs *reg_vals);
+			   struct idpf_vec_regs *reg_vals, int num_vecs);
 int idpf_queue_reg_init(struct idpf_vport *vport,
 			struct idpf_q_vec_rsrc *rsrc,
 			struct idpf_queue_id_reg_info *chunks);
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH net] net: dst_metadata: fix false-positive memcpy overflow in tun_dst_unclone
From: Ilya Maximets @ 2026-06-17 22:01 UTC (permalink / raw)
  To: Gustavo A. R. Silva, netdev
  Cc: i.maximets, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kees Cook, Gustavo A. R. Silva,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
	linux-kernel, linux-hardening, llvm, Johan Thomsen
In-Reply-To: <9e941b82-23d4-44e6-a240-b7949ace76ab@embeddedor.com>

On 6/17/26 10:08 PM, Gustavo A. R. Silva wrote:
> Hi,
> 
> On 6/16/26 04:03, Ilya Maximets wrote:
>> kmalloc_flex() in metadata_dst_alloc() sets __counted_by for the
>> structure to the options_len, which is then initialized to zero.
>> Later, we're initializing the structure by copying the tunnel info
>> together with the options, and this triggers a warning for a potential
>> memcpy overflow, since the compiler estimates that the options can't
>> fit into the structure, even though the memory for them is actually
>> allocated.
>>
>>   memcpy: detected buffer overflow: 104 byte write of buffer size 96
>>   WARNING: CPU: X PID: Y at lib/string_helpers.c:1036 __fortify_report
>>    skb_tunnel_info_unclone+0x179/0x190
>>    geneve_xmit+0x7fe/0xe00
> 
> This warning has nothing to do with counted_by. See below for more
> comments.
> 
>>
>> The issue is triggered when built with clang and source fortification.
>>
>> Fix that by doing the copy in two stages: first - the main data with
>> the options_len, then the options.  This way the correct length should
>> be known at the time of the copy.
>>
>> It would be better if the options_len never changed after allocation,
>> but the allocation code is a little separate from the initialization
>> and it would be awkward and potentially dangerous to return a struct
>> with options_len set to a non-zero value from the metadata_dst_alloc().
>>
>> Another option would be to use ip_tunnel_info_opts_set(), but it is
>> doing too many unnecessary operations for the use case here.
>>
>> Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
>> Reported-by: Johan Thomsen <write@ownrisk.dk>
>> Closes: https://lore.kernel.org/netdev/CAKv6aAM8_EWgXScnKmKYm_4SwGDVBK++dzfP+Y6msUXbp99QUw@mail.gmail.com/
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>
>> Johan, if you can test this one in your setup as well, that would
>> be great.  Thanks.
>>
>>   include/net/dst_metadata.h | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
>> index 1fc2fb03ce3f..f45d1e3163f0 100644
>> --- a/include/net/dst_metadata.h
>> +++ b/include/net/dst_metadata.h
>> @@ -164,8 +164,11 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
>>   	if (!new_md)
>>   		return ERR_PTR(-ENOMEM);
>>   
>> -	memcpy(&new_md->u.tun_info, &md_dst->u.tun_info,
>> -	       sizeof(struct ip_tunnel_info) + md_size);
> 
> What's going on here is that, internally, fortified memcpy() retrieves
> the destination size via __builtin_dynamic_object_size() in mode 1.
> 
> That is:
> 
> __builtin_dynamic_object_size(&new_md->u.tun_info, 1)
> 
> For the above case, Clang returns sizeof(new_md->u.tun_info) == 96.
> 
> So the warning is reporting that 104 bytes don't fit in an object of
> size 96 bytes, regardless of any counted_by annotation or allocation.

Hmm.  Does __builtin_dynamic_object_size(&new_md->u.tun_info, 1) return
104 when the options_len is 8?  If so, isn't that because it is counted
by that field?  Asking because the fortification doesn't complain if we
keep the full 104-byte copy as-is, but set the options_len beforehand,
as tested by Johan.

> 
> Of course, in this case, the write of 104 bytes into new_md->u.tun_info
> is intentional and controlled, but what if it weren't?
> 
> On the other hand, for this same case, GCC currently returns SIZE_MAX,
> which translates to -1 inside fortified memcpy(). Thus, bounds-checking
> is bypassed, which is why this warning doesn't show up with GCC.
> 
> However, this is a bug in GCC. We're already looking into that.
> 
> I think we've had just a handful of cases like this across the whole
> kernel tree. We can deal with them as you did here (by directly copying
> the composite structure first, and then using memcpy() to copy into the
> flexible-array member). If these cases ever become more common, we
> could create some kind of helper to do both things at once. :)
> 
>> +	/* Copy in two stages to keep the __counted_by happy. */
> 
> So based on my comments above, this code comment is not correct.

I feel like some comment is still needed, do you have some suggestions
for what would be a better wording?

> 
>> +	new_md->u.tun_info = md_dst->u.tun_info;
> 
> This is fine.
> 
>> +	memcpy(ip_tunnel_info_opts(&new_md->u.tun_info),
>> +	       ip_tunnel_info_opts(&md_dst->u.tun_info), md_size);
> 
> Is ip_tunnel_info_opts() really needed here?
> 
> Probably this works just fine:
> 
> memcpy(new_md->u.tun_info.options, md_dst->u.tun_info.options, md_size);

The logic here is: we have the access function, therefore we should use it.
It gives a bad example if we don't.
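
To make the shape of the two-stage copy concrete outside the kernel, here
is a minimal userspace sketch. It does not reproduce the fortify behaviour
discussed above; it only shows the header-then-options copy going through
an accessor. struct tun_info_sketch, info_opts() and info_clone() are
hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a fixed header followed by a flexible array. */
struct tun_info_sketch {
	unsigned int key;
	unsigned int options_len;
	unsigned char options[];	/* flexible array member */
};

/* Accessor in the spirit of ip_tunnel_info_opts(). */
static unsigned char *info_opts(struct tun_info_sketch *info)
{
	return info->options;
}

/* Allocate a copy of src, copying the header and the options separately. */
static struct tun_info_sketch *info_clone(struct tun_info_sketch *src)
{
	struct tun_info_sketch *dst;
	size_t opts_len;

	assert(src != NULL);
	opts_len = src->options_len;
	dst = malloc(sizeof(*dst) + opts_len);
	if (!dst)
		return NULL;

	/* Stage 1: struct assignment copies only the fixed-size part. */
	*dst = *src;
	/* Stage 2: the destination of this memcpy is the array itself. */
	memcpy(info_opts(dst), info_opts(src), opts_len);
	return dst;
}
```

After stage 1 the copy already carries the right options_len, so the
length is known by the time the options are written.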

Best regards, Ilya Maximets.

^ permalink raw reply

* Re: [PATCH 6.6.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
From: Jakub Kicinski @ 2026-06-17 22:05 UTC (permalink / raw)
  To: Sasha Levin
  Cc: stable, David Howells, Michael Bommarito, Marc Dionne,
	Jeffrey Altman, Eric Dumazet, David S. Miller, Paolo Abeni,
	Simon Horman, linux-afs, netdev, stable
In-Reply-To: <ajMXGIoyTqpZCvw-@laps>

On Wed, 17 Jun 2026 17:52:24 -0400 Sasha Levin wrote:
> On Wed, Jun 17, 2026 at 01:27:04PM -0700, Jakub Kicinski wrote:
> >On Wed, 17 Jun 2026 14:04:10 -0400 Sasha Levin wrote:  
> >> Subject: [PATCH 6.6.y] rxrpc: Fix the ACK parser to extract the SACK table for parsing
> >> Date: Wed, 17 Jun 2026 14:04:10 -0400
> >> X-Mailer: git-send-email 2.53.0
> >>
> >> From: David Howells <dhowells@redhat.com>
> >>
> >> [ Upstream commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 ]  
> >
> >nit: you missed the "skip patchwork" header on this?  
> 
> Hey Jakub,
> 
> This one is a backport crafted in response to a failed backport of a stable
> tagged commit.
> 
> I followed Greg's template to sending those backports to him, but I also think
> that I do want folks to review the actual backport itself.

I see!
 
> Do you think it makes sense to add a skip patchwork header on these too?

Hm, maybe some clever patchwork DB query would tell us what others do.
For networking the patchwork queue is purely for patches we have to
apply. So my preference would be to skip, but I didn't realize this
was intentional. 

It'd still be useful to add _some_ header that we could filter on. 
We have bots which complain if people repost patches too fast,
they will get confused.

^ permalink raw reply

* Re: [PATCH] net: stmmac: loongson1: Use dev_err_probe()
From: Jakub Kicinski @ 2026-06-17 22:07 UTC (permalink / raw)
  To: Jacob Keller
  Cc: keguang.zhang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, linux-mips,
	netdev, linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <6b8db599-5bb2-47f9-ab53-a0b5141af2e5@intel.com>

On Wed, 17 Jun 2026 14:26:25 -0700 Jacob Keller wrote:
> It does claim that it has benefit since you get the error code emitted
> symbolically. But we have %pe for that. I wonder if dev_err_probe
> predates %pe?

I'd argue

  No of match data provided: -EINVAL

is more confusing than just:

  No of match data provided

the EINVAL is meaningless and hardcoded in this case?

^ permalink raw reply

* Re: [PATCH net-next] r8169: migrate Rx path to page_pool
From: Francois Romieu @ 2026-06-17 22:25 UTC (permalink / raw)
  To: Atharva Potdar
  Cc: hkallweit1, nic_swsd, andrew+netdev, davem, edumazet, kuba,
	pabeni, netdev
In-Reply-To: <CAF9AHva0TSFz5tedMEgJTkhThzDGqmW7MJshAtf3ULbLY4wd=w@mail.gmail.com>

Atharva Potdar <atharvapotdar07@gmail.com> :
[...]
> Francois:
> > You may consider fdd7b4c3302c93f6833e338903ea77245eb510b4 and some related
> > changes around that time.
> 
> I am sorry but I don't fully understand the context of this commit or
> the behaviour it addresses. Could you please help me regarding what I
> need to watch out for this change?

It should be clearer with c0cd884af045338476b8e69a61fceb3f34ff22f1 (and
maybe 6f0333b8fde44b8c04a53b2461504f0e8f1cebe6 as well).

Old chipsets did not correctly implement receive buffer size limit.

I have no idea how mildly recent ones (< 15 years) behave.

Reinstating the bug on old chipsets imho deserves some warning.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH v13 5/6] tls: add hardware offload key update support
From: Rishikesh Jethwani @ 2026-06-17 22:28 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: netdev, saeedm, tariqt, mbloch, borisp, john.fastabend, kuba,
	davem, pabeni, edumazet, leon
In-Reply-To: <ag8PQ2pxvKMHglWV@krikkit>

On Thu, May 21, 2026 at 6:57 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> 2026-05-12, 10:55:35 -0700, Rishikesh Jethwani wrote:
> > > Not blaming you for NIC behavior, but... the NIC passes up as
> > > "decrypted" records that have failed decryption (because it was using
> > > the wrong (old) key), or passes as "encrypted" the incorrectly
> > > decrypted data (that it has "decrypted" with the old key)?
> > >
> > > Or this is only the first record(s) after the KeyUpdate message, if
> > > they fall within the same packet, the whole packet was "decrypted"
> > > with the old key but only the KeyUpdate itself (and maybe some more
> > > records before it) decrypted correctly ; but subsequent packets get
> > > passed as !decrypted and don't need this reencrypt dance?
> > >
> > > (this is maybe more of a question for Tariq or the other @nvidia
> > > folks)
> > >
> > >
> > > I haven't reviewed the whole patch at this point, because of Paolo's
> > > suggestion and this confusion with the RX rekey.
> > >
> > The traces show how both NICs behave during the key transition:
> >
> >   Broadcom (NIC preserves decrypted flags):
> >   - decrypted=1: NIC fully decrypted these with the old key; one reencrypt
> > pass (retry=0) re-encrypts those frags back to ciphertext, then SW decrypts
> > with the new key.
> >   - encrypted=0, decrypted=0: boundary-straddling record; same single
> > reencrypt pass.
> >   - encrypted=1: NIC never touched these; SW decrypts directly with the new
> > key.
> >
> >   Mellanox (NIC clears decrypted flags on auth failure):
> >   - encrypted=0, decrypted=0: NIC partially processed the record but
> > cleared all decrypted flags on auth failure. retry=0 reencrypts with the
> > wrong frag interpretation and gets EBADMSG; retry=1 toggles the flags and
> > succeeds.
> >   - encrypted=1: NIC reported these as untouched; SW decrypts directly.
> >
> >   The retry path exists specifically for the Mellanox case: cleared flags
> > hide which frags the NIC actually touched, so the first pass may pick the
> > wrong interpretation and need to flip it.
>
> Thanks for describing this.
>
> I don't think the core TLS implementation should implement workarounds
> for HW/drivers doing strange things. IMO this should instead be
> stuffed into a driver CB, which would implement the necessary steps to
> perform during a rekey.
>
> The core can provide a helper, but I would move the whole
> "reencrypt+maybe retry" logic to a CB.

The driver callback approach sounds good.

^ permalink raw reply

* Re: [PATCH v14 6/9] tls: device: add TX KeyUpdate support
From: Rishikesh Jethwani @ 2026-06-17 22:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, saeedm, tariqt, mbloch, borisp, john.fastabend, sd, davem,
	pabeni, edumazet, leon
In-Reply-To: <20260525211608.2508449-1-kuba@kernel.org>

On Mon, May 25, 2026 at 2:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> tls: device: add TX KeyUpdate support
>
> This adds TX-side TLS 1.3 KeyUpdate support for HW-offloaded sockets by
> installing a temporary SW context with the new key, deferring the HW
> reinstall until all old-key records have been ACKed, and falling back to
> permanent SW mode if the HW reinstall fails.
>
> Thanks for the patch.  A few questions inline.
>
> > diff --git a/include/net/tls.h b/include/net/tls.h
> > index 2512a3799b21..c1085873ee01 100644
> > --- a/include/net/tls.h
> > +++ b/include/net/tls.h
>
> [ ... ]
>
> > diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> > index 7a98d2f6cbd3..c435b3450872 100644
> > --- a/net/tls/tls_device.c
> > +++ b/net/tls/tls_device.c
>
> [ ... ]
>
> > @@ -222,6 +246,19 @@ static void tls_tcp_clean_acked(struct sock *sk, u32 acked_seq)
> >       }
> >
> >       ctx->unacked_record_sn += deleted_records;
> > +
> > +     /* Once all old-key HW records are ACKed, set REKEY_READY to
> > +      * let sendmsg know it can finish the rekey and switch back
> > +      * to HW offload.
> > +      */
> > +     if (test_bit(TLS_TX_REKEY_PENDING, &tls_ctx->flags) &&
> > +         !test_bit(TLS_TX_REKEY_FAILED, &tls_ctx->flags)) {
> > +             u32 boundary_seq = READ_ONCE(tls_ctx->rekey.boundary_seq);
> > +
> > +             if (!before(acked_seq, boundary_seq))
> > +                     set_bit(TLS_TX_REKEY_READY, &tls_ctx->flags);
> > +     }
> > +
> >       spin_unlock_irqrestore(&ctx->lock, flags);
> >  }
>
> [Medium]
> Is the ordering between the boundary_seq publish and the PENDING bit
> publish in tls_device_start_rekey() strong enough for this read site?
>
> In tls_device_start_rekey() the sequence is:
>
>         WRITE_ONCE(ctx->rekey.boundary_seq, tcp_sk(sk)->write_seq);
>         ...
>         set_bit(TLS_TX_REKEY_PENDING, &ctx->flags);
>
> set_bit() does not document any implicit memory barrier
> (Documentation/atomic_bitops.txt).  On weakly-ordered architectures, can
> this reader observe PENDING=1 with a stale boundary_seq, set REKEY_READY
> prematurely, and let tls_device_complete_rekey() run before the old-key
> records are actually ACKed?
>
> Would an smp_wmb() (or smp_mb__before_atomic()) between the write of
> boundary_seq and the set_bit, or moving both writes under
> offload_ctx->lock that this reader already holds, be appropriate?

I'll move both writes under offload_ctx->lock.
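
Roughly, the idea looks like the userspace sketch below: the writer
publishes the boundary and the pending flag inside one critical section,
and the reader checks both under the same lock, so it cannot see the flag
without the matching boundary. The struct and function names are
hypothetical, and a pthread mutex stands in for the spinlock. seq_before()
is the usual wraparound-safe comparison that before() implements.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 is earlier than seq2" for 32-bit sequence numbers. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical miniature of the rekey state; names are illustrative. */
struct rekey_state {
	pthread_mutex_t lock;
	bool pending;
	bool ready;
	uint32_t boundary_seq;
};

/* Writer: publish the boundary and the pending flag as one unit. */
static void rekey_start(struct rekey_state *st, uint32_t write_seq)
{
	assert(st != NULL);
	pthread_mutex_lock(&st->lock);
	st->boundary_seq = write_seq;
	st->pending = true;
	pthread_mutex_unlock(&st->lock);
}

/* Reader: runs on each ACK; marks ready once the boundary is covered. */
static void rekey_on_ack(struct rekey_state *st, uint32_t acked_seq)
{
	assert(st != NULL);
	pthread_mutex_lock(&st->lock);
	if (st->pending && !seq_before(acked_seq, st->boundary_seq))
		st->ready = true;
	pthread_mutex_unlock(&st->lock);
}
```

Because both fields change inside the same critical section, no separate
barrier is needed between the two stores in this model.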

> > @@ -253,6 +290,14 @@ void tls_device_free_resources_tx(struct sock *sk)
> >       struct tls_context *tls_ctx = tls_get_ctx(sk);
> >
> >       tls_free_partial_record(sk, tls_ctx);
> > +
> > +     if (unlikely(tls_ctx->rekey.sw_ctx))
> > +             tls_sw_release_resources_tx(sk);
> > +
> > +     if (test_bit(TLS_TX_REKEY_PENDING, &tls_ctx->flags)) {
> > +             TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXREKEYOK);
> > +             TLS_DEC_STATS(sock_net(sk), LINUX_MIB_TLSTXREKEYINPROGRESS);
> > +     }
> >  }
>
> [Medium]
> Should a rekey that was still PENDING at socket teardown count as
> TlsTxRekeyOk?  The rekey never completed, but the counter is bumped as
> if it had succeeded.  Should this be a separate "aborted" counter, or
> just omit the OK increment in this path?

I'll add a separate "aborted" counter.

> > @@ -624,6 +672,19 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> >                       goto out;
> >       }
> >
> > +     /* Old-key records all ACKed; switch back to HW. */
> > +     if (test_bit(TLS_TX_REKEY_READY, &tls_ctx->flags))
> > +             tls_device_complete_rekey(sk, tls_ctx, true);
>
> [Medium]
> Should the return value of tls_device_complete_rekey() be checked here?
>
> tls_device_complete_rekey() can return non-zero from its early
> tls_sw_drain_tx() path:
>
>         rc = tls_sw_drain_tx(sk, ctx);
>         if (rc)
>                 return rc;
>
> In that early-return case none of REKEY_PENDING, REKEY_READY, or
> REKEY_FAILED is updated.  The caller falls through to
> tls_sw_sendmsg_locked(), and on the next sendmsg the same READY bit is
> still set, so complete_rekey is retried, drain returns -EAGAIN again,
> and so on.  Is there a way out of this state?  Should persistent drain
> failure transition to FAILED, or at least bump a counter so it is
> observable?

Looking at the actual failure modes, I don't think a FAILED transition
on persistent drain failure is the right fix here. Let me lay out why,
and what I'm proposing instead.

tls_sw_drain_tx() returns non-zero in three cases:
1. tls_tx_records() returns < 0
2. tls_is_partially_sent_record() still true after the push attempt
3. tls_is_pending_open_record() still true after the push attempt

The key thing is that tls_tx_records() (tls_sw.c:460-462) already
collapses every non-EAGAIN failure into tls_err_abort(sk, rc), which
sets sk->sk_err. From that point the socket is poisoned: the next
sendmsg returns the saved sk_err to userspace via the sk->sk_err check
at the top of tls_push_data() (and the SW equivalent), and the loop
exits naturally because userspace stops sending.

That means the failure modes that actually retry on the next sendmsg
are exclusively -EAGAIN-class transients:

- TCP send buffer full → clears as ACKs free space; tls_write_space()
flushes the partial.
- Open record not fully pushed → flushes on the next sendmsg once
buffer space is available.

Transitioning to REKEY_FAILED on -EAGAIN would permanently downgrade a
connection that's just waiting one RTT for TCP buffer space, which I
don't think is what we want: once FAILED, we never retry the HW
transition.

For observability without committing to a new SNMP counter, I'll add a
tracepoint at the sendmsg call site so a stuck loop is visible via
tracefs.
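
As a rough model of the behaviour I'm describing (a sketch only; the enum,
struct conn_sketch and rekey_try_complete() are hypothetical names, not
code from the series): -EAGAIN leaves the READY state in place for the
next sendmsg, any other error poisons the socket, and a poisoned socket
never retries.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum rekey_step { REKEY_DONE, REKEY_RETRY_LATER, REKEY_ABORTED };

/* Hypothetical miniature of the per-connection state. */
struct conn_sketch {
	int sk_err;		/* non-zero once the socket is poisoned */
	bool rekey_ready;	/* models the REKEY_READY bit */
};

/* drain_rc models the drain result: 0 or a negative errno. */
static enum rekey_step rekey_try_complete(struct conn_sketch *c, int drain_rc)
{
	assert(drain_rc <= 0);

	if (c->sk_err)
		return REKEY_ABORTED;	/* poisoned: sendmsg fails, no retry */

	if (drain_rc == 0) {
		c->rekey_ready = false;	/* drained: rekey can complete */
		return REKEY_DONE;
	}

	if (drain_rc == -EAGAIN)
		return REKEY_RETRY_LATER;	/* transient: READY stays set */

	c->sk_err = -drain_rc;	/* any other error aborts the socket */
	return REKEY_ABORTED;
}
```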

> > +
> > +     /* Use SW path if rekey is in progress (PENDING) or if HW rekey
> > +      * failed (FAILED).
> > +      */
> > +     if (test_bit(TLS_TX_REKEY_PENDING, &tls_ctx->flags) ||
> > +         test_bit(TLS_TX_REKEY_FAILED, &tls_ctx->flags)) {
> > +             rc = tls_sw_sendmsg_locked(sk, msg, size);
> > +             goto out;
> > +     }
> > +
> >       rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
> >                          record_type);
>
> [High]
> What happens to a HW open_record across this transition?
>
> If userspace previously sent with MSG_MORE, offload_ctx->open_record
> holds buffered plaintext and tls_ctx->pending_open_record_frags is true.
> A subsequent setsockopt KeyUpdate then routes future sendmsg calls into
> tls_sw_sendmsg_locked(), which builds into sw_ctx->open_rec (initially
> NULL) and never touches the HW open_record.  At socket close,
> tls_device_sk_destruct() does:
>
>         if (ctx->open_record)
>                 destroy_record(ctx->open_record);
>
> so the buffered plaintext is freed without ever being transmitted.
>
> Also, since pending_open_record_frags stays true, would
> tls_sw_drain_tx() see tls_is_pending_open_record(ctx) as true and return
> -EAGAIN, blocking tls_device_complete_rekey() until something on the SW
> side coincidentally clears it?
>
> The tcp_write_collapse_fence() in tls_device_start_rekey() only handles
> records already in the TCP write queue; it does not appear to flush this
> HW open_record.

Right, flushing the HW open_record is required.

> > @@ -1103,6 +1164,260 @@ static struct tls_offload_context_tx *alloc_offload_ctx_tx(struct tls_context *c
> [ ... ]
> > +static int tls_device_start_rekey(struct sock *sk,
> > +                               struct tls_context *ctx,
> > +                               struct tls_offload_context_tx *offload_ctx,
> > +                               struct tls_crypto_info *new_crypto_info)
> > +{
> > +     bool rekey_pending = test_bit(TLS_TX_REKEY_PENDING, &ctx->flags);
> > +     bool rekey_failed = test_bit(TLS_TX_REKEY_FAILED, &ctx->flags);
> [ ... ]
> > +     if (rekey_pending || rekey_failed) {
> > +             rc = crypto_aead_setkey(offload_ctx->rekey.sw.aead_send,
> > +                                     key, cipher_desc->key);
> > +             if (rc)
> > +                     return rc;
>
> [High]
> Can there be in-flight async crypto requests on this aead_send when
> crypto_aead_setkey() is called here?
>
> While a rekey is PENDING, sendmsg has been encrypting via
> offload_ctx->rekey.sw.aead_send through tls_sw_sendmsg_locked().  Async
> backends can return -EINPROGRESS and complete later.  Other call sites
> that mutate AEAD state (tls_sw_release_resources_tx, tls_sw_drain_tx,
> tls_set_sw_offload) call tls_encrypt_async_wait() first.  Should the
> same wait happen here before re-keying the same tfm, so in-flight
> encryptions cannot pick up K2 instead of the K1 they were submitted
> with?
>
> [ ... ]
> > +     } else {
> > +             rc = tls_device_init_rekey_sw(sk, ctx, offload_ctx,
> > +                                           new_crypto_info);
> [ ... ]
> > +             WRITE_ONCE(ctx->rekey.boundary_seq, tcp_sk(sk)->write_seq);
> > +
> > +             /* Prevent a partial record straddling the SW/HW boundary. */
> > +             tcp_write_collapse_fence(sk);
> > +
> > +             ctx->rekey.sw_ctx = &offload_ctx->rekey.sw;
> > +             ctx->rekey.cipher_ctx = &offload_ctx->rekey.tx;
> > +
> > +             set_bit(TLS_TX_REKEY_PENDING, &ctx->flags);
> [ ... ]
> > +     unsafe_memcpy(&offload_ctx->rekey.crypto_send.info, new_crypto_info,
> > +                   cipher_desc->crypto_info,
> > +                   /* checked in do_tls_setsockopt_conf */);
> > +     memzero_explicit(new_crypto_info, cipher_desc->crypto_info);
> > +
> > +     return 0;
> > +}

Yes, I'll add a wait for in-flight async encryptions submitted to this
tfm with the previous key before changing it.

> [Medium]
> new_crypto_info is zeroized after the copy, but is the staging area
> itself ever scrubbed?
>
> offload_ctx->rekey.crypto_send.info, offload_ctx->rekey.tx.iv, and
> offload_ctx->rekey.tx.rec_seq receive the same key bytes via
> unsafe_memcpy/memcpy.  tls_device_complete_rekey() does not scrub them
> on success, and tls_device_free_resources_tx() / tls_device_tx_del_task()
> kfree the offload_ctx without a memzero_explicit().  Should those sites
> zero the staging area to limit the window in which key bytes are
> recoverable from slab memory after rotation?

Yes, I'll scrub the rekey staging area.
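
In userspace terms the scrub would look something like the sketch below.
scrub() is a simple stand-in for memzero_explicit(), writing through a
volatile pointer so the stores are not optimized away. The struct layout,
field sizes and staging_install() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero a buffer through a volatile pointer so the stores are not elided. */
static void scrub(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len--)
		*p++ = 0;
}

/* Hypothetical staging area; sizes are illustrative only. */
struct rekey_staging_sketch {
	unsigned char key[32];
	unsigned char iv[12];
	unsigned char rec_seq[8];
};

/* Copy the staged key into the live buffer, then wipe the staging area. */
static void staging_install(struct rekey_staging_sketch *staging,
			    unsigned char *live_key, size_t live_len)
{
	assert(live_len >= sizeof(staging->key));
	memcpy(live_key, staging->key, sizeof(staging->key));
	scrub(staging, sizeof(*staging));
}
```

The same scrub() call would go in front of every free of the staging
structure, not only on the success path.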

> > +static int tls_device_complete_rekey(struct sock *sk, struct tls_context *ctx,
> > +                                  bool deferred)
> > +{
> [ ... ]
> > +     if (!test_bit(TLS_TX_DEV_CLOSED, &ctx->flags)) {
> > +             netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
> > +                                             TLS_OFFLOAD_CTX_DIR_TX);
> > +             set_bit(TLS_TX_DEV_CLOSED, &ctx->flags);
> > +     }
> > +
> > +     memcpy(crypto_info_rec_seq(&offload_ctx->rekey.crypto_send.info, cipher_desc),
> > +            offload_ctx->rekey.tx.rec_seq, cipher_desc->rec_seq);
> > +
> > +     rc = tls_device_dev_add_tx(sk, netdev, &offload_ctx->rekey.crypto_send.info,
> > +                                tcp_sk(sk)->write_seq);
> > +
> > +release_lock:
> > +     up_read(&device_offload_lock);
> > +
> > +     spin_lock_irqsave(&offload_ctx->lock, flags);
> > +     memcpy(&rcd_sn, offload_ctx->rekey.tx.rec_seq, sizeof(rcd_sn));
> > +     offload_ctx->unacked_record_sn = be64_to_cpu(rcd_sn) - 1;
> > +     spin_unlock_irqrestore(&offload_ctx->lock, flags);
> > +
> > +     memcpy(ctx->tx.iv, offload_ctx->rekey.tx.iv,
> > +            cipher_desc->salt + cipher_desc->iv);
> > +     memcpy(ctx->tx.rec_seq, offload_ctx->rekey.tx.rec_seq,
> > +            cipher_desc->rec_seq);
> > +     unsafe_memcpy(&ctx->crypto_send.info,
> > +                   &offload_ctx->rekey.crypto_send.info,
> > +                   cipher_desc->crypto_info,
> > +                   /* checked during rekey setup */);
> > +
> > +     if (rc)
> > +             goto rekey_fail;
> > +
> > +     clear_bit(TLS_TX_DEV_CLOSED, &ctx->flags);
> > +
> > +     key = crypto_info_key(&offload_ctx->rekey.crypto_send.info, cipher_desc);
> > +     rc = crypto_aead_setkey(offload_ctx->aead_send, key, cipher_desc->key);
> > +     if (rc)
> > +             goto rekey_fail;
>
> [High]
> Can these unconditional copies of the new key material into ctx->tx.iv,
> ctx->tx.rec_seq, and ctx->crypto_send.info before the rc check leave
> inconsistent state on failure?
>
> If tls_device_dev_add_tx() failed, control reaches rekey_fail with:
>
>   - HW context removed (TLS_TX_DEV_CLOSED set)
>   - ctx->crypto_send.info advertising the NEW key
>   - ctx->tx.iv and ctx->tx.rec_seq holding the NEW salt/IV/rec_seq
>   - offload_ctx->aead_send still holding the OLD key (the setkey for it
>     is reached only after the rc check)
>
> Userspace getsockopt(SOL_TLS, TLS_TX) would then read NEW key bytes that
> no kernel AEAD ever absorbed.  Any subsequent fallback that builds IVs
> from ctx->tx and encrypts via offload_ctx->aead_send would mix NEW salt
> with the OLD key.
>
> Could the memcpy block move below the rc check?
>

I'll move the memcpys below the rc check.
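
The intended ordering, as a small sketch: try the install first and only
overwrite the live state once it has succeeded, so a failure leaves the
old IV and record sequence in place. The types, rekey_commit() and the
two install stubs are hypothetical and only model the control flow.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of the live TX state. */
struct tx_state_sketch {
	unsigned char iv[12];
	unsigned char rec_seq[8];
};

typedef int (*hw_install_fn)(const struct tx_state_sketch *new_state);

/* Illustrative stubs standing in for a HW install that works or fails. */
static int install_ok(const struct tx_state_sketch *new_state)
{
	(void)new_state;
	return 0;
}

static int install_fail(const struct tx_state_sketch *new_state)
{
	(void)new_state;
	return -1;
}

/* Commit the staged state only after the install has succeeded. */
static int rekey_commit(struct tx_state_sketch *live,
			const struct tx_state_sketch *staged,
			hw_install_fn install)
{
	int rc;

	assert(live != NULL && staged != NULL);
	rc = install(staged);
	if (rc)
		return rc;	/* live state is left untouched */

	memcpy(live, staged, sizeof(*live));
	return 0;
}
```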

> [High]
> On the success path, is there a window where tls_device_down() sees
> stale TLS_TX_DEV_CLOSED?
>
> The success path here:
>
>         if (!test_bit(TLS_TX_DEV_CLOSED, &ctx->flags)) {
>                 netdev->tlsdev_ops->tls_dev_del(...);
>                 set_bit(TLS_TX_DEV_CLOSED, &ctx->flags);
>         }
>         ...
>         rc = tls_device_dev_add_tx(...);   /* installs NEW HW ctx */
>
> release_lock:
>         up_read(&device_offload_lock);
>         ...
>         if (rc) goto rekey_fail;
>         clear_bit(TLS_TX_DEV_CLOSED, &ctx->flags);
>
> Between dropping device_offload_lock and clear_bit(), can
> tls_device_down() take down_write(&device_offload_lock) and run:
>
>         if (ctx->tx_conf == TLS_HW &&
>             !test_bit(TLS_TX_DEV_CLOSED, &ctx->flags)) {
>                 netdev->tlsdev_ops->tls_dev_del(...);
>                 set_bit(TLS_TX_DEV_CLOSED, &ctx->flags);
>         }
>
> It sees TLS_TX_DEV_CLOSED still set, skips tls_dev_del() for the
> freshly-installed NEW HW context, and proceeds to clear ctx->netdev.
> Would the new NIC-side TLS state then be leaked until the netdev is
> unregistered?
>
> Should TLS_TX_DEV_CLOSED be cleared while still holding
> device_offload_lock?

I will fix this TLS_TX_DEV_CLOSED issue.

>
> > +
> > +     /* Start marker: the NIC passes through everything before
> > +      * write_seq unencrypted (already SW-encrypted during rekey),
> > +      * same as during initial offload setup.
> > +      */
> > +     tls_device_commit_start_marker(sk, offload_ctx, start_marker_record);
> [ ... ]
> > +rekey_fail:
> > +     kfree(start_marker_record);
> > +     set_bit(TLS_TX_REKEY_FAILED, &ctx->flags);
> > +     clear_bit(TLS_TX_REKEY_READY, &ctx->flags);
> > +     clear_bit(TLS_TX_REKEY_PENDING, &ctx->flags);
> > +     if (deferred)
> > +             TLS_DEC_STATS(sock_net(sk), LINUX_MIB_TLSTXREKEYINPROGRESS);
> > +     TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXREKEYFALLBACK);
> > +
> > +     return 0;
> > +}
>
> [Medium]
> After REKEY_FAILED, the connection is structurally encrypting in SW even
> though ctx->tx_conf stays TLS_HW.  TLSCURRTXDEVICE is never decremented
> and TLSCURRTXSW is never incremented in this path.  Should the
> current-mode gauges in /proc/net/tls_stat reflect the actual encryption
> mode for these connections?

I'll make the suggested change.

> > +static int tls_set_device_offload_rekey(struct sock *sk,
> > +                                     struct tls_context *ctx,
> > +                                     struct net_device *netdev,
> > +                                     struct tls_crypto_info *new_crypto_info)
> > +{
> > +     struct tls_offload_context_tx *offload_ctx = tls_offload_ctx_tx(ctx);
> > +     bool rekey_pending = test_bit(TLS_TX_REKEY_PENDING, &ctx->flags);
> > +     bool rekey_failed = test_bit(TLS_TX_REKEY_FAILED, &ctx->flags);
> > +     bool defer = true;
> > +     int rc;
> > +
> > +     if (!rekey_pending && !rekey_failed)
> > +             defer = tls_has_unacked_records(offload_ctx);
> > +
> > +     down_read(&device_offload_lock);
> > +
> > +     rc = tls_device_start_rekey(sk, ctx, offload_ctx, new_crypto_info);
> > +     if (rc) {
> > +             up_read(&device_offload_lock);
> > +             return rc;
> > +     }
> > +
> > +     up_read(&device_offload_lock);
> > +
> > +     if (defer) {
> > +             if (!rekey_pending)
> > +                     TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXREKEYINPROGRESS);
> > +             else
> > +                     TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXREKEYOK);
> > +             return 0;
> > +     }
> > +
> > +     return tls_device_complete_rekey(sk, ctx, false);
> > +}
>
> [Medium]
> On the rekey_pending path here, TLSTXREKEYOK is incremented before any
> HW reinstall actually happens.  Combined with the increment in
> tls_device_complete_rekey() on success and the increment in
> tls_device_free_resources_tx() when PENDING is set at close, can a
> single logical successful rekey produce more than one TLSTXREKEYOK
> increment?

The accounting is per-setsockopt, not per-HW-install. This is the same
convention as the SW path (tls_main.c:743 also bumps TlsTxRekeyOk once
per successful setsockopt). When two setsockopt calls overlap a single
HW transition, each call gets one final outcome. The early OK++ in the
rekey_pending branch credits the previous in-flight setsockopt, which is
implicitly "completed" from userspace's view since its key is now being
replaced. The trailing OK++ in complete_rekey credits the current one.
No setsockopt produces zero or two final-outcome bumps; each produces
exactly one.
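
As a rough illustration of the per-setsockopt accounting described
above, here is a toy userspace model (not kernel code). The struct and
the model_* helpers are hypothetical names, has_unacked stands in for
the tls_has_unacked_records() decision, and only the success path is
modelled (no REKEY_FAILED / fallback handling).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for TlsTxRekeyOk, TlsTxRekeyInProgress and the PENDING bit. */
struct tx_rekey_model {
	unsigned long rekey_ok;		/* final-outcome counter */
	long rekey_in_progress;		/* gauge */
	bool pending;			/* models TLS_TX_REKEY_PENDING */
};

/* Models one rekey setsockopt on the success path. */
static void model_setsockopt_rekey(struct tx_rekey_model *m, bool has_unacked)
{
	/* A rekey that arrives while one is pending always defers. */
	bool defer = m->pending ? true : has_unacked;

	if (!defer) {
		m->rekey_ok++;		/* immediate completion credits this call */
		return;
	}
	if (!m->pending) {
		m->pending = true;
		m->rekey_in_progress++;	/* first deferring call opens the window */
	} else {
		m->rekey_ok++;		/* credits the call whose key is being replaced */
	}
}

/* Models the deferred completion once pre-boundary data has been ACKed. */
static void model_deferred_complete(struct tx_rekey_model *m)
{
	assert(m->pending);
	m->pending = false;
	m->rekey_in_progress--;
	m->rekey_ok++;			/* credits the most recent setsockopt */
}
```

With two overlapping calls followed by one deferred completion, the
model ends with two OK bumps and the gauge back at zero, i.e. one final
outcome per setsockopt.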

> [Medium]
> tls_has_unacked_records() inspects offload_ctx->records_list only.  Is
> that sufficient to decide whether to defer?  pending_open_record_frags
> or offload_ctx->open_record can still hold pre-rekey state, in which
> case the !defer path falls into tls_device_complete_rekey(deferred=false),
> tls_sw_drain_tx() returns -EAGAIN, setsockopt returns the error to
> userspace via err_crypto_info, and none of the state transitions from
> tls_device_start_rekey() (PENDING bit, sw_ctx, cipher_ctx, boundary_seq,
> sk_validate_xmit_skb) are rolled back.  Should the failure path either
> roll those back or transition to a defined recovery state?

I'll also check tls_is_pending_open_record() when deciding whether to defer.

> [ ... ]
>
> > diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
> > index 1110f7ac6bcb..64ac4ef4012b 100644
> > --- a/net/tls/tls_device_fallback.c
> > +++ b/net/tls/tls_device_fallback.c
> > @@ -435,6 +435,30 @@ struct sk_buff *tls_validate_xmit_skb_sw(struct sock *sk,
> >       return tls_sw_fallback(sk, skb);
> >  }
> >
> > +struct sk_buff *tls_validate_xmit_skb_rekey(struct sock *sk,
> > +                                         struct net_device *dev,
> > +                                         struct sk_buff *skb)
> > +{
> > +     struct tls_context *tls_ctx = tls_get_ctx(sk);
> > +     u32 tcp_seq = ntohl(tcp_hdr(skb)->seq);
> > +     u32 boundary_seq;
> > +
> > +     if (test_bit(TLS_TX_REKEY_FAILED, &tls_ctx->flags))
> > +             return skb;
> > +
> > +     /* If this packet is at or after the rekey boundary, it's already
> > +      * SW-encrypted with the new key, pass through unchanged
> > +      */
> > +     boundary_seq = READ_ONCE(tls_ctx->rekey.boundary_seq);
> > +     if (!before(tcp_seq, boundary_seq))
> > +             return skb;
> > +
> > +     /* Packet before boundary means retransmit of old data,
> > +      * use SW fallback with the old key
> > +      */
> > +     return tls_sw_fallback(sk, skb);
> > +}
>
> [Critical]
> Can this leak plaintext on the wire after REKEY_FAILED?
>
> The first branch returns the skb unchanged whenever
> TLS_TX_REKEY_FAILED is set, regardless of where tcp_seq is relative to
> boundary_seq.
>
> After tls_device_complete_rekey() has called tls_dev_del() and then
> landed in rekey_fail, there is no NIC TLS context for the socket.
> SKBs from before the boundary are still sitting in the socket write
> queue with plaintext payload, because HW-offload semantics expected the
> NIC to encrypt them at TX time.  When TCP retransmits one of those old
> skbs:
>
>         tcp_seq < boundary_seq, but TLS_TX_REKEY_FAILED is set
>         -> early return skb unchanged
>         -> NIC has no TLS context for this flow
>         -> driver transmits the skb as-is
>
> Should the FAILED branch still route pre-boundary packets through
> tls_sw_fallback() with the old key, the same way the non-failed branch
> does, so that pre-boundary retransmits are encrypted?
>
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index fd04857fa0ab..2548ad2b2219 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> > @@ -371,6 +371,8 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
> >
> >       if (ctx->tx_conf == TLS_SW)
> >               tls_sw_cancel_work_tx(ctx);
> > +     else if (ctx->tx_conf == TLS_HW && ctx->rekey.sw_ctx)
> > +             tls_sw_cancel_work_tx(ctx);
> >
> >       lock_sock(sk);

boundary_seq captures tcp_sk(sk)->write_seq at start_rekey.
clean_acked_data only sets REKEY_READY when acked_seq >= boundary_seq,
and complete_rekey is only invoked when READY is set, i.e.
complete_rekey only runs once all pre-boundary data has been ACKed.
So when complete_rekey fails and sets REKEY_FAILED, the pre-boundary
plaintext SKBs have already been ACKed and removed from
sk_write_queue. The early return in tls_validate_xmit_skb_rekey is
safe because no pre-boundary data remains in the write queue for
retransmission.
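
For readers following the sequence-number argument, here is a minimal
userspace sketch (not the patch itself) of the wraparound-safe
comparison and the branch order of the quoted
tls_validate_xmit_skb_rekey(). seq_before() follows the usual signed
32-bit difference idiom used for TCP sequence numbers; the enum and the
model_* names are hypothetical, and model_rekey_ready() assumes the
READY condition is exactly "acked_seq is not before boundary_seq".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 is earlier than seq2" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

enum xmit_path {
	XMIT_PASS_THROUGH,	/* return the skb unchanged */
	XMIT_SW_FALLBACK,	/* re-encrypt in SW with the old key */
};

/* Mirrors the branch order of the quoted validate function. */
static enum xmit_path model_rekey_xmit_path(bool rekey_failed,
					    uint32_t tcp_seq,
					    uint32_t boundary_seq)
{
	if (rekey_failed)
		return XMIT_PASS_THROUGH;
	if (!seq_before(tcp_seq, boundary_seq))
		return XMIT_PASS_THROUGH;
	return XMIT_SW_FALLBACK;
}

/* Assumed READY condition: everything before the boundary has been ACKed. */
static bool model_rekey_ready(uint32_t acked_seq, uint32_t boundary_seq)
{
	return !seq_before(acked_seq, boundary_seq);
}

/* Example with a boundary just past the 32-bit wrap point. */
static void model_wraparound_example(void)
{
	uint32_t boundary = 0x00000010u;

	/* A pre-wrap sequence number is still "before" the boundary. */
	assert(model_rekey_xmit_path(false, 0xfffffff0u, boundary) == XMIT_SW_FALLBACK);
	/* A packet exactly at the boundary passes through. */
	assert(model_rekey_xmit_path(false, 0x00000010u, boundary) == XMIT_PASS_THROUGH);
}
```

In this model the FAILED pass-through is only harmless if
model_rekey_ready() held before completion was attempted, which is the
invariant the explanation above depends on.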

> [High]
> Is reading ctx->rekey.sw_ctx before lock_sock(sk) safe?
>
> ctx->rekey.sw_ctx is published by tls_device_start_rekey() under
> lock_sock.  Here close reads it before taking lock_sock, so a concurrent
> setsockopt rekey can publish sw_ctx after this read but before close
> acquires the socket lock.  Close then proceeds without canceling the
> delayed tx_work, while later cleanup
> (tls_device_free_resources_tx -> tls_sw_release_resources_tx) frees
> aead_send.
>
> If a tx_work_handler scheduled by the rekey path runs after that, can it
> touch the freed delayed_work or aead_send?
>
> Should the read be moved under lock_sock, or should
> tls_sw_release_resources_tx() internally cancel the delayed work?
>

Why setsockopt and close can't be concurrent here:

tls_sk_proto_close is proto->close. It is reached only via
inet_release, called from __sock_release, which runs from __fput when
the file's last reference drops. Any syscall that could mutate
ctx->rekey.sw_ctx (setsockopt, sendmsg) holds a file reference for its
entire duration. So while one thread is inside do_tls_setsockopt_conf
publishing sw_ctx, the file refcount cannot reach 0, and
tls_sk_proto_close cannot start. By the time tls_sk_proto_close runs,
no setsockopt/sendmsg is in flight on this socket.

The existing line shows the pattern is intended:

The pre-existing check at tls_main.c:372 (if (ctx->tx_conf ==
TLS_SW) tls_sw_cancel_work_tx(ctx)) already reads ctx->tx_conf before
lock_sock. tx_conf is likewise only written under setsockopt. The new
ctx->rekey.sw_ctx read at line 374 relies on the same invariant, so
if you accept line 372 as safe, line 374 is too.

^ permalink raw reply

* Re: [PATCH bpf v2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: Bobby Eshleman @ 2026-06-17 22:36 UTC (permalink / raw)
  To: Sechang Lim
  Cc: John Fastabend, Jakub Sitnicki, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, bpf,
	linux-kernel
In-Reply-To: <20260612123553.2724240-1-rhkrqnwk98@gmail.com>

On Fri, Jun 12, 2026 at 12:35:51PM +0000, Sechang Lim wrote:
> sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
> to find the length of the next message. strparser assembles a message out
> of several received skbs by chaining them onto the head's frag_list and
> recording where to append the next one in strp->skb_nextp:
> 
> 	*strp->skb_nextp = skb;
> 	strp->skb_nextp = &skb->next;
> 
> and then calls the parser on the head:
> 
> 	len = (*strp->cb.parse_msg)(strp, head);
> 
> The parser is only meant to inspect the skb, but the program may call
> bpf_skb_change_tail() -- or the sibling bpf_skb_pull_data(),
> bpf_skb_change_head(), bpf_skb_adjust_room(), all allowed for SK_SKB.
> Once the head carries a frag_list these go
> 
> 	... -> skb_ensure_writable -> pskb_may_pull -> __pskb_pull_tail
> 
> and __pskb_pull_tail() frees the frag_list skbs that strparser still
> tracks through skb_nextp:
> 
> 	while ((list = skb_shinfo(skb)->frag_list) != insp) {
> 		skb_shinfo(skb)->frag_list = list->next;
> 		consume_skb(list);
> 	}
> 
> strp->skb_nextp now points into a freed sk_buff. The next segment of
> the same message arrives in __strp_recv(), which links it with
> *strp->skb_nextp = skb, an 8-byte write into the freed skb. The free
> and the write happen in different __strp_recv() calls, so the message
> has to span at least three segments before it triggers.
> 
>   BUG: KASAN: slab-use-after-free in __strp_recv+0x447/0xda0
>   Write of size 8 at addr ffff88810db86140 by task repro/349
> 
>   Call Trace:
>    <IRQ>
>    __strp_recv+0x447/0xda0
>    __tcp_read_sock+0x13d/0x590
>    tcp_bpf_strp_read_sock+0x195/0x320
>    strp_data_ready+0x267/0x340
>    sk_psock_strp_data_ready+0x1ce/0x350
>    tcp_data_queue+0x1364/0x2fd0
>    tcp_rcv_established+0xe07/0x1640
>    [...]
> 
>   Allocated by task 349:
>    skb_clone+0x17b/0x210
>    __strp_recv+0x2c3/0xda0
>    __tcp_read_sock+0x13d/0x590
>    [...]
> 
>   Freed by task 349:
>    kmem_cache_free+0x150/0x570
>    __pskb_pull_tail+0x57b/0xc20
>    skb_ensure_writable+0x236/0x260
>    __bpf_skb_change_tail+0x1d4/0x590
>    sk_skb_change_tail+0x2a/0x40
>    bpf_prog_1b285dcd6c41373e+0x27/0x30
>    bpf_prog_run_pin_on_cpu+0xf3/0x260
>    sk_psock_strp_parse+0x118/0x1e0
>    __strp_recv+0x4f6/0xda0
>    [...]
> 
> The same resize also leaves the head's length inconsistent with its
> frags, so a later __pskb_pull_tail() can instead hit the
> BUG_ON(skb_copy_bits(...)) in net/core/skbuff.c.
> 
> Run the parser on a private clone of the head when the message spans more
> than one skb and the program can modify the packet
> (prog->aux->changes_pkt_data), so a resizing helper can only touch the
> clone and strparser's head and skb_nextp stay valid. Single-skb messages
> have no frag_list and read-only parsers cannot resize, so both are still
> parsed in place. If the clone cannot be allocated, return 0 so the caller
> retries on the next read rather than failing the parser.
> 
> Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
> ---
> v2:
>  - clone only when prog->aux->changes_pkt_data (Bobby Eshleman)
>  - return 0 on clone failure instead of -ENOMEM (Bobby Eshleman)
>  - free the clone with consume_skb() instead of kfree_skb()
>  - drop the unrelated guard(rcu)() change (Bobby Eshleman)
> 
> v1:
>  - https://lore.kernel.org/all/20260609112316.3685738-1-rhkrqnwk98@gmail.com/
> 
>  net/core/skmsg.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index e1850caf1a71..97e5bc5f38c3 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1149,9 +1149,29 @@ static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
>  	rcu_read_lock();
>  	prog = READ_ONCE(psock->progs.stream_parser);
>  	if (likely(prog)) {
> -		skb->sk = psock->sk;
> -		ret = bpf_prog_run_pin_on_cpu(prog, skb);
> -		skb->sk = NULL;
> +		struct sk_buff *parse_skb = skb;
> +
> +		/*
> +		 * strparser chains the message skbs through skb->frag_list and
> +		 * keeps a pointer into that list in strp->skb_nextp.  The parser
> +		 * program may call bpf_skb_change_tail() and friends, which go
> +		 * through __pskb_pull_tail() and free the frag_list skbs that
> +		 * strparser still tracks.  Run the program on a clone when the head
> +		 * has a frag_list and the program can modify the packet, so it
> +		 * cannot drop frags strparser owns.
> +		 */
> +		if (skb_has_frag_list(skb) && prog->aux->changes_pkt_data) {
> +			parse_skb = skb_clone(skb, GFP_ATOMIC);
> +			if (!parse_skb) {
> +				rcu_read_unlock();
> +				return 0;
> +			}
> +		}
> +		parse_skb->sk = psock->sk;
> +		ret = bpf_prog_run_pin_on_cpu(prog, parse_skb);
> +		parse_skb->sk = NULL;
> +		if (parse_skb != skb)
> +			consume_skb(parse_skb);
>  	}
>  	rcu_read_unlock();
>  	return ret;
> -- 
> 2.43.0
> 

Hey Sechang,

I'm still on the fence about "return 0" vs ENOMEM. I hate to flip-flop
on you here, but now I'm not sure it is worth the complication to
return 0, since we're really only buying a single timer interval in which
we need 1) more memory to suddenly become available to alloc the clone,
and 2) another data ready event to cause the stream parsing to pick up
again. If either one doesn't happen, the end result is the same. Not
sure it's a good trade-off for the complexity of basically tricking the
caller with the zero return. Maybe let's go back to ENOMEM?

BTW, based on the comm name "repro", it sounds like you have a decent
reproducer for this. I wonder if it is possible to add something to the
selftests to catch this?

Best,
Bobby

^ permalink raw reply

* Re: [PATCH] net: stmmac: loongson1: Use dev_err_probe()
From: Jacob Keller @ 2026-06-17 22:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: keguang.zhang, Andrew Lunn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, linux-mips,
	netdev, linux-stm32, linux-arm-kernel, linux-kernel
In-Reply-To: <20260617150744.05756f0b@kernel.org>

On 6/17/2026 3:07 PM, Jakub Kicinski wrote:
> On Wed, 17 Jun 2026 14:26:25 -0700 Jacob Keller wrote:
>> It does claim that it has benefit since you get the error code emitted
>> symbolically. But we have %pe for that. I wonder if dev_err_probe
>> predates %pe?
> 
> I'd argue
> 
>   No of match data provided: -EINVAL
> 
> is more confusing than just:
> 
>   No of match data provided
> 
> the EINVAL is meaningless and hardcoded in this case?

Yea, it is a bit more pointless with a hard-coded error.

^ permalink raw reply

* Re: [PATCH v14 7/9] tls: device: add RX KeyUpdate support
From: Rishikesh Jethwani @ 2026-06-17 22:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, saeedm, tariqt, mbloch, borisp, john.fastabend, sd, davem,
	pabeni, edumazet, leon
In-Reply-To: <20260525211610.2508479-1-kuba@kernel.org>

On Mon, May 25, 2026 at 2:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> ---
> tls: device: add RX KeyUpdate support
>
> This adds RX-side TLS 1.3 KeyUpdate support to the device offload path,
> keeping the old AEAD/IV/rec_seq around to handle in-flight records that
> the NIC already decrypted with the retired key, and deferring tls_dev_add
> of the new key until the old key has fully drained.
>
> > diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> > index c435b3450872..1c58cbd55ffb 100644
> > --- a/net/tls/tls_device.c
> > +++ b/net/tls/tls_device.c
>
> [ ... ]
>
> > @@ -1552,13 +1722,30 @@ int tls_set_device_offload(struct sock *sk,
> >       return rc;
> >  }
> >
> > -int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
> > +int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx,
> > +                           struct tls_crypto_info *new_crypto_info)
> >  {
>
> [ ... ]
>
> > +     if (new_crypto_info) {
> > +             struct tls_sw_context_rx *sw_ctx = tls_sw_ctx_rx(ctx);
> > +
> > +             if (!test_bit(TLS_RX_DEV_CLOSED, &ctx->flags)) {
> > +                     set_bit(TLS_RX_DEV_CLOSED, &ctx->flags);
> > +                     synchronize_net();
> > +                     netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
> > +                                                     TLS_OFFLOAD_CTX_DIR_RX);
> > +             }
> > +
> > +             if (context->rekey.old_aead_recv &&
> > +                 before(copied_seq, context->rekey.old_nic_boundary)) {
>
> [ ... ]
>
> > +                     context->dev_add_pending = 1;
> > +             } else {
> > +                     u32 rcv_nxt;
> > +
> > +                     if (context->rekey.old_aead_recv) {
> > +                             crypto_free_aead(context->rekey.old_aead_recv);
> > +                             context->rekey.old_aead_recv = NULL;
> > +                     }
> > +
> > +                     /* flush the backlog so rcv_nxt is accurate */
> > +                     __sk_flush_backlog(sk);
> > +                     rcv_nxt = tcp_sk(sk)->rcv_nxt;
> > +
> > +                     if (before(copied_seq, rcv_nxt)) {
> > +                             context->rekey.old_aead_recv = sw_ctx->aead_recv;
> > +                             sw_ctx->aead_recv = NULL;
> > +                             memcpy(context->rekey.old_iv, ctx->rx.iv,
> > +                                    sizeof(context->rekey.old_iv));
> > +                             memcpy(context->rekey.old_rec_seq, ctx->rx.rec_seq,
> > +                                    sizeof(context->rekey.old_rec_seq));
> > +                             context->rekey.old_nic_boundary = rcv_nxt;
> > +                             context->dev_add_pending = 1;
> > +                     }
> > +             }
> > +     }
> > +
> > +     rc = tls_sw_ctx_init(sk, 0, new_crypto_info);
> >       if (rc)
> >               goto release_ctx;
>
> [High]
> In the rekey branch, sw_ctx->aead_recv has just been moved into
> context->rekey.old_aead_recv (with sw_ctx->aead_recv set to NULL) and the
> NIC offload has already been torn down via tls_dev_del with
> TLS_RX_DEV_CLOSED set.
>
> If tls_sw_ctx_init() then fails (for example crypto_alloc_aead, setkey,
> setauthsize, or an ENOMEM in the strparser init), control jumps to
> release_ctx, and for the new_crypto_info case that label is essentially a
> no-op:
>
>         release_ctx:
>                 if (!new_crypto_info) {
>                         kfree(ctx->priv_ctx_rx);
>                         ctx->priv_ctx_rx = NULL;
>                 }
>
> So sw_ctx->aead_recv stays NULL, the rekey state stays populated,
> TLS_RX_DEV_CLOSED stays set, and tls_sw_ctx_finalize() is never called.
>
> The error returns to do_tls_setsockopt_conf() and, because
> update && rx_conf == TLS_HW, it goes straight to err_crypto_info without
> restoring anything, while rx_conf is still TLS_HW and the socket remains
> open.
>
> Can a subsequent recvmsg on this socket then NULL-deref in tls_decrypt_sg(),
> which uses ctx->aead_recv unconditionally to compute
> sizeof(*aead_req) + crypto_aead_reqsize(ctx->aead_recv)?
>
> Should this path roll back the rekey state (restore sw_ctx->aead_recv from
> rekey.old_aead_recv, clear rekey state, clear dev_add_pending) on
> tls_sw_ctx_init() failure?
>
> [High]
> The error path renaming from tls_sw_free_resources_rx() to
> tls_sw_release_resources_rx() in free_sw_resources, combined with a later
> kfree(ctx->priv_ctx_rx), looks like it drops the strparser anchor cleanup:
>
> > free_sw_resources:
> >       up_read(&device_offload_lock);
> > -     tls_sw_free_resources_rx(sk);
> > +     tls_sw_release_resources_rx(sk);
> >       down_read(&device_offload_lock);
> >  release_ctx:
> > -     ctx->priv_ctx_rx = NULL;
> > +     if (!new_crypto_info) {
> > +             kfree(ctx->priv_ctx_rx);
> > +             ctx->priv_ctx_rx = NULL;
> > +     }
>
> tls_sw_free_resources_rx() previously did three things: release_resources_rx
> (frees aead_recv, stops strparser), __tls_strp_done() ->
> tls_strp_anchor_free() which consume_skb()s the anchor allocated in
> tls_strp_init() during tls_sw_ctx_init(), and finally tls_sw_free_ctx_rx().
>
> The new code only does steps one and three — calling kfree(priv_ctx_rx)
> without first doing the anchor consume_skb().
>
> Since tls_strp_init() runs inside tls_sw_ctx_init() during the
> !new_crypto_info path, and free_sw_resources is reached only after
> tls_device_dev_add_rx() fails (i.e. after init succeeded), is the
> strp->anchor SKB leaked on every initial-setup failure here?
>
> > +     if (!context->dev_add_pending) {
> > +             rc = tls_device_dev_add_rx(sk, ctx, netdev, src_crypto_info,
> > +                                        copied_seq, !!new_crypto_info);
> > +             if (!new_crypto_info) {
> > +                     if (rc)
> > +                             goto free_sw_resources;
> > +                     tls_device_attach(ctx, sk, netdev);
> > +             }
> > +     } else if (!was_dev_add_pending) {
> > +             TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYINPROGRESS);
> > +     } else {
> > +             TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYOK);
> > +     }
>
> [Medium, Medium]
> For the rekey path, context->dev_add_pending is set to 1 before
> tls_sw_ctx_init() is called, but the matching
> TLS_INC_STATS(LINUX_MIB_TLSRXREKEYINPROGRESS) only happens later in the
> "else if (!was_dev_add_pending)" branch after init has succeeded.
>
> If tls_sw_ctx_init() fails, the goto release_ctx is a no-op for
> new_crypto_info, leaving dev_add_pending == 1 with no prior
> REKEYINPROGRESS increment.
>
> When the socket closes, tls_device_offload_cleanup_rx() then runs:
>
>         if (rx_ctx && rx_ctx->dev_add_pending) {
>                 rx_ctx->dev_add_pending = 0;
>                 TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYOK);
>                 TLS_DEC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYINPROGRESS);
>         }
>
> Does this decrement TLSRXREKEYINPROGRESS without a prior matching
> increment, wrapping the per-CPU SNMP counter to a very large unsigned
> value, and bump TLSRXREKEYOK for a rekey that never installed a new key
> on the NIC?
>
> Separately, in the same cleanup tail, dev_add_pending == 1 means the
> deferred tls_dev_add never ran, so the new key was not installed on the
> NIC. Should this path really increment LINUX_MIB_TLSRXREKEYOK, given that
> tls_device_dev_add_rx() is the documented place that bumps REKEYOK after
> the NIC accepts the new key?
>
> [Medium]
> In the same hunk, when was_dev_add_pending is 1 (a previous rekey is still
> draining) and a new rekey arrives, the third arm runs:
>
>         } else {
>                 TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYOK);
>         }
>
> dev_add_pending is still 1 here and the new key has not been installed on
> the NIC yet. The eventual tls_device_deferred_dev_add_rx() will also call
> tls_device_dev_add_rx() with is_rekey=true, which on success increments
> LINUX_MIB_TLSRXREKEYOK again.
>
> Is the same successful rekey getting counted twice in this case, and a
> never-completed nested rekey getting counted as success?

Not double-counted; the totals balance. REKEYINPROGRESS is raised
exactly once per draining chain (by the first rekey that defers) and
lowered exactly once (by the single deferred tls_device_dev_add_rx,
since nested rekeys hit the draining branch and don't re-arm
old_nic_boundary). Each nested rekey is counted immediately in the
else arm; the deferred add's REKEYOK is credited to the rekey that
opened the INPROGRESS window. So N accepted rekeys → N REKEYOK, and the
gauge returns to 0.
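
The balance claim can be checked with a small userspace simulation of
the three-arm counter logic from the quoted hunk. This is only a sketch
under assumptions: the struct and model_* names are hypothetical,
old_key_draining abstracts the copied_seq comparisons, a chain is
assumed to stay pending until the single deferred add runs, and every
tls_dev_add is assumed to succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for TlsRxRekeyOk, TlsRxRekeyInProgress and dev_add_pending. */
struct rx_rekey_model {
	unsigned long rekey_ok;
	long rekey_in_progress;
	bool dev_add_pending;
};

/* One accepted RX rekey; old_key_draining means old-key records are still unread. */
static void model_rx_rekey(struct rx_rekey_model *m, bool old_key_draining)
{
	bool was_pending = m->dev_add_pending;

	if (was_pending || old_key_draining)
		m->dev_add_pending = true;

	if (!m->dev_add_pending)
		m->rekey_ok++;		/* immediate dev_add counts the rekey */
	else if (!was_pending)
		m->rekey_in_progress++;	/* first deferring rekey raises the gauge */
	else
		m->rekey_ok++;		/* nested rekey counted right away */
}

/* The single deferred dev_add that closes a draining chain. */
static void model_rx_deferred_add(struct rx_rekey_model *m)
{
	assert(m->dev_add_pending);
	m->dev_add_pending = false;
	m->rekey_in_progress--;
	m->rekey_ok++;			/* credited to the rekey that opened the window */
}

/* n rekeys that all arrive while draining, then one deferred add. */
static struct rx_rekey_model model_rx_chain(unsigned int n)
{
	struct rx_rekey_model m = { 0 };
	unsigned int i;

	for (i = 0; i < n; i++)
		model_rx_rekey(&m, true);
	if (m.dev_add_pending)
		model_rx_deferred_add(&m);
	return m;
}
```

For any chain length n the model ends with rekey_ok equal to n and the
gauge at zero. It does not cover the tls_sw_ctx_init() failure case
raised in the review, where dev_add_pending is set without a matching
gauge increment.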

^ permalink raw reply

