Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net v4 0/5] nfc: fix multiple OOB reads in NCI and LLCP parsing paths
From: David Heidelberg @ 2026-06-24 16:11 UTC (permalink / raw)
  To: Lekë Hapçiu, oe-linux-nfc
  Cc: davem, edumazet, kuba, pabeni, krzk, horms, linux-kernel, netdev
In-Reply-To: <20260424180151.3808557-1-snowwlake@icloud.com>

On 24/04/2026 20:01, Lekë Hapçiu wrote:
> This series fixes five out-of-bounds / underflow bugs in the kernel NFC
> stack.  All are reachable from a remote NFC peer that the local stack
> has already associated with; in the LLCP cases the peer only needs to
> send a malformed frame.
> 
>    1/5  nci: u8 underflow in nci_store_general_bytes_nfc_dep() lets the
>         attacker-controlled atr_res_len skip the GT-offset subtraction
>         and cause an OOB read/write against general_bytes[].
>    2/5  llcp: parse_gb_tlv() / parse_connection_tlv() trust the TLV
>         length byte without checking remaining buffer, and the tlv16
>         accessors read past the end when length < 2.
>    3/5  llcp: nfc_llcp_recv_snl() has the same TLV-length trust bug, and
>         its SDRES handler uses an unbounded "%.16s" pr_debug() that
>         walks past service_name_len.
>    4/5  llcp: nfc_llcp_recv_dm() reads skb->data[3] without checking
>         skb->len, giving a 1-byte heap OOB read.
>    5/5  llcp: nfc_llcp_connect_sn() walks the TLV array with no length
>         validation; a crafted CONNECT frame drops it into OOB reads /
>         an unbounded service-name pointer.
> 
> The series applies on top of net/main.
> 
> Lekë Hapçiu (5):
>    nfc: nci: fix u8 underflow in nci_store_general_bytes_nfc_dep
>    nfc: llcp: fix TLV parsing in parse_gb_tlv and parse_connection_tlv
>    nfc: llcp: fix TLV parsing OOB in nfc_llcp_recv_snl
>    nfc: llcp: fix OOB read of DM reason byte in nfc_llcp_recv_dm
>    nfc: llcp: fix TLV parsing OOB in nfc_llcp_connect_sn
> 
>   net/nfc/llcp_commands.c | 24 ++++++++++++++++++++++--
>   net/nfc/llcp_core.c     | 35 ++++++++++++++++++++++++++++++++---
>   net/nfc/nci/ntf.c       |  6 ++++++
>   3 files changed, 60 insertions(+), 5 deletions(-)
> 

Hello Lekë,

could you please rebase this series against NFC for-linus branch [1]?

Likely some checks has been added meanwhile, but I would love to get the 
remaining ones in!

Don't forget also add NFC mailing list oe-linux-nfc@lists.linux.dev .

Thank you
David

^ permalink raw reply

* Re: [PATCH v3 3/7] net: wwan: t9xx: Add control DMA interface
From: Andrew Lunn @ 2026-06-24 16:15 UTC (permalink / raw)
  To: jackbb_wu
  Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
	AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
	Shuah Khan, linux-kernel, netdev, linux-arm-kernel,
	linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-3-73ff03f60c48@compal.com>

> diff --git a/drivers/net/wwan/t9xx/pcie/mtk_cldma.c b/drivers/net/wwan/t9xx/pcie/mtk_cldma.c

> +static inline void mtk_cldma_clr_bd_dsc(struct cldma_drv_info *drv_info,
> +					struct bd_dsc *bd_dsc_pool, int nr_bds)

No inline functions in C files. Please let the compiler decide.

> +static int mtk_cldma_reload_rx_skb(struct mtk_md_dev *mdev, struct rxq *rxq,
> +				   struct rx_req *req)
> +{
> +	struct sk_buff *tail = NULL;
> +	struct bd_dsc *bd_dsc;
> +	int nr_bds;
> +	int i, ret;
> +
> +	nr_bds = rxq->nr_bds;
> +
> +	for (i = 0; i < nr_bds; i++) {
> +		bd_dsc = req->bd_dsc_pool + i;
> +		bd_dsc->skb = __dev_alloc_skb(req->frag_size, GFP_KERNEL);
> +		if (!bd_dsc->skb) {
> +			dev_warn((mdev)->dev, "Failed to alloc SKB\n");

You might want to rate limit this, and the other similar messages in
the data path, otherwise it could be a DOS.

> +u32 mtk_cldma_stop_queue(struct cldma_drv_info *drv_info, enum mtk_tx_rx dir, u32 qno)
> +{
> +	u32 val = (qno == ALLQ) ? qno : BIT(qno);
> +	struct cldma_hw_regs *hw_regs;
> +	unsigned int active;
> +	int cnt = 0;
> +	int base;
> +	u32 addr;
> +
> +	hw_regs = drv_info->hw_regs;
> +	base = drv_info->base_addr;
> +
> +	if (dir == DIR_TX)
> +		addr = base + hw_regs->reg_cldma_ul_stop_cmd;
> +	else
> +		addr = base + hw_regs->reg_cldma_so_stop_cmd;
> +
> +	mtk_pci_write32(drv_info->mdev, addr, val);
> +
> +	do {
> +		active = drv_info->drv_ops->cldma_queue_status(drv_info, dir, qno);
> +		if (active == LINK_ERROR_VAL || !active)
> +			break;
> +		usleep_range(WAIT_QUEUE_STOP, 2 * WAIT_QUEUE_STOP);
> +	} while (++cnt < 10);

Please use one of the helpers from iopoll.h. Any loops waiting for an
event to happen should use those macros, since the open code
implementation is often wrong.

> +static int mtk_ctrl_trb_srv_init(struct mtk_ctrl_trans *trans)
> +{
> +	struct srv_que *srv_que;
> +	struct trb_srv *srv;
> +	int i, j;
> +	int ret;
> +
> +	for (i = 0; i < trans->trb_srv_num; i++) {
> +		srv = devm_kzalloc(trans->mdev->dev, sizeof(*srv), GFP_KERNEL);
> +		if (!srv) {
> +			ret = -ENOMEM;
> +			goto err_free_srv;
> +		}
> +
> +		srv->trans = trans;
> +		srv->srv_id = i;
> +		trans->trb_srv[i] = srv;
> +
> +		init_waitqueue_head(&srv->trb_waitq);
> +		for (j = 0; j < NR_CLDMA; j++)
> +			INIT_LIST_HEAD(&srv->srv_q_list[j]);
> +	}
> +
> +	for (i = 0; i < NR_CLDMA; i++)
> +		for (j = 0; j < HW_QUE_NUM; j++) {
> +			if (trans->srv_cfg[i][j] < 0 ||
> +			    trans->srv_cfg[i][j] >= trans->trb_srv_num)
> +				trans->srv_cfg[i][j] = 0;
> +			srv_que = devm_kzalloc(trans->mdev->dev, sizeof(*srv_que), GFP_KERNEL);
> +			if (!srv_que) {
> +				ret = -ENOMEM;
> +				goto err_free_srv_que;
> +			}
> +			srv_que->hif_id = i;
> +			srv_que->qno = j;
> +			list_add_tail(&srv_que->list,
> +				      &trans->trb_srv[trans->srv_cfg[i][j]]->srv_q_list[i]);
> +		}
> +
> +	for (i = 0; i < trans->trb_srv_num; i++) {
> +		trans->trb_srv[i]->trb_thread = kthread_run(mtk_ctrl_trb_thread, trans->trb_srv[i],
> +							    "mtk_trb_srv%d_%s", i,
> +							    trans->mdev->dev_str);
> +		if (IS_ERR(trans->trb_srv[i]->trb_thread)) {
> +			ret = PTR_ERR(trans->trb_srv[i]->trb_thread);
> +			trans->trb_srv[i]->trb_thread = NULL;
> +			goto err_stop_kthread;
> +		}
> +	}
> +
> +	return 0;
> +err_stop_kthread:
> +	while (--i >= 0)
> +		kthread_stop(trans->trb_srv[i]->trb_thread);
> +err_free_srv_que:
> +	for (i = 0; i < trans->trb_srv_num; i++) {
> +		for (j = 0; j < NR_CLDMA; j++) {
> +			struct srv_que *next_srv_que;
> +
> +			list_for_each_entry_safe(srv_que, next_srv_que,
> +						 &trans->trb_srv[i]->srv_q_list[j], list) {
> +				list_del(&srv_que->list);
> +				devm_kfree(trans->mdev->dev, srv_que);

It is unusual to see devm_kfree(). Why is it needed?

> +static unsigned int ctrl_port_chl_mtu;

Is this a global variable? Why is it not part of priv?

> +module_param(ctrl_port_chl_mtu, uint, 0644);
> +MODULE_PARM_DESC(ctrl_port_chl_mtu, "This is used to config the ctrl port mtu!\n");

Ah. No modules parameters please. If this is an MTU, why not use the
normal networking interfaces to set the MTU?

	Andrew

^ permalink raw reply

* Re: [PATCH net v2] nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers
From: David Heidelberg @ 2026-06-24 16:16 UTC (permalink / raw)
  To: Muhammad Bilal, oe-linux-nfc
  Cc: davem, edumazet, kuba, pabeni, horms, krzk, linux-kernel, stable,
	netdev
In-Reply-To: <20260622131802.239035-1-meatuni001@gmail.com>


On Mon, 22 Jun 2026 18:18:02 +0500, Muhammad Bilal wrote:
 > nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers

Applied, thanks!

[1/1] nfc: llcp: fix OOB read and u8 offset wrap in TLV parsers
       commit: cb682b66aba4bc3f4dd4d7ae5612f49ad75b2087

Best regards,
-- 
David Heidelberg <david@ixit.cz>

^ permalink raw reply

* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Ralf Lici @ 2026-06-24 16:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	Beniamino Galvani
In-Reply-To: <87v7b9c9jj.fsf@toke.dk>

On Tue, 23 Jun 2026 21:59:44 +0200, Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> Ralf Lici <ralf@mandelbit.com> writes:
> > On the BPF point specifically: I agree a BPF program should be able to
> > decide whether to translate. What I am less sure about is whether
> > redirecting to a netdevice is the best way to expose that. A TC action
> > (yet another model, I know :)) gives you the same thing in-pipeline and
> > more directly:
> >
> >     tc filter add dev wwan0 egress \
> >         bpf obj match.o action ipxlat4to6 domain clat0
> >
> > Let BPF make the policy decision, with the native action doing the
> > translation work that the current BPF CLAT implementations have trouble
> > with: fragmentation, checksum corner cases, and ICMP error inner
> > headers (as explained by Beniamino).
> >
> > So TC clsact looks like the natural in-kernel replacement for today's
> > TC-BPF CLAT programs: no extra netdev, you attach to the existing
> > uplink, direction is explicit, and on egress you sit on the real route
> > dst, so the synthetic-dst and double-routing problems above just don't
> > arise. The cost is more moving parts than a single bpf_redirect since
> > userspace has to manage clsact, filters, priorities and action
> > lifecycle/cleanup.
>
> Hmm, so no one really uses the bpf filter mechanism, since you can just
> do everything from an action anyway (and with TCX attachment, you can
> even avoid the overhead of the TC filter/action infrastructure
> entirely). However, point taken wrt how to integrate this with BPF. I
> guess the most flexible thing would be to expose the functionality
> directly (as a kfunc callable from a BPF program). Which also fits with
> your point below:
>

Ah, I see, the cls_bpf example was dated, and I like the kfunc angle
better than a new TC action.

I would probably keep that as the minimal per-packet interface: BPF can
decide whether a packet should be translated, and the kfunc can do the
actual translation work for packets whose translated form still fits the
output MTU. The full 4->6 fragmentation case still looks like
output-path/harness territory to me, since it is a 1->N fan-out
operation.

> > For a gateway translator, though, I still think a device-bound model is
> > less natural. There the translation point is more like a forwarding
> > decision across routes and nexthops, so a route/LWT attachment, or
> > possibly a netfilter attachment seems easier to reason about. Also, as
> > you already pointed out while discussing LWT, an admin setting up NAT64
> > is more likely to reach for an nft rule than for a clsact filter on a
> > specific device.
> >
> > Taking a step back, ipxlat is really a generic translation engine plus a
> > thin harness around it. So rather than pick one attachment, it might be
> > worth structuring the engine so different harnesses can drive it.
> > There's interesting precedent for this shape:
> >
> > - ILA, again, is the closest sibling: stateless IPv6 address translation
> >   with a shared core in ila_common.c, driven both by an LWT frontend in
> >   ila_lwt.c and by an inline netfilter hook with a netlink-configured
> >   mapping table in ila_xlat.c.
> >
> > - act_ct is the precedent for the TC side specifically: a TC action that
> >   reuses the netfilter conntrack engine rather than reimplementing it.
> >
> > And act_nat is the cautionary counter-example: a standalone TC
> > reimplementation of stateless NAT that shares no code with nf_nat, and
> > carries a "would be nice to share code" comment :)
> >
> > So I am wondering whether the right direction is to factor the
> > translation engine cleanly, land it with one harness first, and keep the
> > other attachment points as follow-up work once the core semantics are
> > settled.
> >
> > Does that direction seem reasonable to you?
>
> Yes, reusable functionality that can be called from multiple places
> sounds like a good fit; let's try to structure it that way!
>

Great, that's the direction I'll take then.

> As for which hook to start with, well, let's see if we hear back from
> the netfilter devs, but either netfilter or the routing subsystem (LWT
> style) would be OK for me I think.
>

Works for me. The engine factoring is common to all of them, so I'll
start there. Once it's in shape I can sketch a harness against it to
sanity-check the interface.

-- 
Ralf Lici
Mandelbit Srl

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net v7 2/4] i40e: skip unnecessary VF reset when setting trust
From: Simon Horman @ 2026-06-24 16:30 UTC (permalink / raw)
  To: jtornosm
  Cc: Simon Horman, netdev, intel-wired-lan, przemyslaw.kitszel,
	aleksandr.loktionov, jacob.e.keller, anthony.l.nguyen, davem,
	edumazet, kuba, pabeni
In-Reply-To: <20260623101800.991293-3-jtornosm@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
i40e: skip unnecessary VF reset when setting trust

This patch avoids the ~10 second VF reset delay on trust changes by
flipping the I40E_VIRTCHNL_VF_CAP_PRIVILEGE bit directly, and only
resorting to a full reset when revoking trust from a VF that has
advanced features (ADQ, cloud filters, UC/MC promisc) configured.

> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index a26c3d47ec15..0cc434b26eb8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -4987,19 +5004,20 @@ int i40e_ndo_set_vf_trust(struct net_device *netdev, int vf_id, bool setting)
>  	set_bit(__I40E_MACVLAN_SYNC_PENDING, pf->state);
>  	pf->vsi[vf->lan_vsi_idx]->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
>  
> -	i40e_vc_reset_vf(vf, true);
> +	/* Reset only if revoking trust and VF has advanced features configured */
> +	if (!setting &&
> +	    (vf->adq_enabled || vf->num_cloud_filters > 0 ||
> +	     test_bit(I40E_VF_STATE_UC_PROMISC, &vf->vf_states) ||
> +	     test_bit(I40E_VF_STATE_MC_PROMISC, &vf->vf_states))) {

[High]
Does this revoke-trust heuristic miss MAC/VLAN filters that were only
installable while the VF was trusted?

A trusted VF is allowed to add many more MAC addresses than
I40E_VC_MAX_MAC_ADDR_PER_VF (i40e_check_vf_permission() uses
I40E_VC_MAX_MACVLAN_PER_TRUSTED_VF) and more than I40E_VC_MAX_VLAN_PER_VF
VLANs.  These quotas are enforced only at add-time.

The previous always-reset path went through i40e_vc_reset_vf() ->
i40e_cleanup_reset_vf() -> i40e_free_vf_res() -> i40e_vsi_release(),
which destroyed the VSI and its mac_filter_hash, and reset
vf->num_vlan = 0.

The new fast path only flips the PRIVILEGE bit, leaving any
trusted-quota MAC/VLAN filters installed in hardware on an
now-untrusted VF.  Should the heuristic also check
vf->num_vlan > I40E_VC_MAX_VLAN_PER_VF or the size of the lan VSI's
MAC filter hash before taking the no-reset branch?

[Medium]
Is the sampling of vf->adq_enabled, vf->num_cloud_filters and the
UC/MC_PROMISC bits race-free against the admin queue worker?

i40e_ndo_set_vf_trust() runs under rtnl_lock and
__I40E_VIRTCHNL_OP_PENDING, but i40e_vc_process_vf_msg() (called from
i40e_clean_adminq_subtask() in the service workqueue) takes neither.
A virtchnl handler such as i40e_vc_config_promiscuous_mode_msg(),
i40e_vc_add_cloud_filter() or i40e_vc_add_qch_msg() can be running
concurrently and may have already passed its CAP_PRIVILEGE gate but
not yet set vf_states / num_cloud_filters / adq_enabled when this
ndo samples them.

In that window the ndo would observe a "clean" VF, take the no-reset
branch, and clear CAP_PRIVILEGE while the in-flight handler still
goes on to enable promisc / install a cloud filter / set
adq_enabled = true on the now-untrusted VF.

The previous always-reset behaviour masked this because the reset
moved the VF to I40E_VF_STATE_DISABLED, after which
i40e_vc_process_vf_msg() rejects further messages.  Is some additional
serialization (or a recheck after clearing the bit) needed here?

> +		i40e_vc_reset_vf(vf, true);
> +		i40e_del_all_cloud_filters(vf);
> +	} else {
> +		i40e_setup_vf_trust(vf, setting);
> +	}
> +
>  	dev_info(&pf->pdev->dev, "VF %u is now %strusted\n",
>  		 vf_id, setting ? "" : "un");
>  
> -	if (vf->adq_enabled) {
> -		if (!vf->trusted) {
> -			dev_info(&pf->pdev->dev,
> -				 "VF %u no longer Trusted, deleting all cloud filters\n",
> -				 vf_id);
> -			i40e_del_all_cloud_filters(vf);
> -		}
> -	}
> -

[Low]
The dedicated "VF %u no longer Trusted, deleting all cloud filters"
dev_info() is dropped in this hunk, and the commit message does not
mention the logging change.  Operators or log scrapers that key on
this specific string for audit of cloud filter teardown will no
longer see it; only the generic "VF %u is now untrusted" line
remains.  Was removing this message intentional, and if so could
that be called out in the changelog?

>  out:
>  	clear_bit(__I40E_VIRTCHNL_OP_PENDING, pf->state);
>  	return ret;

^ permalink raw reply

* Re: [BUG] KFENCE: use-after-free read in udp_tunnel_nic_device_sync_work
From: Sam Sun @ 2026-06-24 16:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
	linux-kernel, syzkaller
In-Reply-To: <CANn89iKgwNmTZ98MW92VVgC5zPqtQDBFBFspCphKpS-5i6+-7A@mail.gmail.com>

On Wed, Jun 24, 2026 at 11:01 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jun 24, 2026 at 7:46 AM Sam Sun <samsun1006219@gmail.com> wrote:
> >
>
> > So we are still freeing struct udp_tunnel_nic while its embedded work_struct
> > is active. debugobjects catches this at kfree() before the active work gets a
> > chance to run later and dereference the freed utn.
> >
> > My read is that the conversion from bitfields to atomic bitops removes the
> > plain bitfield data race, but UDP_TUNNEL_NIC_WORK_PENDING is still only one
> > boolean state. It can represent "some work is pending", but it cannot
> > distinguish between:
> >   idle
> >   queued
> >   running
> >   running and queued again
> >
> > In particular, the workqueue core clears WORK_STRUCT_PENDING before invoking
> > the worker. At that point the same work item can be queued again by
> > udp_tunnel_nic_device_sync(). If an already running instance later executes:
> >
> >   clear_bit(UDP_TUNNEL_NIC_WORK_PENDING, &utn->flags);
> >
> > it can still clear the bit that was set for the requeued instance. Then
> > udp_tunnel_nic_unregister() may observe UDP_TUNNEL_NIC_WORK_PENDING clear and
> > free utn, even though debugobjects still sees utn->work as active.
> >
>
> -ETOOMANYBUGS
>
> Ok, we could try to convert pending bit to a refcount.
>
> diff --git a/net/ipv4/udp_tunnel_nic.c b/net/ipv4/udp_tunnel_nic.c
> index 9944ed923ddfd10f9adf6ad788c0740daeaf2adb..2e14686f35896cb0caba3f8f587ef8b369090fbf
> 100644
> --- a/net/ipv4/udp_tunnel_nic.c
> +++ b/net/ipv4/udp_tunnel_nic.c
> @@ -3,6 +3,7 @@
>
>  #include <linux/ethtool_netlink.h>
>  #include <linux/netdevice.h>
> +#include <linux/refcount.h>
>  #include <linux/slab.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> @@ -30,9 +31,8 @@ struct udp_tunnel_nic_table_entry {
>   * @work:      async work for talking to hardware from process context
>   * @dev:       netdev pointer
>   * @lock:      protects all fields
> - * @need_sync: at least one port start changed
> - * @need_replay: space was freed, we need a replay of all ports
> - * @work_pending: @work is currently scheduled
> + * @flags:     sync, replay flags
> + * @refcnt:    reference count
>   * @n_tables:  number of tables under @entries
>   * @missed:    bitmap of tables which overflown
>   * @entries:   table of tables of ports currently offloaded
> @@ -44,9 +44,11 @@ struct udp_tunnel_nic {
>
>         struct mutex lock;
>
> -       u8 need_sync:1;
> -       u8 need_replay:1;
> -       u8 work_pending:1;
> +       unsigned long flags;
> +#define UDP_TUNNEL_NIC_NEED_SYNC       0
> +#define UDP_TUNNEL_NIC_NEED_REPLAY     1
> +
> +       refcount_t refcnt;
>
>         unsigned int n_tables;
>         unsigned long missed;
> @@ -116,7 +118,7 @@ udp_tunnel_nic_entry_queue(struct udp_tunnel_nic *utn,
>                            unsigned int flag)
>  {
>         entry->flags |= flag;
> -       utn->need_sync = 1;
> +       set_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
>  }
>
>  static void
> @@ -283,7 +285,7 @@ udp_tunnel_nic_device_sync_by_table(struct net_device *dev,
>  static void
>  __udp_tunnel_nic_device_sync(struct net_device *dev, struct
> udp_tunnel_nic *utn)
>  {
> -       if (!utn->need_sync)
> +       if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
>                 return;
>
>         if (dev->udp_tunnel_nic_info->sync_table)
> @@ -291,21 +293,24 @@ __udp_tunnel_nic_device_sync(struct net_device
> *dev, struct udp_tunnel_nic *utn)
>         else
>                 udp_tunnel_nic_device_sync_by_port(dev, utn);
>
> -       utn->need_sync = 0;
> +       clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
>         /* Can't replay directly here, in case we come from the tunnel driver's
>          * notification - trying to replay may deadlock inside tunnel driver.
>          */
> -       utn->need_replay = udp_tunnel_nic_should_replay(dev, utn);
> +       if (udp_tunnel_nic_should_replay(dev, utn))
> +               set_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
> +       else
> +               clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
>  }
>
>  static void
>  udp_tunnel_nic_device_sync(struct net_device *dev, struct udp_tunnel_nic *utn)
>  {
> -       if (!utn->need_sync)
> +       if (!test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags))
>                 return;
>
> -       queue_work(udp_tunnel_nic_workqueue, &utn->work);
> -       utn->work_pending = 1;
> +       if (queue_work(udp_tunnel_nic_workqueue, &utn->work))
> +               refcount_inc(&utn->refcnt);
>  }
>
>  static bool
> @@ -348,7 +353,7 @@ udp_tunnel_nic_has_collision(struct net_device
> *dev, struct udp_tunnel_nic *utn,
>                         if (!udp_tunnel_nic_entry_is_free(entry) &&
>                             entry->port == ti->port &&
>                             entry->type != ti->type) {
> -                               __set_bit(i, &utn->missed);
> +                               set_bit(i, &utn->missed);
>                                 return true;
>                         }
>                 }
> @@ -483,7 +488,7 @@ udp_tunnel_nic_add_new(struct net_device *dev,
> struct udp_tunnel_nic *utn,
>                  * are no devices currently which have multiple tables accepting
>                  * the same tunnel type, and false positives are okay.
>                  */
> -               __set_bit(i, &utn->missed);
> +               set_bit(i, &utn->missed);
>         }
>
>         return false;
> @@ -552,7 +557,7 @@ static void __udp_tunnel_nic_reset_ntf(struct
> net_device *dev)
>
>         mutex_lock(&utn->lock);
>
> -       utn->need_sync = false;
> +       clear_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags);
>         for (i = 0; i < utn->n_tables; i++)
>                 for (j = 0; j < info->tables[i].n_entries; j++) {
>                         struct udp_tunnel_nic_table_entry *entry;
> @@ -696,8 +701,8 @@ udp_tunnel_nic_flush(struct net_device *dev,
> struct udp_tunnel_nic *utn)
>         for (i = 0; i < utn->n_tables; i++)
>                 memset(utn->entries[i], 0, array_size(info->tables[i].n_entries,
>                                                       sizeof(**utn->entries)));
> -       WARN_ON(utn->need_sync);
> -       utn->need_replay = 0;
> +       WARN_ON(test_bit(UDP_TUNNEL_NIC_NEED_SYNC, &utn->flags));
> +       clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
>  }
>
>  static void
> @@ -713,8 +718,8 @@ udp_tunnel_nic_replay(struct net_device *dev,
> struct udp_tunnel_nic *utn)
>         for (i = 0; i < utn->n_tables; i++)
>                 for (j = 0; j < info->tables[i].n_entries; j++)
>                         udp_tunnel_nic_entry_freeze_used(&utn->entries[i][j]);
> -       utn->missed = 0;
> -       utn->need_replay = 0;
> +       bitmap_zero(&utn->missed, UDP_TUNNEL_NIC_MAX_TABLES);
> +       clear_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags);
>
>         if (!info->shared) {
>                 udp_tunnel_get_rx_info(dev);
> @@ -728,6 +733,25 @@ udp_tunnel_nic_replay(struct net_device *dev,
> struct udp_tunnel_nic *utn)
>                         udp_tunnel_nic_entry_unfreeze(&utn->entries[i][j]);
>  }
>
> +static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
> +{
> +       unsigned int i;
> +
> +       for (i = 0; i < utn->n_tables; i++)
> +               kfree(utn->entries[i]);
> +
> +       if (utn->dev)
> +               dev_put(utn->dev);
> +
> +       kfree(utn);
> +}
> +
> +static void udp_tunnel_nic_put(struct udp_tunnel_nic *utn)
> +{
> +       if (refcount_dec_and_test(&utn->refcnt))
> +               udp_tunnel_nic_free(utn);
> +}
> +
>  static void udp_tunnel_nic_device_sync_work(struct work_struct *work)
>  {
>         struct udp_tunnel_nic *utn =
> @@ -736,14 +760,15 @@ static void
> udp_tunnel_nic_device_sync_work(struct work_struct *work)
>         rtnl_lock();
>         mutex_lock(&utn->lock);
>
> -       utn->work_pending = 0;
>         __udp_tunnel_nic_device_sync(utn->dev, utn);
>
> -       if (utn->need_replay)
> +       if (test_bit(UDP_TUNNEL_NIC_NEED_REPLAY, &utn->flags))
>                 udp_tunnel_nic_replay(utn->dev, utn);
>
>         mutex_unlock(&utn->lock);
>         rtnl_unlock();
> +
> +       udp_tunnel_nic_put(utn);
>  }
>
>  static struct udp_tunnel_nic *
> @@ -759,6 +784,7 @@ udp_tunnel_nic_alloc(const struct udp_tunnel_nic_info *info,
>         utn->n_tables = n_tables;
>         INIT_WORK(&utn->work, udp_tunnel_nic_device_sync_work);
>         mutex_init(&utn->lock);
> +       refcount_set(&utn->refcnt, 1);
>
>         for (i = 0; i < n_tables; i++) {
>                 utn->entries[i] = kzalloc_objs(*utn->entries[i],
> @@ -776,15 +802,6 @@ udp_tunnel_nic_alloc(const struct
> udp_tunnel_nic_info *info,
>         return NULL;
>  }
>
> -static void udp_tunnel_nic_free(struct udp_tunnel_nic *utn)
> -{
> -       unsigned int i;
> -
> -       for (i = 0; i < utn->n_tables; i++)
> -               kfree(utn->entries[i]);
> -       kfree(utn);
> -}
> -
>  static int udp_tunnel_nic_register(struct net_device *dev)
>  {
>         const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
> @@ -863,6 +880,7 @@ static void
>  udp_tunnel_nic_unregister(struct net_device *dev, struct udp_tunnel_nic *utn)
>  {
>         const struct udp_tunnel_nic_info *info = dev->udp_tunnel_nic_info;
> +       bool last = true;
>
>         udp_tunnel_nic_lock(dev);
>
> @@ -889,6 +907,7 @@ udp_tunnel_nic_unregister(struct net_device *dev,
> struct udp_tunnel_nic *utn)
>                         udp_tunnel_drop_rx_info(dev);
>                         utn->dev = first->dev;
>                         udp_tunnel_nic_unlock(dev);
> +                       last = false;
>                         goto release_dev;
>                 }
>
> @@ -901,16 +920,11 @@ udp_tunnel_nic_unregister(struct net_device
> *dev, struct udp_tunnel_nic *utn)
>         udp_tunnel_nic_flush(dev, utn);
>         udp_tunnel_nic_unlock(dev);
>
> -       /* Wait for the work to be done using the state, netdev core will
> -        * retry unregister until we give up our reference on this device.
> -        */
> -       if (utn->work_pending)
> -               return;
> -
> -       udp_tunnel_nic_free(utn);
> +       udp_tunnel_nic_put(utn);
>  release_dev:
>         dev->udp_tunnel_nic = NULL;
> -       dev_put(dev);
> +       if (!last)
> +               dev_put(dev);
>  }
>
>  static int

I tested the refcount version, and I could no longer reproduce the bug with
the C reproducer. I think this patch fixes the specific lifetime problem
exposed by the reproducer.

Thanks,
Yue

^ permalink raw reply

* Re: [PATCH net] dt-bindings: net: renesas,ether: Drop example "ethernet-phy-ieee802.3-c22" fallback
From: Conor Dooley @ 2026-06-24 16:35 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: Niklas Söderlund, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Krzysztof Kozlowski, Conor Dooley,
	Geert Uytterhoeven, Magnus Damm, Sergei Shtylyov, netdev,
	linux-renesas-soc, devicetree, linux-kernel
In-Reply-To: <20260624150250.131966-2-robh@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 52 bytes --]

Acked-by: Conor Dooley <conor.dooley@microchip.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH 3/7] dt-bindings: net: rockchip-dwmac: Allow 9 clocks
From: Conor Dooley @ 2026-06-24 16:37 UTC (permalink / raw)
  To: Yanan He
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
	devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-3-5aef608a3f64@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 64 bytes --]

Per Heiko's comments,
pw-bot: changes-requested

Cheers,
Conor.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
From: Maciej Fijalkowski @ 2026-06-24 16:37 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, bpf, magnus.karlsson, stfomichev, kuba, pabeni, horms,
	kerneljasonxing, bjorn
In-Reply-To: <ajv23_r4M3FGrFNN@devvm7509.cco0.facebook.com>

On Wed, Jun 24, 2026 at 08:38:20AM -0700, Stanislav Fomichev wrote:
> On 06/23, Maciej Fijalkowski wrote:
> > Hi,
> > 
> > This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> > consumed from the Tx ring are not consistently returned to userspace
> > through the completion ring when the packet is later dropped as invalid.
> > 
> > The affected cases are invalid or oversized multi-buffer Tx packets in
> > both the generic and zero-copy paths. In these cases, the kernel can
> > consume one or more Tx descriptors while building or validating a
> > multi-buffer packet, then drop the packet before it reaches the device.
> > Userspace still owns the UMEM buffers only after the corresponding
> > addresses are returned through the CQ. Missing completions therefore
> > make userspace lose track of those buffers.
> > 
> > The generic path fixes cover three related cases:
> > * partially built multi-buffer skbs dropped by xsk_drop_skb();
> >   continuation descriptors left in the Tx ring after xsk_build_skb()
> >   reports overflow;
> > * invalid descriptors encountered in the middle of a multi-buffer
> >   packet, including the offending invalid descriptor itself.
> > 
> > The zero-copy path is handled separately. The batched Tx parser now
> > distinguishes descriptors that can be passed to the driver from
> > descriptors that are consumed only because they belong to an invalid
> > multi-buffer packet. Reclaim-only descriptors are written to the CQ
> > address area and published in completion order, after any earlier
> > driver-visible Tx descriptors.
> > 
> > The ZC batching path can also retain drain state when userspace has not
> > yet provided the end of an invalid multi-buffer packet. To keep this
> > state local to the singular batched path, the series prevents a second
> > Tx socket from joining the same pool while such drain state exists.
> > During the singular-to-shared transition, Tx batching is gated,
> > pre-existing readers are waited out, and bind fails with -EAGAIN if the
> > existing socket still has pending drain state. This avoids adding
> > multi-buffer drain handling to the shared-UMEM fallback path.
> > 
> > The last two patches update xskxceiver so the tests account invalid
> > multi-buffer Tx packets as descriptors that must be reclaimed, while
> > still not expecting those invalid packets on the Rx side.
> > 
> > This is a follow-up to Jason's changes [0] which were addressing generic
> > xmit only and this set allows me to pass full xskxceiver test suite run
> > against ice driver.
> 
> There is a fair amount of feedback from sashiko already :-( So the meta
> question from me is: is it time to scrap our current approach where
> we parse descriptor by descriptor? (and maintain half-baked skb and
> half-consumed descriptor queues)
> 
> Should we:
> 
> 1. do desc[MAX_SKB_FRAGS] and xskq_cons_peek_desc until we exhaust
> PKT_CONT (if the last packet has PKT_CONT, return EOVERFLOW to userspace
> and do a full stop here)
> 2. now that we really know the number of valid descriptors -> reserve
> the cq space (if not -> EAGAIN)
> 3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
> locally, don't ever create semi-initialized skb)
> 4. construct the skb
> 5. xmit

Yeah generic xmit became utterly horrible, haven't gone through sashiko
reviews yet, but bare in mind this set also aligns zc side to what was
previously being addressed by Jason.

I believe planned logistics were to get these fixes onto net and then
Jason had an implementation of batching on generic xmit, directed towards
-next and that's where we could address current flow.

> 
> If at any point there is an issue, the cleanup is straightforward. That
> whole xk->skb goes away, no state between syscalls. Thoughts?

^ permalink raw reply

* Re: [PATCH 2/7] dt-bindings: arm: rockchip: Add Alientek DLRV1126
From: Conor Dooley @ 2026-06-24 16:37 UTC (permalink / raw)
  To: Yanan He
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
	devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-2-5aef608a3f64@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 75 bytes --]

Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH 1/7] dt-bindings: vendor-prefixes: add alientek
From: Conor Dooley @ 2026-06-24 16:38 UTC (permalink / raw)
  To: Yanan He
  Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wu, Maxime Coquelin, Alexandre Torgue,
	devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	netdev, linux-stm32
In-Reply-To: <20260624-rv1126-alientek-dlrv1126-v1-1-5aef608a3f64@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 75 bytes --]

Acked-by: Conor Dooley <conor.dooley@microchip.com>
pw-bot: not-applicable

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* [PATCH net v3 00/11] rxrpc: Miscellaneous fixes
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel

Here are some miscellaneous AF_RXRPC fixes for more stuff found by Sashiko[1][2]:

 (1) Fix ACKALL handling by adding two more call states to simplify when
     ACKs are valid.

 (2) Fix connection leak from AF_RXRPC recvmsg userspace OOB handling.

 (3) Fix double unlock in AF_RXRPC recvmsg userspace OOB handling.

 (4) Fix AFS preallocate charge to flush the waitqueue after unlistening
     the socket so that any charging thread that does manage to get started
     will be waited for before socket destruction.

 (5) Fix AFS OOB notify handling to cancel in-progress OOB notification
     handling and then to flush the workqueue it's on.

 (6) Fix handling of apparent reply reception before initial transmission
     starts in client call.

 (7) Fix OOB challenge leak in cleanup on notification failure.

 (8) Fix infinite loop in recvmsg if OOB packet available, but no calls.

 (9) Fix notify vs recvmsg race where notify thinks the call is already
     queued.

(10) Fix MSG_PEEK call leak for calls with no content.

(11) Fix rxrpc_rotate_tx_window() to check that there's something in the Tx
     buffer before attempting to rotate it.

Note that the latest Sashiko review[3] has some more pre-existing issues,
but I'd like to get this set of fixes committed before tackling those.

David

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

[1] https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
[2] https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
[3] https://sashiko.dev/#/patchset/20260618134802.2477777-1-dhowells%40redhat.com

Changes
=======
ver #3)
- Rewrote ACKALL fix to add two new call states to make it easier to track
  when ACKs are valid.
- Addressed the Sashiko review[3]:
  - Use READ_ONCE() to access net->live.
  - Added a patch to check the Tx buffer is occupied before rotating it.

ver #2)
- Addressed the Sashiko review[2] of ver #1:
  - Added patches to fix more bugs that it found.
  - Adjusted AFS preallocate charge cleanup to only cancel the preallocate
    work item after unlistening rather than flushing the entire waitqueue
    (which may be waiting on DNS lookup).
  - Added extra cancel for OOB rx handler in AFS.

David Howells (10):
  rxrpc: Fix leak of connection from OOB challenge
  rxrpc: Fix double unlock in rxrpc_recvmsg()
  afs: Fix further netns teardown to cancel the preallocation charger
  afs: Fix uncancelled rxrpc OOB message handler
  rxrpc: Fix the reception of a reply packet before data transmission
  rxrpc: Fix oob challenge leak in cleanup after notification failure
  rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
  rxrpc: Fix socket notification race
  rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
  rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to
    rotate

Wyatt Feng (1):
  rxrpc: Fix ACKALL packet handling

 fs/afs/cm_security.c    |  3 ++-
 fs/afs/rxrpc.c          | 10 +++++++++-
 net/rxrpc/ar-internal.h |  6 ++++--
 net/rxrpc/call_event.c  |  5 ++++-
 net/rxrpc/call_object.c |  2 ++
 net/rxrpc/conn_client.c |  2 +-
 net/rxrpc/conn_event.c  |  9 +++++++--
 net/rxrpc/input.c       | 39 ++++++++++++++++++++++++++++++++-------
 net/rxrpc/oob.c         | 12 ++++++++++--
 net/rxrpc/recvmsg.c     | 10 ++++------
 net/rxrpc/sendmsg.c     |  3 ++-
 11 files changed, 77 insertions(+), 24 deletions(-)


^ permalink raw reply

* [PATCH net v3 01/11] rxrpc: Fix ACKALL packet handling
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Wyatt Feng, Yuan Tan, Yifan Wu, Juefei Pu, Zhengchuan Liang,
	Xin Liu, Ren Wei, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

From: Wyatt Feng <bronzed_45_vested@icloud.com>

rxrpc_input_ackall() accepts ACKALL packets without checking whether the
call is in a state that can legitimately have outstanding transmit buffers.
A forged ACKALL can therefore reach a new service call in
RXRPC_CALL_SERVER_RECV_REQUEST before any reply packets have been queued.

In that state call->tx_top is zero and call->tx_queue is NULL, so
rxrpc_rotate_tx_window() dereferences a NULL txqueue and triggers a
null-pointer dereference.

Fix the handling of ACKALL packets by the following means:

 (1) Add two new call states: RXRPC_CALL_CLIENT_PRE_SEND which indicates
     that the client call is connected, but nothing has been transmitted as
     yet; and RXRPC_CALL_CLIENT_AWAIT_ACK, which indicates that everything
     has been transmitted at least once, but we're now waiting for the
     stuff remaining in the Tx buffer to be ACK'd (retransmissions may
     still happen).

     The RXRPC_CALL_CLIENT_PRE_SEND state is set when the call is assigned
     a channel and transitions to RXRPC_CALL_CLIENT_SEND_REQUEST when the
     first packet is transmitted.

     RXRPC_CALL_CLIENT_AWAIT_REPLY is then narrowed in scope to indicate
     that all Tx packets have been ACK'd and we're now waiting for the
     reply to be received.

 (2) As per Wyatt Feng's original patch[1], the ACKALL handler then checks
     that the call state is one in which there might be stuff in the Tx
     buffer to ACK, but now this includes AWAIT_ACK rather than
     AWAIT_REPLY.  ACKALL packets are ignored if received in the wrong
     state.

     Note that unlike Wyatt Feng's patch, it's no longer necessary to check
     to see if the Tx buffer exists as this the state set now covers this.

 (3) Make the ACKALL handler use call->tx_transmitted rather than
     call->tx_top as the former is explicitly the highest packet seq number
     transmitted, whereas the latter has a looser definition.

Thanks to Jeffrey Altman for a description of the history of the ACKALL
packet[1].

Fixes: b341a0263b1b ("rxrpc: Implement progressive transmission queue struct")
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Yifan Wu <yifanwucs@gmail.com>
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20260616155749.2125907-2-dhowells@redhat.com/ [1]
Link: https://lore.kernel.org/r/c0fd4fec-1576-4070-b31e-a37d5506f5ed@auristor.com/ [2]
---
 net/rxrpc/ar-internal.h |  2 ++
 net/rxrpc/call_event.c  |  5 ++++-
 net/rxrpc/call_object.c |  2 ++
 net/rxrpc/conn_client.c |  2 +-
 net/rxrpc/input.c       | 23 +++++++++++++++++++----
 net/rxrpc/sendmsg.c     |  3 ++-
 6 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 98f2165159d7..b6ccd8a8199b 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -650,7 +650,9 @@ enum rxrpc_call_event {
 enum rxrpc_call_state {
 	RXRPC_CALL_UNINITIALISED,
 	RXRPC_CALL_CLIENT_AWAIT_CONN,	/* - client waiting for connection to become available */
+	RXRPC_CALL_CLIENT_PRE_SEND,	/* - client is connected, but hasn't sent anything yet */
 	RXRPC_CALL_CLIENT_SEND_REQUEST,	/* - client sending request phase */
+	RXRPC_CALL_CLIENT_AWAIT_ACK,	/* - client awaiting ACKs of request */
 	RXRPC_CALL_CLIENT_AWAIT_REPLY,	/* - client awaiting reply */
 	RXRPC_CALL_CLIENT_RECV_REPLY,	/* - client receiving reply phase */
 	RXRPC_CALL_SERVER_PREALLOC,	/* - service preallocation */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index fec59d9338b9..21be9c86d7a7 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -178,7 +178,7 @@ static void rxrpc_close_tx_phase(struct rxrpc_call *call)
 
 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
-		rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_AWAIT_REPLY);
+		rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_AWAIT_ACK);
 		break;
 	case RXRPC_CALL_SERVER_SEND_REPLY:
 		rxrpc_set_call_state(call, RXRPC_CALL_SERVER_AWAIT_ACK);
@@ -244,6 +244,8 @@ static void rxrpc_transmit_fresh_data(struct rxrpc_call *call, unsigned int limi
 				break;
 		} while (req.n < limit && before(seq, send_top));
 
+		if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_PRE_SEND)
+			rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_SEND_REQUEST);
 		if (txb->flags & RXRPC_LAST_PACKET) {
 			rxrpc_close_tx_phase(call);
 			tq = NULL;
@@ -267,6 +269,7 @@ void rxrpc_transmit_some_data(struct rxrpc_call *call, unsigned int limit,
 		fallthrough;
 
 	case RXRPC_CALL_SERVER_SEND_REPLY:
+	case RXRPC_CALL_CLIENT_PRE_SEND:
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
 		if (!rxrpc_tx_window_space(call))
 			return;
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index fcb9d38bb521..817ed9acb91e 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -18,7 +18,9 @@
 const char *const rxrpc_call_states[NR__RXRPC_CALL_STATES] = {
 	[RXRPC_CALL_UNINITIALISED]		= "Uninit  ",
 	[RXRPC_CALL_CLIENT_AWAIT_CONN]		= "ClWtConn",
+	[RXRPC_CALL_CLIENT_PRE_SEND]		= "ClPreSnd",
 	[RXRPC_CALL_CLIENT_SEND_REQUEST]	= "ClSndReq",
+	[RXRPC_CALL_CLIENT_AWAIT_ACK]		= "ClAwtAck",
 	[RXRPC_CALL_CLIENT_AWAIT_REPLY]		= "ClAwtRpl",
 	[RXRPC_CALL_CLIENT_RECV_REPLY]		= "ClRcvRpl",
 	[RXRPC_CALL_SERVER_PREALLOC]		= "SvPrealc",
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 9b757798dedd..48519f0de185 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -449,7 +449,7 @@ static void rxrpc_activate_one_channel(struct rxrpc_connection *conn,
 	trace_rxrpc_connect_call(call);
 	call->tx_last_sent = ktime_get_real();
 	rxrpc_start_call_timer(call);
-	rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_SEND_REQUEST);
+	rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_PRE_SEND);
 	wake_up(&call->waitq);
 }
 
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index ce761466b02d..2eedab1b0919 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -181,7 +181,8 @@ void rxrpc_congestion_degrade(struct rxrpc_call *call)
 	if (call->cong_ca_state != RXRPC_CA_SLOW_START &&
 	    call->cong_ca_state != RXRPC_CA_CONGEST_AVOIDANCE)
 		return;
-	if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY)
+	if (__rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_ACK ||
+	    __rxrpc_call_state(call) == RXRPC_CALL_CLIENT_AWAIT_REPLY)
 		return;
 
 	rtt = ns_to_ktime(call->srtt_us * (NSEC_PER_USEC / 8));
@@ -356,6 +357,7 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
 
 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
 	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
 		if (reply_begun) {
 			rxrpc_set_call_state(call, RXRPC_CALL_CLIENT_RECV_REPLY);
@@ -694,6 +696,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
 
 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
 	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
 		/* Received data implicitly ACKs all of the request
 		 * packets we sent when we're acting as a client.
@@ -1154,10 +1157,12 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
 	if (hard_ack + 1 == 0)
 		return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_zero);
 
-	/* Ignore ACKs unless we are or have just been transmitting. */
+	/* Ignore ACKs unless we are transmitting or are waiting for
+	 * acknowledgement of the packets we've just been transmitting.
+	 */
 	switch (__rxrpc_call_state(call)) {
 	case RXRPC_CALL_CLIENT_SEND_REQUEST:
-	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
 	case RXRPC_CALL_SERVER_SEND_REPLY:
 	case RXRPC_CALL_SERVER_AWAIT_ACK:
 		break;
@@ -1215,7 +1220,17 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	struct rxrpc_ack_summary summary = { 0 };
 
-	if (rxrpc_rotate_tx_window(call, call->tx_top, &summary))
+	switch (__rxrpc_call_state(call)) {
+	case RXRPC_CALL_CLIENT_SEND_REQUEST:
+	case RXRPC_CALL_CLIENT_AWAIT_ACK:
+	case RXRPC_CALL_SERVER_SEND_REPLY:
+	case RXRPC_CALL_SERVER_AWAIT_ACK:
+		break;
+	default:
+		return;
+	}
+
+	if (rxrpc_rotate_tx_window(call, call->tx_transmitted, &summary))
 		rxrpc_end_tx_phase(call, false, rxrpc_eproto_unexpected_ackall);
 }
 
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index c35de4fd75e3..ed2c9a51005a 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -366,7 +366,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
 	if (state >= RXRPC_CALL_COMPLETE)
 		goto maybe_error;
 	ret = -EPROTO;
-	if (state != RXRPC_CALL_CLIENT_SEND_REQUEST &&
+	if (state != RXRPC_CALL_CLIENT_PRE_SEND &&
+	    state != RXRPC_CALL_CLIENT_SEND_REQUEST &&
 	    state != RXRPC_CALL_SERVER_ACK_REQUEST &&
 	    state != RXRPC_CALL_SERVER_SEND_REPLY) {
 		/* Request phase complete for this client call */


^ permalink raw reply related

* [PATCH net v3 02/11] rxrpc: Fix leak of connection from OOB challenge
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix leak of connection object from OOB challenge queue when response is
provided by userspace.

Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/oob.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/rxrpc/oob.c b/net/rxrpc/oob.c
index 05ca9c1faa57..3318c8bd82ad 100644
--- a/net/rxrpc/oob.c
+++ b/net/rxrpc/oob.c
@@ -210,6 +210,11 @@ static int rxrpc_respond_to_oob(struct rxrpc_sock *rx,
 		break;
 	}
 
+	switch (skb->mark) {
+	case RXRPC_OOB_CHALLENGE:
+		rxrpc_put_connection(sp->chall.conn, rxrpc_conn_put_oob);
+		break;
+	}
 	rxrpc_free_skb(skb, rxrpc_skb_put_oob);
 	return ret;
 }


^ permalink raw reply related

* [PATCH net v3 03/11] rxrpc: Fix double unlock in rxrpc_recvmsg()
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix a double unlock in rxrpc_recvmsg() when dealing with OOB messages.

Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/recvmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 82614cbdb60f..39a03684432d 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -471,7 +471,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		release_sock(&rx->sk);
 		if (ret == -EAGAIN)
 			goto try_again;
-		goto error_no_call;
+		goto error_trace;
 	}
 
 	/* Find the next call and dequeue it if we're not just peeking.  If we


^ permalink raw reply related

* [PATCH net v3 04/11] afs: Fix further netns teardown to cancel the preallocation charger
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Li Daming, Ren Wei, Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

When an afs network namespace is torn down, it cancels and waits for the
work item that keeps the preallocated rxrpc call/conn/peer queue charged
before disabling incoming (i.e. listen 0), but there's a small window in
which it can be requeued by an incoming call wending through the I/O
thread.

Fix this by cancelling the charger work item again after reducing the
listen backlog to zero.

Fixes: 47694fbc9d24 ("afs: Fix netns teardown to cancel the preallocation charger")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
cc: Li Daming <d4n.for.sec@gmail.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 fs/afs/rxrpc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index d5cfd24e815b..6714a189d58f 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -128,8 +128,13 @@ void afs_close_socket(struct afs_net *net)
 	_enter("");
 
 	cancel_work_sync(&net->charge_preallocation_work);
+	/* Future work items should now see ->live is false. */
+
 	kernel_listen(net->socket, 0);
+
+	/* Make sure work items are no longer running. */
 	flush_workqueue(afs_async_calls);
+	cancel_work_sync(&net->charge_preallocation_work);
 
 	if (net->spare_incoming_call) {
 		afs_put_call(net->spare_incoming_call);


^ permalink raw reply related

* [PATCH net v3 05/11] afs: Fix uncancelled rxrpc OOB message handler
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Li Daming, Ren Wei, Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix AFS to cancel its OOB message processing (typically to respond to
security challenges).  Also move OOB message processing to afs_wq so that
it's also waited for and make the OOB handler just return if the net
namespace is no longer live.

Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260609140911.838677-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Li Daming <d4n.for.sec@gmail.com>
cc: Ren Wei <n05ec@lzu.edu.cn>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 fs/afs/cm_security.c | 3 ++-
 fs/afs/rxrpc.c       | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/afs/cm_security.c b/fs/afs/cm_security.c
index edcbd249d202..103168c70dd4 100644
--- a/fs/afs/cm_security.c
+++ b/fs/afs/cm_security.c
@@ -101,7 +101,8 @@ void afs_process_oob_queue(struct work_struct *work)
 	struct sk_buff *oob;
 	enum rxrpc_oob_type type;
 
-	while ((oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
+	while (READ_ONCE(net->live) &&
+	       (oob = rxrpc_kernel_dequeue_oob(net->socket, &type))) {
 		switch (type) {
 		case RXRPC_OOB_CHALLENGE:
 			afs_respond_to_challenge(oob);
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 6714a189d58f..d82916657a3d 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -128,6 +128,7 @@ void afs_close_socket(struct afs_net *net)
 	_enter("");
 
 	cancel_work_sync(&net->charge_preallocation_work);
+	cancel_work_sync(&net->rx_oob_work);
 	/* Future work items should now see ->live is false. */
 
 	kernel_listen(net->socket, 0);
@@ -148,6 +149,7 @@ void afs_close_socket(struct afs_net *net)
 
 	kernel_sock_shutdown(net->socket, SHUT_RDWR);
 	flush_workqueue(afs_async_calls);
+	cancel_work_sync(&net->rx_oob_work);
 	net->socket->sk->sk_user_data = NULL;
 	sock_release(net->socket);
 	key_put(net->fs_cm_token_key);
@@ -989,5 +991,6 @@ static void afs_rx_notify_oob(struct sock *sk, struct sk_buff *oob)
 {
 	struct afs_net *net = sk->sk_user_data;
 
-	schedule_work(&net->rx_oob_work);
+	if (READ_ONCE(net->live))
+		queue_work(afs_wq, &net->rx_oob_work);
 }


^ permalink raw reply related

* [PATCH net v3 06/11] rxrpc: Fix the reception of a reply packet before data transmission
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix rxrpc_receiving_reply() to handle the reception of an apparent reply
DATA packet before rxrpc has had a chance to send any request DATA packets
on a client call by checking to see if the call has been exposed yet by
sending the first packet.

Without this, rxrpc_rotate_tx_window() might oops.

Also fix rxrpc_rotate_tx_window() to handle the Tx queue being empty by
changing the do...while loop into a while loop, just in case a call is
abnormally terminated by an early reply before the last request packet is
transmitted.

Fixes: b341a0263b1b ("rxrpc: Implement progressive transmission queue struct")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/input.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 2eedab1b0919..9bd0f1b92463 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -248,7 +248,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 		tq = call->tx_queue;
 	}
 
-	do {
+	while (before_eq(seq, to)) {
 		unsigned int ix = seq - call->tx_qbase;
 
 		_debug("tq=%x seq=%x i=%d f=%x", tq->qbase, seq, ix, tq->bufs[ix]->flags);
@@ -318,8 +318,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 				break;
 			}
 		}
-
-	} while (before_eq(seq, to));
+	}
 
 	if (trace)
 		trace_rxrpc_rack_update(call, summary);
@@ -394,6 +393,14 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call)
 		trace_rxrpc_timer_can(call, rxrpc_timer_trace_delayed_ack);
 	}
 
+	/* Deal with an apparent reply coming in before we've got the request
+	 * queued or transmitted.
+	 */
+	if (!test_bit(RXRPC_CALL_EXPOSED, &call->flags)) {
+		rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply);
+		return false;
+	}
+
 	if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) {
 		if (!rxrpc_rotate_tx_window(call, top, &summary)) {
 			rxrpc_proto_abort(call, top, rxrpc_eproto_early_reply);


^ permalink raw reply related

* [PATCH net v3 07/11] rxrpc: Fix oob challenge leak in cleanup after notification failure
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix rxrpc_notify_socket_oob() to return an indication of failure in the
event that it failed to queue a packet and fix rxrpc_post_challenge() to
clean up the connection ref in such an event.

Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/ar-internal.h | 4 ++--
 net/rxrpc/conn_event.c  | 9 +++++++--
 net/rxrpc/oob.c         | 7 +++++--
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index b6ccd8a8199b..d2b31d15851b 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -1357,9 +1357,9 @@ static inline struct rxrpc_net *rxrpc_net(struct net *net)
 }
 
 /*
- * out_of_band.c
+ * oob.c
  */
-void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb);
+bool rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb);
 void rxrpc_add_pending_oob(struct rxrpc_sock *rx, struct sk_buff *skb);
 int rxrpc_sendmsg_oob(struct rxrpc_sock *rx, struct msghdr *msg, size_t len);
 
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index c96ca615b787..611c790bc6d0 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -436,7 +436,7 @@ static bool rxrpc_post_challenge(struct rxrpc_connection *conn,
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_call *call = NULL;
 	struct rxrpc_sock *rx;
-	bool respond = false;
+	bool respond = false, queued = false;
 
 	sp->chall.conn =
 		rxrpc_get_connection(conn, rxrpc_conn_get_challenge_input);
@@ -472,8 +472,13 @@ static bool rxrpc_post_challenge(struct rxrpc_connection *conn,
 	}
 
 	if (call)
-		rxrpc_notify_socket_oob(call, skb);
+		queued = rxrpc_notify_socket_oob(call, skb);
 	rcu_read_unlock();
+	if (call && !queued) {
+		rxrpc_put_connection(conn, rxrpc_conn_put_challenge_input);
+		sp->chall.conn = NULL;
+		return false;
+	}
 
 	if (!call)
 		rxrpc_post_packet_to_conn(conn, skb);
diff --git a/net/rxrpc/oob.c b/net/rxrpc/oob.c
index 3318c8bd82ad..c80ee2487d09 100644
--- a/net/rxrpc/oob.c
+++ b/net/rxrpc/oob.c
@@ -32,11 +32,12 @@ struct rxrpc_oob_params {
  * Post an out-of-band message for attention by the socket or kernel service
  * associated with a reference call.
  */
-void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
+bool rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
 {
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	struct rxrpc_sock *rx;
 	struct sock *sk;
+	bool queued = false;
 
 	rcu_read_lock();
 
@@ -49,6 +50,7 @@ void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
 			skb->skb_mstamp_ns = rx->oob_id_counter++;
 			rxrpc_get_skb(skb, rxrpc_skb_get_post_oob);
 			skb_queue_tail(&rx->recvmsg_oobq, skb);
+			queued = true;
 
 			trace_rxrpc_notify_socket(call->debug_id, sp->hdr.serial);
 			if (rx->app_ops)
@@ -56,11 +58,12 @@ void rxrpc_notify_socket_oob(struct rxrpc_call *call, struct sk_buff *skb)
 		}
 
 		spin_unlock_irq(&rx->recvmsg_lock);
-		if (!rx->app_ops && !sock_flag(sk, SOCK_DEAD))
+		if (queued && !rx->app_ops && !sock_flag(sk, SOCK_DEAD))
 			sk->sk_data_ready(sk);
 	}
 
 	rcu_read_unlock();
+	return queued;
 }
 
 /*


^ permalink raw reply related

* [PATCH net v3 08/11] rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix the wait in rxrpc_recvmsg() also take check the oob queue.

Fixes: 5800b1cf3fd8 ("rxrpc: Allow CHALLENGEs to the passed to the app for a RESPONSE")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/recvmsg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 39a03684432d..f382a47c6eb0 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -438,7 +438,8 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 		return -EAGAIN;
 	}
 
-	if (list_empty(&rx->recvmsg_q)) {
+	if (list_empty(&rx->recvmsg_q) &&
+	    skb_queue_empty_lockless(&rx->recvmsg_oobq)) {
 		ret = -EWOULDBLOCK;
 		if (timeo == 0) {
 			call = NULL;


^ permalink raw reply related

* [PATCH net v3 10/11] rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix rxrpc_recvmsg() to also drop the ref it holds on an already-released
call if MSG_PEEK is in force (the function holds a ref on the call
irrespective of whether MSG_PEEK is specified or not).

Fixes: 962fb1f651c2 ("rxrpc: Fix recv-recv race of completed call")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/recvmsg.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 9962e135cb73..efcba4b2e74f 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -529,8 +529,7 @@ int rxrpc_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
 	if (test_bit(RXRPC_CALL_RELEASED, &call->flags)) {
 		rxrpc_see_call(call, rxrpc_call_see_already_released);
 		mutex_unlock(&call->user_mutex);
-		if (!(flags & MSG_PEEK))
-			rxrpc_put_call(call, rxrpc_call_put_recvmsg);
+		rxrpc_put_call(call, rxrpc_call_put_recvmsg);
 		goto try_again;
 	}
 


^ permalink raw reply related

* [PATCH net v3 09/11] rxrpc: Fix socket notification race
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

There's a race between rxrpc_recvmsg() and rxrpc_notify_socket(), whereby
the latter's attempt to avoid disabling interrupts and taking the socket's
recvmsg_lock if the call is already queued may happen simultaneously with
the former's discarding of a call that has nothing queued.

Fix this by removing the shortcut.  Note that this only affects userspace's
use of AF_RXRPC; the AFS filesystem driver doesn't use the socket queue.

Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Link: https://sashiko.dev/#/patchset/20260616155749.2125907-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/recvmsg.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index f382a47c6eb0..9962e135cb73 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -27,8 +27,6 @@ void rxrpc_notify_socket(struct rxrpc_call *call)
 
 	_enter("%d", call->debug_id);
 
-	if (!list_empty(&call->recvmsg_link))
-		return;
 	if (test_bit(RXRPC_CALL_RELEASED, &call->flags)) {
 		rxrpc_see_call(call, rxrpc_call_see_notify_released);
 		return;


^ permalink raw reply related

* [PATCH net v3 11/11] rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to rotate
From: David Howells @ 2026-06-24 16:38 UTC (permalink / raw)
  To: netdev
  Cc: David Howells, Marc Dionne, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, Simon Horman, linux-afs, linux-kernel,
	Jeffrey Altman, stable
In-Reply-To: <20260624163819.3017002-1-dhowells@redhat.com>

Fix rxrpc_rotate_tx_rotate() to check that there's something in the
transmission buffer to be rotated before it attempts to rotate anything.

Fixes: b341a0263b1b ("rxrpc: Implement progressive transmission queue struct")
Link: https://sashiko.dev/#/patchset/20260618134802.2477777-1-dhowells%40redhat.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Eric Dumazet <edumazet@google.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org
cc: stable@kernel.org
---
 net/rxrpc/input.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 9bd0f1b92463..73cafe6bfa9f 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -237,6 +237,9 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
 		call->acks_lowest_nak = to;
 	}
 
+	if (after(seq, to))
+		return false;
+
 	/* We may have a left over fully-consumed buffer at the front that we
 	 * couldn't drop before (rotate_and_keep below).
 	 */


^ permalink raw reply related

* Re: [PATCH net v7 3/4] iavf: send MAC change request synchronously
From: Simon Horman @ 2026-06-24 16:31 UTC (permalink / raw)
  To: jtornosm
  Cc: Simon Horman, netdev, intel-wired-lan, przemyslaw.kitszel,
	aleksandr.loktionov, jacob.e.keller, anthony.l.nguyen, davem,
	edumazet, kuba, pabeni
In-Reply-To: <20260623101800.991293-4-jtornosm@redhat.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
iavf: send MAC change request synchronously

This commit fixes a deadlock in iavf_set_mac() by making the MAC
address change synchronous. It introduces iavf_poll_virtchnl_response()
to directly poll the admin queue instead of waiting on the watchdog task.

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 630388e9d28cb..3fa288e3798ae 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
[ ... ]
> @@ -1049,25 +1103,23 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
>  		return -EADDRNOTAVAIL;
>  
>  	ret = iavf_replace_primary_mac(adapter, addr->sa_data);
> -
>  	if (ret)
>  		return ret;
>  
> -	ret = wait_event_interruptible_timeout(adapter->vc_waitqueue,
> -					       iavf_is_mac_set_handled(netdev, addr->sa_data),
> -					       msecs_to_jiffies(2500));
> -
> -	/* If ret < 0 then it means wait was interrupted.
> -	 * If ret == 0 then it means we got a timeout.
> -	 * else it means we got response for set MAC from PF,
> -	 * check if netdev MAC was updated to requested MAC,
> -	 * if yes then set MAC succeeded otherwise it failed return -EACCES
> -	 */
> -	if (ret < 0)
> +	ret = iavf_set_mac_sync(adapter, addr->sa_data);
> +	if (ret) {
> +		/* Rollback only if send failed (message never reached PF).
> +		 * Don't rollback on timeout (-EAGAIN) because the message was
> +		 * sent and PF will eventually respond. When the response arrives,
> +		 * iavf_virtchnl_completion() will handle rollback (on PF error)
> +		 * or acceptance (on PF success) automatically.
> +		 */
> +		if (ret != -EAGAIN) {
> +			iavf_mac_add_reject(adapter);
> +			ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
> +		}
>  		return ret;
> -
> -	if (!ret)
> -		return -EAGAIN;
>  
>  	if (!ether_addr_equal(netdev->dev_addr, addr->sa_data))
>  		return -EACCES;

[Severity: High]
Will this cause a persistent state mismatch if a timeout is followed by a
retry?

If iavf_set_mac_sync() times out, it returns -EAGAIN and bypasses this
rollback, leaving adapter->hw.mac.addr updated but netdev->dev_addr at the
old MAC.

If the user then retries before the PF asynchronously completes the original
request, iavf_add_ether_addrs() returns -EBUSY. Since -EBUSY != -EAGAIN,
the rollback runs, forcibly reverting adapter->hw.mac.addr to the old MAC.

When the PF's delayed success response finally arrives,
iavf_virtchnl_completion() sees the old MAC in both hw.mac.addr and
netdev->dev_addr, bypassing the netdev MAC update. The PF hardware is now
programmed with the new MAC, but the kernel retains the old MAC, dropping
traffic.

> diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> index ec234cc8bd9db..e6b7e8f82c7c1 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
[ ... ]
> @@ -555,20 +556,23 @@ iavf_set_mac_addr_type(struct virtchnl_ether_addr *virtchnl_ether_addr,
>   * @adapter: adapter structure
>   *
>   * Request that the PF add one or more addresses to our filters.
> - **/
> -void iavf_add_ether_addrs(struct iavf_adapter *adapter)
> + *
> + * Return: 0 on success, negative on failure
> + */
> +int iavf_add_ether_addrs(struct iavf_adapter *adapter)
>  {
>  	struct virtchnl_ether_addr_list *veal;
>  	struct iavf_mac_filter *f;
>  	int i = 0, count = 0;
>  	bool more = false;
>  	size_t len;
> +	int ret;
>  
>  	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
>  		/* bail because we already have a command pending */
>  		dev_err(&adapter->pdev->dev, "Cannot add filters, command %d pending\n",
>  			adapter->current_op);
> -		return;
> +		return -EBUSY;
>  	}

[Severity: High]
Could routine background tasks cause MAC changes to frequently fail?

The watchdog task routinely sets adapter->current_op for background commands
like VIRTCHNL_OP_GET_STATS. If iavf_set_mac() is called during this window,
iavf_set_mac_sync() will hit this check and immediately fail the MAC change
with -EBUSY.

Because there is no retry logic in iavf_set_mac(), this error propagates
directly to userspace, making MAC address configuration randomly fail based
on watchdog timing.

[ ... ]
> @@ -712,8 +724,8 @@ static void iavf_mac_add_ok(struct iavf_adapter *adapter)
>   * @adapter: adapter structure
>   *
>   * Remove filters from list based on PF response.
> - **/
> -static void iavf_mac_add_reject(struct iavf_adapter *adapter)
> + */
> +void iavf_mac_add_reject(struct iavf_adapter *adapter)
>  {
>  	struct net_device *netdev = adapter->netdev;
>  	struct iavf_mac_filter *f, *ftmp;

[Severity: Medium]
Does exporting and reusing iavf_mac_add_reject() for local rollback cause
unintended background MAC changes?

When the user requests a MAC address that is already in the filter list
(e.g., a secondary MAC), iavf_add_filter() sets f->is_new_mac = false and
f->add = true, and flags IAVF_FLAG_AQ_ADD_MAC_FILTER.

If the synchronous operation fails locally (like the -EBUSY return above),
iavf_set_mac() calls this function. However, iavf_mac_add_reject() only
clears the pending state and removes the filter if f->is_new_mac is true:

    if (f->is_new_mac) {
        list_del(&f->list);
        kfree(f);
    }

Reused filters are ignored, leaving f->add = true and the
IAVF_FLAG_AQ_ADD_MAC_FILTER flag active. The background watchdog task will
eventually process this flag and blindly send the MAC configuration to the
PF, even though the VF already aborted the operation locally.

^ permalink raw reply

* Re: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Ivan Vecera @ 2026-06-24 16:42 UTC (permalink / raw)
  To: Vadim Fedorenko, Kubalewski, Arkadiusz, Jiri Pirko,
	Jakub Kicinski
  Cc: netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
	Donald Hunter, Eric Dumazet, Schmidt, Michal, Paolo Abeni,
	Vaananen, Pasi, Oros, Petr, Prathosh Satish, Simon Horman,
	linux-kernel@vger.kernel.org
In-Reply-To: <0f8fe4e0-72d8-48a6-96ad-d1650919d2df@linux.dev>

On 6/24/26 5:57 PM, Vadim Fedorenko wrote:
> On 19/06/2026 18:07, Ivan Vecera wrote:
>> On 6/17/26 1:59 PM, Kubalewski, Arkadiusz wrote:
>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>> Sent: Monday, June 15, 2026 2:00 PM
>>>>
>>>> On 6/11/26 2:09 PM, Jiri Pirko wrote:
>>>>> Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>>>>>> On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>> Sent: Tuesday, June 9, 2026 4:59 PM
>>>>>>>>
>>>>>>>> On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>>>>> Sent: Tuesday, June 9, 2026 10:51 AM
>>>>>>>>>>
>>>>>>>>>> Mon, Jun 08, 2026 at 07:03:46PM +0200,
>>>>>>>>>> arkadiusz.kubalewski@intel.com
>>>>>>>>>> wrote:
>>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>>> Sent: Monday, June 8, 2026 5:48 PM
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>>>>> Sent: Sunday, May 31, 2026 9:44 PM ...
>>>>>>>>>>>>>>            -
>>>>>>>>>>>>>>              name: gnss
>>>>>>>>>>>>>>              doc: GNSS recovered clock
>>>>>>>>>>>>>> +      -
>>>>>>>>>>>>>> +        name: int-nco
>>>>>>>>>>>>>> +        doc: |
>>>>>>>>>>>>>> +          Device internal numerically controlled oscillator.
>>>>>>>>>>>>>> +          When connected as a DPLL input, the DPLL enters 
>>>>>>>>>>>>>> NCO
>>>>>>>>>>>>>> mode
>>>>>>>>>>>>>> +          where the output frequency is adjusted by the host
>>>>>>>>>>>>>> via
>>>>>>>>>>>>>> +          the PTP clock interface.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ivan!
>>>>>>>>>>>>>
>>>>>>>>>>>>> How would you control this in case of automatic mode dpll?
>>>>>>>>>>>>> Automatic mode DPLL shall be controlled on HW level, such pin
>>>>>>>>>>>>> brakes that rule and requires some driver magic to show it is
>>>>>>>>>>>>> higher priority then the rest of the pins?
>>>>>>>>>>>>
>>>>>>>>>>>> The NCO pin can be connected only in manual mode. In other 
>>>>>>>>>>>> words
>>>>>>>>>>>> a
>>>>>>>>>>>> DPLL in automatic mode cannot select NCO pin (switch to NCO 
>>>>>>>>>>>> mode)
>>>>>>>>>>>> by
>>>>>>>>>>>> its own.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Being picky on DPLL_MODE for enabling feature is not 
>>>>>>>>>>> something we
>>>>>>>>>>> can allow if it is not related to HW limitation, is it?
>>>>>>>>>>> Could you please elaborate why it is not possible for AUTOMATIC
>>>>>>>>>>> mode?
>>>>>>>>>>
>>>>>>>>>> In automatic mode, the pin selection logic is defined upon 
>>>>>>>>>> prio. I
>>>>>>>>>> can imagine that if NCO pin has the highest prio of the available
>>>>>>>>>> ones, it gets picked. I would be aligned 100% with automatic mode
>>>>>>>>>> behaviour.
>>>>>>>>>> Is there a real usecase for it?
>>>>>>>>>>
>>>>>>>>>> [..]
>>>>>>>>>
>>>>>>>>> This is not true. AUTOMATIC mode is HW solution, SW driver ONLY
>>>>>>>>> configures priorities on the inputs, not manages the active 
>>>>>>>>> inputs.
>>>>>>>>> This brakes that behavior, the SW driver would have to manually
>>>>>>>>> override the AUTMATIC mode to be fed from such NCO pin as it 
>>>>>>>>> doesn't
>>>>>>>>> exists on it's priority list, HW cannot pick or use it.
>>>>>>>>
>>>>>>>> Correct, AUTO mode is hardware feature and it should not be 
>>>>>>>> emulated
>>>>>>>> by a
>>>>>>>> driver. If the hardware does not support it then the switching
>>>>>>>> between
>>>>>>>> input references should be done by userspace (by monitoring ffo,
>>>>>>>> phase_offset, operstate).
>>>>>>>>
>>>>>>>
>>>>>>> Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>>>>>>> create
>>>>>>> such pin, which means that NCO pin would serve only a MANUAL mode
>>>>>>> implementation.
>>>>>>> Basically this is something we shall not allow to happen. DPLL API
>>>>>>> should be designed to cover the case where AUTO mode is able to
>>>>>>> implement
>>>>>>> all features consistently.
>>>>>>
>>>>>> If you don't like the proposal from Jiri (NCO switch driven by NCO 
>>>>>> pin
>>>>>> priority -> highest==enter_nco else leave_nco) then it could be
>>>>>> possible
>>>>>> to handle the switching by allowing the state 'connected' in AUTO 
>>>>>> mode
>>>>>> for the NCO pin type. Then the implementation will be the same for 
>>>>>> both
>>>>>> selection modes.
>>>>>>
>>>>>> Only difference would be that a user does not need to switch the 
>>>>>> device
>>>>> >from the AUTO to MANUAL mode.
>>>>>>
>>>>>>>>> The real use case is that any DPLL can switch the mode to this one
>>>>>>>>> instead of implementing MANUAL mode just to use the feature with a
>>>>>>>>> 'virtual' pin.
>>>>>>>>
>>>>>>>> I don't expect this... but it is up to a driver. I don't plan such
>>>>>>>> functionality in zl3073x as the NCO pin does not expose prio_get()
>>>>>>>> and
>>>>>>>> prio_set() callbacks - so it is clear that this pin cannot be 
>>>>>>>> part of
>>>>>>>> the
>>>>>>>> automatic selection.
>>>>>>>>
>>>>>>>> Ivan
>>>>>>>
>>>>>>> There is a difference between particular HW and API capabilities, 
>>>>>>> with
>>>>>>> the
>>>>>>> proposed API we would disallow the possibility of such 
>>>>>>> implementation
>>>>>>> for
>>>>>>> existing HW variants.
>>>>>>>
>>>>>>> DPLL NCO MODE would allow that but as pointed here by Ivan and by 
>>>>>>> Jiri
>>>>>>> in
>>>>>>> the other email it would also require the extra implementation for
>>>>>>> some
>>>>>>> configuration - device level phase/ffo handling.
>>>>>>>
>>>>>>> To summarize it all, I don't have such simple solution for it.
>>>>>>>
>>>>>>> First thing that comes to my mind is to combine both approaches.
>>>>>>> Make it possible for AUTMATIC mode to also set "CONNECTED" state
>>>>>>> on certain kind of "OVERRIDE" pins, where it could be determined by
>>>>>>> the type of PIN and embed that logic into the DPLL subsystem.
>>>>>>
>>>>>> The possible states for particual pins are now handled at a driver
>>>>>> level
>>>>>> so the driver decides if the requested state is correct or not. So it
>>>>>> could be easy to implement this.
>>>>>>
>>>>>> For auto mode allowed states:
>>>>>> - input references: selectable / disconnected
>>>>>> - nco pin: connected / disconnected
>>>>>>
>>>>>>> Basically, if driver registers such NCO pin it would be always
>>>>>>> selected
>>>>>>> manually, and in such case all the other pins are going to
>>>>>>> disconnected
>>>>>>> state while DPLL mode is also a "OVERRIDE" or something like it.
>>>>>>
>>>>>> I would leave this decision on the driver level... Imagine the
>>>>>> potential
>>>>>> HW that would allow to switch NCO mode if there is no valid input
>>>>>> reference.
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> REF0 (prio 0) -> +------+ -> OUT0
>>>>>> REF1 (prio 1) -> | DPLL | -> ...
>>>>>> NCO  (prio 2) -> +------+ -> OUTn
>>>>>>
>>>>>> Such HW would prefer REF0 or REF1 and lock to one of them if they are
>>>>>> qualified. But if they are NOT, then it switches to NCO mode.
>>>
>>> Now you said yourself "NCO mode" ... I agree that it would be a mode in
>>> that case. Where instead of running on regular/built in XO dpll would 
>>> run
>>> on NCO and user could select it, and this would be addition to regular
>>> behavior.
>>>
>>> I also agree that the pin approach might be better/easier to use, 
>>> assuming
>>> frequency offset for all the outputs given dpll drives, it makes more 
>>> sense
>>> to have it configurable on input side.
>>
>> +1
>>
>>>>>>
>>>>>> In this situation the relevant driver would allow to configure 
>>>>>> priority
>>>>>> and state 'selectable' for this NCO pin.
>>>>>>
>>>>>>> Perhaps the pin type could include OVERRIDE in it's name to make it
>>>>>>> less
>>>>>>> confusing and needs some extra documentation.
>>>>>>>
>>>>>>> Thoughts?
>>>>>> I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>>>>>> obvious that it is not a standard input reference.
>>>>>>
>>>>>> Jiri, Vadim, Arek, thoughts?
>>>>>
>>>>> I agree with you, the driver should have the flexibility to implement
>>>>> this according to his/hw's needs/capabilities. If it implements prio
>>>>> selection in AUTO mode, let it have it. If it implements manual NCO 
>>>>> pin
>>>>> selection in AUTO mode using connected/disconnected override, let it
>>>>> have it.
>>>
>>> I don't know 'current' HW that is capable of using AUTO mode as a 
>>> part of
>>> HW-based priority source selection and use such NCO input..
>>> But as already explained above, this is special mode of regular XO, 
>>> which
>>> allows DPLL's output frequency offset configuration.
>>
>> Lets keep this available for potential future HW. I can imagine a
>> situation where a user will prefer an automatic switch to NCO mode
>> if there is no qualified input reference - automatic switch means
>> that HW will support this (not emulated by the driver).
>>
>>>>>
>>>>> Moreover, I actually like the "override" capability for pins in AUTO
>>>>> mode in general. It may be handy for other usecases as well.
>>>>>
>>>> Arek? Vadim?
>>>>
>>>> Thanks,
>>>> Ivan
>>>
>>> Agree, 'override' capability of a pin would be the way to go for this 
>>> and
>>> other similar further cases.
>>>
>>> I believe a single approach on this would be best, I mean if AUTO mode
>>> needs a capability, to switch from regular behavior to 'OVERRIDE', and
>>> 'OVERRIDE' is only pin capability that allows such behavior for AUTO
>>> mode, then similar approach should be used on MANUAL mode, to make
>>> userspace know that such pin is always available to set "CONNECTED"
>>> and make the userspace implementation consistent on enabling it no 
>>> matter
>>> if AUTO or MANUAL mode dpll.
>>
>> Proposal:
>> 1) new pin capability
>>     - name: state-connected-override
>>     - doc: pin state can be changed to connected in any DPLL mode
>>
>> 2) new NCO pin type to switch the DPLL to NCO mode when connected
>>
>> 3) automatic-only DPLL
>>     - should expose NCO pin with state-connected-override capability
>>
>> 4) manual-only DPLL
>>    - does not need to expose NCO pin with state-connected-override cap
>>
>> 5) dual-mode DPLL (supporting mode switching)
>>    - if it exposes NCO pin with the override cap then it has to support
>>      switching to NCO mode directly from AUTO mode
>>    - if does not expose NCO pin with the override cap then a user MUST
>>      switch the DPLL mode from AUTO to MANUAL to be able to make NCO
>>      pin connected to the DPLL
> 
> I still don't see good reasoning for the pin. Even this sentence says
> "DPLL mode" which keeps me thinking whether we have to move it to a
> special DPLL mode. All these items look like overcomplication of a
> simple function of the device itself. DPLL can be either in the closed
> loop when one of the pins provides a signal to align to, or in the open
> loop meaning that software can control adjustments to phase/frequency.
> But it's definitely a property of the device, and it's not a pin in any
> kind...

Yes, at first glance, introducing a new DPLL mode would be easier and
simpler. But it depends on how you look at it.

What is NCO mode and how does it differ from other modes?

In other modes, the DPLL loop is controlled by the input signal that the
DPLL is locked to. Changes in frequency (FFO) and phase (phase offset)
affect the output signal(s) that are produced from this DPLL.

In NCO mode, the frequency and, consequently, phase changes are
determined by software (e.g. ptp4l userspace via the PTP API)... these
changes from userspace can be understood as an input signal that affects
the DPLL outputs.

In addition, we could use the existing DPLL pin attributes to control
the DPLL device in NCO mode.

I mean specifically:
* fractional-frequency-offset-ppt as a variant of .adjfine(), but with
   a higher resolution (adjfine has a resolution of 15.25 ppt)
* phase-adjust as a higher-resolution variant of .adjphase() (adjphase
   uses nanoseconds, whereas phase-adjust uses picoseconds)

So:
'dpll pin set id <nco-pin> fractional-frequency-offset-ppt X'
  corresponds to PTP's adjfine(X-as-scaled-ppm)
'dpll pin set id <nco-pin> phase-adjust Y' corresponds to PTP's
  adjphase(Y/1000)

So in short, the NCO pin represents a software "signal" from userspace.

If we were to go the special DPLL mode route and want to implement this
functionality in the DPLL subsystem in the future, we would have to add
extra attributes (ffo, phase) to dpll_device.

WDYT?

Thanks,
Ivan


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox