Date: Wed, 21 Jan 2026 21:18:34 -0800
From: Bobby Eshleman
To: Mina Almasry
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, Neal Cardwell, David Ahern, Arnd Bergmann, Jonathan Corbet, Andrew Lunn, Shuah Khan, Donald Hunter, Stanislav Fomichev, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, asml.silence@gmail.com, matttbe@kernel.org, skhawaja@google.com, Bobby Eshleman
Subject: Re: [PATCH net-next v10 3/5] net: devmem: implement autorelease token management
References: <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-0-686d0af71978@meta.com> <20260115-scratch-bobbyeshleman-devmem-tcp-token-upstream-v10-3-686d0af71978@meta.com>
X-Mailing-List: netdev@vger.kernel.org

On Wed, Jan 21, 2026 at 08:15:23PM -0800, Mina Almasry wrote: > On Thu, Jan 15, 2026 at 9:03 PM Bobby Eshleman wrote: > > > > From: Bobby Eshleman > > > > Add support for autorelease toggling of tokens using a static branch to > > control system-wide behavior. This allows applications to choose between > > two memory management modes: > > > > 1. Autorelease on: Leaked tokens are automatically released when the > > socket closes. > > > > 2. Autorelease off: Leaked tokens are released during dmabuf unbind. > > > > The autorelease mode is requested via the NETDEV_A_DMABUF_AUTORELEASE > > attribute of the NETDEV_CMD_BIND_RX message. Having separate modes per > > binding is disallowed and is rejected by netlink. The system will be > > "locked" into the mode that the first binding is set to. It can only be > > changed again once there are zero bindings on the system. > > > > Disabling autorelease offers ~13% improvement in CPU utilization. > > > > Static branching is used to limit the system to one mode or the other. > > > > The xa_erase(&net_devmem_dmabuf_bindings, ...) call is moved into > > __net_devmem_dmabuf_binding_free(...). The result is that it becomes > > possible to switch static branches atomically with regards to xarray > > state. In the time window between unbind and free the socket layer can > > still find the binding in the xarray, but it will fail to acquire > > binding->ref (if unbind decremented to zero). This change preserves > > correct behavior and allows us to avoid more complicated counting > > schemes for bindings.
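(Aside for readers skimming the diff below -- this is just the generic static-key pattern the commit message refers to, an illustration with made-up "example_" names rather than code from the patch. Flipping a static branch patches kernel text and may sleep, which is why the toggle in this series sits under a mutex rather than a spinlock.)

#include <linux/jump_label.h>

DEFINE_STATIC_KEY_FALSE(example_mode_key);

static void example_set_mode(bool on)
{
	/* may sleep: must not be called under a spinlock */
	if (on)
		static_branch_enable(&example_mode_key);
	else
		static_branch_disable(&example_mode_key);
}

static bool example_mode_enabled(void)
{
	return static_branch_unlikely(&example_mode_key);
}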
> > > > Signed-off-by: Bobby Eshleman > > --- > > Changes in v10: > > - add binding->users to track socket and rxq users of binding, defer > > release of urefs until binding->users hits zero to guard users from > > incrementing urefs *after* net_devmem_dmabuf_binding_put_urefs() > > is called. (Mina) > > - fix error failing to restore static key state when xarray alloc fails > > (Jakub) > > - add wrappers for setting/unsetting mode that captures the static key + > > rx binding count logic. > > - move xa_erase() into __net_devmem_dmabuf_binding_free() > > - remove net_devmem_rx_bindings_count, change xarray management to be > > to avoid the same race as net_devmem_rx_bindings_count did > > - check return of net_devmem_dmabuf_binding_get() in > > tcp_recvmsg_dmabuf() > > - move sk_devmem_info.binding fiddling into autorelease=off static path > > > > Changes in v9: > > - Add missing stub for net_devmem_dmabuf_binding_get() when NET_DEVMEM=n > > - Add wrapper around tcp_devmem_ar_key accesses so that it may be > > stubbed out when NET_DEVMEM=n > > - only dec rx binding count for rx bindings in free (v8 did not exclude > > TX bindings) > > > > Changes in v8: > > - Only reset static key when bindings go to zero, defaulting back to > > disabled (Stan). > > - Fix bad usage of xarray spinlock for sleepy static branch switching, > > use mutex instead. > > - Access pp_ref_count via niov->desc instead of niov directly. > > - Move reset of static key to __net_devmem_dmabuf_binding_free() so that > > the static key can not be changed while there are outstanding tokens > > (free is only called when reference count reaches zero). > > - Add net_devmem_dmabuf_rx_bindings_count because tokens may be active > > even after xa_erase(), so static key changes must wait until all > > RX bindings are finally freed (not just when xarray is empty). A > > counter is a simple way to track this. > > - socket takes reference on the binding, to avoid use-after-free on > > sk_devmem_info.binding in the case that user releases all tokens, > > unbinds, then issues SO_DEVMEM_DONTNEED again (with bad token). 
> > - removed some comments that were unnecessary > > > > Changes in v7: > > - implement autorelease with static branch (Stan) > > - use netlink instead of sockopt (Stan) > > - merge uAPI and implementation patches into one patch (seemed less > > confusing) > > > > Changes in v6: > > - remove sk_devmem_info.autorelease, using binding->autorelease instead > > - move binding->autorelease check to outside of > > net_devmem_dmabuf_binding_put_urefs() (Mina) > > - remove overly defensive net_is_devmem_iov() (Mina) > > - add comment about multiple urefs mapping to a single netmem ref (Mina) > > - remove overly defensive netmem NULL and netmem_is_net_iov checks (Mina) > > - use niov without casting back and forth with netmem (Mina) > > - move the autorelease flag from per-binding to per-socket (Mina) > > - remove the batching logic in sock_devmem_dontneed_manual_release() > > (Mina) > > - move autorelease check inside tcp_xa_pool_commit() (Mina) > > - remove single-binding restriction for autorelease mode (Mina) > > - unbind always checks for leaked urefs > > > > Changes in v5: > > - remove unused variables > > - introduce autorelease flag, preparing for future patch to toggle new > > behavior > > > > Changes in v3: > > - make urefs per-binding instead of per-socket, reducing memory > > footprint > > - fallback to cleaning up references in dmabuf unbind if socket leaked > > tokens > > - drop ethtool patch > > > > Changes in v2: > > - always use GFP_ZERO for binding->vec (Mina) > > - remove WARN for changed binding (Mina) > > - remove extraneous binding ref get (Mina) > > - remove WARNs on invalid user input (Mina) > > - pre-assign niovs in binding->vec for RX case (Mina) > > - use atomic_set(, 0) to initialize sk_user_frags.urefs > > - fix length of alloc for urefs > > --- > > Documentation/netlink/specs/netdev.yaml | 12 +++ > > include/net/netmem.h | 1 + > > include/net/sock.h | 7 +- > > include/uapi/linux/netdev.h | 1 + > > net/core/devmem.c | 136 +++++++++++++++++++++++++++----- > > net/core/devmem.h | 64 ++++++++++++++- > > net/core/netdev-genl-gen.c | 5 +- > > net/core/netdev-genl.c | 10 ++- > > net/core/sock.c | 57 +++++++++++-- > > net/ipv4/tcp.c | 87 ++++++++++++++++---- > > net/ipv4/tcp_ipv4.c | 15 +++- > > net/ipv4/tcp_minisocks.c | 3 +- > > tools/include/uapi/linux/netdev.h | 1 + > > 13 files changed, 345 insertions(+), 54 deletions(-) > > > > diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml > > index 596c306ce52b..a5301b150663 100644 > > --- a/Documentation/netlink/specs/netdev.yaml > > +++ b/Documentation/netlink/specs/netdev.yaml > > @@ -562,6 +562,17 @@ attribute-sets: > > type: u32 > > checks: > > min: 1 > > + - > > + name: autorelease > > + doc: | > > + Token autorelease mode. If true (1), leaked tokens are automatically > > + released when the socket closes. If false (0), leaked tokens are only > > + released when the dmabuf is torn down. Once a binding is created with > > + a specific mode, all subsequent bindings system-wide must use the > > + same mode. > > + > > + Optional. Defaults to false if not specified. > > Ooof. Defaults to false if not specified. I thought we wanted to default to the optimization on? > > My anxiety with this patch is that running an actual ML training job > involves many layers of middleware where we may not be able to enforce > the "must dontneed before closing the socket" requirement. I'm > curious if you have ML jobs or NCCL tests running on this and passing? I can run NCCL on it, but haven't on this rev.
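(To make that requirement concrete -- a rough userspace sketch against the existing devmem TCP uAPI, not code from this series: with autorelease off, every token handed out via the SCM_DEVMEM_DMABUF cmsg has to be handed back with SO_DEVMEM_DONTNEED before close(), otherwise the pages stay pinned until the dmabuf is unbound.)

#include <sys/socket.h>
#include <linux/types.h>
#include <linux/uio.h>		/* struct dmabuf_token */

#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED 80	/* see include/uapi/asm-generic/socket.h */
#endif

/* frag_token comes from the struct dmabuf_cmsg returned by recvmsg() */
static int return_frag_token(int sock_fd, __u32 frag_token)
{
	struct dmabuf_token tok = {
		.token_start = frag_token,
		.token_count = 1,
	};

	return setsockopt(sock_fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
			  &tok, sizeof(tok));
}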
> > > + type: u8 > > > > operations: > > list: > > @@ -769,6 +780,7 @@ operations: > > - ifindex > > - fd > > - queues > > + - autorelease > > reply: > > attributes: > > - id > > diff --git a/include/net/netmem.h b/include/net/netmem.h > > index 9e10f4ac50c3..80d2263ba4ed 100644 > > --- a/include/net/netmem.h > > +++ b/include/net/netmem.h > > @@ -112,6 +112,7 @@ struct net_iov { > > }; > > struct net_iov_area *owner; > > enum net_iov_type type; > > + atomic_t uref; > > }; > > > > struct net_iov_area { > > diff --git a/include/net/sock.h b/include/net/sock.h > > index aafe8bdb2c0f..9d3d5bde15e9 100644 > > --- a/include/net/sock.h > > +++ b/include/net/sock.h > > @@ -352,7 +352,7 @@ struct sk_filter; > > * @sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS > > * @sk_scm_unused: unused flags for scm_recv() > > * @ns_tracker: tracker for netns reference > > - * @sk_user_frags: xarray of pages the user is holding a reference on. > > + * @sk_devmem_info: the devmem binding information for the socket > > * @sk_owner: reference to the real owner of the socket that calls > > * sock_lock_init_class_and_name(). > > */ > > @@ -584,7 +584,10 @@ struct sock { > > struct numa_drop_counters *sk_drop_counters; > > struct rcu_head sk_rcu; > > netns_tracker ns_tracker; > > - struct xarray sk_user_frags; > > + struct { > > + struct xarray frags; > > + struct net_devmem_dmabuf_binding *binding; > > + } sk_devmem_info; > > > > #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES) > > struct module *sk_owner; > > diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h > > index e0b579a1df4f..1e5c209cb998 100644 > > --- a/include/uapi/linux/netdev.h > > +++ b/include/uapi/linux/netdev.h > > @@ -207,6 +207,7 @@ enum { > > NETDEV_A_DMABUF_QUEUES, > > NETDEV_A_DMABUF_FD, > > NETDEV_A_DMABUF_ID, > > + NETDEV_A_DMABUF_AUTORELEASE, > > > > __NETDEV_A_DMABUF_MAX, > > NETDEV_A_DMABUF_MAX = (__NETDEV_A_DMABUF_MAX - 1) > > diff --git a/net/core/devmem.c b/net/core/devmem.c > > index 9dee697a28ee..1264d8ee40e3 100644 > > --- a/net/core/devmem.c > > +++ b/net/core/devmem.c > > @@ -11,6 +11,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -27,6 +28,9 @@ > > /* Device memory support */ > > > > static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1); > > +static DEFINE_MUTEX(devmem_ar_lock); > > +DEFINE_STATIC_KEY_FALSE(tcp_devmem_ar_key); > > +EXPORT_SYMBOL(tcp_devmem_ar_key); > > > > static const struct memory_provider_ops dmabuf_devmem_ops; > > > > @@ -63,12 +67,71 @@ static void net_devmem_dmabuf_binding_release(struct percpu_ref *ref) > > schedule_work(&binding->unbind_w); > > } > > > > +static bool net_devmem_has_rx_bindings(void) > > +{ > > + struct net_devmem_dmabuf_binding *binding; > > + unsigned long index; > > + > > + lockdep_assert_held(&devmem_ar_lock); > > + > > + xa_for_each(&net_devmem_dmabuf_bindings, index, binding) { > > + if (binding->direction == DMA_FROM_DEVICE) > > + return true; > > + } > > + return false; > > +} > > + > > +/* caller must hold devmem_ar_lock */ > > +static int > > +__net_devmem_dmabuf_binding_set_mode(enum dma_data_direction direction, > > + bool autorelease) > > +{ > > + lockdep_assert_held(&devmem_ar_lock); > > + > > + if (direction != DMA_FROM_DEVICE) > > + return 0; > > + > > + if (net_devmem_has_rx_bindings() && > > + static_key_enabled(&tcp_devmem_ar_key) != autorelease) > > + return -EBUSY; > > + > > + if (autorelease) > > + static_branch_enable(&tcp_devmem_ar_key); > > + 
> > + return 0; > > +} > > + > > +/* caller must hold devmem_ar_lock */ > > +static void > > +__net_devmem_dmabuf_binding_unset_mode(enum dma_data_direction direction) > > +{ > > + lockdep_assert_held(&devmem_ar_lock); > > + > > + if (direction != DMA_FROM_DEVICE) > > + return; > > + > > + if (net_devmem_has_rx_bindings()) > > + return; > > + > > + static_branch_disable(&tcp_devmem_ar_key); > > +} > > + > > void __net_devmem_dmabuf_binding_free(struct work_struct *wq) > > { > > struct net_devmem_dmabuf_binding *binding = container_of(wq, typeof(*binding), unbind_w); > > > > size_t size, avail; > > > > + mutex_lock(&devmem_ar_lock); > > + xa_erase(&net_devmem_dmabuf_bindings, binding->id); > > + __net_devmem_dmabuf_binding_unset_mode(binding->direction); > > + mutex_unlock(&devmem_ar_lock); > > + > > + /* Ensure no tx net_devmem_lookup_dmabuf() are in flight after the > > + * erase. > > + */ > > + synchronize_net(); > > + > > I'm sorry I could not wrap my head around moving this block of code to > the _free(), even though you mention it in the commit message. Why does > removing the binding from net_devmem_dmabuf_bindings on unbind not > work for you? If the binding is not active on any rx queues then there > cannot be new data received on it. Unless I'm missing something it > should be fine to leave this where it is. Hmmm... actually now that recvmsg gets a binding->ref, the race I was trying to avoid can't happen (that is, dontneed using the wrong static branch after unbind but before free). Thanks for catching that. > > > > gen_pool_for_each_chunk(binding->chunk_pool, > > net_devmem_dmabuf_free_chunk_owner, NULL); > > > > @@ -126,19 +189,30 @@ void net_devmem_free_dmabuf(struct net_iov *niov) > > gen_pool_free(binding->chunk_pool, dma_addr, PAGE_SIZE); > > } > > > > +void > > +net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *binding) > > +{ > > + int i; > > + > > + for (i = 0; i < binding->dmabuf->size / PAGE_SIZE; i++) { > > + struct net_iov *niov; > > + netmem_ref netmem; > > + > > + niov = binding->vec[i]; > > + netmem = net_iov_to_netmem(niov); > > + > > + /* Multiple urefs map to only a single netmem ref. */ > > + if (atomic_xchg(&niov->uref, 0) > 0) > > + WARN_ON_ONCE(!napi_pp_put_page(netmem)); > > + } > > +} > > + > > void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding) > > { > > struct netdev_rx_queue *rxq; > > unsigned long xa_idx; > > unsigned int rxq_idx; > > > > - xa_erase(&net_devmem_dmabuf_bindings, binding->id); > > - > > - /* Ensure no tx net_devmem_lookup_dmabuf() are in flight after the > > - * erase. > > - */ > > - synchronize_net(); > > - > > if (binding->list.next) > > list_del(&binding->list); > > > > @@ -151,6 +225,8 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding) > > rxq_idx = get_netdev_rx_queue_index(rxq); > > > > __net_mp_close_rxq(binding->dev, rxq_idx, &mp_params); > > + > > + net_devmem_dmabuf_binding_user_put(binding); > > } > > > > percpu_ref_kill(&binding->ref); > > @@ -178,6 +254,8 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, > > if (err) > > goto err_close_rxq; > > > > + atomic_inc(&binding->users); > > + > > nit: feels wrong that we have _binding_user_put() but open code the > get(). Either open code both or helper both, maybe? I had the same thought, but the issue is that for the first queue atomic_inc_not_zero() (the helper) will fail.
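(Spelling that out -- an illustration of the asymmetry, not code from the patch: binding->users starts at 0, so the rx queues taken at bind time have to bump it with a plain atomic_inc(), while later users -- sockets in recvmsg -- go through the inc-not-zero helper so they can never revive a binding whose user count already dropped to zero and whose urefs were drained. If the open-coded inc reads badly, a *_user_get_first()-style wrapper, name hypothetical, around the plain inc would restore the symmetry.)

#include <linux/atomic.h>

/* bind path: bootstrap works even while the count is still 0 */
static void example_user_get_first(atomic_t *users)
{
	atomic_inc(users);
}

/* socket path: refuses once the count has already hit 0 */
static bool example_user_get_later(atomic_t *users)
{
	return atomic_inc_not_zero(users);
}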
> > > return 0; > > > > err_close_rxq: > > @@ -189,8 +267,10 @@ struct net_devmem_dmabuf_binding * > > net_devmem_bind_dmabuf(struct net_device *dev, > > struct device *dma_dev, > > enum dma_data_direction direction, > > - unsigned int dmabuf_fd, struct netdev_nl_sock *priv, > > - struct netlink_ext_ack *extack) > > + unsigned int dmabuf_fd, > > + struct netdev_nl_sock *priv, > > + struct netlink_ext_ack *extack, > > + bool autorelease) > > { > > struct net_devmem_dmabuf_binding *binding; > > static u32 id_alloc_next; > > @@ -225,6 +305,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > if (err < 0) > > goto err_free_binding; > > > > + atomic_set(&binding->users, 0); > > + > > mutex_init(&binding->lock); > > > > binding->dmabuf = dmabuf; > > @@ -245,14 +327,12 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > goto err_detach; > > } > > > > - if (direction == DMA_TO_DEVICE) { > > - binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE, > > - sizeof(struct net_iov *), > > - GFP_KERNEL); > > - if (!binding->vec) { > > - err = -ENOMEM; > > - goto err_unmap; > > - } > > + binding->vec = kvmalloc_array(dmabuf->size / PAGE_SIZE, > > + sizeof(struct net_iov *), > > + GFP_KERNEL | __GFP_ZERO); > > + if (!binding->vec) { > > + err = -ENOMEM; > > + goto err_unmap; > > } > > > > /* For simplicity we expect to make PAGE_SIZE allocations, but the > > @@ -306,25 +386,41 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > niov = &owner->area.niovs[i]; > > niov->type = NET_IOV_DMABUF; > > niov->owner = &owner->area; > > + atomic_set(&niov->uref, 0); > > page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), > > net_devmem_get_dma_addr(niov)); > > - if (direction == DMA_TO_DEVICE) > > - binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov; > > + binding->vec[owner->area.base_virtual / PAGE_SIZE + i] = niov; > > } > > > > virtual += len; > > } > > > > + mutex_lock(&devmem_ar_lock); > > + > > + err = __net_devmem_dmabuf_binding_set_mode(direction, autorelease); > > + if (err < 0) { > > + NL_SET_ERR_MSG_FMT(extack, > > + "System already configured with autorelease=%d", > > + static_key_enabled(&tcp_devmem_ar_key)); > > + goto err_unlock_mutex; > > + } > > + > > Unless I've misread something, this looks very incorrect. TX bindings > will accidentally set the system to autorelease=false mode, no? You > need to make sure you set the mode for rx-bindings only, right? Maybe the context you missed is that __net_devmem_dmabuf_binding_set_mode() no-ops unless direction is DMA_FROM_DEVICE, i.e. it does nothing for TX bindings?
> > > err = xa_alloc_cyclic(&net_devmem_dmabuf_bindings, &binding->id, > > binding, xa_limit_32b, &id_alloc_next, > > GFP_KERNEL); > > if (err < 0) > > - goto err_free_chunks; > > + goto err_unset_mode; > > + > > + mutex_unlock(&devmem_ar_lock); > > > > list_add(&binding->list, &priv->bindings); > > > > return binding; > > > > +err_unset_mode: > > + __net_devmem_dmabuf_binding_unset_mode(direction); > > +err_unlock_mutex: > > + mutex_unlock(&devmem_ar_lock); > > err_free_chunks: > > gen_pool_for_each_chunk(binding->chunk_pool, > > net_devmem_dmabuf_free_chunk_owner, NULL); > > diff --git a/net/core/devmem.h b/net/core/devmem.h > > index 94874b323520..284f0ad5f381 100644 > > --- a/net/core/devmem.h > > +++ b/net/core/devmem.h > > @@ -12,9 +12,13 @@ > > > > #include > > #include > > +#include > > > > struct netlink_ext_ack; > > > > +/* static key for TCP devmem autorelease */ > > +extern struct static_key_false tcp_devmem_ar_key; > > + > > struct net_devmem_dmabuf_binding { > > struct dma_buf *dmabuf; > > struct dma_buf_attachment *attachment; > > @@ -43,6 +47,12 @@ struct net_devmem_dmabuf_binding { > > */ > > struct percpu_ref ref; > > > > + /* Counts sockets and rxqs that are using the binding. When this > > + * reaches zero, all urefs are drained and new sockets cannot join the > > + * binding. > > + */ > > + atomic_t users; > > + > > /* The list of bindings currently active. Used for netlink to notify us > > * of the user dropping the bind. > > */ > > @@ -61,7 +71,7 @@ struct net_devmem_dmabuf_binding { > > > > /* Array of net_iov pointers for this binding, sorted by virtual > > * address. This array is convenient to map the virtual addresses to > > - * net_iovs in the TX path. > > + * net_iovs. > > */ > > struct net_iov **vec; > > > > @@ -88,7 +98,7 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > struct device *dma_dev, > > enum dma_data_direction direction, > > unsigned int dmabuf_fd, struct netdev_nl_sock *priv, > > - struct netlink_ext_ack *extack); > > + struct netlink_ext_ack *extack, bool autorelease); > > struct net_devmem_dmabuf_binding *net_devmem_lookup_dmabuf(u32 id); > > void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding); > > int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx, > > @@ -134,6 +144,26 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding) > > percpu_ref_put(&binding->ref); > > } > > > > +void net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *binding); > > + > > +static inline bool > > +net_devmem_dmabuf_binding_user_get(struct net_devmem_dmabuf_binding *binding) > > +{ > > + return atomic_inc_not_zero(&binding->users); > > +} > > + > > +static inline void > > +net_devmem_dmabuf_binding_user_put(struct net_devmem_dmabuf_binding *binding) > > +{ > > + if (atomic_dec_and_test(&binding->users)) > > + net_devmem_dmabuf_binding_put_urefs(binding); > > +} > > + > > +static inline bool net_devmem_autorelease_enabled(void) > > +{ > > + return static_branch_unlikely(&tcp_devmem_ar_key); > > +} > > + > > void net_devmem_get_net_iov(struct net_iov *niov); > > void net_devmem_put_net_iov(struct net_iov *niov); > > > > @@ -151,11 +181,38 @@ net_devmem_get_niov_at(struct net_devmem_dmabuf_binding *binding, size_t addr, > > #else > > struct net_devmem_dmabuf_binding; > > > > +static inline bool > > +net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding) > > +{ > > + return false; > > +} > > + > > static inline void > > net_devmem_dmabuf_binding_put(struct 
net_devmem_dmabuf_binding *binding) > > { > > } > > > > +static inline void > > +net_devmem_dmabuf_binding_put_urefs(struct net_devmem_dmabuf_binding *binding) > > +{ > > +} > > + > > +static inline bool > > +net_devmem_dmabuf_binding_user_get(struct net_devmem_dmabuf_binding *binding) > > +{ > > + return false; > > +} > > + > > +static inline void > > +net_devmem_dmabuf_binding_user_put(struct net_devmem_dmabuf_binding *binding) > > +{ > > +} > > + > > +static inline bool net_devmem_autorelease_enabled(void) > > +{ > > + return false; > > +} > > + > > static inline void net_devmem_get_net_iov(struct net_iov *niov) > > { > > } > > @@ -170,7 +227,8 @@ net_devmem_bind_dmabuf(struct net_device *dev, > > enum dma_data_direction direction, > > unsigned int dmabuf_fd, > > struct netdev_nl_sock *priv, > > - struct netlink_ext_ack *extack) > > + struct netlink_ext_ack *extack, > > + bool autorelease) > > { > > return ERR_PTR(-EOPNOTSUPP); > > } > > diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c > > index ba673e81716f..01b7765e11ec 100644 > > --- a/net/core/netdev-genl-gen.c > > +++ b/net/core/netdev-genl-gen.c > > @@ -86,10 +86,11 @@ static const struct nla_policy netdev_qstats_get_nl_policy[NETDEV_A_QSTATS_SCOPE > > }; > > > > /* NETDEV_CMD_BIND_RX - do */ > > -static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1] = { > > +static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_AUTORELEASE + 1] = { > > [NETDEV_A_DMABUF_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1), > > [NETDEV_A_DMABUF_FD] = { .type = NLA_U32, }, > > [NETDEV_A_DMABUF_QUEUES] = NLA_POLICY_NESTED(netdev_queue_id_nl_policy), > > + [NETDEV_A_DMABUF_AUTORELEASE] = { .type = NLA_U8, }, > > This patch in the series is proving complicated to review. Any changes > you can refactor out of it for ease of review would be very welcome. > Things like the variable renames, netlink api changes, and any > refactors can be pulled into their own patches would ease review of > the core functionality. 
> > > }; > > > > /* NETDEV_CMD_NAPI_SET - do */ > > @@ -188,7 +189,7 @@ static const struct genl_split_ops netdev_nl_ops[] = { > > .cmd = NETDEV_CMD_BIND_RX, > > .doit = netdev_nl_bind_rx_doit, > > .policy = netdev_bind_rx_nl_policy, > > - .maxattr = NETDEV_A_DMABUF_FD, > > + .maxattr = NETDEV_A_DMABUF_AUTORELEASE, > > .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, > > }, > > { > > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c > > index 470fabbeacd9..c742bb34865e 100644 > > --- a/net/core/netdev-genl.c > > +++ b/net/core/netdev-genl.c > > @@ -939,6 +939,7 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info) > > struct netdev_nl_sock *priv; > > struct net_device *netdev; > > unsigned long *rxq_bitmap; > > + bool autorelease = false; > > struct device *dma_dev; > > struct sk_buff *rsp; > > int err = 0; > > @@ -952,6 +953,10 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info) > > ifindex = nla_get_u32(info->attrs[NETDEV_A_DEV_IFINDEX]); > > dmabuf_fd = nla_get_u32(info->attrs[NETDEV_A_DMABUF_FD]); > > > > + if (info->attrs[NETDEV_A_DMABUF_AUTORELEASE]) > > + autorelease = > > + !!nla_get_u8(info->attrs[NETDEV_A_DMABUF_AUTORELEASE]); > > + > > priv = genl_sk_priv_get(&netdev_nl_family, NETLINK_CB(skb).sk); > > if (IS_ERR(priv)) > > return PTR_ERR(priv); > > @@ -1002,7 +1007,8 @@ int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info) > > } > > > > binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_FROM_DEVICE, > > - dmabuf_fd, priv, info->extack); > > + dmabuf_fd, priv, info->extack, > > + autorelease); > > if (IS_ERR(binding)) { > > err = PTR_ERR(binding); > > goto err_rxq_bitmap; > > @@ -1097,7 +1103,7 @@ int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info) > > > > dma_dev = netdev_queue_get_dma_dev(netdev, 0); > > binding = net_devmem_bind_dmabuf(netdev, dma_dev, DMA_TO_DEVICE, > > - dmabuf_fd, priv, info->extack); > > + dmabuf_fd, priv, info->extack, false); > > if (IS_ERR(binding)) { > > err = PTR_ERR(binding); > > goto err_unlock_netdev; > > diff --git a/net/core/sock.c b/net/core/sock.c > > index f6526f43aa6e..6355c2ccfb8a 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -87,6 +87,7 @@ > > > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -151,6 +152,7 @@ > > #include > > > > #include "dev.h" > > +#include "devmem.h" > > > > static DEFINE_MUTEX(proto_list_mutex); > > static LIST_HEAD(proto_list); > > @@ -1081,6 +1083,44 @@ static int sock_reserve_memory(struct sock *sk, int bytes) > > #define MAX_DONTNEED_TOKENS 128 > > #define MAX_DONTNEED_FRAGS 1024 > > > > +static noinline_for_stack int > > +sock_devmem_dontneed_manual_release(struct sock *sk, > > + struct dmabuf_token *tokens, > > + unsigned int num_tokens) > > +{ > > + struct net_iov *niov; > > + unsigned int i, j; > > + netmem_ref netmem; > > + unsigned int token; > > + int num_frags = 0; > > + int ret = 0; > > + > > + if (!sk->sk_devmem_info.binding) > > + return -EINVAL; > > + > > + for (i = 0; i < num_tokens; i++) { > > + for (j = 0; j < tokens[i].token_count; j++) { > > + size_t size = sk->sk_devmem_info.binding->dmabuf->size; > > + > > + token = tokens[i].token_start + j; > > + if (token >= size / PAGE_SIZE) > > + break; > > + > > + if (++num_frags > MAX_DONTNEED_FRAGS) > > + return ret; > > + > > + niov = sk->sk_devmem_info.binding->vec[token]; > > + if (atomic_dec_and_test(&niov->uref)) { > > + netmem = net_iov_to_netmem(niov); > > + WARN_ON_ONCE(!napi_pp_put_page(netmem)); > > 
+ } > > + ret++; > > + } > > + } > > + > > + return ret; > > +} > > + > > static noinline_for_stack int > > sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens, > > unsigned int num_tokens) > > @@ -1089,32 +1129,33 @@ sock_devmem_dontneed_autorelease(struct sock *sk, struct dmabuf_token *tokens, > > int ret = 0, num_frags = 0; > > netmem_ref netmems[16]; > > > > - xa_lock_bh(&sk->sk_user_frags); > > + xa_lock_bh(&sk->sk_devmem_info.frags); > > for (i = 0; i < num_tokens; i++) { > > for (j = 0; j < tokens[i].token_count; j++) { > > if (++num_frags > MAX_DONTNEED_FRAGS) > > goto frag_limit_reached; > > > > netmem_ref netmem = (__force netmem_ref)__xa_erase( > > - &sk->sk_user_frags, tokens[i].token_start + j); > > + &sk->sk_devmem_info.frags, > > + tokens[i].token_start + j); > > > > if (!netmem || WARN_ON_ONCE(!netmem_is_net_iov(netmem))) > > continue; > > > > netmems[netmem_num++] = netmem; > > if (netmem_num == ARRAY_SIZE(netmems)) { > > - xa_unlock_bh(&sk->sk_user_frags); > > + xa_unlock_bh(&sk->sk_devmem_info.frags); > > for (k = 0; k < netmem_num; k++) > > WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); > > netmem_num = 0; > > - xa_lock_bh(&sk->sk_user_frags); > > + xa_lock_bh(&sk->sk_devmem_info.frags); > > } > > ret++; > > } > > } > > > > frag_limit_reached: > > - xa_unlock_bh(&sk->sk_user_frags); > > + xa_unlock_bh(&sk->sk_devmem_info.frags); > > for (k = 0; k < netmem_num; k++) > > WARN_ON_ONCE(!napi_pp_put_page(netmems[k])); > > > > @@ -1145,7 +1186,11 @@ sock_devmem_dontneed(struct sock *sk, sockptr_t optval, unsigned int optlen) > > return -EFAULT; > > } > > > > - ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens); > > + if (net_devmem_autorelease_enabled()) > > + ret = sock_devmem_dontneed_autorelease(sk, tokens, num_tokens); > > + else > > + ret = sock_devmem_dontneed_manual_release(sk, tokens, > > + num_tokens); > > > > kvfree(tokens); > > return ret; > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index d5319ebe2452..73a577bd8765 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -260,6 +260,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -492,7 +493,8 @@ void tcp_init_sock(struct sock *sk) > > > > set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags); > > sk_sockets_allocated_inc(sk); > > - xa_init_flags(&sk->sk_user_frags, XA_FLAGS_ALLOC1); > > + xa_init_flags(&sk->sk_devmem_info.frags, XA_FLAGS_ALLOC1); > > + sk->sk_devmem_info.binding = NULL; > > } > > EXPORT_IPV6_MOD(tcp_init_sock); > > > > @@ -2424,11 +2426,12 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p) > > > > /* Commit part that has been copied to user space. */ > > for (i = 0; i < p->idx; i++) > > - __xa_cmpxchg(&sk->sk_user_frags, p->tokens[i], XA_ZERO_ENTRY, > > - (__force void *)p->netmems[i], GFP_KERNEL); > > + __xa_cmpxchg(&sk->sk_devmem_info.frags, p->tokens[i], > > + XA_ZERO_ENTRY, (__force void *)p->netmems[i], > > + GFP_KERNEL); > > /* Rollback what has been pre-allocated and is no longer needed. 
*/ > > for (; i < p->max; i++) > > - __xa_erase(&sk->sk_user_frags, p->tokens[i]); > > + __xa_erase(&sk->sk_devmem_info.frags, p->tokens[i]); > > > > p->max = 0; > > p->idx = 0; > > @@ -2436,14 +2439,17 @@ static void tcp_xa_pool_commit_locked(struct sock *sk, struct tcp_xa_pool *p) > > > > static void tcp_xa_pool_commit(struct sock *sk, struct tcp_xa_pool *p) > > { > > + if (!net_devmem_autorelease_enabled()) > > + return; > > + > > if (!p->max) > > return; > > > > - xa_lock_bh(&sk->sk_user_frags); > > + xa_lock_bh(&sk->sk_devmem_info.frags); > > > > tcp_xa_pool_commit_locked(sk, p); > > > > - xa_unlock_bh(&sk->sk_user_frags); > > + xa_unlock_bh(&sk->sk_devmem_info.frags); > > } > > > > static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p, > > @@ -2454,24 +2460,41 @@ static int tcp_xa_pool_refill(struct sock *sk, struct tcp_xa_pool *p, > > if (p->idx < p->max) > > return 0; > > > > - xa_lock_bh(&sk->sk_user_frags); > > + xa_lock_bh(&sk->sk_devmem_info.frags); > > > > tcp_xa_pool_commit_locked(sk, p); > > > > for (k = 0; k < max_frags; k++) { > > - err = __xa_alloc(&sk->sk_user_frags, &p->tokens[k], > > + err = __xa_alloc(&sk->sk_devmem_info.frags, &p->tokens[k], > > XA_ZERO_ENTRY, xa_limit_31b, GFP_KERNEL); > > if (err) > > break; > > } > > > > - xa_unlock_bh(&sk->sk_user_frags); > > + xa_unlock_bh(&sk->sk_devmem_info.frags); > > > > p->max = k; > > p->idx = 0; > > return k ? 0 : err; > > } > > > > +static void tcp_xa_pool_inc_pp_ref_count(struct tcp_xa_pool *tcp_xa_pool, > > + skb_frag_t *frag) > > +{ > > + struct net_iov *niov; > > + > > + niov = skb_frag_net_iov(frag); > > + > > + if (net_devmem_autorelease_enabled()) { > > + atomic_long_inc(&niov->desc.pp_ref_count); > > + tcp_xa_pool->netmems[tcp_xa_pool->idx++] = > > + skb_frag_netmem(frag); > > + } else { > > + if (atomic_inc_return(&niov->uref) == 1) > > + atomic_long_inc(&niov->desc.pp_ref_count); > > + } > > +} > > + > > /* On error, returns the -errno. On success, returns number of bytes sent to the > > * user. May not consume all of @remaining_len. 
> > */ > > @@ -2533,6 +2556,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb, > > * sequence of cmsg > > */ > > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { > > + struct net_devmem_dmabuf_binding *binding = NULL; > > skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; > > struct net_iov *niov; > > u64 frag_offset; > > @@ -2568,13 +2592,45 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb, > > start; > > dmabuf_cmsg.frag_offset = frag_offset; > > dmabuf_cmsg.frag_size = copy; > > - err = tcp_xa_pool_refill(sk, &tcp_xa_pool, > > - skb_shinfo(skb)->nr_frags - i); > > - if (err) > > - goto out; > > + > > + binding = net_devmem_iov_binding(niov); > > + > > + if (net_devmem_autorelease_enabled()) { > > + err = tcp_xa_pool_refill(sk, > > + &tcp_xa_pool, > > + skb_shinfo(skb)->nr_frags - i); > > + if (err) > > + goto out; > > + > > + dmabuf_cmsg.frag_token = > > + tcp_xa_pool.tokens[tcp_xa_pool.idx]; > > + } else { > > + if (!sk->sk_devmem_info.binding) { > > + if (!net_devmem_dmabuf_binding_user_get(binding)) { > > + err = -ENODEV; > > + goto out; > > + } > > + > > + if (!net_devmem_dmabuf_binding_get(binding)) { > > + net_devmem_dmabuf_binding_user_put(binding); > > + err = -ENODEV; > > + goto out; > > + } > > + > > + sk->sk_devmem_info.binding = binding; > > + } > > + > > + if (sk->sk_devmem_info.binding != binding) { > > + err = -EFAULT; > > + goto out; > > + } > > + > > + dmabuf_cmsg.frag_token = > > + net_iov_virtual_addr(niov) >> PAGE_SHIFT; > > + } > > + > > > > /* Will perform the exchange later */ > > - dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx]; > > dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov); > > > > offset += copy; > > @@ -2587,8 +2643,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb, > > if (err) > > goto out; > > > > - atomic_long_inc(&niov->desc.pp_ref_count); > > - tcp_xa_pool.netmems[tcp_xa_pool.idx++] = skb_frag_netmem(frag); > > + tcp_xa_pool_inc_pp_ref_count(&tcp_xa_pool, frag); > > > > sent += copy; > > > > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > > index f8a9596e8f4d..420e8c8ebf6d 100644 > > --- a/net/ipv4/tcp_ipv4.c > > +++ b/net/ipv4/tcp_ipv4.c > > @@ -89,6 +89,9 @@ > > > > #include > > > > +#include > > +#include "../core/devmem.h" > > + > > #include > > > > #ifdef CONFIG_TCP_MD5SIG > > @@ -2492,7 +2495,7 @@ static void tcp_release_user_frags(struct sock *sk) > > unsigned long index; > > void *netmem; > > > > - xa_for_each(&sk->sk_user_frags, index, netmem) > > + xa_for_each(&sk->sk_devmem_info.frags, index, netmem) > > WARN_ON_ONCE(!napi_pp_put_page((__force netmem_ref)netmem)); > > #endif > > } > > @@ -2503,7 +2506,15 @@ void tcp_v4_destroy_sock(struct sock *sk) > > > > tcp_release_user_frags(sk); > > > > - xa_destroy(&sk->sk_user_frags); > > + if (!net_devmem_autorelease_enabled() && sk->sk_devmem_info.binding) { > > + net_devmem_dmabuf_binding_user_put(sk->sk_devmem_info.binding); > > + net_devmem_dmabuf_binding_put(sk->sk_devmem_info.binding); > > + sk->sk_devmem_info.binding = NULL; > > + WARN_ONCE(!xa_empty(&sk->sk_devmem_info.frags), > > + "non-empty xarray discovered in autorelease off mode"); > > + } > > + > > + xa_destroy(&sk->sk_devmem_info.frags); > > > > trace_tcp_destroy_sock(sk); > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c > > index bd5462154f97..2aec977f5c12 100644 > > --- a/net/ipv4/tcp_minisocks.c > > +++ b/net/ipv4/tcp_minisocks.c > > @@ -662,7 +662,8 @@ struct sock 
*tcp_create_openreq_child(const struct sock *sk, > > > > __TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS); > > > > - xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1); > > + xa_init_flags(&newsk->sk_devmem_info.frags, XA_FLAGS_ALLOC1); > > + newsk->sk_devmem_info.binding = NULL; > > > > return newsk; > > } > > diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h > > index e0b579a1df4f..1e5c209cb998 100644 > > --- a/tools/include/uapi/linux/netdev.h > > +++ b/tools/include/uapi/linux/netdev.h > > @@ -207,6 +207,7 @@ enum { > > NETDEV_A_DMABUF_QUEUES, > > NETDEV_A_DMABUF_FD, > > NETDEV_A_DMABUF_ID, > > + NETDEV_A_DMABUF_AUTORELEASE, > > > > __NETDEV_A_DMABUF_MAX, > > NETDEV_A_DMABUF_MAX = (__NETDEV_A_DMABUF_MAX - 1) > > > > -- > > 2.47.3 > > > > > -- > Thanks, > Mina