From: Stanislav Fomichev <stfomichev@gmail.com>
To: Bobby Eshleman <bobbyeshleman@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Mina Almasry <almasrymina@google.com>,
Stanislav Fomichev <sdf@fomichev.me>,
asml.silence@gmail.com, Bobby Eshleman <bobbyeshleman@meta.com>
Subject: Re: [PATCH net-next] net: devmem: convert binding refcount to percpu_ref
Date: Mon, 1 Dec 2025 09:12:23 -0800
Message-ID: <aS3Md9EuAGIl8Bd0@mini-arch>
In-Reply-To: <20251126-upstream-percpu-ref-v1-1-cea20a92b1dd@meta.com>
On 11/26, Bobby Eshleman wrote:
> From: Bobby Eshleman <bobbyeshleman@meta.com>
>
> Convert net_devmem_dmabuf_binding refcount from refcount_t to percpu_ref
> to optimize common-case reference counting on the hot path.
>
> The typical devmem workflow involves binding a dmabuf to a queue
> (acquiring the initial reference on binding->ref), followed by
> high-volume traffic where every skb fragment acquires a reference.
> Eventually traffic stops and the unbind operation releases the initial
> reference. Additionally, the high traffic hot path is often multi-core.
> This access pattern is ideal for percpu_ref as the first and last
> reference during bind/unbind and normally book-ends activity in the hot
> path.
>
> __net_devmem_dmabuf_binding_free becomes the percpu_ref callback invoked
> when the last reference is dropped.
>
> kperf test:
> - 4MB message sizes
> - 60s of workload each run
> - 5 runs
> - 4 flows
>
> Throughput:
> Before: 45.31 GB/s (+/- 3.17 GB/s)
> After: 48.67 GB/s (+/- 0.01 GB/s)
>
> Picking throughput-matched kperf runs (both before and after matched at
> ~48 GB/s) for apples-to-apples comparison:
>
> Summary (averaged across 4 workers):
>
> TX worker CPU idle %:
> Before: 34.44%
> After: 87.13%
>
> RX worker CPU idle %:
> Before: 5.38%
> After: 9.73%
>
> kperf before:
>
> client: == Source
> client: Tx 98.100 Gbps (735764807680 bytes in 60001149 usec)
> client: Tx102.798 Gbps (770996961280 bytes in 60001149 usec)
> client: Tx101.534 Gbps (761517834240 bytes in 60001149 usec)
> client: Tx 82.794 Gbps (620966707200 bytes in 60001149 usec)
> client: net CPU 56: usr: 0.01% sys: 0.12% idle:17.06% iow: 0.00% irq: 9.89% sirq:72.91%
> client: app CPU 60: usr: 0.08% sys:63.30% idle:36.24% iow: 0.00% irq: 0.30% sirq: 0.06%
> client: net CPU 57: usr: 0.03% sys: 0.08% idle:75.68% iow: 0.00% irq: 2.96% sirq:21.23%
> client: app CPU 61: usr: 0.06% sys:67.67% idle:31.94% iow: 0.00% irq: 0.28% sirq: 0.03%
> client: net CPU 58: usr: 0.01% sys: 0.06% idle:76.87% iow: 0.00% irq: 2.84% sirq:20.19%
> client: app CPU 62: usr: 0.06% sys:69.78% idle:29.79% iow: 0.00% irq: 0.30% sirq: 0.05%
> client: net CPU 59: usr: 0.06% sys: 0.16% idle:74.97% iow: 0.00% irq: 3.76% sirq:21.03%
> client: app CPU 63: usr: 0.06% sys:59.82% idle:39.80% iow: 0.00% irq: 0.25% sirq: 0.05%
> client: == Target
> client: Rx 98.092 Gbps (735764807680 bytes in 60006084 usec)
> client: Rx102.785 Gbps (770962161664 bytes in 60006084 usec)
> client: Rx101.523 Gbps (761499566080 bytes in 60006084 usec)
> client: Rx 82.783 Gbps (620933136384 bytes in 60006084 usec)
> client: net CPU 2: usr: 0.00% sys: 0.01% idle:24.51% iow: 0.00% irq: 1.67% sirq:73.79%
> client: app CPU 6: usr: 1.51% sys:96.43% idle: 1.13% iow: 0.00% irq: 0.36% sirq: 0.55%
> client: net CPU 1: usr: 0.00% sys: 0.01% idle:25.18% iow: 0.00% irq: 1.99% sirq:72.80%
> client: app CPU 5: usr: 2.21% sys:94.54% idle: 2.54% iow: 0.00% irq: 0.38% sirq: 0.30%
> client: net CPU 3: usr: 0.00% sys: 0.01% idle:26.34% iow: 0.00% irq: 2.12% sirq:71.51%
> client: app CPU 7: usr: 2.22% sys:94.28% idle: 2.52% iow: 0.00% irq: 0.59% sirq: 0.37%
> client: net CPU 0: usr: 0.00% sys: 0.03% idle: 0.00% iow: 0.00% irq:10.44% sirq:89.51%
> client: app CPU 4: usr: 2.39% sys:81.46% idle:15.33% iow: 0.00% irq: 0.50% sirq: 0.30%
>
> kperf after:
>
> client: == Source
> client: Tx 99.257 Gbps (744447016960 bytes in 60001303 usec)
> client: Tx101.013 Gbps (757617131520 bytes in 60001303 usec)
> client: Tx 88.179 Gbps (661357854720 bytes in 60001303 usec)
> client: Tx101.002 Gbps (757533245440 bytes in 60001303 usec)
> client: net CPU 56: usr: 0.00% sys: 0.01% idle: 6.22% iow: 0.00% irq: 8.68% sirq:85.06%
> client: app CPU 60: usr: 0.08% sys:12.56% idle:87.21% iow: 0.00% irq: 0.08% sirq: 0.05%
> client: net CPU 57: usr: 0.00% sys: 0.05% idle:69.53% iow: 0.00% irq: 2.02% sirq:28.38%
> client: app CPU 61: usr: 0.11% sys:13.40% idle:86.36% iow: 0.00% irq: 0.08% sirq: 0.03%
> client: net CPU 58: usr: 0.00% sys: 0.03% idle:70.04% iow: 0.00% irq: 3.38% sirq:26.53%
> client: app CPU 62: usr: 0.10% sys:11.46% idle:88.31% iow: 0.00% irq: 0.08% sirq: 0.03%
> client: net CPU 59: usr: 0.01% sys: 0.06% idle:71.18% iow: 0.00% irq: 1.97% sirq:26.75%
> client: app CPU 63: usr: 0.10% sys:13.10% idle:86.64% iow: 0.00% irq: 0.10% sirq: 0.05%
> client: == Target
> client: Rx 99.250 Gbps (744415182848 bytes in 60003297 usec)
> client: Rx101.006 Gbps (757589737472 bytes in 60003297 usec)
> client: Rx 88.171 Gbps (661319475200 bytes in 60003297 usec)
> client: Rx100.996 Gbps (757514792960 bytes in 60003297 usec)
> client: net CPU 2: usr: 0.00% sys: 0.01% idle:28.02% iow: 0.00% irq: 1.95% sirq:70.00%
> client: app CPU 6: usr: 2.03% sys:87.20% idle:10.04% iow: 0.00% irq: 0.37% sirq: 0.33%
> client: net CPU 3: usr: 0.00% sys: 0.00% idle:27.63% iow: 0.00% irq: 1.90% sirq:70.45%
> client: app CPU 7: usr: 1.78% sys:89.70% idle: 7.79% iow: 0.00% irq: 0.37% sirq: 0.34%
> client: net CPU 0: usr: 0.00% sys: 0.01% idle: 0.00% iow: 0.00% irq: 9.96% sirq:90.01%
> client: app CPU 4: usr: 2.33% sys:83.51% idle:13.24% iow: 0.00% irq: 0.64% sirq: 0.26%
> client: net CPU 1: usr: 0.00% sys: 0.01% idle:27.60% iow: 0.00% irq: 1.94% sirq:70.43%
> client: app CPU 5: usr: 1.88% sys:89.61% idle: 7.86% iow: 0.00% irq: 0.35% sirq: 0.27%
>
> Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
> ---
> net/core/devmem.c | 38 +++++++++++++++++++++++++++++++++-----
> net/core/devmem.h | 18 ++++++++++--------
> 2 files changed, 43 insertions(+), 13 deletions(-)
>
> diff --git a/net/core/devmem.c b/net/core/devmem.c
> index 1d04754bc756..83989cf4a987 100644
> --- a/net/core/devmem.c
> +++ b/net/core/devmem.c
> @@ -54,10 +54,26 @@ static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov)
> ((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT);
> }
>
> -void __net_devmem_dmabuf_binding_free(struct work_struct *wq)
> +/*
> + * percpu_ref release callback invoked when the last reference to the binding
> + * is dropped. Schedules the actual cleanup in a workqueue because
> + * ref->release() cb is not allowed to sleep as it may be called in RCU
> + * callback context.
> + */

Can we drop this and the rest of the comments? I feel like they mostly
explain how percpu_ref works; nothing in them is devmem-specific.

Refcount-wise, the only place that seems to deserve a comment is
net_devmem_get_net_iov (why it's safe to ignore the return value of
net_devmem_dmabuf_binding_get), but you are not touching that.

Otherwise LGTM!
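
For reference, the generic pattern that the quoted comment describes (a
release callback that must not sleep, so it only schedules the real
teardown) can be illustrated with a minimal userspace sketch. This is an
analogy under stated assumptions, not the kernel percpu_ref API and not
the devmem code: every toy_* name is hypothetical, a single C11 atomic
counter stands in for the per-CPU counters, and a boolean flag stands in
for scheduling work on a workqueue.

```c
/* Userspace analogy only: NOT the kernel percpu_ref API, NOT devmem code.
 * All toy_* names are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_ref {
	atomic_long count;
	void (*release)(struct toy_ref *ref); /* called when count hits 0 */
};

struct toy_binding {
	struct toy_ref ref;
	bool free_work_queued; /* stands in for scheduling deferred work */
	int *resources;        /* stands in for state whose teardown may sleep */
};

static void toy_ref_init(struct toy_ref *ref,
			 void (*release)(struct toy_ref *ref))
{
	atomic_init(&ref->count, 1); /* the initial ("bind") reference */
	ref->release = release;
}

static void toy_ref_get(struct toy_ref *ref)
{
	atomic_fetch_add(&ref->count, 1);
}

static void toy_ref_put(struct toy_ref *ref)
{
	/* fetch_sub returns the previous value: 1 means we dropped the last ref */
	if (atomic_fetch_sub(&ref->count, 1) == 1)
		ref->release(ref);
}

/* Release callback: does no blocking work, only flags the deferred cleanup. */
static void toy_binding_release(struct toy_ref *ref)
{
	struct toy_binding *b = (struct toy_binding *)
		((char *)ref - offsetof(struct toy_binding, ref));

	b->free_work_queued = true;
}

/* "Worker": runs later, in a context where blocking would be allowed. */
static bool toy_binding_run_pending_work(struct toy_binding *b)
{
	if (!b->free_work_queued)
		return false;
	free(b->resources);
	b->resources = NULL;
	b->free_work_queued = false;
	return true;
}

/* Walks bind -> fragment get -> unbind -> fragment put -> deferred free.
 * Returns the final reference count. */
static long toy_demo(void)
{
	struct toy_binding b = { .resources = malloc(sizeof(int)) };

	assert(b.resources != NULL);
	toy_ref_init(&b.ref, toy_binding_release); /* bind */
	toy_ref_get(&b.ref);                       /* a fragment takes a ref */
	toy_ref_put(&b.ref);                       /* unbind drops the initial ref */
	assert(!b.free_work_queued);               /* fragment still holds a ref */
	toy_ref_put(&b.ref);                       /* last ref: release runs */
	assert(b.free_work_queued && b.resources != NULL); /* only queued so far */
	assert(toy_binding_run_pending_work(&b));  /* worker does the real free */
	assert(b.resources == NULL);
	return atomic_load(&b.ref.count);
}
```

Calling toy_demo() from a main() walks the whole lifecycle; the point of
the split is that the release callback itself never frees anything.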