From: Paolo Abeni <pabeni@redhat.com>
To: John Ousterhout <ouster@cs.stanford.edu>, netdev@vger.kernel.org
Cc: edumazet@google.com, horms@kernel.org, kuba@kernel.org
Subject: Re: [PATCH net-next v15 09/15] net: homa: create homa_rpc.h and homa_rpc.c
Date: Tue, 26 Aug 2025 13:31:43 +0200
Message-ID: <7d7516a6-07b7-4882-9da2-2c192ef43039@redhat.com>
In-Reply-To: <20250818205551.2082-10-ouster@cs.stanford.edu>
On 8/18/25 10:55 PM, John Ousterhout wrote:
> +/**
> + * homa_rpc_reap() - Invoked to release resources associated with dead
> + * RPCs for a given socket.
> + * @hsk: Homa socket that may contain dead RPCs. Must not be locked by the
> + * caller; this function will lock and release.
> + * @reap_all: False means do a small chunk of work; there may still be
> + * unreaped RPCs on return. True means reap all dead RPCs for
> + * hsk. Will busy-wait if reaping has been disabled for some RPCs.
> + *
> + * Return: A return value of 0 means that we ran out of work to do; calling
> + * again will do no work (there could be unreaped RPCs, but if so,
> + * they cannot currently be reaped). A value greater than zero means
> + * there is still more reaping work to be done.
> + */
> +int homa_rpc_reap(struct homa_sock *hsk, bool reap_all)
> +{
> + /* RPC Reaping Strategy:
> + *
> + * (Note: there are references to this comment elsewhere in the
> + * Homa code)
> + *
> + * Most of the cost of reaping comes from freeing sk_buffs; this can be
> + * quite expensive for RPCs with long messages.
> + *
> + * The natural time to reap is when homa_rpc_end is invoked to
> + * terminate an RPC, but this doesn't work for two reasons. First,
> + * there may be outstanding references to the RPC; it cannot be reaped
> + * until all of those references have been released. Second, reaping
> + * is potentially expensive and RPC termination could occur in
> + * homa_softirq when there are short messages waiting to be processed.
> + * Taking time to reap a long RPC could result in significant delays
> + * for subsequent short RPCs.
> + *
> + * Thus Homa doesn't reap immediately in homa_rpc_end. Instead, dead
> + * RPCs are queued up and reaping occurs in this function, which is
> + * invoked later when it is less likely to impact latency. The
> + * challenge is to do this so that (a) we don't allow large numbers of
> + * dead RPCs to accumulate and (b) we minimize the impact of reaping
> + * on latency.
> + *
> + * The primary place where homa_rpc_reap is invoked is when threads
> + * are waiting for incoming messages. The thread has nothing else to
> + * do (it may even be polling for input), so reaping can be performed
> + * with no latency impact on the application. However, if a machine
> + * is overloaded then it may never wait, so this mechanism isn't always
> + * sufficient.
> + *
> + * Homa now reaps in two other places, if reaping while waiting for
> + * messages isn't adequate:
> + * 1. If too many dead skbs accumulate, then homa_timer will call
> + * homa_rpc_reap.
> + * 2. If this timer thread cannot keep up with all the reaping to be
> + * done then as a last resort homa_dispatch_pkts will reap in small
> + * increments (a few sk_buffs or RPCs) for every incoming batch
> + * of packets. This is undesirable because it will impact Homa's
> + * performance.
> + *
> + * During the introduction of homa_pools for managing input
> + * buffers, freeing of packets for incoming messages was moved to
> + * homa_copy_to_user under the assumption that this code wouldn't be
> + * on the critical path. However, there is evidence that with
> + * fast networks (e.g. 100 Gbps) copying to user space is the
> + * bottleneck for incoming messages, and packet freeing takes about
> + * 20-25% of the total time in homa_copy_to_user. So, it may eventually
> + * be desirable to remove packet freeing out of homa_copy_to_user.
See skb_attempt_defer_free()
> + */
> +#define BATCH_MAX 20
> + struct homa_rpc *rpcs[BATCH_MAX];
> + struct sk_buff *skbs[BATCH_MAX];
That is a lot of bytes on the stack, and quite a large batch. You should
probably decrease it.
It also still feels suspect to need yet another tx free strategy on top
of the several existing caches.
> + int num_skbs, num_rpcs;
> + struct homa_rpc *rpc;
> + struct homa_rpc *tmp;
> + int i, batch_size;
> + int skbs_to_reap;
> + int result = 0;
> + int rx_frees;
> +
> + /* Each iteration through the following loop will reap
> + * BATCH_MAX skbs.
> + */
> + skbs_to_reap = hsk->homa->reap_limit;
> + while (skbs_to_reap > 0 && !list_empty(&hsk->dead_rpcs)) {
> + batch_size = BATCH_MAX;
> + if (!reap_all) {
> + if (batch_size > skbs_to_reap)
> + batch_size = skbs_to_reap;
> + skbs_to_reap -= batch_size;
> + }
> + num_skbs = 0;
> + num_rpcs = 0;
> + rx_frees = 0;
> +
> + homa_sock_lock(hsk);
> + if (atomic_read(&hsk->protect_count)) {
> + homa_sock_unlock(hsk);
> + if (reap_all)
> + continue;
> + return 0;
> + }
> +
> + /* Collect buffers and freeable RPCs. */
> + list_for_each_entry_safe(rpc, tmp, &hsk->dead_rpcs,
> + dead_links) {
> + int refs;
> +
> + /* Make sure that all outstanding uses of the RPC have
> + * completed. We can only be sure if the reference
> + * count is zero when we're holding the lock. Note:
> + * it isn't safe to block while locking the RPC here,
> + * since we hold the socket lock.
> + */
> + if (homa_rpc_try_lock(rpc)) {
> + refs = atomic_read(&rpc->refs);
> + homa_rpc_unlock(rpc);
> + } else {
> + refs = 1;
> + }
> + if (refs != 0)
> + continue;
> + rpc->magic = 0;
> +
> + /* For Tx sk_buffs, collect them here but defer
> + * freeing until after releasing the socket lock.
> + */
> + if (rpc->msgout.length >= 0) {
> + while (rpc->msgout.packets) {
> + skbs[num_skbs] = rpc->msgout.packets;
> + rpc->msgout.packets = homa_get_skb_info(
> + rpc->msgout.packets)->next_skb;
> + num_skbs++;
> + rpc->msgout.num_skbs--;
> + if (num_skbs >= batch_size)
> + goto release;
> + }
> + }
> +
> + /* In the normal case rx sk_buffs will already have been
> + * freed before we got here. Thus it's OK to free
> + * immediately in rare situations where there are
> + * buffers left.
> + */
> + if (rpc->msgin.length >= 0 &&
> + !skb_queue_empty_lockless(&rpc->msgin.packets)) {
> + rx_frees += skb_queue_len(&rpc->msgin.packets);
> + __skb_queue_purge(&rpc->msgin.packets);
> + }
> +
> + /* If we get here, it means all packets have been
> + * removed from the RPC.
> + */
> + rpcs[num_rpcs] = rpc;
> + num_rpcs++;
> + list_del(&rpc->dead_links);
> + WARN_ON(refcount_sub_and_test(rpc->msgout.skb_memory,
> + &hsk->sock.sk_wmem_alloc));
> + if (num_rpcs >= batch_size)
> + goto release;
> + }
> +
> + /* Free all of the collected resources; release the socket
> + * lock while doing this.
> + */
> +release:
> + hsk->dead_skbs -= num_skbs + rx_frees;
> + result = !list_empty(&hsk->dead_rpcs) &&
> + (num_skbs + num_rpcs) != 0;
> + homa_sock_unlock(hsk);
> + homa_skb_free_many_tx(hsk->homa, skbs, num_skbs);
> + for (i = 0; i < num_rpcs; i++) {
> + rpc = rpcs[i];
> +
> + if (unlikely(rpc->msgin.num_bpages))
> + homa_pool_release_buffers(rpc->hsk->buffer_pool,
> + rpc->msgin.num_bpages,
> + rpc->msgin.bpage_offsets);
> + if (rpc->msgin.length >= 0) {
> + while (1) {
> + struct homa_gap *gap;
> +
> + gap = list_first_entry_or_null(
> + &rpc->msgin.gaps,
> + struct homa_gap,
> + links);
> + if (!gap)
> + break;
> + list_del(&gap->links);
> + kfree(gap);
> + }
> + }
> + if (rpc->peer) {
> + homa_peer_release(rpc->peer);
> + rpc->peer = NULL;
> + }
> + rpc->state = 0;
> + kfree(rpc);
> + }
> + homa_sock_wakeup_wmem(hsk);
Here num_rpcs can be zero, so you can get spurious wake-ups.
> +/**
> + * homa_rpc_hold() - Increment the reference count on an RPC, which will
> + * prevent it from being freed until homa_rpc_put() is called. References
> + * are taken in two situations:
> + * 1. An RPC is going to be manipulated by a collection of functions. In
> + * this case the top-most function that identifies the RPC takes the
> + * reference; any function that receives an RPC as an argument can
> + * assume that a reference has been taken on the RPC by some higher
> + * function on the call stack.
> + * 2. A pointer to an RPC is stored in an object for use later, such as
> + * an interest. A reference must be held as long as the pointer remains
> + * accessible in the object.
> + * @rpc: RPC on which to take a reference.
> + */
> +static inline void homa_rpc_hold(struct homa_rpc *rpc)
> +{
> + atomic_inc(&rpc->refs);
`refs` should be a refcount_t, since it is used as such.
/P