public inbox for netdev@vger.kernel.org
From: Paolo Abeni <pabeni@redhat.com>
To: ouster@cs.stanford.edu
Cc: horms@kernel.org, netdev@vger.kernel.org, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com
Subject: Re: [net-next,v17,05/14] net: homa: create homa_peer.h and homa_peer.c
Date: Wed, 18 Mar 2026 08:21:19 +0100	[thread overview]
Message-ID: <20260318072119.268920-1-pabeni@redhat.com> (raw)
In-Reply-To: <20260316223228.2611-6-ouster@cs.stanford.edu>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net: homa: create homa_peer.h and homa_peer.c

Homa needs to keep a small amount of information for each peer that
it has communicated with. These files define that state and provide
functions for storing and accessing it.

> diff --git a/net/homa/homa_peer.c b/net/homa/homa_peer.c
> new file mode 100644
> index 000000000000..1e5301cdef41
> --- /dev/null
> +++ b/net/homa/homa_peer.c

[ ... ]

> +/**
> + * homa_peer_free_net() - Garbage collect all of the peer information
> + * associated with a particular network namespace.
> + * @hnet:    Network namespace whose peers should be freed. There must not
> + *           be any active sockets or RPCs for this namespace.
> + */
> +void homa_peer_free_net(struct homa_net *hnet)
> +{
> +	struct homa_peertab *peertab = hnet->homa->peertab;
> +	struct rhashtable_iter iter;
> +	struct homa_peer *peer;
> +
> +	spin_lock_bh(&peertab->lock);
> +	peertab->gc_stop_count++;
> +	spin_unlock_bh(&peertab->lock);
> +
> +	rhashtable_walk_enter(&peertab->ht, &iter);
> +	rhashtable_walk_start(&iter);
> +	while (1) {
> +		peer = rhashtable_walk_next(&iter);
> +		if (!peer)
> +			break;
> +		if (IS_ERR(peer))
> +			continue;
> +		if (peer->ht_key.hnet != hnet)
> +			continue;
> +		if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
> +					   ht_params) == 0) {
> +			homa_peer_release(peer);
> +			hnet->num_peers--;
> +			peertab->num_peers--;
> +		}
> +	}
> +	rhashtable_walk_stop(&iter);
> +	rhashtable_walk_exit(&iter);

Does this code correctly protect peertab->num_peers updates? Looking at
homa_peer_gc():

net/homa/homa_peer.c:homa_peer_gc() {
    spin_lock_bh(&peertab->lock);
    ...
    if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
                               ht_params) == 0) {
        homa_peer_release(peer);
        peertab->num_peers--;
        peer->ht_key.hnet->num_peers--;
    }
    ...
    spin_unlock_bh(&peertab->lock);
}

and homa_peer_get():

net/homa/homa_peer.c:homa_peer_get() {
    ...
    spin_lock_bh(&peertab->lock);
    other = rhashtable_lookup_get_insert_fast(&peertab->ht,
                                              &peer->ht_linkage, ht_params);
    ...
    peertab->num_peers++;
    key.hnet->num_peers++;
    ...
    spin_unlock_bh(&peertab->lock);
}

Both of those functions hold peertab->lock while updating peertab->num_peers.

But homa_peer_free_net() drops peertab->lock before the rhashtable walk and
doesn't reacquire it before decrementing peertab->num_peers. Can a concurrent
homa_peer_get() call (for any namespace) produce lost updates to
peertab->num_peers? The gc_stop_count mechanism prevents concurrent GC but
doesn't prevent concurrent homa_peer_get().

A corrupted num_peers then skews GC decisions: homa_peer_gc() compares
peertab->num_peers against peertab->gc_threshold to decide whether eviction
is needed.
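
If this race is real, one option (untested sketch, mirroring the locking
already used in homa_peer_gc(); assumes taking peertab->lock inside the
rhashtable walk doesn't invert any lock ordering) is to take the lock
around the removal and both decrements:

```c
	/* Untested sketch: mirror the locking used by homa_peer_gc() and
	 * homa_peer_get() so all peertab->num_peers updates happen under
	 * peertab->lock.
	 */
	spin_lock_bh(&peertab->lock);
	if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
				   ht_params) == 0) {
		homa_peer_release(peer);
		hnet->num_peers--;
		peertab->num_peers--;
	}
	spin_unlock_bh(&peertab->lock);
```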

> +/**
> + * homa_peer_prefer_evict() - Given two peers, determine which one is
> + * a better candidate for eviction.
> + * @peertab:    Overall information used to manage peers.
> + * @peer1:      First peer.
> + * @peer2:      Second peer.
> + * Return:      True if @peer1 is a better candidate for eviction than @peer2.
> + */
> +int homa_peer_prefer_evict(struct homa_peertab *peertab,
> +			   struct homa_peer *peer1,
> +			   struct homa_peer *peer2)
> +{
> +	/* Prefer a peer whose homa-net is over its limit; if both are either
> +	 * over or under, then prefer the peer with the shortest idle time.
> +	 */
                                                      ^^^^^^^^

This isn't a bug, but the comment says "shortest idle time" while the code
returns peer1->access_jiffies < peer2->access_jiffies, which prefers the peer
accessed earlier (longest idle time). Since eviction should prefer stale peers,
the code is correct but the comment should say "longest idle time".

[ ... ]

> +/**
> + * homa_peer_free() - Release any resources in a peer and free the homa_peer
> + * struct. Invoked by the RCU mechanism via homa_peer_release.
> + * @head:   Pointer to the rcu_head field of the peer to free.
> + */
> +void homa_peer_free(struct rcu_head *head)
> +{
> +	struct homa_peer *peer;
> +
> +	peer = container_of(head, struct homa_peer, rcu_head);
> +	dst_release(rcu_dereference(peer->dst));
> +	kfree(peer);
> +}

Is the RCU API correct here? homa_peer_free() is an RCU callback invoked via
call_rcu from homa_peer_release():

net/homa/homa_peer.h:homa_peer_release() {
    if (refcount_dec_and_test(&peer->refs))
        call_rcu(&peer->rcu_head, homa_peer_free);
}

RCU callbacks run after the grace period, outside any RCU read-side critical
section. Using rcu_dereference() without holding rcu_read_lock() triggers
lockdep 'suspicious rcu_dereference_check() usage' warnings at runtime.

The peer is exclusively owned at this point (refcount was zero, grace period
has passed). The same codebase uses the correct pattern in
homa_peer_reset_dst():

net/homa/homa_peer.c:homa_peer_reset_dst() {
    ...
    dst_release(rcu_replace_pointer(peer->dst, dst, true));
    ...
}

Should this use rcu_dereference_protected(peer->dst, 1) instead of
rcu_dereference(peer->dst)?
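
i.e. (untested), since the peer is exclusively owned at this point:

```c
	/* No concurrent readers can exist: refcount hit zero and the
	 * grace period has elapsed, so document the exclusive access.
	 */
	dst_release(rcu_dereference_protected(peer->dst, 1));
```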

> +/**
> + * homa_peer_get() - Returns the peer associated with a given host; creates
> + * a new homa_peer if one doesn't already exist.
> + * @hsk:        Socket where the peer will be used.
> + * @addr:       Address of the desired host: IPv4 addresses are represented
> + *              as IPv4-mapped IPv6 addresses.
> + *
> + * Return:      The peer associated with @addr, or a negative errno if an
> + *              error occurred. On a successful return the reference count
> + *              will be incremented for the returned peer. The caller must
> + *              eventually call homa_peer_release to release the reference.
> + */
> +struct homa_peer *homa_peer_get(struct homa_sock *hsk,
> +				const struct in6_addr *addr)
> +{
> +	struct homa_peertab *peertab = hsk->homa->peertab;
> +	struct homa_peer *peer, *other;
> +	struct homa_peer_key key;
> +
> +	key.addr = *addr;
> +	key.hnet = hsk->hnet;
> +	rcu_read_lock();
> +	peer = rhashtable_lookup(&peertab->ht, &key, ht_params);
> +	if (peer) {
> +		homa_peer_hold(peer);
> +		peer->access_jiffies = jiffies;
> +		rcu_read_unlock();
> +		return peer;
> +	}

Can this sequence race with homa_peer_gc()? The lookup uses
rhashtable_lookup() under rcu_read_lock() but without peertab->lock. Then it
calls homa_peer_hold() which uses refcount_inc():

net/homa/homa_peer.h:homa_peer_hold() {
    refcount_inc(&peer->refs);
}

Concurrently, homa_peer_gc() can:

net/homa/homa_peer.c:homa_peer_gc() {
    spin_lock_bh(&peertab->lock);
    ...
    if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
                               ht_params) == 0) {
        homa_peer_release(peer);
        ...
    }
    ...
    spin_unlock_bh(&peertab->lock);
}

net/homa/homa_peer.h:homa_peer_release() {
    if (refcount_dec_and_test(&peer->refs))
        call_rcu(&peer->rcu_head, homa_peer_free);
}

If homa_peer_gc() removes the peer from the rhashtable and calls
homa_peer_release() (dropping refcount to 0 and scheduling call_rcu) between
the lookup and the hold, then refcount_inc() on a zero refcount triggers
REFCOUNT_WARN and refuses the increment. The caller is then left holding a
peer it has no reference on, and the peer will be freed once the RCU grace
period ends (a use-after-free).

The standard kernel pattern for RCU + rhashtable lookups is
refcount_inc_not_zero() with retry or fallback if it returns false. The RCU
read lock keeps the memory valid but doesn't prevent the refcount from
reaching zero.

[ ... ]

> +/**
> + * homa_peer_add_ack() - Add a given RPC to the list of unacked
> + * RPCs for its server. Once this method has been invoked, it's safe
> + * to delete the RPC, since it will eventually be acked to the server.
> + * @rpc:    Client RPC that has now completed.
> + */
> +void homa_peer_add_ack(struct homa_rpc *rpc)
> +{
> +	struct homa_peer *peer = rpc->peer;
> +	struct homa_ack_hdr ack;
> +
> +	homa_peer_lock(peer);
> +	if (peer->num_acks < HOMA_MAX_ACKS_PER_PKT) {
> +		peer->acks[peer->num_acks].client_id = cpu_to_be64(rpc->id);
> +		peer->acks[peer->num_acks].server_port = htons(rpc->dport);
> +		peer->num_acks++;
> +		homa_peer_unlock(peer);
> +		return;
> +	}
> +
> +	/* The peer has filled up; send an ACK message to empty it. The
> +	 * RPC in the message header will also be considered ACKed.
> +	 */
> +	memcpy(ack.acks, peer->acks, sizeof(peer->acks));
> +	ack.num_acks = htons(peer->num_acks);
> +	peer->num_acks = 0;
> +	homa_peer_unlock(peer);
> +	homa_xmit_control(ACK, &ack, sizeof(ack), rpc);
> +}

Does this code leak kernel stack memory? The 'struct homa_ack_hdr ack' is
declared on the stack without zeroing it. Only ack.acks (via memcpy) and
ack.num_acks are initialized before calling homa_xmit_control().

Looking at how homa_xmit_control() uses the ack:

The function sets ack.common.type, sport, dport, and sender_id, but the
following homa_common_hdr fields remain uninitialized: sequence, ack[3], doff,
reserved1, window, checksum, and reserved2.

These uninitialized stack bytes are then memcpy'd into an skb by
__homa_xmit_control() and transmitted on the network, leaking kernel stack
memory.

Additionally, the homa_common_hdr comment for doff warns: "other values may
cause some NICs (such as Intel E810-C) to drop outgoing packets when TCP
hijacking is enabled", so garbage in doff can cause packet drops.

The homa_xmit_control() doc comment claims "the common header will be filled
in by this function" but only 4 of 11 fields are actually set.
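
One minimal fix (untested sketch) is to zero the whole header before
filling it in, so padding, reserved fields, and the common-header fields
that homa_xmit_control() doesn't set all leave the host as zeros:

```c
	struct homa_ack_hdr ack;

	/* Zero everything first: padding plus the common-header fields
	 * (sequence, doff, window, checksum, ...) that are never set
	 * explicitly before transmission.
	 */
	memset(&ack, 0, sizeof(ack));
```

A designated initializer (= {}) is an alternative, but memset() also
guarantees the padding bytes are zeroed, which matters for a struct copied
verbatim onto the wire.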


Thread overview: 34+ messages
2026-03-16 22:32 [PATCH net-next v17 00/14] Begin upstreaming Homa transport protocol John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 01/14] net: homa: define user-visible API for Homa John Ousterhout
2026-03-17 10:10   ` kernel test robot
2026-03-17 18:40   ` kernel test robot
2026-03-16 22:32 ` [PATCH net-next v17 02/14] net: homa: create homa_wire.h John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 03/14] net: homa: create shared Homa header files John Ousterhout
2026-03-18  7:20   ` [net-next,v17,03/14] " Paolo Abeni
2026-03-19 20:37     ` John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 04/14] net: homa: create homa_pool.h and homa_pool.c John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 05/14] net: homa: create homa_peer.h and homa_peer.c John Ousterhout
2026-03-18  7:21   ` Paolo Abeni [this message]
2026-03-20 17:13     ` [net-next,v17,05/14] " John Ousterhout
2026-03-20 17:20       ` Paolo Abeni
2026-03-16 22:32 ` [PATCH net-next v17 06/14] net: homa: create homa_sock.h and homa_sock.c John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 07/14] net: homa: create homa_interest.h and homa_interest.c John Ousterhout
2026-03-18  7:21   ` [net-next,v17,07/14] " Paolo Abeni
2026-03-16 22:32 ` [PATCH net-next v17 08/14] net: homa: create homa_rpc.h and homa_rpc.c John Ousterhout
2026-03-18  7:21   ` [net-next,v17,08/14] " Paolo Abeni
2026-03-23 22:43     ` John Ousterhout
2026-03-24  8:55       ` Paolo Abeni
2026-03-16 22:32 ` [PATCH net-next v17 09/14] net: homa: create homa_outgoing.c John Ousterhout
2026-03-18  7:21   ` [net-next,v17,09/14] " Paolo Abeni
2026-03-20 18:21     ` John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 10/14] net: homa: create homa_utils.c John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 11/14] net: homa: create homa_incoming.c John Ousterhout
2026-03-18  7:21   ` [net-next,v17,11/14] " Paolo Abeni
2026-03-20 20:51     ` John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 12/14] net: homa: create homa_timer.c John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 13/14] net: homa: create homa_plumbing.c John Ousterhout
2026-03-18  7:21   ` [net-next,v17,13/14] " Paolo Abeni
2026-03-20 21:49     ` John Ousterhout
2026-03-16 22:32 ` [PATCH net-next v17 14/14] net: homa: create Makefile and Kconfig John Ousterhout
2026-03-17 18:51   ` kernel test robot
2026-03-17 19:26   ` kernel test robot
