From: Paolo Abeni <pabeni@redhat.com>
To: ouster@cs.stanford.edu
Cc: horms@kernel.org, netdev@vger.kernel.org, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com
Subject: Re: [net-next,v17,05/14] net: homa: create homa_peer.h and homa_peer.c
Date: Wed, 18 Mar 2026 08:21:19 +0100
Message-ID: <20260318072119.268920-1-pabeni@redhat.com>
In-Reply-To: <20260316223228.2611-6-ouster@cs.stanford.edu>
References: <20260316223228.2611-6-ouster@cs.stanford.edu>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible. For
local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html

---

net: homa: create homa_peer.h and homa_peer.c

Homa needs to keep a small amount of information for each peer that it
has communicated with. These files define that state and provide
functions for storing and accessing it.

> diff --git a/net/homa/homa_peer.c b/net/homa/homa_peer.c
> new file mode 100644
> index 000000000000..1e5301cdef41
> --- /dev/null
> +++ b/net/homa/homa_peer.c

[ ... ]

> +/**
> + * homa_peer_free_net() - Garbage collect all of the peer information
> + * associated with a particular network namespace.
> + * @hnet:    Network namespace whose peers should be freed. There must
> + *           not be any active sockets or RPCs for this namespace.
> + */
> +void homa_peer_free_net(struct homa_net *hnet)
> +{
> +	struct homa_peertab *peertab = hnet->homa->peertab;
> +	struct rhashtable_iter iter;
> +	struct homa_peer *peer;
> +
> +	spin_lock_bh(&peertab->lock);
> +	peertab->gc_stop_count++;
> +	spin_unlock_bh(&peertab->lock);
> +
> +	rhashtable_walk_enter(&peertab->ht, &iter);
> +	rhashtable_walk_start(&iter);
> +	while (1) {
> +		peer = rhashtable_walk_next(&iter);
> +		if (!peer)
> +			break;
> +		if (IS_ERR(peer))
> +			continue;
> +		if (peer->ht_key.hnet != hnet)
> +			continue;
> +		if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
> +					   ht_params) == 0) {
> +			homa_peer_release(peer);
> +			hnet->num_peers--;
> +			peertab->num_peers--;
> +		}
> +	}
> +	rhashtable_walk_stop(&iter);
> +	rhashtable_walk_exit(&iter);

Does this code correctly protect peertab->num_peers updates? Looking
at homa_peer_gc():

net/homa/homa_peer.c:homa_peer_gc()
{
	spin_lock_bh(&peertab->lock);
	...
	if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
				   ht_params) == 0) {
		homa_peer_release(peer);
		peertab->num_peers--;
		peer->ht_key.hnet->num_peers--;
	}
	...
	spin_unlock_bh(&peertab->lock);
}

and homa_peer_get():

net/homa/homa_peer.c:homa_peer_get()
{
	...
	spin_lock_bh(&peertab->lock);
	other = rhashtable_lookup_get_insert_fast(&peertab->ht,
						  &peer->ht_linkage,
						  ht_params);
	...
	peertab->num_peers++;
	key.hnet->num_peers++;
	...
	spin_unlock_bh(&peertab->lock);
}

Both of those functions hold peertab->lock while updating
peertab->num_peers. But homa_peer_free_net() drops peertab->lock
before the rhashtable walk and doesn't reacquire it before
decrementing peertab->num_peers. Can a concurrent homa_peer_get() call
(for any namespace) produce lost updates to peertab->num_peers? The
gc_stop_count mechanism prevents concurrent GC but doesn't prevent
concurrent homa_peer_get(). A corrupted num_peers count affects GC
threshold decisions in the peertab->num_peers < peertab->gc_threshold
check in homa_peer_gc().
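If the intent is for peertab->lock to serialize every num_peers
update, one possible shape for the walk body (sketch only, using just
the names from the quoted code; whether per-item locking fits the
author's design is an open question) would be:

	while (1) {
		peer = rhashtable_walk_next(&iter);
		if (!peer)
			break;
		if (IS_ERR(peer))
			continue;
		if (peer->ht_key.hnet != hnet)
			continue;
		/* Serialize the removal and the counter updates against
		 * homa_peer_get() and homa_peer_gc(), which also hold
		 * peertab->lock here. Taking a spinlock inside the walk
		 * is fine: the walk itself only holds the RCU read lock,
		 * and homa_peer_gc() already calls
		 * rhashtable_remove_fast() under this lock, so lock
		 * ordering is unchanged.
		 */
		spin_lock_bh(&peertab->lock);
		if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
					   ht_params) == 0) {
			homa_peer_release(peer);
			hnet->num_peers--;
			peertab->num_peers--;
		}
		spin_unlock_bh(&peertab->lock);
	}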
> +/**
> + * homa_peer_prefer_evict() - Given two peers, determine which one is
> + * a better candidate for eviction.
> + * @peertab:   Overall information used to manage peers.
> + * @peer1:     First peer.
> + * @peer2:     Second peer.
> + * Return:     True if @peer1 is a better candidate for eviction
> + *             than @peer2.
> + */
> +int homa_peer_prefer_evict(struct homa_peertab *peertab,
> +			   struct homa_peer *peer1,
> +			   struct homa_peer *peer2)
> +{
> +	/* Prefer a peer whose homa-net is over its limit; if both are either
> +	 * over or under, then prefer the peer with the shortest idle time.
> +	 */

This isn't a bug, but the comment says "shortest idle time" while the
code returns peer1->access_jiffies < peer2->access_jiffies, which
prefers the peer accessed earlier (longest idle time). Since eviction
should prefer stale peers, the code is correct but the comment should
say "longest idle time".

[ ... ]

> +/**
> + * homa_peer_free() - Release any resources in a peer and free the homa_peer
> + * struct. Invoked by the RCU mechanism via homa_peer_release.
> + * @head:    Pointer to the rcu_head field of the peer to free.
> + */
> +void homa_peer_free(struct rcu_head *head)
> +{
> +	struct homa_peer *peer;
> +
> +	peer = container_of(head, struct homa_peer, rcu_head);
> +	dst_release(rcu_dereference(peer->dst));
> +	kfree(peer);
> +}

Is the RCU API correct here? homa_peer_free() is an RCU callback
invoked via call_rcu() from homa_peer_release():

net/homa/homa_peer.h:homa_peer_release()
{
	if (refcount_dec_and_test(&peer->refs))
		call_rcu(&peer->rcu_head, homa_peer_free);
}

RCU callbacks run after the grace period, outside any RCU read-side
critical section. Using rcu_dereference() without holding
rcu_read_lock() triggers lockdep "suspicious rcu_dereference_check()
usage" warnings at runtime. The peer is exclusively owned at this
point (the refcount has reached zero and the grace period has passed).
The same codebase uses the correct pattern in homa_peer_reset_dst():

net/homa/homa_peer.c:homa_peer_reset_dst()
{
	...
	dst_release(rcu_replace_pointer(peer->dst, dst, true));
	...
}

Should this use rcu_dereference_protected(peer->dst, 1) instead of
rcu_dereference(peer->dst)?
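If so, a minimal sketch of the change (the second argument of 1 simply
asserts that no further protection is needed, per the
exclusive-ownership argument above):

void homa_peer_free(struct rcu_head *head)
{
	struct homa_peer *peer;

	peer = container_of(head, struct homa_peer, rcu_head);

	/* The refcount is zero and the grace period has elapsed, so
	 * this callback is the exclusive owner of the peer;
	 * rcu_dereference_protected() documents that instead of
	 * claiming a read-side critical section.
	 */
	dst_release(rcu_dereference_protected(peer->dst, 1));
	kfree(peer);
}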
> +/**
> + * homa_peer_get() - Returns the peer associated with a given host; creates
> + * a new homa_peer if one doesn't already exist.
> + * @hsk:     Socket where the peer will be used.
> + * @addr:    Address of the desired host: IPv4 addresses are represented
> + *           as IPv4-mapped IPv6 addresses.
> + *
> + * Return:   The peer associated with @addr, or a negative errno if an
> + *           error occurred. On a successful return the reference count
> + *           will be incremented for the returned peer. The caller must
> + *           eventually call homa_peer_release to release the reference.
> + */
> +struct homa_peer *homa_peer_get(struct homa_sock *hsk,
> +				const struct in6_addr *addr)
> +{
> +	struct homa_peertab *peertab = hsk->homa->peertab;
> +	struct homa_peer *peer, *other;
> +	struct homa_peer_key key;
> +
> +	key.addr = *addr;
> +	key.hnet = hsk->hnet;
> +	rcu_read_lock();
> +	peer = rhashtable_lookup(&peertab->ht, &key, ht_params);
> +	if (peer) {
> +		homa_peer_hold(peer);
> +		peer->access_jiffies = jiffies;
> +		rcu_read_unlock();
> +		return peer;
> +	}

Can this sequence race with homa_peer_gc()? The lookup uses
rhashtable_lookup() under rcu_read_lock() but without peertab->lock.
Then it calls homa_peer_hold(), which uses refcount_inc():

net/homa/homa_peer.h:homa_peer_hold()
{
	refcount_inc(&peer->refs);
}

Concurrently, homa_peer_gc() can:

net/homa/homa_peer.c:homa_peer_gc()
{
	spin_lock_bh(&peertab->lock);
	...
	if (rhashtable_remove_fast(&peertab->ht, &peer->ht_linkage,
				   ht_params) == 0) {
		homa_peer_release(peer);
		...
	}
	...
	spin_unlock_bh(&peertab->lock);
}

net/homa/homa_peer.h:homa_peer_release()
{
	if (refcount_dec_and_test(&peer->refs))
		call_rcu(&peer->rcu_head, homa_peer_free);
}

If homa_peer_gc() removes the peer from the rhashtable and calls
homa_peer_release() (dropping the refcount to zero and scheduling
call_rcu) between the lookup and the hold, then refcount_inc() on a
zero refcount triggers a REFCOUNT_WARN and refuses the increment. This
leaves the caller holding a peer without a reference, and the peer
will be freed after the RCU grace period. The standard kernel pattern
for RCU + rhashtable lookups is refcount_inc_not_zero() with a retry
or fallback if it returns false: the RCU read lock keeps the memory
valid but doesn't prevent the refcount from reaching zero.
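A sketch of that pattern applied to the fast path above (assuming the
existing slow path that allocates and inserts a new peer can be reused
when the race is lost):

	rcu_read_lock();
	peer = rhashtable_lookup(&peertab->ht, &key, ht_params);
	if (peer && refcount_inc_not_zero(&peer->refs)) {
		/* Reference taken before GC could drop the count to
		 * zero, so the peer cannot be freed under us.
		 */
		peer->access_jiffies = jiffies;
		rcu_read_unlock();
		return peer;
	}
	rcu_read_unlock();
	/* Either no peer was found or GC is freeing the one we found:
	 * fall through to allocate and insert a fresh peer.
	 */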
[ ... ]

> +/**
> + * homa_peer_add_ack() - Add a given RPC to the list of unacked
> + * RPCs for its server. Once this method has been invoked, it's safe
> + * to delete the RPC, since it will eventually be acked to the server.
> + * @rpc:    Client RPC that has now completed.
> + */
> +void homa_peer_add_ack(struct homa_rpc *rpc)
> +{
> +	struct homa_peer *peer = rpc->peer;
> +	struct homa_ack_hdr ack;
> +
> +	homa_peer_lock(peer);
> +	if (peer->num_acks < HOMA_MAX_ACKS_PER_PKT) {
> +		peer->acks[peer->num_acks].client_id = cpu_to_be64(rpc->id);
> +		peer->acks[peer->num_acks].server_port = htons(rpc->dport);
> +		peer->num_acks++;
> +		homa_peer_unlock(peer);
> +		return;
> +	}
> +
> +	/* The peer has filled up; send an ACK message to empty it. The
> +	 * RPC in the message header will also be considered ACKed.
> +	 */
> +	memcpy(ack.acks, peer->acks, sizeof(peer->acks));
> +	ack.num_acks = htons(peer->num_acks);
> +	peer->num_acks = 0;
> +	homa_peer_unlock(peer);
> +	homa_xmit_control(ACK, &ack, sizeof(ack), rpc);
> +}

Does this code leak kernel stack memory? The 'struct homa_ack_hdr ack'
is declared on the stack without zeroing it. Only ack.acks (via
memcpy) and ack.num_acks are initialized before calling
homa_xmit_control().

Looking at how homa_xmit_control() uses the ack: the function sets
ack.common.type, sport, dport and sender_id, but the following
homa_common_hdr fields remain uninitialized: sequence, ack[3], doff,
reserved1, window, checksum and reserved2. These uninitialized stack
bytes are then memcpy'd into an skb by __homa_xmit_control() and
transmitted on the network, leaking kernel stack memory.

Additionally, the homa_common_hdr comment for doff warns: "other
values may cause some NICs (such as Intel E810-C) to drop outgoing
packets when TCP hijacking is enabled", so garbage in doff can cause
packet drops. The homa_xmit_control() doc comment claims "the common
header will be filled in by this function" but only 4 of the 11 fields
are actually set.
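A minimal sketch of one way to close the leak, using only the names
from the quoted code (zeroing addresses the infoleak; doff would still
need whatever value the E810-C workaround expects):

	struct homa_ack_hdr ack;

	...

	/* Zero the entire header, including padding, so that the
	 * homa_common_hdr fields homa_xmit_control() doesn't set
	 * cannot carry stack garbage onto the wire.
	 */
	memset(&ack, 0, sizeof(ack));
	memcpy(ack.acks, peer->acks, sizeof(peer->acks));
	ack.num_acks = htons(peer->num_acks);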