From: Lee Jones <lee@kernel.org>
To: Kuniyuki Iwashima <kuniyu@google.com>, stable@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Kuniyuki Iwashima <kuni1840@gmail.com>,
Linus Torvalds <torvalds@linuxfoundation.org>,
netdev@vger.kernel.org, Igor Ushakov <sysroot314@gmail.com>
Subject: Re: [PATCH v3 net] af_unix: Give up GC if MSG_PEEK intervened.
Date: Tue, 7 Apr 2026 16:58:27 +0100
Message-ID: <20260407155827.GA1993342@google.com>
In-Reply-To: <20260311054043.1231316-1-kuniyu@google.com>
INTENTIONAL TOP POST
I note that this was not sent to Stable, but please include it.
> Igor Ushakov reported, with a nice repro, that GC purged the
> receive queue of a live socket due to a race with MSG_PEEK.
>
> This is the exact same issue previously fixed by commit
> cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
>
> After GC was replaced with the current algorithm, the commit in
> the Fixes tag removed the locking dance in unix_peek_fds() and
> thus reintroduced the same issue.
>
> The problem is that MSG_PEEK bumps a file refcount without
> interacting with GC.
>
> Consider an SCC containing sk-A and sk-B, where sk-A is
> close()d but can be recv()ed via sk-B.
>
> The bad thing happens if sk-A is recv()ed with MSG_PEEK from
> sk-B and sk-B is close()d while GC is checking unix_vertex_dead()
> for sk-A and sk-B.
>
> GC thread User thread
> --------- -----------
> unix_vertex_dead(sk-A)
> -> true <------.
> \
> `------ recv(sk-B, MSG_PEEK)
> invalidate !! -> sk-A's file refcount : 1 -> 2
>
> close(sk-B)
> -> sk-B's file refcount : 2 -> 1
> unix_vertex_dead(sk-B)
> -> true
>
> Initially, sk-A's file refcount is 1, held by the inflight fd in
> sk-B's receive queue. GC thinks sk-A is dead because the file
> refcount equals the number of its inflight fds.
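The liveness criterion above can be sketched in a few lines (illustrative only; the helper name is mine, and the real unix_vertex_dead() performs additional checks):

```python
# Illustrative sketch of the liveness criterion (hypothetical helper;
# the real unix_vertex_dead() performs additional checks): a socket
# looks dead when every reference to its file comes from inflight fds.
def vertex_dead(file_refcount, nr_inflight_fds):
    return file_refcount == nr_inflight_fds

assert vertex_dead(1, 1)      # sk-A before the race: only the inflight fd
assert not vertex_dead(2, 1)  # after MSG_PEEK silently bumps the refcount
```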
>
> However, sk-A's file refcount is bumped silently by MSG_PEEK,
> which invalidates the previous evaluation.
>
> At this moment, sk-B's file refcount is 2: one held by the open fd
> and one by the inflight fd in sk-A. The subsequent close() releases
> the refcount held by the open fd.
>
> Finally, GC incorrectly concludes that both sk-A and sk-B are dead.
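For anyone backporting this, the refcount bump at the heart of the race can be demonstrated from userspace. This is a sketch (the helper name peek_fd is mine), showing that each MSG_PEEK of a queued SCM_RIGHTS message installs a fresh descriptor for the inflight file:

```python
# Userspace sketch (not kernel code): each MSG_PEEK of a queued
# SCM_RIGHTS message installs a new descriptor for the inflight
# file, bumping its struct file refcount behind GC's back.
import array
import os
import socket

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
inflight_r, inflight_w = os.pipe()  # the file we put "in flight"

# Queue the pipe's read end as an inflight fd in b's receive queue.
a.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                    array.array("i", [inflight_r]))])

def peek_fd(sock):
    # Hypothetical helper: peek the message without consuming it;
    # the kernel dups the inflight file into a fresh local fd.
    _, anc, _, _ = sock.recvmsg(1, socket.CMSG_SPACE(
        array.array("i").itemsize), socket.MSG_PEEK)
    fds = array.array("i")
    for level, typ, data in anc:
        if level == socket.SOL_SOCKET and typ == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - len(data) % fds.itemsize])
    return fds[0]

fd1 = peek_fd(b)  # first peek: new fd, file refcount bumped
fd2 = peek_fd(b)  # message still queued: yet another new fd
assert fd1 != fd2
```

Both descriptors refer to the same underlying file; only the fd numbers differ.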
>
> One option is to restore the locking dance in unix_peek_fds(),
> but we can resolve this more elegantly thanks to the new algorithm.
>
> The point is that the issue does not occur without the subsequent
> close() and we actually do not need to synchronise MSG_PEEK with
> the dead SCC detection.
>
> When the issue occurs, close() and GC touch the same file refcount.
> If GC sees the refcount being decremented by close(), it can just
> give up garbage-collecting the SCC.
>
> Therefore, we only need to signal the race during MSG_PEEK with
> a proper memory barrier to make it visible to the GC.
>
> Let's use seqcount_t to notify GC when MSG_PEEK occurs and let
> it defer the SCC to the next run.
>
> This way, the MSG_PEEK side takes no lock unless GC is in progress,
> so we avoid imposing a penalty on every MSG_PEEK unnecessarily.
>
> Note that we could retry within unix_scc_dead() if MSG_PEEK is
> detected, but we do not do so, to avoid a hung-task splat caused
> by abusive MSG_PEEK calls.
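The seqcount usage can be illustrated with a simplified userspace analogue (not the kernel implementation: the real seqcount_t uses memory barriers, and raw_write_seqcount_barrier() bumps the count so that any reader whose critical section spans it must retry; here a plain counter and Python's GIL stand in):

```python
# Simplified analogue of the seqcount pattern (illustrative only).
import threading

class PeekSeq:
    def __init__(self):
        self.seq = 0
        self.lock = threading.Lock()  # stands in for unix_peek_lock

    def write_barrier(self):
        # MSG_PEEK side: signal that a peek happened.
        with self.lock:
            self.seq += 2  # stays even: no writer-in-progress state here

    def read_begin(self):
        # GC side: sample the sequence before evaluating the SCC.
        return self.seq

    def read_retry(self, start):
        # GC side: True if a peek intervened; defer the SCC.
        return self.seq != start

ps = PeekSeq()
start = ps.read_begin()      # GC: read_seqcount_begin()
ps.write_barrier()           # user thread: recv(..., MSG_PEEK)
assert ps.read_retry(start)  # GC gives up this SCC for this round
```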
>
> Fixes: 118f457da9ed ("af_unix: Remove lock dance in unix_peek_fds().")
> Reported-by: Igor Ushakov <sysroot314@gmail.com>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> ---
> v3: Check !fpl and add spinlock in unix_peek_fpl()
> v2: https://lore.kernel.org/all/20260309185823.3502204-1-kuniyu@google.com/
> * Use seqcount_t for proper memory barrier
> v1: https://lore.kernel.org/netdev/20260308030406.1825938-1-kuniyu@google.com/
> ---
> net/unix/af_unix.c | 2 ++
> net/unix/af_unix.h | 1 +
> net/unix/garbage.c | 79 ++++++++++++++++++++++++++++++----------------
> 3 files changed, 54 insertions(+), 28 deletions(-)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 7eaa5b187fef..b23c33df8b46 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1958,6 +1958,8 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
> static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
> {
> scm->fp = scm_fp_dup(UNIXCB(skb).fp);
> +
> + unix_peek_fpl(scm->fp);
> }
>
> static void unix_destruct_scm(struct sk_buff *skb)
> diff --git a/net/unix/af_unix.h b/net/unix/af_unix.h
> index c4f1b2da363d..8119dbeef3a3 100644
> --- a/net/unix/af_unix.h
> +++ b/net/unix/af_unix.h
> @@ -29,6 +29,7 @@ void unix_del_edges(struct scm_fp_list *fpl);
> void unix_update_edges(struct unix_sock *receiver);
> int unix_prepare_fpl(struct scm_fp_list *fpl);
> void unix_destroy_fpl(struct scm_fp_list *fpl);
> +void unix_peek_fpl(struct scm_fp_list *fpl);
> void unix_schedule_gc(struct user_struct *user);
>
> /* SOCK_DIAG */
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index 816e8fa2b062..a7967a345827 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -318,6 +318,25 @@ void unix_destroy_fpl(struct scm_fp_list *fpl)
> unix_free_vertices(fpl);
> }
>
> +static bool gc_in_progress;
> +static seqcount_t unix_peek_seq = SEQCNT_ZERO(unix_peek_seq);
> +
> +void unix_peek_fpl(struct scm_fp_list *fpl)
> +{
> + static DEFINE_SPINLOCK(unix_peek_lock);
> +
> + if (!fpl || !fpl->count_unix)
> + return;
> +
> + if (!READ_ONCE(gc_in_progress))
> + return;
> +
> + /* Invalidate the final refcnt check in unix_vertex_dead(). */
> + spin_lock(&unix_peek_lock);
> + raw_write_seqcount_barrier(&unix_peek_seq);
> + spin_unlock(&unix_peek_lock);
> +}
> +
> static bool unix_vertex_dead(struct unix_vertex *vertex)
> {
> struct unix_edge *edge;
> @@ -351,6 +370,36 @@ static bool unix_vertex_dead(struct unix_vertex *vertex)
> return true;
> }
>
> +static LIST_HEAD(unix_visited_vertices);
> +static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
> +
> +static bool unix_scc_dead(struct list_head *scc, bool fast)
> +{
> + struct unix_vertex *vertex;
> + bool scc_dead = true;
> + unsigned int seq;
> +
> + seq = read_seqcount_begin(&unix_peek_seq);
> +
> + list_for_each_entry_reverse(vertex, scc, scc_entry) {
> + /* Don't restart DFS from this vertex. */
> + list_move_tail(&vertex->entry, &unix_visited_vertices);
> +
> + /* Mark vertex as off-stack for __unix_walk_scc(). */
> + if (!fast)
> + vertex->index = unix_vertex_grouped_index;
> +
> + if (scc_dead)
> + scc_dead = unix_vertex_dead(vertex);
> + }
> +
> + /* If MSG_PEEK intervened, defer this SCC to the next round. */
> + if (read_seqcount_retry(&unix_peek_seq, seq))
> + return false;
> +
> + return scc_dead;
> +}
> +
> static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist)
> {
> struct unix_vertex *vertex;
> @@ -404,9 +453,6 @@ static bool unix_scc_cyclic(struct list_head *scc)
> return false;
> }
>
> -static LIST_HEAD(unix_visited_vertices);
> -static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
> -
> static unsigned long __unix_walk_scc(struct unix_vertex *vertex,
> unsigned long *last_index,
> struct sk_buff_head *hitlist)
> @@ -474,9 +520,7 @@ static unsigned long __unix_walk_scc(struct unix_vertex *vertex,
> }
>
> if (vertex->index == vertex->scc_index) {
> - struct unix_vertex *v;
> struct list_head scc;
> - bool scc_dead = true;
>
> /* SCC finalised.
> *
> @@ -485,18 +529,7 @@ static unsigned long __unix_walk_scc(struct unix_vertex *vertex,
> */
> __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
>
> - list_for_each_entry_reverse(v, &scc, scc_entry) {
> - /* Don't restart DFS from this vertex in unix_walk_scc(). */
> - list_move_tail(&v->entry, &unix_visited_vertices);
> -
> - /* Mark vertex as off-stack. */
> - v->index = unix_vertex_grouped_index;
> -
> - if (scc_dead)
> - scc_dead = unix_vertex_dead(v);
> - }
> -
> - if (scc_dead) {
> + if (unix_scc_dead(&scc, false)) {
> unix_collect_skb(&scc, hitlist);
> } else {
> if (unix_vertex_max_scc_index < vertex->scc_index)
> @@ -550,19 +583,11 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
> while (!list_empty(&unix_unvisited_vertices)) {
> struct unix_vertex *vertex;
> struct list_head scc;
> - bool scc_dead = true;
>
> vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry);
> list_add(&scc, &vertex->scc_entry);
>
> - list_for_each_entry_reverse(vertex, &scc, scc_entry) {
> - list_move_tail(&vertex->entry, &unix_visited_vertices);
> -
> - if (scc_dead)
> - scc_dead = unix_vertex_dead(vertex);
> - }
> -
> - if (scc_dead) {
> + if (unix_scc_dead(&scc, true)) {
> cyclic_sccs--;
> unix_collect_skb(&scc, hitlist);
> }
> @@ -577,8 +602,6 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
> cyclic_sccs ? UNIX_GRAPH_CYCLIC : UNIX_GRAPH_NOT_CYCLIC);
> }
>
> -static bool gc_in_progress;
> -
> static void unix_gc(struct work_struct *work)
> {
> struct sk_buff_head hitlist;
> --
> 2.53.0.473.g4a7958ca14-goog
>
--
Lee Jones [李琼斯]
Thread overview: 5+ messages
2026-03-11 5:40 [PATCH v3 net] af_unix: Give up GC if MSG_PEEK intervened Kuniyuki Iwashima
2026-03-12 20:40 ` patchwork-bot+netdevbpf
2026-04-07 15:58 ` Lee Jones [this message]
2026-04-07 16:03 ` Kuniyuki Iwashima
2026-04-07 17:01 ` Greg KH