All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lee Jones <lee@kernel.org>
To: lee@kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>,
	Michal Luczaj <mhal@rbox.co>, Rao Shoaib <Rao.Shoaib@oracle.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Subject: [PATCH v6.6 21/26] af_unix: Remove lock dance in unix_peek_fds().
Date: Wed, 21 May 2025 14:45:29 +0000	[thread overview]
Message-ID: <20250521144803.2050504-22-lee@kernel.org> (raw)
In-Reply-To: <20250521144803.2050504-1-lee@kernel.org>

From: Kuniyuki Iwashima <kuniyu@amazon.com>

[ Upstream commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e ]

In the previous GC implementation, the shape of the inflight socket
graph was not expected to change while GC was in progress.

MSG_PEEK was tricky because it could install inflight fd silently
and transform the graph.

Let's say we peeked a fd, which was a listening socket, and accept()ed
some embryo sockets from it.  The garbage collection algorithm would
have been confused because the set of sockets visited in scan_inflight()
would change within the same GC invocation.

That's why we placed spin_lock(&unix_gc_lock) and spin_unlock() in
unix_peek_fds() with a fat comment.

In the new GC implementation, we no longer garbage-collect the socket
if it exists in another queue, that is, if it has a bridge to another
SCC.  Also, accept() will require the lock if it has edges.

Thus, we need not do the complicated lock dance.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e)
Signed-off-by: Lee Jones <lee@kernel.org>
---
 include/net/af_unix.h |  1 -
 net/unix/af_unix.c    | 42 ------------------------------------------
 net/unix/garbage.c    |  2 +-
 3 files changed, 1 insertion(+), 44 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index d88ca51a9081d..47042de4a2a9c 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -17,7 +17,6 @@ static inline struct unix_sock *unix_get_socket(struct file *filp)
 }
 #endif
 
-extern spinlock_t unix_gc_lock;
 extern unsigned int unix_tot_inflight;
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver);
 void unix_del_edges(struct scm_fp_list *fpl);
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index eee0bccd7877b..df70d8a7ee837 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1796,48 +1796,6 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
 {
 	scm->fp = scm_fp_dup(UNIXCB(skb).fp);
-
-	/*
-	 * Garbage collection of unix sockets starts by selecting a set of
-	 * candidate sockets which have reference only from being in flight
-	 * (total_refs == inflight_refs).  This condition is checked once during
-	 * the candidate collection phase, and candidates are marked as such, so
-	 * that non-candidates can later be ignored.  While inflight_refs is
-	 * protected by unix_gc_lock, total_refs (file count) is not, hence this
-	 * is an instantaneous decision.
-	 *
-	 * Once a candidate, however, the socket must not be reinstalled into a
-	 * file descriptor while the garbage collection is in progress.
-	 *
-	 * If the above conditions are met, then the directed graph of
-	 * candidates (*) does not change while unix_gc_lock is held.
-	 *
-	 * Any operations that changes the file count through file descriptors
-	 * (dup, close, sendmsg) does not change the graph since candidates are
-	 * not installed in fds.
-	 *
-	 * Dequeing a candidate via recvmsg would install it into an fd, but
-	 * that takes unix_gc_lock to decrement the inflight count, so it's
-	 * serialized with garbage collection.
-	 *
-	 * MSG_PEEK is special in that it does not change the inflight count,
-	 * yet does install the socket into an fd.  The following lock/unlock
-	 * pair is to ensure serialization with garbage collection.  It must be
-	 * done between incrementing the file count and installing the file into
-	 * an fd.
-	 *
-	 * If garbage collection starts after the barrier provided by the
-	 * lock/unlock, then it will see the elevated refcount and not mark this
-	 * as a candidate.  If a garbage collection is already in progress
-	 * before the file count was incremented, then the lock/unlock pair will
-	 * ensure that garbage collection is finished before progressing to
-	 * installing the fd.
-	 *
-	 * (*) A -> B where B is on the queue of A or B is on the queue of C
-	 * which is on the queue of listening socket A.
-	 */
-	spin_lock(&unix_gc_lock);
-	spin_unlock(&unix_gc_lock);
 }
 
 static void unix_destruct_scm(struct sk_buff *skb)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 89ea71d9297ba..12a4ec27e0d4d 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -183,7 +183,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl)
 	}
 }
 
-DEFINE_SPINLOCK(unix_gc_lock);
+static DEFINE_SPINLOCK(unix_gc_lock);
 unsigned int unix_tot_inflight;
 
 void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
-- 
2.49.0.1112.g889b7c5bd8-goog


  parent reply	other threads:[~2025-05-21 14:52 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-21 14:45 [PATCH v6.6 00/26] af_unix: Align with upstream to avoid a potential UAF Lee Jones
2025-05-21 14:45 ` [PATCH v6.6 01/26] af_unix: Return struct unix_sock from unix_get_socket() Lee Jones
2025-05-22  2:03   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 02/26] af_unix: Run GC on only one CPU Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 03/26] af_unix: Try to run GC async Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 04/26] af_unix: Replace BUG_ON() with WARN_ON_ONCE() Lee Jones
2025-05-22  2:08   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 05/26] af_unix: Remove io_uring code for GC Lee Jones
2025-05-22  2:07   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 06/26] af_unix: Remove CONFIG_UNIX_SCM Lee Jones
2025-05-22  2:03   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 07/26] af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd Lee Jones
2025-05-22  2:08   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 08/26] af_unix: Allocate struct unix_edge " Lee Jones
2025-05-22  2:06   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 09/26] af_unix: Link struct unix_edge when queuing skb Lee Jones
2025-05-22  2:05   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 10/26] af_unix: Bulk update unix_tot_inflight/unix_inflight " Lee Jones
2025-05-22  2:03   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 11/26] af_unix: Iterate all vertices by DFS Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 12/26] af_unix: Detect Strongly Connected Components Lee Jones
2025-05-22  2:06   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 13/26] af_unix: Save listener for embryo socket Lee Jones
2025-05-22  2:08   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 14/26] af_unix: Fix up unix_edge.successor " Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 15/26] af_unix: Save O(n) setup of Tarjan's algo Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 16/26] af_unix: Skip GC if no cycle exists Lee Jones
2025-05-22  2:07   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 17/26] af_unix: Avoid Tarjan's algorithm if unnecessary Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 18/26] af_unix: Assign a unique index to SCC Lee Jones
2025-05-22  2:07   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 19/26] af_unix: Detect dead SCC Lee Jones
2025-05-22  2:05   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 20/26] af_unix: Replace garbage collection algorithm Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` Lee Jones [this message]
2025-05-22  2:07   ` [PATCH v6.6 21/26] af_unix: Remove lock dance in unix_peek_fds() Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 22/26] af_unix: Try not to hold unix_gc_lock during accept() Lee Jones
2025-05-22  2:07   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 23/26] af_unix: Don't access successor in unix_del_edges() during GC Lee Jones
2025-05-22  2:08   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 24/26] af_unix: Add dead flag to struct scm_fp_list Lee Jones
2025-05-22  2:05   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 25/26] af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS Lee Jones
2025-05-22  2:04   ` Sasha Levin
2025-05-21 14:45 ` [PATCH v6.6 26/26] af_unix: Fix uninit-value in __unix_walk_scc() Lee Jones
2025-05-22  2:07   ` Sasha Levin
2025-05-29 12:26 ` [PATCH v6.6 00/26] af_unix: Align with upstream to avoid a potential UAF Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250521144803.2050504-22-lee@kernel.org \
    --to=lee@kernel.org \
    --cc=Rao.Shoaib@oracle.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@amazon.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhal@rbox.co \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.