* [PATCH v2 net] net/ipv6: release expired exception dst cached in socket
From: Jiri Wiesner @ 2024-11-28 8:59 UTC
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Xin Long, yousaf.kaukab, andreas.taschner
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
  * an ICMPv6 packet indicating a change of the MTU for the path is received,
    resulting in an exception dst being created
  * a TCP connection that uses the exception dst for routing packets must
    start timing out so that TCP begins retransmissions
  * after the exception dst expires, the FIB6 garbage collector must not run
    before TCP executes ip6_negative_advice() for the expired exception dst
When TCP executes ip6_negative_advice() for an exception dst that has
expired and if no other socket holds a reference to the exception dst, the
refcount of the exception dst is 2, which corresponds to the increment
made by dst_init() and the increment made by the TCP socket for which the
connection is timing out. The reference taken by the socket is never
released. The refcount of the dst is decremented in sk_dst_reset() but
that decrement is counteracted by a dst_hold() intentionally placed just
before the sk_dst_reset() in ip6_negative_advice(). After
ip6_negative_advice() has finished, there is no other object tied to the
dst. The socket lost its reference stored in sk_dst_cache and the dst is
no longer in the exception table. The exception dst becomes a leaked
object.
As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"):
unregister_netdevice: waiting for lo to become free. Usage count = 2
Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was
92f1655aa2b22 ("net: fix __dst_negative_advice() race"). But 92f1655aa2b22
merely refactored the code with regard to the dst refcount, so the issue
was present even before 92f1655aa2b22. The bug was introduced in
54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually
expired.") where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice() but
the refcount belonging to the socket is left unbalanced.
The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out, ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst.
Fixes: 92f1655aa2b22 ("net: fix __dst_negative_advice() race")
Fixes: 54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually expired.")
Link: https://lore.kernel.org/netdev/20241113105611.GA6723@incl/T/#u
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
---
v2 changes:
* a comment describing the lifetime of the dst was put in place
* the steps to reproduce the issue were dropped because listing them
does not make the issue more apparent
* the commented trace was replaced with a description that should
be more succinct and easier to read
* the word count of the changelog was reduced
* a link was added
Paolo, Eric, thanks for the comments.
net/ipv6/route.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b4251915585f..31b4f97d7728 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2780,10 +2780,10 @@ static void ip6_negative_advice(struct sock *sk,
 	if (rt->rt6i_flags & RTF_CACHE) {
 		rcu_read_lock();
 		if (rt6_check_expired(rt)) {
-			/* counteract the dst_release() in sk_dst_reset() */
-			dst_hold(dst);
+			/* rt/dst can not be destroyed yet,
+			 * because of rcu_read_lock()
+			 */
 			sk_dst_reset(sk);
-
 			rt6_remove_exception_rt(rt);
 		}
 		rcu_read_unlock();
--
2.35.3
--
Jiri Wiesner
SUSE Labs
* Re: [PATCH v2 net] net/ipv6: release expired exception dst cached in socket
From: Eric Dumazet @ 2024-11-29 9:18 UTC
To: Jiri Wiesner
Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
Xin Long, yousaf.kaukab, andreas.taschner
Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks a lot, Jiri!
* Re: [PATCH v2 net] net/ipv6: release expired exception dst cached in socket
From: patchwork-bot+netdevbpf @ 2024-12-03 3:30 UTC
To: Jiri Wiesner
Cc: netdev, davem, dsahern, edumazet, kuba, pabeni, lucien.xin,
yousaf.kaukab, andreas.taschner
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 28 Nov 2024 09:59:50 +0100 you wrote:
> Dst objects get leaked in ip6_negative_advice() when this function is
> executed for an expired IPv6 route located in the exception table. There
> are several conditions that must be fulfilled for the leak to occur:
> * an ICMPv6 packet indicating a change of the MTU for the path is received,
> resulting in an exception dst being created
> * a TCP connection that uses the exception dst for routing packets must
> start timing out so that TCP begins retransmissions
> * after the exception dst expires, the FIB6 garbage collector must not run
> before TCP executes ip6_negative_advice() for the expired exception dst
>
> [...]
Here is the summary with links:
- [v2,net] net/ipv6: release expired exception dst cached in socket
https://git.kernel.org/netdev/net/c/3301ab7d5aeb
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html