From: Jiri Wiesner <jwiesner@suse.de>
To: Xin Long <lucien.xin@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>,
netdev@vger.kernel.org, David Ahern <dsahern@kernel.org>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: [RFC PATCH] ipv6: route: release reference of dsts cached in sockets
Date: Wed, 13 Nov 2024 11:59:12 +0100 [thread overview]
Message-ID: <20241113105912.GB19776@incl> (raw)
In-Reply-To: <CADvbK_dz9ewsEmGa63DgMOwRwFyE-evALq61CUYi54-K_WTvog@mail.gmail.com>
On Tue, Nov 12, 2024 at 04:51:27PM -0500, Xin Long wrote:
> On Tue, Nov 12, 2024 at 3:57 AM Jiri Wiesner <jwiesner@suse.de> wrote:
> >
> > On Sun, Oct 06, 2024 at 02:25:25PM -0400, Xin Long wrote:
> > > We recently also encountered this
> > >
> > > 'unregister_netdevice: waiting for lo to become free. Usage count = X'
> > >
> > > problem on our customer env after backporting
> > >
> > > Commit 92f1655aa2b22 ("net: fix __dst_negative_advice() race"). [1]
> > >
> > > The commit looks correct to me, so I guess it may uncover some existing
> > > issues.
> > >
> > > As it took a very long time to get reproduced on our customer env, which
> > > made it impossible to debug. Also the issue existed even after
> > > disabling IPv6.
> > >
> > > It seems much easier to reproduce it on your customer env. So I'm wondering
> > >
> > > - Was the testing on your customer env related to IPv6 ?
> > > - Does the issue still exist after reverting the commit [1] ?
> >
> > The customer tried reproducing the issue with 92f1655aa2b22 ("net: fix __dst_negative_advice() race") reverted and the issue appeared again. My next step was capturing more points within the stacktraces when dst refcounts are changed (I do not have full stack traces - the trace entries I store in the hash contain only the instruction pointer and the parent function). This is the trace for the leaked dst object:
> > > dst ff1c4157413bf900
> > > alloc: 1 destroy: obj diff: 1
> > > hold ops: 5 put ops: 4 refcnt diff: 1
> > > Function Parent Op Net Device Dst Number of Calls
> > > inet6_csk_route_socket+0x1c2/0x2d0: inet6_csk_update_pmtu+0x58/0x90: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > __ip6_rt_update_pmtu+0x183/0x3c0: inet6_csk_update_pmtu+0x4b/0x90: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > ip6_dst_lookup_flow+0x4f/0x1d0: inet6_csk_route_socket+0x198/0x2d0: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > ip6_dst_ifdown+0x5/0x250: dst_dev_put+0x2d/0xd0: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > tcp_v6_connect+0x37e/0x790: __inet_stream_connect+0x2c8/0x3a0: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 2
> > > ip6_dst_lookup_flow+0x4f/0x1d0: tcp_v6_connect+0x320/0x790: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 2
> > > __dev_queue_xmit+0x1af/0xd20: ip6_finish_output2+0x1f1/0x6e0: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 675
> > > inet6_csk_xmit+0xa1/0x150: __tcp_transmit_skb+0x5f8/0xd40: dst ff1c4151bcac1080 eth0 ff1c4157413bf900 675
> > > Function Parent Op Net Device Dst Refcount Diff
> > > ip6_route_output_flags+0x76/0x230: ip6_dst_lookup_tail+0x215/0x250: dst_hold ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > ip6_negative_advice+0x66/0x2d0: tcp_retransmit_timer+0x5ed/0xac0: dst_hold ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > dst_alloc+0x5e/0x180: ip6_dst_alloc+0x27/0x60: dst_hold ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > ip6_route_output_flags+0x76/0x230: ip6_dst_lookup_tail+0x10f/0x250: dst_hold ff1c4151bcac1080 eth0 ff1c4157413bf900 2
> > > dst_release+0x32/0x140: ip6_negative_advice+0x137/0x2d0: dst_put ff1c4151bcac1080 eth0 ff1c4157413bf900 1
> > > dst_release+0x32/0x140: inet_sock_destruct+0x146/0x1c0: dst_put ff1c4151bcac1080 eth0 ff1c4157413bf900 2
> > > dst_release+0x32/0x140: rt6_remove_exception.part.53+0x7f/0xe0: dst_put ffffffff90ee2a80 blackhole_dev ff1c4157413bf900 1
> > > Function Parent Op Net Dst Socket Number of Calls
> > > sk_setup_caps+0x5/0x200: tcp_v6_connect+0x428/0x790: sk_dst ff1c4151bcac1080 ff1c4157413bf900 ff1c4158c07aaf80 1
> > > sk_setup_caps+0x5/0x200: tcp_v6_connect+0x428/0x790: sk_dst ff1c4151bcac1080 ff1c4157413bf900 ff1c418e49bac280 1
> > > sk_setup_caps+0x5/0x200: inet6_csk_route_socket+0x23c/0x2d0: sk_dst ff1c4151bcac1080 ff1c4157413bf900 ff1c418ea04f7200 1
> >
> > The refcount of the dst was changed only when a dst_hold or dst_put operation was logged. The hold in dst_alloc() is balanced by the dst_release() in rt6_remove_exception(). The three holds in ip6_route_output_flags() belonged to the references held by sockets which were balanced by the dst_release() in inet_sock_destruct(). But that leaves one hold operation outstanding and unbalanced. There is a put operation for expired cached dst objects in ip6_negative_advice() but that one is intentionally balanced by a hold in the same function.
> >
> This makes sense to me.
>
> The code prior to 92f1655aa2b22 ("net: fix __dst_negative_advice()
> race") had no longer been able to release the cached dst for the
> reference held by socket in ip6_negative_advice() since
> rt6_remove_exception_rt() is called there.
>
> Hi, David Ahern,
>
> Can you confirm this?
I have submitted "[PATCH net] net/ipv6: release expired exception dst cached in socket" which should have clearer explanations not burdened by the debugging process. I initially debugged the issue on a customer's server without having any idea how to reproduce it. I then reconstructed the steps to reproduce the issue from the tracing data, which allowed me to check for the issue in the latest upstream kernel.
--
Jiri Wiesner
SUSE Labs
next prev parent reply other threads:[~2024-11-13 10:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-30 18:09 [RFC PATCH] ipv6: route: release reference of dsts cached in sockets Jiri Wiesner
2024-09-30 18:27 ` Eric Dumazet
2024-10-01 15:26 ` Jiri Wiesner
2024-10-02 20:27 ` Xin Long
2024-10-03 17:01 ` Jiri Wiesner
2024-10-06 18:25 ` Xin Long
2024-10-09 13:39 ` Jiri Wiesner
2024-11-12 8:56 ` Jiri Wiesner
2024-11-12 21:51 ` Xin Long
2024-11-13 10:59 ` Jiri Wiesner [this message]
2024-11-13 17:27 ` David Ahern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241113105912.GB19776@incl \
--to=jwiesner@suse.de \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=lucien.xin@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox