From: Krister Johansen <kjlx@templeofstupid.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Matthieu Baerts <matttbe@kernel.org>,
Mat Martineau <martineau@kernel.org>,
Geliang Tang <geliang@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp] mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
Date: Mon, 24 Feb 2025 08:31:08 -0800 [thread overview]
Message-ID: <20250224163108.GA1897@templeofstupid.com> (raw)
In-Reply-To: <9ef28d50-dad0-4dc6-8a6d-b3f82521fba1@redhat.com>
Hi Paolo,
Thanks for the feedback.
On Mon, Feb 24, 2025 at 11:09:17AM +0100, Paolo Abeni wrote:
> On 2/21/25 11:21 PM, Krister Johansen wrote:
> > If multiple connection requests attempt to create an implicit mptcp
> > endpoint in parallel, more than one caller may end up in
> > mptcp_pm_nl_append_new_local_addr because none found the address in
> > local_addr_list during their call to mptcp_pm_nl_get_local_id. In this
> > case, the concurrent new_local_addr calls may delete the address entry
> > created by the previous caller. These deletes use synchronize_rcu, but
> > this is not permitted in some of the contexts where this function may be
> > called. During packet recv, the caller may be in a rcu read critical
> > section and have preemption disabled.
> >
> > An example stack:
> >
> > BUG: scheduling while atomic: swapper/2/0/0x00000302
> >
> > Call Trace:
> > <IRQ>
> > dump_stack_lvl+0x76/0xa0
> > dump_stack+0x10/0x20
> > __schedule_bug+0x64/0x80
> > schedule_debug.constprop.0+0xdb/0x130
> > __schedule+0x69/0x6a0
> > schedule+0x33/0x110
> > schedule_timeout+0x157/0x170
> > wait_for_completion+0x88/0x150
> > __wait_rcu_gp+0x150/0x160
> > synchronize_rcu+0x12d/0x140
> > mptcp_pm_nl_append_new_local_addr+0x1bd/0x280
> > mptcp_pm_nl_get_local_id+0x121/0x160
> > mptcp_pm_get_local_id+0x9d/0xe0
> > subflow_check_req+0x1a8/0x460
> > subflow_v4_route_req+0xb5/0x110
> > tcp_conn_request+0x3a4/0xd00
> > subflow_v4_conn_request+0x42/0xa0
> > tcp_rcv_state_process+0x1e3/0x7e0
> > tcp_v4_do_rcv+0xd3/0x2a0
> > tcp_v4_rcv+0xbb8/0xbf0
> > ip_protocol_deliver_rcu+0x3c/0x210
> > ip_local_deliver_finish+0x77/0xa0
> > ip_local_deliver+0x6e/0x120
> > ip_sublist_rcv_finish+0x6f/0x80
> > ip_sublist_rcv+0x178/0x230
> > ip_list_rcv+0x102/0x140
> > __netif_receive_skb_list_core+0x22d/0x250
> > netif_receive_skb_list_internal+0x1a3/0x2d0
> > napi_complete_done+0x74/0x1c0
> > igb_poll+0x6c/0xe0 [igb]
> > __napi_poll+0x30/0x200
> > net_rx_action+0x181/0x2e0
> > handle_softirqs+0xd8/0x340
> > __irq_exit_rcu+0xd9/0x100
> > irq_exit_rcu+0xe/0x20
> > common_interrupt+0xa4/0xb0
> > </IRQ>
> >
> > This problem seems particularly prevalent if the user advertises an
> > endpoint that has a different external vs internal address. In the case
> > where the external address is advertised and multiple connections
> > already exist, multiple subflow SYNs arrive in parallel which tends to
> > trigger the race during creation of the first local_addr_list entries
> > which have the internal address instead.
> >
> > Fix this problem by switching mptcp_pm_nl_append_new_local_addr to use
> > call_rcu . As part of plumbing this up, make
> > __mptcp_pm_release_addr_entry take a rcu_head which is used by all
> > callers regardless of cleanup method.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: d045b9eb95a9 ("mptcp: introduce implicit endpoints")
> > Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
>
> The proposed patch looks functionally correct to me, but I think it
> would be better to avoid adding new fields to mptcp_pm_addr_entry, if
> not strictly needed.
>
> What about the following? (completely untested!). When inplicit
> endpoints creations race one with each other, we don't need to replace
> the existing one, we could simply use it.
>
> That would additionally prevent an implicit endpoint created from a
> subflow from overriding the flags set by a racing user-space endpoint add.
>
> If that works/fits you feel free to take/use it.
I like this suggestion. In addition to the benefits you outlined, it
also prevents a series of back-to-back replacements from getting turned
into a chunk of call_rcu() calls. Leaving this as a synchronize_rcu is
probably better too, if we can. I was unsure whether it was acceptable
to skip the replacement in this case. Thanks for clearing that up.
I'll test this and follow up with a v2.
> ---
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 572d160edca3..dcb27b479824 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -977,7 +977,7 @@ static void __mptcp_pm_release_addr_entry(struct
> mptcp_pm_addr_entry *entry)
>
> static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
> struct mptcp_pm_addr_entry *entry,
> - bool needs_id)
> + bool needs_id, bool replace)
> {
> struct mptcp_pm_addr_entry *cur, *del_entry = NULL;
> unsigned int addr_max;
> @@ -1017,6 +1017,12 @@ static int
> mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
> if (entry->addr.id)
> goto out;
>
> + if (!replace) {
> + kfree(entry);
> + ret = cur->addr.id;
> + goto out;
> + }
> +
> pernet->addrs--;
> entry->addr.id = cur->addr.id;
> list_del_rcu(&cur->list);
> @@ -1165,7 +1171,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock
> *msk, struct mptcp_addr_info *skc
> entry->ifindex = 0;
> entry->flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
> entry->lsk = NULL;
> - ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true);
> + ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true, false);
> if (ret < 0)
> kfree(entry);
>
> @@ -1433,7 +1439,8 @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb,
> struct genl_info *info)
> }
> }
> ret = mptcp_pm_nl_append_new_local_addr(pernet, entry,
> - !mptcp_pm_has_addr_attr_id(attr, info));
> + !mptcp_pm_has_addr_attr_id(attr, info),
> + true);
> if (ret < 0) {
> GENL_SET_ERR_MSG_FMT(info, "too many addresses or duplicate one: %d",
> ret);
> goto out_free;
Thanks,
-K
next prev parent reply other threads:[~2025-02-24 16:31 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-21 22:21 [PATCH mptcp] mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr Krister Johansen
2025-02-21 23:33 ` MPTCP CI
2025-02-24 10:09 ` Paolo Abeni
2025-02-24 16:31 ` Krister Johansen [this message]
2025-02-24 23:20 ` [PATCH v2 " Krister Johansen
2025-02-25 0:34 ` MPTCP CI
2025-02-25 17:52 ` Matthieu Baerts
2025-02-25 19:29 ` Krister Johansen
2025-02-25 21:41 ` Matthieu Baerts
2025-02-25 21:53 ` Krister Johansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250224163108.GA1897@templeofstupid.com \
--to=kjlx@templeofstupid.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=geliang@kernel.org \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=martineau@kernel.org \
--cc=matttbe@kernel.org \
--cc=mptcp@lists.linux.dev \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.