From: Krister Johansen <kjlx@templeofstupid.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Matthieu Baerts <matttbe@kernel.org>,
	Mat Martineau <martineau@kernel.org>,
	Geliang Tang <geliang@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, mptcp@lists.linux.dev
Subject: Re: [PATCH mptcp] mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr
Date: Mon, 24 Feb 2025 08:31:08 -0800
Message-ID: <20250224163108.GA1897@templeofstupid.com>
In-Reply-To: <9ef28d50-dad0-4dc6-8a6d-b3f82521fba1@redhat.com>

Hi Paolo,

Thanks for the feedback.

On Mon, Feb 24, 2025 at 11:09:17AM +0100, Paolo Abeni wrote:
> On 2/21/25 11:21 PM, Krister Johansen wrote:
> > If multiple connection requests attempt to create an implicit mptcp
> > endpoint in parallel, more than one caller may end up in
> > mptcp_pm_nl_append_new_local_addr because none found the address in
> > local_addr_list during their call to mptcp_pm_nl_get_local_id.  In this
> > case, the concurrent new_local_addr calls may delete the address entry
> > created by the previous caller.  These deletes use synchronize_rcu, but
> > this is not permitted in some of the contexts where this function may be
> > called.  During packet receive, the caller may be in an RCU read-side
> > critical section with preemption disabled.
> > 
> > An example stack:
> > 
> >    BUG: scheduling while atomic: swapper/2/0/0x00000302
> > 
> >    Call Trace:
> >    <IRQ>
> >    dump_stack_lvl+0x76/0xa0
> >    dump_stack+0x10/0x20
> >    __schedule_bug+0x64/0x80
> >    schedule_debug.constprop.0+0xdb/0x130
> >    __schedule+0x69/0x6a0
> >    schedule+0x33/0x110
> >    schedule_timeout+0x157/0x170
> >    wait_for_completion+0x88/0x150
> >    __wait_rcu_gp+0x150/0x160
> >    synchronize_rcu+0x12d/0x140
> >    mptcp_pm_nl_append_new_local_addr+0x1bd/0x280
> >    mptcp_pm_nl_get_local_id+0x121/0x160
> >    mptcp_pm_get_local_id+0x9d/0xe0
> >    subflow_check_req+0x1a8/0x460
> >    subflow_v4_route_req+0xb5/0x110
> >    tcp_conn_request+0x3a4/0xd00
> >    subflow_v4_conn_request+0x42/0xa0
> >    tcp_rcv_state_process+0x1e3/0x7e0
> >    tcp_v4_do_rcv+0xd3/0x2a0
> >    tcp_v4_rcv+0xbb8/0xbf0
> >    ip_protocol_deliver_rcu+0x3c/0x210
> >    ip_local_deliver_finish+0x77/0xa0
> >    ip_local_deliver+0x6e/0x120
> >    ip_sublist_rcv_finish+0x6f/0x80
> >    ip_sublist_rcv+0x178/0x230
> >    ip_list_rcv+0x102/0x140
> >    __netif_receive_skb_list_core+0x22d/0x250
> >    netif_receive_skb_list_internal+0x1a3/0x2d0
> >    napi_complete_done+0x74/0x1c0
> >    igb_poll+0x6c/0xe0 [igb]
> >    __napi_poll+0x30/0x200
> >    net_rx_action+0x181/0x2e0
> >    handle_softirqs+0xd8/0x340
> >    __irq_exit_rcu+0xd9/0x100
> >    irq_exit_rcu+0xe/0x20
> >    common_interrupt+0xa4/0xb0
> >    </IRQ>
> > 
> > This problem seems particularly prevalent if the user advertises an
> > endpoint that has different external and internal addresses.  In the
> > case where the external address is advertised and multiple connections
> > already exist, multiple subflow SYNs arrive in parallel, which tends to
> > trigger the race during creation of the first local_addr_list entries,
> > which have the internal address instead.
> > 
> > Fix this problem by switching mptcp_pm_nl_append_new_local_addr to use
> > call_rcu().  As part of plumbing this up, make
> > __mptcp_pm_release_addr_entry take an rcu_head that is used by all
> > callers regardless of cleanup method.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: d045b9eb95a9 ("mptcp: introduce implicit endpoints")
> > Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
> 
> The proposed patch looks functionally correct to me, but I think it
> would be better to avoid adding new fields to mptcp_pm_addr_entry, if
> not strictly needed.
> 
> What about the following? (Completely untested!)  When implicit
> endpoint creations race with each other, we don't need to replace the
> existing one; we can simply use it.
> 
> That would additionally prevent an implicit endpoint created from a
> subflow from overriding the flags set by a racing user-space endpoint add.
> 
> If that works/fits, feel free to take/use it.

I like this suggestion.  In addition to the benefits you outlined, it
also prevents a series of back-to-back replacements from turning into a
pile of queued call_rcu() callbacks.  Keeping this as a
synchronize_rcu() is probably better too, if we can.  I was unsure
whether it was acceptable to skip the replacement in this case; thanks
for clearing that up.

I'll test this and follow up with a v2.
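
For anyone reading along later, the v1 patch deferred the free via
call_rcu() instead of sleeping in synchronize_rcu().  A minimal sketch
of that pattern follows (illustrative only, not the exact patch: it
assumes mptcp_pm_addr_entry grows a struct rcu_head member named "rcu",
and the helper name is made up):

  /* Sketch: free an entry after an RCU grace period.  Assumes
   * <linux/rcupdate.h> and <linux/slab.h> are already included.
   */
  static void mptcp_pm_free_entry_rcu(struct rcu_head *head)
  {
  	struct mptcp_pm_addr_entry *entry;

  	entry = container_of(head, struct mptcp_pm_addr_entry, rcu);
  	kfree(entry);
  }

  /* Unlike synchronize_rcu(), this is safe with preemption disabled:
   * it only queues the callback and returns immediately.
   */
  call_rcu(&entry->rcu, mptcp_pm_free_entry_rcu);

The tradeoff, as noted above, is that a burst of replacements queues
one callback per replaced entry, which is why reusing the existing
entry is nicer.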

> ---
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index 572d160edca3..dcb27b479824 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -977,7 +977,7 @@ static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)
> 
>  static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
>  					     struct mptcp_pm_addr_entry *entry,
> -					     bool needs_id)
> +					     bool needs_id, bool replace)
>  {
>  	struct mptcp_pm_addr_entry *cur, *del_entry = NULL;
>  	unsigned int addr_max;
> @@ -1017,6 +1017,12 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
>  			if (entry->addr.id)
>  				goto out;
> 
> +			if (!replace) {
> +				kfree(entry);
> +				ret = cur->addr.id;
> +				goto out;
> +			}
> +
>  			pernet->addrs--;
>  			entry->addr.id = cur->addr.id;
>  			list_del_rcu(&cur->list);
> @@ -1165,7 +1171,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc
>  	entry->ifindex = 0;
>  	entry->flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
>  	entry->lsk = NULL;
> -	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true);
> +	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true, false);
>  	if (ret < 0)
>  		kfree(entry);
> 
> @@ -1433,7 +1439,8 @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
>  		}
>  	}
>  	ret = mptcp_pm_nl_append_new_local_addr(pernet, entry,
> -						!mptcp_pm_has_addr_attr_id(attr, info));
> +						!mptcp_pm_has_addr_attr_id(attr, info),
> +						true);
>  	if (ret < 0) {
>  		GENL_SET_ERR_MSG_FMT(info, "too many addresses or duplicate one: %d", ret);
>  		goto out_free;
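
One note for the archives on the !replace branch above: a caller that
loses the race frees its own freshly-allocated entry and returns
cur->addr.id, so mptcp_pm_nl_get_local_id still hands back a usable
local id from the entry that won, with no RCU deferral on this path.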

Thanks,

-K

Thread overview: 8+ messages
2025-02-21 22:21 [PATCH mptcp] mptcp: fix 'scheduling while atomic' in mptcp_pm_nl_append_new_local_addr Krister Johansen
2025-02-24 10:09 ` Paolo Abeni
2025-02-24 16:31   ` Krister Johansen [this message]
2025-02-24 23:20   ` [PATCH v2 " Krister Johansen
2025-02-25 17:52     ` Matthieu Baerts
2025-02-25 19:29       ` Krister Johansen
2025-02-25 21:41         ` Matthieu Baerts
2025-02-25 21:53           ` Krister Johansen
