All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlad Buslov <vlad@buslov.dev>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>,
	netdev@vger.kernel.org, Cong Wang <cong.wang@bytedance.com>,
	syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com,
	syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com,
	syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com,
	Jiri Pirko <jiri@resnulli.us>,
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	Davide Caratti <dcaratti@redhat.com>,
	Briana Oursler <briana.oursler@gmail.com>
Subject: Re: [Patch net-next] net_sched: fix RTNL deadlock again caused by request_module()
Date: Mon, 18 Jan 2021 16:31:14 +0200	[thread overview]
Message-ID: <87im7u9v0t.fsf@buslov.dev> (raw)
In-Reply-To: <acba35f6-2e29-903f-6eb8-a50dde25a147@mojatatu.com>

On Sun 17 Jan 2021 at 17:15, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 2021-01-16 7:56 p.m., Cong Wang wrote:
>> From: Cong Wang <cong.wang@bytedance.com>
>> tcf_action_init_1() loads tc action modules automatically with
>> request_module() after parsing the tc action names, and it drops RTNL
>> lock and re-holds it before and after request_module(). This causes a
>> lot of troubles, as discovered by syzbot, because we can be in the
>> middle of batch initializations when we create an array of tc actions.
>> One of the problem is deadlock:
>> CPU 0					CPU 1
>> rtnl_lock();
>> for (...) {
>>    tcf_action_init_1();
>>      -> rtnl_unlock();
>>      -> request_module();
>> 				rtnl_lock();
>> 				for (...) {
>> 				  tcf_action_init_1();
>> 				    -> tcf_idr_check_alloc();
>> 				   // Insert one action into idr,
>> 				   // but it is not committed until
>> 				   // tcf_idr_insert_many(), then drop
>> 				   // the RTNL lock in the _next_
>> 				   // iteration
>> 				   -> rtnl_unlock();
>>      -> rtnl_lock();
>>      -> a_o->init();
>>        -> tcf_idr_check_alloc();
>>        // Now waiting for the same index
>>        // to be committed
>> 				    -> request_module();
>> 				    -> rtnl_lock()
>> 				    // Now waiting for RTNL lock
>> 				}
>> 				rtnl_unlock();
>> }
>> rtnl_unlock();
>> This is not easy to solve, we can move the request_module() before
>> this loop and pre-load all the modules we need for this netlink
>> message and then do the rest initializations. So the loop breaks down
>> to two now:
>>          for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
>>                  struct tc_action_ops *a_o;
>>                  a_o = tc_action_load_ops(name, tb[i]...);
>>                  ops[i - 1] = a_o;
>>          }
>>          for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
>>                  act = tcf_action_init_1(ops[i - 1]...);
>>          }
>> Although this looks serious, it only has been reported by syzbot, so it
>> seems hard to trigger this by humans. And given the size of this patch,
>> I'd suggest to make it to net-next and not to backport to stable.
>> This patch has been tested by syzbot and tested with tdc.py by me.
>> 
>
> LGTM.
> Initially i was worried about performance impact but i found nothing
> observable. We need to add a tdc test for batch (I can share how i did
> batch testing at next meet).
>
> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
>
> cheers,
> jamal

Hi,

Thanks for adding me to the thread!
I ran our performance tests with the patch applied and didn't observe
any regression.

Regards,
Vlad


  reply	other threads:[~2021-01-18 14:32 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-17  0:56 [Patch net-next] net_sched: fix RTNL deadlock again caused by request_module() Cong Wang
2021-01-17 15:15 ` Jamal Hadi Salim
2021-01-18 14:31   ` Vlad Buslov [this message]
2021-01-19  4:30 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87im7u9v0t.fsf@buslov.dev \
    --to=vlad@buslov.dev \
    --cc=briana.oursler@gmail.com \
    --cc=cong.wang@bytedance.com \
    --cc=dcaratti@redhat.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=marcelo.leitner@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=syzbot+82752bc5331601cf4899@syzkaller.appspotmail.com \
    --cc=syzbot+b3b63b6bff456bd95294@syzkaller.appspotmail.com \
    --cc=syzbot+ba67b12b1ca729912834@syzkaller.appspotmail.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.