From: Eric Dumazet <eric.dumazet@gmail.com>
To: Sargun Dhillon <sargun@sargun.me>,
	netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Gabriel Hartmann <ghartmann@netflix.com>,
	Rob Gulewich <rgulewich@netflix.com>,
	Bruce Curtis <brucec@netflix.com>,
	Hannes Frederic Sowa <hannes@stressinduktion.org>
Subject: Re: Deadlock in cleanup_net and addrconf_verify_work locks up workqueue
Date: Tue, 11 Feb 2020 21:36:28 -0800	[thread overview]
Message-ID: <75e34850-54f5-6d08-e4f9-dd6e1e9ee09d@gmail.com> (raw)
In-Reply-To: <8924a0a5-9179-f6a9-91d8-1163b425ec35@gmail.com>



On 2/11/20 9:08 PM, Eric Dumazet wrote:
> 
> 
> On 2/11/20 11:23 AM, Sargun Dhillon wrote:
>> We've found a workqueue stall / deadlock. We run a container-oriented
>> workload that uses IPv6. Our container (namespace) churn is quite
>> frequent, and containers can be terminated before their networking is
>> even set up.
>>
>> We're running 4.19.73 in production, and from investigating the underlying
>> cause, I don't believe later 4.19 releases fix it.
>>
>> We've narrowed it down to a lockup between ipv6_addrconf, and cleanup_net.
> 
> Sure, PID 1369493 addrconf_verify_work() is waiting for RTNL.
> 
> But PID 8  ?
> 
> __flush_work() is being called.
> 
> But from where? The stacks seem incomplete.
> 
> 
>>
>> crash> bt 8
>> PID: 8      TASK: ffff9a1072b50000  CPU: 24  COMMAND: "kworker/u192:0"
>>  #0 [ffffbfe2c00fbb70] __schedule at ffffffffa7f02bf7
>>  #1 [ffffbfe2c00fbc10] schedule at ffffffffa7f031e8
>>  #2 [ffffbfe2c00fbc18] schedule_timeout at ffffffffa7f0700e
>>  #3 [ffffbfe2c00fbc90] wait_for_completion at ffffffffa7f03b50
>>  #4 [ffffbfe2c00fbce0] __flush_work at ffffffffa76a2532
>>  #5 [ffffbfe2c00fbd58] rollback_registered_many at ffffffffa7dbcdf4
>>  #6 [ffffbfe2c00fbdc0] unregister_netdevice_many at ffffffffa7dbd31e
>>  #7 [ffffbfe2c00fbdd0] default_device_exit_batch at ffffffffa7dbd512
>>  #8 [ffffbfe2c00fbe40] cleanup_net at ffffffffa7dab970
>>  #9 [ffffbfe2c00fbe98] process_one_work at ffffffffa76a17c4
>> #10 [ffffbfe2c00fbed8] worker_thread at ffffffffa76a19dd
>> #11 [ffffbfe2c00fbf10] kthread at ffffffffa76a7fd3
>> #12 [ffffbfe2c00fbf50] ret_from_fork at ffffffffa80001ff
>>
>> crash> bt 1369493
>> PID: 1369493  TASK: ffff9a03684d9600  CPU: 58  COMMAND: "kworker/58:1"
>>  #0 [ffffbfe30d68fd48] __schedule at ffffffffa7f02bf7
>>  #1 [ffffbfe30d68fde8] schedule at ffffffffa7f031e8
>>  #2 [ffffbfe30d68fdf0] schedule_preempt_disabled at ffffffffa7f0349a
>>  #3 [ffffbfe30d68fdf8] __mutex_lock at ffffffffa7f04aed
>>  #4 [ffffbfe30d68fe90] addrconf_verify_work at ffffffffa7e8d1aa
>>  #5 [ffffbfe30d68fe98] process_one_work at ffffffffa76a17c4
>>  #6 [ffffbfe30d68fed8] worker_thread at ffffffffa76a19dd
>>  #7 [ffffbfe30d68ff10] kthread at ffffffffa76a7fd3
>>  #8 [ffffbfe30d68ff50] ret_from_fork at ffffffffa80001ff
>>
>>
>>
>> crash> struct -x mutex.owner.counter rtnl_mutex
>>   owner.counter = 0xffff9a1072b50001
>>
>> 0xffff9a1072b50001 & (~0x07) = 0xffff9a1072b50000
>>
>> This points back to PID 8 / CPU 24. It is running cleanup_net, and part
>> of cleanup_net involves calling ops_exit_list, which in turn calls
>> default_device_exit_batch. default_device_exit_batch takes the rtnl lock
>> before calling unregister_netdevice_many and rollback_registered_many.
>> rollback_registered_many calls flush_all_backlogs. This never completes
>> while PID 8 holds the rtnl lock, and PID 1369493 / CPU 58 is waiting
>> for rtnl_lock.
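
The waits reported above, sketched from the two backtraces (flush_all_backlogs()
does not itself appear in the bt output; it is inferred from __flush_work being
reached via rollback_registered_many):

```
PID 8 ("kworker/u192:0", holds rtnl):
  cleanup_net()
    -> default_device_exit_batch()   rtnl_lock() taken here
      -> unregister_netdevice_many()
        -> rollback_registered_many()
          -> flush_all_backlogs()    waits for per-cpu backlog works
             [reported as never completing]

PID 1369493 ("kworker/58:1", wants rtnl):
  addrconf_verify_work()
    -> rtnl_lock()                   blocks; mutex owner is PID 8
```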
> 
> But PID 1369493 is waiting on a mutex, thus properly yielding the CPU.
> (schedule() is clearly shown.)
> 
> This should not prevent other threads from making progress,
> so flush_all_backlogs() should eventually complete.
> 
> flush_all_backlogs() does not care how many threads are currently blocked
> waiting for rtnl while it is running; they cannot grab rtnl anyway.
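> 
> The point can be illustrated with a userspace analogue (pthreads, not kernel
> code; all names below are illustrative stand-ins, not the real kernel
> symbols): a thread parked on a mutex sleeps in the scheduler and does not
> stop an unrelated piece of work from running to completion.
> 
> ```c
> #include <assert.h>
> #include <pthread.h>
> #include <stdio.h>
> #include <unistd.h>
> 
> static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
> static int backlog_done;
> 
> /* Stand-in for addrconf_verify_work(): blocks on the "rtnl" mutex. */
> static void *waiter(void *arg)
> {
> 	pthread_mutex_lock(&rtnl);	/* sleeps until the holder releases */
> 	pthread_mutex_unlock(&rtnl);
> 	return NULL;
> }
> 
> /* Stand-in for a per-cpu backlog work: needs no lock, only CPU time. */
> static void *backlog_work(void *arg)
> {
> 	backlog_done = 1;
> 	return NULL;
> }
> 
> int main(void)
> {
> 	pthread_t w, b;
> 
> 	pthread_mutex_lock(&rtnl);	/* "holder" takes rtnl */
> 	pthread_create(&w, NULL, waiter, NULL);
> 	usleep(100000);			/* let the waiter block on the mutex */
> 
> 	/* The blocked waiter does not prevent this work from finishing. */
> 	pthread_create(&b, NULL, backlog_work, NULL);
> 	pthread_join(b, NULL);
> 	assert(backlog_done == 1);
> 
> 	pthread_mutex_unlock(&rtnl);
> 	pthread_join(w, NULL);
> 	printf("backlog completed while waiter was blocked\n");
> 	return 0;
> }
> ```
> 
> So something else must be keeping the backlog works themselves from running.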
> 
>>
>> If relevant, the workqueue stalls themselves look something like:
>> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=0 stuck for 3720s!
>> BUG: workqueue lockup - pool cpus=70 node=0 flags=0x0 nice=-20 stuck for 3719s!
>> Showing busy workqueues and worker pools:
>> workqueue events: flags=0x0
>>   pwq 32: cpus=16 node=0 flags=0x0 nice=0 active=2/256
>>     in-flight: 1274779:slab_caches_to_rcu_destroy_workfn slab_caches_to_rcu_destroy_workfn
>> workqueue events_highpri: flags=0x10
>>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>>     pending: flush_backlog BAR(8)
>> workqueue events_power_efficient: flags=0x82
>>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>>     in-flight: 1396446:check_lifetime
>> workqueue mm_percpu_wq: flags=0x8
>>   pwq 140: cpus=70 node=0 flags=0x0 nice=0 active=1/256
>>     pending: vmstat_update
>> workqueue netns: flags=0xe000a
>>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>>     in-flight: 8:cleanup_net
>>     delayed: cleanup_net
>> workqueue writeback: flags=0x4e
>>   pwq 193: cpus=0-23,48-71 node=0 flags=0x4 nice=0 active=1/256
>>     in-flight: 1334335:wb_workfn
>> workqueue kblockd: flags=0x18
>>   pwq 141: cpus=70 node=0 flags=0x0 nice=-20 active=1/256
>>     pending: blk_mq_run_work_fn
>> workqueue ipv6_addrconf: flags=0x40008
>>   pwq 116: cpus=58 node=0 flags=0x0 nice=0 active=1/1
>>     in-flight: 1369493:addrconf_verify_work
>> workqueue ena: flags=0xe000a
>>   pwq 192: cpus=0-95 flags=0x4 nice=0 active=1/1
>>     in-flight: 7505:ena_fw_reset_device [ena]
>>

Can you test the following:

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index cb493e15959c4d1bb68cf30f4099a8daa785bb84..bcc7ce03f13881415f64c7329559c7ed1e6321f3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4410,8 +4410,6 @@ static void addrconf_verify_rtnl(void)
        now = jiffies;
        next = round_jiffies_up(now + ADDR_CHECK_FREQUENCY);
 
-       cancel_delayed_work(&addr_chk_work);
-
        for (i = 0; i < IN6_ADDR_HSIZE; i++) {
 restart:
                hlist_for_each_entry_rcu_bh(ifp, &inet6_addr_lst[i], addr_lst) {

Thread overview: 4+ messages
2020-02-11 19:23 Deadlock in cleanup_net and addrconf_verify_work locks up workqueue Sargun Dhillon
2020-02-12  5:08 ` Eric Dumazet
2020-02-12  5:36   ` Eric Dumazet [this message]
2023-01-04 13:06     ` Dongyang
