From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jakub Kicinski Subject: Re: [PATCH net 14/15] nfp: remove ndo_poll_controller Date: Fri, 21 Sep 2018 16:41:18 -0700 Message-ID: <20180921164118.3dfb13db@cakuba.netronome.com> References: <20180921222752.101307-1-edumazet@google.com> <20180921222752.101307-15-edumazet@google.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: "David S . Miller" , netdev , Michael Chan , Ariel Elior , Eric Dumazet , Tariq Toukan , Saeed Mahameed , Jeff Kirsher , Song Liu , Jay Vosburgh , Veaceslav Falico , Andy Gospodarek To: Eric Dumazet Return-path: Received: from mail-qt1-f196.google.com ([209.85.160.196]:43798 "EHLO mail-qt1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391571AbeIVFce (ORCPT ); Sat, 22 Sep 2018 01:32:34 -0400 Received: by mail-qt1-f196.google.com with SMTP id l16-v6so2947757qtq.10 for ; Fri, 21 Sep 2018 16:41:24 -0700 (PDT) In-Reply-To: <20180921222752.101307-15-edumazet@google.com> Sender: netdev-owner@vger.kernel.org List-ID: On Fri, 21 Sep 2018 15:27:51 -0700, Eric Dumazet wrote: > As diagnosed by Song Liu, ndo_poll_controller() can > be very dangerous on loaded hosts, since the cpu > calling ndo_poll_controller() might steal all NAPI > contexts (for all RX/TX queues of the NIC). This capture > can last for unlimited amount of time, since one > cpu is generally not able to drain all the queues under load. > > nfp uses NAPI for TX completions, so we better let core > networking stack call the napi->poll() to avoid the capture. > > Signed-off-by: Eric Dumazet Acked-by: Jakub Kicinski Tested-by: Jakub Kicinski My testing only detects the pre-existing lockdep splat :) The test runs multiple netperf sessions and it has threads spewing the "noise_xyz" lines into /dev/kmsg, then another thread sends series of the "this_is_line_X_" lines which are counted on the other end to make sure we don't drop the messages. And every now and then the test will do something random like reload the driver or bring the link down or install an XDP program. [ 1511.417141] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 [ 1511.417173] noise_KVEQAAHZ [ 1511.417178] stuck for 35s! [ 1511.425191] noise_KVEQAAHZ [ 1511.428134] Showing busy workqueues and worker pools: [ 1511.434080] noise_KVEQAAHZ [ 1511.434744] workqueue events: flags=0x0 [ 1511.440908] noise_KVEQAAHZ [ 1511.443919] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256 [ 1511.458404] in-flight: 197:free_obj_work [ 1511.463337] pending: free_obj_work [ 1511.467670] [ 1511.467671] ====================================================== [ 1511.467672] WARNING: possible circular locking dependency detected [ 1511.467673] 4.19.0-rc4-debug-00636-g4718a743bc48 #23 Not tainted [ 1511.467674] ------------------------------------------------------ [ 1511.467674] ksoftirqd/0/10 is trying to acquire lock: [ 1511.467675] 0000000081e5d4d3 (console_owner){-.-.}, at: vprintk_emit+0x2de/0x480 [ 1511.467678] [ 1511.467679] but task is already holding lock: [ 1511.467680] 00000000ebab6c31 (&(&pool->lock)->rlock){-.-.}, at: show_workqueue_state.cold.34+0x36d/0x13d7 [ 1511.467683] [ 1511.467684] which lock already depends on the new lock. [ 1511.467684] [ 1511.467685] [ 1511.467686] the existing dependency chain (in reverse order) is: [ 1511.467686] [ 1511.467687] -> #2 (&(&pool->lock)->rlock){-.-.}: [ 1511.467690] _raw_spin_lock+0x2f/0x40 [ 1511.467691] __queue_work+0x2f4/0x11d0 [ 1511.467691] __queue_delayed_work+0x22d/0x380 [ 1511.467692] queue_delayed_work_on+0x8d/0xa0 [ 1511.467693] netpoll_send_skb_on_dev+0x307/0x7e0 [ 1511.467694] netpoll_send_udp+0x1249/0x1910 [ 1511.467695] write_msg+0x285/0x2f0 [netconsole] [ 1511.467695] console_unlock+0x6a0/0xc90 [ 1511.467696] vprintk_emit+0x1dc/0x480 [ 1511.467697] printk_emit+0x88/0xae [ 1511.467697] devkmsg_write+0x2e5/0x360 [ 1511.467698] __vfs_write+0x4e9/0x890 [ 1511.467699] vfs_write+0x156/0x4a0 [ 1511.467699] ksys_write+0xd6/0x1d0 [ 1511.467700] do_syscall_64+0x146/0x5c0 [ 1511.467701] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1511.467701] [ 1511.467702] -> #1 (target_list_lock){-.-.}: [ 1511.467705] _raw_spin_lock_irqsave+0x46/0x60 [ 1511.467706] write_msg+0xe1/0x2f0 [netconsole] [ 1511.467706] console_unlock+0x6a0/0xc90 [ 1511.467707] vprintk_emit+0x1dc/0x480 [ 1511.467708] printk+0x9f/0xc5 [ 1511.467708] netpoll_print_options+0x77/0x2f0 [ 1511.467709] enabled_store+0x174/0x320 [netconsole] [ 1511.467710] configfs_write_file+0x250/0x400 [configfs] [ 1511.467711] __vfs_write+0xde/0x890 [ 1511.467711] vfs_write+0x156/0x4a0 [ 1511.467712] ksys_write+0xd6/0x1d0 [ 1511.467713] do_syscall_64+0x146/0x5c0 [ 1511.467713] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1511.467714] [ 1511.467714] -> #0 (console_owner){-.-.}: [ 1511.467717] lock_acquire+0x183/0x400 [ 1511.467718] vprintk_emit+0x339/0x480 [ 1511.467719] printk+0x9f/0xc5 [ 1511.467720] show_workqueue_state.cold.34+0x40d/0x13d7 [ 1511.467721] wq_watchdog_timer_fn+0x598/0x850 [ 1511.467721] call_timer_fn+0x21b/0x720 [ 1511.467722] expire_timers+0x34a/0x550 [ 1511.467723] run_timer_softirq+0x45e/0x6a0 [ 1511.467724] __do_softirq+0x2ca/0xb1d [ 1511.467725] run_ksoftirqd+0x3c/0x60 [ 1511.467725] smpboot_thread_fn+0x58a/0x9c0 [ 1511.467726] kthread+0x315/0x3d0 [ 1511.467727] ret_from_fork+0x24/0x30 [ 1511.467728] [ 1511.467729] other info that might help us debug this: [ 1511.467729] [ 1511.467730] Chain exists of: [ 1511.467730] console_owner --> target_list_lock --> &(&pool->lock)->rlock [ 1511.467734] [ 1511.467735] Possible unsafe locking scenario: [ 1511.467735] [ 1511.467736] CPU0 CPU1 [ 1511.467737] ---- ---- [ 1511.467737] lock(&(&pool->lock)->rlock); [ 1511.467739] lock(target_list_lock); [ 1511.467741] lock(&(&pool->lock)->rlock); [ 1511.467743] lock(console_owner); [ 1511.467744] [ 1511.467745] *** DEADLOCK *** [ 1511.467746] [ 1511.467746] 3 locks held by ksoftirqd/0/10: [ 1511.467747] #0: 0000000007b706eb ((&wq_watchdog_timer)){+.-.}, at: call_timer_fn+0x1c3/0x720 [ 1511.467750] #1: 00000000470e0b00 (rcu_read_lock_sched){....}, at: show_workqueue_state+0x5/0x1a0 [ 1511.467753] #2: 00000000ebab6c31 (&(&pool->lock)->rlock){-.-.}, at: show_workqueue_state.cold.34+0x36d/0x13d7 [ 1511.467757] [ 1511.467757] stack backtrace: [ 1511.467758] CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 4.19.0-rc4-debug-00636-g4718a743bc48 #23 [ 1511.467759] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016 [ 1511.467760] Call Trace: [ 1511.467761] dump_stack+0xea/0x1b0 [ 1511.467761] ? dump_stack_print_info.cold.0+0x1b/0x1b [ 1511.467762] ? print_stack_trace+0xf3/0x140 [ 1511.467763] print_circular_bug.isra.13.cold.36+0x2e2/0x417 [ 1511.467764] ? print_circular_bug_header+0x80/0x80 [ 1511.467764] check_prev_add.constprop.21+0xccd/0x1600 [ 1511.467765] ? check_usage+0x7b0/0x7b0 [ 1511.467766] ? __print_lock_name+0x110/0x110 [ 1511.467767] ? graph_lock+0x65/0x190 [ 1511.467767] ? __print_lock_name+0x110/0x110 [ 1511.467768] __lock_acquire+0x216f/0x35b0 [ 1511.467769] ? mark_held_locks+0x140/0x140 [ 1511.467770] ? check_usage+0x7b0/0x7b0 [ 1511.467770] ? print_irqtrace_events+0x270/0x270 [ 1511.467771] ? lock_downgrade+0x6b0/0x6b0 [ 1511.467772] ? do_raw_spin_unlock+0xb1/0x350 [ 1511.467772] ? __print_lock_name+0x110/0x110 [ 1511.467773] ? __lock_acquire+0x6a5/0x35b0 [ 1511.467774] ? mark_held_locks+0x140/0x140 [ 1511.467774] ? down_trylock+0x4b/0x70 [ 1511.467775] ? down_trylock+0x13/0x70 [ 1511.467776] lock_acquire+0x183/0x400 [ 1511.467776] ? vprintk_emit+0x2de/0x480 [ 1511.467777] ? lock_release+0x870/0x870 [ 1511.467778] ? do_raw_spin_trylock+0x1b0/0x1b0 [ 1511.467779] ? vprintk_emit+0x215/0x480 [ 1511.467779] ? trace_hardirqs_off+0x95/0x230 [ 1511.467780] vprintk_emit+0x339/0x480 [ 1511.467781] ? vprintk_emit+0x2de/0x480 [ 1511.467781] ? wake_up_klogd+0x1d0/0x1d0 [ 1511.467782] ? do_raw_spin_unlock+0xb1/0x350 [ 1511.467783] ? lock_acquire+0x183/0x400 [ 1511.467784] ? show_workqueue_state.cold.34+0x36d/0x13d7 [ 1511.467784] ? _raw_spin_unlock_irqrestore+0x59/0x70 [ 1511.467785] printk+0x9f/0xc5 [ 1511.467786] ? kmsg_dump_rewind_nolock+0xd9/0xd9 [ 1511.467786] show_workqueue_state.cold.34+0x40d/0x13d7 [ 1511.467787] ? print_worker_info+0x4a0/0x4a0 [ 1511.467788] ? llist_reverse_order+0x60/0x60 [ 1511.467789] ? radix_tree_iter_resume+0x190/0x190 [ 1511.467789] ? devkmsg_read+0x840/0x840 [ 1511.467790] ? radix_tree_next_chunk+0x1b2/0xae0 [ 1511.467791] ? __radix_tree_next_slot+0x810/0x810 [ 1511.467792] ? vprintk_emit+0x15f/0x480 [ 1511.467792] ? wake_up_klogd+0x1d0/0x1d0 [ 1511.467793] ? print_irqtrace_events+0x270/0x270 [ 1511.467794] ? mark_held_locks+0x140/0x140 [ 1511.467795] ? lock_downgrade+0x6b0/0x6b0 [ 1511.467796] ? _raw_spin_unlock_irqrestore+0x59/0x70 [ 1511.467796] ? wq_watchdog_timer_fn+0x4ce/0x850 [ 1511.467797] ? lock_downgrade+0x6b0/0x6b0 [ 1511.467798] ? idr_get_next_ul+0x290/0x290 [ 1511.467799] ? rcu_read_unlock_special.part.22+0x1660/0x1660 [ 1511.467800] ? rcu_dynticks_curr_cpu_in_eqs+0xd9/0x1d0 [ 1511.467801] wq_watchdog_timer_fn+0x598/0x850 [ 1511.467801] ? show_workqueue_state+0x1a0/0x1a0 [ 1511.467802] ? quarantine_put+0xd9/0x300 [ 1511.467803] ? lockdep_hardirqs_on+0x278/0x580 [ 1511.467804] ? trace_hardirqs_on+0x74/0x230 [ 1511.467804] ? lock_acquire+0x183/0x400 [ 1511.467805] ? call_timer_fn+0x1c3/0x720 [ 1511.467806] ? rcu_lockdep_current_cpu_online+0x212/0x310 [ 1511.467807] ? rcu_process_callbacks+0x660/0x660 [ 1511.467807] call_timer_fn+0x21b/0x720 [ 1511.467808] ? show_workqueue_state+0x1a0/0x1a0 [ 1511.467809] ? init_timer_key+0x380/0x380 [ 1511.467810] ? _raw_spin_unlock_irqrestore+0x3c/0x70 [ 1511.467810] ? expire_timers+0x33f/0x550 [ 1511.467811] ? _raw_spin_unlock_irqrestore+0x46/0x70 [ 1511.467812] ? lock_downgrade+0x6b0/0x6b0 [ 1511.467812] ? _raw_spin_unlock_irq+0x29/0x50 [ 1511.467813] ? show_workqueue_state+0x1a0/0x1a0 [ 1511.467814] ? lockdep_hardirqs_on+0x278/0x580 [ 1511.467815] ? _raw_spin_unlock_irq+0x29/0x50 [ 1511.467815] ? trace_hardirqs_on+0x74/0x230 [ 1511.467816] ? show_workqueue_state+0x1a0/0x1a0 [ 1511.467817] expire_timers+0x34a/0x550 [ 1511.467817] ? call_timer_fn+0x720/0x720 [ 1511.467818] ? lock_acquire+0x183/0x400 [ 1511.467819] ? run_timer_softirq+0x39d/0x6a0 [ 1511.467819] ? lock_release+0x870/0x870 [ 1511.467820] ? trace_hardirqs_off+0x95/0x230 [ 1511.467821] ? __next_timer_interrupt+0xf0/0xf0 [ 1511.467822] ? do_raw_spin_trylock+0x1b0/0x1b0 [ 1511.467822] run_timer_softirq+0x45e/0x6a0 [ 1511.467823] ? expire_timers+0x550/0x550 [ 1511.467824] ? tcp_tasklet_func+0x3d6/0x650 [ 1511.467824] ? tcp_tsq_handler+0x100/0x100 [ 1511.467825] ? rcu_process_callbacks+0x660/0x660 [ 1511.467826] ? tasklet_action_common.isra.2+0xcb/0x350 [ 1511.467827] ? lockdep_hardirqs_on+0x278/0x580 [ 1511.467827] ? tasklet_action_common.isra.2+0xcb/0x350 [ 1511.467828] ? trace_hardirqs_on+0x74/0x230 [ 1511.467829] ? __bpf_trace_preemptirq_template+0x10/0x10 [ 1511.467830] ? __rcu_process_callbacks+0x1750/0x1750 [ 1511.467830] ? __do_softirq+0x286/0xb1d [ 1511.467831] ? rcu_lockdep_current_cpu_online+0x212/0x310 [ 1511.467832] ? rcu_process_callbacks+0x660/0x660 [ 1511.467832] __do_softirq+0x2ca/0xb1d [ 1511.467833] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1511.467834] ? __irqentry_text_end+0x1fa0ca/0x1fa0ca [ 1511.467835] ? trace_hardirqs_on+0x230/0x230 [ 1511.467835] ? __schedule+0x6/0x2130 [ 1511.467836] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1511.467837] ? run_ksoftirqd+0x11/0x60 [ 1511.467837] ? trace_hardirqs_off+0x95/0x230 [ 1511.467838] ? trace_hardirqs_on_caller+0x240/0x240 [ 1511.467839] ? finish_task_switch+0xa20/0xa20 [ 1511.467840] ? ___preempt_schedule+0x16/0x18 [ 1511.467840] run_ksoftirqd+0x3c/0x60 [ 1511.467841] smpboot_thread_fn+0x58a/0x9c0 [ 1511.467842] ? __smpboot_create_thread+0x410/0x410 [ 1511.467843] ? _raw_spin_unlock_irqrestore+0x46/0x70 [ 1511.467843] ? __kthread_parkme+0xb6/0x180 [ 1511.467844] ? __smpboot_create_thread+0x410/0x410 [ 1511.467845] kthread+0x315/0x3d0 [ 1511.467845] ? kthread_park+0x120/0x120 [ 1511.467846] ret_from_fork+0x24/0x30 [ 1511.468035] this_is_line_0_ [ 1511.469707] workqueue mm_percpu_wq: flags=0x8 [ 1511.475150] noise_KVEQAAHZ [ 1511.475474] noise_KVEQAAHZ [ 1511.475740] noise_KVEQAAHZ [ 1511.476125] noise_KVEQAAHZ [ 1511.476409] noise_KVEQAAHZ [ 1511.476700] noise_KVEQAAHZ [ 1511.476996] noise_KVEQAAHZ [ 1511.477149] this_is_line_1_ [ 1511.477558] this_is_line_2_ [ 1511.477814] this_is_line_3_ [ 1511.478132] this_is_line_4_ [ 1511.478463] this_is_line_5_ [ 1511.478762] this_is_line_6_ [ 1511.479077] this_is_line_7_ [ 1511.479359] this_is_line_8_ [ 1511.479647] this_is_line_9_ [ 1511.479943] this_is_line_10_ [ 1511.480192] this_is_line_11_ [ 1511.480495] this_is_line_12_ [ 1511.480801] this_is_line_13_ [ 1511.483873] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [ 1511.491063] noise_KVEQAAHZ [ 1511.496337] this_is_line_14_ [ 1511.496649] this_is_line_15_ [ 1511.496905] this_is_line_16_ [ 1511.497257] this_is_line_17_ [ 1511.497521] this_is_line_18_ [ 1511.497797] pending: vmstat_update [ 1511.497827] this_is_line_19_ [ 1511.497881] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=35s workers=3 idle: 12260 9068