From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Ding Tianhong <dingtianhong@huawei.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Dhaval Giani <dhaval.giani@oracle.com>
Subject: [PATCH 4.8 14/35] rcu: Fix soft lockup for rcu_nocb_kthread
Date: Wed,  7 Dec 2016 08:08:30 +0100
Message-ID: <20161207070723.278613814@linuxfoundation.org>
In-Reply-To: <20161207070722.410336250@linuxfoundation.org>

4.8-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ding Tianhong <dingtianhong@huawei.com>

commit bedc1969150d480c462cdac320fa944b694a7162 upstream.

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the NIC up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not relinquishing the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
call within the loop to allow other tasks to run.
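
For illustration only, here is a minimal userspace sketch of the same
pattern: a loop that drains a long list of callbacks and offers the CPU
to other tasks after each one.  It is not kernel code.  The names
struct callback, invoke_callbacks() and count_invocation() are
hypothetical, and sched_yield() is only a rough stand-in for
cond_resched_rcu_qs(), which in the kernel also reports an RCU
quiescent state.

```c
#include <assert.h>
#include <sched.h>	/* sched_yield() */
#include <stddef.h>

/* Hypothetical stand-in for a queued callback; not the kernel's rcu_head. */
struct callback {
	struct callback *next;
	void (*func)(struct callback *cb);
};

/*
 * Invoke every callback on the list and return how many were invoked.
 * The yield after each callback is the userspace analogue of the call
 * this patch adds: without it, a very long list keeps the thread on
 * the CPU for the whole traversal.
 */
static long invoke_callbacks(struct callback *list)
{
	long count = 0;

	while (list != NULL) {
		/* Save the successor first: func may free or reuse *list. */
		struct callback *next = list->next;

		assert(list->func != NULL);
		list->func(list);
		count++;
		sched_yield();	/* give other runnable tasks a chance */
		list = next;
	}
	return count;
}

/* Hypothetical example callback: just counts how often it was run. */
static long invoked_total;

static void count_invocation(struct callback *cb)
{
	(void)cb;
	invoked_total++;
}
```

A caller would link a few struct callback entries together, point each
func at count_invocation(), and pass the head to invoke_callbacks();
the return value then equals the number of entries on the list.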

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/rcu/tree_plugin.h |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2173,6 +2173,7 @@ static int rcu_nocb_kthread(void *arg)
 				cl++;
 			c++;
 			local_bh_enable();
+			cond_resched_rcu_qs();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);


