From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754826Ab2AYJom (ORCPT ); Wed, 25 Jan 2012 04:44:42 -0500 Received: from mail-gx0-f174.google.com ([209.85.161.174]:36937 "EHLO mail-gx0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752362Ab2AYJoj (ORCPT ); Wed, 25 Jan 2012 04:44:39 -0500 Message-ID: <4F1FCF02.9060209@numascale-asia.com> Date: Wed, 25 Jan 2012 17:44:34 +0800 From: Daniel J Blueman Organization: Numascale Asia User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:9.0) Gecko/20111229 Thunderbird/9.0 MIME-Version: 1.0 To: Paul E McKenney , Dipankar Sarma CC: Steffen Persvold , linux-kernel@vger.kernel.org, x86@kernel.org Subject: RCU qsmask !=0 warnings on large-SMP... Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org During validation of an interconnect which extends the AMD Opteron HyperTransport to many servers (Numascale NumaConnect), we observe RCU warnings [1] when larger systems are under specific workloads, and subsequent hanging. This is reproducable via RCU-intensive activities, eg the rcutorture module (let me know if interested in another reproducer 'bigmap' that hits the spot) on eg stock 3.1.8 and 3.2.1. Reproducing on a subset of cores for clarity and with further debug [2], we observe struct rcu_node's qsmask bitfield not being cleared on typically three tree nodes when expected to be [3] and we see the RCU grace periods stop incrementing. Has this been encountered previously? Many thanks, Daniel --- [1] WARNING: at kernel/rcutree_plugin.h:990 rcu_preempt_check_blocked_tasks+0x27/0x30() Hardware name: ProLiant DL165 G6 Modules linked in: Pid: 21880, comm: bigmap Not tainted 3.1.8-numaconnect10+ #5 Call Trace: [] ? rcu_preempt_check_blocked_tasks+0x27/0x30 [] warn_slowpath_common+0x8b/0xc0 [] ? perf_event_task_tick+0x71/0xa0 [] warn_slowpath_null+0x15/0x20 [] rcu_preempt_check_blocked_tasks+0x27/0x30 [] rcu_start_gp+0x163/0x200 [] __rcu_process_callbacks+0x8b/0xd0 [] rcu_process_callbacks+0x20/0x50 [] __do_softirq+0x9d/0x140 [] call_softirq+0x1c/0x30 [] do_softirq+0x4a/0x80 [] irq_exit+0xa5/0xb0 [] smp_apic_timer_interrupt+0x45/0x60 [] apic_timer_interrupt+0x6b/0x70 [] ? call_rcu_sched+0x10/0x20 [] ? fsnotify+0x70/0x2d0 [] ? iput_final+0xe3/0x150 [] ? kmem_cache_free+0x15/0xa0 [] ? vfsmount_lock_global_lock_online+0x4a/0xc0 [] __fput+0x17d/0x1f0 [] fput+0x17/0x20 [] remove_vma+0x36/0x70 [] exit_mmap+0xee/0x120 [] mmput+0x44/0xf0 [] exit_mm+0x109/0x140 [] do_exit+0x1b6/0x3d0 [] ? __dequeue_signal+0x5b/0xa0 [] do_group_exit+0x3c/0xa0 [] get_signal_to_deliver+0x1a0/0x320 [] do_signal+0x4c/0x120 [] ? kill_something_info+0x3d/0x130 [] ? sys_kill+0x7b/0x90 [] do_notify_resume+0x6d/0x80 [] int_signal+0x12/0x17 ---[ end trace 826a49848cffd17f ]--- --- [2] diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 6b76d81..7875427 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -828,6 +828,39 @@ rcu_start_gp_per_cpu(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat __note_new_gpnum(rsp, rnp, rdp); } +static void +_debug_print(struct rcu_data *rdp) +{ + printk(KERN_ERR "%3d %p c=%lu g=%lu pq=%d pgp=%lu qp=%d of=%lu ri=%lu ql=%ld qs=%c%c%c%c b=%ld ci=%lu co=%lu ca=%lu\n", + rdp->cpu, rdp, + rdp->completed, rdp->gpnum, + rdp->passed_quiesce, rdp->passed_quiesce_gpnum, + rdp->qs_pending, + rdp->offline_fqs, rdp->resched_ipi, + rdp->qlen, + ".N"[rdp->nxttail[RCU_NEXT_READY_TAIL] != + rdp->nxttail[RCU_NEXT_TAIL]], + ".R"[rdp->nxttail[RCU_WAIT_TAIL] != + rdp->nxttail[RCU_NEXT_READY_TAIL]], + ".W"[rdp->nxttail[RCU_DONE_TAIL] != + rdp->nxttail[RCU_WAIT_TAIL]], + ".D"[&rdp->nxtlist != rdp->nxttail[RCU_DONE_TAIL]], + rdp->blimit, rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted); +} + +static void +rcu_debug_print(struct rcu_state *rsp, struct rcu_node *rnp) +{ + int cpu; + + if (rnp->qsmask) { + printk(KERN_ERR "CPU %d, treason uncloaked, rsp @ %p (%s), rsp->gpnum is %lu, rnp @ %p, qsmask is 0x%lx\n", + smp_processor_id(), rsp, rsp->name, rsp->gpnum, rnp, rnp->qsmask); + for_each_online_cpu(cpu) + _debug_print(per_cpu_ptr(rsp->rda, cpu)); + } +} + /* * Start a new RCU grace period if warranted, re-initializing the hierarchy * in preparation for detecting the next grace period. The caller must hold @@ -873,6 +906,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags) /* Special-case the common single-level case. */ if (NUM_RCU_NODES == 1) { + rcu_debug_print(rsp, rnp); rcu_preempt_check_blocked_tasks(rnp); rnp->qsmask = rnp->qsmaskinit; rnp->gpnum = rsp->gpnum; @@ -912,6 +946,7 @@ rcu_start_gp(struct rcu_state *rsp, unsigned long flags) */ rcu_for_each_node_breadth_first(rsp, rnp) { raw_spin_lock(&rnp->lock); /* irqs already disabled. */ + rcu_debug_print(rsp, rnp); rcu_preempt_check_blocked_tasks(rnp); rnp->qsmask = rnp->qsmaskinit; rnp->gpnum = rsp->gpnum; --- [3] CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched), rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d000, qsmask is 0x1 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279 ql=1 qs=..W. b=10 ci=145949 co=0 ca=0 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10 ql=2 qs=.R.. b=10 ci=561 co=0 ca=0 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14 ql=0 qs=.... b=10 ci=406 co=0 ca=0 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22 ql=0 qs=.... b=10 ci=215 co=0 ca=0 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=14 ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16 ql=0 qs=.... b=10 ci=50 co=0 ca=0 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26 ql=0 qs=.... b=10 ci=22 co=0 ca=0 ------------[ cut here ]------------ WARNING: at kernel/rcutree_plugin.h:1011 rcu_preempt_check_blocked_tasks+0x27/0x30() Hardware name: H8QI6 Modules linked in: rcutorture Pid: 4562, comm: rcu_torture_rea Not tainted 3.2.1-numaconnect10+ #59 Call Trace: [] ? rcu_preempt_check_blocked_tasks+0x27/0x30 [] warn_slowpath_common+0x8b/0xc0 [] warn_slowpath_null+0x15/0x20 [] rcu_preempt_check_blocked_tasks+0x27/0x30 [] rcu_start_gp+0x102/0x1b0 [] __rcu_process_callbacks+0x8b/0xd0 [] rcu_process_callbacks+0x20/0x40 [] __do_softirq+0x9d/0x140 [] ? rcu_torture_shuffle+0x80/0x80 [rcutorture] [] call_softirq+0x1c/0x30 [] do_softirq+0x4a/0x80 [] irq_exit+0x43/0x60 [] smp_apic_timer_interrupt+0x45/0x60 [] apic_timer_interrupt+0x6b/0x70 [] ? __schedule+0x34f/0x710 [] ? update_curr+0x85/0xd0 [] ? sched_clock_local+0x15/0x80 [] ? sched_clock_cpu+0x65/0x90 [] ? rcu_torture_shuffle+0x80/0x80 [rcutorture] [] schedule+0x3a/0x60 [] rcu_torture_reader+0x130/0x230 [rcutorture] [] ? rcu_torture_writer+0x160/0x160 [rcutorture] [] ? rcu_torture_shuffle+0x80/0x80 [rcutorture] [] kthread+0x96/0xa0 [] kernel_thread_helper+0x4/0x10 [] ? kthread_stop+0x70/0x70 [] ? gs_change+0xb/0xb ---[ end trace 19f7bd7f233f1506 ]--- CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched), rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d100, qsmask is 0x8 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=279 ql=1 qs=..W. b=10 ci=145949 co=0 ca=0 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=10 ql=3 qs=NR.. b=10 ci=561 co=0 ca=0 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=14 ql=0 qs=.... b=10 ci=406 co=0 ca=0 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=22 ql=0 qs=.... b=10 ci=215 co=0 ca=0 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=19 ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16 ql=0 qs=.... b=10 ci=50 co=0 ca=0 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26 ql=0 qs=.... b=10 ci=22 co=0 ca=0 CPU 48, treason uncloaked, rsp @ ffffffff81a1d000 (rcu_sched), rsp->gpnum is 4058, rsp->completed is 4057, rnp @ ffffffff81a1d800, qsmask is 0x1 0 ffff8803f840d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=0 ri=280 ql=1 qs=..W. b=10 ci=145949 co=0 ca=0 12 ffff880bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=7 ri=11 ql=3 qs=NR.. b=10 ci=561 co=0 ca=0 24 ffff8813d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=15 ql=0 qs=.... b=10 ci=406 co=0 ca=0 36 ffff881bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=5 ri=23 ql=0 qs=.... b=10 ci=215 co=0 ca=0 48 ffff8823d040d4e0 c=4057 g=4057 pq=1 pgp=4057 qp=0 of=6 ri=21 ql=26 qs=.RWD b=10 ci=7345 co=0 ca=0 60 ffff882bd040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=16 ql=0 qs=.... b=10 ci=50 co=0 ca=0 72 ffff8833d040d4e0 c=4057 g=4058 pq=1 pgp=4058 qp=0 of=6 ri=26 ql=0 qs=.... b=10 ci=22 co=0 ca=0 -- Daniel J Blueman Principal Software Engineer, Numascale Asia