Dhaval Giani wrote:
> On Fri, Sep 11, 2009 at 02:28:13PM -0700, Andrew Morton wrote:
>
>> (switched to email. Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Thu, 10 Sep 2009 09:32:30 GMT
>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>
>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=14150
>>>
>>> Summary: BUG: soft lockup - CPU#3 stuck for 61s!, while running
>>> cpu controller latency testcase on two containers
>>> parallaly
>>> Product: Process Management
>>> Version: 2.5
>>> Kernel Version: 2.6.31-rc7
>>> Platform: All
>>> OS/Version: Linux
>>> Tree: Mainline
>>> Status: NEW
>>> Severity: high
>>> Priority: P1
>>> Component: Scheduler
>>> AssignedTo: mingo@elte.hu
>>> ReportedBy: risrajak@linux.vnet.ibm.com
>>> CC: serue@us.ibm.com, iranna.ankad@in.ibm.com,
>>> risrajak@in.ibm.com
>>> Regression: No
>>>
>>>
>>> Created an attachment (id=23055)
>>> --> (http://bugzilla.kernel.org/attachment.cgi?id=23055)
>>> Config-file-used
>>>
>>> Hitting this soft lock issue while running this scenario on 2.6.31-rc7 kernel
>>> on SystemX 32 bit on multiple machines.
>>>
>>> Scenario:
>>> - While running cpu controller latency testcase from LTP same time on two
>>> containers.
>>>
>>> Steps:
>>> 1. Compile ltp-full-20090731.tgz on host.
>>> 2. Create two container (Used lxc tool
>>> (http://sourceforge.net/projects/lxc/lxc-0.6.3.tar.gz) for creating container )
>>> e.g:
>>> lxc-create -n foo1
>>> lxc-create -n foo2
>>> On first shell:
>>> lxc-execute -n foo1 -f /usr/etc/lxc/lxc-macvlan.conf /bin/bash
>>> on Second shell:
>>> lxc-execute -n foo2 -f /usr/etc/lxc/lxc-macvlan.conf /bin/bash
>>>
>>> 3. Either you run cpu_latency testcase alone or run "./runltp -f controllers"
>>> at same time on both the containers.
>>> 4. After testcase execution completes, you can see this message in dmesg.
>>>
>>> Expected Result:
>>> - Should not reproduce soft lock up issue.
>>> - This reproduces 3 times out of 5 tries.
>>>
>>> hrtimer: interrupt too slow, forcing clock min delta to 5843235 ns
>>> hrtimer: interrupt too slow, forcing clock min delta to 5842476 ns
>>> Clocksource tsc unstable (delta = 18749057581 ns)
>>> BUG: soft lockup - CPU#3 stuck for 61s! [cpuctl_latency_:17174]
>>> Modules linked in: bridge stp llc bnep sco l2cap bluetooth sunrpc ipv6
>>> p4_clockmod dm_multipath uinput qla2xxx ata_generic pata_acpi usb_storage e1000
>>> scsi_transport_fc joydev scsi_tgt i2c_piix4 pata_serverworks pcspkr serio_raw
>>> mptspi mptscsih mptbase scsi_transport_spi radeon ttm drm i2c_algo_bit i2c_core
>>> [last unloaded: scsi_wait_scan]
>>>
>>> Pid: 17174, comm: cpuctl_latency_ Tainted: G W (2.6.31-rc7 #1) IBM
>>> eServer BladeCenter HS40 -[883961X]-
>>> EIP: 0060:[] EFLAGS: 00000283 CPU: 3
>>> EIP is at find_next_bit+0x9/0x79
>>> EAX: c2c437a0 EBX: f3d433c0 ECX: 00000000 EDX: 00000020
>>> ESI: c2c436bc EDI: 00000000 EBP: f063be6c ESP: f063be64
>>> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>>> CR0: 80050033 CR2: 008765a4 CR3: 314d7000 CR4: 000006d0
>>> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>>> DR6: ffff0ff0 DR7: 00000400
>>> Call Trace:
>>> [] cpumask_next+0x17/0x19
>>> [] tg_shares_up+0x53/0x149
>>> [] ? tg_nop+0x0/0xc
>>> [] ? tg_nop+0x0/0xc
>>> [] walk_tg_tree+0x63/0x77
>>> [] ? tg_shares_up+0x0/0x149
>>> [] update_shares+0x5d/0x65
>>> [] rebalance_domains+0x114/0x460
>>> [] ? restore_all_notrace+0x0/0x18
>>> [] run_rebalance_domains+0x36/0xa3
>>> [] __do_softirq+0xbc/0x173
>>> [] do_softirq+0x3b/0x5f
>>> [] irq_exit+0x3a/0x68
>>> [] smp_apic_timer_interrupt+0x6d/0x7b
>>> [] apic_timer_interrupt+0x2f/0x34
>>> BUG: soft lockup - CPU#2 stuck for 61s! [watchdog/2:11]
>>> Modules linked in: bridge stp llc bnep sco l2cap bluetooth sunrpc ipv6
>>> p4_clockmod dm_multipath uinput qla2xxx ata_generic pata_acpi usb_storage e1000
>>> scsi_transport_fc joydev scsi_tgt i2c_piix4 pata_serverworks pcspkr serio_raw
>>> mptspi mptscsih mptbase scsi_transport_spi radeon ttm drm i2c_algo_bit i2c_core
>>> [last unloaded: scsi_wait_scan]
>>>
>>> Pid: 11, comm: watchdog/2 Tainted: G W (2.6.31-rc7 #1) IBM eServer
>>> BladeCenter HS40 -[883961X]-
>>> EIP: 0060:[] EFLAGS: 00000246 CPU: 2
>>> EIP is at tg_shares_up+0xd9/0x149
>>> EAX: 00000000 EBX: f09b3c00 ECX: f0baac00 EDX: 00000100
>>> ESI: 00000002 EDI: 00000400 EBP: f6cb7de0 ESP: f6cb7db8
>>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>>> CR0: 8005003b CR2: 08070680 CR3: 009c8000 CR4: 000006d0
>>> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>>> DR6: ffff0ff0 DR7: 00000400
>>> Call Trace:
>>> [] ? tg_nop+0x0/0xc
>>> [] ? tg_nop+0x0/0xc
>>> [] walk_tg_tree+0x63/0x77
>>> [] ? tg_shares_up+0x0/0x149
>>> [] update_shares+0x5d/0x65
>>> [] rebalance_domains+0x114/0x460
>>> [] run_rebalance_domains+0x36/0xa3
>>> [] __do_softirq+0xbc/0x173
>>> [] do_softirq+0x3b/0x5f
>>> [] irq_exit+0x3a/0x68
>>> [] smp_apic_timer_interrupt+0x6d/0x7b
>>> [] apic_timer_interrupt+0x2f/0x34
>>> [] ? finish_task_switch+0x5d/0xc4
>>> [] schedule+0x74c/0x7b2
>>> [] ? trace_hardirqs_on_thunk+0xc/0x10
>>> [] ? restore_all_notrace+0x0/0x18
>>> [] ? watchdog+0x0/0x79
>>> [] ? watchdog+0x0/0x79
>>> [] watchdog+0x4a/0x79
>>> [] kthread+0x70/0x75
>>> [] ? kthread+0x0/0x75
>>> [] kernel_thread_helper+0x7/0x10
>>> [root@hs40 ltp-full-20090731]# uname -a
>>> Linux hs40.in.ibm.com 2.6.31-rc7 #1 SMP Thu Sep 3 10:14:41 IST 2009 i686 i686
>>> i386 GNU/Linux
>>> [root@hs40 ltp-full-20090731]#
>>>
>>>
>
> We have been unable to reproduce it on current -tip. Rishi, are you able
> to reproduce it on -tip?
>
> thanks,
>

I am not able to create a container with lxc on the -tip kernel with the attached config file.
As soon as I run "lxc-execute ...", the system hangs, and the only way to recover is a hard reboot. I am not sure about -tip, but I can reproduce the problem quite easily on 2.6.31-rc7 with that config file.

The only changes I have made relative to the 2.6.31-rc7 config file are:
- Disabled KVM, as it was giving me an error on the -tip kernel.
- Applied the following patch: http://www.gossamer-threads.com/lists/linux/kernel/1129527

Please let me know if you are able to recreate it on -tip with the following config.