From mboxrd@z Thu Jan 1 00:00:00 1970 From: Vasily Averin Subject: forcedeth hangs on loading Date: Fri, 30 Nov 2007 10:52:29 +0300 Message-ID: <474FC13D.6060002@sw.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 7bit Cc: Jeff Garzik , netdev@vger.kernel.org, devel@openvz.org To: Ayaz Abdulla Return-path: Received: from mailhub.sw.ru ([195.214.233.200]:32443 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754037AbXK3HxQ (ORCPT ); Fri, 30 Nov 2007 02:53:16 -0500 Sender: netdev-owner@vger.kernel.org List-Id: netdev.vger.kernel.org Hi Ayaz, While testing new OpenVZ kernel based on RHEL5.1 kernel 2.6.18-53.el5 we have observed that node loads forcedeth driver too long: Nov 28 14:04:19 ts49 kjournald starting. Commit interval 5 seconds Nov 28 14:04:19 ts49 EXT3-fs: mounted filesystem with ordered data mode. ... Nov 28 19:34:47 ts49 modprobe D ffff8101ff772120 0 4101 4098 (NOTLB) Nov 28 19:34:47 ts49 ffff8101fe7e1c28 0000000000000086 0000000000000246 00000010000200d0 Nov 28 19:34:47 ts49 ffff8101ff772120 ffff8103ff93c930 000012229b7b3bd7 00002bcc55b0dda7 Nov 28 19:34:47 ts49 ffff8101ff772328 ffff8101ff93a000 ffffffff802de500 ffff8103ff93c930 Nov 28 19:34:47 ts49 Call Trace: Nov 28 19:34:47 ts49 [] schedule_timeout+0x8a/0xad Nov 28 19:34:47 ts49 [] process_timeout+0x0/0x5 Nov 28 19:34:47 ts49 [] msleep+0x21/0x2c Nov 28 19:34:47 ts49 [] :forcedeth:nv_probe+0x873/0xd4d Nov 28 19:34:47 ts49 [] pci_device_probe+0x100/0x180 Nov 28 19:34:47 ts49 [] driver_probe_device+0x52/0xaa Nov 28 19:34:47 ts49 [] __driver_attach+0x65/0xb6 Nov 28 19:34:47 ts49 [] __driver_attach+0x0/0xb6 Nov 28 19:34:47 ts49 [] bus_for_each_dev+0x43/0x6e Nov 28 19:34:47 ts49 [] bus_add_driver+0x7e/0x130 Nov 28 19:34:47 ts49 [] __pci_register_driver+0x57/0x7e Nov 28 19:34:47 ts49 [] sys_init_module+0x16aa/0x185f Nov 28 19:34:47 ts49 [] system_call+0x7e/0x83 ... Nov 28 21:02:02 ts49 eth0: forcedeth.c: subsystem: 010f1:2912 bound to 0000:00:08.0 Nov 28 21:02:02 ts49 ACPI: PCI Interrupt Link [LMA2] enabled at IRQ 18 Nov 28 21:02:02 ts49 GSI 22 sharing vector 0x3A and IRQ 22 Nov 28 21:02:02 ts49 ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LMA2] -> GSI 18 (level, high) -> IRQ 58 Nov 28 21:02:02 ts49 PCI: Setting latency timer of device 0000:00:09.0 to 64 Nov 28 21:02:02 ts49 Nov 28 21:02:02 ts49 forcedeth: using HIGHDMA Nov 28 21:02:02 ts49 eth1: forcedeth.c: subsystem: 010f1:2912 bound to 0000:00:09.0 nv_probe() was called under semaphore and blocks lots of other modprobe's from loading other drivers. linux-2.6-net-forcedeth-update-to-driver-version-0-60.patch patch from RHEL5.1 added the following piece of code to nv_probe(): + for (i = 0; i < 5000; i++) { + msleep(1); + if (nv_mgmt_acquire_sema(dev)) { + } + } obviously, this loops 5000 times and calls nv_mgmt_acquire_sema() inside, which in the worst case does msleep(500) 10 times. So this loop can last 5000*10*0.5sec = 25000sec = 6.94 hours This is exactly what we face in the bug: boot hanged at 14:04:19, continued at 21:02:02, i.e. it took ~25063 seconds. As a workaround we have changed number of loops from 5000 to 5. https://bugzilla.redhat.com/show_bug.cgi?id=405521 [root@ts49 ~]# lspci -n 00:00.0 0500: 10de:0369 (rev a2) 00:01.0 0601: 10de:0364 (rev a3) 00:01.1 0c05: 10de:0368 (rev a3) 00:02.0 0c03: 10de:036c (rev a1) 00:02.1 0c03: 10de:036d (rev a2) 00:04.0 0101: 10de:036e (rev a1) 00:05.0 0101: 10de:037f (rev a3) 00:05.1 0101: 10de:037f (rev a3) 00:05.2 0101: 10de:037f (rev a3) 00:06.0 0604: 10de:0370 (rev a2) 00:08.0 0680: 10de:0373 (rev a3) 00:09.0 0680: 10de:0373 (rev a3) 00:0a.0 0604: 10de:0376 (rev a3) 00:0d.0 0604: 10de:0378 (rev a3) 00:0f.0 0604: 10de:0377 (rev a3) 00:18.0 0600: 1022:1100 00:18.1 0600: 1022:1101 00:18.2 0600: 1022:1102 00:18.3 0600: 1022:1103 00:19.0 0600: 1022:1100 00:19.1 0600: 1022:1101 00:19.2 0600: 1022:1102 00:19.3 0600: 1022:1103 01:04.0 0300: 1002:515e (rev 02) 00:08.0 0680: 10de:0373 (rev a3) Subsystem: 10f1:2912 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR-