From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adko Branil
Subject: Re: HDD problem, software bug, bios bug, or hardware ?
Date: Fri, 7 Sep 2012 04:32:56 -0700 (PDT)
Message-ID: <1347017576.44763.YahooMailNeo@web124703.mail.ne1.yahoo.com>
References: <1345901771.8871.YahooMailNeo@web124706.mail.ne1.yahoo.com> <20120826130152.GA20021@liondog.tnic> <1346086872.65665.YahooMailNeo@web124704.mail.ne1.yahoo.com> <20120827215952.GA18719@liondog.tnic> <1346259574.81504.YahooMailNeo@web124706.mail.ne1.yahoo.com> <20120830095808.GB24680@liondog.tnic> <1346325007.23643.YahooMailNeo@web124706.mail.ne1.yahoo.com> <20547.48101.900727.735398@pilspetsen.it.uu.se>
Reply-To: Adko Branil
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from nm5-vm2.bullet.mail.ne1.yahoo.com ([98.138.90.153]:43956 "HELO nm5-vm2.bullet.mail.ne1.yahoo.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1753222Ab2IGLc5 convert rfc822-to-8bit (ORCPT ); Fri, 7 Sep 2012 07:32:57 -0400
In-Reply-To: <20547.48101.900727.735398@pilspetsen.it.uu.se>
Sender: linux-ide-owner@vger.kernel.org
List-Id: linux-ide@vger.kernel.org
To: Mikael Pettersson
Cc: Borislav Petkov, Jeff Garzik, linux-ide, lkml "linux-ide@vger.kernel.org", Dan Merillat, "Rafael J. Wysocki"

After updating the BIOS, no more crashes happened. I tested it many times under heavy HDD I/O load, with many kernels (including CONFIG_PREEMPT kernels).

But now, if I enable the "Cool'n'Quiet" option in the BIOS, a CONFIG_PREEMPT_VOLUNTARY kernel booted with "nosmp" crashes during the boot process with a kernel panic, while a CONFIG_PREEMPT kernel without "nosmp" works fine. I think that is another story, though: it should not be related to the crashes that happened with the old BIOS, and I think "nosmp" is probably the reason.
(I have never changed the CPU frequency of this CPU at all.)

When "Cool'n'Quiet" is disabled, the system works perfectly well with all the kernels I tried, except that this warning message still appears in dmesg (if it is a problem at all). I am posting the message here, for the "nosmp" case as well; the kernel is 3.5.2:

[    1.912494] =================================
[    1.912494] [ INFO: inconsistent lock state ]
[    1.912494] 3.5.2 #4 Not tainted
[    1.912494] ---------------------------------
[    1.912494] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[    1.912494] swapper/0/1 [HC1[1]:SC1[1]:HE0:SE0] takes:
[    1.912494]  (&(&host->lock)->rlock){?.+...}, at: [] ata_bmdma_interrupt+0x27/0x1d0
[    1.912494] {HARDIRQ-ON-W} state was registered at:
[    1.912494]   [] __lock_acquire+0x61b/0x1af0
[    1.912494]   [] lock_acquire+0x8a/0x110
[    1.912494]   [] _raw_spin_lock+0x31/0x40
[    1.912494]   [] pdc_sata_hardreset+0x85/0x100
[    1.912494]   [] ata_do_reset+0x3a/0x90
[    1.912494]   [] ata_eh_reset+0x372/0xe00
[    1.912494]   [] ata_eh_recover+0x2a5/0x13d0
[    1.912494]   [] ata_do_eh+0x4d/0xb0
[    1.912494]   [] ata_sff_error_handler+0xca/0x120
[    1.912494]   [] pdc_error_handler+0x24/0x30
[    1.912494]   [] ata_scsi_port_error_handler+0x47c/0x800
[    1.912494]   [] ata_scsi_error+0x9e/0xd0
[    1.912494]   [] scsi_error_handler+0xf8/0x500
[    1.912494]   [] kthread+0xae/0xc0
[    1.912494]   [] kernel_thread_helper+0x4/0x10
[    1.912494] irq event stamp: 661637
[    1.912494] hardirqs last  enabled at (661636): [] __do_softirq+0x71/0x1f0
[    1.912494] hardirqs last disabled at (661637): [] common_interrupt+0x67/0x6c
[    1.912494] softirqs last  enabled at (661610): [] __do_softirq+0x134/0x1f0
[    1.912494] softirqs last disabled at (661635): [] call_softirq+0x1c/0x30
[    1.912494]
[    1.912494] other info that might help us debug this:
[    1.912494]  Possible unsafe locking scenario:
[    1.912494]
[    1.912494]        CPU0
[    1.912494]        ----
[    1.912494]   lock(&(&host->lock)->rlock);
[    1.912494]   <Interrupt>
[    1.912494]     lock(&(&host->lock)->rlock);
[    1.912494]
[    1.912494]  *** DEADLOCK ***
[    1.912494]
[    1.912494] 5 locks held by swapper/0/1:
[    1.912494]  #0:  (&__lockdep_no_validate__){......}, at: [] __driver_attach+0x5b/0xb0
[    1.912494]  #1:  (&__lockdep_no_validate__){......}, at: [] __driver_attach+0x69/0xb0
[    1.912494]  #2:  (usb_bus_list_lock){+.+.+.}, at: [] usb_add_hcd+0x295/0x6a0
[    1.912494]  #3:  (&__lockdep_no_validate__){......}, at: [] device_attach+0x2a/0xc0
[    1.912494]  #4:  (&__lockdep_no_validate__){......}, at: [] device_attach+0x2a/0xc0
[    1.912494]
[    1.912494] stack backtrace:
[    1.912494] Pid: 1, comm: swapper/0 Not tainted 3.5.2 #4
[    1.912494] Call Trace:
[    1.912494]  <IRQ>  [] print_usage_bug+0x1f7/0x208
[    1.912494]  [] ? save_stack_trace+0x2f/0x50
[    1.912494]  [] ? print_shortest_lock_dependencies+0x1d0/0x1d0
[    1.912494]  [] mark_lock+0x262/0x2a0
[    1.912494]  [] ? __lock_acquire+0x2d5/0x1af0
[    1.912494]  [] __lock_acquire+0x813/0x1af0
[    1.912494]  [] ? __lock_acquire+0x2d5/0x1af0
[    1.912494]  [] lock_acquire+0x8a/0x110
[    1.912494]  [] ? ata_bmdma_interrupt+0x27/0x1d0
[    1.912494]  [] ? cpuacct_charge+0xa8/0xf0
[    1.912494]  [] _raw_spin_lock_irqsave+0x41/0x60
[    1.912494]  [] ? ata_bmdma_interrupt+0x27/0x1d0
[    1.912494]  [] ata_bmdma_interrupt+0x27/0x1d0
[    1.912494]  [] ? mask_and_ack_8259A+0x32/0x110
[    1.912494]  [] handle_irq_event_percpu+0x5d/0x1f0
[    1.912494]  [] handle_irq_event+0x48/0x70
[    1.912494]  [] ? mask_and_ack_8259A+0x6e/0x110
[    1.912494]  [] ? handle_level_irq+0x1e/0xc0
[    1.912494]  [] handle_level_irq+0x71/0xc0
[    1.912494]  [] handle_irq+0x22/0x40
[    1.912494]  [] do_IRQ+0x5a/0xe0
[    1.912494]  [] common_interrupt+0x6c/0x6c
[    1.912494]  [] ? handle_irq_event+0x53/0x70
[    1.912494]  [] ? __do_softirq+0x79/0x1f0
[    1.912494]  [] call_softirq+0x1c/0x30
[    1.912494]  [] do_softirq+0x85/0xc0
[    1.912494]  [] irq_exit+0xb5/0xc0
[    1.912494]  [] do_IRQ+0x63/0xe0
[    1.912494]  [] common_interrupt+0x6c/0x6c
[    1.912494]  <EOI>  [] ? vprintk_emit+0x16b/0x4c0
[    1.912494]  [] printk_emit+0x31/0x33
[    1.912494]  [] ? kfree+0xd4/0x160
[    1.912494]  [] __dev_printk+0x127/0x240
[    1.912494]  [] ? kfree+0x95/0x160
[    1.912494]  [] ? usb_control_msg+0xef/0x130
[    1.912494]  [] ? trace_hardirqs_on_caller+0x105/0x190
[    1.912494]  [] ? trace_hardirqs_on+0xd/0x10
[    1.912494]  [] _dev_info+0x53/0x60
[    1.912494]  [] hub_probe+0x3ea/0x850
[    1.912494]  [] ? mutex_unlock+0xe/0x10
[    1.912494]  [] usb_probe_interface+0x184/0x230
[    1.912494]  [] driver_probe_device+0x7e/0x210
[    1.912494]  [] ? __driver_attach+0xb0/0xb0
[    1.912494]  [] __device_attach+0x4b/0x60
[    1.912494]  [] bus_for_each_drv+0x4e/0xa0
[    1.912494]  [] device_attach+0xa7/0xc0
[    1.912494]  [] bus_probe_device+0xb0/0xe0
[    1.912494]  [] device_add+0x5cd/0x6a0
[    1.912494]  [] usb_set_configuration+0x4be/0x710
[    1.912494]  [] generic_probe+0x43/0xa0
[    1.912494]  [] usb_probe_device+0x2f/0x60
[    1.912494]  [] driver_probe_device+0x7e/0x210
[    1.912494]  [] ? __driver_attach+0xb0/0xb0
[    1.912494]  [] __device_attach+0x4b/0x60
[    1.912494]  [] bus_for_each_drv+0x4e/0xa0
[    1.912494]  [] device_attach+0xa7/0xc0
[    1.912494]  [] bus_probe_device+0xb0/0xe0
[    1.912494]  [] device_add+0x5cd/0x6a0
[    1.912494]  [] usb_new_device+0x1dc/0x2a0
[    1.912494]  [] usb_add_hcd+0x36b/0x6a0
[    1.912494]  [] usb_hcd_pci_probe+0x249/0x3c0
[    1.912494]  [] pci_device_probe+0xaf/0x130
[    1.912494]  [] driver_probe_device+0x7e/0x210
[    1.912494]  [] __driver_attach+0xab/0xb0
[    1.912494]  [] ? driver_probe_device+0x210/0x210
[    1.912494]  [] bus_for_each_dev+0x55/0x90
[    1.912494]  [] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [] driver_attach+0x1e/0x20
[    1.912494]  [] bus_add_driver+0x1a8/0x270
[    1.912494]  [] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [] driver_register+0x77/0x150
[    1.912494]  [] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [] __pci_register_driver+0x6f/0xe0
[    1.912494]  [] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [] uhci_hcd_init+0x80/0xc3
[    1.912494]  [] ? ohci_hcd_mod_init+0x54/0x54
[    1.912494]  [] do_one_initcall+0x122/0x170
[    1.912494]  [] kernel_init+0x139/0x1bd
[    1.912494]  [] ? do_early_param+0x8c/0x8c
[    1.912494]  [] kernel_thread_helper+0x4/0x10
[    1.912494]  [] ? retint_restore_args+0xe/0xe
[    1.912494]  [] ? start_kernel+0x3bc/0x3bc
[    1.912494]  [] ? gs_change+0xb/0xb
[    3.201635] ata4.00: ATA-6: ST3200822AS, 3.01, max UDMA/133
[    3.209975] ata4.00: 390721968 sectors, multi 16: LBA48
[    3.218619] uhci_hcd 0000:00:10.1: UHCI Host Controller
[    3.227150] ata2: SATA link down (SStatus 0 SControl 0)

Thanks.

Best regards
Adko.