From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugme-daemon@bugzilla.kernel.org Subject: [Bug 11646] QLA2xxx: Kernel deadlock on high load somewhere after 2.6.20 Date: Wed, 19 Nov 2008 14:10:09 -0800 (PST) Message-ID: <20081119221009.166C910800F@picon.linux-foundation.org> References: Return-path: Received: from smtp1.linux-foundation.org ([140.211.169.13]:58294 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750968AbYKSWKk (ORCPT ); Wed, 19 Nov 2008 17:10:40 -0500 Received: from picon.linux-foundation.org (picon.linux-foundation.org [140.211.169.79]) by smtp1.linux-foundation.org (8.14.2/8.13.5/Debian-3ubuntu1.1) with ESMTP id mAJMA9We023515 for ; Wed, 19 Nov 2008 14:10:10 -0800 In-Reply-To: Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: linux-scsi@vger.kernel.org http://bugzilla.kernel.org/show_bug.cgi?id=11646 ------- Comment #16 from daniel@economicmodeling.com 2008-11-19 14:10 ------- I have experienced this bug on IBM HS21 Blades running Debian Lenny/2.6.22 connected to IBM DS3400 storage via qlogic switch. The crashes occurred during cp and rsync operations from one array to another. I solved the problem by replacing the Linux qla2xxx module with the official qlogic RHEL/SUSE driver and hacking it to work as a module in Debian. The mailbox timeouts stopped after switching drivers. This suggests a bug in the current Linux qla2xxx driver- NOT a hardware problem. Here is syslog output from a typical crash: Jan 27 19:41:53 hqhost kernel: qla2xxx 0000:08:01.0: Mailbox command timeout occured. Issuing ISP abort. Jan 27 19:41:53 hqhost kernel: qla2xxx 0000:08:01.0: Performing ISP error recovery - ha= ffff810223d0c530. Jan 27 19:41:53 hqhost kernel: qla2xxx 0000:08:01.0: LOOP UP detected (4 Gbps). Jan 27 19:41:54 hqhost kernel: qla2xxx 0000:08:01.0: SNS scan failed -- assuming zero-entry result... Jan 27 19:41:54 hqhost kernel: APIC error on CPU0: 00(40) Jan 27 19:41:54 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:1): Abort command issued -- 0 9a776 2002. Jan 27 19:42:28 hqhost kernel: rport-0:0-0: blocked FC remote port time out: removing target and saving binding Jan 27 19:42:28 hqhost kernel: rport-0:0-4: blocked FC remote port time out: removing target and saving binding Jan 27 19:42:28 hqhost kernel: rport-0:0-5: blocked FC remote port time out: removing target and saving binding Jan 27 19:42:28 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): DEVICE RESET ISSUED. Jan 27 19:42:28 hqhost kernel: APIC error on CPU5: 00(40) Jan 27 19:42:28 hqhost kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache -- Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.