From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Christopher S. Aker"
Subject: Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
Date: Tue, 27 Sep 2011 14:13:21 -0400
Message-ID: <4E821241.6090602@theshore.net>
References: <4CB38558.5060207@theshore.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4CB38558.5060207@theshore.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen devel
Cc: Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 10/11/10 5:44 PM, Christopher S. Aker wrote:
> In an effort to fix the problem described in my previous xen-devel post
> ("New CPUS, now get: NETDEV WATCHDOG: eth0: transmit timed out"), we've
> come across another problem. 3ware 9690SA cards do not behave under Xen
> 4.1 (as of cs 22155).
>
> We have a simple Xen thrash test suite which fires up domUs that do
> different workloads (some swap thrash, some kernel build, some spin
> CPUs, some cycle rebooting, etc). Almost immediately after launching the
> suite we can get the 3ware 9690SA card to fail with something like the
> following:
>
> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting
> card.
> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting
> card.
> sd 0:0:0:0: rejecting I/O to offline device
> sd 0:0:0:0: rejecting I/O to offline device
>
> Under a 2.6.32 dom0 it sometimes also triggers Xenwatch like so:
>
> http://theshore.net/~caker/xen/BUGS/9690SA/xenwatch.txt
>
> Results matrix:
>
> +---------------------------------------------------------------+
> | Xen           | Dom0                | 9550SXU | 9690SA | 9750 |
> +---------------------------------------------------------------+
> | 3.4.1         | 2.6.18.8-931-2      | OK      | OK     | OK   |
> | 3.4.4-rc1-pre | 2.6.18.8-931-2      | OK      | OK     | OK   |
> | 3.4.4-rc1-pre | 2.6.32.23-g41a85de5 | OK      | OK     | OK   |
> | 4.1 @ 22155   | 2.6.18.8-931-2      | OK      | FAIL   | OK   |
> | 4.1 @ 22155   | 2.6.32.23-g41a85de5 | OK      | FAIL   | OK   |
> +---------------------------------------------------------------+
>
> The failures were verified on at least 2 machines of identical
> specification.
>
> The same dom0 kernels that produce a stable 9690SA under Xen 3.4, bomb
> under Xen 4.1.

I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.

Konrad, in the "offline raid" thread you asked for the following debug
information:

http://www.theshore.net/~caker/xen/BUGS/offline-raid/

The sysrq-t.txt and triple-a-star.txt outputs are after I got the raid
card to hang up (but before it timed out and started spewing to the
console).

Oddly, lspci shows three devices assigned IRQ 16, however
/proc/interrupts only lists two of them. Side effect of MSI?

Also, the problem still happens even with MSI disabled (pci=nomsi).

Thanks,
-Chris