From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
Date: Tue, 27 Sep 2011 19:22:25 +0100
Message-ID: <4E821461.3060009@citrix.com>
References: <4CB38558.5060207@theshore.net>
 <4E821241.6090602@theshore.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
In-Reply-To: <4E821241.6090602@theshore.net>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 27/09/2011 19:13, Christopher S. Aker wrote:
> On 10/11/10 5:44 PM, Christopher S. Aker wrote:
>> In an effort to fix the problem described in my previous xen-devel post
>> ("New CPUS, now get: NETDEV WATCHDOG: eth0: transmit timed out"), we've
>> come across another problem. 3ware 9690SA cards do not behave under Xen
>> 4.1 (as of cs 22155).
>>
>> We have a simple Xen thrash test suite which fires up domUs that do
>> different workloads (some swap thrash, some kernel build, some spin
>> CPUs, some cycle rebooting, etc). Almost immediately after launching the
>> suite we can get the 3ware 9690SA card to fail with something like the
>> following:
>>
>> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
>> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
>> sd 0:0:0:0: rejecting I/O to offline device
>> sd 0:0:0:0: rejecting I/O to offline device
>>
>> Under a 2.6.32 dom0 it sometimes also triggers Xenwatch like so:
>>
>> http://theshore.net/~caker/xen/BUGS/9690SA/xenwatch.txt
>>
>> Results matrix:
>>
>> +---------------------------------------------------------------+
>> | Xen           | Dom0                | 9550SXU | 9690SA | 9750 |
>> +---------------------------------------------------------------+
>> | 3.4.1         | 2.6.18.8-931-2      | OK      | OK     | OK   |
>> | 3.4.4-rc1-pre | 2.6.18.8-931-2      | OK      | OK     | OK   |
>> | 3.4.4-rc1-pre | 2.6.32.23-g41a85de5 | OK      | OK     | OK   |
>> | 4.1 @ 22155   | 2.6.18.8-931-2      | OK      | FAIL   | OK   |
>> | 4.1 @ 22155   | 2.6.32.23-g41a85de5 | OK      | FAIL   | OK   |
>> +---------------------------------------------------------------+
>>
>> The failures were verified on at least 2 machines of identical
>> specification.
>>
>> The same dom0 kernels that produce a stable 9690SA under Xen 3.4, bomb
>> under Xen 4.1.
> I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.
>
> Konrad, in the "offline raid" thread you asked for the following debug
> information:
>
> http://www.theshore.net/~caker/xen/BUGS/offline-raid/
>
> The sysrq-t.txt and triple-a-star.txt outputs are after I got the raid
> card to hang up (but before it timed out and started spewing to the
> console).
>
> Oddly, lspci shows three devices assigned IRQ 16, however
> /proc/interrupts only lists two of them. Side effect of MSI?
>
> Also, the problem still happens even with MSI disabled (pci=nomsi).
>
> Thanks,
> -Chris

This is almost certainly the bug to do with not ack'ing a migrating line
level interrupt which I fixed in c/s 23145:1092a143ef9d.

Try applying that patch, or just running from the tip of
http://xenbits.xen.org/hg/xen-4.1-testing.hg/

~Andrew

>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel