xen-devel.lists.xenproject.org archive mirror
* Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
@ 2010-10-11 21:44 Christopher S. Aker
  2010-11-21 16:55 ` gianfi
  2011-09-27 18:13 ` Christopher S. Aker
  0 siblings, 2 replies; 6+ messages in thread
From: Christopher S. Aker @ 2010-10-11 21:44 UTC (permalink / raw)
  To: xen devel

In an effort to fix the problem described in my previous xen-devel post 
("New CPUS, now get: NETDEV WATCHDOG: eth0: transmit timed out"), we've 
come across another problem.  3ware 9690SA cards do not behave under Xen 
4.1 (as of cs 22155).

We have a simple Xen thrash test suite which fires up domUs that do 
different workloads (some swap thrash, some kernel build, some spin 
CPUs, some cycle rebooting, etc).  Almost immediately after launching 
the suite we can get the 3ware 9690SA card to fail with something like 
the following:

sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
sd 0:0:0:0: rejecting I/O to offline device
sd 0:0:0:0: rejecting I/O to offline device
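
For anyone trying to spot this failure automatically in a thrash-test run, the messages above can be matched with two regular expressions. This is only a minimal sketch: the patterns are derived from the four log lines quoted here, the function name `summarize` is made up for illustration, and other driver or kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt in this message (an assumption:
# other 3ware driver versions may format these lines differently).
TIMEOUT_RE = re.compile(
    r"sd (?P<dev>\d+:\d+:\d+:\d+): WARNING: "
    r"\((?P<code>0x[0-9A-Fa-f]+:0x[0-9A-Fa-f]+)\): "
    r"Command \((?P<cmd>0x[0-9A-Fa-f]+)\) timed out, resetting card\."
)
OFFLINE_RE = re.compile(
    r"sd (?P<dev>\d+:\d+:\d+:\d+): rejecting I/O to offline device"
)


def summarize(lines):
    """Count timeout/reset and offline-rejection messages per SCSI device."""
    timeouts = Counter()
    offline = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group("dev")] += 1
            continue
        m = OFFLINE_RE.search(line)
        if m:
            offline[m.group("dev")] += 1
    return timeouts, offline


# The four lines quoted above, with the wrapped line rejoined.
SAMPLE = [
    "sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.",
    "sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.",
    "sd 0:0:0:0: rejecting I/O to offline device",
    "sd 0:0:0:0: rejecting I/O to offline device",
]

if __name__ == "__main__":
    timeouts, offline = summarize(SAMPLE)
    for dev in sorted(set(timeouts) | set(offline)):
        print(f"{dev}: {timeouts[dev]} timeouts, {offline[dev]} offline rejections")
```

Using `search` rather than `match` lets the same patterns work on console output that carries a timestamp prefix.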

Under a 2.6.32 dom0 it sometimes also triggers Xenwatch like so:

http://theshore.net/~caker/xen/BUGS/9690SA/xenwatch.txt

Results matrix:

+---------------------------------------------------------------+
| Xen           | Dom0                | 9550SXU | 9690SA | 9750 |
+---------------------------------------------------------------+
| 3.4.1         | 2.6.18.8-931-2      | OK      | OK     | OK   |
| 3.4.4-rc1-pre | 2.6.18.8-931-2      | OK      | OK     | OK   |
| 3.4.4-rc1-pre | 2.6.32.23-g41a85de5 | OK      | OK     | OK   |
| 4.1 @ 22155   | 2.6.18.8-931-2      | OK      | FAIL   | OK   |
| 4.1 @ 22155   | 2.6.32.23-g41a85de5 | OK      | FAIL   | OK   |
+---------------------------------------------------------------+

The failures were verified on at least 2 machines of identical 
specification.

The same dom0 kernels that produce a stable 9690SA under Xen 3.4 bomb 
under Xen 4.1.

-Chris


* Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
  2010-10-11 21:44 Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device Christopher S. Aker
@ 2010-11-21 16:55 ` gianfi
  2010-11-22 16:37   ` Konrad Rzeszutek Wilk
  2011-09-27 18:13 ` Christopher S. Aker
  1 sibling, 1 reply; 6+ messages in thread
From: gianfi @ 2010-11-21 16:55 UTC (permalink / raw)
  To: xen-devel


Hello,
I can confirm the same behaviour on Xen 4.0.1, with a 3ware 9690SA card,
triggered by heavy I/O load. Does anybody know a possible workaround for the
issue? 
Thank you very much.


* Re: Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
  2010-11-21 16:55 ` gianfi
@ 2010-11-22 16:37   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 6+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-11-22 16:37 UTC (permalink / raw)
  To: gianfi; +Cc: xen-devel

On Sun, Nov 21, 2010 at 08:55:18AM -0800, gianfi wrote:
> 
> Hello,
> I can confirm the same behaviour on Xen 4.0.1, with a 3ware 9690SA card,

Uhh, can you refer to the thread that explains "same behaviour"?
> triggered by heavy I/O load. Does anybody know a possible workaround for the
> issue? 

It might be related to the "pci-passthrough in pvops causing offline raid"
thread. Look in that e-mail thread: I posted a list of things I would like
you to do so that we can get to the bottom of this.


* Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
  2010-10-11 21:44 Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device Christopher S. Aker
  2010-11-21 16:55 ` gianfi
@ 2011-09-27 18:13 ` Christopher S. Aker
  2011-09-27 18:22   ` Andrew Cooper
  1 sibling, 1 reply; 6+ messages in thread
From: Christopher S. Aker @ 2011-09-27 18:13 UTC (permalink / raw)
  To: xen devel; +Cc: Konrad Rzeszutek Wilk

I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.

Konrad, in the "offline raid" thread you asked for the following debug 
information:

http://www.theshore.net/~caker/xen/BUGS/offline-raid/

The sysrq-t.txt and triple-a-star.txt outputs are after I got the raid 
card to hang up (but before it timed out and started spewing to the 
console).

Oddly, lspci shows three devices assigned IRQ 16, but 
/proc/interrupts only lists two of them.  Side effect of MSI?

Also, the problem still happens even with MSI disabled (pci=nomsi).
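
To make the lspci vs. /proc/interrupts comparison repeatable, the handler names registered on a given IRQ can be pulled out of /proc/interrupts with a few lines of Python. This is a rough sketch under stated assumptions: the function name `devices_on_irq` is hypothetical, the sample text is invented for illustration (it is not output from the affected machines), and the parser assumes one token for the interrupt chip/type column, which newer kernels do not always satisfy.

```python
def devices_on_irq(interrupts_text, irq):
    """Return the handler names that /proc/interrupts lists for one IRQ."""
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    # The header row names one column per CPU: "CPU0 CPU1 ...".
    ncpus = len(lines[0].split())
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep or label.strip() != str(irq):
            continue
        fields = rest.split()
        # Skip the per-CPU counters, then one token for the chip/type
        # column (an assumption about the layout, see the lead-in).
        tail = fields[ncpus + 1:]
        names = " ".join(tail).split(",")
        return [name.strip() for name in names if name.strip()]
    return []


# Invented sample for illustration only; not taken from the failing hosts.
SAMPLE = """\
           CPU0       CPU1
 16:       1200        340   xen-pirq-ioapic-level   uhci_hcd:usb3, 3w-9xxx
 17:          0          0   xen-pirq-ioapic-level   ehci_hcd:usb1
"""

if __name__ == "__main__":
    print(devices_on_irq(SAMPLE, 16))
```

On a live system the text would come from reading /proc/interrupts itself. Only drivers that have registered a handler appear there, so a device that lspci reports on IRQ 16 can be absent from this list without MSI being involved.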

Thanks,
-Chris


* Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
  2011-09-27 18:13 ` Christopher S. Aker
@ 2011-09-27 18:22   ` Andrew Cooper
  2011-09-27 19:33     ` Christopher S. Aker
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2011-09-27 18:22 UTC (permalink / raw)
  To: xen-devel

This is almost certainly the bug to do with not ack'ing a migrating line
level interrupt, which I fixed in c/s 23145:1092a143ef9d.  Try applying
that patch, or just running from the tip of
http://xenbits.xen.org/hg/xen-4.1-testing.hg/

~Andrew



* Re: Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device
  2011-09-27 18:22   ` Andrew Cooper
@ 2011-09-27 19:33     ` Christopher S. Aker
  0 siblings, 0 replies; 6+ messages in thread
From: Christopher S. Aker @ 2011-09-27 19:33 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Konrad Rzeszutek Wilk

On 9/27/11 2:22 PM, Andrew Cooper wrote:
> This is almost certainly the bug to do with not ack'ing a migrating line
> level interrupt which I fixed in c/s 23145:1092a143ef9d.  Try applying
> that patch, or just running from the tip of
> http://xenbits.xen.org/hg/xen-4.1-testing.hg/

That was it!  You're a champion.

Thanks,
-Chris

