xen-devel.lists.xenproject.org archive mirror
* megasas stops I/O when running kernel as dom0 under xen4.1/4.2
@ 2011-08-11 13:59 Andreas Olsowski
  2011-08-11 16:27 ` Simon Rowe
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Andreas Olsowski @ 2011-08-11 13:59 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com


Hello xen-devel,

As one of the people using Dell servers, I am aware that the LSI megaraid
drivers are quite old in the current 2.6.32 pvops tree,
but it seems that, once again, I have run into problems that are rarer
than the usual "can't find disk" issues (of which I have never had any).


The situation:
--------------
I have 2 dom0 kernels, 2.6.32.44 and 3.0.1, that work fine when booted
bare-metal. I can run stress -m 40 -d 4 -i 1 for hours on end without
any error occurring.
The 2.6.32.44 kernel uses version 00.00.05.30 of the megasas module.

When I boot that kernel on my R610 servers under xen (4.1 and 4.2), the
kernels work fine too. I create 10 virtual machines, each running 4
"stress -m 40" instances, and can do as much disk I/O on my local
storage as I want.

But on my Dell R710 system things don't look so good.
Booted bare-metal, both kernels work fine.
When I boot them as dom0 under xen, everything seems to be okay at first.
Then I create my 10 virtual machines, which put some load on the memory.
As soon as I do I/O to the local disk (even an "ls /usr/src/" can
suffice), I/O freezes and the system stops responding to anything that
requires disk access.
After a while the kernel starts spewing out error messages:

#### lots of these
sd 0:2:0:0: [sda] megasas: RESET -83318 cmd=2a retries=0
megaraid_sas: HBA reset handler invoked without an internal reset condition.
megasas: [ 0]waiting for 16 commands to complete
megaraid_sas: no more pending commands remain after reset handling.
megasas: reset successful
###

### then some of these
sd 0:2:0:0: Device offlined - not ready after error recovery
###

### goes on to
sd 0:2:0:0: [sda] Unhandled error code
sd 0:2:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
sd 0:2:0:0: [sda] CDB: Write(10): 2a 00 08 45 6f 00 00 01 88 00
end_request: I/O error, dev sda, sector 138768128
Buffer I/O error on device sda2, logical block 5138912
lost page write due to I/O error on sda2
Buffer I/O error on device sda2, logical block 5138913
###

### and finally these, as often as one tries to access the disk
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
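
For readers following the log, the CDB bytes in the "Write(10)" line above can be decoded by hand to check them against the failing sector reported by end_request. The following is only a minimal Python sketch; the helper name decode_write10 is hypothetical and not part of the driver or any tool, and it assumes the standard WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8).

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    Hypothetical helper for illustration; assumes the standard 10-byte layout.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError("a WRITE(10) CDB must be exactly 10 bytes")
    if cdb[0] != 0x2A:
        raise ValueError("opcode is not WRITE(10) (0x2a)")
    return {
        "opcode": cdb[0],
        # Bytes 2..5: logical block address, big-endian
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: transfer length in logical blocks, big-endian
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB copied verbatim from the log line above
fields = decode_write10("2a 00 08 45 6f 00 00 01 88 00")
print(fields)
```

The decoded LBA is 138768128, which matches the sector in the "end_request: I/O error" line, and the transfer length is 392 blocks (about 196 KiB if the logical block size is 512 bytes, which is an assumption here). The "cmd=2a" in the RESET lines is the same WRITE(10) opcode.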


If a kernel works fine on one set of servers (Dell R610 with LSI Logic /
Symbios Logic LSI MegaSAS 9260 (rev 05) RAID controllers) and crashes on
another server (Dell R710 with an LSI Logic / Symbios Logic MegaRAID SAS
1078 (rev 04) RAID controller),
it would seem logical to assume that the kernel does not support the
hardware properly.
But when run bare-metal, no errors occur.

I for one have run out of things to try. The R710 worked fine before I
upgraded its firmware to the most current versions and went from
xen4.0.1 to xen4.1/4.2.

So I put it to you, fine sirs of xen-devel:
Is it:
A.) a hardware problem, because the software works on different hardware
or
B.) a xen problem, because the hardware runs fine in a non-virtualized 
scenario with the same kernel

Or is it something else entirely?

Help, input, questions and suggestions are, as always, greatly appreciated.


With best regards

-- 
Andreas Olsowski
Leuphana Universität Lüneburg
Rechen- und Medienzentrum
Scharnhorststraße 1, C7.015
21335 Lüneburg

Tel: ++49 4131 677 1309



end of thread, other threads:[~2011-08-30 12:46 UTC | newest]

Thread overview: 30+ messages
2011-08-11 13:59 megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Andreas Olsowski
2011-08-11 16:27 ` Simon Rowe
2011-08-11 22:51   ` Konrad Rzeszutek Wilk
2011-08-12  6:31     ` xen.frontend flag for higher display resolution (vnc) for HVM domU domains Mark Schneider
2011-08-12  7:26       ` Marc - A. Dahlhaus
2011-08-12  7:42     ` megasas stops I/O when running kernel as dom0 under xen4.1/4.2 Simon Rowe
2011-08-12  9:11     ` Andreas Olsowski
2011-08-12  9:23       ` Simon Rowe
2011-08-15 10:49       ` Simon Rowe
2011-08-15 12:52         ` Andreas Olsowski
2011-08-19 12:28           ` Andrew Cooper
2011-08-19 14:17             ` Andreas Olsowski
2011-08-19 14:57               ` Andrew Cooper
2011-08-19 16:37                 ` Andreas Olsowski
2011-08-19 16:49                   ` Andrew Cooper
2011-08-19 18:10                     ` Andreas Olsowski
2011-08-22  9:05                       ` Andrew Cooper
2011-08-24 12:06                         ` Andrew Cooper
2011-08-24 16:57                           ` Andrew Cooper
2011-08-24 17:09                             ` Konrad Rzeszutek Wilk
2011-08-24 17:20                               ` Andrew Cooper
2011-08-26 18:16                                 ` Andrew Cooper
2011-08-26 18:32                                   ` Andrew Cooper
2011-08-30 12:02                                     ` Andreas Olsowski
2011-08-30 12:11                                       ` Andrew Cooper
2011-08-30 12:46                                         ` Keir Fraser
2011-08-12  9:02 ` Simon Rowe
2011-08-12 16:26   ` Pasi Kärkkäinen
2011-08-15  7:44     ` Simon Rowe
2011-08-12 16:25 ` Pasi Kärkkäinen