* Kernel 3.7.[12] - irq 16: nobody cared
@ 2013-01-15 3:27 Steven Haigh
2013-01-15 15:23 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Steven Haigh @ 2013-01-15 3:27 UTC (permalink / raw)
To: xen-devel
Hi all,
Firstly, please include me in any replies as I am not a list subscriber.
I'm trying to nail down a problem with Xen 4.2.1 and kernel 3.7.1 (also
3.7.2). At seemingly random intervals I get the following in the
syslog:
Message from syslogd@xenhost at Jan 15 09:02:36 ...
kernel:Disabling IRQ #16
Looking at IRQ16:
[root@xenhost xen]# cat /proc/interrupts | grep 16
16: 1900000 xen-pirq-ioapic-level sata_mv
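Note that a plain `grep 16` also matches any line that merely contains "16", for example inside a counter. A minimal Python sketch of an exact match on the IRQ label follows. The function name and the sample text are made up for illustration, and the sample combines lines quoted at different points in this thread; it also assumes the per-CPU counters are the leading all-digit fields after the label.

```python
def find_irq(interrupts_text, irq):
    """Return (per_cpu_counts, description) for one IRQ label, or None if absent."""
    label = f"{irq}:"
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Only accept lines whose first field is exactly "<irq>:".
        if not fields or fields[0] != label:
            continue
        counts = []
        rest = fields[1:]
        # Leading all-digit fields are treated as the per-CPU counters.
        while rest and rest[0].isdigit():
            counts.append(int(rest.pop(0)))
        return counts, " ".join(rest)
    return None


# Illustrative sample in the layout of /proc/interrupts (hypothetical header line).
sample = """\
           CPU0
 16:    1900000  xen-pirq-ioapic-level  sata_mv
 50:    9004117  xen-pirq-msi  ahci
"""

print(find_irq(sample, 16))
```

On a live system the text would come from reading /proc/interrupts instead of the sample string.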
I also see this in the dmesg:
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
Call Trace:
<IRQ> [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
[<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
[<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
[<ffffffff810a5a37>] handle_irq_event+0x38/0x54
[<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
[<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
[<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
[<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
<EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
[<ffffffff810169b1>] ? default_idle+0x50/0x8a
[<ffffffff81016318>] ? cpu_idle+0xc0/0xff
[<ffffffff8148160e>] ? rest_init+0x72/0x74
[<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
[<ffffffff817455a7>] ? repair_env_string+0x58/0x58
[<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
[<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
handlers:
[<ffffffffa012edd9>] mv_interrupt [sata_mv]
Disabling IRQ #16
I have tried booting with the irqpoll option on the kernel boot line,
but the same problem occurs.
It seems disk throughput almost drops dead when this happens - as the
SATA controller seems to go into some different mode of operation. It
also seems like this has only happened recently - I was using builds of
3.6.x as my Xen Dom0 kernel with no signs of this problem.
Has anyone else seen this in recent kernel releases? I'm not quite sure
how to try and track this down.
Some system specs follow:
# dmidecode 2.11
SMBIOS 2.7 present.
75 structures occupying 3098 bytes.
Table at 0x000EB420.
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: American Megatrends Inc.
Version: U1f
Release Date: 06/13/2012
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 4096 kB
Characteristics:
PCI is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
BIOS ROM is socketed
EDD is supported
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Targeted content distribution is supported
UEFI is supported
BIOS Revision: 4.6
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: To be filled by O.E.M.
Version: To be filled by O.E.M.
Serial Number: To be filled by O.E.M.
UUID: 03E50250-0449-054D-4A06-F60700080009
Wake-up Type: Power Switch
SKU Number: To be filled by O.E.M.
Family: To be filled by O.E.M.
Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z68M-D2H
Version: To be filled by O.E.M.
Serial Number: To be filled by O.E.M.
Asset Tag: To be filled by O.E.M.
Features:
Board is a hosting board
Board is replaceable
Location In Chassis: To be filled by O.E.M.
Chassis Handle: 0x0003
Type: Motherboard
Contained Object Handles: 0
Handle 0x0003, DMI type 3, 22 bytes
Chassis Information
Manufacturer: Gigabyte Technology Co., Ltd.
Type: Desktop
Lock: Not Present
Version: To Be Filled By O.E.M.
Serial Number: To Be Filled By O.E.M.
Asset Tag: To Be Filled By O.E.M.
Boot-up State: Safe
Power Supply State: Safe
Thermal State: Safe
Security Status: None
OEM Information: 0x00000000
Height: Unspecified
Number Of Power Cords: 1
Contained Elements: 0
SKU Number: To be filled by O.E.M.
Handle 0x0004, DMI type 7, 19 bytes
Cache Information
Socket Designation: CPU Internal L1
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Through
Location: Internal
Installed Size: 128 kB
Maximum Size: 128 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Parity
System Type: Other
Associativity: 16-way Set-associative
Handle 0x0005, DMI type 7, 19 bytes
Cache Information
Socket Designation: CPU Internal L2
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Through
Location: Internal
Installed Size: 1024 kB
Maximum Size: 1024 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Multi-bit ECC
System Type: Instruction
Associativity: 16-way Set-associative
Handle 0x0006, DMI type 7, 19 bytes
Cache Information
Socket Designation: CPU Internal L3
Configuration: Enabled, Not Socketed, Level 3
Operational Mode: Write Back
Location: Internal
Installed Size: 6144 kB
Maximum Size: 6144 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Multi-bit ECC
System Type: Instruction
Associativity: 48-way Set-associative
... snip a bit ...
Handle 0x0020, DMI type 9, 17 bytes
System Slot Information
Designation: J6B2
Type: x16 PCI Express
Current Usage: In Use
Length: Long
ID: 0
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:00:02.0
Handle 0x0021, DMI type 9, 17 bytes
System Slot Information
Designation: J6B1
Type: x1 PCI Express
Current Usage: In Use
Length: Short
ID: 1
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:00:1c.0
Handle 0x0022, DMI type 9, 17 bytes
System Slot Information
Designation: J6D1
Type: x8 PCI Express
Current Usage: In Use
Length: Short
ID: 2
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:00:01.0
Handle 0x0023, DMI type 9, 17 bytes
System Slot Information
Designation: J7B1
Type: x16 PCI Express
Current Usage: In Use
Length: Short
ID: 3
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:00:03.0
Handle 0x0024, DMI type 9, 17 bytes
System Slot Information
Designation: J8B4
Type: x1 PCI Express
Current Usage: In Use
Length: Short
ID: 4
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:00:1c.7
Handle 0x0025, DMI type 9, 17 bytes
System Slot Information
Designation: J8B3
Type: 32-bit PCI
Current Usage: In Use
Length: Short
ID: 6
Characteristics:
3.3 V is provided
Opening is shared
PME signal is supported
Bus Address: 0000:14:1e.0
... snip a bit more ....
Handle 0x0043, DMI type 4, 42 bytes
Processor Information
Socket Designation: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
Type: Central Processor
Family: Core i7
Manufacturer: Intel
ID: A7 06 02 00 FF FB EB BF
Signature: Type 0, Family 6, Model 42, Stepping 7
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Multi-threading)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
Voltage: 1.2 V
External Clock: 100 MHz
Max Speed: 7000 MHz
Current Speed: 3700 MHz
Status: Populated, Enabled
Upgrade: Other
L1 Cache Handle: 0x0004
L2 Cache Handle: 0x0005
L3 Cache Handle: 0x0006
Serial Number: Not Specified
Asset Tag: Fill By OEM
Part Number: Fill By OEM
Core Count: 4
Core Enabled: 1
Characteristics:
64-bit capable
... end
# lspci
00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor
Family DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200/2nd Generation Core
Processor Family PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core
Processor Family Integrated Graphics Controller (rev 09)
00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series
Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset
Family USB Enhanced Host Controller #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset
Family PCI Express Root Port 1 (rev b5)
00:1c.6 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset
Family PCI Express Root Port 7 (rev b5)
00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset
Family USB Enhanced Host Controller #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation Z68 Express Chipset Family LPC
Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset
Family SATA AHCI Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family
SMBus Controller (rev 05)
01:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042
PCI-e 4-port SATA-II (rev 02)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
Disks are configured as such:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[1] sdb1[0]
204788 blocks super 1.0 [2/2] [UU]
md2 : active raid6 sdc[5] sde[1] sdf[4] sdd[0]
3907026688 blocks super 1.2 level 6, 128k chunk, algorithm 2
[4/4] [UUUU]
md1 : active raid1 sdb2[0] sda2[1]
77942716 blocks super 1.1 [2/2] [UU]
--
Steven Haigh
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: Kernel 3.7.[12] - irq 16: nobody cared
2013-01-15 3:27 Kernel 3.7.[12] - irq 16: nobody cared Steven Haigh
@ 2013-01-15 15:23 ` Jan Beulich
2013-01-15 17:15 ` Steven Haigh
0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2013-01-15 15:23 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
>>> On 15.01.13 at 04:27, Steven Haigh <netwiz@crc.id.au> wrote:
> irq 16: nobody cared (try booting with the "irqpoll" option)
> Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
> Call Trace:
> <IRQ> [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
> [<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
> [<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
> [<ffffffff810a5a37>] handle_irq_event+0x38/0x54
> [<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
> [<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
> [<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
> [<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
> <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
> [<ffffffff810169b1>] ? default_idle+0x50/0x8a
> [<ffffffff81016318>] ? cpu_idle+0xc0/0xff
> [<ffffffff8148160e>] ? rest_init+0x72/0x74
> [<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
> [<ffffffff817455a7>] ? repair_env_string+0x58/0x58
> [<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
> [<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
> handlers:
> [<ffffffffa012edd9>] mv_interrupt [sata_mv]
> Disabling IRQ #16
>
> I have tried booting with the irqpoll option on the kernel boot line,
> but the same problem occurs.
>
> It seems disk throughput almost drops dead when this happens - as the
> SATA controller seems to go into some different mode of operation. It
> also seems like this has only happened recently - I was using builds of
> 3.6.x as my Xen Dom0 kernel with no signs of this problem.
>
> Has anyone else seen this in recent kernel releases? I'm not quite sure
> how to try and track this down.
First of all, you'll want to clarify whether this problem is present
_only_ when running under Xen, or also when running the same
kernel without Xen underneath. This is primarily because the
output you provided shows that IRQ 16 actually has a handler,
just that it apparently ignores the interrupts (and that's nothing
that Xen controls).
Then, if this is a Xen-only problem, you will want to provide full
hypervisor and kernel (boot) logs, the hypervisor one including
debug key 'i' output, and the kernel one once with and once
without Xen.
Finally you'll want to clarify whether, when updating the kernel,
you also updated the hypervisor (and if so, try the known good
and known bad kernels on identical hypervisors).
Jan
* Re: Kernel 3.7.[12] - irq 16: nobody cared
2013-01-15 15:23 ` Jan Beulich
@ 2013-01-15 17:15 ` Steven Haigh
2013-01-16 9:42 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Steven Haigh @ 2013-01-15 17:15 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
Hi Jan,
On 16/01/2013 2:23 AM, Jan Beulich wrote:
>>>> On 15.01.13 at 04:27, Steven Haigh <netwiz@crc.id.au> wrote:
>> irq 16: nobody cared (try booting with the "irqpoll" option)
>> Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
>> Call Trace:
>> <IRQ> [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
>> [<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
>> [<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
>> [<ffffffff810a5a37>] handle_irq_event+0x38/0x54
>> [<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
>> [<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
>> [<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
>> [<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
>> <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
>> [<ffffffff810169b1>] ? default_idle+0x50/0x8a
>> [<ffffffff81016318>] ? cpu_idle+0xc0/0xff
>> [<ffffffff8148160e>] ? rest_init+0x72/0x74
>> [<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
>> [<ffffffff817455a7>] ? repair_env_string+0x58/0x58
>> [<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
>> [<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
>> handlers:
>> [<ffffffffa012edd9>] mv_interrupt [sata_mv]
>> Disabling IRQ #16
>>
>> I have tried booting with the irqpoll option on the kernel boot line,
>> but the same problem occurs.
>>
>> It seems disk throughput almost drops dead when this happens - as the
>> SATA controller seems to go into some different mode of operation. It
>> also seems like this has only happened recently - I was using builds of
>> 3.6.x as my Xen Dom0 kernel with no signs of this problem.
>>
>> Has anyone else seen this in recent kernel releases? I'm not quite sure
>> how to try and track this down.
> First of all, you'll want to clarify whether this problem is present
> _only_ when running under Xen, or also when running the same
> kernel without Xen underneath. This is primarily because the
> output you provided shows that IRQ 16 actually has a handler,
> just that it apparently ignores the interrupts (and that's nothing
> that Xen controls).
I'm not 100% sure how to do this. I haven't been able to find a method
to cause the problem to happen... It just does - and it seems random
when it does happen. Part of the problem with running the system without
the hypervisor in place is that I can't replicate any kind of workload
that would normally trigger the problem.
> Then, if this is a Xen-only problem, you will want to provide full
> hypervisor and kernel (boot) logs, the hypervisor one including
> debug key 'i' output, and the kernel one once with and once
> without Xen.
>
> Finally you'll want to clarify whether, when updating the kernel,
> you also updated the hypervisor (and if so, try the known good
> and known bad kernels on identical hypervisors).
I have been running Xen 4.2.1 for a while - and used multiple kernel
versions with it. Sadly, I don't have an archive of the RPMs that I used
(even though I built them!). I've only really noticed this happening in
the last month - when I've been running kernel 3.7.1+
On the off chance that it helps, today I moved the card from one x16 PCIe
slot to the second one on the mainboard. This moved the card from IRQ 16
to IRQ 19. So far the problem hasn't recurred; however, as it is a
seemingly random occurrence, there is no guarantee that the problem is
solved. I've tried loading up the I/O by doing a resync of the RAID6 (of
which 2 drives are on the sata_mv card) as well as hammering I/O in the
DomUs (rather random stuff), but I still have no reliable way to force
the problem to occur :(
I'm open to any suggestions :)
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: Kernel 3.7.[12] - irq 16: nobody cared
2013-01-15 17:15 ` Steven Haigh
@ 2013-01-16 9:42 ` Jan Beulich
2013-01-16 9:54 ` Steven Haigh
0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2013-01-16 9:42 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
>>> On 15.01.13 at 18:15, Steven Haigh <netwiz@crc.id.au> wrote:
> I'm not 100% sure how to do this. I haven't been able to find a method
> to cause the problem to happen... It just does - and it seems random
> when it does happen. Part of the problem with running the system without
> the hypervisor in place is that I can't replicate any kind of workload
> that would normally trigger the problem.
That's pretty odd - there need to be almost 100,000 unhandled
interrupts within a tenth of a second, so there _must_ be
something triggering this if the device is otherwise working fine.
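The check being described can be modelled roughly as follows. This is a simplified Python sketch, not the kernel code: the constants (100,000 interrupts per evaluation window, a 99,900 cutoff, and a 0.1 s gap after which the unhandled count restarts) are assumptions based on a reading of note_interrupt() in kernel/irq/spurious.c and should be verified against the source of the kernel in use. The class and method names are made up for illustration.

```python
COUNT_WINDOW = 100_000     # interrupts per evaluation window (assumed)
UNHANDLED_LIMIT = 99_900   # unhandled interrupts needed to disable the line (assumed)
GAP_RESET = 0.1            # seconds; stands in for HZ/10 (assumed)


class SpuriousDetector:
    """Toy model of the per-IRQ bookkeeping; names are hypothetical."""

    def __init__(self):
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.last_unhandled = float("-inf")
        self.disabled = False

    def note_interrupt(self, now, handled):
        """Account for one interrupt at time `now` (seconds)."""
        if not handled:
            # An isolated unhandled interrupt restarts the count instead of
            # accumulating towards the limit.
            if now > self.last_unhandled + GAP_RESET:
                self.irqs_unhandled = 1
            else:
                self.irqs_unhandled += 1
            self.last_unhandled = now

        self.irq_count += 1
        if self.irq_count < COUNT_WINDOW:
            return self.disabled

        # End of a window: decide, then start over.
        self.irq_count = 0
        if self.irqs_unhandled > UNHANDLED_LIMIT:
            self.disabled = True
        self.irqs_unhandled = 0
        return self.disabled


# A storm of back-to-back unhandled interrupts trips the detector ...
storm = SpuriousDetector()
for i in range(COUNT_WINDOW):
    storm.note_interrupt(now=i * 1e-6, handled=False)

# ... while a device whose handler claims every interrupt never does.
healthy = SpuriousDetector()
for i in range(COUNT_WINDOW):
    healthy.note_interrupt(now=i * 1e-6, handled=True)

print(storm.disabled, healthy.disabled)
```

Under these assumptions, nearly every interrupt in a window has to go unclaimed, with no pause longer than a tenth of a second between unclaimed ones, before the line is disabled.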
You're not by chance passing through to a guest any other
device using the same IRQ?
Jan
* Re: Kernel 3.7.[12] - irq 16: nobody cared
2013-01-16 9:42 ` Jan Beulich
@ 2013-01-16 9:54 ` Steven Haigh
2013-01-16 10:05 ` Jan Beulich
0 siblings, 1 reply; 7+ messages in thread
From: Steven Haigh @ 2013-01-16 9:54 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On 16/01/2013 8:42 PM, Jan Beulich wrote:
>>>> On 15.01.13 at 18:15, Steven Haigh <netwiz@crc.id.au> wrote:
>> I'm not 100% sure how to do this. I haven't been able to find a method
>> to cause the problem to happen... It just does - and it seems random
>> when it does happen. Part of the problem with running the system without
>> the hypervisor in place is that I can't replicate any kind of workload
>> that would normally trigger the problem.
> That's pretty odd - there need to be almost 100,000 unhandled
> interrupts within a tenth of a second, so there _must_ be
> something triggering this if the device is otherwise working fine.
>
> You're not by chance passing through to a guest any other
> device using the same IRQ?
Hi Jan,
I don't pass any devices at all to any DomU's. All guests are PV Linux
systems, all EL6. The only thing each DomU has is a disk, a network
interface, and 2 x vcpus.
So far, I have:
# uptime
20:50:40 up 1 day, 1:11, 1 user, load average: 0.36, 0.17, 0.13
As I mentioned, I moved the sata card to the second 16x PCIe slot in the
mainboard - which changed the IRQ from 16 to 19. Currently I see:
# grep sata_mv /proc/interrupts
19: 21243495 xen-pirq-ioapic-level sata_mv
Which is interestingly more than the onboard SATA ports:
# grep ahci /proc/interrupts
50: 9004117 xen-pirq-msi ahci
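To put the two counters in proportion, they can be turned into average rates over the uptime shown above. The Python sketch below uses only the figures quoted in this message; the function name is made up, and it assumes both counters were read at roughly the moment `uptime` was run. An average like this says nothing about short bursts.

```python
def average_rate(count, days=0, hours=0, minutes=0):
    """Average interrupts per second over the given uptime."""
    seconds = (days * 24 + hours) * 3600 + minutes * 60
    if seconds <= 0:
        raise ValueError("uptime must be positive")
    return count / seconds


# Figures quoted in this message: uptime "1 day, 1:11".
sata_mv = average_rate(21_243_495, days=1, hours=1, minutes=11)
ahci = average_rate(9_004_117, days=1, hours=1, minutes=11)

print(f"sata_mv: {sata_mv:.0f}/s, ahci: {ahci:.0f}/s, ratio: {sata_mv / ahci:.2f}")
```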
I'm not sure if this will give any further info:
# xm dmesg
__ __ _ _ ____ _ _ _ __
\ \/ /___ _ __ | || | |___ \ / | / | ___| |/ /_
\ // _ \ '_ \ | || |_ __) | | |__| | / _ \ | '_ \
/ \ __/ | | | |__ _| / __/ _| |__| || __/ | (_) |
/_/\_\___|_| |_| |_|(_)_____(_)_| |_(_)___|_|\___/
(XEN) Xen version 4.2.1 (mockbuild@crc.id.au) (gcc (GCC) 4.4.6 20120305
(Red Hat 4.4.6-4)) Wed Dec 19 01:32:40 EST 2012
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GNU GRUB 0.97
(XEN) Command line: dom0_mem=1024M cpufreq=xen dom0_max_vcpus=1
dom0_vcpus_pin
(XEN) Video information:
(XEN) VGA is text mode 80x25, font 8x16
(XEN) VBE/DDC methods: none; EDID transfer time: 0 seconds
(XEN) EDID info not retrieved because no DDC retrieval method detected
(XEN) Disc information:
(XEN) Found 2 MBR signatures
(XEN) Found 3 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009d800 (usable)
(XEN) 000000000009d800 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000020000000 (usable)
(XEN) 0000000020000000 - 0000000020200000 (reserved)
(XEN) 0000000020200000 - 0000000040000000 (usable)
(XEN) 0000000040000000 - 0000000040200000 (reserved)
(XEN) 0000000040200000 - 00000000dbb1b000 (usable)
(XEN) 00000000dbb1b000 - 00000000dc3c7000 (reserved)
(XEN) 00000000dc3c7000 - 00000000dc647000 (ACPI NVS)
(XEN) 00000000dc647000 - 00000000dc64c000 (ACPI data)
(XEN) 00000000dc64c000 - 00000000dc68f000 (ACPI NVS)
(XEN) 00000000dc68f000 - 00000000dcdca000 (usable)
(XEN) 00000000dcdca000 - 00000000dcfdd000 (reserved)
(XEN) 00000000dcfdd000 - 00000000dd000000 (usable)
(XEN) 00000000dd800000 - 00000000dfa00000 (reserved)
(XEN) 00000000f8000000 - 00000000fc000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fed00000 - 00000000fed04000 (reserved)
(XEN) 00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ff000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000021f600000 (usable)
(XEN) ACPI: RSDP 000F0490, 0024 (r2 ALASKA)
(XEN) ACPI: XSDT DC629070, 0064 (r1 ALASKA A M I 1072009 AMI 10013)
(XEN) ACPI: FACP DC632928, 00F4 (r4 ALASKA A M I 1072009 AMI 10013)
(XEN) ACPI: DSDT DC629170, 97B8 (r2 ALASKA A M I 12 INTL 20051117)
(XEN) ACPI: FACS DC645F80, 0040
(XEN) ACPI: APIC DC632A20, 0072 (r3 ALASKA A M I 1072009 AMI 10013)
(XEN) ACPI: MCFG DC632A98, 003C (r1 ALASKA A M I 1072009 MSFT 97)
(XEN) ACPI: HPET DC632AD8, 0038 (r1 ALASKA A M I 1072009 AMI. 5)
(XEN) ACPI: SSDT DC632B10, 036D (r1 SataRe SataTabl 1000 INTL 20091112)
(XEN) ACPI: SSDT DC632E80, 09AA (r1 PmRef Cpu0Ist 3000 INTL 20051117)
(XEN) ACPI: SSDT DC633830, 0A92 (r1 PmRef CpuPm 3000 INTL 20051117)
(XEN) ACPI: MATS DC6342C8, 0034 (r2 ALASKA A M I 2 wx2 0)
(XEN) System RAM: 8116MB (8310872kB)
(XEN) Domain heap initialised
(XEN) ACPI: 32/64X FACS address mismatch in FADT -
dc645f80/0000000000000000, using 32
(XEN) Processor #0 6:10 APIC version 21
(XEN) Processor #2 6:10 APIC version 21
(XEN) Processor #4 6:10 APIC version 21
(XEN) Processor #6 6:10 APIC version 21
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs
(XEN) Table is not found!
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 3303.320 MHz processor.
(XEN) Initing memory sharing.
(XEN) xstate_init: using cntxt_size: 0x340 and states: 0x7
(XEN) I/O virtualisation disabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using old ACK method
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN) - APIC MMIO access virtualisation
(XEN) - APIC TPR shadow
(XEN) - Extended Page Tables (EPT)
(XEN) - Virtual-Processor Identifiers (VPID)
(XEN) - Virtual NMI
(XEN) - MSR direct-access bitmap
(XEN) - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB
(XEN) Brought up 4 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1d87000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 0000000210000000->0000000214000000 (236799 pages
to be allocated)
(XEN) Init. ramdisk: 000000021d2ff000->000000021f5ff800
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: ffffffff81000000->ffffffff81d87000
(XEN) Init. ramdisk: ffffffff81d87000->ffffffff84087800
(XEN) Phys-Mach map: ffffffff84088000->ffffffff84288000
(XEN) Start info: ffffffff84288000->ffffffff842884b4
(XEN) Page tables: ffffffff84289000->ffffffff842ae000
(XEN) Boot stack: ffffffff842ae000->ffffffff842af000
(XEN) TOTAL: ffffffff80000000->ffffffff84400000
(XEN) ENTRY ADDRESS: ffffffff81745210
(XEN) Dom0 has maximum 1 VCPUs
(XEN) Scrubbing Free RAM:
......................................................................done.
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
input to Xen)
(XEN) Freed 252kB init memory.
(XEN) no cpu_id for acpi_id 5
(XEN) no cpu_id for acpi_id 6
(XEN) no cpu_id for acpi_id 7
(XEN) no cpu_id for acpi_id 8
--
Steven Haigh
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
* Re: Kernel 3.7.[12] - irq 16: nobody cared
2013-01-16 9:54 ` Steven Haigh
@ 2013-01-16 10:05 ` Jan Beulich
2013-01-16 10:13 ` Steven Haigh
0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2013-01-16 10:05 UTC (permalink / raw)
To: Steven Haigh; +Cc: xen-devel
>>> On 16.01.13 at 10:54, Steven Haigh <netwiz@crc.id.au> wrote:
> So far, I have:
> # uptime
> 20:50:40 up 1 day, 1:11, 1 user, load average: 0.36, 0.17, 0.13
>
> As I mentioned, I moved the sata card to the second 16x PCIe slot in the
> mainboard - which changed the IRQ from 16 to 19. Currently I see:
> # grep sata_mv /proc/interrupts
> 19: 21243495 xen-pirq-ioapic-level sata_mv
>
> Which is interestingly more than the onboard SATA ports:
> # grep ahci /proc/interrupts
> 50: 9004117 xen-pirq-msi ahci
Whether the former count is too high depends on the I/O amount
going through each controller. Of course it is possible for there to
be spikes that usually don't reach the 99,900 cutoff point, but
once in a while do. Figuring whether that's the case would require
adding a little bit more verbosity to
kernel/irq/spurious.c:note_interrupt(), e.g. to warn when having
reached half the threshold.
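Short of patching the kernel, a rough user-space approximation might be to poll the per-IRQ spurious statistics. The Python sketch below assumes the running kernel exposes /proc/irq/<n>/spurious with `count`, `unhandled` and `last_unhandled` lines; that file layout is an assumption to verify on the system, and the function names and sample text are made up for illustration.

```python
from pathlib import Path

UNHANDLED_LIMIT = 99_900          # cutoff quoted above
WARN_AT = UNHANDLED_LIMIT // 2    # "half the threshold"


def parse_spurious(text):
    """Parse 'key value [unit]' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def check_irq(irq, root="/proc/irq"):
    """Return a warning string if the unhandled counter passed WARN_AT, else None."""
    path = Path(root) / str(irq) / "spurious"   # assumed location
    stats = parse_spurious(path.read_text())
    unhandled = stats.get("unhandled", 0)
    if unhandled >= WARN_AT:
        return f"irq {irq}: {unhandled} unhandled of {stats.get('count', 0)} interrupts"
    return None


# Illustrative sample, not captured from the system in this thread.
sample = "count 60000\nunhandled 50000\nlast_unhandled 4294747 ms\n"
print(parse_spurious(sample))
```

Because such counters would start over after each evaluation window, polling can easily miss a short spike; a warning inside note_interrupt() itself, as suggested, would not have that blind spot.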
Jan
* Re: Kernel 3.7.[12] - irq 16: nobody cared
2013-01-16 10:05 ` Jan Beulich
@ 2013-01-16 10:13 ` Steven Haigh
0 siblings, 0 replies; 7+ messages in thread
From: Steven Haigh @ 2013-01-16 10:13 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel
On 16/01/2013 9:05 PM, Jan Beulich wrote:
>>>> On 16.01.13 at 10:54, Steven Haigh <netwiz@crc.id.au> wrote:
>> So far, I have:
>> # uptime
>> 20:50:40 up 1 day, 1:11, 1 user, load average: 0.36, 0.17, 0.13
>>
>> As I mentioned, I moved the sata card to the second 16x PCIe slot in the
>> mainboard - which changed the IRQ from 16 to 19. Currently I see:
>> # grep sata_mv /proc/interrupts
>> 19: 21243495 xen-pirq-ioapic-level sata_mv
>>
>> Which is interestingly more than the onboard SATA ports:
>> # grep ahci /proc/interrupts
>> 50: 9004117 xen-pirq-msi ahci
> Whether the former count is too high depends on the I/O amount
> going through each controller. Of course it is possible for there to
> be spikes that usually don't reach the 99,900 cutoff point, but
> once in a while do. Figuring whether that's the case would require
> adding a little bit more verbosity to
> kernel/irq/spurious.c:note_interrupt(), e.g. to warn when having
> reached half the threshold.
Interestingly, I just realised I have 3 of the 4 drives in this RAID6 on
the sata_mv card. I originally thought I had 2 drives on the onboard
SATA ports and the other 2 on the sata_mv card. This would mean 3/4 of
the RAID6 I/O goes via this card and only 1/4 via the onboard ports.
# lsdrv
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation 6 Series/C200
Series Chipset Family SATA AHCI Controller (rev 05)
.scsi 0:0:0:0 ATA ST380815AS {6RAB72DZ}
..sda 74.53g [8:0] Partitioned (dos)
. .sda1 200.00m [8:1] MD raid1 (1/2) (w/ sdb1) in_sync
'localhost.localdomain:0' {9f19116a-d280-8216-cc87-af34eae68242}
. ..md0 199.99m [9:0] MD v1.0 raid1 (2) clean
. . . Partitioned (dos)
{6578dbc0-9e07-4ccc-8eff-15f2a1da8df1}
. . .Mounted as /dev/md0 @ /boot
. .sda2 74.33g [8:2] MD raid1 (1/2) (w/ sdb2) in_sync
'localhost.localdomain:1' {afb92c19-b9b1-e3ae-07af-315d738e38be}
. .md1 74.33g [9:1] MD v1.1 raid1 (2) clean
. . PV LVM2_member 74.33g used, 0 free
{2koqPs-U1IA-9erV-ua4N-mxW1-BhRs-V3mlAH}
. .VG RAID1 74.33g 0 free {HEGjco-Ptil-M5ZG-2qQR-zNo4-3cc5-b9Z3Kj}
. .dm-0 9.77g [253:0] LV xenhost ext4
{d2fa50d5-1a51-4599-9b72-f38f86b8f99e}
. ..Mounted as /dev/mapper/RAID1-xenhost @ /
. .dm-7 64.56g [253:7] LV zeus.vm ext4
{67310780-b15c-47e4-812e-d954aa7d8e3b}
.scsi 1:0:0:0 ATA ST380815AS {6QZ6L9SD}
..sdb 74.53g [8:16] Partitioned (dos)
. .sdb1 200.00m [8:17] MD raid1 (0/2) (w/ sda1) in_sync
'localhost.localdomain:0' {9f19116a-d280-8216-cc87-af34eae68242}
. ..md0 199.99m [9:0] MD v1.0 raid1 (2) clean
. . Partitioned (dos)
{6578dbc0-9e07-4ccc-8eff-15f2a1da8df1}
. .sdb2 74.33g [8:18] MD raid1 (0/2) (w/ sda2) in_sync
'localhost.localdomain:1' {afb92c19-b9b1-e3ae-07af-315d738e38be}
. .md1 74.33g [9:1] MD v1.1 raid1 (2) clean
. PV LVM2_member 74.33g used, 0 free
{2koqPs-U1IA-9erV-ua4N-mxW1-BhRs-V3mlAH}
.scsi 2:x:x:x [Empty]
.scsi 3:0:0:0 ATA ST2000VX000-9YW1 {Z1E10QQJ}
..sdc 1.82t [8:32] MD raid6 (3/4) (w/ sdd,sde,sdf) in_sync
'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b}
. .md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk
. . PV LVM2_member 2.12t used, 1.52t free
{8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh}
. .VG vg_raid6 3.64t 1.52t free {UrqTRc-AozJ-2RDf-qcZB-UdX3-tno9-3KHjjv}
. .dm-6 2.00t [253:6] LV fileshare xfs
{af405459-7569-4d82-82d9-ca27912316c7}
. .dm-3 10.00g [253:3] LV lamp.vm ext4
{67310780-b15c-47e4-812e-d954aa7d8e3b}
. .dm-2 40.00g [253:2] LV mail.vm ext4
{67310780-b15c-47e4-812e-d954aa7d8e3b}
. .dm-4 20.00g [253:4] LV remotedesktop.vm Partitioned (dos)
. .dm-5 2.00g [253:5] LV template.vm ext4
{67310780-b15c-47e4-812e-d954aa7d8e3b}
. .dm-1 50.00g [253:1] LV tsm.vm ext4
{67310780-b15c-47e4-812e-d954aa7d8e3b}
.scsi 4:x:x:x [Empty]
.scsi 5:x:x:x [Empty]
PCI [sata_mv] 04:00.0 SCSI storage controller: Marvell Technology Group
Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02)
.scsi 6:0:0:0 ATA ST2000VX000-9YW1 {Z1E11E7R}
..sdd 1.82t [8:48] MD raid6 (0/4) (w/ sdc,sde,sdf) in_sync
'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b}
. .md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk
. PV LVM2_member 2.12t used, 1.52t free
{8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh}
.scsi 7:x:x:x [Empty]
.scsi 8:0:0:0 ATA ST2000VX000-9YW1 {Z1E0MD58}
..sde 1.82t [8:64] MD raid6 (1/4) (w/ sdc,sdd,sdf) in_sync
'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b}
. .md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk
. PV LVM2_member 2.12t used, 1.52t free
{8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh}
.scsi 9:0:0:0 ATA ST2000VX000-9YW1 {Z1E17C3X}
.sdf 1.82t [8:80] MD raid6 (2/4) (w/ sdc,sdd,sde) in_sync
'xenhost.lan.crc.id.au:2' {cd8cc032-4898-fa88-3ba1-af64cf91583b}
.md2 3.64t [9:2] MD v1.2 raid6,left-sym (4) active, 128k Chunk
PV LVM2_member 2.12t used, 1.52t free
{8pyp2G-D268-fqKW-mBvf-wZbI-Qurt-aeTvOh}
I'm going to leave it as is for the moment to see if it happens again, as
it has at random over the last 3-4 weeks. I'll try to pull any info off
the system before rebooting this time, as I only recently found this
problem. Hopefully, either changing the slot, or even just reseating the
card, may have had some effect, but I guess only time will tell.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299
Thread overview: 7+ messages
2013-01-15 3:27 Kernel 3.7.[12] - irq 16: nobody cared Steven Haigh
2013-01-15 15:23 ` Jan Beulich
2013-01-15 17:15 ` Steven Haigh
2013-01-16 9:42 ` Jan Beulich
2013-01-16 9:54 ` Steven Haigh
2013-01-16 10:05 ` Jan Beulich
2013-01-16 10:13 ` Steven Haigh