From: Steven Haigh <netwiz@crc.id.au>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xen.org
Subject: Re: Kernel 3.7.[12] - irq 16: nobody cared
Date: Wed, 16 Jan 2013 04:15:38 +1100
Message-ID: <50F58EBA.708@crc.id.au>
In-Reply-To: <50F5828302000078000B5E06@nat28.tlf.novell.com>
Hi Jan,
On 16/01/2013 2:23 AM, Jan Beulich wrote:
>>>> On 15.01.13 at 04:27, Steven Haigh <netwiz@crc.id.au> wrote:
>> irq 16: nobody cared (try booting with the "irqpoll" option)
>> Pid: 0, comm: swapper/0 Not tainted 3.7.2-1.el6xen.x86_64 #1
>> Call Trace:
>> <IRQ> [<ffffffff810a77f2>] __report_bad_irq+0x3a/0xc6
>> [<ffffffff810a79e7>] note_interrupt+0x169/0x1e5
>> [<ffffffff810a59b7>] handle_irq_event_percpu+0x16e/0x1b6
>> [<ffffffff810a5a37>] handle_irq_event+0x38/0x54
>> [<ffffffff810a8199>] handle_fasteoi_irq+0x88/0xd5
>> [<ffffffff812c23f5>] __xen_evtchn_do_upcall+0x15a/0x1f7
>> [<ffffffff812c3707>] xen_evtchn_do_upcall+0x2f/0x42
>> [<ffffffff814a44be>] xen_do_hypervisor_callback+0x1e/0x30
>> <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff81007047>] ? xen_safe_halt+0x10/0x1a
>> [<ffffffff810169b1>] ? default_idle+0x50/0x8a
>> [<ffffffff81016318>] ? cpu_idle+0xc0/0xff
>> [<ffffffff8148160e>] ? rest_init+0x72/0x74
>> [<ffffffff81745b22>] ? start_kernel+0x3b0/0x3bd
>> [<ffffffff817455a7>] ? repair_env_string+0x58/0x58
>> [<ffffffff817452dd>] ? x86_64_start_reservations+0xb8/0xbd
>> [<ffffffff81748cad>] ? xen_start_kernel+0x4f2/0x4f4
>> handlers:
>> [<ffffffffa012edd9>] mv_interrupt [sata_mv]
>> Disabling IRQ #16
>>
>> I have tried booting with the irqpoll option on the kernel boot line,
>> but the same problem occurs.
>>
>> It seems disk throughput almost drops dead when this happens - as the
>> SATA controller seems to go into some different mode of operation. It
>> also seems like this has only happened recently - I was using builds of
>> 3.6.x as my Xen Dom0 kernel with no signs of this problem.
>>
>> Has anyone else seen this in recent kernel releases? I'm not quite sure
>> how to try and track this down.
> First of all, you'll want to clarify whether this problem is present
> _only_ when running under Xen, or also when running the same
> kernel without Xen underneath. This is primarily because the
> output you provided shows that IRQ 16 actually has a handler,
> just that it apparently ignores the interrupts (and that's nothing
> that Xen controls).
I'm not 100% sure how to do this. I haven't found a way to trigger the
problem on demand; it just happens, seemingly at random. Part of the
problem with running the system without the hypervisor is that I can't
reproduce the kind of workload that would normally trigger it.
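
To make sure I follow the point that IRQ 16 has a handler that ignores
the interrupts: as far as I can tell from kernel/irq/spurious.c, the
"nobody cared" message comes from a counting heuristic, not from Xen.
Below is a rough Python model of it. Assumptions: the 100,000-interrupt
window and the 99,900 unhandled threshold are from my reading of the 3.x
source, and the real code also restarts the unhandled count when
unhandled interrupts are more than HZ/10 apart, which this sketch omits.

```python
# Rough model of the kernel's "nobody cared" check (kernel/irq/spurious.c).
# Assumption: thresholds as read from 3.x sources; the real code also resets
# the unhandled count when unhandled IRQs are more than HZ/10 apart (omitted).
IRQ_WINDOW = 100_000      # interrupts per evaluation window
UNHANDLED_LIMIT = 99_900  # more unhandled than this in a window => "stuck"


class SpuriousIrqModel:
    """Tracks handled/unhandled interrupts for a single IRQ line."""

    def __init__(self) -> None:
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled: bool) -> bool:
        """Record one interrupt; return True once the line is disabled."""
        if self.disabled:
            return True
        if not handled:  # every handler returned "not mine" (IRQ_NONE)
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < IRQ_WINDOW:
            return False
        # End of a window: decide whether the line looks stuck.
        self.irq_count = 0
        if self.irqs_unhandled > UNHANDLED_LIMIT:
            self.disabled = True  # kernel prints "Disabling IRQ #N" here
        self.irqs_unhandled = 0
        return self.disabled
```

Feeding the model 100,000 unhandled interrupts disables the line on the
last one, while a window with 100 handled (99,900 unhandled) does not
trip it. If I read it right, that means mv_interrupt, the only handler
listed, declined nearly every interrupt on the line in one window.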
> Then, if this is a Xen-only problem, you will want to provide full
> hypervisor and kernel (boot) logs, the hypervisor one including
> debug key 'i' output, and the kernel one once with and once
> without Xen.
>
> Finally you'll want to clarify whether, when updating the kernel,
> you also updated the hypervisor (and if so, try the known good
> and known bad kernels on identical hypervisors).
I have been running Xen 4.2.1 for a while, and have used multiple kernel
versions with it. Sadly, I don't have an archive of the RPMs that I used
(even though I built them!). I've only really noticed this happening in
the last month, since I started running kernel 3.7.1 and later.

On the off chance, today I moved the card from one x16 PCIe slot to the
other one on the mainboard. This has moved the card from IRQ 16 to
IRQ 19. So far the problem hasn't recurred; however, as it seems to
occur at random, there is no guarantee that it is solved. I've tried
loading up the I/O by resyncing the RAID6 array (two of whose drives are
on the sata_mv card) as well as hammering I/O in the DomUs (fairly
random workloads), but I still have no reliable way to force the
problem to occur :(
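
Since moving slots changed the IRQ, it's worth checking which line the
card lands on after each boot. A small Python sketch that reads
/proc/interrupts and reports the IRQ numbers whose handler list includes
sata_mv. It assumes the usual layout (IRQ number, colon, per-CPU counts,
chip name, then comma-separated handler names); irqs_for_handler is just
a name I made up for this.

```python
from pathlib import Path


def irqs_for_handler(interrupts_text: str, handler: str) -> list[int]:
    """Return numeric IRQs whose /proc/interrupts line lists `handler`."""
    irqs = []
    for line in interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        label = label.strip()
        # Skip the CPU header line and named rows such as NMI, LOC, ERR.
        if not sep or not label.isdigit():
            continue
        # Handler names sit at the end of the line, separated by commas.
        names = rest.replace(",", " ").split()
        if handler in names:
            irqs.append(int(label))
    return irqs


if __name__ == "__main__" and Path("/proc/interrupts").exists():
    text = Path("/proc/interrupts").read_text()
    print(irqs_for_handler(text, "sata_mv"))
```

If the list shows more than one handler on the card's line, the line is
shared, which would also be worth knowing when comparing IRQ 16 and 19.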
I'm open to any suggestions :)
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299