From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Ian Murray <murrayie@yahoo.co.uk>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
Jan Beulich <JBeulich@suse.com>
Subject: Re: Dom 0 crash
Date: Wed, 6 Nov 2013 12:43:28 -0500
Message-ID: <20131106174328.GD17101@phenom.dumpdata.com>
In-Reply-To: <5279715D.3010804@yahoo.co.uk>

On Tue, Nov 05, 2013 at 10:29:49PM +0000, Ian Murray wrote:
> On 05/11/13 13:16, Jan Beulich wrote:
> >>>>On 05.11.13 at 12:58, Ian Murray <murrayie@yahoo.co.uk> wrote:
> >>I have a recurring crash using Xen 4.3.1-RC2 and Ubuntu 12.04 as Dom0
> >>(3.2.0-55-generic). I have software RAID 5 with LVM. The DomU (also Ubuntu
> >>12.04, 3.2.0-55 kernel) has a dedicated logical volume, which is backed up
> >>by shutting down the DomU, creating an LVM snapshot, restarting the DomU, and
> >>then dd'ing the snapshot to another logical volume. The snapshot is then
> >>removed and the second LV is dd'ed to gzip and onto DAT tape.
> >>
> >>I currently have this running every hour (unless it's already running) for
> >>testing purposes. After 6-12 runs of this, the Dom0 kernel crashes with the
> >>below output.
> >>
> >>When I perform this after booting into the same kernel standalone, the problem
> >>does not occur.
> >Likely because the action that triggers this doesn't get performed
> >in that case?
> Thanks for the response.
>
> I am obviously comparing apples and oranges, but I have tried to keep the
> setups as similar as possible, inasmuch as I have limited kernel memory to
> 512M as I do with Dom0 and have used a background task writing
> /dev/urandom to the LV that the domU would normally be using. The
> only difference is that it isn't running under Xen and I don't have
> a domU running in the background. I will repeat the exercise with no
> domU running, but under Xen.
>
>
> >>Can anyone please suggest what I am doing wrong or identify if it is a bug?
> >Considering that exception address ...
> >
> >>RIP: e030:[<ffffffff8142655d>] [<ffffffff8142655d>] scsi_dispatch_cmd+0x6d/0x2e0
> >... and call stack ...
> >
> >>[24149.786311] Call Trace:
> >>[24149.786315] <IRQ>
> >>[24149.786323] [<ffffffff8142da62>] scsi_request_fn+0x3a2/0x470
> >>[24149.786333] [<ffffffff812f1a28>] blk_run_queue+0x38/0x60
> >>[24149.786339] [<ffffffff8142c416>] scsi_run_queue+0xd6/0x1b0
> >>[24149.786347] [<ffffffff8142e822>] scsi_next_command+0x42/0x60
> >>[24149.786354] [<ffffffff8142ea52>] scsi_io_completion+0x1b2/0x630
> >>[24149.786363] [<ffffffff816611fe>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
> >>[24149.786371] [<ffffffff81424b5c>] scsi_finish_command+0xcc/0x130
> >>[24149.786378] [<ffffffff8142e7ae>] scsi_softirq_done+0x13e/0x150
> >>[24149.786386] [<ffffffff812fb6b3>] blk_done_softirq+0x83/0xa0
> >>[24149.786394] [<ffffffff8106fa38>] __do_softirq+0xa8/0x210
> >>[24149.786402] [<ffffffff8166ba6c>] call_softirq+0x1c/0x30
> >>[24149.786410] [<ffffffff810162f5>] do_softirq+0x65/0xa0
> >>[24149.786416] [<ffffffff8106fe1e>] irq_exit+0x8e/0xb0
> >>[24149.786428] [<ffffffff813aecd5>] xen_evtchn_do_upcall+0x35/0x50
> >>[24149.786436] [<ffffffff8166babe>] xen_do_hypervisor_callback+0x1e/0x30
> >>[24149.786441] <EOI>
> >>[24149.786449] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
> >>[24149.786456] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
> >>[24149.786464] [<ffffffff8100a500>] ? xen_safe_halt+0x10/0x20
> >>[24149.786472] [<ffffffff8101c913>] ? default_idle+0x53/0x1d0
> >>[24149.786478] [<ffffffff81013236>] ? cpu_idle+0xd6/0x120
> >... point into the SCSI subsystem, this is likely the wrong list to
> >ask for help on.
> ... but the right list to confirm that I am on the wrong list? :)
:-)
>
> Seriously, the specific evidence may suggest it's a non-Xen
> issue/bug, but Xen is the only measurable/visible difference so far.
> I referred it to this list because here the demarcation between
> hypervisor, PVOPS and regular kernel code interaction is likely best
> understood.
But you wouldn't run the same workload under baremetal, though?
Here is a thought. If you run just the "LV is dd'ed to gzip and onto DAT tape"
step 15 times under baremetal, do you see the same issue?
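Something along these lines would exercise only that step. This is a rough sketch, not your actual backup script: the function name, the default paths (/dev/vg0/backup_lv, /dev/st0) and the 64k output block size are all assumptions of mine, so substitute whatever your job really uses.

```shell
#!/bin/sh
# Repeat only the "dd | gzip | tape" step so it can be tested in isolation.
# All defaults below are hypothetical placeholders, not taken from the report.
repeat_backup() {
    src=${1:-/dev/vg0/backup_lv}   # hypothetical path of the second LV
    dest=${2:-/dev/st0}            # assumed generic SCSI tape device
    runs=${3:-15}
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        # The pipeline's exit status is that of its last command (the write),
        # so a failing read on the source side is not detected here.
        if ! dd if="$src" 2>/dev/null | gzip -c | dd of="$dest" obs=64k 2>/dev/null; then
            echo "run $i failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
}
```

Call it as, for example, `repeat_backup /dev/vg0/backup_lv /dev/st0 15` and watch the console or `dmesg` while it runs.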
And is there something particular about this DAT? Is it just a generic
/dev/st device?
Lastly, a complete shot in the dark: try increasing the swiotlb size.
Put 'swiotlb=65543' on the Linux (Dom0 kernel) command line when booting under Xen.
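For a rough sense of scale: assuming the swiotlb= value is a count of 2 KiB slabs (IO_TLB_SHIFT = 11, which is my assumption for your kernel, so check your tree), the helper below converts a slab count to MiB. The function name is made up for illustration, and it ignores any rounding the kernel may apply to the count.

```shell
#!/bin/sh
# Convert a swiotlb= slab count to whole MiB, assuming 2 KiB per slab.
# The 2 KiB slab size is an assumption (IO_TLB_SHIFT = 11); verify it locally.
swiotlb_mib() {
    echo $(( $1 * 2048 / 1024 / 1024 ))
}
```

Under that assumption `swiotlb_mib 65543` gives about 128 MiB, and 32768 slabs gives 64 MiB. After rebooting, `cat /proc/cmdline` in Dom0 shows whether the option actually reached the kernel.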
>
> Thanks again for your response.
>
> >
> >Jan
> >