From: Scott Garron <xen-devel@sce.pridelands.org>
To: "Xu, Dongxiao" <dongxiao.xu@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Daniel Stodden <daniel.stodden@citrix.com>
Subject: Re: Making snapshot of logical volumes handling HVM domU causes OOPS and instability
Date: Tue, 31 Aug 2010 04:16:09 -0400
Message-ID: <4C7CBA49.2030306@sce.pridelands.org>
In-Reply-To: <D5AB6E638E5A3E4B8F4406B113A5A19A2A4D1D5B@shsmsx501.ccr.corp.intel.com>

>> Scott Garron wrote:
>>> Another issue that comes up is that if I run the 2.6.32.18 pvops
>>> kernel for my Linux domUs, after a time (usually only about an
>>> hour or so), the network interfaces stop responding.

> Jeremy Fitzhardinge wrote:
>> That's a separate problem in netfront that appears to be a bug in
>> the "smartpoll" code.  I think Dongxiao is looking into it.

On 8/31/2010 2:59 AM, Xu, Dongxiao wrote:
> Yes, I have tried to reproduce it these days, but I could not catch it
> locally. I tried both netperf and ping for a long time, but the bug
> was not triggered. What workload were you running when you hit the bug?

      I'd say that the whole machine is under moderate to high
utilization because it has 10 virtual machines running - three of which
are Windows 2008 Servers as HVM guests.  However, as far as the "load"
goes, most of the virtual machines are fairly idle and probably not
under much stress, overall.  Just to give you an idea, we have a
10Mbit/s connection to the Internet, and this server's physical network
interface (all 10 of the domUs' traffic, combined) usually accounts for
less than 2Mbit/s of the outbound traffic at any given point in the day.
Aside from Windows being Windows (the HVM guests are running graphical
desktops), I wouldn't say that any of them cause a high CPU load,
either.  Database load is fairly low to moderate on guests running MySQL
and/or PostgreSQL.  The only guest that seems to use more CPU and
RAM is one serving e-mail, and that's because it runs ClamAV and
SpamAssassin.  That e-mail server was one that kept its network
connectivity the longest, though (after a few hours, it did stop
responding, but that was after some guests with lighter loads stopped
responding).

      One observation, which may just be coincidental but seems at
least noteworthy, is that the virtual machines that are assigned less
RAM seem to lose connectivity more quickly than those with more RAM.
The most recent time that I was able to trigger the bug, the virtual
machine that lost connectivity was running 2.6.32.18 and was assigned
only 384MB of RAM.  At the time, the rest of my paravirtualized guests
were running 2.6.31.14, and they didn't experience the problem.

      I've previously triggered the bug in multiple domUs that were
running a more recent kernel (I think it was 2.6.32.17 - before I
reverted to a netback-patched 2.6.31.14 kernel), and the first ones to
disappear from the network were ones that were only assigned 256MB.
Eventually, they all disappeared, though.  The only "load" on one of the
first to disappear is an installation of bind9, servicing about 50
domain names - none of which receive an abnormally high hit count.

      The first time I noticed the problem, I had started 7
paravirtualized guests, of varying memory assignments.  The moment I
started the 8th guest, an HVM Windows 2008 Server, the networking on all
of the running guests (the paravirt ones) stopped responding at the
same time.  That may also be something to try/look at.

      After a reboot, I avoided starting any of the HVM guests, and the
connectivity lasted a couple of hours on the 7 running paravirt guests,
but started disappearing one guest at a time, over the course of the
next few hours.

      I didn't mention in my previous e-mail that in order to get
networking to work in a stable fashion in the 2.6.31.14 kernel (the one
I reverted to), I had to apply the patch mentioned here:
http://lists.xensource.com/archives/html/xen-devel/2010-05/msg01570.html
Otherwise, networking became unstable immediately at the time of guest
creation.  That patch was already applied to the 2.6.32.18 kernel that
is giving me the eventual network loss problems, though.

      More specifics about my configuration can be found here:
http://www.pridelands.org/~simba/hurricane-server.txt

-- 
Scott Garron
