From: Wei Liu <wei.liu2@citrix.com>
To: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	xen-devel@lists.xensource.com,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	osstest service owner <osstest-admin@xenproject.org>
Subject: Re: [linux-4.1 test] 63030: regressions - FAIL
Date: Wed, 21 Oct 2015 18:34:05 +0100
Message-ID: <20151021173405.GG5060@zion.uk.xensource.com>
In-Reply-To: <1445446026.32735.18.camel@citrix.com>

On Wed, Oct 21, 2015 at 05:47:06PM +0100, Ian Campbell wrote:
> On Tue, 2015-10-20 at 16:34 +0100, Ian Jackson wrote:
> > Wei Liu writes ("Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL"):
> > > From mere code inspection and document of lwip 1.3.0 I think mini-os
> > > does send gratuitous ARP.
> > 
> > The guest is using the PVHVM drivers at this point, with the backend
> > directly in dom0, so it is the guest's gratuitous arp which is needed,
> > I think.
> 
> It would be worth investigating whether mini-os's gratuitous ARP might
> also be occurring and confusing things, e.g. by coming after and
> therefore taking precedence over the one coming from the guest.
> 

Several observations:

1. The guest doesn't always send a gratuitous arp -- but this might not
   be the cause of this failure. The guest works fine when using
   qemu-trad only.
2. The guest sends at most one gratuitous arp.
3. When using stubdom, the guest is a lot less responsive. See the two
   experiments and the analysis below.

I statically add an arp entry for the guest interface because the arp
entry sometimes gets deleted. Note that this does not cover up the root
cause of the failure: the arp entry is normally deleted only after a few
migration iterations, whereas the failures on merlot* mostly happen on
the first iteration. Also, when the arp entry is not available, the ssh
error should be "No route to host", not "timed out".

Furthermore, when the arp entry is not available, dom0 naturally sends
an arp request to the guest. When stubdom is not in use, the guest
responds instantly; when stubdom is in use, the guest is a lot less
responsive.
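
A static entry of the kind mentioned above could be set up along the
lines of the sketch below. The MAC address is a placeholder, and using
xenbr0 as the device is an assumption based on the tcpdump invocation
further down; neither is taken from the actual test setup. The helper
only prints the command, so it can be reviewed before being run as root.

```shell
# Sketch only. GUEST_MAC is a hypothetical placeholder, not the real guest MAC.
GUEST_IP=10.80.239.39
GUEST_MAC=00:16:3e:00:00:01
BRIDGE=xenbr0   # assumed: the dom0 bridge the guest vif is attached to

# Print the iproute2 command that would pin a permanent neighbour entry.
# Arguments: IP address, MAC address, device.
static_arp_cmd() {
    echo "ip neigh replace $1 lladdr $2 dev $3 nud permanent"
}

static_arp_cmd "$GUEST_IP" "$GUEST_MAC" "$BRIDGE"
```

Running the printed command as root in dom0 would install the entry;
"ip neigh show dev xenbr0" can then be used to confirm it is marked
PERMANENT.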

I use a script to repeat migration and ssh.

  i=1
  while true; do
      echo "#### iteration $i"
      ssh localhost xl migrate wheezy-hvm localhost
      st=$?   # save the status; $? is overwritten by the [ test below
      if [ $st != 0 ]; then
          echo "migration failed $st"
          exit 1
      fi
      # After migration, check that the guest answers over ssh
      timeout 40 ssh -o BatchMode=yes -o ConnectTimeout=100 -o ServerAliveInterval=100 root@10.80.239.39 date
      st=$?
      if [ $st != 0 ]; then
          echo "failed $st"
          exit 1
      fi
      i=$((i+1))
  done

At the same time I run tcpdump on the bridge:
  tcpdump -i xenbr0 arp and host $GUEST_IP

When stubdom is present, I see one of two scenarios.

Scenario 1:
  xl shows "Migration successful."
  ...30s...
  xenbr0 receives gratuitous arp
  ...1s...
  ssh date command comes back

Scenario 2:
  xenbr0 receives gratuitous arp
  ...1s...
  xl shows "Migration successful."
  ssh date command comes back

When stubdom was not present I never saw scenario 1.
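
The two scenarios differ only in whether the gratuitous arp shows up
after or before xl reports success, so they can be told apart from two
timestamps. The helper below is a minimal sketch of that comparison; the
function name and the idea of recording both events with "date +%s" are
assumptions for illustration, not part of the script above.

```shell
# classify_scenario MIGRATE_DONE_TS GARP_TS
# Both arguments are integer seconds since the epoch (e.g. from `date +%s`):
#   MIGRATE_DONE_TS  when xl printed "Migration successful."
#   GARP_TS          when the gratuitous arp was seen on the bridge
classify_scenario() {
    migrate_ts=$1
    garp_ts=$2
    if [ "$garp_ts" -gt "$migrate_ts" ]; then
        # arp arrived after xl reported success
        echo "scenario 1 (gratuitous arp $((garp_ts - migrate_ts))s after migration)"
    else
        # arp arrived before (or in the same second as) the success message
        echo "scenario 2 (gratuitous arp before migration reported success)"
    fi
}
```

For example, "classify_scenario 100 130" reports scenario 1 with a 30s
delay, which matches the first trace above.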

Note that my machine is relatively old (>6 years). It would never pass
the test in osstest because the timeout there is 10s.

The slowness in osstest seems to be host specific, because all failures
of the guest migrate test occurred on merlot*. It is not only linux-4.1
that is failing; other branches fail the same test step on merlot*, too.

Wei.
