From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "Ren, Yongjie" <yongjie.ren@intel.com>
Cc: "george.dunlap@eu.citrix.com" <george.dunlap@eu.citrix.com>,
"Xu, YongweiX" <yongweix.xu@intel.com>,
"Liu, SongtaoX" <songtaox.liu@intel.com>,
"Tian, Yongxue" <yongxue.tian@intel.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: test report for Xen 4.3 RC1
Date: Fri, 21 Jun 2013 14:17:52 -0400
Message-ID: <20130621181752.GE15809@phenom.dumpdata.com>
In-Reply-To: <1B4B44D9196EFF41AE41FDA404FC0A1001B19F25@SHSMSX102.ccr.corp.intel.com>
On Thu, Jun 20, 2013 at 02:53:06AM +0000, Ren, Yongjie wrote:
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > Sent: Monday, June 17, 2013 10:23 PM
> > To: Ren, Yongjie
> > Cc: george.dunlap@eu.citrix.com; Xu, YongweiX; Liu, SongtaoX; Tian,
> > Yongxue; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> >
> > On Sun, Jun 16, 2013 at 04:10:22AM +0000, Ren, Yongjie wrote:
> > > > -----Original Message-----
> > > > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > > > Sent: Wednesday, June 05, 2013 10:50 PM
> > > > To: Ren, Yongjie
> > > > Cc: george.dunlap@eu.citrix.com; Xu, YongweiX; Liu, SongtaoX; Tian,
> > > > Yongxue; xen-devel@lists.xen.org
> > > > Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> > > >
> > > > > >
> > > > > > > > http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1851
> > > > > > > >
> > > > > > > > That looks like you are hitting the udev race.
> > > > > > > >
> > > > > > > > Could you verify that these patches:
> > > > > > > > https://lkml.org/lkml/2013/5/13/520
> > > > > > > >
> > > > > > > > fix the issue (They are destined for v3.11)
> > > > > > > >
> > > > > > > > Not tried yet. I'll update you later.
> > > > > >
> > > > > > Thanks!
> > > > > > >
> > > > > We tested kernel 3.9.3 with the 2 patches you mentioned and found that
> > > > > this bug still exists. For example, we did CPU online-offline for Dom0
> > > > > 100 times, and it failed 2 times (out of 100).
> > > >
> > > > Hm, does it fail b/c udev can't online the sysfs entry?
> > > >
> > > I think not.
> > > When it fails to online CPU #3 (while trying to online #1~#3), the output
> > > of the "udevadm monitor --env" command doesn't show any info about CPU #3.
> > > It does show info about #1 and #2, which are onlined successfully.
> >
> > And if you re-trigger the 'xl vcpu-set' it eventually comes back up, right?
> >
> We don't use the 'xl vcpu-set' command when doing the CPU hot-plug.
> We just call the xc_cpu_online/offline() in tools/libxc/xc_cpu_hotplug.c to test.
Oh. That is very different from what I thought. You are not offlining/onlining
vCPUs; you are offlining/onlining pCPUs! So Xen has to cram the dom0 vCPUs onto
the remaining pCPUs.
There should be no vCPU re-sizing, correct?
> (See the attachment with my test code in that bugzilla.)
> And, yes, if a CPU fails to online, it can be onlined successfully when we
> re-trigger the online function.
>
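The test described above (calling xc_cpu_offline()/xc_cpu_online() in a loop and re-triggering online when it fails) can be sketched roughly as follows. This is a hypothetical harness, not the test code attached to the bugzilla entry: the hotplug calls are injected as function pointers, and fake_host, fake_offline() and fake_online() are made-up stand-ins so the sketch runs without Xen. In a real run the two callbacks would wrap libxc's xc_cpu_offline() and xc_cpu_online() from tools/libxc/xc_cpu_hotplug.c.

```c
/*
 * Hypothetical pCPU offline/online stress harness with one retry.
 * The hotplug calls are injected as function pointers; a real test
 * would wrap libxc's xc_cpu_offline()/xc_cpu_online(). fake_host and
 * its callbacks are stand-ins so the sketch runs without Xen.
 */
#include <assert.h>

/* Both callbacks return 0 on success and non-zero on failure. */
typedef int (*cpu_op_fn)(void *ctx, int cpu);

struct stress_result {
    int iterations;
    int offline_failures;   /* offline failed, iteration skipped */
    int first_try_failures; /* online failed on the first attempt */
    int retry_failures;     /* online still failed after one retry */
};

struct stress_result stress_cpu(void *ctx, cpu_op_fn offline_op,
                                cpu_op_fn online_op, int cpu,
                                int iterations)
{
    struct stress_result r = { iterations, 0, 0, 0 };

    assert(offline_op && online_op);
    assert(iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        if (offline_op(ctx, cpu) != 0) {
            r.offline_failures++;
            continue;
        }
        if (online_op(ctx, cpu) != 0) {
            r.first_try_failures++;
            /* The thread reports that re-triggering online works,
             * so give each failed online exactly one more attempt. */
            if (online_op(ctx, cpu) != 0)
                r.retry_failures++;
        }
    }
    return r;
}

/* Fake backend: the Nth online call fails if N is fail_call_a or fail_call_b. */
struct fake_host {
    int online_calls;
    int fail_call_a;
    int fail_call_b;
};

int fake_offline(void *ctx, int cpu)
{
    (void)ctx;
    (void)cpu;
    return 0; /* offlining always succeeds in the fake */
}

int fake_online(void *ctx, int cpu)
{
    struct fake_host *h = ctx;

    (void)cpu;
    h->online_calls++;
    return (h->online_calls == h->fail_call_a ||
            h->online_calls == h->fail_call_b) ? -1 : 0;
}
```

For example, stress_cpu(&host, fake_offline, fake_online, 3, 100) runs 100 offline/online cycles on CPU 3. Counting first-try failures separately from failures that survive a retry distinguishes the transient case reported here (the CPU comes back when online is re-triggered) from a CPU that stays offline.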
> > >
> > > > .. snip..
> > > > > > >
> > > > > > > > >
> > > > > > > > > Old bugs: (11)
> > > > > > > > > 1. [ACPI] Dom0 can't resume from S3 sleep
> > > > > > > > > http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1707
> > > > > > > >
> > > > > > > > That should be fixed in v3.11 (as now we have the fixes)
> > > > > > > > Could you try v3.10 with Rafael's ACPI tree merged in?
> > > > > > > > (that is, the patches that he wants to submit for v3.11)
> > > > > > > >
> > > > > > > I re-tested with Rafael's linux-pm.git tree (master and acpi-hotplug
> > > > > > > branches), and found that Dom0 S3 sleep/resume doesn't work with it either.
> > > > > >
> > > > > > The patches he has to submit for v3.11 are in the linux-next branch.
> > > > > > You need to use that branch.
> > > > > >
> > > > > Dom0 S3 sleep/resume doesn't work with the linux-next branch either.
> > > > > Log attached.
> > > >
> > > > It does work on my box. So I am not sure if this is related to the
> > > > IvyTown box you are using. Does it work on other machines?
> > > >
> > > No, it doesn't work on other machines either. I also tried on SandyBridge,
> > > IvyBridge desktop and Haswell mobile machines.
> >
> > I just double checked on my AMD machines with v3.10-rc5 with
> > these extra patches:
> >
> > ebe2886 x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
> > 7c4ae96 Revert "xen/pat: Disable PAT support for now."
> > 729c6ec Revert "xen/pat: Disable PAT using pat_enabled value."
> > bd4fd16 microcode_xen: Add support for AMD family >= 15h
> > 6271c21 x86/microcode: check proper return code.
> > b9a48c8 xen: add CPU microcode update driver
> > c62566c cpu: make sure that cpu/online file created before KOBJ_ADD is emitted
> > 0790542 cpu: fix "crash_notes" and "crash_notes_size" leaks in register_cpu()
> > f90099b xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback.
> > 29ca6e9 x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.
> >
> > and it worked. Let me recompile a kernel without most of them to double-check
> > whether those patches are what make ACPI S3 suspend/resume work.
> > This is with Xen 4.3 (82cb411). The machine is an M5A97, BIOS 1208 04/18/2012,
> > with 01:00.0 VGA compatible controller: NVIDIA Corporation G84 [GeForce 8600 GT] (rev a1)
> > as its graphics card.
> >
> After re-testing with the linux-pm.git tree (kernel: 3.10.rc6+, commit: a913b188df) on
> my IvyTown-EP and IvyBridge desktop systems, Dom0 S3 sleep/resume works!
> Once this code is upstreamed to the linux.git tree, I can close this bug.
Yes! Though Ben found another issue with extended sleep, where it will
not use the hypercall. <sigh>
>
> > >
> > > > >
> > > > > > >
> > > > > > > > > 2. [XL]"xl vcpu-set" causes dom0 crash or panic
> > > > > > > > > http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1730
> > > > > > > >
> > > > > > > > That I think is fixed in v3.10. Could you please check v3.10-rc3?
> > > > > > > >
> > > > > > > Still exists on v3.10-rc3.
> > > > > > > The following command lines can reproduce it:
> > > > > > > # xl vcpu-set 0 1
> > > > > > > # xl vcpu-set 0 20
> > > > > >
> > > > > > Ugh, same exact stack trace? And can you attach the full dmesg or serial
> > > > > > output (so that I can see what there is at bootup)?
> > > > > >
> > > > > Yes, the same. Also attached in this mail.
> > > >
> > > > One of the fixes is this one:
> > > > http://www.gossamer-threads.com/lists/xen/devel/284897
> > > >
> > > > but the other ones I had not seen. I am wondering if the
> > > > update_sd_lb_stats is b/c of the previous conditions (that is the
> > > > tick_nohz_idle_start hadn't been called).
> > > >
> > > > It is a shot in the dark, but if you use the above-mentioned patch,
> > > > do you still see the update_sd_lb_stats crash?
> > > >
> > > Yes, with the patch we still see the update_sd_lb_stats crash.
> > > It has almost the same trace log as before. Log file is attached.
> >
> > Would it be possible to do a bit of 'git bisect' to figure out why
> > this started?
> >
> It's hard.
> This issue has existed for a long time. We don't even know which upstream
> Linux version works as dom0 without this bug.
Then a bit of digging will be needed. Sadly I am out of time to do this
ATM.
>
> > > > > > > > > 4. 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> > > > > > > > > http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1822
> > > > > > > >
> > > > > > > > That I believe was a QEMU bug:
> > > > > > > >
> > > > > > > > http://lists.xen.org/archives/html/xen-devel/2013-05/msg01054.html
> > > > > > > >
> > > > > > > > which should be in QEMU traditional now (05-21 was when it went
> > > > > > > > into the tree)
> > > > > > > >
> > > > > > > This bug has always existed, this year and last year (at least in our
> > > > > > > testing): 'xl vcpu-set' can't decrease the vCPU number of a HVM guest.
> > > > > >
> > > > > > Could you retry with Xen 4.3 please?
> > > > > >
> > > > > With Xen 4.3 & Linux 3.10.0-rc3, I can't decrease the vCPU number of a
> > > > > guest.
> > > >
> > > Sorry, when I wrote that message I was still using the rhel6.4 kernel in the guest.
> > > After upgrading the guest kernel to 3.10.0-rc3, the results improved.
> > > Basically vCPU increment/decrement works fine. I'll close that bug.
> >
> > Excellent!
> > > But there's still a minor issue, as follows.
> > > After booting the guest with 'vcpus=4' and 'maxvcpus=32', change its vCPU number:
> > > # xl vcpu-set $domID 32
> > > Then you get fewer than 32 (e.g. 19) vCPUs in the guest. If you set the
> > > vCPU number to 32 again (from 19), the guest does get 32 vCPUs.
> > > But 'xl vcpu-set $domID 8' works fine, as expected.
> > > vCPU decrement shows the same behaviour.
> > > Could you also try to reproduce my issue?
> >
> This issue doesn't exist when using the latest QEMU traditional tree.
> My previous QEMU was old (March 2013), and I found that some of your patches
> were applied in May 2013. These fixes resolve the issue we reported.
> I'll close this bug.
Yes!
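To compare what `xl vcpu-set $domID 32` requested with what the guest actually brought online (the 19-vs-32 case above), one option is to read /sys/devices/system/cpu/online inside the guest. That file lists online CPUs in the kernel's cpulist format, e.g. "0-18". A minimal C sketch follows; count_cpulist() and count_online_cpus() are hypothetical helper names, and the parser assumes only comma-separated "N" and "N-M" entries.

```c
/*
 * Count the CPUs in a Linux cpulist string such as the contents of
 * /sys/devices/system/cpu/online ("0-18", "0,2,4-5", ...).
 * Returns the CPU count, or -1 if the string is malformed.
 */
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int count_cpulist(const char *s)
{
    int total = 0;

    assert(s);
    while (*s && *s != '\n') {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*s))
            return -1;
        lo = strtol(s, &end, 10);
        hi = lo;
        if (*end == '-') {              /* range "N-M" */
            s = end + 1;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
            if (hi < lo)
                return -1;
        }
        total += (int)(hi - lo + 1);
        s = end;
        if (*s == ',')                  /* more entries follow */
            s++;
        else if (*s && *s != '\n')      /* unexpected character */
            return -1;
    }
    return total;
}

/* Read one cpulist line from a sysfs file and count it (-1 on error). */
int count_online_cpus(const char *path)
{
    char buf[4096];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return count_cpulist(buf);
}
```

Calling count_online_cpus("/sys/devices/system/cpu/online") in the guest after each 'xl vcpu-set' gives a number to check against the requested vCPU count; a repeated-resize test could log any mismatch before moving on.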
>
> But it introduced another issue: when doing 'xl vcpu-set' for an HVM guest several
> times (e.g. 5 times), the guest panics. Log is attached.
> Before your patches went into the qemu traditional tree in May 2013, we never saw
> a guest kernel panic.
> dom0: 3.10.0-rc3
> Xen: 4.3.0-RCx
> QEMU: the latest traditional tree
> guest kernel: 3.10.0-RC3
> Shall I file another bug to track this?
Please.
> Can you reproduce this ?
Could you tell me how you are doing 'xl vcpu-set'? Is there a particular
test script you are using?
>
> > Sure. Now how many pCPUs do you have? And what version of QEMU traditional
> > were you using?
> >
> There are 32 pCPUs in the system we used.
>
> Best Regards,
> Yongjie (Jay)