From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: test report for Xen 4.3 RC1
Date: Mon, 17 Jun 2013 10:23:04 -0400
Message-ID: <20130617142304.GP30071@phenom.dumpdata.com>
References: <1B4B44D9196EFF41AE41FDA404FC0A1001B15762@SHSMSX102.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1B4B44D9196EFF41AE41FDA404FC0A1001B15762@SHSMSX102.ccr.corp.intel.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: "Ren, Yongjie"
Cc: "george.dunlap@eu.citrix.com", "Xu, YongweiX", "Liu, SongtaoX", "Tian, Yongxue", "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On Sun, Jun 16, 2013 at 04:10:22AM +0000, Ren, Yongjie wrote:
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> > Sent: Wednesday, June 05, 2013 10:50 PM
> > To: Ren, Yongjie
> > Cc: george.dunlap@eu.citrix.com; Xu, YongweiX; Liu, SongtaoX; Tian,
> > Yongxue; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> >
> > > > > > > http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1851
> > > > > >
> > > > > > That looks like you are hitting the udev race.
> > > > > >
> > > > > > Could you verify that these patches:
> > > > > > https://lkml.org/lkml/2013/5/13/520
> > > > > >
> > > > > > fix the issue (They are destined for v3.11)
> > > > > >
> > > > > Not tried yet. I'll update it to you later.
> > > >
> > > > Thanks!
> > > >
> > > We tested kernel 3.9.3 with the 2 patches you mentioned, and found this
> > > bug still exists. For example, we did CPU online-offline for Dom0 for 100
> > > times, and found 2 times (of 100 times) failed.
> >
> > Hm, does it fail b/c udev can't online the sysfs entry?
> >
> I think no.
> When it fails to online CPU #3 (trying online #1~#3), it doesn't show any info
> about CPU #3 via the output of "udevadm monitor --env" CMD. It does show
> info about #1 and #2 which are onlined successfully.

And if you re-trigger the 'xl vcpu-set' it eventually comes back up right?

> > .. snip..
> >
> > > > > > > Old bugs: (11)
> > > > > > > 1. [ACPI] Dom0 can't resume from S3 sleep
> > > > > > > http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1707
> > > > > >
> > > > > > That should be fixed in v3.11 (as now we have the fixes)
> > > > > > Could you try v3.10 with the Rafael's ACPI tree merged in?
> > > > > > (so the patches that he wants to submit for v3.11)
> > > > > >
> > > > > I re-tested with Rafael's linux-pm.git tree (master and acpi-hotplug
> > > > > branch), and found Dom0 S3 sleep/resume can't work, either.
> > > >
> > > > The patches he has to submit for v3.11 are in the linux-next branch.
> > > > You need to use that branch.
> > > >
> > > Dom0 S3 sleep/resume doesn't work with linux-next branch, either.
> > > attached the log.
> >
> > It does work on my box. So I am not sure if this is related to the
> > IvyTown box you are using. Does it work on other machines?
> >
> No, it doesn't work on other machines, either. I also tried on SandyBridge,
> IvyBridge desktop and Haswell mobile machines.

I just double checked on my AMD machines with v3.10-rc5 with these extra patches:

ebe2886 x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations.
7c4ae96 Revert "xen/pat: Disable PAT support for now."
729c6ec Revert "xen/pat: Disable PAT using pat_enabled value."
bd4fd16 microcode_xen: Add support for AMD family >= 15h
6271c21 x86/microcode: check proper return code.
b9a48c8 xen: add CPU microcode update driver
c62566c cpu: make sure that cpu/online file created before KOBJ_ADD is emitted
0790542 cpu: fix "crash_notes" and "crash_notes_size" leaks in register_cpu()
f90099b xen / ACPI / sleep: Register an acpi_suspend_lowlevel callback.
29ca6e9 x86 / ACPI / sleep: Provide registration for acpi_suspend_lowlevel.

and it worked. Let me recompile a kernel without most of them to doublecheck
whether those patches are making the ACPI S3 suspend/resume working.

This is with Xen 4.3 (82cb411).

The machine is M5A97, BIOS 1208 04/18/2012 with
01:00.0 VGA compatible controller: NVIDIA Corporation G84 [GeForce 8600 GT] (rev a1)
as its graphics card.

> > > > > > > 2. [XL]"xl vcpu-set" causes dom0 crash or panic
> > > > > > > http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1730
> > > > > >
> > > > > > That I think is fixed in v3.10. Could you please check v3.10-rc3?
> > > > > >
> > > > > Still exists on v3.10-rc3.
> > > > > The following command lines can reproduce it:
> > > > > # xl vcpu-set 0 1
> > > > > # xl vcpu-set 0 20
> > > >
> > > > Ugh, same exact stack trace? And can you attach the full dmesg or serial
> > > > output (so that I can see what there is at bootup)
> > > >
> > > Yes, the same. Also attached in this mail.
> >
> > One of the fixes is this one:
> > http://www.gossamer-threads.com/lists/xen/devel/284897
> >
> > but the other ones I had not seen. I am wondering if the
> > update_sd_lb_stats is b/c of the previous conditions (that is the
> > tick_nohz_idle_start hadn't been called).
> >
> > It is a shot in the dark - but if you use the above mentioned patch
> > do you still see the update_sd_lb_stats crash?
> >
> Yes, with the patch we still see the update_sd_lb_stats crash.
> It has almost the same trace log as before. Log file is attached.

Would it be possible to do a bit of 'git bisect' to figure out why this started?

> > > > > > > 4. 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> > > > > > > http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1822
> > > > > >
> > > > > > That I believe was a QEMU bug:
> > > > > > http://lists.xen.org/archives/html/xen-devel/2013-05/msg01054.html
> > > > > >
> > > > > > which should be in QEMU traditional now (05-21 was when it went
> > > > > > in the tree)
> > > > > >
> > > > > In this year or past year, this bug always exists (at least in our testing).
> > > > > 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> > > >
> > > > Could you retry with Xen 4.3 please?
> > > >
> > > With Xen 4.3 & Linux:3.10.0-rc3, I can't decrease the vCPU number of a guest.
> >
> sorry, when I said this message, I still use rhel6.4 kernel as the guest.
> After upgrading guest kernel to 3.10.0-rc3, the result became better.
> Basically vCPU increment/decrement can work fine. I'll close that bug.

Excellent!

> But there's still a minor issue as following.
> After booting guest with 'vcpus=4' and 'maxvcpus=32', change its vCPU number.
> # xl vcpu-set $domID 32
> then you can only get less than 32 (e.g. 19) CPUs in the guest; again, you set
> vCPU number to 32 (from 19), then it works to get 32vCPU for the guest.
> but 'xl vcpu-set $domID 8' can work fine as we expected.
> vCPU decrement has the same result.
> Can you also have a try to reproduce my issue?

Sure. Now how many PCPUS do you have? And what version of QEMU traditional
were you using?

> > Could you give some more details? Could you include the
> > /var/log/xen/qemu-... log file?
> >
> Attached the qemu log.

Thank you.

> > You are using the traditional QEMU right? (you need to have this in your
> > guest config:
> > device_model_version = 'qemu-xen-traditional'
> >
> Yes.
>
> --
> Jay
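
For the Dom0 CPU online/offline test mentioned earlier in this thread (100 cycles, 2 failures), a loop along the following lines could be used to count failed transitions. This is only a sketch under stated assumptions: the thread does not say how the 100 cycles were driven, so the code assumes the standard Linux sysfs files /sys/devices/system/cpu/cpuN/online are toggled directly, and the helper names set_online and stress are made up for illustration.

```python
import os
import time

# Standard Linux sysfs location for per-CPU hotplug control files.
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def set_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Write 1/0 to cpuN/online and report whether the readback matches.

    Hypothetical helper: returns False if the file is missing, not
    writable, or does not reflect the requested state afterwards.
    """
    path = os.path.join(root, f"cpu{cpu}", "online")
    want = "1" if online else "0"
    try:
        with open(path, "w") as f:
            f.write(want)
        with open(path) as f:
            return f.read().strip() == want
    except OSError:
        return False


def stress(cpus, iterations=100, root=SYSFS_CPU_ROOT, settle=0.0):
    """Offline then online each CPU `iterations` times.

    Returns a list of (iteration, cpu, requested_online) tuples, one
    per transition that did not take effect.
    """
    failures = []
    for i in range(iterations):
        for online in (False, True):
            for cpu in cpus:
                if not set_online(cpu, online, root):
                    failures.append((i, cpu, online))
            if settle:
                time.sleep(settle)  # give udev/hotplug time to settle
    return failures
```

Run as root in Dom0, something like stress([1, 2, 3], iterations=100, settle=0.5) would return the list of failed transitions; an empty list means every offline/online request was reflected in sysfs. The root parameter exists only so the loop can be pointed at a scratch directory when trying it out.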