From: Marek Marczykowski <marmarek@invisiblethingslab.com>
To: Ben Guthro <ben@guthro.net>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
Date: Tue, 16 Apr 2013 03:02:53 +0200
Message-ID: <516CA33D.3090503@invisiblethingslab.com>
In-Reply-To: <CAOvdn6UgX_ZvR_-U6MXCoLW9jHFy19k=oaF=mZgz_HBGoJDEvw@mail.gmail.com>


On 16.04.2013 01:36, Ben Guthro wrote:
> On Mon, Apr 15, 2013 at 11:09 PM, Marek Marczykowski
> <marmarek@invisiblethingslab.com> wrote:
>> On 02.04.2013 03:13, Marek Marczykowski wrote:
>>> On 01.04.2013 15:53, Ben Guthro wrote:
>>>> On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
>>>> <marmarek@invisiblethingslab.com> wrote:
>>>>> (XEN) Restoring affinity for d2v3
>>>>> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
>>>>> sched_credit.c:481
>>>>
>>>>
>>>> I think the "fix-suspend-scheduler-*" patches posted here are applicable here:
>>>> http://markmail.org/message/llj3oyhgjzvw3t23
>>>>
>>>>
>>>> Specifically, I think you need this bit:
>>>>
>>>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>>>> index 630881e..e20868c 100644
>>>> --- a/xen/common/cpu.c
>>>> +++ b/xen/common/cpu.c
>>>> @@ -5,6 +5,7 @@
>>>>  #include <xen/init.h>
>>>>  #include <xen/sched.h>
>>>>  #include <xen/stop_machine.h>
>>>> +#include <xen/sched-if.h>
>>>>
>>>>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>>>  #ifndef nr_cpumask_bits
>>>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>>>              BUG_ON(error == -EBUSY);
>>>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>>>          }
>>>> +        if (system_state == SYS_STATE_resume)
>>>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>>>      }
>>>>
>>>>      cpumask_clear(&frozen_cpus);
>>>>
>>>
>>> Indeed, this makes things better, but it is still not ideal.
>>> Now after resume all CPUs are in Pool-0, which is good. But CPU0 is strongly
>>> preferred over the others (see xl vcpu-list). For example, if I start 4 busy
>>> loops in dom0, I get (even after some time):
>>> [user@dom0 ~]$ xl vcpu-list
>>> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
>>> dom0                                 0     0    0   r--      98.5  any cpu
>>> dom0                                 0     1    0   ---     181.3  any cpu
>>> dom0                                 0     2    2   r--     262.4  any cpu
>>> dom0                                 0     3    3   r--     230.8  any cpu
>>> netvm                                1     0    0   -b-      18.4  any cpu
>>> netvm                                1     1    0   -b-       9.1  any cpu
>>> netvm                                1     2    0   -b-       7.1  any cpu
>>> netvm                                1     3    0   -b-       5.4  any cpu
>>> firewallvm                           2     0    0   -b-      10.7  any cpu
>>> firewallvm                           2     1    0   -b-       3.0  any cpu
>>> firewallvm                           2     2    0   -b-       2.5  any cpu
>>> firewallvm                           2     3    3   -b-       3.6  any cpu
>>>
>>> If I remove a CPU from Pool-0 and re-add it, things go back to normal for
>>> that particular CPU (so I get two equally used CPUs). To fully restore the
>>> system I must remove all CPUs but CPU0 from Pool-0 and add them again.
>>>
>>> Also, still only CPU0 has all C-states (C0-C3); all others have only C0-C1.
>>> This could probably be fixed by your "xen: Re-upload processor PM data to
>>> hypervisor after S3 resume" patch (reloading the xen-acpi-processor module
>>> helps here). But I don't think it is the right way. It isn't necessary on
>>> other systems (with somewhat older hardware). Something must be missing on
>>> the resume path. The question is what...
>>>
>>> Perhaps someone needs to go through enable_nonboot_cpus() (__cpu_up?) and
>>> check whether it restores everything disabled in disable_nonboot_cpus()
>>> (__cpu_disable?). Unfortunately I don't know the x86 details well enough to
>>> follow that code...
>>
>> Summary of ACPI S3 issues:
>>
>> I. Fixed issues:
>>
>> 1. IRQ problem, fixed by the "x86: irq_move_cleanup_interrupt() must ignore
>> legacy vectors" commit
>> 2. Assertion failure on resume with vcpu affinity used, fixed by the "x86/S3:
>> Restore broken vcpu affinity on resume" commit
>>
>>
>> II. Not (fully) fixed issues:
>>
>> 1. CPU Pool-0 contains only CPU0 after resume - the patch quoted above fixes
>> the issue, but it isn't applied to xen-unstable
>> 2. After resume the scheduler chooses (almost) only CPU0 (see the listing
>> quoted above). Removing and re-adding all CPUs to Pool-0 solves the problem.
>> Perhaps some timers are not restarted after resume?
> 
> Marek,
> Please try the patch from this thread to see if it solves your 2 issues above:
> http://markmail.org/thread/35ecqimv7bwq3k6d
> 
> This patch was NAK'ed due to cpupool breakage...but in my testing, it
> solved both of these problems.
> 
> I don't know how to properly solve it in a cpupool compatible way...
> but I also haven't put much additional effort into doing so.

Indeed, this makes the problem disappear.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


