xen-devel.lists.xenproject.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Avi Kivity <avi@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
	riel@redhat.com, amit shah <amit.shah@redhat.com>,
	mtosatti@redhat.com, xen-devel@lists.xensource.com,
	Ian.Campbell@citrix.com
Subject: Re: [PATCH] BUG in pv_clock when overflow condition is detected
Date: Tue, 21 Feb 2012 12:35:42 +0100
Message-ID: <4F43818E.4010407@redhat.com>
In-Reply-To: <20120220152855.GA25535@phenom.dumpdata.com>

On 02/20/2012 04:28 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 17, 2012 at 04:25:04PM +0100, Igor Mammedov wrote:
>> On 02/16/2012 03:03 PM, Avi Kivity wrote:
>>> On 02/15/2012 07:18 PM, Igor Mammedov wrote:
>>>>> On 02/15/2012 01:23 PM, Igor Mammedov wrote:
>>>>>>>>    static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
>>>>>>>>    {
>>>>>>>> -    u64 delta = native_read_tsc() - shadow->tsc_timestamp;
>>>>>>>> +    u64 delta;
>>>>>>>> +    u64 tsc = native_read_tsc();
>>>>>>>> +    BUG_ON(tsc < shadow->tsc_timestamp);
>>>>>>>> +    delta = tsc - shadow->tsc_timestamp;
>>>>>>>>        return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
>>>>>>>>                       shadow->tsc_shift);
>>>>>>>
>>>>>>> Maybe a WARN_ON_ONCE()?  Otherwise a relatively minor hypervisor
>>>>>>> bug can kill the guest.
>>>>>>
>>>>>>
>>>>>> Printing from this place is not ideal, since it often leads to a
>>>>>> recursive call into this very function, and it hangs there anyway.
>>>>>> But if you insist, I'll re-post it with WARN_ON_ONCE. It won't
>>>>>> make much difference, because the guest will hang/stall due to
>>>>>> the overflow anyway.
>>>>>
>>>>> Won't a BUG_ON() also result in a printk?
>>>> Yes, it will. But the stack will still keep the failure point, and
>>>> poking at the core with crash/gdb will always show where it BUGged.
>>>>
>>>> In case it manages to print the dump somehow (I saw that a couple of
>>>> times in ~30 test cycles), logs from the console or from the kernel
>>>> message buffer (again, poking with gdb) will show where it was
>>>> called from.
>>>>
>>>> If WARN* is used, it will still totally screw up the clock and
>>>> "last value", and the system will become unusable, requiring a look
>>>> at the core with gdb/crash anyway.
>>>>
>>>> So I've just used a more stable failure point that will leave a trace
>>>> everywhere it can (maybe in the console log, but for sure in the
>>>> stack). With WARN, it might or might not leave a trace on the
>>>> console, and it probably won't reflect the failure point in the stack
>>>> either, leaving only the kernel message buffer as a clue.
>>>>
>>>
>>> Makes sense.  But do get an ack from the Xen people to ensure this
>>> doesn't break for them.
>>>
>> Konrad, Ian
>>
>> Could you please review the patch from the Xen point of view?
>> The whole thread can be found here: https://lkml.org/lkml/2012/2/13/286
>
> What are the conditions under which this happens?
> You should probably include that in the git description as well?
This happens on CPU hot-plug in a KVM guest:
https://lkml.org/lkml/2012/2/7/222

It probably doesn't affect a Xen PV guest, but the issue might affect an
HVM one. After only a cursory look at the code, I'm not enough of a Xen
expert to say for sure. If you can confirm that it affects Xen HVM, I
will write an early_percpu_clock_init patch for it as well.

> Is this something that happens often?
Very seldom and unlikely.

> Hm, so are you asking for review for this patch
I was asking for a review of the subject patch,
   "BUG in pv_clock when overflow condition is detected".
I'll update the patch description and re-spin it.

>  If there is an overflow can you synthesize a value instead of
> crashing the guest?
> or for http://www.spinics.net/lists/kvm/msg68440.html ?
Probably could, but there was an argument that this fixes the symptoms
and not the root cause. It seems you've already found the patch that
proposes this: "pvclock: Make pv_clock more robust and fixup it if
overflow happens".
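
To illustrate the trade-off discussed here, below is a minimal userspace
sketch, not the kernel code. The struct and function names (shadow_time,
delta_unchecked, delta_checked, delta_clamped) are hypothetical, and
assert() merely stands in for BUG_ON(). It shows that a TSC reading below
tsc_timestamp makes the unsigned subtraction wrap to a huge delta, and how
a check or a clamp would treat that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shadow time copy; only one field is used. */
struct shadow_time {
    uint64_t tsc_timestamp;  /* TSC value published by the hypervisor */
};

/* Unchecked delta: wraps around to a huge value if tsc < tsc_timestamp. */
static uint64_t delta_unchecked(uint64_t tsc, const struct shadow_time *s)
{
    return tsc - s->tsc_timestamp;
}

/* Checked delta: assert() plays the role of BUG_ON() in this sketch. */
static uint64_t delta_checked(uint64_t tsc, const struct shadow_time *s)
{
    assert(tsc >= s->tsc_timestamp);
    return tsc - s->tsc_timestamp;
}

/* Clamped delta: one possible way to "synthesize a value" (assumption). */
static uint64_t delta_clamped(uint64_t tsc, const struct shadow_time *s)
{
    return tsc < s->tsc_timestamp ? 0 : tsc - s->tsc_timestamp;
}
```

With tsc_timestamp = 1000 and a TSC reading of 999, delta_unchecked()
returns UINT64_MAX, while delta_clamped() returns 0. Clamping hides the
symptom; it does not explain why the timestamp ran ahead of the TSC.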

>
> (which would also entail an early_percpu_clock_init implementation
> in the Xen code naturally).
>

-- 
Thanks,
  Igor


Thread overview: 3+ messages
     [not found] <28a9ca8a-4696-4c9c-bd15-f2fa5558740e@zmail16.collab.prod.int.phx2.redhat.com>
     [not found] ` <4F3D0CB1.5070707@redhat.com>
2012-02-17 15:25   ` [PATCH] BUG in pv_clock when overflow condition is detected Igor Mammedov
2012-02-20 15:28     ` Konrad Rzeszutek Wilk
2012-02-21 11:35       ` Igor Mammedov [this message]
