From: Scott Wood <scottwood@freescale.com>
To: Alexander Graf <agraf@suse.de>
Cc: Bharat Bhushan <r65777@freescale.com>,
kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
Bharat Bhushan <Bharat.Bhushan@freescale.com>
Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
Date: Mon, 09 Jul 2012 17:09:10 +0000 [thread overview]
Message-ID: <4FFB1036.7060405@freescale.com> (raw)
In-Reply-To: <5A60FB43-C554-4B72-8D86-AEA927F17503@suse.de>
On 07/07/2012 02:50 AM, Alexander Graf wrote:
>
> On 07.07.2012, at 01:37, Scott Wood wrote:
>
>> On 07/06/2012 08:17 AM, Alexander Graf wrote:
>>> On 28.06.2012, at 08:17, Bharat Bhushan wrote:
>>>> +/*
>>>> + * The timer system can almost deal with LONG_MAX timeouts, except that
>>>> + * when you get very close to LONG_MAX, the slack added can cause overflow.
>>>> + *
>>>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
>>>> + * any realistic use.
>>>> + */
>>>> +#define MAX_TIMEOUT (LONG_MAX/2)
>>>
>>> Should this really be in kvm code?
>>
>> It looks like we can use NEXT_TIMER_MAX_DELTA for this.
>>
>>>> + mask = 1ULL << (63 - period);
>>>> + tb = get_tb();
>>>> + if (tb & mask)
>>>> + nr_jiffies += mask;
>>>
>>> To be honest, you lost me here. nr_jiffies is jiffies, right? While
>>> mask is basically in timebase granularity. I suppose you're just
>>> reusing the variable here for no good reason - the compiler will
>>> gladly optimize things for you if you write things a bit more verbose
>>> :).
>>
>> Probably due to the way do_div() works, but yeah, it's confusing. Maybe
>> something generic like "ticks", "interval", "remaining", etc. would be
>> better, with a comment on the do_div saying it's converting timebase
>> ticks into jiffies.
>
> Well, you could start off with a variable "delta_tb", then do
>
> nr_jiffies = delta_tb;
> x = do_div(...);
>
> and things would suddenly become readable :). Of course I don't object to comments along the code either :).
>
>>
>>>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + unsigned long nr_jiffies;
>>>> +
>>>> + nr_jiffies = watchdog_next_timeout(vcpu);
>>>> + if (nr_jiffies < MAX_TIMEOUT)
>>>> + mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
>>>> + else
>>>> + del_timer(&vcpu->arch.wdt_timer);
>>>
>>> Can you del a timer that's not armed? Could that ever happen in this case?
>>
>> "del_timer() deactivates a timer - this works on both active and
>> inactive timers."
>
> Ah, good :).
>
>>
>>> Also, could the timer possibly be running somewhere, so do we need del_timer_sync? Or don't we need to care?
>>
>> This can be called in the context of the timer, so del_timer_sync()
>> would hang.
>>
>> As for what would happen if a caller from a different context were to
>> race with a timer, I think you could end up with the timer armed based
>> on an old TCR. del_timer_sync() won't help though, unless you replace
>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see
>> whether it's running in timer context). A better solution is probably
>> to use a spinlock in arm_next_watchdog().
>
> Yup. Either way, we have a race that the guest might not expect.
>
>>
>>>> +void kvmppc_watchdog_func(unsigned long data)
>>>> +{
>>>> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
>>>> + u32 tsr, new_tsr;
>>>> + int final;
>>>> +
>>>> + do {
>>>> + new_tsr = tsr = vcpu->arch.tsr;
>>>> + final = 0;
>>>> +
>>>> + /* Time out event */
>>>> + if (tsr & TSR_ENW) {
>>>> + if (tsr & TSR_WIS) {
>>>> + new_tsr = (tsr & ~TCR_WRC_MASK) |
>>>> + (vcpu->arch.tcr & TCR_WRC_MASK);
>>>> + vcpu->arch.tcr &= ~TCR_WRC_MASK;
>>>> + final = 1;
>>>> + } else {
>>>> + new_tsr = tsr | TSR_WIS;
>>>> + }
>>>> + } else {
>>>> + new_tsr = tsr | TSR_ENW;
>>>> + }
>>>> + } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
>>>> +
>>>> + if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
>>>> + smp_wmb();
>>>> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>>>> + kvm_vcpu_kick(vcpu);
>>>> + }
>>>> +
>>>> + /*
>>>> + * Avoid getting a storm of timers if the guest sets
>>>> + * the period very short. We'll restart it if anything
>>>> + * changes.
>>>> + */
>>>> + if (!final)
>>>> + arm_next_watchdog(vcpu);
>>>
>>> Mind to explain this part a bit further?
>>
>> The whole function, or some subset near the end?
>>
>> The "if (!final)" check means that we stop running the timer after final
>> expiration, to prevent the host from being flooded with timers if the
>> guest sets a short period but does not have TCR set to exit to QEMU.
>> Timers will resume the next time TSR/TCR is updated.
>
> Ah. The semantics make sense. The comment however is slightly too short. Please explain this in a more verbose way, so someone who didn't write the code knows what's going on :).
>
>>
>>>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
>>>> }
>>>>
>>>> if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
>>>> + u32 old_tsr = vcpu->arch.tsr;
>>>> +
>>>> vcpu->arch.tsr = sregs->u.e.tsr;
>>>> +
>>>> + if ((old_tsr ^ vcpu->arch.tsr) &
>>>> + (TSR_ENW | TSR_WIS | TCR_WRC_MASK))
>>>> + arm_next_watchdog(vcpu);
>>>
>>> Why isn't this one guarded by vcpu->arch.watchdog_enable?
>>
>> I'm not sure that any of them should be -- there's no reason for the
>> watchdog interrupt mechanism to be dependent on QEMU, only the
>> heavyweight exit on final expiration.
>
> Well - I like the concept of having new features switchable.
The code quickly becomes an unmantainable mess if every new feature is
switchable (and where is the line between new feature and bugfix when it
comes to filling in missing emulation?). The reason for switchability
in this case is limited to QEMU interface compatibility.
The internal-to-KVM part of the watchdog isn't much different from the
FIT -- we're not going to conditionalize that, are we?
> Overlapping the "watchdog is implemented" feature with "user space
> wants watchdog exits" makes sense. But I definitely want to have a
> switch for the former, because we otherwise differ quite
> substantially from the emulation we had before.
It differs in that it is now a bit closer to being correct.
-Scott
WARNING: multiple messages have this Message-ID (diff)
From: Scott Wood <scottwood@freescale.com>
To: Alexander Graf <agraf@suse.de>
Cc: Bharat Bhushan <r65777@freescale.com>, <kvm-ppc@vger.kernel.org>,
<kvm@vger.kernel.org>,
Bharat Bhushan <Bharat.Bhushan@freescale.com>
Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
Date: Mon, 9 Jul 2012 12:09:10 -0500 [thread overview]
Message-ID: <4FFB1036.7060405@freescale.com> (raw)
In-Reply-To: <5A60FB43-C554-4B72-8D86-AEA927F17503@suse.de>
On 07/07/2012 02:50 AM, Alexander Graf wrote:
>
> On 07.07.2012, at 01:37, Scott Wood wrote:
>
>> On 07/06/2012 08:17 AM, Alexander Graf wrote:
>>> On 28.06.2012, at 08:17, Bharat Bhushan wrote:
>>>> +/*
>>>> + * The timer system can almost deal with LONG_MAX timeouts, except that
>>>> + * when you get very close to LONG_MAX, the slack added can cause overflow.
>>>> + *
>>>> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
>>>> + * any realistic use.
>>>> + */
>>>> +#define MAX_TIMEOUT (LONG_MAX/2)
>>>
>>> Should this really be in kvm code?
>>
>> It looks like we can use NEXT_TIMER_MAX_DELTA for this.
>>
>>>> + mask = 1ULL << (63 - period);
>>>> + tb = get_tb();
>>>> + if (tb & mask)
>>>> + nr_jiffies += mask;
>>>
>>> To be honest, you lost me here. nr_jiffies is jiffies, right? While
>>> mask is basically in timebase granularity. I suppose you're just
>>> reusing the variable here for no good reason - the compiler will
>>> gladly optimize things for you if you write things a bit more verbose
>>> :).
>>
>> Probably due to the way do_div() works, but yeah, it's confusing. Maybe
>> something generic like "ticks", "interval", "remaining", etc. would be
>> better, with a comment on the do_div saying it's converting timebase
>> ticks into jiffies.
>
> Well, you could start off with a variable "delta_tb", then do
>
> nr_jiffies = delta_tb;
> x = do_div(...);
>
> and things would suddenly become readable :). Of course I don't object to comments along the code either :).
>
>>
>>>> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + unsigned long nr_jiffies;
>>>> +
>>>> + nr_jiffies = watchdog_next_timeout(vcpu);
>>>> + if (nr_jiffies < MAX_TIMEOUT)
>>>> + mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
>>>> + else
>>>> + del_timer(&vcpu->arch.wdt_timer);
>>>
>>> Can you del a timer that's not armed? Could that ever happen in this case?
>>
>> "del_timer() deactivates a timer - this works on both active and
>> inactive timers."
>
> Ah, good :).
>
>>
>>> Also, could the timer possibly be running somewhere, so do we need del_timer_sync? Or don't we need to care?
>>
>> This can be called in the context of the timer, so del_timer_sync()
>> would hang.
>>
>> As for what would happen if a caller from a different context were to
>> race with a timer, I think you could end up with the timer armed based
>> on an old TCR. del_timer_sync() won't help though, unless you replace
>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see
>> whether it's running in timer context). A better solution is probably
>> to use a spinlock in arm_next_watchdog().
>
> Yup. Either way, we have a race that the guest might not expect.
>
>>
>>>> +void kvmppc_watchdog_func(unsigned long data)
>>>> +{
>>>> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
>>>> + u32 tsr, new_tsr;
>>>> + int final;
>>>> +
>>>> + do {
>>>> + new_tsr = tsr = vcpu->arch.tsr;
>>>> + final = 0;
>>>> +
>>>> + /* Time out event */
>>>> + if (tsr & TSR_ENW) {
>>>> + if (tsr & TSR_WIS) {
>>>> + new_tsr = (tsr & ~TCR_WRC_MASK) |
>>>> + (vcpu->arch.tcr & TCR_WRC_MASK);
>>>> + vcpu->arch.tcr &= ~TCR_WRC_MASK;
>>>> + final = 1;
>>>> + } else {
>>>> + new_tsr = tsr | TSR_WIS;
>>>> + }
>>>> + } else {
>>>> + new_tsr = tsr | TSR_ENW;
>>>> + }
>>>> + } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
>>>> +
>>>> + if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
>>>> + smp_wmb();
>>>> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
>>>> + kvm_vcpu_kick(vcpu);
>>>> + }
>>>> +
>>>> + /*
>>>> + * Avoid getting a storm of timers if the guest sets
>>>> + * the period very short. We'll restart it if anything
>>>> + * changes.
>>>> + */
>>>> + if (!final)
>>>> + arm_next_watchdog(vcpu);
>>>
>>> Mind to explain this part a bit further?
>>
>> The whole function, or some subset near the end?
>>
>> The "if (!final)" check means that we stop running the timer after final
>> expiration, to prevent the host from being flooded with timers if the
>> guest sets a short period but does not have TCR set to exit to QEMU.
>> Timers will resume the next time TSR/TCR is updated.
>
> Ah. The semantics make sense. The comment however is slightly too short. Please explain this in a more verbose way, so someone who didn't write the code knows what's going on :).
>
>>
>>>> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
>>>> }
>>>>
>>>> if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
>>>> + u32 old_tsr = vcpu->arch.tsr;
>>>> +
>>>> vcpu->arch.tsr = sregs->u.e.tsr;
>>>> +
>>>> + if ((old_tsr ^ vcpu->arch.tsr) &
>>>> + (TSR_ENW | TSR_WIS | TCR_WRC_MASK))
>>>> + arm_next_watchdog(vcpu);
>>>
>>> Why isn't this one guarded by vcpu->arch.watchdog_enable?
>>
>> I'm not sure that any of them should be -- there's no reason for the
>> watchdog interrupt mechanism to be dependent on QEMU, only the
>> heavyweight exit on final expiration.
>
> Well - I like the concept of having new features switchable.
The code quickly becomes an unmantainable mess if every new feature is
switchable (and where is the line between new feature and bugfix when it
comes to filling in missing emulation?). The reason for switchability
in this case is limited to QEMU interface compatibility.
The internal-to-KVM part of the watchdog isn't much different from the
FIT -- we're not going to conditionalize that, are we?
> Overlapping the "watchdog is implemented" feature with "user space
> wants watchdog exits" makes sense. But I definitely want to have a
> switch for the former, because we otherwise differ quite
> substantially from the emulation we had before.
It differs in that it is now a bit closer to being correct.
-Scott
next prev parent reply other threads:[~2012-07-09 17:09 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-28 6:17 [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation Bharat Bhushan
2012-06-28 6:29 ` Bharat Bhushan
2012-07-06 13:17 ` Alexander Graf
2012-07-06 13:17 ` Alexander Graf
2012-07-06 23:37 ` Scott Wood
2012-07-06 23:37 ` Scott Wood
2012-07-07 7:50 ` Alexander Graf
2012-07-07 7:50 ` Alexander Graf
2012-07-09 5:13 ` Bhushan Bharat-R65777
2012-07-09 5:13 ` Bhushan Bharat-R65777
2012-07-09 8:49 ` Alexander Graf
2012-07-09 8:49 ` Alexander Graf
2012-07-09 14:28 ` Scott Wood
2012-07-09 14:28 ` Scott Wood
2012-07-09 14:36 ` Alexander Graf
2012-07-09 14:36 ` Alexander Graf
2012-07-09 14:44 ` Bhushan Bharat-R65777
2012-07-09 14:44 ` Bhushan Bharat-R65777
2012-07-09 17:09 ` Scott Wood [this message]
2012-07-09 17:09 ` Scott Wood
2012-07-09 17:29 ` Alexander Graf
2012-07-09 17:29 ` Alexander Graf
2012-07-09 6:43 ` Bhushan Bharat-R65777
2012-07-09 6:43 ` Bhushan Bharat-R65777
2012-07-09 9:11 ` Alexander Graf
2012-07-09 9:11 ` Alexander Graf
-- strict thread matches above, loose matches on Subject: below --
2012-07-20 4:59 Bharat Bhushan
2012-07-20 5:11 ` Bharat Bhushan
2012-07-20 4:59 ` Bhushan Bharat-R65777
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FFB1036.7060405@freescale.com \
--to=scottwood@freescale.com \
--cc=Bharat.Bhushan@freescale.com \
--cc=agraf@suse.de \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=r65777@freescale.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.