From: Gleb Natapov <gleb@redhat.com>
To: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: jeremy@goop.org, gregkh@suse.de, kvm@vger.kernel.org,
linux-doc@vger.kernel.org, peterz@infradead.org,
drjones@redhat.com, virtualization@lists.linux-foundation.org,
andi@firstfloor.org, hpa@zytor.com,
stefano.stabellini@eu.citrix.com, xen-devel@lists.xensource.com,
x86@kernel.org, mingo@redhat.com, habanero@linux.vnet.ibm.com,
riel@redhat.com, konrad.wilk@oracle.com, ouyang@cs.pitt.edu,
avi.kivity@gmail.com, tglx@linutronix.de, chegu_vinod@hp.com,
linux-kernel@vger.kernel.org, srivatsa.vaddagiri@gmail.com,
attilio.rao@citrix.com, pbonzini@redhat.com,
torvalds@linux-foundation.org, stephan.diestelhorst@amd.com
Subject: Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
Date: Mon, 15 Jul 2013 13:36:49 +0300 [thread overview]
Message-ID: <20130715103648.GN11772@redhat.com> (raw)
In-Reply-To: <51E3C5CE.7000009@linux.vnet.ibm.com>
On Mon, Jul 15, 2013 at 03:20:06PM +0530, Raghavendra K T wrote:
> On 07/14/2013 06:42 PM, Gleb Natapov wrote:
> >On Mon, Jun 24, 2013 at 06:13:42PM +0530, Raghavendra K T wrote:
> >>kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
> >>
> >>From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> >>
> trimming
> [...]
> >>+
> >>+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> >>+{
> >>+ struct kvm_lock_waiting *w;
> >>+ int cpu;
> >>+ u64 start;
> >>+ unsigned long flags;
> >>+
> >>+ w = &__get_cpu_var(lock_waiting);
> >>+ cpu = smp_processor_id();
> >>+ start = spin_time_start();
> >>+
> >>+ /*
> >>+ * Make sure an interrupt handler can't upset things in a
> >>+ * partially setup state.
> >>+ */
> >>+ local_irq_save(flags);
> >>+
> >>+ /*
> >>+ * The ordering protocol on this is that the "lock" pointer
> >>+ * may only be set non-NULL if the "want" ticket is correct.
> >>+ * If we're updating "want", we must first clear "lock".
> >>+ */
> >>+ w->lock = NULL;
> >>+ smp_wmb();
> >>+ w->want = want;
> >>+ smp_wmb();
> >>+ w->lock = lock;
> >>+
> >>+ add_stats(TAKEN_SLOW, 1);
> >>+
> >>+ /*
> >>+ * This uses set_bit, which is atomic but we should not rely on its
> >>+ * reordering gurantees. So barrier is needed after this call.
> >>+ */
> >>+ cpumask_set_cpu(cpu, &waiting_cpus);
> >>+
> >>+ barrier();
> >>+
> >>+ /*
> >>+ * Mark entry to slowpath before doing the pickup test to make
> >>+ * sure we don't deadlock with an unlocker.
> >>+ */
> >>+ __ticket_enter_slowpath(lock);
> >>+
> >>+ /*
> >>+ * check again make sure it didn't become free while
> >>+ * we weren't looking.
> >>+ */
> >>+ if (ACCESS_ONCE(lock->tickets.head) == want) {
> >>+ add_stats(TAKEN_SLOW_PICKUP, 1);
> >>+ goto out;
> >>+ }
> >>+
> >>+ /* Allow interrupts while blocked */
> >>+ local_irq_restore(flags);
> >>+
> >So what happens if an interrupt comes here and an interrupt handler
> >takes another spinlock that goes into the slow path? As far as I see
> >lock_waiting will become overwritten and cpu will be cleared from
> >waiting_cpus bitmap by nested kvm_lock_spinning(), so when halt is
> >called here after returning from the interrupt handler nobody is going
> >to wake this lock holder. Next random interrupt will "fix" it, but it
> >may be several milliseconds away, or never. We should probably check
> >if interrupt were enabled and call native_safe_halt() here.
> >
>
> Okay you mean something like below should be done.
> if irq_enabled()
> native_safe_halt()
> else
> halt()
>
> It is been a complex stuff for analysis for me.
>
> So in our discussion stack would looking like this.
>
> spinlock()
> kvm_lock_spinning()
> <------ interrupt here
> halt()
>
>
> From the halt if we trace
>
It is to early to trace the halt since it was not executed yet. Guest
stack trace will look something like this:
spinlock(a)
kvm_lock_spinning(a)
lock_waiting = a
set bit in waiting_cpus
<------ interrupt here
spinlock(b)
kvm_lock_spinning(b)
lock_waiting = b
set bit in waiting_cpus
halt()
unset bit in waiting_cpus
lock_waiting = NULL
----------> ret from interrupt
halt()
Now at the time of the last halt above lock_waiting == NULL and
waiting_cpus is empty and not interrupt it pending, so who will unhalt
the waiter?
> halt()
> kvm_vcpu_block()
> kvm_arch_vcpu_runnable())
> kvm_make_request(KVM_REQ_UNHALT)
>
> This would drive us out of halt handler, and we are fine when it
> happens since we would revisit kvm_lock_spinning.
>
> But I see that
>
> kvm_arch_vcpu_runnable() has this condition
> (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_has_interrupt(vcpu));
>
> which means that if we going process the interrupt here we would set
> KVM_REQ_UNHALT. and we are fine.
>
> But if we are in the situation kvm_arch_interrupt_allowed(vcpu) =
> true, but we already processed interrupt and
> kvm_cpu_has_interrupt(vcpu) is false, we have problem till next
> random interrupt.
>
> The confusing part to me is the case
> kvm_cpu_has_interrupt(vcpu)=false and irq
> already handled and overwritten the lock_waiting. can this
> situation happen? or is it that we will process the interrupt only
> after this point (kvm_vcpu_block). Because if that is the case we are
> fine.
kvm_vcpu_block has nothing to do with processing interrupt. All the code in kvm_arch_vcpu_runnable()
is just to make sure that interrupt unhalts vcpu if vcpu is already in
halt. Interrupts are injected as soon as they happen and CPU is in a
right state to receive them and it will be after local_irq_restore()
before halt. X86 inhibits interrupt till next instruction after sti to
solve exactly this kind of races. native_safe_halt() evaluates to "sti;
hlt" to make interrupt enablement and halt atomic with regards to
interrupts and NMIs.
>
> Please let me know.
--
Gleb.
WARNING: multiple messages have this Message-ID (diff)
From: Gleb Natapov <gleb@redhat.com>
To: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: mingo@redhat.com, jeremy@goop.org, x86@kernel.org,
konrad.wilk@oracle.com, hpa@zytor.com, pbonzini@redhat.com,
linux-doc@vger.kernel.org, habanero@linux.vnet.ibm.com,
xen-devel@lists.xensource.com, peterz@infradead.org,
mtosatti@redhat.com, stefano.stabellini@eu.citrix.com,
andi@firstfloor.org, attilio.rao@citrix.com, ouyang@cs.pitt.edu,
gregkh@suse.de, agraf@suse.de, chegu_vinod@hp.com,
torvalds@linux-foundation.org, avi.kivity@gmail.com,
tglx@linutronix.de, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, stephan.diestelhorst@amd.com,
riel@redhat.com, drjones@redhat.com,
virtualization@lists.linux-foundation.org,
srivatsa.vaddagiri@gmail.com
Subject: Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
Date: Mon, 15 Jul 2013 13:36:49 +0300 [thread overview]
Message-ID: <20130715103648.GN11772@redhat.com> (raw)
In-Reply-To: <51E3C5CE.7000009@linux.vnet.ibm.com>
On Mon, Jul 15, 2013 at 03:20:06PM +0530, Raghavendra K T wrote:
> On 07/14/2013 06:42 PM, Gleb Natapov wrote:
> >On Mon, Jun 24, 2013 at 06:13:42PM +0530, Raghavendra K T wrote:
> >>kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
> >>
> >>From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> >>
> trimming
> [...]
> >>+
> >>+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
> >>+{
> >>+ struct kvm_lock_waiting *w;
> >>+ int cpu;
> >>+ u64 start;
> >>+ unsigned long flags;
> >>+
> >>+ w = &__get_cpu_var(lock_waiting);
> >>+ cpu = smp_processor_id();
> >>+ start = spin_time_start();
> >>+
> >>+ /*
> >>+ * Make sure an interrupt handler can't upset things in a
> >>+ * partially setup state.
> >>+ */
> >>+ local_irq_save(flags);
> >>+
> >>+ /*
> >>+ * The ordering protocol on this is that the "lock" pointer
> >>+ * may only be set non-NULL if the "want" ticket is correct.
> >>+ * If we're updating "want", we must first clear "lock".
> >>+ */
> >>+ w->lock = NULL;
> >>+ smp_wmb();
> >>+ w->want = want;
> >>+ smp_wmb();
> >>+ w->lock = lock;
> >>+
> >>+ add_stats(TAKEN_SLOW, 1);
> >>+
> >>+ /*
> >>+ * This uses set_bit, which is atomic but we should not rely on its
> >>+ * reordering gurantees. So barrier is needed after this call.
> >>+ */
> >>+ cpumask_set_cpu(cpu, &waiting_cpus);
> >>+
> >>+ barrier();
> >>+
> >>+ /*
> >>+ * Mark entry to slowpath before doing the pickup test to make
> >>+ * sure we don't deadlock with an unlocker.
> >>+ */
> >>+ __ticket_enter_slowpath(lock);
> >>+
> >>+ /*
> >>+ * check again make sure it didn't become free while
> >>+ * we weren't looking.
> >>+ */
> >>+ if (ACCESS_ONCE(lock->tickets.head) == want) {
> >>+ add_stats(TAKEN_SLOW_PICKUP, 1);
> >>+ goto out;
> >>+ }
> >>+
> >>+ /* Allow interrupts while blocked */
> >>+ local_irq_restore(flags);
> >>+
> >So what happens if an interrupt comes here and an interrupt handler
> >takes another spinlock that goes into the slow path? As far as I see
> >lock_waiting will become overwritten and cpu will be cleared from
> >waiting_cpus bitmap by nested kvm_lock_spinning(), so when halt is
> >called here after returning from the interrupt handler nobody is going
> >to wake this lock holder. Next random interrupt will "fix" it, but it
> >may be several milliseconds away, or never. We should probably check
> >if interrupt were enabled and call native_safe_halt() here.
> >
>
> Okay you mean something like below should be done.
> if irq_enabled()
> native_safe_halt()
> else
> halt()
>
> It is been a complex stuff for analysis for me.
>
> So in our discussion stack would looking like this.
>
> spinlock()
> kvm_lock_spinning()
> <------ interrupt here
> halt()
>
>
> From the halt if we trace
>
It is to early to trace the halt since it was not executed yet. Guest
stack trace will look something like this:
spinlock(a)
kvm_lock_spinning(a)
lock_waiting = a
set bit in waiting_cpus
<------ interrupt here
spinlock(b)
kvm_lock_spinning(b)
lock_waiting = b
set bit in waiting_cpus
halt()
unset bit in waiting_cpus
lock_waiting = NULL
----------> ret from interrupt
halt()
Now at the time of the last halt above lock_waiting == NULL and
waiting_cpus is empty and not interrupt it pending, so who will unhalt
the waiter?
> halt()
> kvm_vcpu_block()
> kvm_arch_vcpu_runnable())
> kvm_make_request(KVM_REQ_UNHALT)
>
> This would drive us out of halt handler, and we are fine when it
> happens since we would revisit kvm_lock_spinning.
>
> But I see that
>
> kvm_arch_vcpu_runnable() has this condition
> (kvm_arch_interrupt_allowed(vcpu) && kvm_cpu_has_interrupt(vcpu));
>
> which means that if we going process the interrupt here we would set
> KVM_REQ_UNHALT. and we are fine.
>
> But if we are in the situation kvm_arch_interrupt_allowed(vcpu) =
> true, but we already processed interrupt and
> kvm_cpu_has_interrupt(vcpu) is false, we have problem till next
> random interrupt.
>
> The confusing part to me is the case
> kvm_cpu_has_interrupt(vcpu)=false and irq
> already handled and overwritten the lock_waiting. can this
> situation happen? or is it that we will process the interrupt only
> after this point (kvm_vcpu_block). Because if that is the case we are
> fine.
kvm_vcpu_block has nothing to do with processing interrupt. All the code in kvm_arch_vcpu_runnable()
is just to make sure that interrupt unhalts vcpu if vcpu is already in
halt. Interrupts are injected as soon as they happen and CPU is in a
right state to receive them and it will be after local_irq_restore()
before halt. X86 inhibits interrupt till next instruction after sti to
solve exactly this kind of races. native_safe_halt() evaluates to "sti;
hlt" to make interrupt enablement and halt atomic with regards to
interrupts and NMIs.
>
> Please let me know.
--
Gleb.
next prev parent reply other threads:[~2013-07-15 10:36 UTC|newest]
Thread overview: 125+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-24 12:40 [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks Raghavendra K T
2013-06-24 12:40 ` Raghavendra K T
2013-06-24 12:40 ` [PATCH RFC V10 1/18] x86/spinlock: Replace pv spinlocks with pv ticketlocks Raghavendra K T
2013-06-24 12:40 ` Raghavendra K T
2013-06-24 12:40 ` Raghavendra K T
2013-06-24 12:40 ` [PATCH RFC V10 2/18] x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks Raghavendra K T
2013-06-24 12:40 ` Raghavendra K T
2013-06-24 12:40 ` Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 3/18] x86/ticketlock: Collapse a layer of functions Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 4/18] xen: Defer spinlock setup until boot CPU setup Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 5/18] xen/pvticketlock: Xen implementation for PV ticket locks Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 6/18] xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:41 ` Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 7/18] x86/pvticketlock: Use callee-save for lock_spinning Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 8/18] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2 Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 9/18] jump_label: Split out rate limiting from jump_label.h Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 10/18] x86/ticketlock: Add slowpath logic Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 11/18] xen/pvticketlock: Allow interrupts to be enabled while blocking Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:42 ` Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 12/18] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-07-14 13:48 ` Gleb Natapov
2013-07-14 13:48 ` Gleb Natapov
2013-07-15 5:53 ` Raghavendra K T
2013-07-15 5:53 ` Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 13/18] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 14/18] kvm guest : Add configuration support to enable debug information for KVM Guests Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-07-14 13:12 ` Gleb Natapov
2013-07-14 13:12 ` Gleb Natapov
2013-07-15 9:50 ` Raghavendra K T
2013-07-15 9:50 ` Raghavendra K T
2013-07-15 10:36 ` Gleb Natapov [this message]
2013-07-15 10:36 ` Gleb Natapov
2013-07-16 3:37 ` Raghavendra K T
2013-07-16 3:37 ` Raghavendra K T
2013-07-16 6:02 ` Gleb Natapov
2013-07-16 6:02 ` Gleb Natapov
2013-07-16 15:48 ` Peter Zijlstra
2013-07-16 15:48 ` Peter Zijlstra
2013-07-16 16:31 ` Gleb Natapov
2013-07-16 16:31 ` Gleb Natapov
2013-07-16 18:49 ` Raghavendra K T
2013-07-16 18:49 ` Raghavendra K T
2013-07-16 18:42 ` Raghavendra K T
2013-07-17 9:34 ` Gleb Natapov
2013-07-17 9:34 ` Gleb Natapov
2013-07-17 10:05 ` Raghavendra K T
2013-07-17 10:05 ` Raghavendra K T
2013-07-17 10:38 ` Raghavendra K T
2013-07-17 10:38 ` Raghavendra K T
2013-07-17 12:45 ` Gleb Natapov
2013-07-17 12:45 ` Gleb Natapov
2013-07-17 12:55 ` Raghavendra K T
2013-07-17 12:55 ` Raghavendra K T
2013-07-17 13:25 ` Gleb Natapov
2013-07-17 13:25 ` Gleb Natapov
2013-07-17 14:13 ` Raghavendra K T
2013-07-17 14:13 ` Raghavendra K T
2013-07-17 14:14 ` Raghavendra K T
2013-07-17 14:14 ` Raghavendra K T
2013-07-17 14:14 ` Raghavendra K T
2013-07-17 14:44 ` Gleb Natapov
2013-07-17 14:44 ` Gleb Natapov
2013-07-17 14:55 ` Raghavendra K T
2013-07-17 14:55 ` Raghavendra K T
2013-07-17 15:11 ` Gleb Natapov
2013-07-17 15:11 ` Gleb Natapov
2013-07-17 15:22 ` Raghavendra K T
2013-07-17 15:22 ` Raghavendra K T
2013-07-17 15:20 ` Raghavendra K T
2013-07-17 15:20 ` Raghavendra K T
2013-07-16 18:42 ` Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 16/18] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-06-24 12:43 ` Raghavendra K T
2013-07-14 13:24 ` Gleb Natapov
2013-07-14 13:24 ` Gleb Natapov
2013-07-15 15:36 ` Raghavendra K T
2013-07-15 15:36 ` Raghavendra K T
2013-07-15 15:46 ` Gleb Natapov
2013-07-15 15:46 ` Gleb Natapov
2013-07-16 18:19 ` Raghavendra K T
2013-07-16 18:19 ` Raghavendra K T
2013-06-24 12:44 ` [PATCH RFC V10 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock Raghavendra K T
2013-06-24 12:44 ` Raghavendra K T
2013-06-24 12:44 ` Raghavendra K T
2013-06-24 12:44 ` [PATCH RFC V10 18/18] kvm hypervisor: Add directed yield in vcpu block path Raghavendra K T
2013-06-24 12:44 ` Raghavendra K T
2013-06-24 12:44 ` Raghavendra K T
2013-07-14 14:18 ` Gleb Natapov
2013-07-14 14:18 ` Gleb Natapov
2013-07-15 6:04 ` Raghavendra K T
2013-07-15 6:04 ` Raghavendra K T
2013-06-24 13:17 ` [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks Andrew Jones
2013-06-24 13:17 ` Andrew Jones
2013-06-24 13:49 ` Raghavendra K T
2013-06-24 13:49 ` Raghavendra K T
2013-06-26 8:33 ` Raghavendra K T
2013-06-26 8:33 ` Raghavendra K T
2013-06-27 11:47 ` Raghavendra K T
2013-06-27 11:47 ` Raghavendra K T
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130715103648.GN11772@redhat.com \
--to=gleb@redhat.com \
--cc=andi@firstfloor.org \
--cc=attilio.rao@citrix.com \
--cc=avi.kivity@gmail.com \
--cc=chegu_vinod@hp.com \
--cc=drjones@redhat.com \
--cc=gregkh@suse.de \
--cc=habanero@linux.vnet.ibm.com \
--cc=hpa@zytor.com \
--cc=jeremy@goop.org \
--cc=konrad.wilk@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=ouyang@cs.pitt.edu \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@linux.vnet.ibm.com \
--cc=riel@redhat.com \
--cc=srivatsa.vaddagiri@gmail.com \
--cc=stefano.stabellini@eu.citrix.com \
--cc=stephan.diestelhorst@amd.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=virtualization@lists.linux-foundation.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.