xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: mingo@redhat.com, jeremy@goop.org, x86@kernel.org,
	konrad.wilk@oracle.com, hpa@zytor.com, pbonzini@redhat.com,
	linux-doc@vger.kernel.org, habanero@linux.vnet.ibm.com,
	xen-devel@lists.xensource.com, peterz@infradead.org,
	mtosatti@redhat.com, stefano.stabellini@eu.citrix.com,
	andi@firstfloor.org, attilio.rao@citrix.com, ouyang@cs.pitt.edu,
	gregkh@suse.de, agraf@suse.de, chegu_vinod@hp.com,
	torvalds@linux-foundation.org, avi.kivity@gmail.com,
	tglx@linutronix.de, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, stephan.diestelhorst@amd.com,
	riel@redhat.com, drjones@redhat.com,
	virtualization@lists.linux-foundation.org,
	srivatsa.vaddagiri@gmail.com
Subject: Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
Date: Tue, 16 Jul 2013 09:07:53 +0530	[thread overview]
Message-ID: <51E4C011.4060803@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130715103648.GN11772@redhat.com>

On 07/15/2013 04:06 PM, Gleb Natapov wrote:
> On Mon, Jul 15, 2013 at 03:20:06PM +0530, Raghavendra K T wrote:
>> On 07/14/2013 06:42 PM, Gleb Natapov wrote:
>>> On Mon, Jun 24, 2013 at 06:13:42PM +0530, Raghavendra K T wrote:
>>>> kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor
>>>>
>>>> From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
>>>>
>> trimming
>> [...]
>>>> +
>>>> +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>>>> +{
>>>> +	struct kvm_lock_waiting *w;
>>>> +	int cpu;
>>>> +	u64 start;
>>>> +	unsigned long flags;
>>>> +
>>>> +	w = &__get_cpu_var(lock_waiting);
>>>> +	cpu = smp_processor_id();
>>>> +	start = spin_time_start();
>>>> +
>>>> +	/*
>>>> +	 * Make sure an interrupt handler can't upset things in a
>>>> +	 * partially setup state.
>>>> +	 */
>>>> +	local_irq_save(flags);
>>>> +
>>>> +	/*
>>>> +	 * The ordering protocol on this is that the "lock" pointer
>>>> +	 * may only be set non-NULL if the "want" ticket is correct.
>>>> +	 * If we're updating "want", we must first clear "lock".
>>>> +	 */
>>>> +	w->lock = NULL;
>>>> +	smp_wmb();
>>>> +	w->want = want;
>>>> +	smp_wmb();
>>>> +	w->lock = lock;
>>>> +
>>>> +	add_stats(TAKEN_SLOW, 1);
>>>> +
>>>> +	/*
>>>> +	 * This uses set_bit, which is atomic but we should not rely on its
>>>> +	 * reordering gurantees. So barrier is needed after this call.
>>>> +	 */
>>>> +	cpumask_set_cpu(cpu, &waiting_cpus);
>>>> +
>>>> +	barrier();
>>>> +
>>>> +	/*
>>>> +	 * Mark entry to slowpath before doing the pickup test to make
>>>> +	 * sure we don't deadlock with an unlocker.
>>>> +	 */
>>>> +	__ticket_enter_slowpath(lock);
>>>> +
>>>> +	/*
>>>> +	 * check again make sure it didn't become free while
>>>> +	 * we weren't looking.
>>>> +	 */
>>>> +	if (ACCESS_ONCE(lock->tickets.head) == want) {
>>>> +		add_stats(TAKEN_SLOW_PICKUP, 1);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	/* Allow interrupts while blocked */
>>>> +	local_irq_restore(flags);
>>>> +
>>> So what happens if an interrupt comes here and an interrupt handler
>>> takes another spinlock that goes into the slow path? As far as I see
>>> lock_waiting will become overwritten and cpu will be cleared from
>>> waiting_cpus bitmap by nested kvm_lock_spinning(), so when halt is
>>> called here after returning from the interrupt handler nobody is going
>>> to wake this lock holder. Next random interrupt will "fix" it, but it
>>> may be several milliseconds away, or never. We should probably check
>>> if interrupt were enabled and call native_safe_halt() here.
>>>
>>
>> Okay you mean something like below should be done.
>> if irq_enabled()
>>    native_safe_halt()
>> else
>>    halt()
>>
>> It is been a complex stuff for analysis for me.
>>
>> So in our discussion stack would looking like this.
>>
>> spinlock()
>>    kvm_lock_spinning()
>>                    <------ interrupt here
>>            halt()
>>
>>
>>  From the halt if we trace
>>
> It is to early to trace the halt since it was not executed yet. Guest
> stack trace will look something like this:
>
> spinlock(a)
>    kvm_lock_spinning(a)
>     lock_waiting = a
>     set bit in waiting_cpus
>                  <------ interrupt here
>                  spinlock(b)
>                    kvm_lock_spinning(b)
>                      lock_waiting = b
>                      set bit in waiting_cpus
>                      halt()
>                      unset bit in waiting_cpus
>                      lock_waiting = NULL
>       ----------> ret from interrupt
>     halt()
>
> Now at the time of the last halt above lock_waiting == NULL and
> waiting_cpus is empty and not interrupt it pending, so who will unhalt
> the waiter?
>

Yes. if an interrupt occurs between
local_irq_restore() and halt(), this is possible. and since this is 
rarest of rare (possiility of irq entering slowpath and then no random 
irq to do spurious wakeup), we had never hit this problem in the past.

So I am,
1. trying to artificially reproduce this.

2. I replaced the halt with below code,
        if (arch_irqs_disabled())
                 halt();

and ran benchmarks.
But this results in degradation because, it means we again go back and 
spin in irq enabled case.

3. Now I am analyzing the performance overhead of safe_halt in irq 
enabled case.
       if (arch_irqs_disabled())
                halt();
       else
                safe_halt();


  reply	other threads:[~2013-07-16  3:37 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-24 12:40 [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks Raghavendra K T
2013-06-24 12:40 ` [PATCH RFC V10 1/18] x86/spinlock: Replace pv spinlocks with pv ticketlocks Raghavendra K T
2013-06-24 12:40 ` [PATCH RFC V10 2/18] x86/ticketlock: Don't inline _spin_unlock when using paravirt spinlocks Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 3/18] x86/ticketlock: Collapse a layer of functions Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 4/18] xen: Defer spinlock setup until boot CPU setup Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 5/18] xen/pvticketlock: Xen implementation for PV ticket locks Raghavendra K T
2013-06-24 12:41 ` [PATCH RFC V10 6/18] xen/pvticketlocks: Add xen_nopvspin parameter to disable xen pv ticketlocks Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 7/18] x86/pvticketlock: Use callee-save for lock_spinning Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 8/18] x86/pvticketlock: When paravirtualizing ticket locks, increment by 2 Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 9/18] jump_label: Split out rate limiting from jump_label.h Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 10/18] x86/ticketlock: Add slowpath logic Raghavendra K T
2013-06-24 12:42 ` [PATCH RFC V10 11/18] xen/pvticketlock: Allow interrupts to be enabled while blocking Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 12/18] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks Raghavendra K T
2013-07-14 13:48   ` Gleb Natapov
2013-07-15  5:53     ` Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 13/18] kvm : Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 14/18] kvm guest : Add configuration support to enable debug information for KVM Guests Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor Raghavendra K T
2013-07-14 13:12   ` Gleb Natapov
2013-07-15  9:50     ` Raghavendra K T
2013-07-15 10:36       ` Gleb Natapov
2013-07-16  3:37         ` Raghavendra K T [this message]
2013-07-16  6:02           ` Gleb Natapov
2013-07-16 15:48             ` Peter Zijlstra
2013-07-16 16:31               ` Gleb Natapov
2013-07-16 18:49               ` Raghavendra K T
2013-07-16 18:42             ` Raghavendra K T
2013-07-17  9:34               ` Gleb Natapov
2013-07-17 10:05                 ` Raghavendra K T
2013-07-17 10:38                   ` Raghavendra K T
2013-07-17 12:45                   ` Gleb Natapov
2013-07-17 12:55                     ` Raghavendra K T
2013-07-17 13:25                       ` Gleb Natapov
2013-07-17 14:13                         ` Raghavendra K T
2013-07-17 14:14                           ` Raghavendra K T
2013-07-17 14:44                           ` Gleb Natapov
2013-07-17 14:55                             ` Raghavendra K T
2013-07-17 15:11                               ` Gleb Natapov
2013-07-17 15:22                                 ` Raghavendra K T
2013-07-17 15:20                               ` Raghavendra K T
2013-06-24 12:43 ` [PATCH RFC V10 16/18] kvm hypervisor : Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic Raghavendra K T
2013-07-14 13:24   ` Gleb Natapov
2013-07-15 15:36     ` Raghavendra K T
2013-07-15 15:46       ` Gleb Natapov
2013-07-16 18:19         ` Raghavendra K T
2013-06-24 12:44 ` [PATCH RFC V10 17/18] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock Raghavendra K T
2013-06-24 12:44 ` [PATCH RFC V10 18/18] kvm hypervisor: Add directed yield in vcpu block path Raghavendra K T
2013-07-14 14:18   ` Gleb Natapov
2013-07-15  6:04     ` Raghavendra K T
2013-06-24 13:17 ` [PATCH RFC V10 0/18] Paravirtualized ticket spinlocks Andrew Jones
2013-06-24 13:49   ` Raghavendra K T
2013-06-26  8:33   ` Raghavendra K T
2013-06-27 11:47     ` Raghavendra K T

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51E4C011.4060803@linux.vnet.ibm.com \
    --to=raghavendra.kt@linux.vnet.ibm.com \
    --cc=agraf@suse.de \
    --cc=andi@firstfloor.org \
    --cc=attilio.rao@citrix.com \
    --cc=avi.kivity@gmail.com \
    --cc=chegu_vinod@hp.com \
    --cc=drjones@redhat.com \
    --cc=gleb@redhat.com \
    --cc=gregkh@suse.de \
    --cc=habanero@linux.vnet.ibm.com \
    --cc=hpa@zytor.com \
    --cc=jeremy@goop.org \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=ouyang@cs.pitt.edu \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=srivatsa.vaddagiri@gmail.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=stephan.diestelhorst@amd.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).