Re: [PATCH] locking/osq: Drop the overload of osq lock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: panxinhui <xinhui@linux.vnet.ibm.com>,
	Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	dave@stgolabs.net, will.deacon@arm.com, Waiman.Long@hpe.com,
	benh@kernel.crashing.org
Subject: Re: [PATCH] locking/osq: Drop the overload of osq lock
Date: Mon, 27 Jun 2016 10:09:59 +0200	[thread overview]
Message-ID: <20160627080959.GU30154@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160627064506.GE6512@insomnia>

On Mon, Jun 27, 2016 at 02:45:06PM +0800, Boqun Feng wrote:
> +++ b/include/linux/vcpu_preempt.h
> @@ -0,0 +1,82 @@
> +/*
> + * Primitives for checking the vcpu preemption from the guest.
> + */
> +
> +static long __vcpu_preempt_count(void)
> +{
> +	return 0;
> +}
> +
> +static bool __vcpu_has_preempted(long vpc)
> +{
> +	return false;
> +}
> +
> +static bool __vcpu_is_preempted(int cpu)
> +{
> +	return false;
> +}
> +
> +struct vcpu_preempt_ops {
> +	/*
> +	 * Get the current vcpu's "preempt count", which is going to use for
> +	 * checking whether the current vcpu has ever been preempted
> +	 */
> +	long (*preempt_count)(void);
> +
> +	/*
> +	 * Return whether a vcpu is preempted
> +	 */
> +	bool (*is_preempted)(int cpu);
> +
> +	/*
> +	 * Given a "vcpu preempt count", Return whether a vcpu preemption ever
> +	 * happened after the .preempt_count() was called.
> +	 */
> +	bool (*has_preempted)(long vpc);
> +};
> +
> +extern struct vcpu_preempt_ops vcpu_preempt_ops;
> +
> +/* Default boilerplate */
> +#define DEFAULT_VCPU_PREEMPT_OPS			\
> +	{						\
> +		.preempt_count = __vcpu_preempt_count,	\
> +		.is_preempted = __vcpu_is_preempted,	\
> +		.has_preempted = __vcpu_has_preempted	\
> +	}
> +
> +#ifdef CONFIG_HAS_VCPU_PREEMPTION_DETECTION
> +/*
> + * vcpu_preempt_count: Get the current cpu's "vcpu preempt count"(vpc).
> + *
> + * The vpc is used for checking whether the current vcpu has ever been
> + * preempted via vcpu_has_preempted().
> + *
> + * This function and vcpu_has_preepmted() should be called in the same
> + * preemption disabled critical section.
> + */
> +#define vcpu_preempt_count()	vcpu_preempt_ops.preempt_count()
> +
> +/*
> + * vcpu_is_preempted: Check whether @cpu's vcpu is preempted.
> + */
> +#define vcpu_is_preempted(cpu)	vcpu_preempt_ops.is_preempted(cpu)
> +
> +/*
> + * vcpu_has_preepmted: Check whether the current cpu's vcpu has ever been
> + * preempted.
> + *
> + * The checked duration is between the vcpu_preempt_count() which returns @vpc
> + * is called and this function called.
> + *
> + * This function and corresponding vcpu_preempt_count() should be in the same
> + * preemption disabled cirtial section.
> + */
> +#define vcpu_has_preempted(vpc)	vcpu_preempt_ops.has_preempted(vpc)
> +
> +#else /* CONFIG_HAS_VCPU_PREEMPTION_DETECTION */
> +#define vcpu_preempt_count() __vcpu_preempt_count()
> +#define vcpu_is_preempted(cpu) __vcpu_is_preempted(cpu)
> +#define vcpu_has_preempted(vpc) __vcpu_has_preempted(vpc)
> +#endif /* CONFIG_HAS_VCPU_PREEPMTION_DETECTION */

No, this is entirely insane, also broken.

No vectors, no actual function calls, nothing like that. You want the
below to completely compile away and generate the exact 100% same code
it does today.

> +++ b/kernel/locking/osq_lock.c
> @@ -1,6 +1,7 @@
>  #include <linux/percpu.h>
>  #include <linux/sched.h>
>  #include <linux/osq_lock.h>
> +#include <linux/vcpu_preempt.h>
>  
>  /*
>   * An MCS like lock especially tailored for optimistic spinning for sleeping
> @@ -87,6 +88,8 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	struct optimistic_spin_node *prev, *next;
>  	int curr = encode_cpu(smp_processor_id());
>  	int old;
> +	int loops;
> +	long vpc;
>  
>  	node->locked = 0;
>  	node->next = NULL;
> @@ -106,6 +109,9 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	node->prev = prev;
>  	WRITE_ONCE(prev->next, node);
>  
> +	old = old - 1;

That's just nasty, and could result in an unconditional decrement being
issues, even though its never used.

> +	vpc = vcpu_preempt_count();
> +
>  	/*
>  	 * Normally @prev is untouchable after the above store; because at that
>  	 * moment unlock can proceed and wipe the node element from stack.
> @@ -118,8 +124,14 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	while (!READ_ONCE(node->locked)) {
>  		/*
>  		 * If we need to reschedule bail... so we can block.
> +		 * An over-committed guest with more vCPUs than pCPUs
> +		 * might fall in this loop and cause a huge overload.
> +		 * This is because vCPU A(prev) hold the osq lock and yield out,
> +		 * vCPU B(node) wait ->locked to be set, IOW, wait till
> +		 * vCPU A run and unlock the osq lock.
> +		 * NOTE that vCPU A and vCPU B might run on same physical cpu.
>  		 */
> -		if (need_resched())
> +		if (need_resched() || vcpu_is_preempted(old) || vcpu_has_preempted(vpc))
>  			goto unqueue;
>  
>  		cpu_relax_lowlatency();
>

next prev parent reply	other threads:[~2016-06-27  8:10 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-25 17:42 [PATCH] locking/osq: Drop the overload of osq lock Pan Xinhui
2016-06-25 14:24 ` Peter Zijlstra
2016-06-25 15:21   ` Boqun Feng
2016-06-25 16:09     ` Peter Zijlstra
2016-06-25 16:13       ` Peter Zijlstra
2016-06-25 17:27         ` panxinhui
2016-06-25 19:12           ` Peter Zijlstra
2016-06-26  4:59             ` panxinhui
2016-06-27  7:55               ` Peter Zijlstra
2016-06-27 10:19                 ` xinhui
2016-06-25 16:28       ` Boqun Feng
2016-06-25 18:28         ` Peter Zijlstra
2016-06-25 16:15     ` Peter Zijlstra
2016-06-25 16:45       ` Boqun Feng
2016-06-25 17:27         ` panxinhui
2016-06-25 19:20           ` Peter Zijlstra
2016-06-25 19:29             ` Peter Zijlstra
2016-06-26  2:26             ` Boqun Feng
2016-06-26  5:21             ` panxinhui
2016-06-26  6:10               ` Boqun Feng
2016-06-26  6:58                 ` panxinhui
2016-06-26 14:11                   ` Boqun Feng
2016-06-26 15:54                     ` panxinhui
2016-06-26  6:59                 ` Boqun Feng
2016-06-26  7:08                   ` panxinhui
2016-06-26 14:29                     ` Boqun Feng
2016-06-26 15:11                       ` panxinhui
2016-06-27  6:45                   ` Boqun Feng
2016-06-27  7:36                     ` xinhui
2016-06-27  8:09                     ` Peter Zijlstra [this message]
2016-06-27 10:31                       ` Boqun Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160627080959.GU30154@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=Waiman.Long@hpe.com \
    --cc=benh@kernel.crashing.org \
    --cc=boqun.feng@gmail.com \
    --cc=dave@stgolabs.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=will.deacon@arm.com \
    --cc=xinhui.pan@linux.vnet.ibm.com \
    --cc=xinhui@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.