* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: Balbir Singh @ 2016-07-06 10:54 UTC (permalink / raw)
To: Pan Xinhui, linux-kernel, linuxppc-dev, virtualization,
linux-s390
Cc: dave, peterz, mpe, boqun.feng, will.deacon, waiman.long, mingo,
paulus, benh, schwidefsky, paulmck
In-Reply-To: <1467124991-13164-3-git-send-email-xinhui.pan@linux.vnet.ibm.com>
On Tue, 2016-06-28 at 10:43 -0400, Pan Xinhui wrote:
> This is to fix some lock holder preemption issues. Some other locks
> implementation do a spin loop before acquiring the lock itself. Currently
> kernel has an interface of bool vcpu_is_preempted(int cpu). It take the cpu
^^ takes
> as parameter and return true if the cpu is preempted. Then kernel can break
> the spin loops upon on the retval of vcpu_is_preempted.
>
> As kernel has used this interface, So lets support it.
>
> Only pSeries need supoort it. And the fact is powerNV are built into same
^^ support
> kernel image with pSeries. So we need return false if we are runnig as
> powerNV. The another fact is that lppaca->yiled_count keeps zero on
^^ yield
> powerNV. So we can just skip the machine type.
>
> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 523673d..3ac9fcb 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -52,6 +52,24 @@
> #define SYNC_IO
> #endif
>
> +/*
> + * This support kernel to check if one cpu is preempted or not.
> + * Then we can fix some lock holder preemption issue.
> + */
> +#ifdef CONFIG_PPC_PSERIES
> +#define vcpu_is_preempted vcpu_is_preempted
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> + /*
> + * pSeries and powerNV can be built into same kernel image. In
> + * principle we need return false directly if we are running as
> + * powerNV. However the yield_count is always zero on powerNV, So
> + * skip such machine type check
Or you could use the ppc_md interface callbacks if required, but your
solution works as well.
> + */
> + return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
> +}
> +#endif
> +
> static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> {
> return lock.slock == 0;
Balbir Singh.
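
A minimal userspace sketch of the parity test in the hunk above, assuming (as the patch's return statement does) that an odd yield_count means the vcpu is currently preempted and that a count stuck at zero, as on powerNV, therefore always reads as "not preempted". The demo_* names are hypothetical stand-ins and not the kernel's lppaca API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-vcpu area shared with the hypervisor. */
struct demo_lppaca {
	uint32_t yield_count_be;	/* stored big-endian, as the patch implies */
};

/* Portable big-endian-to-host conversion of a 32-bit value. */
static uint32_t demo_be32_to_cpu(uint32_t be)
{
	const uint8_t *b = (const uint8_t *)&be;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Host-to-big-endian conversion, only used to build example data. */
static uint32_t demo_cpu_to_be32(uint32_t v)
{
	uint32_t be;
	uint8_t *b = (uint8_t *)&be;

	b[0] = (uint8_t)(v >> 24);
	b[1] = (uint8_t)(v >> 16);
	b[2] = (uint8_t)(v >> 8);
	b[3] = (uint8_t)v;
	return be;
}

/* Assumption taken from the patch: odd count means "preempted". */
static bool demo_vcpu_is_preempted(const struct demo_lppaca *area)
{
	return (demo_be32_to_cpu(area->yield_count_be) & 1) != 0;
}
```

With a count of 0 (the powerNV case described in the commit message) the helper returns false, which is why no machine type check is needed in this scheme.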
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Paolo Bonzini @ 2016-07-06 10:44 UTC (permalink / raw)
To: Peter Zijlstra, Pan Xinhui
Cc: linux-s390, dave, mpe, boqun.feng, will.deacon, linux-kernel,
waiman.long, virtualization, mingo, paulus, benh, schwidefsky,
paulmck, linuxppc-dev
In-Reply-To: <20160706065255.GH30909@twins.programming.kicks-ass.net>
On 06/07/2016 08:52, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> changes from v1:
>> a simpler definition of the default vcpu_is_preempted
>> skip machine type check on ppc, and add config. remove dedicated macro.
>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> add more comments
>> thanks to boqun and Peter for the suggestions.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>
>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused rcu stalls before we applied this patch set.
>>
>
> Paolo, could you help out with an (x86) KVM interface for this?
If it's just for spin loops, you can check if the version field in the
steal time structure has changed.
Paolo
> Waiman, could you see if you can utilize this to get rid of the
> SPIN_THRESHOLD in qspinlock_paravirt?
>
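
A rough sketch of the version-field idea suggested above, assuming the hypervisor bumps a version counter in the per-vcpu steal time record whenever it updates that record around scheduling the vcpu. The structure and helper names are hypothetical; this is not the KVM guest API, only the shape of the check a spin loop could make.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a per-vcpu steal time record. */
struct demo_steal_time {
	uint32_t version;	/* assumed to change when the record is updated */
};

/* Taken once, before the spinner starts waiting on the target vcpu. */
static uint32_t demo_snapshot_version(const volatile struct demo_steal_time *st)
{
	return st->version;
}

/*
 * Checked on each spin iteration: a changed version hints that the
 * target vcpu was scheduled out (or back in) since the snapshot, so
 * the spinner should stop spinning and re-evaluate.
 */
static bool demo_should_stop_spinning(const volatile struct demo_steal_time *st,
				      uint32_t snapshot)
{
	return st->version != snapshot;
}
```

Note that this only answers "did something change while I was spinning", which is enough for breaking out of a spin loop but is weaker than a direct "is this vcpu preempted right now" query.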
* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: xinhui @ 2016-07-06 10:18 UTC (permalink / raw)
To: Wanpeng Li, Peter Zijlstra
Cc: linux-s390, Davidlohr Bueso, kvm, mpe, boqun.feng, will.deacon,
linux-kernel@vger.kernel.org, Waiman Long, virtualization,
Ingo Molnar, Paul Mackerras, benh, schwidefsky, Paolo Bonzini,
Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+Cz5-XGDTNkZ_RcoX+gxF_+JDpUB6OB+ECOQcScuaeKfGA@mail.gmail.com>
On 2016年07月06日 16:32, Wanpeng Li wrote:
> 2016-07-06 15:58 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
>> On Wed, Jul 06, 2016 at 02:46:34PM +0800, Wanpeng Li wrote:
>>>> So it's easy for ppc to implement such an interface. Note that yield_count is
>>>> set by powerVM/KVM,
>>>> and only pSeries can run a guest for now. :)
>>>>
>>>> I also reviewed the x86 related code; it looks like we need to add one hypercall to get
>>>> such vcpu preemption info?
>>>
>>> There is nothing in x86 KVM that records the lock holder. Maybe we
>>> would not need to depend on the PLE handler algorithm to guess it if we
>>> could know the lock holder vCPU directly.
>>
>> x86/kvm has vcpu->preempted to indicate if a vcpu is currently preempted
>> or not. I'm just not sure if that is visible to the guest or how to make
>> it so.
>
> Yeah, I missed it. I can volunteer to do it if there are any ideas,
> ping Paolo. :)
>
Glad to know that. :)
> Regards,
> Wanpeng Li
>
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: xinhui @ 2016-07-06 10:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-s390, dave, mpe, boqun.feng, will.deacon, linux-kernel,
waiman.long, virtualization, mingo, paulus, benh, schwidefsky,
pbonzini, paulmck, linuxppc-dev
In-Reply-To: <20160706065255.GH30909@twins.programming.kicks-ass.net>
On 2016年07月06日 14:52, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> changes from v1:
>> a simpler definition of the default vcpu_is_preempted
>> skip machine type check on ppc, and add config. remove dedicated macro.
>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> add more comments
>> thanks to boqun and Peter for the suggestions.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>
>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused rcu stalls before we applied this patch set.
>>
>
> Paolo, could you help out with an (x86) KVM interface for this?
>
> Waiman, could you see if you can utilize this to get rid of the
> SPIN_THRESHOLD in qspinlock_paravirt?
>
Hmm, maybe something like the patch below. A wait_node can go into pv_wait() earlier, as soon as the previous cpu is preempted.
But for the wait_head, since qspinlock does not record the lock holder correctly (because of lock stealing), the vcpu preemption check might give wrong results.
Waiman, I have used a hash table to keep track of the lock holder in my ppc implementation patch. I think we could do something similar in generic code?
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 74c4a86..40560e8 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -312,7 +312,8 @@ pv_wait_early(struct pv_node *prev, int loop)
if ((loop & PV_PREV_CHECK_MASK) != 0)
return false;
- return READ_ONCE(prev->state) != vcpu_running;
+ return READ_ONCE(prev->state) != vcpu_running ||
+ vcpu_is_preempted(prev->cpu);
}
/*
* Re: [PATCH] virtio: Return correct errno for function init_vq's failure
From: Cornelia Huck @ 2016-07-06 9:18 UTC (permalink / raw)
To: Minfei Huang, mst; +Cc: virtualization, linux-kernel, Minfei Huang
In-Reply-To: <1466993358-12064-1-git-send-email-mnghuan@gmail.com>
On Mon, 27 Jun 2016 10:09:18 +0800
Minfei Huang <mnghuan@gmail.com> wrote:
> The error number -ENOENT or 0 will be returned if we cannot allocate
> more memory in function init_vq. If the host supports multiple virtual
> queues and we fail to allocate the necessary memory structures for the vqs,
> the kernel may crash because of the incorrect return value.
>
> To fix it, the kernel will return the correct value in init_vq.
>
> Signed-off-by: Minfei Huang <mnghuan@gmail.com>
> Signed-off-by: Minfei Huang <minfei.hmf@alibaba-inc.com>
> ---
> drivers/block/virtio_blk.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 42758b5..40ecb2b 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -393,11 +393,10 @@ static int init_vq(struct virtio_blk *vblk)
> if (err)
> num_vqs = 1;
>
> + err = -ENOMEM;
> vblk->vqs = kmalloc(sizeof(*vblk->vqs) * num_vqs, GFP_KERNEL);
> - if (!vblk->vqs) {
> - err = -ENOMEM;
> + if (!vblk->vqs)
> goto out;
> - }
>
> names = kmalloc(sizeof(*names) * num_vqs, GFP_KERNEL);
> if (!names)
The error handling in this function looks horrible.
When mq was introduced, init_vq started mixing up several things:
- The mq feature is not available - which is not an error, and
therefore should not have any influence on the return code.
- One of the several memory allocations failed - only ->vqs gets
special treatment, however.
- The ->find_vqs callback failed.
Your patch fixes the code, but it is still very convoluted due to the
temporary arrays.
Might it be worthwhile to introduce a helper for setting up the virtqueues
where all virtqueues are essentially the same and just get a
consecutive number? Michael?
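
The three cases listed above can be kept apart with a single cleanup path. The following is only a userspace sketch under stated assumptions: struct demo_blk, demo_init_vq() and the callback type are hypothetical stand-ins for the driver's structures and the ->find_vqs callback, and calloc() stands in for kmalloc().

```c
#include <errno.h>
#include <stdlib.h>

struct demo_vq { int index; };

struct demo_blk {
	struct demo_vq *vqs;
	unsigned int num_vqs;
};

/* Hypothetical stand-in for the ->find_vqs callback. */
typedef int (*demo_find_vqs_fn)(struct demo_vq *vqs, const char **names,
				unsigned int n);

/* Example callbacks: one that succeeds, one that fails. */
static int demo_find_ok(struct demo_vq *vqs, const char **names, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		vqs[i].index = (int)i;
		names[i] = "req";
	}
	return 0;
}

static int demo_find_fail(struct demo_vq *vqs, const char **names, unsigned int n)
{
	(void)vqs; (void)names; (void)n;
	return -EIO;
}

static int demo_init_vq(struct demo_blk *blk, int mq_supported,
			unsigned int mq_num, demo_find_vqs_fn find_vqs)
{
	/* A missing mq feature is not an error: fall back to one queue. */
	unsigned int num_vqs = (mq_supported && mq_num) ? mq_num : 1;
	struct demo_vq *vqs = calloc(num_vqs, sizeof(*vqs));
	const char **names = calloc(num_vqs, sizeof(*names));
	int err;

	/* Every allocation failure gets the same treatment. */
	if (!vqs || !names) {
		err = -ENOMEM;
		goto out_free;
	}

	/* A callback failure is passed through unchanged. */
	err = find_vqs(vqs, names, num_vqs);
	if (err)
		goto out_free;

	blk->vqs = vqs;
	blk->num_vqs = num_vqs;
	free(names);	/* the name array is only needed temporarily */
	return 0;

out_free:
	free(names);
	free(vqs);
	return err;
}
```

On success the caller owns blk->vqs; on any failure nothing is left allocated and blk is not modified.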
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Juergen Gross @ 2016-07-06 8:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-s390, xen-devel-request, dave, benh, Pan Xinhui, boqun.feng,
will.deacon, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, schwidefsky, pbonzini, paulmck, linuxppc-dev
In-Reply-To: <20160706081920.GG30921@twins.programming.kicks-ass.net>
On 06/07/16 10:19, Peter Zijlstra wrote:
> On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote:
>> On 06/07/16 08:52, Peter Zijlstra wrote:
>
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Xen support of this interface should be rather easy. Could you please
>> Cc: xen-devel-request@lists.xenproject.org in the next version?
>
> So meta question; aren't all you virt people looking at the regular
> virtualization list? Or should we really dig out all the various
> hypervisor lists and Cc them?
Hmm, good question. Up to now I didn't look at the virtualization list,
just changed that. :-)
Can't speak for the other virt people.
Juergen
* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: Wanpeng Li @ 2016-07-06 8:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-s390, Davidlohr Bueso, benh, kvm, xinhui, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, mpe, schwidefsky,
Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <20160706075814.GF30921@twins.programming.kicks-ass.net>
2016-07-06 15:58 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Wed, Jul 06, 2016 at 02:46:34PM +0800, Wanpeng Li wrote:
>> > So it's easy for ppc to implement such an interface. Note that yield_count is
>> > set by powerVM/KVM,
>> > and only pSeries can run a guest for now. :)
>> >
>> > I also reviewed the x86 related code; it looks like we need to add one hypercall to get
>> > such vcpu preemption info?
>>
>> There is nothing in x86 KVM that records the lock holder. Maybe we
>> would not need to depend on the PLE handler algorithm to guess it if we
>> could know the lock holder vCPU directly.
>
> x86/kvm has vcpu->preempted to indicate if a vcpu is currently preempted
> or not. I'm just not sure if that is visible to the guest or how to make
> it so.
Yeah, I missed it. I can volunteer to do it if there are any ideas,
ping Paolo. :)
Regards,
Wanpeng Li
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-06 8:19 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-s390, xen-devel-request, dave, benh, Pan Xinhui, boqun.feng,
will.deacon, linux-kernel, waiman.long, virtualization, mingo,
paulus, mpe, schwidefsky, pbonzini, paulmck, linuxppc-dev
In-Reply-To: <577CB786.8000902@suse.com>
On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote:
> On 06/07/16 08:52, Peter Zijlstra wrote:
> > Paolo, could you help out with an (x86) KVM interface for this?
>
> Xen support of this interface should be rather easy. Could you please
> Cc: xen-devel-request@lists.xenproject.org in the next version?
So meta question; aren't all you virt people looking at the regular
virtualization list? Or should we really dig out all the various
hypervisor lists and Cc them?
* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: Peter Zijlstra @ 2016-07-06 7:58 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, benh, kvm, xinhui, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, mpe, schwidefsky,
Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+CySutzEYQuCaeqyLKKVPPcFDFT94UsNaYWo8fU43VsSig@mail.gmail.com>
On Wed, Jul 06, 2016 at 02:46:34PM +0800, Wanpeng Li wrote:
> > So it's easy for ppc to implement such an interface. Note that yield_count is
> > set by powerVM/KVM,
> > and only pSeries can run a guest for now. :)
> >
> > I also reviewed the x86 related code; it looks like we need to add one hypercall to get
> > such vcpu preemption info?
>
> There is nothing in x86 KVM that records the lock holder. Maybe we
> would not need to depend on the PLE handler algorithm to guess it if we
> could know the lock holder vCPU directly.
x86/kvm has vcpu->preempted to indicate if a vcpu is currently preempted
or not. I'm just not sure if that is visible to the guest or how to make
it so.
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Juergen Gross @ 2016-07-06 7:47 UTC (permalink / raw)
To: Pan Xinhui
Cc: linux-s390, xen-devel-request, dave, Peter Zijlstra, mpe,
boqun.feng, will.deacon, linux-kernel, waiman.long,
virtualization, mingo, paulus, benh, schwidefsky, pbonzini,
paulmck, linuxppc-dev
In-Reply-To: <20160706065255.GH30909@twins.programming.kicks-ass.net>
On 06/07/16 08:52, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> changes from v1:
>> a simpler definition of the default vcpu_is_preempted
>> skip machine type check on ppc, and add config. remove dedicated macro.
>> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> add more comments
>> thanks to boqun and Peter for the suggestions.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>>
>> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_owner variants also caused rcu stalls before we applied this patch set.
>>
>
> Paolo, could you help out with an (x86) KVM interface for this?
Xen support of this interface should be rather easy. Could you please
Cc: xen-devel-request@lists.xenproject.org in the next version?
Juergen
* Re: [PATCH v2 0/4] implement vcpu preempted check
From: Peter Zijlstra @ 2016-07-06 6:52 UTC (permalink / raw)
To: Pan Xinhui
Cc: linux-s390, dave, mpe, boqun.feng, will.deacon, linux-kernel,
waiman.long, virtualization, mingo, paulus, benh, schwidefsky,
pbonzini, paulmck, linuxppc-dev
In-Reply-To: <1467124991-13164-1-git-send-email-xinhui.pan@linux.vnet.ibm.com>
On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
> changes from v1:
> a simpler definition of the default vcpu_is_preempted
> skip machine type check on ppc, and add config. remove dedicated macro.
> add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
> add more comments
> thanks to boqun and Peter for the suggestions.
>
> This patch set aims to fix lock holder preemption issues.
>
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
>
> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
> 2.49% sched-messaging [kernel.vmlinux] [k] system_call
>
> We introduce the interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_owner variants also caused rcu stalls before we applied this patch set.
>
Paolo, could you help out with an (x86) KVM interface for this?
Waiman, could you see if you can utilize this to get rid of the
SPIN_THRESHOLD in qspinlock_paravirt?
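
To make the quoted description concrete, here is a small userspace sketch of an optimistic spin-on-owner loop that gives up when the owner's vcpu is preempted. Everything named demo_* is hypothetical; the per-cpu flag array stands in for whatever the architecture reports through vcpu_is_preempted(), and the loop bound exists only so the example terminates.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Stand-in for the hypervisor-provided preemption state of each vcpu. */
static bool demo_preempted[DEMO_NR_CPUS];

static bool demo_vcpu_is_preempted(int cpu)
{
	return demo_preempted[cpu];
}

struct demo_owner {
	int cpu;	/* vcpu the lock owner last ran on */
	bool on_cpu;	/* owner task is currently scheduled in the guest */
};

struct demo_lock {
	const struct demo_owner *owner;	/* NULL when unlocked */
};

/*
 * Returns true if the owner changed while spinning (worth retrying the
 * acquire), false if the caller should stop spinning and block instead.
 */
static bool demo_spin_on_owner(const struct demo_lock *lock,
			       const struct demo_owner *owner,
			       unsigned int max_loops)
{
	while (max_loops--) {
		if (lock->owner != owner)
			return true;
		/*
		 * Spinning is pointless if the owner cannot make progress:
		 * either the task is not running, or its vcpu was preempted
		 * by the host.
		 */
		if (!owner->on_cpu || demo_vcpu_is_preempted(owner->cpu))
			return false;
	}
	return false;
}
```

Without the second condition the spinner keeps burning its time slice while the lock holder is not running at all, which is the lock holder preemption problem the series targets.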
* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: Wanpeng Li @ 2016-07-06 6:46 UTC (permalink / raw)
To: xinhui
Cc: linux-s390, Davidlohr Bueso, kvm, Peter Zijlstra, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, benh, schwidefsky,
Paolo Bonzini, Paul McKenney, linuxppc-dev
In-Reply-To: <577C8FE3.3010208@linux.vnet.ibm.com>
Cc Paolo, kvm ml
2016-07-06 12:58 GMT+08:00 xinhui <xinhui.pan@linux.vnet.ibm.com>:
> Hi, wanpeng
>
> On 2016年07月05日 17:57, Wanpeng Li wrote:
>>
>> Hi Xinhui,
>> 2016-06-28 22:43 GMT+08:00 Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>:
>>>
>>> This is to fix some lock holder preemption issues. Some other lock
>>> implementations do a spin loop before acquiring the lock itself. Currently
>>> the kernel has an interface bool vcpu_is_preempted(int cpu). It takes the
>>> cpu as a parameter and returns true if the cpu is preempted. Then the kernel
>>> can break the spin loops based on the return value of vcpu_is_preempted.
>>>
>>> As the kernel already uses this interface, let's support it.
>>>
>>> Only pSeries needs to support it. The fact is that powerNV is built into the same
>>> kernel image as pSeries, so we would need to return false if we are running as
>>> powerNV. Another fact is that lppaca->yield_count stays zero on
>>> powerNV, so we can just skip the machine type check.
>>
>>
>> Can lock holder vCPU preemption be detected by pSeries hardware, or by a
>> paravirt method?
>>
> There is one shared struct between the kernel and powerVM/KVM, and we read the
> yield_count of this struct to detect whether a vcpu is running or not.
> So it's easy for ppc to implement such an interface. Note that yield_count is
> set by powerVM/KVM,
> and only pSeries can run a guest for now. :)
>
> I also reviewed the x86 related code; it looks like we need to add one hypercall to get
> such vcpu preemption info?
There is nothing in x86 KVM that records the lock holder. Maybe we
would not need to depend on the PLE handler algorithm to guess it if we
could know the lock holder vCPU directly.
Regards,
Wanpeng Li
* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: xinhui @ 2016-07-06 4:58 UTC (permalink / raw)
To: Wanpeng Li
Cc: linux-s390, Davidlohr Bueso, Peter Zijlstra, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, benh, schwidefsky,
Paul McKenney, linuxppc-dev
In-Reply-To: <CANRm+CxQMSrg69p2Xey829Mz4Au9YCai+4JD17i9urdbj0VgkQ@mail.gmail.com>
Hi, wanpeng
On 2016年07月05日 17:57, Wanpeng Li wrote:
> Hi Xinhui,
> 2016-06-28 22:43 GMT+08:00 Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>:
>> This is to fix some lock holder preemption issues. Some other lock
>> implementations do a spin loop before acquiring the lock itself. Currently
>> the kernel has an interface bool vcpu_is_preempted(int cpu). It takes the cpu
>> as a parameter and returns true if the cpu is preempted. Then the kernel can break
>> the spin loops based on the return value of vcpu_is_preempted.
>>
>> As the kernel already uses this interface, let's support it.
>>
>> Only pSeries needs to support it. The fact is that powerNV is built into the same
>> kernel image as pSeries, so we would need to return false if we are running as
>> powerNV. Another fact is that lppaca->yield_count stays zero on
>> powerNV, so we can just skip the machine type check.
>
> Can lock holder vCPU preemption be detected by pSeries hardware, or by a
> paravirt method?
>
There is one shared struct between the kernel and powerVM/KVM, and we read the yield_count of this struct to detect whether a vcpu is running or not.
So it's easy for ppc to implement such an interface. Note that yield_count is set by powerVM/KVM,
and only pSeries can run a guest for now. :)
I also reviewed the x86 related code; it looks like we need to add one hypercall to get such vcpu preemption info?
thanks
xinhui
> Regards,
> Wanpeng Li
>
* RE: [PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Li, Liang Z @ 2016-07-06 0:43 UTC (permalink / raw)
To: mst@redhat.com
Cc: virtio-dev@lists.oasis-open.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
dgilbert@redhat.com, virtualization@lists.linux-foundation.org
In-Reply-To: <1467196340-22079-1-git-send-email-liang.z.li@intel.com>
Ping ...
Liang
> -----Original Message-----
> From: Li, Liang Z
> Sent: Wednesday, June 29, 2016 6:32 PM
> To: mst@redhat.com
> Cc: linux-kernel@vger.kernel.org; virtualization@lists.linux-foundation.org;
> kvm@vger.kernel.org; qemu-devel@nongnu.org; virtio-dev@lists.oasis-open.org;
> dgilbert@redhat.com; quintela@redhat.com; Li, Liang Z
> Subject: [PATCH v2 kernel 0/7] Extend virtio-balloon for fast (de)inflating &
> fast live migration
>
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process. The main
> idea of this optimization is to use a bitmap to send the page information to the
> host instead of the PFNs, to reduce the overhead of virtio data transmission,
> address translation and madvise(). This can help to improve the performance
> by about 85%.
>
> Another change is for speeding up live migration. By skipping the processing of
> the guest's free pages in the first round of data copy, needless data processing
> is reduced, which can save quite a lot of CPU cycles and network bandwidth. We
> put the guest's free page information in a bitmap and send it to the host through
> the virt queue of virtio-balloon. For an idle 8GB guest, this can help to shorten the
> total live migration time from 2 seconds to about 500 ms in a 10Gbps network
> environment.
>
>
> Changes from v1 to v2:
> * Abandon the patch for dropping page cache.
> * Put some structures into the uapi header file.
> * Use a new way to determine the page bitmap size.
> * Use a unified way to send the free page information with the bitmap
> * Address the issues referred in MST's comments
>
> Liang Li (7):
> virtio-balloon: rework deflate to add page to a list
> virtio-balloon: define new feature bit and page bitmap head
> mm: add a function to get the max pfn
> virtio-balloon: speed up inflate/deflate process
> virtio-balloon: define feature bit and head for misc virt queue
> mm: add the related functions to get free page info
> virtio-balloon: tell host vm's free page info
>
> drivers/virtio/virtio_balloon.c | 306 +++++++++++++++++++++++++++++++-----
> include/uapi/linux/virtio_balloon.h | 41 +++++
> mm/page_alloc.c | 52 ++++++
> 3 files changed, 359 insertions(+), 40 deletions(-)
>
> --
> 1.8.3.1
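
The bitmap idea in the quoted cover letter can be illustrated with a few lines of C. This is only a sketch of the general technique (one bit per page frame, relative to a base PFN); the demo_* helpers are hypothetical and say nothing about the actual virtio-balloon wire format or the structures the series adds.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITMAP_LONGS(nbits) \
	(((nbits) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Mark page frame 'pfn' in a bitmap whose bit 0 corresponds to base_pfn. */
static void demo_mark_pfn(unsigned long *bitmap, unsigned long base_pfn,
			  unsigned long pfn)
{
	unsigned long bit = pfn - base_pfn;

	bitmap[bit / DEMO_BITS_PER_LONG] |= 1UL << (bit % DEMO_BITS_PER_LONG);
}

/* Test whether page frame 'pfn' is marked in the bitmap. */
static bool demo_pfn_is_marked(const unsigned long *bitmap,
			       unsigned long base_pfn, unsigned long pfn)
{
	unsigned long bit = pfn - base_pfn;

	return ((bitmap[bit / DEMO_BITS_PER_LONG] >>
		 (bit % DEMO_BITS_PER_LONG)) & 1UL) != 0;
}
```

The saving comes from density: a range of N page frames costs N bits however many of them are marked, whereas an explicit list costs one full PFN entry per page.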
* Re: [PATCH v2 2/4] powerpc/spinlock: support vcpu preempted check
From: Wanpeng Li @ 2016-07-05 9:57 UTC (permalink / raw)
To: Pan Xinhui
Cc: linux-s390, Davidlohr Bueso, Peter Zijlstra, mpe, boqun.feng,
will.deacon, linux-kernel@vger.kernel.org, Waiman Long,
virtualization, Ingo Molnar, Paul Mackerras, benh, schwidefsky,
Paul McKenney, linuxppc-dev
In-Reply-To: <1467124991-13164-3-git-send-email-xinhui.pan@linux.vnet.ibm.com>
Hi Xinhui,
2016-06-28 22:43 GMT+08:00 Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>:
> This is to fix some lock holder preemption issues. Some other lock
> implementations do a spin loop before acquiring the lock itself. Currently
> the kernel has an interface bool vcpu_is_preempted(int cpu). It takes the cpu
> as a parameter and returns true if the cpu is preempted. Then the kernel can break
> the spin loops based on the return value of vcpu_is_preempted.
>
> As the kernel already uses this interface, let's support it.
>
> Only pSeries needs to support it. The fact is that powerNV is built into the same
> kernel image as pSeries, so we would need to return false if we are running as
> powerNV. Another fact is that lppaca->yield_count stays zero on
> powerNV, so we can just skip the machine type check.
Can lock holder vCPU preemption be detected by pSeries hardware, or by a
paravirt method?
Regards,
Wanpeng Li
* (unknown),
From: basavaraj Khodanpur @ 2016-07-04 12:04 UTC (permalink / raw)
To: virtualization
[-- Attachment #1: Type: text/html, Size: 195 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
* Re: Build regressions/improvements in v4.7-rc6
From: Paolo Bonzini @ 2016-07-04 11:06 UTC (permalink / raw)
To: Arnd Bergmann, linux-arm-kernel
Cc: Geert Uytterhoeven, linux-kernel@vger.kernel.org, kvm,
virtualization
In-Reply-To: <6183003.0tbSMroAbk@wuerfel>
On 04/07/2016 11:17, Arnd Bergmann wrote:
>
> If we want to avoid this one, we could make the inclusion of
> drivers/vhost/Kconfig from arch/arm/kvm/Kconfig depend on CONFIG_AEABI,
> or perhaps go further and force-enable CONFIG_AEABI for ARMv6k and higher
> (cmpxchg64() is broken on OABI too), and only include vhost if KVM
> is enabled (KVM in turn requires ARMv7).
I don't think it is correct to include vhost only if KVM is enabled.
vhost is an implementation of virtio; it is usually used together with
KVM but there's no dependency between the two.
Paolo
* Re: Build regressions/improvements in v4.7-rc6
From: Arnd Bergmann @ 2016-07-04 9:17 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Geert Uytterhoeven, linux-kernel@vger.kernel.org, kvm,
virtualization
In-Reply-To: <CAMuHMdUDO0k86TTVTiCQGS2=H0dEXVW77NNfvt7fvWbwdpS=2g@mail.gmail.com>
On Monday, July 4, 2016 10:21:45 AM CEST Geert Uytterhoeven wrote:
> On Mon, Jul 4, 2016 at 10:12 AM, Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > JFYI, when comparing v4.7-rc6[1] to v4.7-rc5[3], the summaries are:
> > - build errors: +3/-2
>
> + /home/kisskb/slave/src/drivers/vhost/vhost.c: error: call to
> '__compiletime_assert_844' declared with attribute error: BUILD_BUG_ON
> failed: __alignof__ *vq->avail > VRING_AVAIL_ALIGN_SIZE: => 844:3
>
> arm-randconfig
>
> > [1] http://kisskb.ellerman.id.au/kisskb/head/10562/ (260 out of 263 configs)
> > [3] http://kisskb.ellerman.id.au/kisskb/head/10532/ (260 out of 263 configs)
I don't see any changes in the code in this time frame, but this is the
code causing it:
struct vring_avail {
__virtio16 flags;
__virtio16 idx;
__virtio16 ring[];
};
/* The virtqueue structure describes a queue attached to a device. */
struct vhost_virtqueue {
struct vhost_dev *dev;
/* The actual ring of buffers. */
struct mutex mutex;
unsigned int num;
struct vring_desc __user *desc;
struct vring_avail __user *avail;
struct vring_used __user *used;
...
};
struct vhost_virtqueue *vq;
BUILD_BUG_ON(__alignof__ *vq->avail > VRING_AVAIL_ALIGN_SIZE);
The alignment of the *vq->avail should be '2' on all architectures,
however an ARM OABI compiler will have a padded structure with alignment '4'.
Looking at the build logs, I find it only in a single randconfig build
at http://kisskb.ellerman.id.au/kisskb/buildresult/12735927/ which apparently
enabled the vhost driver in combination with ARM_AEABI=n.
In my own randconfig builds I am forcing ARM_AEABI=y because there are a
couple of other problems with OABI.
If we want to avoid this one, we could make the inclusion of
drivers/vhost/Kconfig from arch/arm/kvm/Kconfig depend on CONFIG_AEABI,
or perhaps go further and force-enable CONFIG_AEABI for ARMv6k and higher
(cmpxchg64() is broken on OABI too), and only include vhost if KVM
is enabled (KVM in turn requires ARMv7).
Arnd
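
The alignment issue described above can be reproduced outside the kernel with a compile-time check. This is a sketch under assumptions: the struct uses plain uint16_t in place of __virtio16, and DEMO_VRING_AVAIL_ALIGN_SIZE is set to 2 to match the "should be '2'" expectation stated in the message. It needs a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value, matching the alignment the message expects. */
#define DEMO_VRING_AVAIL_ALIGN_SIZE 2

/* Same layout as the struct quoted above, with uint16_t members. */
struct demo_vring_avail {
	uint16_t flags;
	uint16_t idx;
	uint16_t ring[];
};

/*
 * Userspace equivalent of the BUILD_BUG_ON: compilation fails on an ABI
 * that raises the struct's alignment above 2, as ARM OABI is said to do.
 */
_Static_assert(_Alignof(struct demo_vring_avail) <= DEMO_VRING_AVAIL_ALIGN_SIZE,
	       "struct alignment exceeds the ring alignment");

/* Runtime view of the same number, for printing or testing. */
static size_t demo_avail_alignment(void)
{
	return _Alignof(struct demo_vring_avail);
}
```

On ABIs that align a struct to its most strictly aligned member, demo_avail_alignment() yields 2 and the assertion passes; an ABI that pads every struct to a word boundary would trip it at build time, which is the randconfig failure reported here.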
* Re: Build regressions/improvements in v4.7-rc6
From: Geert Uytterhoeven @ 2016-07-04 8:21 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, kvm, virtualization
In-Reply-To: <1467619946-424-1-git-send-email-geert@linux-m68k.org>
On Mon, Jul 4, 2016 at 10:12 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> JFYI, when comparing v4.7-rc6[1] to v4.7-rc5[3], the summaries are:
> - build errors: +3/-2
+ /home/kisskb/slave/src/drivers/vhost/vhost.c: error: call to
'__compiletime_assert_844' declared with attribute error: BUILD_BUG_ON
failed: __alignof__ *vq->avail > VRING_AVAIL_ALIGN_SIZE: => 844:3
arm-randconfig
> [1] http://kisskb.ellerman.id.au/kisskb/head/10562/ (260 out of 263 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/head/10532/ (260 out of 263 configs)
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
From: David Miller @ 2016-07-01 9:40 UTC (permalink / raw)
To: jasowang
Cc: kvm, eric.dumazet, mst, netdev, linux-kernel, virtualization,
brouer
In-Reply-To: <1467269136-8082-1-git-send-email-jasowang@redhat.com>
From: Jason Wang <jasowang@redhat.com>
Date: Thu, 30 Jun 2016 14:45:30 +0800
> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introduce a tx skb array and use
> it instead of sk_receive_queue.
Series applied, thanks Jason.
* Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun
From: Jason Wang @ 2016-07-01 6:04 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: kvm, eric.dumazet, netdev, linux-kernel, virtualization, brouer,
davem
In-Reply-To: <20160630184242-mutt-send-email-mst@redhat.com>
On 2016-06-30 23:45, Michael S. Tsirkin wrote:
> On Thu, Jun 30, 2016 at 02:45:30PM +0800, Jason Wang wrote:
>> >Hi all:
>> >
>> >This series tries to switch to use skb array in tun. This is used to
>> >eliminate the spinlock contention between producer and consumer. The
>> >conversion was straightforward: just introduce a tx skb array and use
>> >it instead of sk_receive_queue.
>> >
>> >A minor issue is to keep the tx_queue_len behaviour, since tun used to
>> >use it for the length of sk_receive_queue. This is done through:
>> >
>> >- add the ability to resize multiple rings at once to avoid handling
>> > partial resize failure for multiple rings.
>> >- add the support for zero length ring.
>> >- introduce a notifier which was triggered when tx_queue_len was
>> > changed for a netdev.
>> >- resize all queues during the tx_queue_len changing.
>> >
>> >Tests show about 15% improvement on guest rx pps:
>> >
>> >Before: ~1300000pps
>> >After : ~1500000pps
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> Acked-from-altitude: 34697 feet.
Wow, thanks a lot!
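For readers following the cover letter quoted above, here is a rough user-space sketch of a pointer ring in which the producer and the consumer each own a private index, so they never need to share one lock. This is an assumption-laden illustration, not the skb array code from the series: the names ring_sketch, ring_produce and ring_consume are hypothetical, an empty slot is marked with NULL, and a real concurrent version would also need memory barriers. A zero-length ring is modelled as always full.

```c
#include <stddef.h>

/* Hypothetical user-space sketch; not the kernel's skb array API. */
struct ring_sketch {
    void **slots;    /* NULL marks an empty slot */
    size_t size;     /* may be zero: such a ring is always full */
    size_t producer; /* touched only by the producer side */
    size_t consumer; /* touched only by the consumer side */
};

/* Returns 0 on success, -1 if the ring is full or has zero length. */
static int ring_produce(struct ring_sketch *r, void *item)
{
    if (r->size == 0 || r->slots[r->producer] != NULL)
        return -1;
    r->slots[r->producer] = item;
    r->producer = (r->producer + 1) % r->size;
    return 0;
}

/* Returns the oldest item, or NULL if the ring is empty. */
static void *ring_consume(struct ring_sketch *r)
{
    void *item;

    if (r->size == 0)
        return NULL;
    item = r->slots[r->consumer];
    if (item != NULL) {
        /* Clearing the slot is what tells the producer it is free. */
        r->slots[r->consumer] = NULL;
        r->consumer = (r->consumer + 1) % r->size;
    }
    return item;
}
```

Because fullness and emptiness are read from the slot contents rather than by comparing the two indices, neither side has to read the other side's index. Items must be non-NULL for this scheme to work.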
* Re: [PATCH v2 00/12] gendisk: Generate uevent after attribute available
From: Dan Williams @ 2016-07-01 1:29 UTC (permalink / raw)
To: Fam Zheng
Cc: Sergey Senozhatsky, Michael S. Tsirkin, Benjamin Herrenschmidt,
linux-nvme, virtualization, Keith Busch, Paul Mackerras,
Michael Ellerman, Christoph Hellwig, Shaohua Li, Nitin Gupta,
Jiri Kosina, linux-raid, Ed L. Cashin, Jens Axboe, linux-block,
David Woodhouse, linux-mmc@vger.kernel.org,
Linux Kernel Mailing List, Minchan Kim, linux-mtd, Brian Norris
In-Reply-To: <20160701010113.GB10122@ad.usersys.redhat.com>
On Thu, Jun 30, 2016 at 6:01 PM, Fam Zheng <famz@redhat.com> wrote:
> On Wed, 06/29 23:38, Christoph Hellwig wrote:
>> On Thu, Jun 30, 2016 at 02:35:54PM +0800, Fam Zheng wrote:
>> > also more code and less flexible IMO. For example, we need at least two
>> > variants, for attribute_group and device_attribute separately, right?
>>
>> Yes, or maybe just a calling convention that just passes both.
>
> OK, I can look into that, but I'm not sure about the error handling. Currently
> add_disk returns void; do you have any plans for that too? Should I change it in
> v3 (to at least return the attribute creation failure)?
I think we should only support a "groups" interface to
device_add_disk() and convert all the drivers that currently do
device_create_file() after add_disk() to pass in a group list instead.
That way we follow the expectation that the only way to get an
attribute for a device to show up before KOBJ_ADD is to define a
group:
From Documentation/driver-model/device.txt:
As explained in Documentation/kobject.txt, device attributes must be
created before the KOBJ_ADD uevent is generated. The only way to realize
that is by defining an attribute group.
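The ordering requirement quoted above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: disk_model, model_add_disk and attr_visible_at_add are hypothetical names, not kernel APIs, and the "groups" argument is reduced to a NULL-terminated list of attribute names. The point is that the add call creates every attribute first and emits the add event last, so a listener reacting to the event always finds the attributes.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 8

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct disk_model {
    const char *events[MAX_EVENTS]; /* ordered log of what userspace sees */
    size_t n_events;
};

static void log_event(struct disk_model *d, const char *what)
{
    if (d->n_events < MAX_EVENTS)
        d->events[d->n_events++] = what;
}

/*
 * Model of an add call that is handed the attribute list up front:
 * all attributes are created, and only then is the add event emitted.
 */
static void model_add_disk(struct disk_model *d, const char *const *attrs)
{
    for (size_t i = 0; attrs && attrs[i]; i++)
        log_event(d, attrs[i]);
    log_event(d, "KOBJ_ADD");
}

/* Returns 1 if attr was logged before the "KOBJ_ADD" event, else 0. */
static int attr_visible_at_add(const struct disk_model *d, const char *attr)
{
    for (size_t i = 0; i < d->n_events; i++) {
        if (strcmp(d->events[i], "KOBJ_ADD") == 0)
            return 0;
        if (strcmp(d->events[i], attr) == 0)
            return 1;
    }
    return 0;
}
```

An attribute created by a separate call after model_add_disk() returns would be logged after "KOBJ_ADD", which is the race the groups-only interface is meant to rule out.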
Let's defer the return value fixing for now.
You can find the pending device_add_disk() patches in the nvdimm
patchwork starting with "block: introduce device_add_disk()"
https://patchwork.kernel.org/project/linux-nvdimm/list/
* Re: [PATCH v2 04/12] axonrom: Generate uevent after attribute available
From: Fam Zheng @ 2016-07-01 1:03 UTC (permalink / raw)
To: Dan Williams
Cc: Sergey Senozhatsky, Michael S. Tsirkin, Benjamin Herrenschmidt,
linux-nvme, virtualization, Keith Busch, Paul Mackerras,
Michael Ellerman, Christoph Hellwig, Shaohua Li, Nitin Gupta,
Jiri Kosina, linux-block, Ed L. Cashin, Jens Axboe, linux-raid,
David Woodhouse, linux-mmc@vger.kernel.org,
Linux Kernel Mailing List, Minchan Kim, linux-mtd, Brian Norris
In-Reply-To: <CAA9_cmf0cKZXdQ2N3iSbwVqgLDGJ7TMvVYuZTqjVE-7EqHHcdg@mail.gmail.com>
On Thu, 06/30 15:10, Dan Williams wrote:
> On Wed, Jun 29, 2016 at 6:59 PM, Fam Zheng <famz@redhat.com> wrote:
> > It is documented that KOBJ_ADD should be generated after the object's
> > attributes and children are ready. We can achieve this with the new
> > disk_gen_uevents interface.
> >
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> > arch/powerpc/sysdev/axonram.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
> > index 4efd69b..27e7175 100644
> > --- a/arch/powerpc/sysdev/axonram.c
> > +++ b/arch/powerpc/sysdev/axonram.c
> > @@ -238,7 +238,7 @@ static int axon_ram_probe(struct platform_device *device)
> > set_capacity(bank->disk, bank->size >> AXON_RAM_SECTOR_SHIFT);
> > blk_queue_make_request(bank->disk->queue, axon_ram_make_request);
> > blk_queue_logical_block_size(bank->disk->queue, AXON_RAM_SECTOR_SIZE);
> > - add_disk(bank->disk, true);
> > + add_disk(bank->disk, false);
> >
> > bank->irq_id = irq_of_parse_and_map(device->dev.of_node, 0);
> > if (bank->irq_id == NO_IRQ) {
> > @@ -262,6 +262,7 @@ static int axon_ram_probe(struct platform_device *device)
> > rc = -EFAULT;
> > goto failed;
> > }
> > + disk_gen_uevents(bank->disk);
>
> I assume you are doing this after:
>
> rc = device_create_file(&device->dev, &dev_attr_ecc);
>
> ...so that userspace gets notified of the new attribute, but this
> attribute is on the parent device, not the disk itself. Instead I
> think this attribute should simply be registered before the call to
> add_disk(). Then the KOBJ_ADD event for the disk comes after the
> attribute is available. It's still not a clean fit, because userspace
> should not be expecting a child device uevent to signal new attributes
> available on the parent.
Yes, you are right, this patch is a mistake. Moving it to before add_disk()
makes sense to me.
Thanks for taking a look!
Fam
* Re: [PATCH v2 00/12] gendisk: Generate uevent after attribute available
From: Fam Zheng @ 2016-07-01 1:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Sergey Senozhatsky, Michael S. Tsirkin, Benjamin Herrenschmidt,
linux-nvme, virtualization, Keith Busch, Paul Mackerras,
Michael Ellerman, Shaohua Li, Nitin Gupta, famz, Jiri Kosina,
linux-block, dan.j.williams, Ed L. Cashin, Jens Axboe, linux-raid,
David Woodhouse, linux-mmc, linux-kernel, Minchan Kim, linux-mtd,
Brian Norris, linuxppc-dev
In-Reply-To: <20160630063839.GA17205@infradead.org>
On Wed, 06/29 23:38, Christoph Hellwig wrote:
> On Thu, Jun 30, 2016 at 02:35:54PM +0800, Fam Zheng wrote:
> > also more code and less flexible IMO. For example, we need at least two
> > variants, for attribute_group and device_attribute separately, right?
>
> Yes, or maybe just a calling convention that just passes both.
OK, I can look into that, but I'm not sure about the error handling. Currently
add_disk returns void; do you have any plans for that too? Should I change it in
v3 (to at least return the attribute creation failure)?
Fam
* Re: [PATCH v2 05/12] aoeblk: Generate uevent after attribute available
From: Ed Cashin @ 2016-07-01 0:57 UTC (permalink / raw)
To: Fam Zheng, linux-kernel
Cc: Sergey Senozhatsky, Michael S. Tsirkin, Benjamin Herrenschmidt,
linux-nvme, virtualization, Keith Busch, Paul Mackerras,
Michael Ellerman, Christoph Hellwig, Shaohua Li, Nitin Gupta,
Jiri Kosina, linux-block, Jens Axboe, linux-raid, David Woodhouse,
linux-mmc, Minchan Kim, linux-mtd, Brian Norris, linuxppc-dev
In-Reply-To: <20160630015953.6888-6-famz@redhat.com>
On 06/29/2016 09:59 PM, Fam Zheng wrote:
> It is documented that KOBJ_ADD should be generated after the object's
> attributes and children are ready. We can achieve this with the new
> disk_gen_uevents interface.
Looks like an improvement, thanks!
--
Ed