* Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
From: Stephan Diestelhorst @ 2012-03-21 13:22 UTC (permalink / raw)
To: Attilio Rao
Cc: the arch/x86 maintainers, KVM, Konrad Rzeszutek Wilk,
Peter Zijlstra, Raghavendra K T, LKML, Andi Kleen,
Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
H. Peter Anvin, Ingo Molnar, Virtualization, Linus Torvalds,
Xen Devel, Stefano Stabellini
In-Reply-To: <4F69D1D9.9080107@citrix.com>
On Wednesday 21 March 2012, 13:04:25 Attilio Rao wrote:
> On 21/03/12 10:20, Raghavendra K T wrote:
> > From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
> >
> > Rather than outright replacing the entire spinlock implementation in
> > order to paravirtualize it, keep the ticket lock implementation but add
> > a couple of pvops hooks on the slow path (long spin on lock, unlocking
> > a contended lock).
> >
> > Ticket locks have a number of nice properties, but they also have some
> > surprising behaviours in virtual environments. They enforce a strict
> > FIFO ordering on cpus trying to take a lock; however, if the hypervisor
> > scheduler does not schedule the cpus in the correct order, the system can
> > waste a huge amount of time spinning until the next cpu can take the lock.
> >
> > (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> > http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
> >
> > To address this, we add two hooks:
> > - __ticket_spin_lock which is called after the cpu has been
> > spinning on the lock for a significant number of iterations but has
> > failed to take the lock (presumably because the cpu holding the lock
> > has been descheduled). The lock_spinning pvop is expected to block
> > the cpu until it has been kicked by the current lock holder.
> > - __ticket_spin_unlock, which on releasing a contended lock
> > (there are more cpus with tail tickets), it looks to see if the next
> > cpu is blocked and wakes it if so.
> >
> > When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
> > functions causes all the extra code to go away.
> >
>
> I've made some real world benchmarks based on this series of patches
> applied on top of a vanilla Linux-3.3-rc6 (commit
> 4704fe65e55fb088fbcb1dc0b15ff7cc8bff3685), with both
> CONFIG_PARAVIRT_SPINLOCK=y and n, which means essentially 4 versions
> compared:
> * vanilla - CONFIG_PARAVIRT_SPINLOCK - patch
> * vanilla + CONFIG_PARAVIRT_SPINLOCK - patch
> * vanilla - CONFIG_PARAVIRT_SPINLOCK + patch
> * vanilla + CONFIG_PARAVIRT_SPINLOCK + patch
>
[...]
> == Results
> This test points in the direction that Jeremy's rebased patches don't
> introduce a performance penalty at all, but also that we could likely
> consider CONFIG_PARAVIRT_SPINLOCK option removal, or turn it on by
> default and suggest disabling just on very old CPUs (assuming a
> performance regression can be proven there).
Very interesting results, in particular knowing that in the one guest
case things do not get (significantly) slower due to the added logic
and LOCKed RMW in the unlock path.
AFAICR, the problem really became apparent when running multiple guests
time sharing the physical CPUs, i.e., two guests with eight vCPUs each
on an eight core machine. Did you look at this setup with your tests?
Thanks,
Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst@amd.com, Tel. +49 (0)351 448 356 719 (AMD: 29-719)
^ permalink raw reply
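As a rough illustration of the two hooks described in the patch text quoted above, here is a minimal userspace sketch in C11. It is not the kernel code: SPIN_THRESHOLD, pv_lock_spinning() and pv_unlock_kick() are hypothetical names, and the two pvops are replaced by counters so the control flow can be observed. A real lock_spinning hook would block the vCPU in the hypervisor, and a real kick hook would first check whether the next waiter is actually blocked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SPIN_THRESHOLD 1024u    /* hypothetical spin budget before blocking */

struct ticketlock {
    _Atomic uint16_t head;      /* ticket currently being served */
    _Atomic uint16_t tail;      /* next ticket to hand out */
};

/* Counters standing in for hypervisor calls (assumption, for observation). */
static unsigned long pv_block_calls, pv_kick_calls;

/* Stand-in for the lock_spinning pvop: would block until kicked. */
static void pv_lock_spinning(struct ticketlock *l, uint16_t want)
{
    (void)l; (void)want;
    pv_block_calls++;
}

/* Stand-in for the unlock kick pvop: would wake the holder of ticket 'next'. */
static void pv_unlock_kick(struct ticketlock *l, uint16_t next)
{
    (void)l; (void)next;
    pv_kick_calls++;
}

static void ticket_lock(struct ticketlock *l)
{
    uint16_t me = atomic_fetch_add(&l->tail, 1);    /* take a ticket */

    for (;;) {
        /* Fast path: plain spinning, strict FIFO order. */
        for (unsigned i = 0; i < SPIN_THRESHOLD; i++)
            if (atomic_load(&l->head) == me)
                return;
        /* Slow path: we spun too long, let the hypervisor block us. */
        pv_lock_spinning(l, me);
    }
}

static void ticket_unlock(struct ticketlock *l)
{
    uint16_t next = (uint16_t)(atomic_fetch_add(&l->head, 1) + 1);

    /* Contended release: someone holds a later ticket, so kick. */
    if (atomic_load(&l->tail) != next)
        pv_unlock_kick(l, next);
}
```

In the uncontended case neither hook runs, which is the property the native benchmarks in this thread are checking.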
* Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
From: Attilio Rao @ 2012-03-21 13:49 UTC (permalink / raw)
To: Stephan Diestelhorst
Cc: the arch/x86 maintainers, KVM, Konrad Rzeszutek Wilk,
Peter Zijlstra, Raghavendra K T, LKML, Andi Kleen,
Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
H. Peter Anvin, Ingo Molnar, Virtualization, Linus Torvalds,
Xen Devel, Stefano Stabellini
In-Reply-To: <2425963.NBpIGX9T40@chlor>
On 21/03/12 13:22, Stephan Diestelhorst wrote:
> On Wednesday 21 March 2012, 13:04:25 Attilio Rao wrote:
>
>> On 21/03/12 10:20, Raghavendra K T wrote:
>>
>>> From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
>>>
>>> Rather than outright replacing the entire spinlock implementation in
>>> order to paravirtualize it, keep the ticket lock implementation but add
>>> a couple of pvops hooks on the slow path (long spin on lock, unlocking
>>> a contended lock).
>>>
>>> Ticket locks have a number of nice properties, but they also have some
>>> surprising behaviours in virtual environments. They enforce a strict
>>> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
>>> scheduler does not schedule the cpus in the correct order, the system can
>>> waste a huge amount of time spinning until the next cpu can take the lock.
>>>
>>> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
>>> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>>>
>>> To address this, we add two hooks:
>>> - __ticket_spin_lock which is called after the cpu has been
>>> spinning on the lock for a significant number of iterations but has
>>> failed to take the lock (presumably because the cpu holding the lock
>>> has been descheduled). The lock_spinning pvop is expected to block
>>> the cpu until it has been kicked by the current lock holder.
>>> - __ticket_spin_unlock, which on releasing a contended lock
>>> (there are more cpus with tail tickets), it looks to see if the next
>>> cpu is blocked and wakes it if so.
>>>
>>> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
>>> functions causes all the extra code to go away.
>>>
>>>
>> I've made some real world benchmarks based on this series of patches
>> applied on top of a vanilla Linux-3.3-rc6 (commit
>> 4704fe65e55fb088fbcb1dc0b15ff7cc8bff3685), with both
>> CONFIG_PARAVIRT_SPINLOCK=y and n, which means essentially 4 versions
>> compared:
>> * vanilla - CONFIG_PARAVIRT_SPINLOCK - patch
>> * vanilla + CONFIG_PARAVIRT_SPINLOCK - patch
>> * vanilla - CONFIG_PARAVIRT_SPINLOCK + patch
>> * vanilla + CONFIG_PARAVIRT_SPINLOCK + patch
>>
>>
> [...]
>
>> == Results
>> This test points in the direction that Jeremy's rebased patches don't
>> introduce a performance penalty at all, but also that we could likely
>> consider CONFIG_PARAVIRT_SPINLOCK option removal, or turn it on by
>> default and suggest disabling just on very old CPUs (assuming a
>> performance regression can be proven there).
>>
> Very interesting results, in particular knowing that in the one guest
> case things do not get (significantly) slower due to the added logic
> and LOCKed RMW in the unlock path.
>
> AFAICR, the problem really became apparent when running multiple guests
> time sharing the physical CPUs, i.e., two guests with eight vCPUs each
> on an eight core machine. Did you look at this setup with your tests?
>
>
Please note that my tests were run on native Linux, without Xen involvement.
Did you perhaps mean that spinlock paravirtualization becomes generally
useful in the case you mentioned (2 guests, 8 vCPUs each)?
Attilio
^ permalink raw reply
* Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
From: Stephan Diestelhorst @ 2012-03-21 14:25 UTC (permalink / raw)
To: Attilio Rao
Cc: the arch/x86 maintainers, KVM, Konrad Rzeszutek Wilk,
Peter Zijlstra, Raghavendra K T, LKML, Andi Kleen,
Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
H. Peter Anvin, Ingo Molnar, Virtualization, Linus Torvalds,
Xen Devel, Stefano Stabellini
In-Reply-To: <4F69DC68.6080200@citrix.com>
On Wednesday 21 March 2012, 13:49:28 Attilio Rao wrote:
> On 21/03/12 13:22, Stephan Diestelhorst wrote:
> > On Wednesday 21 March 2012, 13:04:25 Attilio Rao wrote:
> >
> >> On 21/03/12 10:20, Raghavendra K T wrote:
> >>
> >>> From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
> >>>
> >>> Rather than outright replacing the entire spinlock implementation in
> >>> order to paravirtualize it, keep the ticket lock implementation but add
> > >>> a couple of pvops hooks on the slow path (long spin on lock, unlocking
> >>> a contended lock).
> >>>
> >>> Ticket locks have a number of nice properties, but they also have some
> >>> surprising behaviours in virtual environments. They enforce a strict
> >>> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
> >>> scheduler does not schedule the cpus in the correct order, the system can
> >>> waste a huge amount of time spinning until the next cpu can take the lock.
> >>>
> >>> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
> >>> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
> >>>
> >>> To address this, we add two hooks:
> >>> - __ticket_spin_lock which is called after the cpu has been
> >>> spinning on the lock for a significant number of iterations but has
> >>> failed to take the lock (presumably because the cpu holding the lock
> >>> has been descheduled). The lock_spinning pvop is expected to block
> >>> the cpu until it has been kicked by the current lock holder.
> >>> - __ticket_spin_unlock, which on releasing a contended lock
> >>> (there are more cpus with tail tickets), it looks to see if the next
> >>> cpu is blocked and wakes it if so.
> >>>
> >>> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
> >>> functions causes all the extra code to go away.
> >>>
> >>>
> > >> I've made some real world benchmarks based on this series of patches
> >> applied on top of a vanilla Linux-3.3-rc6 (commit
> >> 4704fe65e55fb088fbcb1dc0b15ff7cc8bff3685), with both
> >> CONFIG_PARAVIRT_SPINLOCK=y and n, which means essentially 4 versions
> >> compared:
> >> * vanilla - CONFIG_PARAVIRT_SPINLOCK - patch
> >> * vanilla + CONFIG_PARAVIRT_SPINLOCK - patch
> >> * vanilla - CONFIG_PARAVIRT_SPINLOCK + patch
> >> * vanilla + CONFIG_PARAVIRT_SPINLOCK + patch
> >>
> >>
> > [...]
> >
> >> == Results
> >> This test points in the direction that Jeremy's rebased patches don't
> > >> introduce a performance penalty at all, but also that we could likely
> >> consider CONFIG_PARAVIRT_SPINLOCK option removal, or turn it on by
> >> default and suggest disabling just on very old CPUs (assuming a
> >> performance regression can be proven there).
> >>
> > Very interesting results, in particular knowing that in the one guest
> > case things do not get (significantly) slower due to the added logic
> > and LOCKed RMW in the unlock path.
> >
> > AFAICR, the problem really became apparent when running multiple guests
> > time sharing the physical CPUs, i.e., two guests with eight vCPUs each
> > on an eight core machine. Did you look at this setup with your tests?
> >
> >
>
> Please note that my tests are made on native Linux, without XEN involvement.
>
> You maybe meant that the spinlock paravirtualization became generally
> useful in the case you mentioned? (2 guests, 8vcpu + 8vcpu)?
Yes, that is what I meant. I only wanted to explain why you do not see
any speed-ups, in case you were wondering. If the whole point of the
exercise was to show that there are no performance regressions, fine.
In that case I misunderstood.
Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst@amd.com
Tel. +49 (0)351 448 356 719
Advanced Micro Devices GmbH
Einsteinring 24
85609 Aschheim
Germany
Geschaeftsfuehrer: Alberto Bozzo
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632, WEEE-Reg-Nr: DE 12919551
^ permalink raw reply
* Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
From: Attilio Rao @ 2012-03-21 14:33 UTC (permalink / raw)
To: Stephan Diestelhorst
Cc: the arch/x86 maintainers, KVM, Konrad Rzeszutek Wilk,
Peter Zijlstra, Raghavendra K T, LKML, Andi Kleen,
Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
H. Peter Anvin, Ingo Molnar, Virtualization, Linus Torvalds,
Xen Devel, Stefano Stabellini
In-Reply-To: <1363312.nixp29LUbv@chlor>
On 21/03/12 14:25, Stephan Diestelhorst wrote:
> On Wednesday 21 March 2012, 13:49:28 Attilio Rao wrote:
>
>> On 21/03/12 13:22, Stephan Diestelhorst wrote:
>>
>>> On Wednesday 21 March 2012, 13:04:25 Attilio Rao wrote:
>>>
>>>
>>>> On 21/03/12 10:20, Raghavendra K T wrote:
>>>>
>>>>
>>>>> From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
>>>>>
>>>>> Rather than outright replacing the entire spinlock implementation in
>>>>> order to paravirtualize it, keep the ticket lock implementation but add
>>>>> a couple of pvops hooks on the slow path (long spin on lock, unlocking
>>>>> a contended lock).
>>>>>
>>>>> Ticket locks have a number of nice properties, but they also have some
>>>>> surprising behaviours in virtual environments. They enforce a strict
>>>>> FIFO ordering on cpus trying to take a lock; however, if the hypervisor
>>>>> scheduler does not schedule the cpus in the correct order, the system can
>>>>> waste a huge amount of time spinning until the next cpu can take the lock.
>>>>>
>>>>> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
>>>>> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>>>>>
>>>>> To address this, we add two hooks:
>>>>> - __ticket_spin_lock which is called after the cpu has been
>>>>> spinning on the lock for a significant number of iterations but has
>>>>> failed to take the lock (presumably because the cpu holding the lock
>>>>> has been descheduled). The lock_spinning pvop is expected to block
>>>>> the cpu until it has been kicked by the current lock holder.
>>>>> - __ticket_spin_unlock, which on releasing a contended lock
>>>>> (there are more cpus with tail tickets), it looks to see if the next
>>>>> cpu is blocked and wakes it if so.
>>>>>
>>>>> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
>>>>> functions causes all the extra code to go away.
>>>>>
>>>>>
>>>>>
>>>> I've made some real world benchmarks based on this series of patches
>>>> applied on top of a vanilla Linux-3.3-rc6 (commit
>>>> 4704fe65e55fb088fbcb1dc0b15ff7cc8bff3685), with both
>>>> CONFIG_PARAVIRT_SPINLOCK=y and n, which means essentially 4 versions
>>>> compared:
>>>> * vanilla - CONFIG_PARAVIRT_SPINLOCK - patch
>>>> * vanilla + CONFIG_PARAVIRT_SPINLOCK - patch
>>>> * vanilla - CONFIG_PARAVIRT_SPINLOCK + patch
>>>> * vanilla + CONFIG_PARAVIRT_SPINLOCK + patch
>>>>
>>>>
>>>>
>>> [...]
>>>
>>>
>>>> == Results
>>>> This test points in the direction that Jeremy's rebased patches don't
>>>> introduce a performance penalty at all, but also that we could likely
>>>> consider CONFIG_PARAVIRT_SPINLOCK option removal, or turn it on by
>>>> default and suggest disabling just on very old CPUs (assuming a
>>>> performance regression can be proven there).
>>>>
>>>>
>>> Very interesting results, in particular knowing that in the one guest
>>> case things do not get (significantly) slower due to the added logic
>>> and LOCKed RMW in the unlock path.
>>>
>>> AFAICR, the problem really became apparent when running multiple guests
>>> time sharing the physical CPUs, i.e., two guests with eight vCPUs each
>>> on an eight core machine. Did you look at this setup with your tests?
>>>
>>>
>>>
>> Please note that my tests are made on native Linux, without XEN involvement.
>>
>> You maybe meant that the spinlock paravirtualization became generally
>> useful in the case you mentioned? (2 guests, 8vcpu + 8vcpu)?
>>
> Yes, that is what I meant. Just to clarify why you do not see any
> speed-ups, and were wondering why. If the whole point of the exercise
> was to see that there are no performance regressions, fine. In that
> case I misunderstood.
>
Yes, that's right, I just wanted to measure (possible) overhead in
native Linux and the cost of leaving CONFIG_PARAVIRT_SPINLOCK on.
Thanks,
Attilio
^ permalink raw reply
* Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
From: Raghavendra K T @ 2012-03-21 14:49 UTC (permalink / raw)
To: Stephan Diestelhorst
Cc: KVM, Konrad Rzeszutek Wilk, Peter Zijlstra,
the arch/x86 maintainers, LKML, Virtualization, Andi Kleen,
Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
H. Peter Anvin, Attilio Rao, Ingo Molnar, Linus Torvalds,
Xen Devel, Stefano Stabellini
In-Reply-To: <4F69E6BB.508@citrix.com>
On 03/21/2012 08:03 PM, Attilio Rao wrote:
> On 21/03/12 14:25, Stephan Diestelhorst wrote:
>> On Wednesday 21 March 2012, 13:49:28 Attilio Rao wrote:
>>> On 21/03/12 13:22, Stephan Diestelhorst wrote:
>>>> On Wednesday 21 March 2012, 13:04:25 Attilio Rao wrote:
>>>>
>>>>> On 21/03/12 10:20, Raghavendra K T wrote:
>>>>>
>>>>>> From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
>>>>>>
>>>>>> Rather than outright replacing the entire spinlock implementation in
>>>>>> order to paravirtualize it, keep the ticket lock implementation
>>>>>> but add
>>>>>> a couple of pvops hooks on the slow path (long spin on lock,
>>>>>> unlocking
>>>>>> a contended lock).
>>>>>>
>>>>>> Ticket locks have a number of nice properties, but they also have
>>>>>> some
>>>>>> surprising behaviours in virtual environments. They enforce a strict
>>>>>> FIFO ordering on cpus trying to take a lock; however, if the
>>>>>> hypervisor
>>>>>> scheduler does not schedule the cpus in the correct order, the
>>>>>> system can
>>>>>> waste a huge amount of time spinning until the next cpu can take
>>>>>> the lock.
>>>>>>
>>>>>> (See Thomas Friebel's talk "Prevent Guests from Spinning Around"
>>>>>> http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
>>>>>>
>>>>>> To address this, we add two hooks:
>>>>>> - __ticket_spin_lock which is called after the cpu has been
>>>>>> spinning on the lock for a significant number of iterations but has
>>>>>> failed to take the lock (presumably because the cpu holding the lock
>>>>>> has been descheduled). The lock_spinning pvop is expected to block
>>>>>> the cpu until it has been kicked by the current lock holder.
>>>>>> - __ticket_spin_unlock, which on releasing a contended lock
>>>>>> (there are more cpus with tail tickets), it looks to see if the next
>>>>>> cpu is blocked and wakes it if so.
>>>>>>
>>>>>> When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
>>>>>> functions causes all the extra code to go away.
>>>>>>
>>>>>>
>>>>> I've made some real world benchmarks based on this series of patches
>>>>> applied on top of a vanilla Linux-3.3-rc6 (commit
>>>>> 4704fe65e55fb088fbcb1dc0b15ff7cc8bff3685), with both
>>>>> CONFIG_PARAVIRT_SPINLOCK=y and n, which means essentially 4 versions
>>>>> compared:
>>>>> * vanilla - CONFIG_PARAVIRT_SPINLOCK - patch
>>>>> * vanilla + CONFIG_PARAVIRT_SPINLOCK - patch
>>>>> * vanilla - CONFIG_PARAVIRT_SPINLOCK + patch
>>>>> * vanilla + CONFIG_PARAVIRT_SPINLOCK + patch
>>>>>
>>>>>
>>>> [...]
>>>>
>>>>> == Results
>>>>> This test points in the direction that Jeremy's rebased patches don't
>>>>> introduce a performance penalty at all, but also that we could likely
>>>>> consider CONFIG_PARAVIRT_SPINLOCK option removal, or turn it on by
>>>>> default and suggest disabling just on very old CPUs (assuming a
>>>>> performance regression can be proven there).
>>>>>
>>>> Very interesting results, in particular knowing that in the one guest
>>>> case things do not get (significantly) slower due to the added logic
>>>> and LOCKed RMW in the unlock path.
>>>>
>>>> AFAICR, the problem really became apparent when running multiple guests
>>>> time sharing the physical CPUs, i.e., two guests with eight vCPUs each
>>>> on an eight core machine. Did you look at this setup with your tests?
>>>>
>>>>
>>> Please note that my tests are made on native Linux, without XEN
>>> involvement.
>>>
>>> You maybe meant that the spinlock paravirtualization became generally
>>> useful in the case you mentioned? (2 guests, 8vcpu + 8vcpu)?
>> Yes, that is what I meant. Just to clarify why you do not see any
>> speed-ups, and were wondering why. If the whole point of the exercise
>> was to see that there are no performance regressions, fine. In that
>> case I misunderstood.
>
> Yes, that's right, I just wanted to measure (possible) overhead in
> native Linux and the cost of leaving CONFIG_PARAVIRT_SPINLOCK on.
True. My results, too, only covered native overhead.
Until now the main concern in the community has been native overhead.
This time we have results showing that CONFIG_PARAVIRT_SPINLOCK is now
on par with the corresponding vanilla kernel, thanks to the ticketlock
improvements.
As for the guest scenario, I intend to post the KVM counterpart of the
patches, with results showing a huge improvement (around 90%) under
contention and almost zero overhead in the normal case.
>
^ permalink raw reply
* Re: [PATCH RFC V6 2/11] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks
From: Linus Torvalds @ 2012-03-21 17:13 UTC (permalink / raw)
To: Raghavendra K T
Cc: KVM, Konrad Rzeszutek Wilk, Peter Zijlstra, Stefano Stabellini,
the arch/x86 maintainers, LKML, Virtualization, Andi Kleen,
Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
H. Peter Anvin, Attilio Rao, Ingo Molnar, Xen Devel,
Stephan Diestelhorst
In-Reply-To: <20120321102107.473.89777.sendpatchset@codeblue.in.ibm.com>
On Wed, Mar 21, 2012 at 3:21 AM, Raghavendra K T
<raghavendra.kt@linux.vnet.ibm.com> wrote:
> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
>
> The code size expands somewhat, and it's probably better to just call
> a function rather than inline it.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
> arch/x86/Kconfig | 3 +++
> kernel/Kconfig.locks | 2 +-
> 2 files changed, 4 insertions(+), 1 deletions(-)
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5bed94e..10c28ec 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -623,6 +623,9 @@ config PARAVIRT_SPINLOCKS
>
> If you are unsure how to answer this question, answer N.
>
> +config ARCH_NOINLINE_SPIN_UNLOCK
> + def_bool PARAVIRT_SPINLOCKS
> +
> config PARAVIRT_CLOCK
> bool
>
> diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
> index 5068e2a..584637b 100644
> --- a/kernel/Kconfig.locks
> +++ b/kernel/Kconfig.locks
> @@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
> ARCH_INLINE_SPIN_LOCK_IRQSAVE
>
> config INLINE_SPIN_UNLOCK
> - def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
> + def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) && !ARCH_NOINLINE_SPIN_UNLOCK
>
> config INLINE_SPIN_UNLOCK_BH
> def_bool !DEBUG_SPINLOCK && ARCH_INLINE_SPIN_UNLOCK_BH
Ugh. This is getting really ugly.

Can we just fix it by

 - getting rid of INLINE_SPIN_UNLOCK entirely

 - replacing it with UNINLINE_SPIN_UNLOCK instead with the reverse
   meaning, and no "def_bool" at all, just a simple

       config UNINLINE_SPIN_UNLOCK
           bool

 - make the various people who want to uninline the spinlocks (like
   spinlock debugging, paravirt etc) all just do

       select UNINLINE_SPIN_UNLOCK

because quite frankly, the whole spinunlock inlining logic is
*already* unreadable, and you just made it worse.
Linus
^ permalink raw reply
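To show what the suggestion above could mean on the C side, here is a small self-contained sketch. It assumes that a single negative option, CONFIG_UNINLINE_SPIN_UNLOCK, is the only thing the header has to test; struct raw_lock and the raw_spin_unlock_*() names are hypothetical and only loosely modelled on the kernel's spinlock wrappers.

```c
#include <assert.h>

/*
 * Define this to model an option such as spinlock debugging or paravirt
 * doing "select UNINLINE_SPIN_UNLOCK".
 */
/* #define CONFIG_UNINLINE_SPIN_UNLOCK 1 */

struct raw_lock { int locked; };    /* hypothetical lock type */

/* The actual unlock body, cheap enough to inline by default. */
static inline void raw_spin_unlock_inline(struct raw_lock *l)
{
    assert(l->locked);
    l->locked = 0;
}

/* Out-of-line copy, used when somebody asked for uninlining. */
void raw_spin_unlock_call(struct raw_lock *l)
{
    raw_spin_unlock_inline(l);
}

/* One test of one option, instead of a compound def_bool expression. */
#ifndef CONFIG_UNINLINE_SPIN_UNLOCK
#define raw_spin_unlock(l) raw_spin_unlock_inline(l)
#else
#define raw_spin_unlock(l) raw_spin_unlock_call(l)
#endif
```

Each feature that needs the out-of-line version then only has to select the option; nothing else has to know the full list of such features.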
* Re: vhost question
From: Steve Glass @ 2012-03-22 1:48 UTC (permalink / raw)
To: virtualization, kvm, netdev
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Just some further information on my earlier question about vhost and
virtio.
I'm using virtio to implement an emulated mac80211 device in the
guest. A simple network simulation will be used to control delivery of
frames between guests and for this I am using the vhost approach.
A simple first-cut attempt at the tx and rx kick handlers is given
below. When the guest transmits frames the vhost's TX kick handler is
executed and copies the buffers onto a queue for the intended
recipient(s). When the vhost's RX kick handler is run it copies the
buffer from the queue and notifies the client that the buffers have
been used.
The problem is that if there are no frames in the queue when the guest
rx kick handler runs then it has to exit and I have to arrange that it
runs again. That's done in the current prototype by having the guests
poll using a timer - which is ugly and inefficient. Can I get the
vhost tx kick handler to wake the appropriate vhost rx kick handler?
How can I achieve this?
Many thanks,
Steve
static void
handle_rx(struct vhost_work *work)
{
	int n;
	unsigned out, in, frames = 0;	/* frames must start at zero */
	struct transmission *t;
	struct vhost_poll *p =
		container_of(work, struct vhost_poll, work);
	struct vhost_virtqueue *vq =
		container_of(p, struct vhost_virtqueue, poll);
	struct vhost_node *node =
		container_of(vq, struct vhost_node, vqs[WLAN_VQ_RX]);
	struct vhost_dev *dev = &node->vdev;

	mutex_lock(&vq->mutex);
	vhost_disable_notify(dev, vq);
	while (!queue_empty(&node->rxq)) {
		n = vhost_get_vq_desc(dev, vq, vq->iov, ARRAY_SIZE(vq->iov),
				&out, &in, NULL, NULL);
		/* n < 0: error; n == vq->num: no rx buffer available */
		if (n < 0 || n == vq->num)
			break;
		t = queue_pop(&node->rxq);
		BUG_ON(copy_to_user(vq->iov[0].iov_base, t->buf,
				t->buf_sz));
		vq->iov[0].iov_len = t->buf_sz;
		/* report the number of bytes written into the buffer */
		vhost_add_used(vq, n, t->buf_sz);
		transmission_free(t);
		++frames;
	}
	if (frames)
		vhost_signal(dev, vq);
	vhost_enable_notify(dev, vq);
	mutex_unlock(&vq->mutex);
}

static void
handle_tx(struct vhost_work *work)
{
	int n;
	unsigned out, in;
	struct transmission *t;
	struct vhost_node *receiver;
	struct vhost_poll *p =
		container_of(work, struct vhost_poll, work);
	struct vhost_virtqueue *vq =
		container_of(p, struct vhost_virtqueue, poll);
	struct vhost_node *w =
		container_of(vq, struct vhost_node, vqs[WLAN_VQ_TX]);
	struct vhost_dev *dev = &w->vdev;

	mutex_lock(&vq->mutex);
	do {
		vhost_disable_notify(dev, vq);
		n = vhost_get_vq_desc(dev, vq, vq->iov, ARRAY_SIZE(vq->iov),
				&out, &in, NULL, NULL);
		while (n >= 0 && n != vq->num) {
			receiver = net_get_receiver(w);
			if ((receiver) && (t = transmission_alloc())) {
				BUG_ON(copy_from_user(t->buf,
						vq->iov[1].iov_base,
						vq->iov[1].iov_len));
				t->buf_sz = vq->iov[1].iov_len;
				queue_push(&receiver->rxq, t);
				// ToDo: kick receiver's handle_rx
			}
			vhost_add_used(vq, n, out);
			n = vhost_get_vq_desc(dev, vq, vq->iov,
					ARRAY_SIZE(vq->iov),
					&out, &in, NULL, NULL);
		}
		vhost_signal(dev, vq);
	} while (vhost_enable_notify(dev, vq));
	mutex_unlock(&vq->mutex);
}
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk9qhO4ACgkQW7aAm65EWy7w4wCgrzGB2Zit4rWUzMjwpJEJnIfj
xDsAoLBDMj+4MVrjPS5upgDSIGOi4IzL
=Ms/+
-----END PGP SIGNATURE-----
^ permalink raw reply
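One possible shape for the "ToDo: kick receiver's handle_rx" step, sketched as a single-threaded userspace model rather than vhost code. Everything here (struct node, kick_rx(), worker_run(), RXQ_CAP) is hypothetical and only mirrors the names used in the question. In vhost itself the closest existing primitive is probably vhost_poll_queue() applied to the receiver's RX virtqueue poll, which queues that handler on the receiver's worker; that is a suggestion, not something tested against this driver, and the shared rxq would still need its own locking because the two handlers hold different vq mutexes.

```c
#include <assert.h>
#include <stddef.h>

#define RXQ_CAP 8

/* Simplified stand-in for the poster's struct vhost_node. */
struct node {
    int rxq[RXQ_CAP];       /* frames waiting for this guest (frame ids) */
    size_t rx_len;          /* number of queued frames */
    int rx_work_pending;    /* models "rx work is queued on the worker" */
    unsigned delivered;     /* frames handed to the guest so far */
};

/* Models queueing the receiver's rx work; calling it twice is harmless. */
static void kick_rx(struct node *n)
{
    n->rx_work_pending = 1;
}

/* Models handle_rx: drain everything that is currently queued. */
static void handle_rx(struct node *n)
{
    assert(n->rx_len <= RXQ_CAP);
    n->rx_work_pending = 0;
    n->delivered += (unsigned)n->rx_len;
    n->rx_len = 0;
}

/* Models handle_tx: enqueue for the receiver, then wake its rx handler. */
static int handle_tx(struct node *receiver, int frame)
{
    if (receiver->rx_len == RXQ_CAP)
        return -1;          /* queue full: drop the frame */
    receiver->rxq[receiver->rx_len++] = frame;
    kick_rx(receiver);
    return 0;
}

/* Models one pass of the worker: rx runs only when it was queued. */
static int worker_run(struct node *n)
{
    if (!n->rx_work_pending)
        return 0;
    handle_rx(n);
    return 1;
}
```

With this arrangement the rx side never has to poll on a timer: it runs once per burst of transmitted frames and stays idle otherwise.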
* Re: [PATCH] virtio-spec: clarify ro/rw bits and updating rule of virtio-net status field
From: Rusty Russell @ 2012-03-22 4:30 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Christian Borntraeger, linux-kernel, virtualization
In-Reply-To: <20120321063746.GA6773@redhat.com>
On Wed, 21 Mar 2012 08:37:46 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> Ah. Right, we need to trap for host to clear the bit.
> OK, so let's make the bit RO, and add
> VIRTIO_NET_CTRL_ANNOUNCED to acknowledge that we've
> seen VIRTIO_NET_S_ANNOUNCE using the control VQ?
Thanks, that's nice. Guest should send arp packets first, then send
VIRTIO_NET_CTRL_ANNOUNCED, and ignore the bit being set in the meantime.
Thanks,
Rusty.
--
How could I marry someone with more hair than me? http://baldalex.org
^ permalink raw reply
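The ordering agreed above (announce first, acknowledge second, ignore the bit in between) can be sketched as a tiny state machine. This is a model under assumptions, not driver code: struct device, struct driver and all function names are hypothetical, and the control-virtqueue command is reduced to a direct function call. The two status bit values are the ones defined for virtio-net.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_S_LINK_UP   1
#define VIRTIO_NET_S_ANNOUNCE  2

/* Hypothetical device model; the driver only ever reads 'status'. */
struct device {
    uint16_t status;
    unsigned arps_seen;     /* gratuitous ARPs received from the guest */
};

/* Hypothetical driver state. */
struct driver {
    int ack_pending;        /* ARPs sent, acknowledgement not yet sent */
};

/* Host side: ask the guest to announce itself (e.g. after migration). */
static void device_request_announce(struct device *d)
{
    d->status |= VIRTIO_NET_S_ANNOUNCE;
}

/* Host side: handler for the acknowledgement command on the control vq. */
static void device_ctrl_announced(struct device *d)
{
    d->status &= (uint16_t)~VIRTIO_NET_S_ANNOUNCE;
}

/* Guest side: config-change handler. */
static void driver_config_changed(struct driver *drv, struct device *dev)
{
    /* Bit clear, or still set from an announce we are handling: ignore. */
    if (!(dev->status & VIRTIO_NET_S_ANNOUNCE) || drv->ack_pending)
        return;
    dev->arps_seen++;       /* step 1: send gratuitous ARP packets */
    drv->ack_pending = 1;
}

/* Guest side: step 2, acknowledge so the device clears the bit. */
static void driver_send_ack(struct driver *drv, struct device *dev)
{
    assert(drv->ack_pending);
    device_ctrl_announced(dev);
    drv->ack_pending = 0;
}
```

Because only the device ever clears the bit, and only in response to the acknowledgement, the driver and device cannot race on a read-modify-write of the status field.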
* Re: [PULL] vhost-net/virtio: fixes for 3.4
From: Michael S. Tsirkin @ 2012-03-22 8:27 UTC (permalink / raw)
To: David Miller
Cc: kvm, virtualization, netdev, linux-kernel, levinsasha928, nyh,
nyh
In-Reply-To: <20120320145010.GA31570@redhat.com>
On Tue, Mar 20, 2012 at 04:50:41PM +0200, Michael S. Tsirkin wrote:
> The following changes since commit 5ffca28a4ac7abb8a254fafe6bd03b2f83667df7:
>
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs (2012-02-27 07:59:33 -0800)
>
> are available in the git repository at:
>
> ra.kernel.org:/pub/scm/linux/kernel/git/mst/vhost.git for_davem
>
> (ssh url as git.kernel.org seems down at the moment, when it comes up
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git for_davem
> should be the equivalent).
>
> for you to fetch changes up to ea5d404655ba3b356d0c06d6a3c4f24112124522:
>
> vhost: fix release path lockdep checks (2012-02-28 09:13:22 +0200)
>
> ----------------------------------------------------------------
> vhost/virtio: fixes for 3.4
>
> This includes a couple of vhost-net bugfixes,
> and fixes tools/virtio making it useful again.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> ----------------------------------------------------------------
Dave, just checking - not sure I made it clear that this pull request
is intended to go in through your tree.
If you see any issues pls let me know so I can fix them.
Thanks!
> Michael S. Tsirkin (4):
> tools/virtio: add linux/module.h stub
> tools/virtio: add linux/hrtimer.h stub
> tools/virtio: stub out strong barriers
> vhost: fix release path lockdep checks
>
> Nadav Har'El (1):
> vhost: don't forget to schedule()
>
> drivers/vhost/net.c | 2 +-
> drivers/vhost/vhost.c | 11 +++++++----
> drivers/vhost/vhost.h | 2 +-
> tools/virtio/linux/virtio.h | 3 +++
> 4 files changed, 12 insertions(+), 6 deletions(-)
> create mode 100644 tools/virtio/linux/hrtimer.h
> create mode 100644 tools/virtio/linux/module.h
^ permalink raw reply
* Re: [PULL] vhost-net/virtio: fixes for 3.4
From: David Miller @ 2012-03-22 8:57 UTC (permalink / raw)
To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, levinsasha928, nyh,
nyh
In-Reply-To: <20120322082718.GA11258@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 22 Mar 2012 10:27:19 +0200
> Dave, just checking - not sure I made it clear that this pull request
> is intended to go in through your tree.
> If you see any issues pls let me know so I can fix them.
I missed it, sorry.
For some reason patchwork didn't pick it up, because if it did
then it wouldn't have mattered that I lost it in my huge inbox.
Oh well :-/
But I've got it now, thanks. I'll work on it tomorrow.
^ permalink raw reply
* [PATCH] virtio-spec: fix typo
From: Jason Wang @ 2012-03-22 9:46 UTC (permalink / raw)
To: rusty, virtualization, linux-kernel, mst
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
virtio-0.9.4.lyx | 12 +++++++++---
1 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/virtio-0.9.4.lyx b/virtio-0.9.4.lyx
index 6c7bab1..9d30977 100644
--- a/virtio-0.9.4.lyx
+++ b/virtio-0.9.4.lyx
@@ -58,6 +58,7 @@
\html_be_strict false
\author -608949062 "Rusty Russell,,,"
\author 1531152142 "pbonzini"
+\author 2090695081 "Jason"
\end_header
\begin_body
@@ -4657,9 +4658,14 @@ Control Virtqueue
\end_layout
\begin_layout Standard
-The driver uses the control virtqueue (if VIRTIO_NET_F_VTRL_VQ is negotiated)
- to send commands to manipulate various features of the device which would
- not easily map into the configuration space.
+The driver uses the control virtqueue (if VIRTIO_NET_F_
+\change_inserted 2090695081 1332409259
+C
+\change_deleted 2090695081 1332409259
+V
+\change_unchanged
+TRL_VQ is negotiated) to send commands to manipulate various features of
+ the device which would not easily map into the configuration space.
\end_layout
\begin_layout Standard
^ permalink raw reply related
* [RFC PATCH] virtio-spec: ack the announce notification through ctrl_vq
From: Jason Wang @ 2012-03-22 9:47 UTC (permalink / raw)
To: rusty, virtualization, linux-kernel, mst
During link announcement, the driver needs a way to notify the device that it
has received the notification, so that the device can clear the
VIRTIO_NET_S_ANNOUNCE bit in the status field. Doing this through a dedicated
command on the ctrl vq looks suitable for all platforms (especially those that
don't trap status reads or writes) and resolves the race between host and
guest.
So this patch makes VIRTIO_NET_F_GUEST_ANNOUNCE depend on VIRTIO_NET_F_CTRL_VQ
and introduces a dedicated command, VIRTIO_NET_CTRL_ANNOUNCE_ACK, to let the
device clear the VIRTIO_NET_S_ANNOUNCE bit in the status field.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
virtio-0.9.4.lyx | 63 +++++++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 57 insertions(+), 6 deletions(-)
diff --git a/virtio-0.9.4.lyx b/virtio-0.9.4.lyx
index 9d30977..d01284c 100644
--- a/virtio-0.9.4.lyx
+++ b/virtio-0.9.4.lyx
@@ -4013,8 +4013,12 @@ configuration
layout Two configuration fields are currently defined.
The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
- Two bits are currently defined for the status field: VIRTIO_NET_S_LINK_UP
- and VIRTIO_NET_S_ANNOUNCE.
+ Two
+\change_inserted 2090695081 1332406434
+read-only
+\change_unchanged
+bits are currently defined for the status field: VIRTIO_NET_S_LINK_UP and
+ VIRTIO_NET_S_ANNOUNCE.
\begin_inset listings
inline false
@@ -4902,18 +4906,58 @@ Gratuitous Packet Sending
\end_layout
\begin_layout Standard
-If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE, it can ask the
- guest to send gratuitous packets; this is usually done after the guest
- has been physically migrated, and needs to announce its presence on the
- new network links.
+If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE
+\change_inserted 2090695081 1332407810
+ (depends on VIRTIO_NET_F_CTRL_VQ)
+\change_unchanged
+, it can ask the guest to send gratuitous packets; this is usually done
+ after the guest has been physically migrated, and needs to announce its
+ presence on the new network links.
(As hypervisor does not have the knowledge of guest network configuration
(eg.
tagged vlan) it is simplest to prod the guest in this way).
+\change_inserted 2090695081 1332405026
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1332405026
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1332405658
+
+#define VIRTIO_NET_CTRL_ANNOUNCE 3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1332407582
+
+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
\end_layout
\begin_layout Standard
The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status field when
it notices the changes of device configuration.
+
+\change_inserted 2090695081 1332407079
+ The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that driver
+ has received the notification and device would clear the VIRTIO_NET_S_ANNOUNCE
+ bit in the status field after it received this command.
+\change_unchanged
+
\end_layout
\begin_layout Standard
@@ -4921,7 +4965,14 @@ Processing this notification involves:
\end_layout
\begin_layout Enumerate
+
+\change_inserted 2090695081 1332405963
+Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
+
+\change_deleted 2090695081 1332405924
Clearing VIRTIO_NET_S_ANNOUNCE bit in the status field.
+\change_unchanged
+
\end_layout
\begin_layout Enumerate
^ permalink raw reply related
* Re: vhost question
From: Stefan Hajnoczi @ 2012-03-22 9:52 UTC (permalink / raw)
To: Steve Glass; +Cc: netdev, kvm, virtualization
In-Reply-To: <4F6A84EF.9060105@gmail.com>
On Thu, Mar 22, 2012 at 1:48 AM, Steve Glass <stevie.glass@gmail.com> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Just some further information concerning my earlier question
> concerning vhost and virtio.
>
> I'm using virtio to implement an emulated mac80211 device in the
> guest. A simple network simulation will be used to control delivery of
> frames between guests and for this I am using the vhost approach.
>
> A simple first-cut attempt at the tx and rx kick handlers are given
> below. When the guest transmits frames the vhost's TX kick handler is
> executed and copies the buffers onto a queue for the intended
> recipient(s). When the vhost's RX kick handler is run it copies the
> buffer from the queue and notifies the client that the buffers have
> been used.
>
> The problem is that if there are no frames in the queue when the guest
> rx kick handler runs then it has to exit and I have to arrange that it
> runs again. That's done in the current prototype by having the guests
> poll using a timer - which is ugly and inefficient. Can I get the
> vhost tx kick handler to wake the appropriate vhost rx kick handler?
> How can I achieve this?
Can you queue a tx->rx kick on the vhost work queue with vhost_work_queue()?
Stefan
^ permalink raw reply
* Re: [PATCH RFC V6 2/11] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks
From: Raghavendra K T @ 2012-03-22 10:06 UTC (permalink / raw)
To: Linus Torvalds
Cc: KVM, Konrad Rzeszutek Wilk, Peter Zijlstra, Stefano Stabellini,
the arch/x86 maintainers, LKML, Virtualization, Andi Kleen,
Srivatsa Vaddagiri, Avi Kivity, Jeremy Fitzhardinge,
H. Peter Anvin, Attilio Rao, Ingo Molnar, Xen Devel,
Stephan Diestelhorst
In-Reply-To: <CA+55aFzJn_U=2-kxGX15v6zLaCV+Jgtxjop2PudizzsqSn+nLg@mail.gmail.com>
On 03/21/2012 10:43 PM, Linus Torvalds wrote:
> On Wed, Mar 21, 2012 at 3:21 AM, Raghavendra K T
> <raghavendra.kt@linux.vnet.ibm.com> wrote:
>> From: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
>>
>> The code size expands somewhat, and its probably better to just call
>> a function rather than inline it.
>>
>> Signed-off-by: Jeremy Fitzhardinge<jeremy.fitzhardinge@citrix.com>
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>> ---
>> arch/x86/Kconfig | 3 +++
>> kernel/Kconfig.locks | 2 +-
>> 2 files changed, 4 insertions(+), 1 deletions(-)
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 5bed94e..10c28ec 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -623,6 +623,9 @@ config PARAVIRT_SPINLOCKS
>>
>> If you are unsure how to answer this question, answer N.
>>
>> +config ARCH_NOINLINE_SPIN_UNLOCK
>> + def_bool PARAVIRT_SPINLOCKS
>> +
>> config PARAVIRT_CLOCK
>> bool
>>
>> diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
>> index 5068e2a..584637b 100644
>> --- a/kernel/Kconfig.locks
>> +++ b/kernel/Kconfig.locks
>> @@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
>> ARCH_INLINE_SPIN_LOCK_IRQSAVE
>>
>> config INLINE_SPIN_UNLOCK
>> - def_bool !DEBUG_SPINLOCK&& (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
>> + def_bool !DEBUG_SPINLOCK&& (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)&& !ARCH_NOINLINE_SPIN_UNLOCK
>>
>> config INLINE_SPIN_UNLOCK_BH
>> def_bool !DEBUG_SPINLOCK&& ARCH_INLINE_SPIN_UNLOCK_BH
>
> Ugh. This is getting really ugly.
>
Agree that it had become longer.
> Can we just fix it by
> - getting rid of INLINE_SPIN_UNLOCK entirely
>
> - replacing it with UNINLINE_SPIN_UNLOCK instead with the reverse
> meaning, and no "def_bool" at all, just a simple
>
> config UNINLINE_SPIN_UNLOCK
> bool
>
> - make the various people who want to uninline the spinlocks (like
> spinlock debugging, paravirt etc) all just do
>
> select UNINLINE_SPIN_UNLOCK
I just posted https://lkml.org/lkml/2012/3/22/94. Please let me know
if that looks better.
And this patch should now become something like
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5bed94e..2666b7d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -613,6 +613,7 @@ config PARAVIRT
config PARAVIRT_SPINLOCKS
bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT && SMP && EXPERIMENTAL
+ select UNINLINE_SPIN_UNLOCK
---help---
Paravirtualized spinlocks allow a pvops backend to replace the
spinlock implementation with something virtualization-friendly
^ permalink raw reply related
* [PATCH] xen/smp: Remove unnecessary call to smp_processor_id()
From: Srivatsa S. Bhat @ 2012-03-22 12:59 UTC (permalink / raw)
To: konrad.wilk, jeremy
Cc: xen-devel, x86, linux-kernel, virtualization, mingo,
Srivatsa S. Bhat, hpa, tglx
There is an extra and unnecessary call to smp_processor_id()
in cpu_bringup(). Remove it.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---
arch/x86/xen/smp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 501d4e0..acfe7a3 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -59,7 +59,7 @@ static irqreturn_t xen_reschedule_interrupt(int irq, void *dev_id)
static void __cpuinit cpu_bringup(void)
{
- int cpu = smp_processor_id();
+ int cpu;
cpu_init();
touch_softlockup_watchdog();
^ permalink raw reply related
* Re: [PULL] vhost-net/virtio: fixes for 3.4
From: David Miller @ 2012-03-22 20:56 UTC (permalink / raw)
To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, levinsasha928, nyh,
nyh
In-Reply-To: <20120322082718.GA11258@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Thu, 22 Mar 2012 10:27:19 +0200
> On Tue, Mar 20, 2012 at 04:50:41PM +0200, Michael S. Tsirkin wrote:
>> The following changes since commit 5ffca28a4ac7abb8a254fafe6bd03b2f83667df7:
>>
>> Merge git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs (2012-02-27 07:59:33 -0800)
>>
>> are available in the git repository at:
>>
>> ra.kernel.org:/pub/scm/linux/kernel/git/mst/vhost.git for_davem
>>
>> (ssh url as git.kernel.org seems down at the moment, when it comes up
>> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git for_davem
>> should be the equivalent).
Neither of these URL's work for me:
[davem@drr net]$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git for_davem
fatal: Couldn't find remote ref for_davem
[davem@drr net]$ git pull ra.kernel.org:/pub/scm/linux/kernel/git/mst/vhost.git for_davem
Enter passphrase for key '/home/davem/.ssh/davem':
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
[davem@drr net]$
If you're sending me signed pull requests that thus require a newer
version of GIT, please don't.
^ permalink raw reply
* Re: [PULL] vhost-net/virtio: fixes for 3.4
From: Michael S. Tsirkin @ 2012-03-22 22:12 UTC (permalink / raw)
To: David Miller
Cc: kvm, virtualization, netdev, linux-kernel, levinsasha928, nyh,
nyh
In-Reply-To: <20120322.165649.32205268396882350.davem@davemloft.net>
On Thu, Mar 22, 2012 at 04:56:49PM -0400, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Thu, 22 Mar 2012 10:27:19 +0200
>
> > On Tue, Mar 20, 2012 at 04:50:41PM +0200, Michael S. Tsirkin wrote:
> >> The following changes since commit 5ffca28a4ac7abb8a254fafe6bd03b2f83667df7:
> >>
> >> Merge git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs (2012-02-27 07:59:33 -0800)
> >>
> >> are available in the git repository at:
> >>
> >> ra.kernel.org:/pub/scm/linux/kernel/git/mst/vhost.git for_davem
> >>
> >> (ssh url as git.kernel.org seems down at the moment, when it comes up
> >> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git for_davem
> >> should be the equivalent).
>
> Neither of these URL's work for me:
>
> [davem@drr net]$ git pull git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git for_davem
> fatal: Couldn't find remote ref for_davem
> [davem@drr net]$ git pull ra.kernel.org:/pub/scm/linux/kernel/git/mst/vhost.git for_davem
> Enter passphrase for key '/home/davem/.ssh/davem':
> Permission denied (publickey).
> fatal: The remote end hung up unexpectedly
> [davem@drr net]$
>
> If you're sending me signed pull requests that thus require a newer
> version of GIT, please don't.
OK, sorry about that. Can't fix right now as I'm not at
the box that has the key but this works for me with an old git:
/usr/bin/git fetch git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git refs/tags/for_davem
You just don't get the signature checked.
--
MST
^ permalink raw reply
* Call for Participation: 12th IEEE/ACM Int. Symp. on Cluster, Grid and Cloud Computing (CCGrid 2012) -- May 13-16 in Ottawa, Canada
From: Ioan Raicu @ 2012-03-22 22:32 UTC (permalink / raw)
To: virtualization
*** Our apologies if you receive multiple copies of this Call ***
12th IEEE/ACM International Symposium on Cluster, Grid and Cloud
Computing (CCGrid 2012) Ottawa, Canada
May 13-16, 2012
http://www.cloudbus.org/ccgrid2012
Venue: The Delta Ottawa City Centre Hotel
[Special rates for conference attendees. To book visit:
www.cloudbus.org/ccgrid2012/accommodations.html.]
***************************
CALL FOR PARTICIPATION
***************************
*** Registration is Open. ***
*** Take advantage of the Early Registration Rates until April 9, 2012 ***
Overview:
Rapid advances in processing, communication and systems/middleware
technologies are leading to new paradigms and platforms for computing,
ranging from computing Clusters to widely distributed Grid and
emerging Clouds. CCGrid is a series of very successful conferences,
sponsored by the IEEE Computer Society Technical Committee on Scalable
Computing (TCSC) and ACM, with the overarching goal of bringing
together international researchers, developers, and users and to
provide an international forum to present leading research activities
and results on a broad range of topics related to these platforms and
paradigms and their applications. The conference features keynotes,
technical presentations, posters and research demos, workshops,
tutorials, as well as the SCALE challenges featuring live
demonstrations.
In 2012, CCGrid will come to Canada for the first time and will be
held in Ottawa, the capital city. The symposium will be held on
May 13-16 during which the city will be celebrating its world-famous Tulip
Festival.
CCGrid 2012 will have a focus on important and immediate issues that are
significantly influencing all aspects of cluster, cloud and grid computing.
Topics discussed in the technical sessions include: Applications and
Experiences; Architecture: System architectures, Design and deployment;
Autonomic Computing and Cyberinfrastructure; Performance Modeling and
Evaluation; Programming Models, Systems, and Fault-Tolerant Computing;
Multicore and Accelerator-based Computing; Scheduling and Resource
Management; Cloud Computing: Cloud architectures; Software tools and
techniques for clouds.
*******************
PROGRAM HIGHLIGHTS
******************
TECHNICAL SESSIONS WILL INCLUDE
* Programming Models and File Systems
* Map Reduce and Workflows
* QoS and Architecture
* GPU
* Cloud Services I
* I/O and File Systems
* Programming Models
* Cloud Computing I
* Communication and Networks
* Faults, Failures and Reliability
* Workflows
* Scheduling and Monitoring
* Virtualization
* Cloud Services
* Data on the Cloud
* Multicore Architectures
* Cloud Computing II
* Applications
KEYNOTES:
* Dr. Alok Chaudhury (Northwestern University, USA)
* Dr. Dick Epema (TU Delft, Netherlands)
* The winner of the TCSC medal on Scalable Computing
WORKSHOPS:
* International workshop on Cloud for Business, Industry and Enterprises
(C4BIE 2012)
* Workshop on Cloud Computing Optimization (CCOPT 2012)
* 2nd International Workshop on Cloud Computing and Scientific
Applications (CCSA 2012)
* Workshop on Modeling and Simulation on Grid and Cloud Computing
(MSGC 2012)
* 1st International Workshop on Data-intensive Process Management in
Large-Scale Sensor Systems (DPMSS 2012)
TUTORIALS
PANELS & INDUSTRIAL SESSIONS
DOCTORAL SYMPOSIUM
POSTER/DEMO SESSIONS
SCALE 2012: The Fourth IEEE International Scalable Computing Challenge
CHAIRS
General Chair
* Shikharesh Majumdar, Carleton University, Canada
Program Committee Co-Chairs
* Rajkumar Buyya, University of Melbourne, Australia
* Pavan Balaji, Argonne National Laboratory, USA
Program Committee Vice-chairs
* Daniel S. Katz (Applications and Experiences)
* Dhabaleswar K. Panda (Architecture)
* Manish Parashar (Middleware, Autonomic Computing, and
Cyberinfrastructure)
* Ahmad Afsahi (Performance Modeling and Analysis)
* Xian-He Sun (Performance Measurement and Evaluation)
* William Gropp (Programming Models, Systems, and Fault-Tolerant
computing)
* David Bader (Multicore and Accelerator-based Computing)
* Thomas Fahringer (Scheduling and Resource Management)
* Ignacio Martin Llorente and Madhusudhan Govindaraju (Cloud Computing)
Honorary Chair
* Geoffrey Fox, Indiana University, USA
IMPORTANT DATES
Early Registration: From now until April 9, 2012
Late/Onsite Registration: April 10, 2012 onwards
Conference: May 13-16, 2012
--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu@cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================
^ permalink raw reply
* Re: [PULL] vhost-net/virtio: fixes for 3.4
From: David Miller @ 2012-03-22 23:34 UTC (permalink / raw)
To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, levinsasha928, nyh,
nyh
In-Reply-To: <20120322221227.GA17849@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 23 Mar 2012 00:12:28 +0200
> OK, sorry about that. Can't fix right now as I'm not at
> the box that has the key but this works for me with an old git:
Please fix it up so I can pull properly.
^ permalink raw reply
* Re: [Qemu-devel] [PATCH] virtio-spec: clarify ro/rw bits and updating rule of virtio-net status field
From: Jason Wang @ 2012-03-23 2:57 UTC (permalink / raw)
To: Rusty Russell
Cc: Christian Borntraeger, virtualization, qemu-devel, linux-kernel,
Michael S. Tsirkin
In-Reply-To: <87aa399s0x.fsf@rustcorp.com.au>
On 03/22/2012 12:30 PM, Rusty Russell wrote:
> On Wed, 21 Mar 2012 08:37:46 +0200, "Michael S. Tsirkin"<mst@redhat.com> wrote:
>> Ah. Right, we need to trap for host to clear the bit.
>> OK, so let's make the bit RO, and add
>> VIRTIO_NET_CTRL_ANNOUNCED to acknowledge that we've
>> seen VIRTIO_NET_S_ANNOUNCE using the control VQ?
> Thanks, that's nice. Guest should send arp packets first, then send
> VIRTIO_NET_CTRL_ANNOUNCED, and ignore the bit being set in the meantime.
>
> Thanks,
> Rusty.
Sure, I will update and resend the spec patch.
Thanks
^ permalink raw reply
* [RFC V2 PATCH] virtio-spec: ack the announce notification through ctrl_vq
From: Jason Wang @ 2012-03-23 3:05 UTC (permalink / raw)
To: rusty, virtualization, linux-kernel, mst
During link announcement, the driver needs a way to tell the device that it has
received the notification, so that the device can clear the VIRTIO_NET_S_ANNOUNCE
bit in the status field. Doing this through a dedicated command on the ctrl vq
works on all platforms (especially those that do not trap reads or writes of the
status field) and solves the race between host and guest.
So this patch makes VIRTIO_NET_F_GUEST_ANNOUNCE depend on VIRTIO_NET_F_CTRL_VQ and
introduces a dedicated command, VIRTIO_NET_CTRL_ANNOUNCE_ACK, to let the device
clear the VIRTIO_NET_S_ANNOUNCE bit in the status field.
Changes from v1:
- Send the gratuitous packets or mark them as pending before sending the
VIRTIO_NET_CTRL_ANNOUNCE_ACK command.
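
For illustration only, the ordering above (gratuitous packets first, then the
acknowledgement) can be modelled in user space roughly as follows. This is a
sketch, not driver code: every toy_* name is hypothetical, the status bit
values (1 and 2) and the VIRTIO_NET_OK/VIRTIO_NET_ERR codes are assumed from
the virtio-net spec rather than taken from this patch, and only the values of
VIRTIO_NET_CTRL_ANNOUNCE (3) and VIRTIO_NET_CTRL_ANNOUNCE_ACK (0) come from
the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Status bits: values assumed from the virtio-net spec. */
#define VIRTIO_NET_S_LINK_UP          1u
#define VIRTIO_NET_S_ANNOUNCE         2u

/* Control class and command: values taken from this patch. */
#define VIRTIO_NET_CTRL_ANNOUNCE      3
#define VIRTIO_NET_CTRL_ANNOUNCE_ACK  0

/* Control vq ack codes: assumed from the spec. */
#define VIRTIO_NET_OK                 0
#define VIRTIO_NET_ERR                1

/* Hypothetical stand-ins for the device and driver state. */
struct toy_device {
    uint16_t status;      /* read-only from the driver's point of view */
    unsigned acks_seen;
};

struct toy_driver {
    unsigned gratuitous_sent;
};

/* Device side: only the device ever clears VIRTIO_NET_S_ANNOUNCE. */
static uint8_t toy_ctrl_command(struct toy_device *dev, uint8_t cls, uint8_t cmd)
{
    if (cls == VIRTIO_NET_CTRL_ANNOUNCE && cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK) {
        dev->status &= (uint16_t)~VIRTIO_NET_S_ANNOUNCE;
        dev->acks_seen++;
        return VIRTIO_NET_OK;
    }
    return VIRTIO_NET_ERR;
}

/* Driver side: runs when a configuration change is noticed. */
static void toy_config_changed(struct toy_driver *drv, struct toy_device *dev)
{
    if (!(dev->status & VIRTIO_NET_S_ANNOUNCE))
        return;

    /* Step 1: send (or queue) the gratuitous packets. */
    drv->gratuitous_sent++;

    /* Step 2: acknowledge through the control vq; the device clears the bit. */
    toy_ctrl_command(dev, VIRTIO_NET_CTRL_ANNOUNCE, VIRTIO_NET_CTRL_ANNOUNCE_ACK);
}
```

Because the driver never writes the status field in this model, a second
configuration change that arrives after the ack is seen as a fresh announce
request rather than being lost.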
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
virtio-0.9.4.lyx | 76 +++++++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 69 insertions(+), 7 deletions(-)
diff --git a/virtio-0.9.4.lyx b/virtio-0.9.4.lyx
index 9d30977..f376cb8 100644
--- a/virtio-0.9.4.lyx
+++ b/virtio-0.9.4.lyx
@@ -4013,8 +4013,12 @@ configuration
layout Two configuration fields are currently defined.
The mac address field always exists (though is only valid if VIRTIO_NET_F_MAC
is set), and the status field only exists if VIRTIO_NET_F_STATUS is set.
- Two bits are currently defined for the status field: VIRTIO_NET_S_LINK_UP
- and VIRTIO_NET_S_ANNOUNCE.
+ Two
+\change_inserted 2090695081 1332406434
+read-only
+\change_unchanged
+bits are currently defined for the status field: VIRTIO_NET_S_LINK_UP and
+ VIRTIO_NET_S_ANNOUNCE.
\begin_inset listings
inline false
@@ -4902,18 +4906,58 @@ Gratuitous Packet Sending
\end_layout
\begin_layout Standard
-If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE, it can ask the
- guest to send gratuitous packets; this is usually done after the guest
- has been physically migrated, and needs to announce its presence on the
- new network links.
+If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE
+\change_inserted 2090695081 1332407810
+ (depends on VIRTIO_NET_F_CTRL_VQ)
+\change_unchanged
+, it can ask the guest to send gratuitous packets; this is usually done
+ after the guest has been physically migrated, and needs to announce its
+ presence on the new network links.
(As hypervisor does not have the knowledge of guest network configuration
(eg.
tagged vlan) it is simplest to prod the guest in this way).
+\change_inserted 2090695081 1332405026
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 2090695081 1332405026
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1332405658
+
+#define VIRTIO_NET_CTRL_ANNOUNCE 3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 2090695081 1332407582
+
+ #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
\end_layout
\begin_layout Standard
The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status field when
it notices the changes of device configuration.
+
+\change_inserted 2090695081 1332407079
+ The command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that driver
+ has received the notification and device would clear the VIRTIO_NET_S_ANNOUNCE
+ bit in the status field after it received this command.
+\change_unchanged
+
\end_layout
\begin_layout Standard
@@ -4921,11 +4965,29 @@ Processing this notification involves:
\end_layout
\begin_layout Enumerate
+
+\change_inserted 2090695081 1332471639
+Sending the gratuitous packets or marking there are pending gratuitous packets
+ to be sent and letting deferred routine to send them.
+\end_layout
+
+\begin_layout Enumerate
+
+\change_inserted 2090695081 1332405963
+Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control vq.
+
+\change_deleted 2090695081 1332405924
Clearing VIRTIO_NET_S_ANNOUNCE bit in the status field.
+\change_unchanged
+
\end_layout
\begin_layout Enumerate
-Sending the gratuitous packets.
+
+\change_deleted 2090695081 1332471331
+Sending the gratuitous packets
+\change_unchanged
+.
\end_layout
^ permalink raw reply related
* [PATCH RFC V5 0/6] kvm : Paravirt-spinlock support for KVM guests
From: Raghavendra K T @ 2012-03-23 8:05 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin
Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
Sasha Levin
The 6-patch series to follow this email extends the KVM hypervisor and Linux
guests running on it to support pv-ticket spinlocks, based on Xen's
implementation.
One hypercall is introduced in the KVM hypervisor that allows a vcpu to kick
another vcpu out of halt state.
The blocking of a vcpu is done using halt() in the (lock_spinning) slowpath.
One MSR is added to aid live migration.
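
As a rough illustration of the spin-then-halt idea (not the code in this
series), here is a single-threaded user-space model of a ticket lock with the
two slow-path hooks. All names are hypothetical, SPIN_THRESHOLD is an
arbitrary value, and the demo lock_spinning hook stands in for halt() by
pretending the descheduled lock holder ran and released the lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spin budget before entering the slow path. */
#define SPIN_THRESHOLD 1024u

struct ticketlock {
    atomic_uint head;   /* ticket currently being served */
    atomic_uint tail;   /* next ticket to hand out */
};

/* Hypothetical hook table, loosely modelled on the pvops idea. */
struct pv_lock_hooks {
    void (*lock_spinning)(struct ticketlock *lock, unsigned ticket);
    void (*unlock_kick)(struct ticketlock *lock, unsigned next);
};

static struct pv_lock_hooks hooks;
static unsigned halts, kicks;   /* demo counters */

static void ticket_lock(struct ticketlock *lock)
{
    unsigned me = atomic_fetch_add(&lock->tail, 1u);

    for (;;) {
        /* Fast path: spin for a bounded number of iterations. */
        for (unsigned i = 0; i < SPIN_THRESHOLD; i++)
            if (atomic_load(&lock->head) == me)
                return;
        /* Slow path: block until kicked (halt() in the real series). */
        hooks.lock_spinning(lock, me);
    }
}

static void ticket_unlock(struct ticketlock *lock)
{
    unsigned next = atomic_fetch_add(&lock->head, 1u) + 1u;

    /* Someone holds a later ticket, so there may be a blocked waiter. */
    if (atomic_load(&lock->tail) != next)
        hooks.unlock_kick(lock, next);
}

/* Demo hook: count the "halt" and simulate the holder releasing the lock. */
static void demo_lock_spinning(struct ticketlock *lock, unsigned ticket)
{
    (void)ticket;
    halts++;
    ticket_unlock(lock);
}

/* Demo hook: count the kick (a hypercall in the real series). */
static void demo_unlock_kick(struct ticketlock *lock, unsigned next)
{
    (void)lock;
    (void)next;
    kicks++;
}

/* Runs the scenario; returns 0 if every step behaved as expected. */
static int pv_ticketlock_demo(void)
{
    struct ticketlock lock;

    atomic_init(&lock.head, 0u);
    atomic_init(&lock.tail, 0u);
    hooks.lock_spinning = demo_lock_spinning;
    hooks.unlock_kick = demo_unlock_kick;
    halts = kicks = 0;

    ticket_lock(&lock);             /* uncontended: fast path only */
    if (halts != 0)
        return -1;
    ticket_lock(&lock);             /* "waiter": spins, halts, is kicked */
    if (halts != 1 || kicks != 1)
        return -1;
    ticket_unlock(&lock);           /* nobody queued behind: no kick */
    if (kicks != 1)
        return -1;
    return atomic_load(&lock.head) == atomic_load(&lock.tail) ? 0 : -1;
}
```

The point of the model is only the control flow: the uncontended path never
touches the hooks, and a kick is issued only when a later ticket exists.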
Changes in V5:
- rebased to 3.3-rc6
- added PV_UNHALT_MSR that would help in live migration (Avi)
- removed PV_LOCK_KICK vcpu request and pv_unhalt flag (re)added.
- Changed hypercall documentation (Alex).
- mode_t changed to umode_t in debugfs.
- MSR related documentation added.
- rename PV_LOCK_KICK to PV_UNHALT.
- host and guest patches not mixed. (Marcelo, Alex)
- kvm_kick_cpu now takes cpu so it can be used by flush_tlb_ipi_other
paravirtualization (Nikunj)
- coding style changes in variable declaration etc (Srikar)
Changes in V4:
- rebased to 3.2.0 pre.
- use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching (Avi)
- fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and related
changes for UNHALT path to make pv ticket spinlock migration friendly (Avi, Marcelo)
- Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
and capability (KVM_CAP_PVLOCK_KICK) (Avi)
- Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcelo)
- cumulative variable type changed (int ==> u32) in add_stat (Konrad)
- remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case
Changes in V3:
- rebased to 3.2-rc1
- use halt() instead of wait for kick hypercall.
- modify kick hypercall to wake up the halted vcpu.
- hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of head##.c).
- fix the potential race when zero_stat is read.
- export debugfs_create_32 and add documentation to API.
- use static inline and enum instead of ADDSTAT macro.
- add barrier() after setting kick_vcpu.
- empty static inline function for kvm_spinlock_init.
- combine patches one and two to reduce overhead.
- make KVM_DEBUGFS depends on DEBUGFS.
- include debugfs header unconditionally.
Changes in V2:
- rebased patches to -rc9
- synchronization related changes based on Jeremy's changes
(Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>) pointed by
Stephan Diestelhorst <stephan.diestelhorst@amd.com>
- enabling 32 bit guests
- split patches into two more chunks
Test Set up :
The BASE patch is 3.3.0-rc6 + jumplabel split patch (https://lkml.org/lkml/2012/2/21/167)
+ ticketlock cleanup patch (https://lkml.org/lkml/2012/3/21/161)
Results:
The performance gain is mainly because of reduced busy-wait time.
From the results we can see that patched kernel performance is similar to
BASE when there is no lock contention. But once we start seeing more
contention, patched kernel outperforms BASE.
3 guests, each with 8 VCPUs and 8GB RAM; one is used for kernbench
(kernbench -f -H -M -o 20), the others for cpuhog (a shell script running
while true with an instruction)
1x: no hogs
2x: 8 hogs in one guest
3x: 8 hogs each in two guests
1) kernbench
Machine: IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 cores, 64GB RAM
            BASE                 BASE+patch           %improvement
            mean (sd)            mean (sd)
case 1x:    38.1033 (43.502)     38.09   (43.4269)    0.0349051
case 2x:    778.622 (1092.68)    129.342 (156.324)    83.3883
case 3x:    2399.11 (3548.32)    114.913 (139.5)      95.2102
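
The %improvement column is consistent with (BASE - patched) / BASE * 100
computed on the means. A minimal C sketch of that arithmetic, with a
hypothetical function name, for anyone who wants to recheck the figures:

```c
#include <assert.h>

/*
 * Relative improvement of the patched kernel over BASE, in percent.
 * Lower is better for both inputs, so a positive result means the
 * patched kernel did better.
 */
static double percent_improvement(double base_mean, double patched_mean)
{
    return (base_mean - patched_mean) / base_mean * 100.0;
}
```

For example, percent_improvement(778.622, 129.342) gives roughly 83.39,
matching the 2x row.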
2) pgbench:
pgbench version: http://www.postgresql.org/ftp/snapshot/dev/
tool used for benchmarking: git://git.postgresql.org/git/pgbench-tools.git
Analysis is done using ministat.
The test is done at 1x overcommit to check the overhead of pv spinlocks.
There is a small performance penalty in the non-contention scenario (note that BASE
is Jeremy's ticketlock). But as the number of threads increases, an improvement is
seen.
guest: 64bit 8 vCPU and 8GB RAM
shared buffer size = 2GB
x base_kernel
+ patched_kernel
N Min Max Median Avg Stddev
+--------------------- NRCLIENT = 1 ----------------------------------------+
x 10 7468.0719 7774.0026 7529.9217 7594.9696 128.7725
+ 10 7280.413 7650.6619 7425.7968 7434.9344 144.59127
Difference at 95.0% confidence
-160.035 +/- 128.641
-2.10712% +/- 1.69376%
+--------------------- NRCLIENT = 2 ----------------------------------------+
x 10 14604.344 14849.358 14725.845 14724.722 76.866294
+ 10 14070.064 14246.013 14125.556 14138.169 60.556379
Difference at 95.0% confidence
-586.553 +/- 65.014
-3.98346% +/- 0.441529%
+--------------------- NRCLIENT = 4 ----------------------------------------+
x 10 27891.073 28305.466 28059.892 28060.231 115.65612
+ 10 27237.685 27639.645 27297.79 27375.966 145.31006
Difference at 95.0% confidence
-684.265 +/- 123.39
-2.43856% +/- 0.439734%
+--------------------- NRCLIENT = 8 ----------------------------------------+
x 10 53063.509 53498.677 53343.24 53309.697 138.77983
+ 10 51705.708 52208.274 52030.06 51987.067 156.65323
Difference at 95.0% confidence
-1322.63 +/- 139.048
-2.48103% +/- 0.26083%
+--------------------- NRCLIENT = 16 ---------------------------------------+
x 10 50043.347 52701.253 52235.978 51993.466 817.44911
+ 10 51562.772 52272.412 51905.317 51946.557 228.54314
No difference proven at 95.0% confidence
+--------------------- NRCLIENT = 32 --------------------------------------+
x 10 49178.789 51284.599 50288.185 50275.212 616.80154
+ 10 50722.097 52145.041 51551.112 51512.423 469.18898
Difference at 95.0% confidence
1237.21 +/- 514.888
2.46088% +/- 1.02414%
+--------------------------------------------------------------------------+
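
The "Difference at 95.0% confidence" lines can be rechecked by hand. The
sketch below assumes ministat's usual method, a pooled-variance Student's t
interval, and assumes a critical value of 2.101 for 18 degrees of freedom
(10 + 10 - 2). The struct and function names are hypothetical, and a small
Newton iteration replaces sqrt() only so the snippet needs no math library.

```c
#include <assert.h>

/* Summary of one ministat data set: N, Avg and Stddev columns. */
struct sample {
    unsigned n;
    double mean;
    double stddev;
};

/* Newton's method square root; adequate for the positive inputs used here. */
static double sqrt_newton(double x)
{
    double r = x > 1.0 ? x : 1.0;

    if (x <= 0.0)
        return 0.0;
    for (int i = 0; i < 60; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Patched minus base, as printed on the "Difference" line. */
static double mean_difference(const struct sample *base,
                              const struct sample *patched)
{
    return patched->mean - base->mean;
}

/* Half-width of the confidence interval (the "+/-" figure). */
static double ci_halfwidth(const struct sample *a, const struct sample *b,
                           double t_crit)
{
    double pooled_var = ((a->n - 1) * a->stddev * a->stddev +
                         (b->n - 1) * b->stddev * b->stddev) /
                        (a->n + b->n - 2);

    return t_crit * sqrt_newton(pooled_var) *
           sqrt_newton(1.0 / a->n + 1.0 / b->n);
}

/* Nonzero when the interval around the difference excludes zero. */
static int difference_proven(const struct sample *base,
                             const struct sample *patched, double t_crit)
{
    double d = mean_difference(base, patched);
    double e = ci_halfwidth(base, patched, t_crit);

    return d > e || d < -e;
}
```

With the NRCLIENT = 1 rows this gives a difference of about -160.035 with a
half-width of about 128.64; with the NRCLIENT = 16 rows the half-width exceeds
the difference, which corresponds to "No difference proven".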
Let me know if you have any suggestions/comments...
---
V4 kernel changes:
https://lkml.org/lkml/2012/1/14/66
Qemu changes for V4:
http://www.mail-archive.com/kvm@vger.kernel.org/msg66450.html
V3 kernel Changes:
https://lkml.org/lkml/2011/11/30/62
V2 kernel changes :
https://lkml.org/lkml/2011/10/23/207
Previous discussions : (posted by Srivatsa V).
https://lkml.org/lkml/2010/7/26/24
https://lkml.org/lkml/2011/1/19/212
Qemu patch for V3:
http://lists.gnu.org/archive/html/qemu-devel/2011-12/msg00397.html
Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (6):
Add debugfs support to print u32-arrays in debugfs
Add a hypercall to KVM hypervisor to support pv-ticketlocks
Add unhalt msr to aid migration
Added configuration support to enable debug information for KVM Guests
pv-ticketlock support for linux guests running on KVM hypervisor
Add documentation on Hypercalls and features used for PV spinlock
Documentation/virtual/kvm/api.txt | 7 +
Documentation/virtual/kvm/cpuid.txt | 4 +
Documentation/virtual/kvm/hypercalls.txt | 59 +++++++
Documentation/virtual/kvm/msr.txt | 9 +
arch/x86/Kconfig | 9 +
arch/x86/include/asm/kvm_para.h | 18 ++-
arch/x86/kernel/kvm.c | 254 ++++++++++++++++++++++++++++++
arch/x86/kvm/cpuid.c | 3 +-
arch/x86/kvm/x86.c | 40 +++++-
arch/x86/xen/debugfs.c | 104 ------------
arch/x86/xen/debugfs.h | 4 -
arch/x86/xen/spinlock.c | 2 +-
fs/debugfs/file.c | 128 +++++++++++++++
include/linux/debugfs.h | 11 ++
include/linux/kvm.h | 1 +
include/linux/kvm_host.h | 1 +
include/linux/kvm_para.h | 1 +
virt/kvm/kvm_main.c | 4 +
18 files changed, 545 insertions(+), 114 deletions(-)
^ permalink raw reply
* [PATCH RFC V5 1/6] debugfs: Add support to print u32 array in debugfs
From: Raghavendra K T @ 2012-03-23 8:06 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin
Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>
From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Move the code from Xen to debugfs to make the code common
for other users as well.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
index ef1db19..c8377fb 100644
--- a/arch/x86/xen/debugfs.c
+++ b/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
return d_xen_debug;
}
-struct array_data
-{
- void *array;
- unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
- file->private_data = NULL;
- return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
- u32 *array, unsigned array_size)
-{
- size_t ret = 0;
- unsigned i;
-
- for(i = 0; i < array_size; i++) {
- size_t len;
-
- len = snprintf(buf, bufsize, fmt, array[i]);
- len++; /* ' ' or '\n' */
- ret += len;
-
- if (buf) {
- buf += len;
- bufsize -= len;
- buf[-1] = (i == array_size-1) ? '\n' : ' ';
- }
- }
-
- ret++; /* \0 */
- if (buf)
- *buf = '\0';
-
- return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
-{
- size_t len = format_array(NULL, 0, fmt, array, array_size);
- char *ret;
-
- ret = kmalloc(len, GFP_KERNEL);
- if (ret == NULL)
- return NULL;
-
- format_array(ret, len, fmt, array, array_size);
- return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
- loff_t *ppos)
-{
- struct inode *inode = file->f_path.dentry->d_inode;
- struct array_data *data = inode->i_private;
- size_t size;
-
- if (*ppos == 0) {
- if (file->private_data) {
- kfree(file->private_data);
- file->private_data = NULL;
- }
-
- file->private_data = format_array_alloc("%u", data->array, data->elements);
- }
-
- size = 0;
- if (file->private_data)
- size = strlen(file->private_data);
-
- return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
- kfree(file->private_data);
-
- return 0;
-}
-
-static const struct file_operations u32_array_fops = {
- .owner = THIS_MODULE,
- .open = u32_array_open,
- .release= xen_array_release,
- .read = u32_array_read,
- .llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode,
- struct dentry *parent,
- u32 *array, unsigned elements)
-{
- struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
- if (data == NULL)
- return NULL;
-
- data->array = array;
- data->elements = elements;
-
- return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
index 78d2549..12ebf33 100644
--- a/arch/x86/xen/debugfs.h
+++ b/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
struct dentry * __init xen_init_debugfs(void);
-struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode,
- struct dentry *parent,
- u32 *array, unsigned elements);
-
#endif /* _XEN_DEBUGFS_H */
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 4926974..b74cebb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -314,7 +314,7 @@ static int __init xen_spinlock_debugfs(void)
debugfs_create_u64("time_blocked", 0444, d_spin_debug,
&spinlock_stats.time_blocked);
- xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+ debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
return 0;
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index ef023ee..cb6cff3 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -20,6 +20,7 @@
#include <linux/namei.h>
#include <linux/debugfs.h>
#include <linux/io.h>
+#include <linux/slab.h>
static ssize_t default_read_file(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -528,6 +529,133 @@ struct dentry *debugfs_create_blob(const char *name, umode_t mode,
}
EXPORT_SYMBOL_GPL(debugfs_create_blob);
+struct array_data {
+ void *array;
+ u32 elements;
+};
+
+static int u32_array_open(struct inode *inode, struct file *file)
+{
+ file->private_data = NULL;
+ return nonseekable_open(inode, file);
+}
+
+static size_t format_array(char *buf, size_t bufsize, const char *fmt,
+ u32 *array, u32 array_size)
+{
+ size_t ret = 0;
+ u32 i;
+
+ for (i = 0; i < array_size; i++) {
+ size_t len;
+
+ len = snprintf(buf, bufsize, fmt, array[i]);
+ len++; /* ' ' or '\n' */
+ ret += len;
+
+ if (buf) {
+ buf += len;
+ bufsize -= len;
+ buf[-1] = (i == array_size-1) ? '\n' : ' ';
+ }
+ }
+
+ ret++; /* \0 */
+ if (buf)
+ *buf = '\0';
+
+ return ret;
+}
+
+static char *format_array_alloc(const char *fmt, u32 *array,
+ u32 array_size)
+{
+ size_t len = format_array(NULL, 0, fmt, array, array_size);
+ char *ret;
+
+ ret = kmalloc(len, GFP_KERNEL);
+ if (ret == NULL)
+ return NULL;
+
+ format_array(ret, len, fmt, array, array_size);
+ return ret;
+}
+
+static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
+ loff_t *ppos)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ struct array_data *data = inode->i_private;
+ size_t size;
+
+ if (*ppos == 0) {
+ if (file->private_data) {
+ kfree(file->private_data);
+ file->private_data = NULL;
+ }
+
+ file->private_data = format_array_alloc("%u", data->array,
+ data->elements);
+ }
+
+ size = 0;
+ if (file->private_data)
+ size = strlen(file->private_data);
+
+ return simple_read_from_buffer(buf, len, ppos,
+ file->private_data, size);
+}
+
+static int u32_array_release(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+
+ return 0;
+}
+
+static const struct file_operations u32_array_fops = {
+ .owner = THIS_MODULE,
+ .open = u32_array_open,
+ .release = u32_array_release,
+ .read = u32_array_read,
+ .llseek = no_llseek,
+};
+
+/**
+ * debugfs_create_u32_array - create a debugfs file that is used to read a u32
+ * array.
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ * directory dentry if set. If this parameter is %NULL, then the
+ * file will be created in the root of the debugfs filesystem.
+ * @array: u32 array that provides data.
+ * @elements: total number of elements in the array.
+ *
+ * This function creates a file in debugfs with the given name that exports
+ * @array as data. If @mode permits it, the file can be read from.
+ * Writing is not supported. Seeking within the file is also not supported.
+ * Once the file is created, the size of the array cannot be changed.
+ *
+ * The function returns a pointer to the dentry on success. If debugfs is not
+ * enabled in the kernel, the value -%ENODEV will be returned.
+ */
+struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+ struct dentry *parent,
+ u32 *array, u32 elements)
+{
+ struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
+
+ if (data == NULL)
+ return NULL;
+
+ data->array = array;
+ data->elements = elements;
+
+ return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_u32_array);
+
#ifdef CONFIG_HAS_IOMEM
/*
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index 6169c26..5cb4435 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -93,6 +93,10 @@ struct dentry *debugfs_create_regset32(const char *name, mode_t mode,
int debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs,
int nregs, void __iomem *base, char *prefix);
+struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+ struct dentry *parent,
+ u32 *array, u32 elements);
+
bool debugfs_initialized(void);
#else
@@ -219,6 +223,13 @@ static inline bool debugfs_initialized(void)
return false;
}
+static inline struct dentry *debugfs_create_u32_array(const char *name, umode_t mode,
+ struct dentry *parent,
+ u32 *array, u32 elements)
+{
+ return ERR_PTR(-ENODEV);
+}
+
#endif
#endif
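
For readers following the formatting logic, here is a minimal userspace sketch of the two-pass pattern that format_array() and format_array_alloc() use: the first call passes a NULL buffer only to measure, the second call fills a buffer of exactly that size. The names format_u32_array and format_u32_array_alloc are hypothetical stand-ins, malloc() replaces kmalloc(), and the "%u" format is fixed rather than passed in; this is not the kernel code itself.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of format_array(). With buf == NULL and
 * bufsize == 0 it only measures; otherwise it writes "a b c\n" plus a NUL.
 * Returns the number of bytes needed including the NUL, or 0 on error.
 */
static size_t format_u32_array(char *buf, size_t bufsize,
                               const uint32_t *array, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        int len = snprintf(buf, bufsize, "%" PRIu32, array[i]);
        size_t step;

        if (len < 0)
            return 0;
        step = (size_t)len + 1;        /* digits plus ' ' or '\n' */
        total += step;

        if (buf) {
            buf += step;
            bufsize -= step;
            /* overwrite the NUL that snprintf wrote with the separator */
            buf[-1] = (i == n - 1) ? '\n' : ' ';
        }
    }

    total++;                           /* trailing NUL */
    if (buf)
        *buf = '\0';
    return total;
}

/* Measure first, then allocate exactly that much and format for real. */
static char *format_u32_array_alloc(const uint32_t *array, size_t n)
{
    size_t len = format_u32_array(NULL, 0, array, n);
    char *out;

    if (len == 0)
        return NULL;
    out = malloc(len);
    if (out == NULL)
        return NULL;
    if (format_u32_array(out, len, array, n) == 0) {
        free(out);
        return NULL;
    }
    return out;
}
```

Calling format_u32_array_alloc() on the three values 1, 22 and 333 should yield the string "1 22 333\n", which the caller frees when done, much as u32_array_release() frees file->private_data.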
* [PATCH RFC V5 2/6] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Raghavendra K T @ 2012-03-23 8:07 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin
Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>
From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
The presence of this hypercall is indicated to the guest via
KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..9234f13 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
#define KVM_FEATURE_CLOCKSOURCE 0
#define KVM_FEATURE_NOP_IO_DELAY 1
#define KVM_FEATURE_MMU_OP 2
+
/* This indicates that the new set of kvmclock msrs
* are available. The use of 0x11 and 0x12 is deprecated
*/
#define KVM_FEATURE_CLOCKSOURCE2 3
#define KVM_FEATURE_ASYNC_PF 4
#define KVM_FEATURE_STEAL_TIME 5
+#define KVM_FEATURE_PV_UNHALT 6
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
@@ -32,6 +34,7 @@
#define MSR_KVM_SYSTEM_TIME 0x12
#define KVM_MSR_ENABLED 1
+
/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
#define MSR_KVM_WALL_CLOCK_NEW 0x4b564d00
#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 89b02bf..61388b9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -408,7 +408,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_NOP_IO_DELAY) |
(1 << KVM_FEATURE_CLOCKSOURCE2) |
(1 << KVM_FEATURE_ASYNC_PF) |
- (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+ (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+ (1 << KVM_FEATURE_PV_UNHALT);
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cbfc06..bd5ef91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2079,6 +2079,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
case KVM_CAP_GET_TSC_KHZ:
+ case KVM_CAP_PV_UNHALT:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -4913,6 +4914,30 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
return 1;
}
+/*
+ * kvm_pv_kick_cpu_op: Kick a vcpu.
+ *
+ * @apicid - apicid of vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
+{
+ struct kvm_vcpu *vcpu = NULL;
+ int i;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_apic_present(vcpu))
+ continue;
+
+ if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
+ break;
+ }
+ if (vcpu) {
+ vcpu->pv_unhalted = 1;
+ smp_mb();
+ kvm_vcpu_kick(vcpu);
+ }
+}
+
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
{
unsigned long nr, a0, a1, a2, a3, ret;
@@ -4946,6 +4971,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_VAPIC_POLL_IRQ:
ret = 0;
break;
+ case KVM_HC_KICK_CPU:
+ kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+ ret = 0;
+ break;
default:
ret = -KVM_ENOSYS;
break;
@@ -6174,6 +6203,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
!vcpu->arch.apf.halted)
|| !list_empty_careful(&vcpu->async_pf.done)
|| vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+ || vcpu->pv_unhalted
|| atomic_read(&vcpu->arch.nmi_queued) ||
(kvm_arch_interrupt_allowed(vcpu) &&
kvm_cpu_has_interrupt(vcpu));
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 68e67e5..e822d96 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
#define KVM_CAP_PPC_PAPR 68
#define KVM_CAP_S390_GMAP 71
#define KVM_CAP_TSC_DEADLINE_TIMER 72
+#define KVM_CAP_PV_UNHALT 73
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 900c763..433ae97 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -158,6 +158,7 @@ struct kvm_vcpu {
#endif
struct kvm_vcpu_arch arch;
+ int pv_unhalted;
};
static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index ff476dd..38226e1 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
#define KVM_HC_MMU_OP 2
#define KVM_HC_FEATURES 3
#define KVM_HC_PPC_MAP_MAGIC_PAGE 4
+#define KVM_HC_KICK_CPU 5
/*
* hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a91f980..d3b98b1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
vcpu->kvm = kvm;
vcpu->vcpu_id = id;
vcpu->pid = NULL;
+ vcpu->pv_unhalted = 0;
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);
@@ -1567,6 +1568,9 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
if (kvm_arch_vcpu_runnable(vcpu)) {
+ vcpu->pv_unhalted = 0;
+ /* preventing reordering should be enough here */
+ barrier();
kvm_make_request(KVM_REQ_UNHALT, vcpu);
break;
}
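
The kick/block handshake in this patch can be modelled in userspace roughly as below. This is only a toy sketch under stated assumptions: struct toy_vcpu and the toy_* functions are hypothetical, a sequentially consistent C11 atomic store stands in for the flag write followed by smp_mb(), and no real wait queue or hypercall is involved. The lookup returns NULL explicitly when no apicid matches, because whether the iterator variable is NULL after an unmatched kvm_for_each_vcpu() loop depends on how that macro is defined in the tree.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct kvm_vcpu used here. */
struct toy_vcpu {
    int apicid;
    atomic_int pv_unhalted;   /* set by the kicker, cleared by the sleeper */
    bool halted;
};

/* Return the vcpu with the given apicid, or NULL if none matches. */
static struct toy_vcpu *toy_find_vcpu(struct toy_vcpu *vcpus, size_t n,
                                      int apicid)
{
    for (size_t i = 0; i < n; i++)
        if (vcpus[i].apicid == apicid)
            return &vcpus[i];
    return NULL;
}

/* Kicker side: mark the target as unhalted. Returns false if not found. */
static bool toy_kick(struct toy_vcpu *vcpus, size_t n, int apicid)
{
    struct toy_vcpu *v = toy_find_vcpu(vcpus, n, apicid);

    if (v == NULL)
        return false;
    /* seq_cst store models "flag = 1; smp_mb();" before the wakeup */
    atomic_store(&v->pv_unhalted, 1);
    return true;
}

/* Analogue of the extra condition added to kvm_arch_vcpu_runnable(). */
static bool toy_runnable(struct toy_vcpu *v)
{
    return atomic_load(&v->pv_unhalted) != 0;
}

/*
 * One pass of the block loop: if a kick is pending, consume it and leave
 * the halted state; otherwise stay halted. Returns true when unhalted.
 */
static bool toy_block_once(struct toy_vcpu *v)
{
    if (toy_runnable(v)) {
        atomic_store(&v->pv_unhalted, 0);
        v->halted = false;
        return true;
    }
    v->halted = true;
    return false;
}
```

The point of keeping the flag separate from the wakeup is that a kick arriving before the target actually blocks is not lost: the next runnable check sees the flag and the vcpu does not go to sleep.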
* [PATCH RFC V5 3/6] kvm : Add unhalt msr to aid (live) migration
From: Raghavendra K T @ 2012-03-23 8:07 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin
Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, LKML,
Greg Kroah-Hartman, linux-doc, Xen, Avi Kivity,
Srivatsa Vaddagiri, Virtualization, Stefano Stabellini,
Sasha Levin
In-Reply-To: <20120323080503.14568.43092.sendpatchset@codeblue>
From: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Currently the guest does not need to know the pv_unhalt state; the MSR is
intended to be used via the GET/SET_MSR ioctls during migration.
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9234f13..46f9751 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -40,6 +40,7 @@
#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
#define MSR_KVM_ASYNC_PF_EN 0x4b564d02
#define MSR_KVM_STEAL_TIME 0x4b564d03
+#define MSR_KVM_PV_UNHALT 0x4b564d04
struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd5ef91..38e6c47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -784,12 +784,13 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
* kvm-specific. Those are put in the beginning of the list.
*/
-#define KVM_SAVE_MSRS_BEGIN 9
+#define KVM_SAVE_MSRS_BEGIN 10
static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE, MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
+ MSR_KVM_PV_UNHALT,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
#ifdef CONFIG_X86_64
@@ -1606,7 +1607,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
break;
-
+ case MSR_KVM_PV_UNHALT:
+ vcpu->pv_unhalted = (u32) data;
+ break;
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MC0_CTL + 4 * KVM_MAX_MCE_BANKS - 1:
@@ -1917,6 +1920,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_KVM_STEAL_TIME:
data = vcpu->arch.st.msr_val;
break;
+ case MSR_KVM_PV_UNHALT:
+ data = (u64)vcpu->pv_unhalted;
+ break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
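
To illustrate why the state is exposed as an MSR, the following userspace sketch models the save/restore round trip a migration would perform: read the value on the source, write it on the destination. The MSR index is the one defined in this patch; struct toy_msr_vcpu and the toy_* helpers are hypothetical and only mirror the shape of the get/set handlers, not the real ioctl path.

```c
#include <stdint.h>

#define MSR_KVM_PV_UNHALT 0x4b564d04u   /* index taken from the patch */

/* Hypothetical stand-in for the per-vcpu state touched by the handlers. */
struct toy_msr_vcpu {
    int pv_unhalted;
};

/* Analogue of the new case in kvm_get_msr_common(); -1 means unknown MSR. */
static int toy_get_msr(const struct toy_msr_vcpu *v, uint32_t msr,
                       uint64_t *data)
{
    switch (msr) {
    case MSR_KVM_PV_UNHALT:
        *data = (uint64_t)v->pv_unhalted;
        return 0;
    default:
        return -1;
    }
}

/* Analogue of the new case in kvm_set_msr_common(). */
static int toy_set_msr(struct toy_msr_vcpu *v, uint32_t msr, uint64_t data)
{
    switch (msr) {
    case MSR_KVM_PV_UNHALT:
        v->pv_unhalted = (int)(uint32_t)data;
        return 0;
    default:
        return -1;
    }
}

/* Copy the pending-kick state from a source vcpu to a destination vcpu. */
static int toy_migrate_pv_unhalt(const struct toy_msr_vcpu *src,
                                 struct toy_msr_vcpu *dst)
{
    uint64_t data;

    if (toy_get_msr(src, MSR_KVM_PV_UNHALT, &data) != 0)
        return -1;
    return toy_set_msr(dst, MSR_KVM_PV_UNHALT, data);
}
```

Without such a round trip, a kick that was pending on the source at migration time would be dropped and the destination vcpu could stay halted.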