Linux virtualization list

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Alexander Graf @ 2012-01-16  4:00 UTC
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Sasha Levin, Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman,
	LKML
In-Reply-To: <20120116035114.GI9129@linux.vnet.ibm.com>


On 16.01.2012, at 04:51, Srivatsa Vaddagiri wrote:

> * Alexander Graf <agraf@suse.de> [2012-01-16 04:23:24]:
> 
>>> +5. KVM_HC_KICK_CPU
>>> +------------------------
>>> +value: 5
>>> +Architecture: x86
>>> +Purpose: Hypercall used to wakeup a vcpu from HLT state
>>> +
>>> +Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
>>> +kernel mode for an event to occur (ex: a spinlock to become available)
>>> +can execute HLT instruction once it has busy-waited for more than a
>>> +threshold time-interval. Execution of HLT instruction would cause
>>> +the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until occurence
>>> +of an appropriate event. Another vcpu of the same guest can wakeup the sleeping
>>> +vcpu by issuing KVM_HC_KICK_CPU hypercall, specifying APIC ID of the vcpu to be
>>> +wokenup.
>> 
>> The description is way too specific. The hypercall basically gives the guest the ability to yield() its current vcpu to another chosen vcpu.
> 
> Hmm ..the hypercall does not allow a vcpu to yield. It just allows some
> target vcpu to be prodded/wokenup, after which vcpu continues execution.
> 
> Note that semantics of this hypercall is different from the hypercall on which
> PPC pv-spinlock (__spin_yield()) is currently dependent. This is mainly because 
> of ticketlocks on x86 (which does not allow us to easily store owning cpu
> details in lock word itself).

Yes, sorry for not being more exact in my wording. It is a directed yield(). Not like the normal old style thing that just says "I'm done, get some work to someone else" but more something like "I'm done, get some work to this specific guy over there" :).


Alex

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
From: Jeremy Fitzhardinge @ 2012-01-16  6:40 UTC
  To: Alexander Graf
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen
In-Reply-To: <3EC1B881-0724-49E3-B892-F40BEB07D15D@suse.de>

On Jan 16, 2012, at 2:57 PM, Alexander Graf wrote:

> 
> On 14.01.2012, at 19:25, Raghavendra K T wrote:
> 
>> The 5-patch series to follow this email extends KVM-hypervisor and Linux guest 
>> running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's implementation.
>> 
>> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
>> another vcpu out of halt state.
>> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
> 
> Is the code for this even upstream? Prerequisite series seem to have been posted by Jeremy, but they didn't appear to have made it in yet.

No, not yet.  The patches are unchanged since I last posted them, and as far as I know there are no objections to them, but I'd like to get some performance numbers just to make sure they don't cause any surprising regressions, especially in the non-virtual case.

> 
> Either way, thinking about this I stumbled over the following passage of his patch:
> 
>> +               unsigned count = SPIN_THRESHOLD;
>> +
>> +               do {
>> +                       if (inc.head == inc.tail)
>> +                               goto out;
>> +                       cpu_relax();
>> +                       inc.head = ACCESS_ONCE(lock->tickets.head);
>> +               } while (--count);
>> +               __ticket_lock_spinning(lock, inc.tail);
> 
> 
> That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
> 
> Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.

I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, it's a win because the VCPU isn't burning CPU any more.

> 
> Imagine we have a contended host. Every vcpu gets at most 10% of a real CPU's runtime. So chances are 1:10 that you're currently running while you need to be. In such a setup, it's probably a good idea to be very pessimistic. Try to fetch the lock for 100 cycles and then immediately make room for all the other VMs that have real work going on!

Are you saying the threshold should be dynamic depending on how loaded the system is?  How can a guest know what the overall system contention is?  How should a guest use that to work out a good spin time?

One possibility is to use the ticket lock queue depth to work out how contended the lock is, and therefore how long it might be worth waiting for.  I was thinking of something along the lines of "threshold = (THRESHOLD >> queue_depth)".  But that's pure hand wave, and someone would actually need to experiment before coming up with something reasonable.

But all of this is good to consider for future work, rather than being essential for the first version.
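
The queue-depth idea can be sketched in a few lines of user-space C. This only illustrates "threshold = (THRESHOLD >> queue_depth)": the names BASE_SPIN_THRESHOLD, queue_depth() and spin_threshold(), the 16-bit ticket width, and the value 2048 are assumptions made for the example, not taken from the patches, and the clamping is just one possible choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base spin budget; the real SPIN_THRESHOLD is not given here. */
#define BASE_SPIN_THRESHOLD 2048u

/* Shifting a 32-bit value by 32 or more is undefined in C, so clamp. */
#define MAX_SHIFT 31u

typedef uint16_t ticket_t;	/* assumed ticket width; wraps naturally */

/* Tickets between the one being served (head) and ours, holder included. */
static inline unsigned queue_depth(ticket_t head, ticket_t my_ticket)
{
	return (ticket_t)(my_ticket - head);
}

/* Halve the spin budget for every waiter ahead of us, never below 1. */
static inline uint32_t spin_threshold(ticket_t head, ticket_t my_ticket)
{
	unsigned depth = queue_depth(head, my_ticket);
	uint32_t t;

	if (depth > MAX_SHIFT)
		depth = MAX_SHIFT;
	t = BASE_SPIN_THRESHOLD >> depth;
	assert(t <= BASE_SPIN_THRESHOLD);
	return t ? t : 1;
}
```

With these assumed numbers, a waiter directly behind the holder (depth 1) would spin 1024 iterations before blocking, and one three places back only 256. Whether halving per waiter is a sensible curve is exactly the part that would need experiments.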

> So what I'm trying to get to is that if we had a hypervisor settable spin threshold, we could adjust it according to the host's load, getting VMs to behave differently on different (guest invisible) circumstances.
> 
> Speaking of which - don't we have spin lock counters in the CPUs now? I thought we could set intercepts that notify us when the guest issues too many repz nops or whatever the typical spinlock identifier was. Can't we reuse that and just interrupt the guest if we see this with a special KVM interrupt that kicks off the internal spin lock waiting code? That way we don't slow down all those bare metal boxes.

Yes, that mechanism exists, but it doesn't solve a very interesting problem.

The most important thing to solve is making sure that when *releasing* a ticketlock, the correct next VCPU gets scheduled promptly.  If you don't, you're just relying on the VCPU scheduler getting around to scheduling the correct VCPU, but if it doesn't it just ends up burning a timeslice of PCPU time while the wrong VCPU spins.

Limiting the spin time with a timeout or the rep/nop interrupt somewhat mitigates this, but it still means you end up spending a lot of time slices spinning the wrong VCPU until it finally schedules the correct one.  And the more contended the machine is, the worse the problem gets.

> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went through the work to split the individual pv ops features up to only enable a few for KVM guests.

The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.

But as I mentioned above, I'd like to see some benchmarks to prove that's the case.
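
A minimal user-space sketch of that fast-path/slow-path split, using C11 atomics rather than the kernel's primitives. Everything here (struct ticket_lock, lock_spinning_hook(), the SPIN_THRESHOLD value) is a hypothetical stand-in for illustration; the real patches differ, and the unlock-side kick of a lock in the "slow" state is left out.

```c
#include <assert.h>
#include <stdatomic.h>

#define SPIN_THRESHOLD 1024u	/* hypothetical value */

struct ticket_lock {
	atomic_uint head;	/* ticket currently being served */
	atomic_uint tail;	/* next ticket to hand out */
};

static unsigned slowpath_calls;	/* counts entries into the slow path */

static void ticket_lock_init(struct ticket_lock *lock)
{
	atomic_init(&lock->head, 0);
	atomic_init(&lock->tail, 0);
}

/*
 * Stand-in for the pv hook. A real guest would block here (for example
 * by halting) until kicked; this stub only records that it was reached.
 */
static void lock_spinning_hook(struct ticket_lock *lock, unsigned ticket)
{
	(void)lock;
	(void)ticket;
	slowpath_calls++;
}

static void ticket_lock_acquire(struct ticket_lock *lock)
{
	unsigned me = atomic_fetch_add(&lock->tail, 1);

	for (;;) {
		unsigned count = SPIN_THRESHOLD;

		/* Fast path: no hook is called while we are within budget. */
		do {
			if (atomic_load(&lock->head) == me)
				return;
		} while (--count);

		/* Slow path: spun too long on a contended lock. */
		lock_spinning_hook(lock, me);
	}
}

static void ticket_lock_release(struct ticket_lock *lock)
{
	/* The lock must be held: head and tail differ while it is. */
	assert(atomic_load(&lock->head) != atomic_load(&lock->tail));
	atomic_fetch_add(&lock->head, 1);
}
```

In the uncontended case the first load of head already matches the ticket, so the hook is never reached; that is the property the benchmarks would need to confirm for the real code.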

	J

> 
>> 
>> Changes in V4:
>> - rebased to 3.2.0 pre.
>> - use APIC ID for kicking the vcpu and use kvm_apic_match_dest for matching. (Avi)
>> - fold vcpu->kicked flag into vcpu->requests (KVM_REQ_PVLOCK_KICK) and related
>> changes for UNHALT path to make pv ticket spinlock migration friendly. (Avi, Marcelo)
>> - Added Documentation for CPUID, Hypercall (KVM_HC_KICK_CPU)
>> and capability (KVM_CAP_PVLOCK_KICK) (Avi)
>> - Remove unneeded kvm_arch_vcpu_ioctl_set_mpstate call. (Marcelo)
>> - cumulative variable type changed (int ==> u32) in add_stat (Konrad)
>> - remove unneeded kvm_guest_init for !CONFIG_KVM_GUEST case
>> 
>> Changes in V3:
>> - rebased to 3.2-rc1
>> - use halt() instead of wait for kick hypercall.
>> - modify kick hypercall to wake up a halted vcpu.
>> - hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of head##.c).
>> - fix the potential race when zero_stat is read.
>> - export debugfs_create_32 and add documentation to API.
>> - use static inline and enum instead of ADDSTAT macro.
>> - add barrier() after setting kick_vcpu.
>> - empty static inline function for kvm_spinlock_init.
>> - combine patches one and two to reduce overhead.
>> - make KVM_DEBUGFS depends on DEBUGFS.
>> - include debugfs header unconditionally.
>> 
>> Changes in V2:
>> - rebased patches to -rc9
>> - synchronization related changes based on Jeremy's changes
>> (Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>) pointed out by
>> Stephan Diestelhorst <stephan.diestelhorst@amd.com>
>> - enabling 32 bit guests
>> - split patches into two more chunks
>> 
>> Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (5): 
>> Add debugfs support to print u32-arrays in debugfs
>> Add a hypercall to KVM hypervisor to support pv-ticketlocks
>> Added configuration support to enable debug information for KVM Guests
>> pv-ticketlocks support for linux guests running on KVM hypervisor
>> Add documentation on Hypercalls and features used for PV spinlock
>> 
>> Test Set up :
>> The BASE patch is pre 3.2.0 + Jeremy's following patches.
>> xadd (https://lkml.org/lkml/2011/10/4/328)
>> x86/ticketlock  (https://lkml.org/lkml/2011/10/12/496).
>> Kernel for host/guest : 3.2.0 + Jeremy's xadd, pv spinlock patches as BASE
>> (Note:locked add change is not taken yet)
>> 
>> Results:
>> The performance gain is mainly because of reduced busy-wait time.
>> From the results we can see that patched kernel performance is similar to
>> BASE when there is no lock contention. But once we start seeing more
>> contention, patched kernel outperforms BASE (non PLE).
>> On the PLE machine we do not see a greater performance improvement because of PLE
>> complementing halt()
>> 
>> 3 guests with 8VCPU, 4GB RAM, 1 used for kernbench
>> (kernbench -f -H -M -o 20) other for cpuhog (shell script while
>> true with an instruction)
>> 
>> scenario A: unpinned
>> 
>> 1x: no hogs
>> 2x: 8hogs in one guest
>> 3x: 8hogs each in two guest
>> 
>> scenario B: unpinned, run kernbench on all the guests no hogs.
>> 
>> Dbench on PLE machine:
>> dbench run on all the guest simultaneously with
>> dbench --warmup=30 -t 120 with NRCLIENTS=(8/16/32).
>> 
>> Result for Non PLE machine :
>> ============================
>> Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 core , 64GB RAM
>> 		 BASE                    BASE+patch            %improvement
>> 		 mean (sd)               mean (sd)
>> Scenario A:
>> case 1x:	 164.233 (16.5506) 	 163.584 (15.4598) 	0.39517
>> case 2x:	 897.654 (543.993) 	 328.63 (103.771) 	63.3901
>> case 3x:	 2855.73 (2201.41) 	 315.029 (111.854) 	88.9685
>> 
>> Dbench:
>> Throughput is in MB/sec
>> NRCLIENTS	 BASE                    BASE+patch            %improvement
>>              	 mean (sd)               mean (sd)
>> 8       	1.774307  (0.061361) 	1.725667  (0.034644) 	-2.74135
>> 16      	1.445967  (0.044805) 	1.463173  (0.094399) 	1.18993
>> 32        	2.136667  (0.105717) 	2.193792  (0.129357) 	2.67356
>> 
>> Result for PLE machine:
>> ======================
>> Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz CPU with 32/64 core, with 8
>>        online cores and 4*64GB RAM
>> 
>> Kernbench:
>> 		 BASE                    BASE+patch            %improvement
>> 		 mean (sd)               mean (sd)
>> Scenario A:	 			
>> case 1x:	 161.263 (56.518) 	 159.635 (40.5621) 	1.00953
>> case 2x:	 190.748 (61.2745) 	 190.606 (54.4766) 	0.0744438
>> case 3x:	 227.378 (100.215) 	 225.442 (92.0809) 	0.851446
>> 
>> Scenario B:
>> 		 446.104 (58.54 )	 433.12733 (54.476)	2.91
>> 
>> Dbench:
>> Throughput is in MB/sec
>> NRCLIENTS	 BASE                    BASE+patch            %improvement
>>              	 mean (sd)               mean (sd)
>> 8       	1.101190  (0.875082) 	1.700395  (0.846809) 	54.4143
>> 16      	1.524312  (0.120354) 	1.477553  (0.058166) 	-3.06755
>> 32        	2.143028  (0.157103) 	2.090307  (0.136778) 	-2.46012
> 
> So on a very contended system we're actually slower? Is this expected?
> 
> 
> Alex
> 

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Raghavendra K T @ 2012-01-16  7:25 UTC
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen
In-Reply-To: <62E14C21-4DF1-4C06-9CBB-FF36E4D49F64@suse.de>

On 01/16/2012 08:42 AM, Alexander Graf wrote:
>
> On 14.01.2012, at 19:26, Raghavendra K T wrote:
>
>> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks.
>>
>> During smp_boot_cpus a paravirtualized KVM guest detects whether the hypervisor
>> has the required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
>> support for pv-ticketlocks is registered via pv_lock_ops.
>>
>> Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.
>>
>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>> Signed-off-by: Suzuki Poulose<suzuki@in.ibm.com>
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>> ---
>> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
>> index 7a94987..cf5327c 100644
>> --- a/arch/x86/include/asm/kvm_para.h
>> +++ b/arch/x86/include/asm/kvm_para.h
>> @@ -195,10 +195,20 @@ void kvm_async_pf_task_wait(u32 token);
>> void kvm_async_pf_task_wake(u32 token);
[...]
>> +}
>> +#endif	/* CONFIG_PARAVIRT_SPINLOCKS */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c7b05fc..4d7a950 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>
> This patch is mixing host and guest code. Please split those up.
>
>

Agree. The host code should have gone to patch 2.

> Alex
>
>> @@ -5754,8 +5754,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>

* Re: [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Raghavendra K T @ 2012-01-16  8:43 UTC
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen
In-Reply-To: <5861AA49-E70B-4812-BBE4-A8507B3FCF80@suse.de>

On 01/16/2012 08:54 AM, Alexander Graf wrote:
>
> On 14.01.2012, at 19:25, Raghavendra K T wrote:
>
>> Add a hypercall to KVM hypervisor to support pv-ticketlocks
>>
>> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
>>
>> The presence of these hypercalls is indicated to guest via
>> KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.
>>
>> Qemu needs a corresponding patch to pass up the presence of this feature to
>> guest via cpuid. Patch to qemu will be sent separately.
>>
>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>> Signed-off-by: Suzuki Poulose<suzuki@in.ibm.com>
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>> ---
>> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
>> index 734c376..7a94987 100644
>> --- a/arch/x86/include/asm/kvm_para.h
>> +++ b/arch/x86/include/asm/kvm_para.h
>> @@ -16,12 +16,14 @@
>> #define KVM_FEATURE_CLOCKSOURCE		0
>> #define KVM_FEATURE_NOP_IO_DELAY	1
>> #define KVM_FEATURE_MMU_OP		2
>> +
>> /* This indicates that the new set of kvmclock msrs
>>   * are available. The use of 0x11 and 0x12 is deprecated
>>   */
>> #define KVM_FEATURE_CLOCKSOURCE2        3
>> #define KVM_FEATURE_ASYNC_PF		4
>> #define KVM_FEATURE_STEAL_TIME		5
>> +#define KVM_FEATURE_PVLOCK_KICK		6
>>
>> /* The last 8 bits are used to indicate how to interpret the flags field
>>   * in pvclock structure. If no bits are set, all flags are ignored.
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 4c938da..c7b05fc 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2099,6 +2099,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>> 	case KVM_CAP_XSAVE:
>> 	case KVM_CAP_ASYNC_PF:
>> 	case KVM_CAP_GET_TSC_KHZ:
>> +	case KVM_CAP_PVLOCK_KICK:
>> 		r = 1;
>> 		break;
>> 	case KVM_CAP_COALESCED_MMIO:
>> @@ -2576,7 +2577,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>> 			     (1<<  KVM_FEATURE_NOP_IO_DELAY) |
>> 			     (1<<  KVM_FEATURE_CLOCKSOURCE2) |
>> 			     (1<<  KVM_FEATURE_ASYNC_PF) |
>> -			     (1<<  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
>> +			     (1<<  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
>> +			     (1<<  KVM_FEATURE_PVLOCK_KICK);
>>
>> 		if (sched_info_on())
>> 			entry->eax |= (1<<  KVM_FEATURE_STEAL_TIME);
>> @@ -5304,6 +5306,29 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>> 	return 1;
>> }
>>
>> +/*
>> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
>> + *
>> + * @apicid - apicid of vcpu to be kicked.
>> + */
>> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
>> +{
>> +	struct kvm_vcpu *vcpu = NULL;
>> +	int i;
>> +
>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>> +		if (!kvm_apic_present(vcpu))
>> +			continue;
>> +
>> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
>> +			break;
>> +	}
>> +	if (vcpu) {
>> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
>> +		kvm_vcpu_kick(vcpu);
>> +	}
>> +}
>> +
>> int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> {
>> 	unsigned long nr, a0, a1, a2, a3, ret;
>> @@ -5340,6 +5365,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>> 	case KVM_HC_MMU_OP:
>> 		r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2),&ret);
>> 		break;
>> +	case KVM_HC_KICK_CPU:
>> +		kvm_pv_kick_cpu_op(vcpu->kvm, a0);
>> +		ret = 0;
>> +		break;
>> 	default:
>> 		ret = -KVM_ENOSYS;
>> 		break;
>> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
>> index 68e67e5..63fb6b0 100644
>> --- a/include/linux/kvm.h
>> +++ b/include/linux/kvm.h
>> @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
>> #define KVM_CAP_PPC_PAPR 68
>> #define KVM_CAP_S390_GMAP 71
>> #define KVM_CAP_TSC_DEADLINE_TIMER 72
>> +#define KVM_CAP_PVLOCK_KICK 73
>>
>> #ifdef KVM_CAP_IRQ_ROUTING
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index d526231..3b1ae7b 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -50,6 +50,7 @@
>> #define KVM_REQ_APF_HALT          12
>> #define KVM_REQ_STEAL_UPDATE      13
>> #define KVM_REQ_NMI               14
>> +#define KVM_REQ_PVLOCK_KICK       15
>
> Everything I see in this patch is pvlock agnostic. It's only a vcpu kick hypercall. So it's probably a good idea to also name it accordingly :).
>
>
> Alex
>
>

It was indeed KICK_VCPU in V4. But since we are currently dealing only
with pv locks, it was renamed. If we start using the code for the
flush_tlb_others_ipi() optimization etc., it is a good idea to rename it
accordingly, or even to go back to KICK_VCPU as used earlier.

  - Raghu

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Raghavendra K T @ 2012-01-16  8:44 UTC
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen
In-Reply-To: <AD31813D-E4D5-43F3-B06A-9EB1B6FC9381@suse.de>

On 01/16/2012 08:53 AM, Alexander Graf wrote:
>
> On 14.01.2012, at 19:27, Raghavendra K T wrote:
>
>> Add Documentation on CPUID, KVM_CAP_PVLOCK_KICK, and Hypercalls supported.
>>
>> KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in
>> paravirtual spinlock enabled guest.
>>
>> KVM_FEATURE_PVLOCK_KICK enables guest to check whether pv spinlock can
>> be enabled in guest. support in host is queried via
>> ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK)
>>
>> A minimal Documentation and template is added for hypercalls.
>>
>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>> ---
[...]
>> diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
>> new file mode 100644
>> index 0000000..7872da5
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/hypercalls.txt
>> @@ -0,0 +1,54 @@
>> +KVM Hypercalls Documentation
>> +===========================

>> +2. KVM_HC_MMU_OP
>> +------------------------
>> +value: 2
>> +Architecture: x86
>> +Purpose: Support MMU operations such as writing to PTE,
>> +flushing TLB, release PT.
>
> This one is deprecated, no? Should probably be mentioned here.

OK, then adding a state (deprecated/obsolete/in use (active)) may be a
good idea.

>
>> +
>> +3. KVM_HC_FEATURES
>> +------------------------
>> +value: 3
>> +Architecture: PPC
>> +Purpose:
>
> Expose hypercall availability to the guest. On x86 you use cpuid to enumerate which hypercalls are available. The natural fit on ppc would be device tree based lookup (which is also what EPAPR dictates), but we also have a second enumeration mechanism that's KVM specific - which is this hypercall.
>

Thanks, will add this. I hope you are OK if I add Signed-off-by: you.

>> +
>> +4. KVM_HC_PPC_MAP_MAGIC_PAGE
>> +------------------------
>> +value: 4
>> +Architecture: PPC
>> +Purpose: To enable communication between the hypervisor and guest there is a
>> +new
>
> It's not new anymore :)
>
>> shared page that contains parts of supervisor visible register state.
>> +The guest can map this shared page using this hypercall.
>
> ... to access its supervisor register through memory.
>

Will update accordingly.

- Raghu

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Avi Kivity @ 2012-01-16  8:47 UTC
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Srivatsa Vaddagiri, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Virtualization, Greg Kroah-Hartman,
	LKML
In-Reply-To: <5E0038B3-3830-4668-B4BB-781976710ED1@suse.de>

On 01/16/2012 06:00 AM, Alexander Graf wrote:
> On 16.01.2012, at 04:51, Srivatsa Vaddagiri wrote:
>
> > * Alexander Graf <agraf@suse.de> [2012-01-16 04:23:24]:
> > 
> >>> +5. KVM_HC_KICK_CPU
> >>> +------------------------
> >>> +value: 5
> >>> +Architecture: x86
> >>> +Purpose: Hypercall used to wakeup a vcpu from HLT state
> >>> +
> >>> +Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
> >>> +kernel mode for an event to occur (ex: a spinlock to become available)
> >>> +can execute HLT instruction once it has busy-waited for more than a
> >>> +threshold time-interval. Execution of HLT instruction would cause
> >>> +the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until occurence
> >>> +of an appropriate event. Another vcpu of the same guest can wakeup the sleeping
> >>> +vcpu by issuing KVM_HC_KICK_CPU hypercall, specifying APIC ID of the vcpu to be
> >>> +wokenup.
> >> 
> >> The description is way too specific. The hypercall basically gives the guest the ability to yield() its current vcpu to another chosen vcpu.
> > 
> > Hmm ..the hypercall does not allow a vcpu to yield. It just allows some
> > target vcpu to be prodded/wokenup, after which vcpu continues execution.
> > 
> > Note that semantics of this hypercall is different from the hypercall on which
> > PPC pv-spinlock (__spin_yield()) is currently dependent. This is mainly because 
> > of ticketlocks on x86 (which does not allow us to easily store owning cpu
> > details in lock word itself).
>
> Yes, sorry for not being more exact in my wording. It is a directed yield(). Not like the normal old style thing that just says "I'm done, get some work to someone else" but more something like "I'm done, get some work to this specific guy over there" :).
>

It's not a yield.  It unhalts a vcpu.  Kind of like an IPI, but without
actually issuing an interrupt on the target, and disregarding the
interrupt flag.  It says nothing about the source.
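
As a toy model of those semantics (not KVM code): the kick changes only the target's state and leaves the caller untouched, which is what separates it from a yield. The struct toy_vcpu type and toy_kick() are hypothetical names invented for this sketch, and looking the target up by APIC ID in a flat array is an assumption made to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vcpu: just an APIC ID and whether it is sitting in HLT. */
struct toy_vcpu {
	unsigned apic_id;
	bool halted;
};

/*
 * Unhalt the vcpu with the given APIC ID. No interrupt is modelled and
 * the caller's own state is not an input at all. Returns the vcpu that
 * was kicked, or NULL if no vcpu has that APIC ID.
 */
static struct toy_vcpu *toy_kick(struct toy_vcpu *vcpus, size_t n,
				 unsigned apic_id)
{
	assert(vcpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (vcpus[i].apic_id == apic_id) {
			vcpus[i].halted = false;
			return &vcpus[i];
		}
	}
	return NULL;	/* explicit "not found", nothing is woken */
}
```

Returning NULL on a miss keeps an unknown APIC ID from waking an unrelated vcpu.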

-- 
error compiling committee.c: too many arguments to function

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
From: Avi Kivity @ 2012-01-16  8:55 UTC
  To: Jeremy Fitzhardinge
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen, Suzuki Poulose
In-Reply-To: <03D10A71-19F8-4278-B7A4-3F618ED6ECF0@goop.org>

On 01/16/2012 08:40 AM, Jeremy Fitzhardinge wrote:
> > 
> > That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
> > 
> > Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.
>
> I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, it's a win because the VCPU isn't burning CPU any more.

The wakeup path is slower though.  The previous lock holder has to
hypercall, and the new lock holder has to be scheduled, and transition
from halted state to running (a vmentry).  So it's only a clear win if
we can do something with the cpu other than go into the idle loop.

> > 
> > Imagine we have a contended host. Every vcpu gets at most 10% of a real CPU's runtime. So chances are 1:10 that you're currently running while you need to be. In such a setup, it's probably a good idea to be very pessimistic. Try to fetch the lock for 100 cycles and then immediately make room for all the other VMs that have real work going on!
>
> Are you saying the threshold should be dynamic depending on how loaded the system is?  How can a guest know what the overall system contention is?  How should a guest use that to work out a good spin time?
>
> One possibility is to use the ticket lock queue depth to work out how contended the lock is, and therefore how long it might be worth waiting for.  I was thinking of something along the lines of "threshold = (THRESHOLD >> queue_depth)".  But that's pure hand wave, and someone would actually need to experiment before coming up with something reasonable.
>
> But all of this is good to consider for future work, rather than being essential for the first version.

Agree.

> > So what I'm trying to get to is that if we had a hypervisor settable spin threshold, we could adjust it according to the host's load, getting VMs to behave differently on different (guest invisible) circumstances.
> > 
> > Speaking of which - don't we have spin lock counters in the CPUs now? I thought we could set intercepts that notify us when the guest issues too many repz nops or whatever the typical spinlock identifier was. Can't we reuse that and just interrupt the guest if we see this with a special KVM interrupt that kicks off the internal spin lock waiting code? That way we don't slow down all those bare metal boxes.
>
> Yes, that mechanism exists, but it doesn't solve a very interesting problem.
>
> The most important thing to solve is making sure that when *releasing* a ticketlock, the correct next VCPU gets scheduled promptly.  If you don't, you're just relying on the VCPU scheduler getting around to scheduling the correct VCPU, but if it doesn't it just ends up burning a timeslice of PCPU time while the wrong VCPU spins.

kvm does a directed yield to an unscheduled vcpu, selected in a
round-robin fashion.  So if your overload factor is N (N runnable vcpus for
every physical cpu), and your spin counter waits for S cycles before
exiting, you will burn N*S cycles (actually more since there is overhead
involved, but let's fold it into S).
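
To put numbers on the N*S estimate, here is a small helper. The function name cycles_burned() and the figures in the note below it are made up for illustration; only the product itself comes from the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cycles spent spinning before the right vcpu gets to run, using the
 * N*S estimate: N runnable vcpus per physical cpu, each spinning for S
 * cycles (exit overhead folded into S) before it exits.
 */
static inline uint64_t cycles_burned(uint64_t overload_factor,
				     uint64_t spin_cycles)
{
	/* Guard against overflow of the 64-bit product. */
	assert(spin_cycles == 0 ||
	       overload_factor <= UINT64_MAX / spin_cycles);
	return overload_factor * spin_cycles;
}
```

For instance, with a hypothetical overload factor of 4 and a spin window of 4096 cycles, cycles_burned(4, 4096) gives 16384 cycles, and the cost grows linearly with how overcommitted the host is.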

> Limiting the spin time with a timeout or the rep/nop interrupt somewhat mitigates this, but it still means you end up spending a lot of time slices spinning the wrong VCPU until it finally schedules the correct one.  And the more contended the machine is, the worse the problem gets.

Right.

>
> > Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went through the work to split the individual pv ops features up to only enable a few for KVM guests.
>
> The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.
>
> But as I mentioned above, I'd like to see some benchmarks to prove that's the case.
>

-- 
error compiling committee.c: too many arguments to function

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Avi Kivity @ 2012-01-16  9:00 UTC
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen, Suzuki
In-Reply-To: <20120114182710.8604.22277.sendpatchset@oc5400248562.ibm.com>

On 01/14/2012 08:27 PM, Raghavendra K T wrote:
> +
> +5. KVM_HC_KICK_CPU
> +------------------------
> +value: 5
> +Architecture: x86
> +Purpose: Hypercall used to wakeup a vcpu from HLT state
> +
> +Usage example : A vcpu of a paravirtualized guest that is busywaiting in guest
> +kernel mode for an event to occur (ex: a spinlock to become available)
> +can execute HLT instruction once it has busy-waited for more than a
> +threshold time-interval. Execution of HLT instruction would cause
> +the hypervisor to put the vcpu to sleep (unless yield_on_hlt=0) until occurence
> +of an appropriate event. Another vcpu of the same guest can wakeup the sleeping
> +vcpu by issuing KVM_HC_KICK_CPU hypercall, specifying APIC ID of the vcpu to be
> +wokenup.

Wait, what happens with yield_on_hlt=0?  Will the hypercall work as
advertised?

> +
> +TODO:
> +1. more information on input and output needed?
> +2. Add more detail to purpose of hypercalls.
>


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Avi Kivity @ 2012-01-16  9:03 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki
In-Reply-To: <20120114182553.8604.41642.sendpatchset@oc5400248562.ibm.com>

On 01/14/2012 08:25 PM, Raghavendra K T wrote:
> Add a hypercall to KVM hypervisor to support pv-ticketlocks 
>
> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
>     
> The presence of these hypercalls is indicated to guest via
> KVM_FEATURE_PVLOCK_KICK/KVM_CAP_PVLOCK_KICK.
>
> Qemu needs a corresponding patch to pass up the presence of this feature to 
> guest via cpuid. Patch to qemu will be sent separately.

No need to discuss qemu in a kernel patch.

>  
> +/*
> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
> + *
> + * @apicid - apicid of vcpu to be kicked.
> + */
> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> +{
> +	struct kvm_vcpu *vcpu = NULL;
> +	int i;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_apic_present(vcpu))
> +			continue;
> +
> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
> +			break;
> +	}
> +	if (vcpu) {
> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}
> +

The code that handles KVM_REQ_PVLOCK_KICK needs to be in this patch.
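
For readers following the hunk, here is a minimal user-space model of the lookup-then-kick step. It is only a sketch: struct model_vcpu, find_vcpu() and kick_cpu() are hypothetical stand-ins, not KVM APIs, and the kick_pending flag merely models what kvm_make_request() would record. The lookup returns a separate pointer so that "no match" is unambiguous, regardless of where the loop cursor ends up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vcpu state the hunk touches. */
struct model_vcpu {
	int apicid;
	int apic_present;
	int kick_pending;	/* models a pending kick request */
};

/* Return the vcpu whose APIC ID matches, or NULL if none does. */
static struct model_vcpu *find_vcpu(struct model_vcpu *vcpus, size_t n,
				    int apicid)
{
	size_t i;

	assert(vcpus != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!vcpus[i].apic_present)
			continue;
		if (vcpus[i].apicid == apicid)
			return &vcpus[i];
	}
	return NULL;
}

/* Mark the target as kicked; returns 1 if a target was found, else 0. */
static int kick_cpu(struct model_vcpu *vcpus, size_t n, int apicid)
{
	struct model_vcpu *target = find_vcpu(vcpus, n, apicid);

	if (!target)
		return 0;
	target->kick_pending = 1;
	return 1;
}
```

In this model, a kick aimed at an unknown APIC ID leaves every vcpu untouched.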


-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Avi Kivity @ 2012-01-16  9:05 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki
In-Reply-To: <20120114182645.8604.68884.sendpatchset@oc5400248562.ibm.com>

On 01/14/2012 08:26 PM, Raghavendra K T wrote:
> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks. 
>
> During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor has
> required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
>  support for pv-ticketlocks is registered via pv_lock_ops.
>
> Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.
> +
> +	debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
> +
> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW]);
> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
> +
> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW]);
> +	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
> +		   &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
> +
> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
> +			   &spinlock_stats.time_blocked);
> +
> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
> +
>

Please drop all of these and replace with tracepoints in the appropriate
spots.  Everything else (including the histogram) can be reconstructed
from the tracepoints in userspace.
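
As a rough illustration of the userspace side, assume a hypothetical tracepoint that reports each blocked interval in nanoseconds (no such tracepoint exists in this series). The histogram and the total blocked time could then be rebuilt along these lines. HISTO_BUCKETS mirrors the name used in the patch; its value here and the power-of-two bucketing rule are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HISTO_BUCKETS 30	/* assumed value; the quoted hunk does not show it */

/* Bucket index is floor(log2(delta_ns)), clamped to HISTO_BUCKETS. */
static unsigned bucket_for(uint64_t delta_ns)
{
	unsigned idx = 0;

	while (delta_ns > 1 && idx < HISTO_BUCKETS) {
		delta_ns >>= 1;
		idx++;
	}
	return idx;
}

/* Fill histo[] from the samples and return the total blocked time. */
static uint64_t build_histogram(const uint64_t *samples, size_t n,
				uint32_t histo[HISTO_BUCKETS + 1])
{
	uint64_t total = 0;
	size_t i;

	assert(samples != NULL || n == 0);
	for (i = 0; i <= HISTO_BUCKETS; i++)
		histo[i] = 0;
	for (i = 0; i < n; i++) {
		histo[bucket_for(samples[i])]++;
		total += samples[i];
	}
	return total;
}
```

The per-event counters (taken_slow, released_slow and so on) would simply be counts of the corresponding events in the trace.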

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Srivatsa Vaddagiri @ 2012-01-16  9:40 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman, LKML,
	Dave Hansen, Suzuki
In-Reply-To: <4F13E739.7040300@redhat.com>

* Avi Kivity <avi@redhat.com> [2012-01-16 11:00:41]:

> Wait, what happens with yield_on_hlt=0?  Will the hypercall work as
> advertised?

Hmm ..I don't think it will work when yield_on_hlt=0.

One option is to make the kick hypercall available only when
yield_on_hlt=1?

- vatsa

^ permalink raw reply

* Re: [PATCH RFC V4 2/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Raghavendra K T @ 2012-01-16  9:55 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki
In-Reply-To: <4F13E7D3.1060004@redhat.com>

On 01/16/2012 02:33 PM, Avi Kivity wrote:
>> +/*
>> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
>> + *
>> + * @apicid - apicid of vcpu to be kicked.
>> + */
>> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
>> +{
>> +	struct kvm_vcpu *vcpu = NULL;
>> +	int i;
>> +
>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>> +		if (!kvm_apic_present(vcpu))
>> +			continue;
>> +
>> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
>> +			break;
>> +	}
>> +	if (vcpu) {
>> +		kvm_make_request(KVM_REQ_PVLOCK_KICK, vcpu);
>> +		kvm_vcpu_kick(vcpu);
>> +	}
>> +}
>> +
>
> The code that handles KVM_REQ_PVLOCK_KICK needs to be in this patch.
>
>

Yes, agreed. As Alex also pointed out, the related hunk from patch 4 should
be added here.

^ permalink raw reply

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Avi Kivity @ 2012-01-16 10:14 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman, LKML,
	Dave Hansen, Suzuki
In-Reply-To: <20120116094020.GA6019@linux.vnet.ibm.com>

On 01/16/2012 11:40 AM, Srivatsa Vaddagiri wrote:
> * Avi Kivity <avi@redhat.com> [2012-01-16 11:00:41]:
>
> > Wait, what happens with yield_on_hlt=0?  Will the hypercall work as
> > advertised?
>
> Hmm ..I don't think it will work when yield_on_hlt=0.
>
> One option is to make the kick hypercall available only when
> yield_on_hlt=1?

It's not a good idea to tie various options together.  Features should
be orthogonal.

Can't we make it work?  Just have different handling for
KVM_REQ_PVLOCK_KICK (let's rename it, and the hypercall, PV_UNHALT,
since we can use it for non-locks too).

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
From: Alexander Graf @ 2012-01-16 10:24 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Raghavendra K T, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML, Dave Hansen
In-Reply-To: <03D10A71-19F8-4278-B7A4-3F618ED6ECF0@goop.org>


On 16.01.2012, at 07:40, Jeremy Fitzhardinge wrote:

> On Jan 16, 2012, at 2:57 PM, Alexander Graf wrote:
> 
>> 
>> On 14.01.2012, at 19:25, Raghavendra K T wrote:
>> 
>>> The 5-patch series to follow this email extends KVM-hypervisor and Linux guest 
>>> running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's implementation.
>>> 
>>> One hypercall is introduced in KVM hypervisor,that allows a vcpu to kick
>>> another vcpu out of halt state.
>>> The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
>> 
>> Is the code for this even upstream? Prerequisite series seem to have been posted by Jeremy, but they didn't appear to have made it in yet.
> 
> No, not yet.  The patches are unchanged since I last posted them, and as far as I know there are no objections to them, but I'd like to get some performance numbers just to make sure they don't cause any surprising regressions, especially in the non-virtual case.

Yup, that's a very good idea :)

> 
>> 
>> Either way, thinking about this I stumbled over the following passage of his patch:
>> 
>>> +               unsigned count = SPIN_THRESHOLD;
>>> +
>>> +               do {
>>> +                       if (inc.head == inc.tail)
>>> +                               goto out;
>>> +                       cpu_relax();
>>> +                       inc.head = ACCESS_ONCE(lock->tickets.head);
>>> +               } while (--count);
>>> +               __ticket_lock_spinning(lock, inc.tail);
>> 
>> 
>> That means we're spinning for n cycles, then notify the spinlock holder that we'd like to get kicked and go sleeping. While I'm pretty sure that it improves the situation, it doesn't solve all of the issues we have.
>> 
>> Imagine we have an idle host. All vcpus can freely run and everyone can fetch the lock as fast as on real machines. We don't need to / want to go to sleep here. Locks that take too long are bugs that need to be solved on real hw just as well, so all we do is possibly incur overhead.
> 
> I'm not quite sure what your concern is.  The lock is under contention, so there's nothing to do except spin; all this patch adds is a variable decrement/test to the spin loop, but that's not going to waste any more CPU than the non-counting case.  And once it falls into the blocking path, its a win because the VCPU isn't burning CPU any more.
> 
>> 
>> Imagine we have a contended host. Every vcpu gets at most 10% of a real CPU's runtime. So chances are 1:10 that you're currently running while you need to be. In such a setup, it's probably a good idea to be very pessimistic. Try to fetch the lock for 100 cycles and then immediately make room for all the other VMs that have real work going on!
> 
> Are you saying the threshold should be dynamic depending on how loaded the system is?  How can a guest know what the overall system contention is?  How should a guest use that to work out a good spin time?

I'm saying what I'm saying in the next paragraph :). The guest doesn't know, but the host does. So if we had shared memory between guest and host, the host could put its threshold limit in there, which on an idle system could be -1 and on a contended system could be 1.

> One possibility is to use the ticket lock queue depth to work out how contended the lock is, and therefore how long it might be worth waiting for.  I was thinking of something along the lines of "threshold = (THRESHOLD >> queue_depth)".  But that's pure hand wave, and someone would actually need to experiment before coming up with something reasonable.
> 
> But all of this is good to consider for future work, rather than being essential for the first version.

Well, yes, of course! It's by no means an objection to what's there today. I'm just trying to think of ways to make it even better :)
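
To make the two threshold ideas above concrete, here is a rough user-space sketch that combines them: a host-provided limit (-1 meaning "idle host, keep spinning") scaled down by the ticket queue depth, as in "threshold = (THRESHOLD >> queue_depth)". The names spin_budget(), queue_depth() and host_limit are hypothetical, the SPIN_THRESHOLD value is assumed, and nothing here reflects code in the series.

```c
#include <assert.h>
#include <stdint.h>

#define SPIN_THRESHOLD (1u << 11)	/* assumed default, for illustration only */

/* Waiters ahead of us: distance from the current head to our ticket. */
static unsigned queue_depth(uint16_t head, uint16_t my_ticket)
{
	/* The cast keeps the subtraction correct across 16-bit wraparound. */
	return (uint16_t)(my_ticket - head);
}

/*
 * Number of spin iterations to try before blocking.
 * host_limit == -1 means the host sees no contention: spin without limit.
 * Otherwise halve the limit once per waiter ahead of us, never below 1.
 */
static uint32_t spin_budget(int32_t host_limit, unsigned depth)
{
	uint32_t budget;

	assert(host_limit >= -1);
	if (host_limit == -1)
		return UINT32_MAX;
	if (depth > 31)
		depth = 31;	/* avoid an undefined over-wide shift */
	budget = (uint32_t)host_limit >> depth;
	return budget ? budget : 1;
}
```

With the assumed default, a vcpu three tickets back would spin 256 times before blocking, while any limit of 1 from a contended host blocks almost immediately.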

> 
>> So what I'm trying to get to is that if we had a hypervisor settable spin threshold, we could adjust it according to the host's load, getting VMs to behave differently on different (guest invisible) circumstances.
>> 
>> Speaking of which - don't we have spin lock counters in the CPUs now? I thought we could set intercepts that notify us when the guest issues too many repz nops or whatever the typical spinlock identifier was. Can't we reuse that and just interrupt the guest if we see this with a special KVM interrupt that kicks off the internal spin lock waiting code? That way we don't slow down all those bare metal boxes.
> 
> Yes, that mechanism exists, but it doesn't solve a very interesting problem.
> 
> The most important thing to solve is making sure that when *releasing* a ticketlock, the correct next VCPU gets scheduled promptly.  If you don't, you're just relying on the VCPU scheduler getting around to scheduling the correct VCPU, but if it doesn't it just ends up burning a timeslice of PCPU time while the wrong VCPU spins.
> 
> Limiting the spin time with a timeout or the rep/nop interrupt somewhat mitigates this, but it still means you end up spending a lot of time slices spinning the wrong VCPU until it finally schedules the correct one.  And the more contended the machine is, the worse the problem gets.

This is true in case you're spinning. If, on overcommit, spinlocks would just yield() instead of spinning, we wouldn't have any vcpu running that's just waiting for a late ticket.

We still have an issue finding the point in time when a vcpu could run again, which is what this whole series is about. My point above was that instead of doing a count loop, we could just do the normal spin dance and set the threshold to when we enable the magic to have another spin lock notify us in the CPU. That way we

  * don't change the uncontended case
  * can set the threshold on the host, which knows how contended the system is

And since we control what spin locks look like, we can for example always keep the pointer to it in a specific register so that we can handle pv_lock_ops.lock_spinning() inside there and fetch all the information we need from our pt_regs.

> 
>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal? Last time I checked, enabling all the PV ops did incur significant slowdown which is why I went though the work to split the individual pv ops features up to only enable a few for KVM guests.
> 
> The whole point of the pv-ticketlock work is to keep the pvops hooks out of the locking fast path, so that the calls are only made on the slow path - that is, when spinning too long on a contended lock, and when releasing a lock that's in a "slow" state.  In the fast path case of no contention, there are no pvops, and the executed code path is almost identical to native.

You're still changing a tight loop that does nothing (CPU detects it and saves power) into something that performs calculations.

> But as I mentioned above, I'd like to see some benchmarks to prove that's the case.

Yes, that would be very good to have :)


Alex

^ permalink raw reply

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Alexander Graf @ 2012-01-16 10:26 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, linux-doc, Peter Zijlstra, Jan Kiszka,
	Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Srivatsa Vaddagiri, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	Greg Kroah-Hartman, LKML, Dave Hansen
In-Reply-To: <4F13E36B.9020408@linux.vnet.ibm.com>


On 16.01.2012, at 09:44, Raghavendra K T wrote:

> On 01/16/2012 08:53 AM, Alexander Graf wrote:
>> 
>> On 14.01.2012, at 19:27, Raghavendra K T wrote:
>> 
>>> Add Documentation on CPUID, KVM_CAP_PVLOCK_KICK, and Hypercalls supported.
>>> 
>>> KVM_HC_KICK_CPU  hypercall added to wakeup halted vcpu in
>>> paravirtual spinlock enabled guest.
>>> 
>>> KVM_FEATURE_PVLOCK_KICK enables guest to check whether pv spinlock can
>>> be enabled in guest. support in host is queried via
>>> ioctl(KVM_CHECK_EXTENSION, KVM_CAP_PVLOCK_KICK)
>>> 
>>> A minimal Documentation and template is added for hypercalls.
>>> 
>>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>>> ---
> [...]
>>> diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt
>>> new file mode 100644
>>> index 0000000..7872da5
>>> --- /dev/null
>>> +++ b/Documentation/virtual/kvm/hypercalls.txt
>>> @@ -0,0 +1,54 @@
>>> +KVM Hypercalls Documentation
>>> +===========================
> 
>>> +2. KVM_HC_MMU_OP
>>> +------------------------
>>> +value: 2
>>> +Architecture: x86
>>> +Purpose: Support MMU operations such as writing to PTE,
>>> +flushing TLB, release PT.
>> 
>> This one is deprecated, no? Should probably be mentioned here.
> 
> Ok, then may be adding state = deprecated/obsolete/in use (active) may
> be good idea.
> 
>> 
>>> +
>>> +3. KVM_HC_FEATURES
>>> +------------------------
>>> +value: 3
>>> +Architecture: PPC
>>> +Purpose:
>> 
>> Expose hypercall availability to the guest. On x86 you use cpuid to enumerate which hypercalls are available. The natural fit on ppc would be device tree based lookup (which is also what EPAPR dictates), but we also have a second enumeration mechanism that's KVM specific - which is this hypercall.
>> 
> 
> Thanks, will add this. I hope you are OK if I add Signed-off-by: you.

I don't think you need a signed-off-by from me for this very simple documentation addition :). You should probably also reword it. I didn't quite write it as a paragraph that should go into the file.


Alex

^ permalink raw reply

* Re: [PATCH] vhost-net: add module alias (v2.1)
From: Alan Cox @ 2012-01-16 12:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, netdev, kay.sievers, David Miller, shemminger,
	virtualization, device
In-Reply-To: <20120115124236.GA31012@redhat.com>

> > ACKs, NACKs?  What is happening here?
> 
> I would like an Ack from Alan Cox who switched vhost-net
> to a dynamic minor in the first place, in commit
> 79907d89c397b8bc2e05b347ec94e928ea919d33.

Sorry device@lanana.org isn't yet back from the kernel hack incident.

I don't read netdev so someone needs to summarise the issue and send me
a copy of the patch to look at.

Alan

^ permalink raw reply

* Re: [PATCH] vhost-net: add module alias
From: Avi Kivity @ 2012-01-16 12:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, virtualization, kvm, Michael S. Tsirkin
In-Reply-To: <20120110205400.6c1cb306@nehalam.linuxnetplumber.net>

On 01/11/2012 06:54 AM, Stephen Hemminger wrote:
> By adding the a module alias, programs (or users) won't have to explicitly
> call modprobe. Vhost-net will always be available if built into the kernel.
> It does require assigning a permanent minor number for depmod to work.
> Choose one next to TUN since this driver is related to it.

Statically allocated numbers have to go through lanana, no?

This increases the security exposure and the kernel footprint for hosts
that don't want vhost-net.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
From: Raghavendra K T @ 2012-01-16 13:43 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Avi Kivity, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen
In-Reply-To: <3EC1B881-0724-49E3-B892-F40BEB07D15D@suse.de>

On 01/16/2012 09:27 AM, Alexander Graf wrote:
>
[...]
>> Result for PLE machine:
>> ======================
>> Machine : IBM xSeries with Intel(R) Xeon(R)  X7560 2.27GHz CPU with 32/64 core, with 8
>>          online cores and 4*64GB RAM
>>
>> Kernbench:
>> 		 BASE                    BASE+patch            %improvement
>> 		 mean (sd)               mean (sd)
>> Scenario A:	 			
>> case 1x:	 161.263 (56.518) 	 159.635 (40.5621) 	1.00953
>> case 2x:	 190.748 (61.2745) 	 190.606 (54.4766) 	0.0744438
>> case 3x:	 227.378 (100.215) 	 225.442 (92.0809) 	0.851446
>>
>> Scenario B:
>> 		 446.104 (58.54 )	 433.12733 (54.476)	2.91
>>
>> Dbench:
>> Throughput is in MB/sec
>> NRCLIENTS	 BASE                    BASE+patch            %improvement
>>                	 mean (sd)               mean (sd)
>> 8       	1.101190  (0.875082) 	1.700395  (0.846809) 	54.4143
>> 16      	1.524312  (0.120354) 	1.477553  (0.058166) 	-3.06755
>> 32        	2.143028  (0.157103) 	2.090307  (0.136778) 	-2.46012
>
> So on a very contended system we're actually slower? Is this expected?
>
>

I think the result is interesting because it's a PLE machine. I have to
experiment more with the parameters: SPIN_THRESHOLD, and maybe also ple_gap
and ple_window.
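
For reference, the %improvement column appears to be computed relative to BASE, with the sign chosen so that positive means better: lower elapsed time for kernbench, higher throughput for dbench. That is inferred from the posted numbers rather than stated anywhere, so treat the sketch below as an assumption; the function names are made up for illustration.

```c
#include <assert.h>

/* Elapsed-time metric (kernbench): lower is better. */
static double improvement_time(double base, double patched)
{
	assert(base != 0.0);
	return (base - patched) / base * 100.0;
}

/* Throughput metric (dbench): higher is better. */
static double improvement_throughput(double base, double patched)
{
	assert(base != 0.0);
	return (patched - base) / base * 100.0;
}
```

For example, improvement_time(161.263, 159.635) gives roughly 1.0095, and improvement_throughput(1.524312, 1.477553) gives roughly -3.0675, in line with the table.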

> Alex
>
>

^ permalink raw reply

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
From: Avi Kivity @ 2012-01-16 13:49 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki
In-Reply-To: <4F14296B.4070003@linux.vnet.ibm.com>

On 01/16/2012 03:43 PM, Raghavendra K T wrote:
>>> Dbench:
>>> Throughput is in MB/sec
>>> NRCLIENTS     BASE                    BASE+patch               %improvement
>>>               mean (sd)               mean (sd)
>>> 8             1.101190  (0.875082)    1.700395  (0.846809)     54.4143
>>> 16            1.524312  (0.120354)    1.477553  (0.058166)     -3.06755
>>> 32            2.143028  (0.157103)    2.090307  (0.136778)     -2.46012
>>
>> So on a very contended system we're actually slower? Is this expected?
>>
>>
>
>
> I think, the result is interesting because its PLE machine. I have to
> experiment more with parameters, SPIN_THRESHOLD, and also may be
> ple_gap and ple_window.

Perhaps the PLE stuff fights with the PV stuff?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH RFC V4 5/5] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
From: Srivatsa Vaddagiri @ 2012-01-16 14:11 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Rik van Riel, Konrad Rzeszutek Wilk, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, Greg Kroah-Hartman, LKML,
	Dave Hansen, Suzuki
In-Reply-To: <4F13F883.5090002@redhat.com>

* Avi Kivity <avi@redhat.com> [2012-01-16 12:14:27]:

> > One option is to make the kick hypercall available only when
> > yield_on_hlt=1?
> 
> It's not a good idea to tie various options together.  Features should
> be orthogonal.
> 
> Can't we make it work?  Just have different handling for
> KVM_REQ_PVLOCK_KICK (let 's rename it, and the hypercall, PV_UNHALT,
> since we can use it for non-locks too).

The problem case I was thinking of was when guest VCPU would have issued
HLT with interrupts disabled. I guess one option is to inject an NMI,
and have the guest kernel NMI handler recognize this and make
adjustments such that the vcpu avoids going back to HLT instruction.

Having another hypercall to do yield/sleep (rather than effecting that
via HLT) seems like an alternate clean solution here ..

- vatsa

^ permalink raw reply

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Raghavendra K T @ 2012-01-16 14:13 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki
In-Reply-To: <4F13E84C.3010808@redhat.com>

On 01/16/2012 02:35 PM, Avi Kivity wrote:
> On 01/14/2012 08:26 PM, Raghavendra K T wrote:
>> Extends Linux guest running on KVM hypervisor to support pv-ticketlocks.
>>
>> During smp_boot_cpus  paravirtualied KVM guest detects if the hypervisor has
>> required feature (KVM_FEATURE_PVLOCK_KICK) to support pv-ticketlocks. If so,
>>   support for pv-ticketlocks is registered via pv_lock_ops.
>>
>> Use KVM_HC_KICK_CPU hypercall to wakeup waiting/halted vcpu.
>> +
>> +	debugfs_create_u8("zero_stats", 0644, d_spin_debug,&zero_stats);
>> +
>> +	debugfs_create_u32("taken_slow", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[TAKEN_SLOW]);
>> +	debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
>> +
>> +	debugfs_create_u32("released_slow", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[RELEASED_SLOW]);
>> +	debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
>> +		&spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
>> +
>> +	debugfs_create_u64("time_blocked", 0444, d_spin_debug,
>> +			&spinlock_stats.time_blocked);
>> +
>> +	debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
>> +		     spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
>> +
>>
>
> Please drop all of these and replace with tracepoints in the appropriate
> spots.  Everything else (including the historgram) can be reconstructed
> the tracepoints in userspace.
>

I think Jeremy pointed out that tracepoints use spinlocks, and hence debugfs
is the option here, no?

^ permalink raw reply

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
From: Srivatsa Vaddagiri @ 2012-01-16 14:20 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML
In-Reply-To: <3EC1B881-0724-49E3-B892-F40BEB07D15D@suse.de>

* Alexander Graf <agraf@suse.de> [2012-01-16 04:57:45]:

> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?

You mean, run kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS disabled for 
some workload(s)?

In some sense, the 1x overcommit case results posted do measure the overhead
of (pv-)spinlocks, no? We don't see any overhead in that case, at least for
kernbench.

> Result for Non PLE machine :
> ============================

[snip]

> Kernbench:
>                BASE                    BASE+patch              %improvement
>                mean (sd)               mean (sd)
> Scenario A:
> case 1x:       164.233 (16.5506)       163.584 (15.4598)       0.39517

[snip]

> Result for PLE machine:
> ======================

[snip]
> Kernbench:
>                BASE                    BASE+patch              %improvement
>                mean (sd)               mean (sd)
> Scenario A:
> case 1x:       161.263 (56.518)        159.635 (40.5621)       1.00953

- vatsa

^ permalink raw reply

* Re: [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
From: Alexander Graf @ 2012-01-16 14:23 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Jeremy Fitzhardinge, Raghavendra K T, linux-doc, Peter Zijlstra,
	Jan Kiszka, Virtualization, Paul Mackerras, H. Peter Anvin,
	Stefano Stabellini, Xen, Dave Jiang, KVM, Glauber Costa, X86,
	Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
	Greg Kroah-Hartman, Sasha Levin, Sedat Dilek, Thomas Gleixner,
	LKML
In-Reply-To: <20120116142014.GA10155@linux.vnet.ibm.com>


On 16.01.2012, at 15:20, Srivatsa Vaddagiri wrote:

> * Alexander Graf <agraf@suse.de> [2012-01-16 04:57:45]:
> 
>> Speaking of which - have you benchmarked performance degradation of pv ticket locks on bare metal?
> 
> You mean, run kernel on bare metal with CONFIG_PARAVIRT_SPINLOCKS
> enabled and compare how it performs with CONFIG_PARAVIRT_SPINLOCKS disabled for 
> some workload(s)?

Yup

> 
> In some sense, the 1x overcommitcase results posted does measure the overhead
> of (pv-)spinlocks no? We don't see any overhead in that case for atleast
> kernbench ..
> 
>> Result for Non PLE machine :
>> ============================
> 
> [snip]
> 
>> Kernbench:
>>               BASE                    BASE+patch

What is BASE really? Is BASE already with the PV spinlocks enabled? I'm having a hard time understanding which tree you're working against, since the prerequisites aren't upstream yet.


Alex

>>               %improvement
>>               mean (sd)               mean (sd)
>> Scenario A:
>> case 1x:	 164.233 (16.5506)	 163.584 (15.4598)	0.39517
> 
> [snip]
> 
>> Result for PLE machine:
>> ======================
> 
> [snip]
>> Kernbench:
>>               BASE                    BASE+patch
>>               %improvement
>>               mean (sd)               mean (sd)
>> Scenario A:
>> case 1x:	 161.263 (56.518)        159.635 (40.5621)	1.00953
> 
> - vatsa
> 

^ permalink raw reply

* Re: [PATCH RFC V4 4/5] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Avi Kivity @ 2012-01-16 14:47 UTC (permalink / raw)
  To: Raghavendra K T
  Cc: Jeremy Fitzhardinge, Greg Kroah-Hartman, linux-doc,
	Peter Zijlstra, Jan Kiszka, Virtualization, Paul Mackerras,
	H. Peter Anvin, Stefano Stabellini, Xen, Dave Jiang, KVM,
	Glauber Costa, X86, Ingo Molnar, Rik van Riel,
	Konrad Rzeszutek Wilk, Srivatsa Vaddagiri, Sasha Levin,
	Sedat Dilek, Thomas Gleixner, LKML, Dave Hansen, Suzuki
In-Reply-To: <4F1430A2.1080401@linux.vnet.ibm.com>

On 01/16/2012 04:13 PM, Raghavendra K T wrote:
>> Please drop all of these and replace with tracepoints in the appropriate
>> spots.  Everything else (including the historgram) can be reconstructed
>> the tracepoints in userspace.
>>
>
>
> I think Jeremy pointed that tracepoints use spinlocks and hence debugfs
> is the option.. no ?
>

Yeah, I think you're right.  What a pity.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: [PATCH] vhost-net: add module alias (v2.1)
From: Stephen Hemminger @ 2012-01-16 15:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: kvm, Michael S. Tsirkin, netdev, kay.sievers, virtualization,
	device, David Miller
In-Reply-To: <20120116122645.2257b40b@bob.linux.org.uk>

On Mon, 16 Jan 2012 12:26:45 +0000
Alan Cox <alan@linux.intel.com> wrote:

> > > ACKs, NACKs?  What is happening here?
> > 
> > I would like an Ack from Alan Cox who switched vhost-net
> > to a dynamic minor in the first place, in commit
> > 79907d89c397b8bc2e05b347ec94e928ea919d33.
> 
> Sorry device@lanana.org isn't yet back from the kernel hack incident.
> 
> I don't read netdev so someone needs to summarise the issue and send me
> a copy of the patch to look at.
> 
> Alan

Subject: vhost-net: add module alias (v2.1)

By adding some module aliases, programs (or users) won't have to explicitly
call modprobe. Vhost-net will always be available if built into the kernel.
It does require assigning a permanent minor number for depmod to work.

Also:
  - use C99 style initialization.
  - add missing entry in documentation for loop-control

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
2.1 - add missing documentation for loop control as well

 Documentation/devices.txt  |    3 +++
 drivers/vhost/net.c        |    8 +++++---
 include/linux/miscdevice.h |    1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

--- a/drivers/vhost/net.c	2012-01-12 14:14:25.681815487 -0800
+++ b/drivers/vhost/net.c	2012-01-12 18:09:56.810680816 -0800
@@ -856,9 +856,9 @@ static const struct file_operations vhos
 };
 
 static struct miscdevice vhost_net_misc = {
-	MISC_DYNAMIC_MINOR,
-	"vhost-net",
-	&vhost_net_fops,
+	.minor = VHOST_NET_MINOR,
+	.name = "vhost-net",
+	.fops = &vhost_net_fops,
 };
 
 static int vhost_net_init(void)
@@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Michael S. Tsirkin");
 MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
+MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
+MODULE_ALIAS("devname:vhost-net");
--- a/include/linux/miscdevice.h	2012-01-12 14:14:25.725815981 -0800
+++ b/include/linux/miscdevice.h	2012-01-12 18:09:56.810680816 -0800
@@ -42,6 +42,7 @@
 #define AUTOFS_MINOR		235
 #define MAPPER_CTRL_MINOR	236
 #define LOOP_CTRL_MINOR		237
+#define VHOST_NET_MINOR		238
 #define MISC_DYNAMIC_MINOR	255
 
 struct device;
--- a/Documentation/devices.txt	2012-01-12 14:14:25.701815712 -0800
+++ b/Documentation/devices.txt	2012-01-12 18:09:56.814680860 -0800
@@ -447,6 +447,9 @@ Your cooperation is appreciated.
 		234 = /dev/btrfs-control	Btrfs control device
 		235 = /dev/autofs	Autofs control device
 		236 = /dev/mapper/control	Device-Mapper control device
+		237 = /dev/loop-control	Loopback control device
+		238 = /dev/vhost-net	Host kernel accelerator for virtio net
+
 		240-254			Reserved for local use
 		255			Reserved for MISC_DYNAMIC_MINOR
