Re: [PATCH/RFC] KVM: halt_polling: provide a way to qualify wakeups during poll

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Christian Borntraeger <borntraeger@de.ibm.com>
To: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, KVM <kvm@vger.kernel.org>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	Jens Freimann <jfrei@linux.vnet.ibm.com>,
	David Hildenbrand <dahi@linux.vnet.ibm.com>
Subject: Re: [PATCH/RFC] KVM: halt_polling: provide a way to qualify wakeups during poll
Date: Mon, 2 May 2016 16:30:33 +0200	[thread overview]
Message-ID: <57276489.2050902@de.ibm.com> (raw)
In-Reply-To: <20160502133428.GA30059@potion>

On 05/02/2016 03:34 PM, Radim Krčmář wrote:
> 2016-05-02 12:42+0200, Christian Borntraeger:
>> Radim, Paolo,
>>
>> can you have a look at this patch? If you are ok with it, I want to
>> submit this patch with my next s390 pull request. It touches KVM common
>> code, but I tried to make it a nop for everything but s390.
> 
> (I have few questions and will ack the solution if you stand behind it.)
> 
>> Christian
>>
>> ----snip----
>>
>>
>> Some wakeups should not be considered a sucessful poll. For example on
>> s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
>> would be considered runnable - letting all vCPUs poll all the time for
>> transactional like workload, even if one vCPU would be enough.
>>
>> This can result in huge CPU usage for large guests.
>> This patch lets architectures provide a way to qualify wakeups if they
>> should be considered a good/bad wakeups in regard to polls.
>>
>> For s390 the implementation will fence of halt polling for anything but
>> known good, single vCPU events. The s390 implementation for floating
>> interrupts does a wakeup for one vCPU, but the interrupt will be delivered
>> by whatever CPU comes first. To limit the halt polling we only mark the
>> woken up CPU as a valid poll. This code will also cover several other
>> wakeup reasons like IPI or expired timers. This will of course also mark
>> some events as not sucessful. As  KVM on z runs always as a 2nd level
>> hypervisor, we prefer to not poll, unless we are really sure, though.
>>
>> So we start with a minimal set and will provide additional patches in
>> the future that mark additional code paths as valid wakeups, if that
>> turns out to be necessary.
>>
>> This patch successfully limits the CPU usage for cases like uperf 1byte
>> transactional ping pong workload or wakeup heavy workload like OLTP
>> while still providing a proper speedup.
>>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
>> @@ -976,6 +976,14 @@ no_timer:
>>  
>>  void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu)
>>  {
>> +	/*
>> +	 * This is outside of the if because we want to mark the wakeup
>> +	 * as valid for vCPUs that
>> +	 * a: do polling right now
>> +	 * b: do sleep right now
>> +	 * otherwise we would never grow the poll interval properly
>> +	 */
>> +	vcpu_set_valid_wakeup(vcpu);
>>  	if (waitqueue_active(&vcpu->wq)) {
> 
> (Can't kvm_s390_vcpu_wakeup() be called when the vcpu isn't in
>  kvm_vcpu_block()?  Either this condition is useless or we'd the set
>  vcpu_set_valid_wakeup() for any future wakeup.)

Yes, for example a timer might expire (see  kvm_s390_idle_wakeup) AND the
vcpu was already woken up by an I/O interrupt and we are in the process of
leaving kvm_vcpu_block. And yes, we might overindicate and set valid wakeup
in that case, but this is fine as this is jut a heuristics which will recover.
 
The problem is, that I cannot move vcpu_set_valid_wakeup inside the if,
because then a VCPU can be inside kvm_vcpu_block (polling) but the waitqueue
is not yet active. (in other words, the poll interval will be 0, or grow
once just to be reset to 0 afterwards.)

> 
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> @@ -224,6 +224,7 @@ struct kvm_vcpu {
>>  	sigset_t sigset;
>>  	struct kvm_vcpu_stat stat;
>>  	unsigned int halt_poll_ns;
>> +	bool valid_wakeup;
>>  
>>  #ifdef CONFIG_HAS_IOMEM
>>  	int mmio_needed;
>> @@ -1178,4 +1179,37 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
>>  				  uint32_t guest_irq, bool set);
>>  #endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
>>  
>> +#ifdef CONFIG_HAVE_KVM_INVALID_POLLS
>> +/* If we wakeup during the poll time, was it a sucessful poll? */
>> +static inline bool vcpu_valid_wakeup(struct kvm_vcpu *vcpu)
> 
> (smp barriers?)

Not sure. Do we need to order valid_wakeup against other stores/reads?
To me it looks like the order of stores/fetches for the different values
should not matter.
I can certainly add smp_rmb/wmb to getters/setters, but I can not see a
problematic case right now and barriers require comments. Can you elaborate
what you see as potential issue?

> 
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> @@ -41,6 +41,10 @@ config KVM_VFIO
>> +config HAVE_KVM_INVALID_POLLS
>> +       bool
>> +
>> +
> 
> (One newline is enough.)

sure.
> 

>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> @@ -2008,7 +2008,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>>  			 * arrives.
>>  			 */
>>  			if (kvm_vcpu_check_block(vcpu) < 0) {
>> -				++vcpu->stat.halt_successful_poll;
>> +				if (vcpu_valid_wakeup(vcpu))
>> +					++vcpu->stat.halt_successful_poll;
> 
> KVM didn't call schedule(), so it's still a successful poll, IMO, just
> invalid.

so just always do ++vcpu->stat.halt_successful_poll; and add another counter 
that counts polls that will not be used for growing/shrinking?
like
			++vcpu->stat.halt_successful_poll;
			if (!vcpu_valid_wakeup(vcpu))
				++vcpu->stat.halt_poll_no_tuning; 

?



> 
>>  				goto out;
>>  			}
>>  			cur = ktime_get();
>> @@ -2038,14 +2039,16 @@ out:
>>  		if (block_ns <= vcpu->halt_poll_ns)
>>  			;
>>  		/* we had a long block, shrink polling */
>> -		else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
>> +		else if (!vcpu_valid_wakeup(vcpu) ||
>> +			(vcpu->halt_poll_ns && block_ns > halt_poll_ns))
>>  			shrink_halt_poll_ns(vcpu);
> 
> Is the shrinking important?
> 
>>  		/* we had a short halt and our poll time is too small */
>>  		else if (vcpu->halt_poll_ns < halt_poll_ns &&
>> -			block_ns < halt_poll_ns)
>> +			block_ns < halt_poll_ns && vcpu_valid_wakeup(vcpu))
>>  			grow_halt_poll_ns(vcpu);
> 
> IIUC, the problem comes from overgrown halt_poll_ns, so couldn't we just
> ignore all invalid wakeups?

I have some pathological cases where I can easily get all CPUs to poll all
the time without the shrinking part of the patch. (e.g. guest with 16 CPUs,
8 null block devices and 64 dd reading small blocks with O_DIRECT from these disks)
which cause permanent exits which consumes all 16 host CPUs. Limiting the grow
did not seem to be enough in my testing, but when I also made shrinking more
aggressive things improved.

But I am certainly open for other ideas how to tune this.



> 
> It would make more sense to me, because we are not interested in latency
> of invalid wakeups, so they shouldn't affect valid ones.
> 
>>  	} else
>>  		vcpu->halt_poll_ns = 0;
>> +	vcpu_reset_wakeup(vcpu);
>>  
>>  	trace_kvm_vcpu_wakeup(block_ns, waited);
> 
> (Tracing valid/invalid wakeups could be useful.)

As an extension of the old trace events?

next prev parent reply	other threads:[~2016-05-02 14:30 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-02 10:42 [PATCH/RFC] KVM: halt_polling: provide a way to qualify wakeups during poll Christian Borntraeger
2016-05-02 10:45 ` David Hildenbrand
2016-05-02 11:46 ` Cornelia Huck
2016-05-02 11:50   ` Christian Borntraeger
2016-05-02 13:34 ` Radim Krčmář
2016-05-02 14:30   ` Christian Borntraeger [this message]
2016-05-02 15:25     ` Radim Krčmář
2016-05-03  8:55       ` Christian Borntraeger
2016-05-02 19:44 ` David Matlack
2016-05-03  8:46   ` Christian Borntraeger
2016-05-03  5:42 ` Wanpeng Li
2016-05-03  7:00   ` Christian Borntraeger
2016-05-03  9:19     ` Cornelia Huck
2016-05-10 13:54     ` Paolo Bonzini
2016-05-03  7:50 ` Wanpeng Li
2016-05-03  8:00   ` Cornelia Huck
2016-05-03  8:00   ` Christian Borntraeger
2016-05-03  8:48     ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57276489.2050902@de.ibm.com \
    --to=borntraeger@de.ibm.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=dahi@linux.vnet.ibm.com \
    --cc=jfrei@linux.vnet.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.