From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com, avi.kivity@gmail.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch
Date: Wed, 29 May 2013 22:00:08 +0800
Message-ID: <51A609E8.20400@linux.vnet.ibm.com>
In-Reply-To: <20130529132144.GF5931@amt.cnet>
On 05/29/2013 09:21 PM, Marcelo Tosatti wrote:
> On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
>> On 05/29/2013 07:11 PM, Marcelo Tosatti wrote:
>>> On Tue, May 28, 2013 at 11:02:09PM +0800, Xiao Guangrong wrote:
>>>> On 05/28/2013 08:18 AM, Marcelo Tosatti wrote:
>>>>> On Mon, May 27, 2013 at 10:20:12AM +0800, Xiao Guangrong wrote:
>>>>>> On 05/25/2013 04:34 AM, Marcelo Tosatti wrote:
>>>>>>> On Thu, May 23, 2013 at 03:55:53AM +0800, Xiao Guangrong wrote:
>>>>>>>> Zap at least 10 pages before releasing mmu-lock to reduce the overhead
>>>>>>>> caused by repeatedly acquiring the lock.
>>>>>>>>
>>>>>>>> After the patch, kvm_zap_obsolete_pages can always make forward
>>>>>>>> progress, so update the comments accordingly.
>>>>>>>>
>>>>>>>> [ It improves kernel building 0.6% ~ 1% ]
>>>>>>>
>>>>>>> Can you please describe the overhead in more detail? Under what scenario
>>>>>>> is kernel building improved?
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>> The scenario is a kernel build in the guest while, in parallel, the PCI
>>>>>> ROM is read once per second:
>>>>>>
>>>>>> [
>>>>>> echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/rom
>>>>>> cat /sys/bus/pci/devices/0000\:00\:03.0/rom > /dev/null
>>>>>> ]
>>>>>
>>>>> I can't see why this reflects a real-world scenario (or a real-world
>>>>> scenario with the same characteristics regarding kvm_mmu_zap_all vs. faults)?
>>>>>
>>>>> The point is, it would be good to understand why this change improves
>>>>> performance. What are the cases where we break out of kvm_mmu_zap_all
>>>>> due to (need_resched || spin_needbreak) with fewer than 10 pages
>>>>> zapped?
>>>>
>>>> When the guest reads the ROM, qemu updates the memory slot to map the
>>>> device's firmware; that is why kvm_mmu_zap_all is called in this scenario.
>>>>
>>>> The reasons why it hurts performance are:
>>>> 1) Qemu uses a global io-lock to synchronize all vcpus, so the io-lock is
>>>> held while we do kvm_mmu_zap_all(). If kvm_mmu_zap_all() is not efficient,
>>>> all the other vcpus have to wait a long time to do I/O.
>>>>
>>>> 2) kvm_mmu_zap_all() is triggered in vcpu context, so it can block IPI
>>>> requests from other vcpus.
>>>>
>>>> Is that enough?
>>>
>>> That is no problem. The problem is why you chose "10" as the minimum number of
>>> pages to zap before considering a reschedule. I would expect the need to
>>
>> Well, my description above explains why batch-zapping is needed: we do
>> not want the vcpu to spend a long time zapping all pages, because that
>> hurts the other running vcpus.
>>
>> But why the batch size is "10"... I cannot answer that precisely; I guessed
>> that '10' keeps the vcpu from spending too long in zap_all_pages without
>> starving the mmu-lock. "10" is a speculative value and I am not sure it is
>> the best one, but at least I think it works (see the sketch at the end of
>> this mail).
>>
>>> reschedule to be rare enough that one kvm_mmu_zap_all instance (between
>>> schedule-in and schedule-out) is able to release no less than a thousand
>>> pages.
>>
>> Unfortunately, no.
>>
>> This is the information from my reply to Gleb, where he raised the question
>> of why "collapse tlb flush" is needed:
>>
>> ======
>> It seems not.
>> Since we reload the mmu before zapping the obsolete pages, the mmu-lock
>> is easily contended. I did this simple tracking:
>>
>> +	int num = 0;
>>  restart:
>>  	list_for_each_entry_safe_reverse(sp, node,
>>  	      &kvm->arch.active_mmu_pages, link) {
>> @@ -4265,6 +4265,7 @@ restart:
>>  		if (batch >= BATCH_ZAP_PAGES &&
>>  		      cond_resched_lock(&kvm->mmu_lock)) {
>>  			batch = 0;
>> +			num++;
>>  			goto restart;
>>  		}
>>
>> @@ -4277,6 +4278,7 @@ restart:
>>  	 * may use the pages.
>>  	 */
>>  	kvm_mmu_commit_zap_page(kvm, &invalid_list);
>> +	printk("lock-break: %d.\n", num);
>>  }
>>
>> I read the PCI ROM while doing a kernel build in a guest that has 1G
>> memory and 4 vcpus, with EPT enabled; this is a normal workload and a
>> normal configuration.
>>
>> # dmesg
>> [ 2338.759099] lock-break: 8.
>> [ 2339.732442] lock-break: 5.
>> [ 2340.904446] lock-break: 3.
>> [ 2342.513514] lock-break: 3.
>> [ 2343.452229] lock-break: 3.
>> [ 2344.981599] lock-break: 4.
>>
>> Basically, we need to break many times.
>> ======
>>
>> You can see that we still have to break the lock at least 3 times to zap
>> all pages, even when zapping 10 pages per batch. Obviously it would need
>> to break even more times without batch-zapping.
>
> Yes, but this is not a real scenario, nor does it describe a real scenario
> as far as I know.
Aha.

Okay, maybe "read rom" is not the common case, but a vcpu can trigger it, or
a guest driver could do so in the future. What happens if a vcpu triggers it?
The worst case is that one vcpu keeps breaking the mmu-lock because another
vcpu is doing intense memory access, while the rest of the vcpus are waiting
on I/O or on an IPI from it. That can easily become a soft-lockup.

Even worse, if host memory is really low and the host keeps trying to reclaim
qemu's memory, the lock stays hot and we may not be able to zap even one page
before being rescheduled.
>
> Are you sure this minimum-batching-before-considering-reschedule is still
> needed even after the obsolete-pages optimization?
Yes, this tracking was done with all patches in this series applied.
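
To make the discussion concrete, here is a simplified sketch of what the
batched loop looks like with this series applied (the obsolete/invalid page
checks are omitted and the body is condensed; BATCH_ZAP_PAGES is the "10" we
are discussing):

#define BATCH_ZAP_PAGES	10

static void kvm_zap_obsolete_pages(struct kvm *kvm)
{
	struct kvm_mmu_page *sp, *node;
	LIST_HEAD(invalid_list);
	int batch = 0;

restart:
	list_for_each_entry_safe_reverse(sp, node,
	      &kvm->arch.active_mmu_pages, link) {
		int ret;

		/* (checks that skip non-obsolete or invalid pages omitted) */

		/*
		 * Only consider rescheduling (which restarts the walk) after
		 * at least BATCH_ZAP_PAGES pages have been zapped, so a
		 * contended mmu-lock cannot starve forward progress.
		 */
		if (batch >= BATCH_ZAP_PAGES &&
		      cond_resched_lock(&kvm->mmu_lock)) {
			batch = 0;
			goto restart;
		}

		/*
		 * Zapping may also zap sp's children, so the walk has to be
		 * restarted whenever something was actually zapped.
		 */
		ret = kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
		batch += ret;
		if (ret)
			goto restart;
	}

	/*
	 * Commit the zap (and flush TLBs) outside the walk; until it is
	 * committed, other vcpus may still use the pages.
	 */
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
}

The tracking hunks quoted above were applied on top of exactly this loop.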