From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: avi.kivity@gmail.com, mtosatti@redhat.com, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: Re: [PATCH 07/12] KVM: MMU: redesign the algorithm of pte_list
Date: Wed, 28 Aug 2013 16:37:32 +0800 [thread overview]
Message-ID: <521DB6CC.4030107@linux.vnet.ibm.com> (raw)
In-Reply-To: <20130828081205.GN22899@redhat.com>
On 08/28/2013 04:12 PM, Gleb Natapov wrote:
>> +
>> + rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
>> + desc = (struct pte_list_desc *)(*pte_list & ~1ul);
>> +
>> + /* No empty position in the desc. */
>> + if (desc->sptes[PTE_LIST_EXT - 1]) {
>> + struct pte_list_desc *new_desc;
>> + new_desc = mmu_alloc_pte_list_desc(vcpu);
>> + new_desc->more = desc;
>> + desc = new_desc;
>> + *pte_list = (unsigned long)desc | 1;
>> }
>> - return count;
>> +
>> + free_pos = find_first_free(desc);
>> + desc->sptes[free_pos] = spte;
>> + return count_spte_number(desc);
> Should it be count_spte_number(desc) - 1? The function should returns
> the number of pte entries before the spte was added.
Yes. We have handled it count_spte_number(), we count the number like this:
return first_free + desc_num * PTE_LIST_EXT;
The first_free is indexed from 0.
Maybe it is clearer to let count_spte_number() return the real number.
>
>> }
>>
>> static void
>> -pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
>> - int i, struct pte_list_desc *prev_desc)
>> +pte_list_desc_remove_entry(unsigned long *pte_list,
>> + struct pte_list_desc *desc, int i)
>> {
>> - int j;
>> + struct pte_list_desc *first_desc;
>> + int last_used;
>> +
>> + first_desc = (struct pte_list_desc *)(*pte_list & ~1ul);
>> + last_used = find_last_used(first_desc);
>>
>> - for (j = PTE_LIST_EXT - 1; !desc->sptes[j] && j > i; --j)
>> - ;
>> - desc->sptes[i] = desc->sptes[j];
>> - desc->sptes[j] = NULL;
>> - if (j != 0)
>> + /*
>> + * Move the entry from the first desc to this position we want
>> + * to remove.
>> + */
>> + desc->sptes[i] = first_desc->sptes[last_used];
>> + first_desc->sptes[last_used] = NULL;
>> +
> What if desc == first_desc and i < last_used. You still move spte
> backwards so lockless walk may have already examined entry at i and
> will miss spte that was moved there from last_used position, no?
Right. I noticed it too and fixed in the v2 which is being tested.
I fixed it by bottom-up walk desc, like this:
pte_list_walk_lockless():
desc = (struct pte_list_desc *)(pte_list_value & ~1ul);
while (!desc_is_a_nulls(desc)) {
/*
* We should do bottom-up walk since we always use the
* bottom entry to replace the deleted entry if only
* one desc is used in the rmap when a spte is removed.
* Otherwise the moved entry will be missed.
*/
for (i = PTE_LIST_EXT - 1; i >= 0; i--)
fn(desc->sptes[i]);
desc = ACCESS_ONCE(desc->more);
/* It is being initialized. */
if (unlikely(!desc))
goto restart;
}
How about this?
>
>> + /* No valid entry in this desc, we can free this desc now. */
>> + if (!first_desc->sptes[0]) {
>> + struct pte_list_desc *next_desc = first_desc->more;
>> +
>> + /*
>> + * Only one entry existing but still use a desc to store it?
>> + */
>> + WARN_ON(!next_desc);
>> +
>> + mmu_free_pte_list_desc(first_desc);
>> + first_desc = next_desc;
>> + *pte_list = (unsigned long)first_desc | 1ul;
>> return;
>> - if (!prev_desc && !desc->more)
>> - *pte_list = (unsigned long)desc->sptes[0];
>> - else
>> - if (prev_desc)
>> - prev_desc->more = desc->more;
>> - else
>> - *pte_list = (unsigned long)desc->more | 1;
>> - mmu_free_pte_list_desc(desc);
>> + }
>> +
>> + WARN_ON(!first_desc->sptes[0]);
>> +
>> + /*
>> + * Only one entry in this desc, move the entry to the head
>> + * then the desc can be freed.
>> + */
>> + if (!first_desc->sptes[1] && !first_desc->more) {
>> + *pte_list = (unsigned long)first_desc->sptes[0];
>> + mmu_free_pte_list_desc(first_desc);
>> + }
>> }
>>
>> static void pte_list_remove(u64 *spte, unsigned long *pte_list)
>> {
>> struct pte_list_desc *desc;
>> - struct pte_list_desc *prev_desc;
>> int i;
>>
>> if (!*pte_list) {
>> - printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);
>> - BUG();
>> - } else if (!(*pte_list & 1)) {
>> + WARN(1, KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);
> Why change BUG() to WARN() here and below?
WARN(1, "xxx") can replace two lines in the origin code. And personally,
i prefer WARN() to BUG() since sometimes BUG() can stop my box and i need to
get the full log by using kdump.
If you object it, i will change it back in the next version. :)
>
>> + return;
>> + }
>> +
>> + if (!(*pte_list & 1)) {
>> rmap_printk("pte_list_remove: %p 1->0\n", spte);
>> if ((u64 *)*pte_list != spte) {
>> - printk(KERN_ERR "pte_list_remove: %p 1->BUG\n", spte);
>> - BUG();
>> + WARN(1, KERN_ERR "pte_list_remove: %p 1->BUG\n", spte);
>> }
> Remove {} since only one statement left in the if(). Or better yet why
> not:
> WARN ((u64 *)*pte_list != spte, ....)?
Yes, it is better.
> But again why not BUG()?
The explanation is above. :)
next prev parent reply other threads:[~2013-08-28 8:37 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-30 13:01 [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-07-30 13:01 ` [PATCH 01/12] KVM: MMU: remove unused parameter Xiao Guangrong
2013-08-29 7:22 ` Gleb Natapov
2013-07-30 13:02 ` [PATCH 02/12] KVM: MMU: properly check last spte in fast_page_fault() Xiao Guangrong
2013-07-30 13:02 ` [PATCH 03/12] KVM: MMU: lazily drop large spte Xiao Guangrong
2013-08-02 14:55 ` Marcelo Tosatti
2013-08-02 15:42 ` Xiao Guangrong
2013-08-02 20:27 ` Marcelo Tosatti
2013-08-02 22:56 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable Xiao Guangrong
2013-07-30 13:26 ` Paolo Bonzini
2013-07-31 7:25 ` Xiao Guangrong
2013-08-07 1:48 ` Marcelo Tosatti
2013-08-07 4:06 ` Xiao Guangrong
2013-08-08 15:06 ` Marcelo Tosatti
2013-08-08 16:26 ` Xiao Guangrong
2013-11-20 0:29 ` Marcelo Tosatti
2013-11-20 0:35 ` Marcelo Tosatti
2013-11-20 14:20 ` Xiao Guangrong
2013-11-20 19:47 ` Marcelo Tosatti
2013-11-21 4:26 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 05/12] KVM: MMU: add spte into rmap before logging dirty page Xiao Guangrong
2013-07-30 13:27 ` Paolo Bonzini
2013-07-31 7:33 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 06/12] KVM: MMU: flush tlb if the spte can be locklessly modified Xiao Guangrong
2013-08-28 7:23 ` Gleb Natapov
2013-08-28 7:50 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 07/12] KVM: MMU: redesign the algorithm of pte_list Xiao Guangrong
2013-08-28 8:12 ` Gleb Natapov
2013-08-28 8:37 ` Xiao Guangrong [this message]
2013-08-28 8:58 ` Gleb Natapov
2013-08-28 9:19 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 08/12] KVM: MMU: introduce nulls desc Xiao Guangrong
2013-08-28 8:40 ` Gleb Natapov
2013-08-28 8:54 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker Xiao Guangrong
2013-08-28 9:20 ` Gleb Natapov
2013-08-28 9:33 ` Xiao Guangrong
2013-08-28 9:46 ` Gleb Natapov
2013-08-28 10:13 ` Xiao Guangrong
2013-08-28 10:49 ` Gleb Natapov
2013-08-28 12:15 ` Xiao Guangrong
2013-08-28 13:36 ` Gleb Natapov
2013-08-29 6:50 ` Xiao Guangrong
2013-08-29 9:08 ` Gleb Natapov
2013-08-29 9:31 ` Xiao Guangrong
2013-08-29 9:51 ` Gleb Natapov
2013-08-29 11:26 ` Xiao Guangrong
2013-08-30 11:38 ` Gleb Natapov
2013-09-02 7:02 ` Xiao Guangrong
2013-08-29 9:31 ` Gleb Natapov
2013-08-29 11:33 ` Xiao Guangrong
2013-08-29 12:02 ` Xiao Guangrong
2013-08-30 11:44 ` Gleb Natapov
2013-09-02 8:50 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread Xiao Guangrong
2013-08-07 13:09 ` Takuya Yoshikawa
2013-08-07 13:19 ` Xiao Guangrong
2013-08-29 9:10 ` Gleb Natapov
2013-08-29 9:25 ` Xiao Guangrong
2013-07-30 13:02 ` [PATCH 11/12] KVM: MMU: locklessly write-protect the page Xiao Guangrong
2013-07-30 13:02 ` [PATCH 12/12] KVM: MMU: clean up spte_write_protect Xiao Guangrong
2013-07-30 13:11 ` [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect Xiao Guangrong
2013-08-03 5:09 ` Takuya Yoshikawa
2013-08-04 14:15 ` Xiao Guangrong
2013-08-29 7:16 ` Gleb Natapov
2013-08-06 13:16 ` Xiao Guangrong
2013-08-08 17:38 ` Paolo Bonzini
2013-08-09 4:51 ` Xiao Guangrong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=521DB6CC.4030107@linux.vnet.ibm.com \
--to=xiaoguangrong@linux.vnet.ibm.com \
--cc=avi.kivity@gmail.com \
--cc=gleb@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).