Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)

From: "David S. Ahern" <daahern@cisco.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Avi Kivity <avi@qumranet.com>, kvm@vger.kernel.org
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
Date: Wed, 28 May 2008 11:24:04 -0600
Message-ID: <483D9534.8010002@cisco.com>
In-Reply-To: <20080528170410.GC8086@duo.random>

Yes, I've tried changing kscand_work_percent (values of 50 and 30).
Basically it makes kscand wake more often (i.e., MIN_AGING_INTERVAL
declines in proportion) but do less work each trip through the lists.

I have not seen a noticeable change in guest behavior.

david


Andrea Arcangeli wrote:
> On Wed, May 28, 2008 at 09:43:09AM -0600, David S. Ahern wrote:
>> This is the code in the RHEL3.8 kernel:
>>
>> static int scan_active_list(struct zone_struct * zone, int age,
>> 		struct list_head * list, int count)
>> {
>> 	struct list_head *page_lru , *next;
>> 	struct page * page;
>> 	int over_rsslimit;
>>
>> 	count = count * kscand_work_percent / 100;
>> 	/* Take the lock while messing with the list... */
>> 	lru_lock(zone);
>> 	while (count-- > 0 && !list_empty(list)) {
>> 		page = list_entry(list->prev, struct page, lru);
>> 		pte_chain_lock(page);
>> 		if (page_referenced(page, &over_rsslimit)
>> 				&& !over_rsslimit
>> 				&& check_mapping_inuse(page))
>> 			age_page_up_nolock(page, age);
>> 		else {
>> 			list_del(&page->lru);
>> 			list_add(&page->lru, list);
>> 		}
>> 		pte_chain_unlock(page);
>> 	}
>> 	lru_unlock(zone);
>> 	return 0;
>> }
>>
>> My previous email shows examples of the number of pages in the list and
>> the scanning that happens.
> 
> This code looks better than the one below, as a limit was introduced
> and the whole list isn't scanned anymore. If you decrease
> kscand_work_percent (I assume it's a sysctl even if it's missing the
> sysctl_ prefix) to, say, 1, you can limit the damage. Did you try it?
> 
>> Avi Kivity wrote:
>>> Andrea Arcangeli wrote:
>>>> So I never found a relation to the symptom reported of VM kernel
>>>> threads going weird, with KVM optimal handling of kmap ptes.
>>>>   
>>>
>>> The problem is this code:
>>>
>>> static int scan_active_list(struct zone_struct * zone, int age,
>>>                struct list_head * list)
>>> {
>>>        struct list_head *page_lru , *next;
>>>        struct page * page;
>>>        int over_rsslimit;
>>>
>>>        /* Take the lock while messing with the list... */
>>>        lru_lock(zone);
>>>        list_for_each_safe(page_lru, next, list) {
>>>                page = list_entry(page_lru, struct page, lru);
>>>                pte_chain_lock(page);
>>>                if (page_referenced(page, &over_rsslimit) && !over_rsslimit)
>>>                        age_page_up_nolock(page, age);
>>>                pte_chain_unlock(page);
>>>        }
>>>        lru_unlock(zone);
>>>        return 0;
>>> }
>>>
>>> If the pages in the list are in the same order as in the ptes (which is
>>> very likely), then we have the following access pattern
> 
> Yes it is likely.
> 
>>> - set up kmap to point at pte
>>> - test_and_clear_bit(pte)
>>> - kunmap
>>>
>>> From kvm's point of view this looks like
>>>
>>> - several accesses to set up the kmap
> 
> Hmm, the kmap establishment takes a single guest operation in the
> fixmap area. That's a single write to the pte, to write a pte_t 8/4
> byte large region (PAE/non-PAE). The same pte_t is then cleared and
> flushed out of the tlb with a cpu-local invlpg during kunmap_atomic.
> 
> I count 1 write here so far.
> 
>>>  - if these accesses trigger flooding, we will have to tear down the
>>> shadow for this page, only to set it up again soon
> 
> So the shadow mapping the fixmap area would be torn down by the
> flooding.
> 
> Or is it the shadow corresponding to the real user pte pointed to by
> the fixmap that is unshadowed by the flooding, or both/all?
> 
>>> - an access to the pte (emulated)
> 
> Here I count the second write and this isn't done on the fixmap area
> like the first write above, but this is a write to the real user pte,
> pointed by the fixmap. So if this is emulated it means the shadow of
> the user pte pointing to the real data page is still active.
> 
>>>  - if this access _doesn't_ trigger flooding, we will have 512 unneeded
>>> emulations.  The pte is worthless anyway since the accessed bit is clear
>>> (so we can't set up a shadow pte for it)
>>>    - this bug was fixed
> 
> You mean the accessed bit on fixmap pte used by kmap? Or the user pte
> pointed by the fixmap pte?
> 
>>> - an access to tear down the kmap
> 
> Yep, pte_clear on the fixmap pte_t followed by an invlpg (if that
> matters).
> 
>>> [btw, am I reading this right? the entire list is scanned each time?
> 
> If the list parameter isn't a local LIST_HEAD on the stack but the
> global one, it's a full scan each time. I guess it's the global list,
> looking at the new code at the top that has a kscand_work_percent
> sysctl.
> 
>>> if you have 1G of active HIGHMEM, that's a quarter of a million pages,
>>> which would take at least a second no matter what we do.  VMware can
>>> probably special-case kmaps, but we can't]
> 
> Perhaps they've a list per age bucket or similar, but I still doubt
> this works well on the host either... I guess the virtualization
> overhead is exacerbating the inefficiency. Perhaps killall -STOP kscand
> is a good enough fix ;). This seems to only push the age up; to be
> functional the age has to go down, and I guess the go-down is done by
> other threads, so stopping kscand may not hurt.
> 
> I think what we should aim for is to quickly reach this condition:
> 
> 1) always keep the fixmap/kmap pte_t shadowed and emulate the
>    kmap/kunmap access so the test_and_clear_young done on the user pte
>    doesn't require re-establishing the spte representing the fixmap
>    virtual address. If we don't emulate fixmap we'll have to
>    re-establish the spte during the write to the user pte, and
>    tear it down again during kunmap_atomic. So there's not much doubt
>    fixmap access emulation is worth it.
> 
> 2) get rid of the user pte shadow mapping pointing to the user data so
>    the test_and_clear of the young bitflag on the user pte will not be
>    emulated and it'll run at full CPU speed through the shadow pte
>    mapping corresponding to the fixmap virtual address
> 
> kscand pattern is the same as running mprotect on a 32bit 2.6
> kernel so it sounds worth optimizing for it, even if kscand may be
> unfixable without killall -STOP kscand or VM fixes to guest.
> 
> However I'm not sure about point 2 in light of mprotect. With
> mprotect the guest virtual addresses mapped by the guest user ptes
> will be used. It's not like kscand that may write forever to the user
> ptes without ever using the guest virtual addresses that they're
> mapping. So we better be sure that by unshadowing and optimizing
> kscand we're not hurting mprotect or other pte mangling operations in
> 2.6 that will likely keep accessing the guest virtual addresses mapped
> by the user ptes previously modified.
> 
> Hope this makes some sense; I'm not sure I understand this completely.
