All of lore.kernel.org
 help / color / mirror / Atom feed
From: "David S. Ahern" <daahern@cisco.com>
To: Andrea Arcangeli <andrea@qumranet.com>
Cc: Avi Kivity <avi@qumranet.com>, kvm@vger.kernel.org
Subject: Re: [kvm-devel] performance with guests running 2.4 kernels	(specifically RHEL3)
Date: Wed, 28 May 2008 11:24:04 -0600	[thread overview]
Message-ID: <483D9534.8010002@cisco.com> (raw)
In-Reply-To: <20080528170410.GC8086@duo.random>

Yes, I've tried changing kscand_work_percent (values of 50 and 30).
Basically it makes kscand wake more often (ie.,MIN_AGING_INTERVAL
declines in proportion) put do less work each trip through the lists.

I have not seen a noticeable change in guest behavior.

david


Andrea Arcangeli wrote:
> On Wed, May 28, 2008 at 09:43:09AM -0600, David S. Ahern wrote:
>> This is the code in the RHEL3.8 kernel:
>>
>> static int scan_active_list(struct zone_struct * zone, int age,
>> 		struct list_head * list, int count)
>> {
>> 	struct list_head *page_lru , *next;
>> 	struct page * page;
>> 	int over_rsslimit;
>>
>> 	count = count * kscand_work_percent / 100;
>> 	/* Take the lock while messing with the list... */
>> 	lru_lock(zone);
>> 	while (count-- > 0 && !list_empty(list)) {
>> 		page = list_entry(list->prev, struct page, lru);
>> 		pte_chain_lock(page);
>> 		if (page_referenced(page, &over_rsslimit)
>> 				&& !over_rsslimit
>> 				&& check_mapping_inuse(page))
>> 			age_page_up_nolock(page, age);
>> 		else {
>> 			list_del(&page->lru);
>> 			list_add(&page->lru, list);
>> 		}
>> 		pte_chain_unlock(page);
>> 	}
>> 	lru_unlock(zone);
>> 	return 0;
>> }
>>
>> My previous email shows examples of the number of pages in the list and
>> the scanning that happens.
> 
> This code looks better than the one below, as a limit was introduced
> and the whole list isn't scanned anymore, if you decrease
> kscand_work_percent (I assume it's a sysctl even if it's missing the
> sysctl_ prefix) to say 1, you can limit damages. Did you try it?
> 
>> Avi Kivity wrote:
>>> Andrea Arcangeli wrote:
>>>> So I never found a relation to the symptom reported of VM kernel
>>>> threads going weird, with KVM optimal handling of kmap ptes.
>>>>   
>>>
>>> The problem is this code:
>>>
>>> static int scan_active_list(struct zone_struct * zone, int age,
>>>                struct list_head * list)
>>> {
>>>        struct list_head *page_lru , *next;
>>>        struct page * page;
>>>        int over_rsslimit;
>>>
>>>        /* Take the lock while messing with the list... */
>>>        lru_lock(zone);
>>>        list_for_each_safe(page_lru, next, list) {
>>>                page = list_entry(page_lru, struct page, lru);
>>>                pte_chain_lock(page);
>>>                if (page_referenced(page, &over_rsslimit) && !over_rsslimit)
>>>                        age_page_up_nolock(page, age);
>>>                pte_chain_unlock(page);
>>>        }
>>>        lru_unlock(zone);
>>>        return 0;
>>> }
>>> If the pages in the list are in the same order as in the ptes (which is
>>> very likely), then we have the following access pattern
> 
> Yes it is likely.
> 
>>> - set up kmap to point at pte
>>> - test_and_clear_bit(pte)
>>> - kunmap
>>>
>>> From kvm's point of view this looks like
>>>
>>> - several accesses to set up the kmap
> 
> Hmm, the kmap establishment takes a single guest operation in the
> fixmap area. That's a single write to the pte, to write a pte_t 8/4
> byte large region (PAE/non-PAE). The same pte_t is then cleared and
> flushed out of the tlb with a cpu-local invlpg during kunmap_atomic.
> 
> I count 1 write here so far.
> 
>>>  - if these accesses trigger flooding, we will have to tear down the
>>> shadow for this page, only to set it up again soon
> 
> So the shadow mapping the fixmap area would be tear down by the
> flooding.
> 
> Or is the shadow corresponding to the real user pte pointed by the
> fixmap, that is unshadowed by the flooding, or both/all?
> 
>>> - an access to the pte (emulted)
> 
> Here I count the second write and this isn't done on the fixmap area
> like the first write above, but this is a write to the real user pte,
> pointed by the fixmap. So if this is emulated it means the shadow of
> the user pte pointing to the real data page is still active.
> 
>>>  - if this access _doesn't_ trigger flooding, we will have 512 unneeded
>>> emulations.  The pte is worthless anyway since the accessed bit is clear
>>> (so we can't set up a shadow pte for it)
>>>    - this bug was fixed
> 
> You mean the accessed bit on fixmap pte used by kmap? Or the user pte
> pointed by the fixmap pte?
> 
>>> - an access to tear down the kmap
> 
> Yep, pte_clear on the fixmap pte_t followed by an invlpg (if that
> matters).
> 
>>> [btw, am I reading this right? the entire list is scanned each time?
> 
> If the list parameter isn't a local LIST_HEAD on the stack but the
> global one it's a full scan each time. I guess it's the global list
> looking at the new code at the top that has a kswapd_scan_limit
> sysctl.
> 
>>> if you have 1G of active HIGHMEM, that's a quarter of a million pages,
>>> which would take at least a second no matter what we do.  VMware can
>>> probably special-case kmaps, but we can't]
> 
> Perhaps they've a list per-age bucket or similar but still I doubt
> this works well on host either... I guess the virtualization overhead
> is exacerbating the inefficiency. Perhaps killall -STOP kscand is good
> enough fix ;). This seem to only push the age up, to be functional the
> age has to go down and I guess the go-down is done by other threads so
> stopping kscand may not hurt.
> 
> I think what we should aim for is to quickly reach this condition:
> 
> 1) always keep the fixmap/kmap pte_t shadowed and emulate the
>    kmap/kunmap access so the test_and_clear_young done on the user pte
>    doesn't require to re-establish the spte representing the fixmap
>    virtual address. If we don't emulate fixmap we'll have to
>    re-establish the spte during the write to the user pte, and
>    tear it down again during kunmap_atomic. So there's not much doubt
>    fixmap access emulation is worth it.
> 
> 2) get rid of the user pte shadow mapping pointing to the user data so
>    the test_and_clear of the young bitflag on the user pte will not be
>    emulated and it'll run at full CPU speed through the shadow pte
>    mapping corresponding to the fixmap virtual address
> 
> kscand pattern is the same as running mprotect on a 32bit 2.6
> kernel so it sounds worth optimizing for it, even if kscand may be
> unfixable without killall -STOP kscand or VM fixes to guest.
> 
> However I'm not sure about point 2 at the light of mprotect. With
> mprotect the guest virutal addresses mapped by the guest user ptes
> will be used. It's not like kscand that may write forever to the user
> ptes without ever using the guest virtual addresses that they're
> mapping. So we better be sure that by unshadowing and optimizing
> kscand we're not hurting mprotect or other pte mangling operations in
> 2.6 that will likely keep accessing the guest virtual addresses mapped
> by the user ptes previously modified.
> 
> Hope this makes any sense, I'm not sure to understand this completely.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

  reply	other threads:[~2008-05-28 17:24 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-16  0:15 performance with guests running 2.4 kernels (specifically RHEL3) David S. Ahern
2008-04-16  8:46 ` Avi Kivity
2008-04-17 21:12   ` David S. Ahern
2008-04-18  7:57     ` Avi Kivity
2008-04-21  4:31       ` David S. Ahern
2008-04-21  9:19         ` Avi Kivity
2008-04-21 17:07           ` David S. Ahern
2008-04-22 20:23           ` David S. Ahern
2008-04-23  8:04             ` Avi Kivity
2008-04-23 15:23               ` David S. Ahern
2008-04-23 15:53                 ` Avi Kivity
2008-04-23 16:39                   ` David S. Ahern
2008-04-24 17:25                     ` David S. Ahern
2008-04-26  6:43                       ` Avi Kivity
2008-04-26  6:20                     ` Avi Kivity
2008-04-25 17:33                 ` David S. Ahern
2008-04-26  6:45                   ` Avi Kivity
2008-04-28 18:15                   ` Marcelo Tosatti
2008-04-28 23:45                     ` David S. Ahern
2008-04-30  4:18                       ` David S. Ahern
2008-04-30  9:55                         ` Avi Kivity
2008-04-30 13:39                           ` David S. Ahern
2008-04-30 13:49                             ` Avi Kivity
2008-05-11 12:32                               ` Avi Kivity
2008-05-11 13:36                                 ` Avi Kivity
2008-05-13  3:49                                   ` David S. Ahern
2008-05-13  7:25                                     ` Avi Kivity
2008-05-14 20:35                                       ` David S. Ahern
2008-05-15 10:53                                         ` Avi Kivity
2008-05-17  4:31                                           ` David S. Ahern
     [not found]                                             ` <482FCEE1.5040306@qumranet.com>
     [not found]                                               ` <4830F90A.1020809@cisco.com>
2008-05-19  4:14                                                 ` [kvm-devel] " David S. Ahern
2008-05-19 14:27                                                   ` Avi Kivity
2008-05-19 16:25                                                     ` David S. Ahern
2008-05-19 17:04                                                       ` Avi Kivity
2008-05-20 14:19                                                     ` Avi Kivity
2008-05-20 14:34                                                       ` Avi Kivity
2008-05-22 22:08                                                       ` David S. Ahern
2008-05-28 10:51                                                         ` Avi Kivity
2008-05-28 14:13                                                           ` David S. Ahern
2008-05-28 14:35                                                             ` Avi Kivity
2008-05-28 19:49                                                               ` David S. Ahern
2008-05-29  6:37                                                                 ` Avi Kivity
2008-05-28 14:48                                                             ` Andrea Arcangeli
2008-05-28 14:57                                                               ` Avi Kivity
2008-05-28 15:39                                                                 ` David S. Ahern
2008-05-29 11:49                                                                   ` Avi Kivity
2008-05-29 12:10                                                                   ` Avi Kivity
2008-05-29 13:49                                                                     ` David S. Ahern
2008-05-29 14:08                                                                       ` Avi Kivity
2008-05-28 15:58                                                                 ` Andrea Arcangeli
2008-05-28 15:37                                                               ` Avi Kivity
2008-05-28 15:43                                                                 ` David S. Ahern
2008-05-28 17:04                                                                   ` Andrea Arcangeli
2008-05-28 17:24                                                                     ` David S. Ahern [this message]
2008-05-29 10:01                                                                     ` Avi Kivity
2008-05-29 14:27                                                                       ` Andrea Arcangeli
2008-05-29 15:11                                                                         ` David S. Ahern
2008-05-29 15:16                                                                         ` Avi Kivity
2008-05-30 13:12                                                                           ` Andrea Arcangeli
2008-05-31  7:39                                                                             ` Avi Kivity
2008-05-29 16:42                                                           ` David S. Ahern
2008-05-31  8:16                                                             ` Avi Kivity
2008-06-02 16:42                                                               ` David S. Ahern
2008-06-05  8:37                                                                 ` Avi Kivity
2008-06-05 16:20                                                                   ` David S. Ahern
2008-06-06 16:40                                                                     ` Avi Kivity
2008-06-19  4:20                                                                       ` David S. Ahern
2008-06-22  6:34                                                                         ` Avi Kivity
2008-06-23 14:09                                                                           ` David S. Ahern
2008-06-25  9:51                                                                             ` Avi Kivity
2008-04-30 13:56                             ` Daniel P. Berrange
2008-04-30 14:23                               ` David S. Ahern
2008-04-23  8:03     ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=483D9534.8010002@cisco.com \
    --to=daahern@cisco.com \
    --cc=andrea@qumranet.com \
    --cc=avi@qumranet.com \
    --cc=kvm@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.