From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>,
linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
akpm@osdl.org, nickpiggin@yahoo.com.au, frankeh@watson.ibm.com
Subject: Re: [patch 3/9] Guest page hinting: volatile page cache.
Date: Fri, 01 Sep 2006 18:25:43 +0200
Message-ID: <1157127943.21733.52.camel@localhost>
In-Reply-To: <1157127483.28577.117.camel@localhost.localdomain>
On Fri, 2006-09-01 at 09:18 -0700, Dave Hansen wrote:
> > 1) The page-is-discarded (PG_discarded) bit is set for pages that have
> > been recognized as removed by the host. The page needs to be removed
> > from the page cache while there are still page references floating
> > around. To prevent multiple removals from the page cache the discarded
> > bit is needed.
>
> OK, so the page has data in it, and is in the page cache. The
> hypervisor kills the page, gives the notification to the kernel that the
> page has gone away, and the kernel marks PG_discarded. There still
> might be active references to the page.
No, the hypervisor does not give the notification immediately. A discard
fault is delivered to the guest only when it tries to access a page that
has been removed by the host. That is the fundamental difference between
a memory ballooner and guest page hinting.
> So, is the problem trying to communicate with the reference holders that
> the page is no longer valid? How is this fundamentally different from
> page truncating?
Truncating is similar but the reaction is different. A truncated page is
gone and will not be recreated. A discarded page can be reloaded.
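
To illustrate why the discarded bit is needed, here is a minimal user-space
sketch, not code from the patch. The names page_model, PG_DISCARDED and
handle_discard_fault are hypothetical stand-ins, and the page cache is
reduced to a single boolean. The point is only that an atomic test-and-set
on the bit lets exactly one faulting context do the removal, even while
other references to the page still exist.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names are taken from the patch. */
#define PG_DISCARDED (1UL << 0)

struct page_model {
    atomic_ulong flags;     /* models page->flags */
    bool in_page_cache;     /* models membership in the page cache */
};

/*
 * Called when an access to the page raised a discard fault.
 * Returns true only for the caller that actually removed the page
 * from the (modelled) page cache; later callers see the bit already
 * set and do nothing.
 */
static bool handle_discard_fault(struct page_model *pg)
{
    unsigned long old = atomic_fetch_or(&pg->flags, PG_DISCARDED);

    if (old & PG_DISCARDED)
        return false;           /* removal already done by someone else */

    assert(pg->in_page_cache);  /* the winner must still find it cached */
    pg->in_page_cache = false;  /* remove from the page cache exactly once */
    return true;
}
```

In this model a later lookup would simply miss and read the data again from
the backing store, which is what distinguishes a discard from a truncate.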
> > 2) The page-state-change (PG_state_change) bit is required to prevent
> > a make_stable from "overtaking" a make_volatile. In order to make a page
> > volatile a number of conditions are checked. After this is done the state
> > change will be done. The critical section is the code that performs the
> > checks up to the instruction that does the state change. No make_stable
> > may be done in between. The granularity is per page; using a global
> > lock like a spinlock would severely limit the scalability for large SMP
> > systems.
>
> How about doing it in the NUMA node? Or the mem_section? Or, even a
> bit in the mem_map[] for the area guarding the 'struct page' itself?
> Even a hashed table of locks based on the page address. You just need
> something that allows _some_ level of concurrency. You certainly never
> have a number of CPUs which is anywhere close to the number of 'struct
> page's in the system.
NUMA node is not granular enough; mem_section is probably doable. I do
not understand the part about the bit in the mem_map[] area: a bit in
page->flags is exactly that, isn't it?
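
For comparison, a rough user-space sketch of the two options, under stated
assumptions. This is not the patch code: page_model, PG_STATE_CHANGE,
NR_HASHED_LOCKS and all function names are hypothetical, the "conditions"
are reduced to a reference count check, and letting make_stable spin while
the bit is held is just one possible policy chosen for the sketch. The
per-page bit acts as a tiny try-lock around the check-then-change sequence;
the hashed variant would instead map the page address to one of a fixed
number of shared locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PG_STATE_CHANGE (1UL << 0)  /* hypothetical bit position */
#define NR_HASHED_LOCKS 64          /* hypothetical table size, power of two */

enum page_state { PAGE_STABLE, PAGE_VOLATILE };

struct page_model {
    atomic_ulong flags;     /* models page->flags */
    int refcount;           /* stand-in for the conditions to be checked */
    enum page_state state;
};

/* Per-page bit lock: succeeds only if the bit was previously clear. */
static bool state_change_trylock(struct page_model *pg)
{
    unsigned long old = atomic_fetch_or_explicit(&pg->flags, PG_STATE_CHANGE,
                                                 memory_order_acquire);
    return !(old & PG_STATE_CHANGE);
}

static void state_change_unlock(struct page_model *pg)
{
    unsigned long old = atomic_fetch_and_explicit(&pg->flags, ~PG_STATE_CHANGE,
                                                  memory_order_release);
    assert(old & PG_STATE_CHANGE);  /* must have been held */
    (void)old;
}

/* Checks and state change form one critical section per page. */
static bool make_volatile(struct page_model *pg)
{
    if (!state_change_trylock(pg))
        return false;               /* another state change is in flight */

    bool ok = (pg->refcount == 0);  /* simplified condition check */
    if (ok)
        pg->state = PAGE_VOLATILE;

    state_change_unlock(pg);
    return ok;
}

/* Sketch policy: wait until no make_volatile is in its critical section. */
static void make_stable(struct page_model *pg)
{
    while (!state_change_trylock(pg))
        ;                           /* spin; the section is very short */
    pg->state = PAGE_STABLE;
    state_change_unlock(pg);
}

/* Alternative: index into a shared lock table, hashed on the page address. */
static size_t hashed_lock_index(const struct page_model *pg)
{
    return ((uintptr_t)pg / sizeof(struct page_model)) & (NR_HASHED_LOCKS - 1);
}
```

The bit variant needs no extra memory and never makes two different pages
contend with each other. The hashed table saves a page flag but lets
unrelated pages collide on the same lock.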
--
blue skies,
Martin.
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
"Reality continues to ruin my life." - Calvin.