From: Dave Hansen <haveblue@us.ibm.com>
To: schwidefsky@de.ibm.com
Cc: akpm@osdl.org, virtualization@lists.osdl.org,
frankeh@watson.ibm.com, nickpiggin@yahoo.com.au,
Andy Whitcroft <apw@shadowen.org>,
linux-kernel@vger.kernel.org
Subject: Re: [patch 3/9] Guest page hinting: volatile page cache.
Date: Fri, 01 Sep 2006 10:16:10 -0700
Message-ID: <1157130970.28577.150.camel@localhost.localdomain>
In-Reply-To: <1157129762.21733.63.camel@localhost>
On Fri, 2006-09-01 at 18:56 +0200, Martin Schwidefsky wrote:
> On Fri, 2006-09-01 at 09:37 -0700, Dave Hansen wrote:
> > Can you give me the sequence of events that occur so that we need to
> > set, then check PG_discarded? I'm not getting it.
> >
> > 1. there is good data in a page
> > ...
> > 50. ... and PG_discarded gets set
> > ...
> > 99. We check PG_discarded and ...
>
> Ok, here we go:
> 0) there is good data in a page
> 1) the host scans for pages to reclaim and selects a page of a
> particular guest
> 2) the host checks the page state and decides to either swap the page or
> discard it
> 3) nothing happens for a long time
> 4) the guest comes around and tries to access the long gone page
> 5) the host gets a fault because the page is gone from the host's page
> table for the guest system
> 6) the host delivers a discard fault to the guest
> 7) the architecture dependent fault handler gets a page reference for
> the discarded page (tricky for s390)
> 8) page_discard is called which locks the page and does a
> TestSetPageDiscarded. If the bit has not been set yet the page is
> removed from the page cache. There can still be page references around.
>
> Concurrently with steps 5-8, another cpu could just be removing the page from
> the page cache as well. Without the check for the discarded bit the page
> would get removed twice. This does nasty things to reference counting,
> mapping->nrpages, ...
This feels like something that can be done with RCU. The
__page_discard() is the write operation, right? So, take an rcu write
lock inside of the page discard function, and read locks over the
current places where PG_discarded is set.
That should make sure that the discard operation itself can't be done
concurrently with one of the __remove_from*() operations. Once the
write lock has been acquired, you just check page->mapping to see if a
__remove_from*() operation has occurred while you waited.
> Ouch, I understand what you are trying to tell me. The struct page
> entries that cover the mem_map array itself have free bits we could try
> to cannibalize.
Right. It is certainly ugly. I'd much rather have some kind of
spin_lock(&lock_array[hash_function(page)]);
thing.
-- Dave