virtualization.lists.linux-foundation.org archive mirror
From: Dave Hansen <haveblue@us.ibm.com>
To: schwidefsky@de.ibm.com
Cc: Andy Whitcroft <apw@shadowen.org>,
	linux-kernel@vger.kernel.org, virtualization@lists.osdl.org,
	akpm@osdl.org, nickpiggin@yahoo.com.au, frankeh@watson.ibm.com
Subject: Re: [patch 3/9] Guest page hinting: volatile page cache.
Date: Tue, 05 Sep 2006 11:27:53 -0700
Message-ID: <1157480873.3186.57.camel@localhost.localdomain>
In-Reply-To: <1157368883.5078.24.camel@localhost>

On Mon, 2006-09-04 at 13:21 +0200, Martin Schwidefsky wrote:
> Any kind of locking won't work. You need the information that a page has
> been discarded until the page has been freed. Only then the fact that
> the page has been discarded may enter nirvana. Any kind of lock needs to
> be freed again to allow the next discard fault to happen. Since you
> don't know when the last page reference is returned you cannot hold the lock
> until the page is free.

First of all, you *CAN* sleep with the BKL held. ;)

Why doesn't the normal lock_page() help?  It can sleep, too.

As far as simplifying the patches, I feel like some of the
page_make_stable() stuff should be done inside of page_cache_get().
Perhaps the API needs to be changed so that page_cache_get()s can fail.
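
A rough userspace sketch of what a failing get could look like.  The
struct page here is a stand-in, page_cache_get_stable() is a made-up
name, and the page_make_stable() semantics (false means the host has
already discarded the page) are an assumption for illustration, not
taken from the patches:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct page; only the fields this sketch needs. */
struct page {
	int refcount;
	bool discarded;		/* models the host having discarded the page */
};

/* Assumed semantics: returns false if the page was already discarded. */
static bool page_make_stable(struct page *page)
{
	return !page->discarded;
}

/*
 * Hypothetical failing variant of page_cache_get(): take a reference,
 * try to make the page stable, and drop the reference again on failure.
 */
static bool page_cache_get_stable(struct page *page)
{
	assert(page != NULL);

	page->refcount++;
	if (page_make_stable(page))
		return true;

	page->refcount--;	/* caller must treat the page as gone */
	return false;
}
```

Callers would then have a single failure path to handle instead of a
separate page_make_stable() call after every get.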

There are also a ton of "mapping == page->mapping" tests all over.
Perhaps you need a page_still_in_mapping(page, mapping) call that also
checks the page's discard state.
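
Something along these lines, as a userspace mock-up.  The types are
stand-ins rather than the kernel's definitions, and the "discarded"
field only models whatever PageDiscarded() would test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only; not the kernel's definitions. */
struct address_space { int unused; };

struct page {
	struct address_space *mapping;
	bool discarded;		/* models the PG_discard page flag */
};

/* Models PageDiscarded(page). */
static bool page_discarded(const struct page *page)
{
	return page->discarded;
}

/*
 * True only if the page still belongs to the mapping and the host has
 * not discarded it.  Replaces open-coded "mapping == page->mapping".
 */
static bool page_still_in_mapping(const struct page *page,
				  const struct address_space *mapping)
{
	assert(page != NULL);

	return page->mapping == mapping && !page_discarded(page);
}
```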

I also have the feeling that every single page_host_discards() check
which is actually placed in the VM code shouldn't be there.  The ones in
page_make_stable() and friends are OK, but the ones in
shrink_inactive_list() seem bogus to me.  Looks like they should be
covered up in some _other_ function that checks PageDiscarded().

You could even put these things in (what are now) simple functions like
lru_to_page().  The logic would be along the lines of, whenever I am
looking into the LRU, I need to make sure this page is still actually
there.

As for the locking, imagine a seqlock (per-zone, node, section, hash,
anon_vma, mapping, whatever...).  A write is taken any time that
PG_discard would have been set, and the page is placed on a list so
that it can be found (the data structure isn't important now).  All of
the places that currently check PG_discard would go and take a read on
the seqlock.  If they fail to acquire it (what is normally now a loop),
they would go look in the list to see if the page they are interested in
is there.  If it is, then they treat it as discarded, otherwise they
proceed normally.  So, the operation is normally very cheap (a
non-atomic read).  It is very expensive _during_ a discard because of
the traversal of the list, but these should be rare.

The structure storing the page could be like this:

struct page_list {
	struct list_head list;
	struct page *page;
};

So that it doesn't require any extra space in the struct page, and
limits the overhead to only the people actually using the page discard
mechanism.
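
To make the idea concrete, here is a single-threaded userspace model of
the sequence-count-plus-list scheme.  It is only a sketch: there are no
memory barriers and no writer lock, so it is not a real seqlock, and
struct discard_state, discard_begin(), discard_end() and
page_is_discarded() are names made up for illustration.  It also
assumes the page stays on the list until it has actually been freed,
and it keeps the count odd for as long as the list is non-empty:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page { int id; };	/* stand-in for the kernel's struct page */

struct list_head { struct list_head *next, *prev; };

struct page_list {
	struct list_head list;
	struct page *page;
};

/* Hypothetical per-zone (or per-mapping, ...) discard state. */
struct discard_state {
	unsigned int seq;		/* odd while any discard is in flight */
	struct list_head pages;		/* pages currently being discarded */
};

static void discard_state_init(struct discard_state *ds)
{
	ds->seq = 0;
	ds->pages.next = ds->pages.prev = &ds->pages;
}

/* Write side: called where PG_discard would have been set. */
static void discard_begin(struct discard_state *ds, struct page_list *entry,
			  struct page *page)
{
	assert(entry != NULL);

	if (ds->pages.next == &ds->pages)
		ds->seq++;		/* first entry: count becomes odd */

	entry->page = page;
	entry->list.next = ds->pages.next;
	entry->list.prev = &ds->pages;
	ds->pages.next->prev = &entry->list;
	ds->pages.next = &entry->list;
}

/* Assumed to be called once the page has really been freed. */
static void discard_end(struct discard_state *ds, struct page_list *entry)
{
	entry->list.prev->next = entry->list.next;
	entry->list.next->prev = entry->list.prev;

	if (ds->pages.next == &ds->pages)
		ds->seq++;		/* list empty: count is even again */
}

/* Read side: cheap when nothing is in flight, list walk otherwise. */
static bool page_is_discarded(const struct discard_state *ds,
			      const struct page *page)
{
	const struct list_head *pos;

	if (!(ds->seq & 1))
		return false;		/* common case: one plain read */

	for (pos = ds->pages.next; pos != &ds->pages; pos = pos->next) {
		const struct page_list *entry =
			(const struct page_list *)((const char *)pos -
					offsetof(struct page_list, list));
		if (entry->page == page)
			return true;
	}
	return false;
}
```

A real version would need a lock on the write side and a retry check
on the read side, as with any seqlock; the model only shows where the
cheap path and the expensive path split.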

-- Dave
