virtualization.lists.linux-foundation.org archive mirror
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	virtualization@lists.osdl.org, frankeh@watson.ibm.com,
	akpm@osdl.org, nickpiggin@yahoo.com.au, hugh@veritas.com
Subject: Re: [patch 1/6] Guest page hinting: core + volatile page cache.
Date: Sun, 29 Mar 2009 15:56:40 +0200
Message-ID: <20090329155640.31472c61@skybase>
In-Reply-To: <49CD59DB.3070906@redhat.com>

On Fri, 27 Mar 2009 18:57:31 -0400
Rik van Riel <riel@redhat.com> wrote:

> Martin Schwidefsky wrote:
> 
> > The major obstacles that need to get addressed:
> > * Concurrent page state changes:
> >   To guard against concurrent page state updates some kind of lock
> >   is needed. If page_make_volatile() has already done the 11 checks it
> >   will issue the state change primitive. If in the meantime one of
> >   the conditions has changed the user that requires that page in
> >   stable state will have to wait in the page_make_stable() function
> >   until the make volatile operation has finished. It is up to the
> >   architecture to define how this is done with the three primitives
> >   page_test_set_state_change, page_clear_state_change and
> >   page_state_change.
> >   There are some alternatives how this can be done, e.g. a global
> >   lock, or lock per segment in the kernel page table, or the per page
> >   bit PG_arch_1 if it is still free.
> 
> Can this be taken care of by memory barriers and
> careful ordering of operations?

I don't see how this could be done with memory barriers. The sequence is
1) check conditions
2) do state change to volatile

while another cpu can do
i) change one of the conditions

The operation i) needs to be postponed while the first cpu has done 1)
but not yet done 2). 1+2 needs to be atomic but consists of several
instructions. Barriers only order the accesses; they cannot keep i) out
of the window between 1) and 2). Ergo we need a lock, no?
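A minimal C11 sketch of the point above, under stated assumptions: `struct page_model`, its `busy` flag and the `model_*`/`state_change_*` helpers are hypothetical and are not the patch's code, and the per-page `atomic_flag` merely stands in for whatever lock the architecture picks (a global lock, one per segment, or a per-page bit such as PG_arch_1). Because steps 1) and 2) run under the same lock that i) must take, the check can never go stale before the state change.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one page; not the kernel's struct page. */
enum page_state { PAGE_STABLE, PAGE_VOLATILE };

struct page_model {
	atomic_flag state_change; /* stand-in for the arch-defined state-change lock */
	bool busy;                /* any condition that forbids the volatile state */
	enum page_state state;
};

static void page_model_init(struct page_model *p)
{
	atomic_flag_clear(&p->state_change);
	p->busy = false;
	p->state = PAGE_STABLE;
}

static void state_change_lock(struct page_model *p)
{
	/* Spin while another cpu is between its steps 1) and 2). */
	while (atomic_flag_test_and_set_explicit(&p->state_change,
						 memory_order_acquire))
		;
}

static void state_change_unlock(struct page_model *p)
{
	atomic_flag_clear_explicit(&p->state_change, memory_order_release);
}

/* First cpu: 1) check conditions, 2) change state, as one locked unit. */
static void model_make_volatile(struct page_model *p)
{
	state_change_lock(p);
	if (!p->busy)
		p->state = PAGE_VOLATILE;
	state_change_unlock(p);
}

/* Other cpu: i) change a condition. Taking the same lock makes it wait
 * if the first cpu is between 1) and 2), then it stabilizes the page. */
static void model_set_busy_and_stabilize(struct page_model *p)
{
	state_change_lock(p);
	p->busy = true;
	p->state = PAGE_STABLE;
	state_change_unlock(p);
}
```

A barrier-only version would still let i) run after the check in 1) but before the store in 2), leaving a volatile page that a user now needs stable.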

> If we consider the states unused -> volatile -> stable
> as progressively higher, "upgrades" can be done before
> any kernel operation that requires the page to be in
> that state (but after setting up the things that allow
> it to be found), while downgrades can be done after the
> kernel is done with needing the page at a higher level.
> 
> Since the downgrade checks for users that need the page
> in a higher state, no lock should be required.
> 
> In fact, it may be possible to manage the page state
> bitmap with compare-and-swap, without needing a call
> to the hypervisor.
> 
> > Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> 
> Some comments and questions in line.
> 
> > @@ -601,6 +604,21 @@ copy_one_pte(struct mm_struct *dst_mm, s
> >  
> >  out_set_pte:
> >  	set_pte_at(dst_mm, addr, dst_pte, pte);
> > +	return;
> > +
> > +out_discard_pte:
> > +	/*
> > +	 * If the page referred by the pte has the PG_discarded bit set,
> > +	 * copy_one_pte is racing with page_discard. The pte may not be
> > +	 * copied or we can end up with a pte pointing to a page not
> > +	 * in the page cache anymore. Do what try_to_unmap_one would do
> > +	 * if the copy_one_pte had taken place before page_discard.
> > +	 */
> > +	if (page->index != linear_page_index(vma, addr))
> > +		/* If nonlinear, store the file page offset in the pte. */
> > +		set_pte_at(dst_mm, addr, dst_pte, pgoff_to_pte(page->index));
> > +	else
> > +		pte_clear(dst_mm, addr, dst_pte);
> >  }
> 
> It would be good to document that PG_discarded can only happen for
> file pages and NOT for eg. clean swap cache pages.

PG_discarded can happen for swap cache pages as well. If a clean swap
cache page gets removed and is subsequently accessed again, the discard
fault handler will set the bit (see __page_discard). The code necessary
for volatile swap cache is introduced with patch #2, so I would rather
not add a comment in patch #1 only to remove it again with patch #2.
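For the file-backed case in the quoted out_discard_pte hunk, the choice between clearing the pte and storing the file offset can be sketched in plain C. This is an illustration only: `struct vma_model` and the `model_*` names are hypothetical, 4 KiB pages are assumed, hugetlb and swap cache pages are ignored, and `model_linear_page_index()` only mirrors the arithmetic of the kernel's linear_page_index() for an ordinary vma.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical subset of a vma: where it starts and which file page
 * is mapped at vm_start. */
struct vma_model {
	uintptr_t vm_start;
	uint64_t vm_pgoff;
};

/* File page that a *linear* mapping places at addr. */
static uint64_t model_linear_page_index(const struct vma_model *vma,
					uintptr_t addr)
{
	return ((addr - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

enum discard_action { DISCARD_CLEAR_PTE, DISCARD_FILE_PTE };

/* What the child's pte becomes when the parent's page was discarded:
 * if the page is not where the linear formula puts it, the mapping is
 * nonlinear and the file offset must be kept in the pte; otherwise the
 * pte can simply be cleared. */
static enum discard_action model_discard_pte(const struct vma_model *vma,
					     uintptr_t addr,
					     uint64_t page_index,
					     uint64_t *file_pgoff)
{
	if (page_index != model_linear_page_index(vma, addr)) {
		*file_pgoff = page_index;
		return DISCARD_FILE_PTE;
	}
	return DISCARD_CLEAR_PTE;
}
```

A cleared pte works for a linear mapping because the next fault recomputes the offset from the vma; a nonlinear mapping (set up with remap_file_pages) has no such formula, so the offset has to survive in the pte.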

> > @@ -1390,6 +1391,7 @@ int test_clear_page_writeback(struct pag
> >  			radix_tree_tag_clear(&mapping->page_tree,
> >  						page_index(page),
> >  						PAGECACHE_TAG_WRITEBACK);
> > +			page_make_volatile(page, 1);
> >  			if (bdi_cap_account_writeback(bdi)) {
> >  				__dec_bdi_stat(bdi, BDI_WRITEBACK);
> >  				__bdi_writeout_inc(bdi);
> 
> Does this mark the page volatile before the IO writing the
> dirty data back to disk has even started?  Is that OK?
 
Hmm, it could be that the page_make_volatile call is just superfluous
here. The logic is that whenever one of the conditions that prevent a
page from becoming volatile is cleared, page_make_volatile is tried
again. The condition in question here is PageWriteback(page). If we can
prove that one of the other conditions is still true at this point,
this particular call is a waste of effort.
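The retry pattern described above can be sketched as follows, assuming a hypothetical `struct cond_page` that tracks just three of the blocking conditions (the patch checks many more). Clearing one condition triggers a retry that only succeeds when no other condition still holds; when another one is guaranteed to hold, the retry is the wasted effort in question.

```c
#include <stdbool.h>

/* Hypothetical subset of the conditions that keep a page stable. */
struct cond_page {
	bool dirty;
	bool writeback;
	bool locked;
	bool is_volatile;
};

/* Retry hook: called whenever one blocking condition is cleared.
 * Succeeds only if no other condition still blocks the page. */
static bool model_try_make_volatile(struct cond_page *p)
{
	if (p->dirty || p->writeback || p->locked)
		return false;
	p->is_volatile = true;
	return true;
}

/* Analogue of the quoted test_clear_page_writeback() hunk: clear the
 * writeback condition, then retry the volatile transition. */
static bool model_end_writeback(struct cond_page *p)
{
	p->writeback = false;
	return model_try_make_volatile(p);
}
```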

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.



Thread overview: 59+ messages
2009-03-27 15:09 [patch 0/6] Guest page hinting version 7 Martin Schwidefsky
2009-03-27 15:09 ` [patch 1/6] Guest page hinting: core + volatile page cache Martin Schwidefsky
2009-03-27 22:57   ` Rik van Riel
2009-03-29 13:56     ` Martin Schwidefsky [this message]
2009-03-29 14:35       ` Rik van Riel
2009-03-27 15:09 ` [patch 2/6] Guest page hinting: volatile swap cache Martin Schwidefsky
2009-04-01  2:10   ` Rik van Riel
2009-04-01  8:13     ` Martin Schwidefsky
2009-03-27 15:09 ` [patch 3/6] Guest page hinting: mlocked pages Martin Schwidefsky
2009-04-01  2:52   ` Rik van Riel
2009-04-01  8:13     ` Martin Schwidefsky
2009-03-27 15:09 ` [patch 4/6] Guest page hinting: writable page table entries Martin Schwidefsky
2009-04-01 13:25   ` Rik van Riel
2009-04-01 14:36     ` Martin Schwidefsky
2009-04-01 14:45       ` Rik van Riel
2009-03-27 15:09 ` [patch 5/6] Guest page hinting: minor fault optimization Martin Schwidefsky
2009-04-01 15:33   ` Rik van Riel
2009-03-27 15:09 ` [patch 6/6] Guest page hinting: s390 support Martin Schwidefsky
2009-04-01 16:18   ` Rik van Riel
2009-03-27 23:03 ` [patch 0/6] Guest page hinting version 7 Dave Hansen
2009-03-28  0:06   ` Rik van Riel
2009-03-29 14:20     ` Martin Schwidefsky
2009-03-29 14:38       ` Rik van Riel
2009-03-29 14:12   ` Martin Schwidefsky
2009-03-30 15:54     ` Dave Hansen
2009-03-30 16:34       ` Martin Schwidefsky
2009-03-30 18:37       ` Jeremy Fitzhardinge
2009-03-30 18:42         ` Rik van Riel
2009-03-30 18:59           ` Jeremy Fitzhardinge
2009-03-30 20:02             ` Rik van Riel
2009-03-30 20:35               ` Jeremy Fitzhardinge
2009-03-30 21:38                 ` Dor Laor
2009-03-30 22:16                   ` Izik Eidus
2009-03-28  6:35 ` Rusty Russell
2009-03-29 14:23   ` Martin Schwidefsky
2009-04-02 11:32     ` Nick Piggin
2009-04-02 15:52       ` Martin Schwidefsky
2009-04-02 16:18         ` Jeremy Fitzhardinge
2009-04-02 16:23         ` Nick Piggin
2009-04-02 19:06         ` Rik van Riel
2009-04-02 19:22           ` Nick Piggin
2009-04-02 20:05             ` Rik van Riel
2009-04-03  0:50               ` Jeremy Fitzhardinge
2009-04-02 19:58           ` Jeremy Fitzhardinge
2009-04-02 20:14             ` Rik van Riel
2009-04-02 20:34               ` Jeremy Fitzhardinge
2009-04-03  8:49                 ` Martin Schwidefsky
2009-04-03 18:19                   ` Jeremy Fitzhardinge
2009-04-06  7:21                     ` Martin Schwidefsky
2009-04-06  7:32                       ` Nick Piggin
2009-04-06 19:23                       ` Jeremy Fitzhardinge
2009-04-02 19:27         ` Hugh Dickins
  -- strict thread matches above, loose matches on Subject: below --
2008-03-12 13:21 [patch 0/6] Guest page hinting version 6 Martin Schwidefsky
2008-03-12 13:21 ` [patch 1/6] Guest page hinting: core + volatile page cache Martin Schwidefsky
2008-03-12 23:12   ` Rusty Russell
2008-03-13  9:24     ` Martin Schwidefsky
     [not found] <20070628164049.118610355@de.ibm.com>
2007-06-28 16:40 ` Martin Schwidefsky
2007-05-11 13:58 [patch 0/6] [rfc] guest page hinting version 5 Martin Schwidefsky
2007-05-11 13:58 ` [patch 1/6] Guest page hinting: core + volatile page cache Martin Schwidefsky
2007-05-11 14:45   ` Valdis.Kletnieks
2007-05-11 14:53     ` Martin Schwidefsky
