From: Rik van Riel <riel@redhat.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
virtualization@lists.osdl.org, frankeh@watson.ibm.com,
akpm@osdl.org, nickpiggin@yahoo.com.au, hugh@veritas.com
Subject: Re: [patch 1/6] Guest page hinting: core + volatile page cache.
Date: Sun, 29 Mar 2009 10:35:31 -0400
Message-ID: <49CF8733.7060309@redhat.com>
In-Reply-To: <20090329155640.31472c61@skybase>
Martin Schwidefsky wrote:
> On Fri, 27 Mar 2009 18:57:31 -0400
> Rik van Riel <riel@redhat.com> wrote:
>> Martin Schwidefsky wrote:
>>> There are some alternatives how this can be done, e.g. a global
>>> lock, or lock per segment in the kernel page table, or the per page
>>> bit PG_arch_1 if it is still free.
>> Can this be taken care of by memory barriers and
>> careful ordering of operations?
>
> I don't see how this could be done with memory barriers. The sequence is
> 1) check conditions
> 2) do state change to volatile
>
> Another CPU can do
> i) change one of the conditions
>
> The operation i) needs to be postponed while the first CPU has done 1)
> but has not done 2) yet. 1+2 needs to be atomic but consists of several
> instructions. Ergo we need a lock, no?

You are right.

Hashed locks may be a space-saving option, with a
set of (cache line aligned?) locks in each zone
and the page state lock chosen by taking a hash
of the page number or address.

Not ideal, but at least we can get some NUMA
locality.

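
The check-then-change sequence and the hashed-lock idea discussed above can be sketched in userspace C. This is only an illustration under stated assumptions: pthread mutexes stand in for kernel spinlocks, and every name here (zone_sketch, page_sketch, page_state_lock, try_make_volatile, mark_dirty), the lock count of 64, and the 64-byte cache line are hypothetical and not taken from the patch set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define STATE_LOCK_BITS      6
#define STATE_LOCKS_PER_ZONE (1u << STATE_LOCK_BITS)
#define CACHE_LINE_BYTES     64   /* assumed cache line size */

/* One lock per cache line, so neighbouring locks do not share a line. */
struct state_lock {
    _Alignas(CACHE_LINE_BYTES) pthread_mutex_t mutex;
};

/* A small, fixed set of locks per zone instead of one lock per page. */
struct zone_sketch {
    struct state_lock locks[STATE_LOCKS_PER_ZONE];
};

struct page_sketch {
    unsigned long pfn;   /* page frame number, used as the hash key */
    bool dirty;
    bool writeback;
    bool is_volatile;
};

static void zone_sketch_init(struct zone_sketch *zone)
{
    for (unsigned int i = 0; i < STATE_LOCKS_PER_ZONE; i++) {
        int rc = pthread_mutex_init(&zone->locks[i].mutex, NULL);
        assert(rc == 0);
        (void)rc;
    }
}

/* Pick the lock by a multiplicative hash of the page frame number. */
static struct state_lock *page_state_lock(struct zone_sketch *zone,
                                          unsigned long pfn)
{
    uint64_t h = (uint64_t)pfn * 0x9E3779B97F4A7C15ULL;
    return &zone->locks[h >> (64 - STATE_LOCK_BITS)];
}

/* Steps 1) and 2): check the conditions and change state under one lock. */
static bool try_make_volatile(struct zone_sketch *zone,
                              struct page_sketch *page)
{
    struct state_lock *l = page_state_lock(zone, page->pfn);
    bool ok;

    pthread_mutex_lock(&l->mutex);
    ok = !page->dirty && !page->writeback;   /* 1) check conditions */
    if (ok)
        page->is_volatile = true;            /* 2) state change */
    pthread_mutex_unlock(&l->mutex);
    return ok;
}

/* Step i): changing a condition takes the same lock, so it cannot slip
 * in between 1) and 2) on another CPU. */
static void mark_dirty(struct zone_sketch *zone, struct page_sketch *page)
{
    struct state_lock *l = page_state_lock(zone, page->pfn);

    pthread_mutex_lock(&l->mutex);
    page->dirty = true;
    page->is_volatile = false;               /* page must be stable again */
    pthread_mutex_unlock(&l->mutex);
}
```

Two pages that hash to the same slot contend on one lock, which is the price of the space saving. Keeping the lock array inside the zone is what would give the NUMA locality mentioned above.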
>>> +		if (page->index != linear_page_index(vma, addr))
>>> +			/* If nonlinear, store the file page offset in the pte. */
>>> +			set_pte_at(dst_mm, addr, dst_pte, pgoff_to_pte(page->index));
>>> +		else
>>> +			pte_clear(dst_mm, addr, dst_pte);
>>>  	}
>> It would be good to document that PG_discarded can only happen for
>> file pages and NOT for, e.g., clean swap cache pages.
>
> PG_discarded can happen for swap cache pages as well. If a clean swap
> cache page gets removed and is subsequently accessed again, the discard
> fault handler will set the bit (see __page_discard). The code necessary
> for volatile swap cache is introduced with patch #2. So I would rather
> not add a comment in patch #1 only to remove it again with patch #2.
I discovered that once I opened the next email :)
>>> @@ -1390,6 +1391,7 @@ int test_clear_page_writeback(struct pag
>>>  			radix_tree_tag_clear(&mapping->page_tree,
>>>  						page_index(page),
>>>  						PAGECACHE_TAG_WRITEBACK);
>>> +			page_make_volatile(page, 1);
>>>  			if (bdi_cap_account_writeback(bdi)) {
>>>  				__dec_bdi_stat(bdi, BDI_WRITEBACK);
>>>  				__bdi_writeout_inc(bdi);
>> Does this mark the page volatile before the IO writing the
>> dirty data back to disk has even started? Is that OK?
>
> Hmm, it could be that the page_make_volatile is just superfluous here.
> The logic here is that whenever one of the conditions that prevent a
> page from becoming volatile is cleared, page_make_volatile is tried.
> The condition in question here is PageWriteback(page). If we can prove
> that one of the other conditions is always true at this point, this
> particular call is a waste of effort.
Actually, test_clear_page_writeback is probably called
on IO completion and it was just me being confused after
a few hundred lines of very new (to me) VM code :)
I guess the patch is correct.
Acked-by: Rik van Riel <riel@redhat.com>
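
The rule discussed above, that a transition to volatile is attempted whenever one of the blocking conditions is cleared, can be sketched as follows. This is a simplified, single-threaded illustration, not the patch's code: the flag names, struct vpage, and both functions are hypothetical, locking is left out, and the second argument that the real page_make_volatile(page, 1) takes is not modelled because its meaning is not explained in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the real page flags. */
#define VP_DIRTY     (1u << 0)
#define VP_WRITEBACK (1u << 1)
#define VP_VOLATILE  (1u << 2)

struct vpage {
    unsigned int flags;
};

/* Succeeds only if no blocking condition is still set. */
static bool vpage_make_volatile(struct vpage *page)
{
    if (page->flags & (VP_DIRTY | VP_WRITEBACK))
        return false;
    page->flags |= VP_VOLATILE;
    return true;
}

/* Modelled on the IO completion path: clear the writeback condition,
 * then try the transition, because that condition may have been the
 * last one holding the page in the stable state. */
static bool vpage_end_writeback(struct vpage *page)
{
    bool was_set = (page->flags & VP_WRITEBACK) != 0;

    page->flags &= ~VP_WRITEBACK;
    if (was_set)
        vpage_make_volatile(page);   /* harmless if another condition blocks */
    return was_set;
}
```

If another condition (here VP_DIRTY) is still set when writeback ends, the attempt simply fails. That is the case in which the call would be wasted effort but not incorrect.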
--
All rights reversed.