From: Andrea Arcangeli <aarcange@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Izik Eidus <ieidus@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, chrisw@redhat.com, avi@redhat.com,
izike@qumranet.com
Subject: Re: [PATCH 1/4] rmap: add page_wrprotect() function,
Date: Tue, 11 Nov 2008 21:38:06 +0100 [thread overview]
Message-ID: <20081111203806.GE10818@random.random> (raw)
In-Reply-To: <20081111113948.f38b9e95.akpm@linux-foundation.org>
On Tue, Nov 11, 2008 at 11:39:48AM -0800, Andrew Morton wrote:
> > +static int page_wrprotect_one(struct page *page, struct vm_area_struct *vma,
> > + int *odirect_sync)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + unsigned long address;
> > + pte_t *pte;
> > + spinlock_t *ptl;
> > + int ret = 0;
> > +
> > + address = vma_address(page, vma);
> > + if (address == -EFAULT)
> > + goto out;
> > +
> > + pte = page_check_address(page, mm, address, &ptl, 0);
> > + if (!pte)
> > + goto out;
> > +
> > + if (pte_write(*pte)) {
> > + pte_t entry;
> > +
> > + if (page_mapcount(page) != page_count(page)) {
> > + *odirect_sync = 0;
> > + goto out_unlock;
> > + }
> > + flush_cache_page(vma, address, pte_pfn(*pte));
> > + entry = ptep_clear_flush_notify(vma, address, pte);
> > + entry = pte_wrprotect(entry);
> > + set_pte_at(mm, address, pte, entry);
> > + }
> > + ret = 1;
> > +
> > +out_unlock:
> > + pte_unmap_unlock(pte, ptl);
> > +out:
> > + return ret;
> > +}
>
> OK. I think. We need to find a way of provoking Hugh to look at it.
Yes. Please focus on the page_mapcount != page_count, which is likely
missing from migrate.c too and in turn page migration currently breaks
O_DIRECT like fork() is buggy as well as discussed here:
http://marc.info/?l=linux-mm&m=122236799302540&w=2
http://marc.info/?l=linux-mm&m=122524107519182&w=2
http://marc.info/?l=linux-mm&m=122581116713932&w=2
The fix implemented in ksm currently handles older kernels (like
rhel/sles) not current mainline that does
get_user_pages_fast. get_user_pages_fast is unfixable yet (see my last
email to Nick above asking for a way to block gup_fast).
The fix proposed by Nick plus my additional fix, should stop the
corruption in fork the same way the above check fixes it for ksm. But
todate gup_fast remains unfixable.
> > +static int page_wrprotect_file(struct page *page, int *odirect_sync)
> > +{
> > + struct address_space *mapping;
> > + struct prio_tree_iter iter;
> > + struct vm_area_struct *vma;
> > + pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > + int ret = 0;
> > +
> > + mapping = page_mapping(page);
>
> What pins *mapping in memory? Usually this is done by requiring that
> the caller has locked the page. But no such precondition is documented
> here.
Looks buggy but we never call it from ksm 8). I guess Izik added it
for completeness when preparing for mainline submission. We've the
option to get rid of page_wrprotect_file entirely and only implement a
page_wrprotect_anon! Otherwise we can add a BUG_ON(!PageLocked(page))
before the above page_mapping to protect against truncate.
> > + * set all the ptes pointed to a page as read only,
> > + * odirect_sync is set to 0 in case we cannot protect against race with odirect
> > + * return the number of ptes that were set as read only
> > + * (ptes that were read only before this function was called are couned as well)
> > + */
>
> But it isn't.
What isn't?
> I don't understand this odirect_sync thing. What race? Please expand
> this comment to make the function of odirect_sync more understandable.
I should have answered this one with the above 3 links.
> What do you think about making all this new code dependent upon some
> CONFIG_ switch which CONFIG_KVM can select?
I like that too.
WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Izik Eidus <ieidus@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kvm@vger.kernel.org, chrisw@redhat.com, avi@redhat.com,
izike@qumranet.com
Subject: Re: [PATCH 1/4] rmap: add page_wrprotect() function,
Date: Tue, 11 Nov 2008 21:38:06 +0100 [thread overview]
Message-ID: <20081111203806.GE10818@random.random> (raw)
In-Reply-To: <20081111113948.f38b9e95.akpm@linux-foundation.org>
On Tue, Nov 11, 2008 at 11:39:48AM -0800, Andrew Morton wrote:
> > +static int page_wrprotect_one(struct page *page, struct vm_area_struct *vma,
> > + int *odirect_sync)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + unsigned long address;
> > + pte_t *pte;
> > + spinlock_t *ptl;
> > + int ret = 0;
> > +
> > + address = vma_address(page, vma);
> > + if (address == -EFAULT)
> > + goto out;
> > +
> > + pte = page_check_address(page, mm, address, &ptl, 0);
> > + if (!pte)
> > + goto out;
> > +
> > + if (pte_write(*pte)) {
> > + pte_t entry;
> > +
> > + if (page_mapcount(page) != page_count(page)) {
> > + *odirect_sync = 0;
> > + goto out_unlock;
> > + }
> > + flush_cache_page(vma, address, pte_pfn(*pte));
> > + entry = ptep_clear_flush_notify(vma, address, pte);
> > + entry = pte_wrprotect(entry);
> > + set_pte_at(mm, address, pte, entry);
> > + }
> > + ret = 1;
> > +
> > +out_unlock:
> > + pte_unmap_unlock(pte, ptl);
> > +out:
> > + return ret;
> > +}
>
> OK. I think. We need to find a way of provoking Hugh to look at it.
Yes. Please focus on the page_mapcount != page_count, which is likely
missing from migrate.c too and in turn page migration currently breaks
O_DIRECT like fork() is buggy as well as discussed here:
http://marc.info/?l=linux-mm&m=122236799302540&w=2
http://marc.info/?l=linux-mm&m=122524107519182&w=2
http://marc.info/?l=linux-mm&m=122581116713932&w=2
The fix implemented in ksm currently handles older kernels (like
rhel/sles) not current mainline that does
get_user_pages_fast. get_user_pages_fast is unfixable yet (see my last
email to Nick above asking for a way to block gup_fast).
The fix proposed by Nick plus my additional fix, should stop the
corruption in fork the same way the above check fixes it for ksm. But
todate gup_fast remains unfixable.
> > +static int page_wrprotect_file(struct page *page, int *odirect_sync)
> > +{
> > + struct address_space *mapping;
> > + struct prio_tree_iter iter;
> > + struct vm_area_struct *vma;
> > + pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
> > + int ret = 0;
> > +
> > + mapping = page_mapping(page);
>
> What pins *mapping in memory? Usually this is done by requiring that
> the caller has locked the page. But no such precondition is documented
> here.
Looks buggy but we never call it from ksm 8). I guess Izik added it
for completeness when preparing for mainline submission. We've the
option to get rid of page_wrprotect_file entirely and only implement a
page_wrprotect_anon! Otherwise we can add a BUG_ON(!PageLocked(page))
before the above page_mapping to protect against truncate.
> > + * set all the ptes pointed to a page as read only,
> > + * odirect_sync is set to 0 in case we cannot protect against race with odirect
> > + * return the number of ptes that were set as read only
> > + * (ptes that were read only before this function was called are couned as well)
> > + */
>
> But it isn't.
What isn't?
> I don't understand this odirect_sync thing. What race? Please expand
> this comment to make the function of odirect_sync more understandable.
I should have answered this one with the above 3 links.
> What do you think about making all this new code dependent upon some
> CONFIG_ switch which CONFIG_KVM can select?
I like that too.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-11-11 20:38 UTC|newest]
Thread overview: 139+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-11 13:21 [PATCH 0/4] ksm - dynamic page sharing driver for linux Izik Eidus
2008-11-11 13:21 ` Izik Eidus
2008-11-11 13:21 ` [PATCH 1/4] rmap: add page_wrprotect() function, Izik Eidus
2008-11-11 13:21 ` Izik Eidus, Izik Eidus
2008-11-11 13:21 ` [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Izik Eidus
2008-11-11 13:21 ` Izik Eidus, Izik Eidus
2008-11-11 13:21 ` [PATCH 3/4] add ksm kernel shared memory driver Izik Eidus
2008-11-11 13:21 ` Izik Eidus, Izik Eidus
2008-11-11 13:21 ` [PATCH 4/4] MMU_NOTIFIRES: add set_pte_at_notify() Izik Eidus
2008-11-11 13:21 ` Izik Eidus, Izik Eidus
2008-11-11 20:38 ` [PATCH 3/4] add ksm kernel shared memory driver Andrew Morton
2008-11-11 20:38 ` Andrew Morton
2008-11-11 22:03 ` Andrea Arcangeli
2008-11-11 22:03 ` Andrea Arcangeli
2008-11-11 22:03 ` Jonathan Corbet
2008-11-11 22:03 ` Jonathan Corbet
2008-11-11 22:17 ` Izik Eidus
2008-11-11 22:17 ` Izik Eidus
2008-11-11 22:25 ` Jonathan Corbet
2008-11-11 22:25 ` Jonathan Corbet
2008-11-11 22:31 ` Izik Eidus
2008-11-11 22:31 ` Izik Eidus
2008-11-11 22:30 ` Jonathan Corbet
2008-11-11 22:30 ` Jonathan Corbet
2008-11-11 22:38 ` Izik Eidus
2008-11-11 22:38 ` Izik Eidus
2008-11-11 23:02 ` Izik Eidus
2008-11-11 23:02 ` Izik Eidus
2008-11-11 23:03 ` Andrea Arcangeli
2008-11-11 23:03 ` Andrea Arcangeli
2008-11-11 22:49 ` Avi Kivity
2008-11-11 22:49 ` Avi Kivity
2008-11-11 22:40 ` Valdis.Kletnieks
2008-11-13 6:13 ` Eric Rannaud
2008-11-13 6:13 ` Eric Rannaud
2008-11-11 22:43 ` Avi Kivity
2008-11-11 22:43 ` Avi Kivity
2008-11-11 19:45 ` [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Andrew Morton
2008-11-11 19:45 ` Andrew Morton
2008-11-11 20:57 ` Izik Eidus
2008-11-11 20:57 ` Izik Eidus
2008-11-11 21:21 ` Christoph Lameter
2008-11-11 21:21 ` Christoph Lameter
2008-11-11 21:23 ` Izik Eidus
2008-11-11 21:23 ` Izik Eidus
2008-11-11 21:31 ` Christoph Lameter
2008-11-11 21:31 ` Christoph Lameter
2008-11-11 21:37 ` Izik Eidus
2008-11-11 21:37 ` Izik Eidus
2008-11-11 22:24 ` Andrea Arcangeli
2008-11-11 22:24 ` Andrea Arcangeli
2008-11-12 2:19 ` KAMEZAWA Hiroyuki
2008-11-12 2:19 ` KAMEZAWA Hiroyuki
2008-11-12 10:05 ` Avi Kivity
2008-11-12 10:05 ` Avi Kivity
2008-11-12 11:11 ` Izik Eidus
2008-11-12 11:11 ` Izik Eidus
2008-11-13 6:11 ` KAMEZAWA Hiroyuki
2008-11-13 6:11 ` KAMEZAWA Hiroyuki
2008-11-13 10:38 ` Izik Eidus
2008-11-13 10:38 ` Izik Eidus
2008-11-13 11:32 ` KAMEZAWA Hiroyuki
2008-11-13 11:32 ` KAMEZAWA Hiroyuki
2008-11-11 21:35 ` Andrea Arcangeli
2008-11-11 21:35 ` Andrea Arcangeli
2008-11-11 21:06 ` Andrea Arcangeli
2008-11-11 21:06 ` Andrea Arcangeli
2008-11-11 21:26 ` Christoph Lameter
2008-11-11 21:26 ` Christoph Lameter
2008-11-11 21:39 ` Avi Kivity
2008-11-11 21:39 ` Avi Kivity
2008-11-11 21:47 ` Christoph Lameter
2008-11-11 21:47 ` Christoph Lameter
2008-11-11 21:55 ` Izik Eidus
2008-11-11 21:55 ` Izik Eidus
2008-11-11 22:36 ` Avi Kivity
2008-11-11 22:36 ` Avi Kivity
2008-11-11 22:17 ` Andrea Arcangeli
2008-11-11 22:17 ` Andrea Arcangeli
2008-11-11 22:30 ` Christoph Lameter
2008-11-11 22:30 ` Christoph Lameter
2008-11-11 23:17 ` Andrea Arcangeli
2008-11-11 23:17 ` Andrea Arcangeli
2008-11-11 23:25 ` Andrea Arcangeli
2008-11-11 23:25 ` Andrea Arcangeli
2008-11-12 0:27 ` Christoph Lameter
2008-11-12 0:27 ` Christoph Lameter
2008-11-12 2:27 ` Andrea Arcangeli
2008-11-12 2:27 ` Andrea Arcangeli
2008-11-12 3:10 ` Christoph Lameter
2008-11-12 3:10 ` Christoph Lameter
2008-11-12 17:32 ` Andrea Arcangeli
2008-11-12 17:32 ` Andrea Arcangeli
2008-11-12 20:08 ` Lee Schermerhorn
2008-11-12 20:08 ` Lee Schermerhorn
2008-11-12 20:31 ` Christoph Lameter
2008-11-12 20:31 ` Christoph Lameter
2008-11-12 20:27 ` Christoph Lameter
2008-11-12 20:27 ` Christoph Lameter
2008-11-12 22:09 ` Lee Schermerhorn
2008-11-12 22:09 ` Lee Schermerhorn
2008-11-13 2:00 ` Andrea Arcangeli
2008-11-13 2:00 ` Andrea Arcangeli
2008-11-13 2:31 ` Andrea Arcangeli
2008-11-13 2:31 ` Andrea Arcangeli
2008-11-13 4:02 ` Nick Piggin
2008-11-13 4:02 ` Nick Piggin
2008-11-11 19:39 ` [PATCH 1/4] rmap: add page_wrprotect() function, Andrew Morton
2008-11-11 19:39 ` Andrew Morton
2008-11-11 20:38 ` Andrea Arcangeli [this message]
2008-11-11 20:38 ` Andrea Arcangeli
2008-11-11 21:01 ` Andrew Morton
2008-11-11 21:01 ` Andrew Morton
2008-11-11 21:17 ` Andrea Arcangeli
2008-11-11 21:17 ` Andrea Arcangeli
2008-11-11 18:30 ` [PATCH 0/4] ksm - dynamic page sharing driver for linux Andrew Morton
2008-11-11 18:30 ` Andrew Morton
2008-11-11 18:48 ` Avi Kivity
2008-11-11 18:48 ` Avi Kivity
2008-11-11 19:08 ` Izik Eidus
2008-11-11 19:08 ` Izik Eidus
2008-11-11 19:11 ` Andrew Morton
2008-11-11 19:11 ` Andrew Morton
2008-11-11 19:18 ` Izik Eidus
2008-11-11 19:18 ` Izik Eidus
2008-11-11 19:32 ` Andrew Morton
2008-11-11 19:32 ` Andrew Morton
2008-11-11 19:52 ` Izik Eidus
2008-11-11 19:52 ` Izik Eidus
2008-11-11 20:08 ` Izik Eidus
2008-11-11 20:08 ` Izik Eidus
2008-11-11 19:29 ` Avi Kivity
2008-11-11 19:29 ` Avi Kivity
2008-11-11 19:55 ` Andrea Arcangeli
2008-11-11 19:55 ` Andrea Arcangeli
2008-11-11 19:07 ` Izik Eidus
2008-11-11 19:07 ` Izik Eidus
2008-11-11 19:20 ` Andrew Morton
2008-11-11 19:20 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081111203806.GE10818@random.random \
--to=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=avi@redhat.com \
--cc=chrisw@redhat.com \
--cc=ieidus@redhat.com \
--cc=izike@qumranet.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.