From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Andi Kleen <ak@suse.de>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	mingo@redhat.com, mbligh@google.com
Subject: [PATCH] fix x86_64-mm-cpa-cache-flush.patch in 2.6.22-rc4-mm2
Date: Wed, 20 Jun 2007 15:39:12 -0400
Message-ID: <20070620193912.GA20578@Krystal>
In-Reply-To: <200706201953.54322.ak@suse.de>

* Andi Kleen (ak@suse.de) wrote:
> On Wednesday 20 June 2007 18:46, Mathieu Desnoyers wrote:
> > * Andi Kleen (ak@suse.de) wrote:
> > > On Tuesday 19 June 2007 22:01:36 Mathieu Desnoyers wrote:
> > > > Looking more closely into the code to find the cause of the
> > > > change_page_addr()/global_flush_tlb() inconsistency, I see where the
> > > > problem could be:
> > >
> > > Yes it's a known problem. I have a hack queued for .22 and there
> > > are proposed patches for .23 too.
> > >
> > > ftp://ftp.firstfloor.org/pub/ak/x86_64/late-merge/patches/cpa-flush
> > >
> > > -ANdi
> >
> > Hi Andi,
> >
> > Although I cannot find it at the specified URL, I suspect it is already
> > in Andrew's tree, in 2.6.22-rc4-mm2, under the name
> 
> Try again
> 
> > "x86_64-mm-cpa-cache-flush.patch"
> 
> No, that's a different patch with also at least one known bug.
> 
> -Andi

I just fixed x86_64 and i386, using a high-order bit of page_private as a
"page needs deferred flush" flag. It works well on i386; it has not been
tested on x86_64.
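
As a rough illustration of the flag-plus-counter layout described above,
here is a minimal userspace sketch in C. It is not kernel code: struct
fake_page, cpa_count(), cpa_flush_pending(), cpa_get() and cpa_put() are
hypothetical names made up for this example, and the top bit is computed
from sizeof(unsigned long) instead of being hard-coded per architecture.

```c
#include <assert.h>
#include <limits.h>

/* Top bit of an unsigned long: bit 31 on 32-bit builds, bit 63 on 64-bit. */
#define CPA_FLUSH (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

/* Hypothetical stand-in for struct page; only the private word matters. */
struct fake_page {
	unsigned long priv;
};

/* Read the counter with the flag bit masked out. */
static unsigned long cpa_count(const struct fake_page *pg)
{
	return pg->priv & ~CPA_FLUSH;
}

/* Non-zero if the page is flagged as waiting for a deferred flush. */
static int cpa_flush_pending(const struct fake_page *pg)
{
	return (pg->priv & CPA_FLUSH) != 0;
}

/* One more entry with non-standard attributes. */
static void cpa_get(struct fake_page *pg)
{
	pg->priv++;
}

/* One entry reverted; the masked count must not already be zero. */
static void cpa_put(struct fake_page *pg)
{
	assert(cpa_count(pg) != 0);
	pg->priv--;
}
```

Plain increments and decrements leave the flag alone as long as the count
stays far below the top bit, but every read of the count has to go through
the mask, otherwise a pending flag makes the count look huge.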


x86_64 mm CPA cache flush fix for i386 and x86_64

Andi's patch introduced a hang on i386 machines when write-protecting pages.

1st fix: use the appropriate checks in global_flush_tlb().
2nd fix: the hang was caused by multiple list_add() calls on the same
kpte_page. Use a high-order bit of page_private to keep track of which
kpte_pages are currently in the list and waiting for a deferred flush.

This patch applies on top of the x86_64-mm-cpa-cache-flush.patch in the -mm
tree (2.6.22-rc4-mm2).

(note: the revert-x86_64-mm-cpa-cache-flush.patch must be discarded from the -mm
tree)
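
To illustrate the second fix, the sketch below models the deferred-flush
list in userspace C. With a kernel-style list_add(), re-adding a node that
is already first in the list leaves it pointing at itself, so a later walk
of the list never terminates; that is one way a double add can turn into a
hang. All names here (fake_page, list_node, list_add_node(),
list_del_node(), defer_flush(), drain_deferred()) are hypothetical
stand-ins written for this example, not the kernel's own helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define CPA_FLUSH (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical stand-in for struct page. */
struct fake_page {
	unsigned long priv;
	struct list_node lru;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* Insert n right after head; unsafe if n is already on the list. */
static void list_add_node(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Queue pg for a deferred flush at most once, using the flag as a guard. */
static void defer_flush(struct fake_page *pg, struct list_node *head)
{
	if (pg->priv & CPA_FLUSH)
		return;
	pg->priv |= CPA_FLUSH;
	list_add_node(&pg->lru, head);
}

/* Unlink every queued page, clear its flag, return how many were queued. */
static size_t drain_deferred(struct list_node *head)
{
	size_t n = 0;

	while (head->next != head) {
		struct list_node *node = head->next;
		struct fake_page *pg = (struct fake_page *)
			((char *)node - offsetof(struct fake_page, lru));

		list_del_node(node);
		pg->priv &= ~CPA_FLUSH;
		n++;
	}
	return n;
}
```

The flag has to be cleared at the same point where the page leaves the
list, so that a later change to the same page can queue it again.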

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 arch/i386/mm/pageattr.c         |   24 +++++++++++++++++++-----
 arch/x86_64/mm/pageattr.c       |   16 ++++++++++++----
 include/asm-i386/cacheflush.h   |   11 +++++++++++
 include/asm-x86_64/cacheflush.h |   11 +++++++++++
 4 files changed, 53 insertions(+), 9 deletions(-)

Index: linux-2.6-lttng/arch/i386/mm/pageattr.c
===================================================================
--- linux-2.6-lttng.orig/arch/i386/mm/pageattr.c	2007-06-20 12:51:10.000000000 -0400
+++ linux-2.6-lttng/arch/i386/mm/pageattr.c	2007-06-20 15:28:56.000000000 -0400
@@ -53,6 +53,9 @@
 	/*
 	 * page_private is used to track the number of entries in
 	 * the page table page that have non standard attributes.
+	 * We use the highest bit to tell if the page needs to be flushed,
+	 * therefore page_private_cpa_count() must be used to read the count.
+	 * Count increment and decrement never overflow on the highest bit.
 	 */
 	SetPagePrivate(base);
 	page_private(base) = 0;
@@ -160,7 +163,7 @@
 		page_private(kpte_page)++;
 	} else if (!pte_huge(*kpte)) {
 		set_pte_atomic(kpte, mk_pte(page, PAGE_KERNEL));
-		BUG_ON(page_private(kpte_page) == 0);
+		BUG_ON(page_private_cpa_count(kpte_page) == 0);
 		page_private(kpte_page)--;
 	} else
 		BUG();
@@ -170,10 +173,12 @@
 	 * time (not via split_large_page) and in turn we must not
 	 * replace it with a largepage.
 	 */
-
-	list_add(&kpte_page->lru, &df_list);
+	if (!(page_private(kpte_page) & CPA_FLUSH)) {
+		page_private(kpte_page) |= CPA_FLUSH;
+		list_add(&kpte_page->lru, &df_list);
+	}
 	if (!PageReserved(kpte_page)) {
-		if (cpu_has_pse && (page_private(kpte_page) == 0)) {
+		if (cpu_has_pse && (page_private_cpa_count(kpte_page) == 0)) {
 			paravirt_release_pt(page_to_pfn(kpte_page));
 			revert_page(kpte_page, address);
 		}
@@ -228,9 +233,13 @@
 	if (!cpu_has_clflush)
 		flush_map(NULL);
 	list_for_each_entry_safe(pg, next, &l, lru) {
+		list_del(&pg->lru);
+		page_private(pg) &= ~CPA_FLUSH;
 		if (cpu_has_clflush)
 			flush_map(page_address(pg));
-		if (page_private(pg) != 0)
+
+		if (PageReserved(pg) || !cpu_has_pse
+				|| page_private_cpa_count(pg) != 0)
 			continue;
 		ClearPagePrivate(pg);
 		__free_page(pg);
@@ -252,6 +261,11 @@
 	change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0));
 	/* we should perform an IPI and flush all tlbs,
 	 * but that can deadlock->flush only current cpu.
+	 *
+	 * FIXME : this is utterly buggy; it does not clean the df_list
+	 * populated by change_page_attr and could cause a double addition to
+	 * this list. With what exactly would the IPI deadlock ?
+	 * Mathieu Desnoyers
 	 */
 	__flush_tlb_all();
 }
Index: linux-2.6-lttng/include/asm-i386/cacheflush.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-i386/cacheflush.h	2007-06-20 14:53:39.000000000 -0400
+++ linux-2.6-lttng/include/asm-i386/cacheflush.h	2007-06-20 15:23:07.000000000 -0400
@@ -4,6 +4,17 @@
 /* Keep includes the same across arches.  */
 #include <linux/mm.h>
 
+/* Use the highest bit of the page's private field to flag the kpte page as
+ * needing a flush. The lower bits are used as a counter of the number of ptes
+ * with special flags, within the page, which will never use the highest bit.
+ * pte_t being 8 bytes in size,
+ * 4096/sizeof(pte_t) = 512, so the count fits in 10 bits.
+ * For Large pages:
+ * 4MB/sizeof(pte_t) = 524288, so the count fits in 20 bits.
+ */
+#define CPA_FLUSH	(1UL<<31)
+#define page_private_cpa_count(page)	(page_private(page) & (~CPA_FLUSH))
+
 /* Caches aren't brain-dead on the intel. */
 #define flush_cache_all()			do { } while (0)
 #define flush_cache_mm(mm)			do { } while (0)
Index: linux-2.6-lttng/include/asm-x86_64/cacheflush.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86_64/cacheflush.h	2007-06-20 14:55:23.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86_64/cacheflush.h	2007-06-20 15:22:49.000000000 -0400
@@ -4,6 +4,17 @@
 /* Keep includes the same across arches.  */
 #include <linux/mm.h>
 
+/* Use the highest bit of the page's private field to flag the kpte page as
+ * needing a flush. The lower bits are used as a counter of the number of ptes
+ * with special flags, within the page, which will never use the highest bit.
+ * pte_t being 8 bytes in size,
+ * 4096/sizeof(pte_t) = 512, so the count fits in 10 bits.
+ * For Large pages:
+ * 4MB/sizeof(pte_t) = 524288, so the count fits in 20 bits.
+ */
+#define CPA_FLUSH	(1UL<<63)
+#define page_private_cpa_count(page)	(page_private(page) & (~CPA_FLUSH))
+
 /* Caches aren't brain-dead on the intel. */
 #define flush_cache_all()			do { } while (0)
 #define flush_cache_mm(mm)			do { } while (0)
Index: linux-2.6-lttng/arch/x86_64/mm/pageattr.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86_64/mm/pageattr.c	2007-06-20 15:00:04.000000000 -0400
+++ linux-2.6-lttng/arch/x86_64/mm/pageattr.c	2007-06-20 15:24:35.000000000 -0400
@@ -47,6 +47,9 @@
 	/*
 	 * page_private is used to track the number of entries in
 	 * the page table page have non standard attributes.
+	 * We use the highest bit to tell if the page needs to be flushed,
+	 * therefore page_private_cpa_count() must be used to read the count.
+	 * Count increment and decrement never overflow on the highest bit.
 	 */
 	SetPagePrivate(base);
 	page_private(base) = 0;
@@ -79,6 +82,7 @@
 		asm volatile("wbinvd" ::: "memory");
 	list_for_each_entry(pg, l, lru) {
 		void *adr = page_address(pg);
+		page_private(pg) &= ~CPA_FLUSH;
 		if (cpu_has_clflush)
 			cache_flush_page(adr);
 	}
@@ -94,7 +98,10 @@
 
 static inline void save_page(struct page *fpage)
 {
-	list_add(&fpage->lru, &deferred_pages);
+	if (!(page_private(fpage) & CPA_FLUSH)) {
+		page_private(fpage) |= CPA_FLUSH;
+		list_add(&fpage->lru, &deferred_pages);
+	}
 }
 
 /* 
@@ -150,7 +157,7 @@
 		page_private(kpte_page)++;
 	} else if (!pte_huge(*kpte)) {
 		set_pte(kpte, pfn_pte(pfn, ref_prot));
-		BUG_ON(page_private(kpte_page) == 0);
+		BUG_ON(page_private_cpa_count(kpte_page) == 0);
 		page_private(kpte_page)--;
 	} else
 		BUG();
@@ -159,7 +166,7 @@
  	BUG_ON(PageReserved(kpte_page));
 
 	save_page(kpte_page);
-	if (page_private(kpte_page) == 0)
+	if (page_private_cpa_count(kpte_page) == 0)
 		revert_page(address, ref_prot);
 	return 0;
 } 
@@ -232,7 +239,8 @@
 	flush_map(&l);
 
 	list_for_each_entry_safe(pg, next, &l, lru) {
-		if (page_private(pg) != 0)
+		list_del(&pg->lru);
+		if (page_private_cpa_count(pg) != 0)
 			continue;
 		ClearPagePrivate(pg);
 		__free_page(pg);
-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

