Patch for segfaults in minifail tests

All of lore.kernel.org
 help / color / mirror / Atom feed

* Patch for segfaults in minifail tests
@ 2010-04-30 18:21 James Bottomley
  2010-04-30 21:11 ` John David Anglin
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: James Bottomley @ 2010-04-30 18:21 UTC (permalink / raw)
  To: linux-parisc; +Cc: John David Anglin, Helge Deller

T-bone is keeping a web page with all the failing tests on now:

http://wiki.parisc-linux.org/TestCases

The patch below is what I've found fixes the minifail6 case, but it
doesn't seem to fully fix the minifail one (although the frequency goes
down).

It's the essential patch we need to fix up our kmapping.  Right at the
moment kmap of a page with a dirty cache line in userspace sees stale
data and kunmap of a page the kernel has modified will see likewise.

James

---

diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 4772777..d717e75 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -116,21 +116,22 @@ void mark_rodata_ro(void);
 #define ARCH_HAS_KMAP
 
 void kunmap_parisc(void *addr);
+void *kmap_parisc(struct page *page);
 
 static inline void *kmap(struct page *page)
 {
 	might_sleep();
-	return page_address(page);
+	return kmap_parisc(page);
 }
 
 #define kunmap(page)			kunmap_parisc(page_address(page))
 
-#define kmap_atomic(page, idx)		page_address(page)
+#define kmap_atomic(page, idx)		kmap_parisc(page)
 
 #define kunmap_atomic(addr, idx)	kunmap_parisc(addr)
 
-#define kmap_atomic_pfn(pfn, idx)	page_address(pfn_to_page(pfn))
-#define kmap_atomic_to_page(ptr)	virt_to_page(ptr)
+#define kmap_atomic_pfn(pfn, idx)	kmap_parisc(pfn_to_page(pfn))
+#define kmap_atomic_to_page(ptr)	virt_to_page(kmap_parisc(virt_to_page(ptr)))
 #endif
 
 #endif /* _PARISC_CACHEFLUSH_H */
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index d054f3d..b971ed3 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -336,9 +336,9 @@ __flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr)
 	}
 }
 
-void flush_dcache_page(struct page *page)
+static void flush_user_dcache_page_internal(struct address_space *mapping,
+					    struct page *page)
 {
-	struct address_space *mapping = page_mapping(page);
 	struct vm_area_struct *mpnt;
 	struct prio_tree_iter iter;
 	unsigned long offset;
@@ -346,14 +346,6 @@ void flush_dcache_page(struct page *page)
 	pgoff_t pgoff;
 	unsigned long pfn = page_to_pfn(page);
 
-
-	if (mapping && !mapping_mapped(mapping)) {
-		set_bit(PG_dcache_dirty, &page->flags);
-		return;
-	}
-
-	flush_kernel_dcache_page(page);
-
 	if (!mapping)
 		return;
 
@@ -387,6 +379,19 @@ void flush_dcache_page(struct page *page)
 	}
 	flush_dcache_mmap_unlock(mapping);
 }
+
+void flush_dcache_page(struct page *page)
+{
+	struct address_space *mapping = page_mapping(page);
+
+	if (mapping && !mapping_mapped(mapping)) {
+		set_bit(PG_dcache_dirty, &page->flags);
+		return;
+	}
+
+	flush_kernel_dcache_page(page);
+	flush_user_dcache_page_internal(mapping, page);
+}
 EXPORT_SYMBOL(flush_dcache_page);
 
 /* Defined in arch/parisc/kernel/pacache.S */
@@ -475,16 +480,6 @@ void copy_user_page(void *vto, void *vfrom, unsigned long vaddr,
 }
 EXPORT_SYMBOL(copy_user_page);
 
-#ifdef CONFIG_PA8X00
-
-void kunmap_parisc(void *addr)
-{
-	if (parisc_requires_coherency())
-		flush_kernel_dcache_page_addr(addr);
-}
-EXPORT_SYMBOL(kunmap_parisc);
-#endif
-
 void __flush_tlb_range(unsigned long sid, unsigned long start,
 		       unsigned long end)
 {
@@ -577,3 +572,25 @@ flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long
 		__flush_cache_page(vma, vmaddr);
 
 }
+
+void *kmap_parisc(struct page *page)
+{
+	/* this is a killer.  There's no easy way to test quickly if
+	 * this page is dirty in any userspace.  Additionally, for
+	 * kernel alterations of the page, we'd need it invalidated
+	 * here anyway, so currently flush (and invalidate)
+	 * universally */
+	flush_user_dcache_page_internal(page_mapping(page), page);
+	return page_address(page);
+}
+EXPORT_SYMBOL(kmap_parisc);
+
+void kunmap_parisc(void *addr)
+{
+	/* flush and invalidate the kernel mapping.  We need the
+	 * invalidate so we don't have stale data at this cache
+	 * location the next time the page is mapped */
+	flush_kernel_dcache_page_addr(addr);
+}
+EXPORT_SYMBOL(kunmap_parisc);
+




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-04-30 18:21 Patch for segfaults in minifail tests James Bottomley
@ 2010-04-30 21:11 ` John David Anglin
  2010-04-30 21:13   ` James Bottomley
  2010-05-01 18:29 ` Thibaut VARENE
  2010-05-01 22:25 ` Helge Deller
  2 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2010-04-30 21:11 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-parisc, Helge Deller

On Fri, 30 Apr 2010, James Bottomley wrote:

> It's the essential patch we need to fix up our kmapping.  Right at the
> moment kmap of a page with a dirty cache line in userspace sees stale
> data and kunmap of a page the kernel has modified will see likewise.

Did you try using the version of copy_user_page_asm that uses
equivalent aliasing?   There is an issue with needing cache flushing
noted in pgtable.h, but if this works it may be less of a hammer.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-04-30 21:11 ` John David Anglin
@ 2010-04-30 21:13   ` James Bottomley
  0 siblings, 0 replies; 11+ messages in thread
From: James Bottomley @ 2010-04-30 21:13 UTC (permalink / raw)
  To: John David Anglin; +Cc: linux-parisc, Helge Deller

On Fri, 2010-04-30 at 17:11 -0400, John David Anglin wrote:
> On Fri, 30 Apr 2010, James Bottomley wrote:
> 
> > It's the essential patch we need to fix up our kmapping.  Right at the
> > moment kmap of a page with a dirty cache line in userspace sees stale
> > data and kunmap of a page the kernel has modified will see likewise.
> 
> Did you try using the version of copy_user_page_asm that uses
> equivalent aliasing?   There is an issue with needing cache flushing
> noted in pgtable.h, but if this works it may be less of a hammer.

Not yet.  I want to isolate the root cause first before I start adding
optimisations.  Although the optimisation is a fairly safe one, since
it's used by a lot of other architectures.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-04-30 18:21 Patch for segfaults in minifail tests James Bottomley
  2010-04-30 21:11 ` John David Anglin
@ 2010-05-01 18:29 ` Thibaut VARENE
  2010-05-01 22:25 ` Helge Deller
  2 siblings, 0 replies; 11+ messages in thread
From: Thibaut VARENE @ 2010-05-01 18:29 UTC (permalink / raw)
  To: linux-parisc

On Fri, Apr 30, 2010 at 8:21 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> T-bone is keeping a web page with all the failing tests on now:
>
> http://wiki.parisc-linux.org/TestCases

About this: I've had to blaze through hundreds of emails to get this
data together. TBH it's been quite hard to figure which testcase
related to which bug and which patch. I'm not absolutely sure that my
segmentation is right (all the futex/fork/vfork/etc threads seem to
deal with the same kind of problems). I'm also not entirely certain
that I haven't listed bugs that are now fixed. I may also have
overlooked mails containing non-obvious testcases.

What I'm trying to say is that this page is useless without your
feedback. I'd especially need people to, short of updating it
themselves (^-^), tell me what can be added/removed/improved. Given a
Message-ID, I can do the dirty work ;-)

Comments are welcome, I hope this will be anyhow helpful.

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-04-30 18:21 Patch for segfaults in minifail tests James Bottomley
  2010-04-30 21:11 ` John David Anglin
  2010-05-01 18:29 ` Thibaut VARENE
@ 2010-05-01 22:25 ` Helge Deller
  2010-05-01 23:13   ` John David Anglin
  2010-05-01 23:40   ` Thibaut VARÈNE
  2 siblings, 2 replies; 11+ messages in thread
From: Helge Deller @ 2010-05-01 22:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-parisc, John David Anglin, T-Bone

[-- Attachment #1: Type: text/plain, Size: 6738 bytes --]

On 04/30/2010 08:21 PM, James Bottomley wrote:
> T-bone is keeping a web page with all the failing tests on now:
> 
> http://wiki.parisc-linux.org/TestCases
> 
> The patch below is what I've found fixes the minifail6 case, but it
> doesn't seem to fully fix the minifail one (although the frequency goes
> down).
> 
> It's the essential patch we need to fix up our kmapping.  Right at the
> moment kmap of a page with a dirty cache line in userspace sees stale
> data and kunmap of a page the kernel has modified will see likewise.

Hi James,

I tried your patch on top of a 2.6.33.2 kernel (SMP, 32bit, PA8500 (PCX-W) CPU).
I still do see all the page faults as before. They even seem to trigger 
faster than with a few of Dave's patches.

I usually run this command
	i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
in a few screen sessions in parallel.

Thibeaut: I've attached the testcase (I called it minifail_dave, since it
was changed by Dave). Maybe you can attach it to your website if it's not
yet there...?

Helge
-----------------

do_page_fault() pid=4325 command='minifail_dave' type=6 address=0x00000003

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001001111111100001111 Not tainted
r00-03  0004ff0f 1055e000 403390d7 c03613c0
r04-07  4034b5f4 00000007 4034bdf4 00000000
r08-11  4034be64 00000000 c03613ca 0000001c
r12-15  4034be60 4034c7f8 00000000 c0361448
r16-19  4034c0b0 c0361448 40349270 00000000
r20-23  00000000 00000000 00000000 00000000
r24-27  fffffff5 ffffffd3 4034c0b0 00011dac
r28-31  00000000 4034c0b0 c03614c0 403390d7
sr00-03  00001111 0000111c 00000000 00001111
sr04-07  00001111 00001111 00001111 00001111

      VZOUICununcqcqcqcqcqcrmunTDVZOUI
FPSR: 00000000000000000000000000000000
FPER1: 00000000
fr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
fr04-07  8f828e808edb4080 000000001011be24 107495b01078db64 0000000000000000
fr08-11  8f844240107b8000 0000000800000000 8edb40808f828e80 8f8422b81056115c
fr12-15  107495b010803280 1056115c1011be24 00000002ffffff9c fffff00000000000
fr16-19  8f84428010802840 8f828edc8f828ec0 8f828e8000000000 8f81bea88f844208
fr20-23  8f8442088f844210 8f844208101dc9f4 0000000000000090 00001c9800000000
fr24-27  3ff0000000000000 3fe0000000000000 412e848000000000 00000000001e8480
fr28-31  000004000000007a 000000101016bee8 0000000f106ff040 106e3b3810802894

IASQ: 00001111 00001111 IAOQ: 00000003 00000007
 IIR: 43ffff80    ISR: 00001111  IOR: 40000bd0
 CPU:        0   CR30: 8d9f8000 CR31: ffffffff
 ORIG_R28: 00000000
 IAOQ[0]: 00000003
 IAOQ[1]: 00000007
 RP(r2): 403390d7
pagealloc: memory corruption
88140040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
88140050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
88140060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
88140070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
88140080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
88140090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
881400a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
881400b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Backtrace:
 [<1011ecb0>] show_stack+0x18/0x28
 [<10117b94>] dump_stack+0x1c/0x2c
 [<101c6270>] kernel_map_pages+0x2a0/0x2b8
 [<1019eb24>] get_page_from_freelist+0x3e4/0x5f4
 [<1019ee68>] __alloc_pages_nodemask+0x134/0x610
 [<101b1d6c>] do_wp_page+0x260/0xa00
 [<101b3ac0>] handle_mm_fault+0x4cc/0x784
 [<1011d844>] do_page_fault+0x1f8/0x2fc
 [<1011f4ec>] handle_interruption+0xec/0x730
 [<10103078>] intr_check_sig+0x0/0x34


do_page_fault() pid=7143 command='minifail_dave' type=6 address=0x00000003

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001001111111100001111 Not tainted
r00-03  0004ff0f 1055e000 403390d7 bffaa3c0
r04-07  4034b5f4 00000007 4034bdf4 00000000
r08-11  4034be64 00000000 bffaa3ca 0000001c
r12-15  4034be60 4034c7f8 00000000 bffaa448
r16-19  4034c0b0 bffaa448 40349270 00000000
r20-23  00000000 00000000 00000000 00000000
r24-27  fffffff5 ffffffd3 4034c0b0 00011dac
r28-31  00000000 4034c0b0 bffaa4c0 403390d7
sr00-03  00001c1b 00000000 00000000 00001c1b
sr04-07  00001c1b 00001c1b 00001c1b 00001c1b

      VZOUICununcqcqcqcqcqcrmunTDVZOUI
FPSR: 00000000000000000000000000000000
FPER1: 00000000
fr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
fr04-07  8f828e808edb4080 000000001011be24 107495b01078db64 0000000000000000
fr08-11  8f844240107b8000 0000000800000000 8edb40808f828e80 8f8422b81056115c
fr12-15  107495b010803280 1056115c1011be24 00000002ffffff9c fffff00000000000
fr16-19  8f84428010802840 8f828edc8f828ec0 8f828e8000000000 8f81bea88f844208
fr20-23  8f8442088f844210 8f844208101dc9f4 0000000000000090 00001c9800000000
fr24-27  3ff0000000000000 3fe0000000000000 412e848000000000 00000000001e8480
fr28-31  000004000000007a 000000101016bee8 0000000f106ff040 106e3b3810802894

IASQ: 00001c1b 00001c1b IAOQ: 00000003 00000007
 IIR: 43ffff80    ISR: 00001c1b  IOR: 40000bd0
 CPU:        1   CR30: 86860000 CR31: ffffffff
 ORIG_R28: 00000000
 IAOQ[0]: 00000003
 IAOQ[1]: 00000007
 RP(r2): 403390d7

do_page_fault() pid=9873 command='minifail_dave' type=6 address=0x00000003

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001001111111100001111 Not tainted
r00-03  0004ff0f 1055e000 403390d7 c06e53c0
r04-07  4034b5f4 00000007 4034bdf4 00000000
r08-11  4034be64 00000000 c06e53ca 0000001c
r12-15  4034be60 4034c7f8 00000000 c06e5448
r16-19  4034c0b0 c06e5448 40349270 00000000
r20-23  00000000 00000000 00000000 00000000
r24-27  fffffff5 ffffffd3 4034c0b0 00011dac
r28-31  00000000 4034c0b0 c06e54c0 403390d7
sr00-03  000026c7 00000000 00000000 000026c7
sr04-07  000026c7 000026c7 000026c7 000026c7

      VZOUICununcqcqcqcqcqcrmunTDVZOUI
FPSR: 00000000000000000000000000000000
FPER1: 00000000
fr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
fr04-07  8f828e808edb4080 000000001011be24 107495b01078db64 0000000000000000
fr08-11  8f844240107b8000 0000000800000000 8edb40808f828e80 8f8422b81056115c
fr12-15  107495b010803280 1056115c1011be24 00000002ffffff9c fffff00000000000
fr16-19  8f84428010802840 8f828edc8f828ec0 8f828e8000000000 8f81bea88f844208
fr20-23  8f8442088f844210 8f844208101dc9f4 0000000000000090 00001c9800000000
fr24-27  3ff0000000000000 3fe0000000000000 412e848000000000 00000000001e8480
fr28-31  000004000000007a 000000101016bee8 0000000f106ff040 106e3b3810802894

IASQ: 000026c7 000026c7 IAOQ: 00000003 00000007
 IIR: 43ffff80    ISR: 000026c7  IOR: 40000bd0
 CPU:        0   CR30: 8dadc000 CR31: ffffffff
 ORIG_R28: 00000000
 IAOQ[0]: 00000003
 IAOQ[1]: 00000007
 RP(r2): 403390d7


[-- Attachment #2: minifail_dave.cpp --]
[-- Type: text/plain, Size: 644 bytes --]

#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

/*
 g++  minifail_dave.cpp -o minifail_dave -O0 -pthread -g

 i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
 */
void* thread_run(void* arg) {
	write(1,"Thread OK.\n",11);
}

int pure_test() {
	pthread_t thread;
	pthread_create(&thread, NULL, thread_run, NULL);

	switch (fork()) {
		case -1:
			perror("fork() failed");
		case 0:
			write(1,"Child OK.\n",10);
			_exit(0);
		default:
			break;
		
	}
	
	pthread_join(thread, NULL);
	return 0;
}

int main(int argc, char** argv) {
	return pure_test();
}


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-05-01 22:25 ` Helge Deller
@ 2010-05-01 23:13   ` John David Anglin
  2010-05-01 23:39     ` James Bottomley
  2010-05-01 23:40   ` Thibaut VARÈNE
  1 sibling, 1 reply; 11+ messages in thread
From: John David Anglin @ 2010-05-01 23:13 UTC (permalink / raw)
  To: Helge Deller; +Cc: James.Bottomley, linux-parisc, T-Bone

Hi Helge,

> I tried your patch on top of a 2.6.33.2 kernel (SMP, 32bit, PA8500 (PCX-W) CPU).
> I still do see all the page faults as before. They even seem to trigger 
> faster than with a few of Dave's patches.
> 
> I usually run this command
> 	i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
> in a few screen sessions in parallel.

The reason running multiple screens in parallel exposes further problems
is the implementation of ptep_set_wrprotect is broken.  It simply sets the
write protect bit in the pte and doesn't purge the existing translation.
So, the parent continues to merrily write to the write protected page until
the TLB entry is purged and reloaded.  More processes make it more likely the
entry will be replaced and trigger a COW break.

This is why my versions of the minifail test which monitor the stack region
used by the thread don't cause a COW break immediately after the fork.  When
compiled at -O0, the loop index is constantly being stored to the stack.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-05-01 23:13   ` John David Anglin
@ 2010-05-01 23:39     ` James Bottomley
  2010-05-03 21:54       ` John David Anglin
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2010-05-01 23:39 UTC (permalink / raw)
  To: John David Anglin; +Cc: Helge Deller, linux-parisc, T-Bone

On Sat, 2010-05-01 at 19:13 -0400, John David Anglin wrote:
> Hi Helge,
> 
> > I tried your patch on top of a 2.6.33.2 kernel (SMP, 32bit, PA8500 (PCX-W) CPU).
> > I still do see all the page faults as before. They even seem to trigger 
> > faster than with a few of Dave's patches.
> > 
> > I usually run this command
> > 	i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
> > in a few screen sessions in parallel.
> 
> The reason running multiple screens in parallel exposes further problems
> is the implementation of ptep_set_wrprotect is broken.  It simply sets the
> write protect bit in the pte and doesn't purge the existing translation.
> So, the parent continues to merrily write to the write protected page until
> the TLB entry is purged and reloaded.  More processes make it more likely the
> entry will be replaced and trigger a COW break.
> 
> This is why my versions of the minifail test which monitor the stack region
> used by the thread don't cause a COW break immediately after the fork.  When
> compiled at -O0, the loop index is constantly being stored to the stack.

Actually, no, this explanation isn't correct.

The way linux works.  You can see roughly how this works in
copy_page_range() where we prepare the COW.  If it's going to be a COW
range, we call mmu notifiers before and after the pte settings.  The
after mmu notifier is supposed to flush the TLB.   Linux always does
memory operations in the form

prepare();
do_something_with_the_ptes()
activate()

It's only after the activate() through the mmu notifiers that we're
supposed to be consistent.

Now the outstanding question is whether we're correctly hooked into the
post mmu update notifier ... but I suspect we are (sorry, heading to a
plane for boston, will try to check this next week).

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-05-01 22:25 ` Helge Deller
  2010-05-01 23:13   ` John David Anglin
@ 2010-05-01 23:40   ` Thibaut VARÈNE
  2010-05-02  8:19     ` Helge Deller
  1 sibling, 1 reply; 11+ messages in thread
From: Thibaut VARÈNE @ 2010-05-01 23:40 UTC (permalink / raw)
  To: Helge Deller; +Cc: James Bottomley, linux-parisc, John David Anglin

Le 2 mai 10 =E0 00:25, Helge Deller a =E9crit :

> Thibeaut: I've attached the testcase (I called it minifail_dave, =20
> since it
> was changed by Dave). Maybe you can attach it to your website if =20
> it's not
> yet there...?

Err, please, you're not making this easy for me.

I've put on the website all the testcases I found in the emails, so if =
=20
dave already posted it, chances are it's already there.

Could you please check the page and see if the testcase you're using =20
is there? I tried to keep it simple by:
a) keeping the original name (whenever available)
b) linking to the corresponding message

If I'm the only one actually using that webpage, it's totally =20
pointless...

HTH

--=20
Thibaut VAR=C8NE
http://www.parisc-linux.org/~varenet/

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-05-01 23:40   ` Thibaut VARÈNE
@ 2010-05-02  8:19     ` Helge Deller
  2010-05-02 10:59       ` Thibaut VARÈNE
  0 siblings, 1 reply; 11+ messages in thread
From: Helge Deller @ 2010-05-02  8:19 UTC (permalink / raw)
  To: Thibaut VARÈNE; +Cc: James Bottomley, linux-parisc, John David Anglin

On 05/02/2010 01:40 AM, Thibaut VAR=C8NE wrote:
> Le 2 mai 10 =E0 00:25, Helge Deller a =E9crit :
>=20
>> Thibeaut: I've attached the testcase (I called it minifail_dave, sin=
ce it
>> was changed by Dave). Maybe you can attach it to your website if it'=
s not
>> yet there...?
>=20
>=20
> Err, please, you're not making this easy for me.
>=20
> I've put on the website all the testcases I found in the emails, so i=
f
> dave already posted it, chances are it's already there.
>=20
> Could you please check the page and see if the testcase you're using =
is
> there? I tried to keep it simple by:
> a) keeping the original name (whenever available)
> b) linking to the corresponding message

Sorry Thibaut,

Thanks so much for putting all this stuff into the Wiki!
It's very useful to everybody, and of course I did already yesterday lo=
oked=20
if this minifail test program was in the list. I didn't found it.

So, it would be great if you could add it.

In principle you could replace the minifail3 testcase by this one.
And, I think this should be the major testcase which should work in the
end, since it's the one which doesn't has any tweaks. It should just
run without segfaults out of the box if we fixed the problem.

> If I'm the only one actually using that webpage, it's totally pointle=
ss...

You are not. I have looked at it already quite much!

Helge
--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-05-02  8:19     ` Helge Deller
@ 2010-05-02 10:59       ` Thibaut VARÈNE
  0 siblings, 0 replies; 11+ messages in thread
From: Thibaut VARÈNE @ 2010-05-02 10:59 UTC (permalink / raw)
  To: Helge Deller; +Cc: linux-parisc

Le 2 mai 10 =E0 10:19, Helge Deller a =E9crit :

> Sorry Thibaut,
>
> Thanks so much for putting all this stuff into the Wiki!
> It's very useful to everybody, and of course I did already yesterday =
=20
> looked
> if this minifail test program was in the list. I didn't found it.

Ah ok, sorry I didn't understand that.

> So, it would be great if you could add it.

Done.

> In principle you could replace the minifail3 testcase by this one.

I'll keep it for archiving purpose, it's just a line on the page after =
=20
all ;)

> And, I think this should be the major testcase which should work in =20
> the
> end, since it's the one which doesn't has any tweaks. It should just
> run without segfaults out of the box if we fixed the problem.


I've marked it as "Final" until proved wrong.

HTH

--=20
Thibaut VAR=C8NE
http://www.parisc-linux.org/~varenet/

--
To unsubscribe from this list: send the line "unsubscribe linux-parisc"=
 in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Patch for segfaults in minifail tests
  2010-05-01 23:39     ` James Bottomley
@ 2010-05-03 21:54       ` John David Anglin
  0 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2010-05-03 21:54 UTC (permalink / raw)
  To: James Bottomley; +Cc: Helge Deller, linux-parisc, T-Bone

On Sat, 01 May 2010, James Bottomley wrote:

> On Sat, 2010-05-01 at 19:13 -0400, John David Anglin wrote:
> > Hi Helge,
> > 
> > > I tried your patch on top of a 2.6.33.2 kernel (SMP, 32bit, PA8500 (PCX-W) CPU).
> > > I still do see all the page faults as before. They even seem to trigger 
> > > faster than with a few of Dave's patches.
> > > 
> > > I usually run this command
> > > 	i=0; while true; do i=$(($i+1)); echo Run $i; ./minifail_dave; done;
> > > in a few screen sessions in parallel.
> > 
> > The reason running multiple screens in parallel exposes further problems
> > is the implementation of ptep_set_wrprotect is broken.  It simply sets the
> > write protect bit in the pte and doesn't purge the existing translation.
> > So, the parent continues to merrily write to the write protected page until
> > the TLB entry is purged and reloaded.  More processes make it more likely the
> > entry will be replaced and trigger a COW break.
> > 
> > This is why my versions of the minifail test which monitor the stack region
> > used by the thread don't cause a COW break immediately after the fork.  When
> > compiled at -O0, the loop index is constantly being stored to the stack.
> 
> Actually, no, this explanation isn't correct.

I stand by the observation that COW breaks do not happen when expected,
or consistently with the minifail testcase.

> The way linux works.  You can see roughly how this works in
> copy_page_range() where we prepare the COW.  If it's going to be a COW
> range, we call mmu notifiers before and after the pte settings.  The
> after mmu notifier is supposed to flush the TLB.   Linux always does
> memory operations in the form
> 
> prepare();
> do_something_with_the_ptes()
> activate()

That is certainly consistent with the existing code in pgtable.h.
The ptes can only be handled in this manner if there is a different
page table for each process.  I couldn't find the notifier hooks...
include/asm-generic/pgtable.h contains flush implementations which
do the TLB flush for several pte operations.  However, there is no
generic implementation of ptep_set_wrprotect as far as I can tell.

In the minifail testcase, we have three processes: "parent", "thread"
and "child".  Parent and thread have the same space id.  Child has
a different space id and therefore a different global virtual address
range.  All processes reference the same physical memory.

The ptes for parent and thread have to be consistent since the idtlbt
instruction will remove one or more entries whose virtual address ranges
overlap the virtual address range of the new translation.  Parisc
cpus have a relatively small translation lookaside buffer; so when
many processes are running, it is easy to lose an entry.  If we lose
entries, a page could randomly alternate from being write protected
to not write protected.

As far as I can tell, copy_one_pte only write protects the mms for
the parent (src_pte) and the child (pte).  The pte of thread is not
write protected.  As far as I can tell, we don't purge TLBs on
context switches, so it would appear that we have to have a consistent
set of ptes for all processes with the same space id.

Originally, I had assumed that parent and thread had the same page
directory.  In that case, one has to handle race conditions in setting
and updating ptes.  My rp3440 is more stable when I try to force
insertion of the write protect bit.  However, there are still some faults
that look like cache corruption when I push the system load up.

> It's only after the activate() through the mmu notifiers that we're
> supposed to be consistent.
> 
> Now the outstanding question is whether we're correctly hooked into the
> post mmu update notifier ... but I suspect we are (sorry, heading to a
> plane for boston, will try to check this next week).

Hope you spot something.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-05-03 21:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-30 18:21 Patch for segfaults in minifail tests James Bottomley
2010-04-30 21:11 ` John David Anglin
2010-04-30 21:13   ` James Bottomley
2010-05-01 18:29 ` Thibaut VARENE
2010-05-01 22:25 ` Helge Deller
2010-05-01 23:13   ` John David Anglin
2010-05-01 23:39     ` James Bottomley
2010-05-03 21:54       ` John David Anglin
2010-05-01 23:40   ` Thibaut VARÈNE
2010-05-02  8:19     ` Helge Deller
2010-05-02 10:59       ` Thibaut VARÈNE

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.