[Xenomai-core] Kernel crash with Xenomai (caused by fork?)


* [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
@ 2008-03-28 21:06 Tomas Kalibera
  2008-03-28 23:25 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-03-28 21:06 UTC (permalink / raw)
  To: xenomai-core


Hi,

I'm getting kernel crashes with my user-space Xenomai application, which
uses the native skin. The crash seems to happen after clone/fork. I'm using
kernel 2.6.24.3, SMP, RT_PREEMPT (configured like 2.6.22-14-rt from
Ubuntu 7.10), and Xenomai 2.4.2.

The thread causing the crash is a Xenomai task that runs in the Linux
domain most of the time. The application is very large, so unfortunately
it is not realistic to extract a short example that triggers the bug.

The crash happens when running on real hardware (x86_64 with a 32-bit
kernel and applications). After it happens, the system is unusable and can
only be rebooted; the dump below is from the serial console. In VMware on
another x86_64 machine, it does not crash.

Is anyone getting a similar error? Any ideas where to look for the problem?

Thanks,

Tomas

 

kernel crash dump

[  139.814229] ------------[ cut here ]------------
[  139.818830] kernel BUG at arch/x86/mm/highmem_32.c:42!
[  139.823945] invalid opcode: 0000 [#1] PREEMPT SMP 
[  139.828739] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 parport_pc lp parport sr_mod cdrom pcspkr iTCO_wdt iTCO_vendor_support ipv6 shpchp pci_hotplug evdev ext3 jbd mbcache sg sd_mod ata_piix usbhid hid floppy ata_generic ahci ohci1394 libata scsi_mod ieee1394 ehci_hcd tg3 uhci_hcd usbcore fuse
[  139.855896] 
[  139.857378] Pid: 4959, comm: ovmtask Not tainted (2.6.24.3xenomai #1)
[  139.863790] EIP: 0060:[<c011a8d8>] EFLAGS: 00010286 CPU: 0
[  139.869255] EIP is at kmap_atomic_prot+0x98/0xa0
[  139.873850] EAX: d91aa163 EBX: c2b23540 ECX: fffff000 EDX: c044fecc
[  139.880088] ESI: 00000007 EDI: 00000163 EBP: 08003875 ESP: df68fea0
[  139.886326]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  139.891699] Process ovmtask (pid: 4959, ti=df68e000 task=df685080 task.ti=df68e000)<0>
[  139.899148] I-pipe domain Linux
[  139.902539] Stack: fffb2000 00000000 c2b2354c c01a967a fffb7000 fffb6000 df89395c df4ad580 
[  139.910930]        df4ad900 dfaf5084 df9f5084 08615000 08400000 08615000 f7c02ec0 c2b23560 
[  139.919323]        00000000 00000000 c2b2354c c2be8acc fffb3000 08614fff 00000000 00000000 
[  139.927714] Call Trace:
[  139.930329]  [<c01a967a>] copy_page_range+0x13a/0x560
[  139.935368]  [<c01224bf>] copy_process+0x8df/0x1250
[  139.940235]  [<c012306c>] do_fork+0x4c/0x200
[  139.944495]  [<c01022d2>] sys_clone+0x32/0x40
[  139.948839]  [<c0104431>] syscall_call+0x7/0xb
[  139.953272]  =======================
[  139.956828] Code: b5 00 00 00 00 29 c2 8b 02 85 c0 75 1e 2b 1d 80 0c 50 c0 8d 46 45 c1 e0 0c c1 fb 05 29 c1 c1 e3 0c 89 c8 09 fb 89 1a 5b 5e 5f c3 <0f> 0b eb fe 8d 74 26 00 8b 0d f4 b1 45 c0 e9 55 ff ff ff 90 8d 
[  139.976150] EIP: [<c011a8d8>] kmap_atomic_prot+0x98/0xa0 SS:ESP 0068:df68fea0
[  139.983355] ---[ end trace 1cb0b5180594e9d9 ]---
[  139.987956] note: ovmtask[4959] exited with preempt_count 1


end of strace output

4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
4959  rt_sigaction(SIGUSR1, NULL, {SIG_DFL}, 8) = 0
4959  rt_sigaction(SIGUSR1, {0x85ec4b0, [], SA_RESTART|SA_SIGINFO}, {SIG_DFL}, 8) = 0
4959  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  write(2, "#<", 2)                 = 2
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  write(2, "executive", 9)          = 9
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  write(2, "> ", 2)                 = 2
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  write(2, "[Testing ", 9)          = 9
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  write(2, "AbstractInterpretation", 22) = 22
4959  fcntl64(2, F_GETFL)               = 0x8001 (flags O_WRONLY|O_LARGEFILE)
4959  pipe([7, 8])                      = 0
4959  fcntl64(7, F_GETFL)               = 0 (flags O_RDONLY)
4959  fcntl64(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
4959  fcntl64(8, F_GETFL)               = 0x1 (flags O_WRONLY)
4959  fcntl64(8, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
4959  clone( <unfinished ...>
4958  <... nanosleep resumed> NULL)     = 0




* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-28 21:06 [Xenomai-core] Kernel crash with Xenomai (caused by fork?) Tomas Kalibera
@ 2008-03-28 23:25 ` Gilles Chanteperdrix
  2008-03-29  0:08   ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-03-28 23:25 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

Tomas Kalibera wrote:
 > Is anyone getting a similar error? Any ideas where to look for the problem?

Looking at the kernel code, it seems that only one page at a time may be
mapped (per CPU) with kmap_atomic using KM_USER0. So what probably happens
is that, for the invocations of cow_user_page other than the one taking
place in fork, a lock of some kind prevents concurrent invocations of
cow_user_page, whereas in our use of cow_user_page we probably do not hold
that lock. Looking at the code, I see that copy_pte_range holds a spinlock,
which should disable preemption on a classical kernel. But who knows
what happens with RT_PREEMPT enabled...

-- 


					    Gilles.



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-28 23:25 ` Gilles Chanteperdrix
@ 2008-03-29  0:08   ` Gilles Chanteperdrix
  2008-03-29  1:36     ` Tomas Kalibera
  0 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-03-29  0:08 UTC (permalink / raw)
  To: Tomas Kalibera, xenomai-core

Gilles Chanteperdrix wrote:
 > Looking at the kernel code, it seems that only one page at a time may be
 > mapped (per CPU) with kmap_atomic using KM_USER0. So what probably happens
 > is that, for the invocations of cow_user_page other than the one taking
 > place in fork, a lock of some kind prevents concurrent invocations of
 > cow_user_page, whereas in our use of cow_user_page we probably do not hold
 > that lock. Looking at the code, I see that copy_pte_range holds a spinlock,
 > which should disable preemption on a classical kernel. But who knows
 > what happens with RT_PREEMPT enabled...

There is something strange... Normally, when compiling with
CONFIG_PREEMPT_RT, kmap_atomic_prot is replaced with kmap and the real
kmap_atomic_prot is renamed __kmap_atomic_prot. Since cow_user_page uses
kmap_atomic_prot, kmap is what actually gets called, so the BUG_ON
condition in kmap_atomic_prot should never occur.

-- 


					    Gilles.



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-29  0:08   ` Gilles Chanteperdrix
@ 2008-03-29  1:36     ` Tomas Kalibera
  2008-03-29 20:17       ` Gilles Chanteperdrix
  2008-03-30 20:27       ` Gilles Chanteperdrix
  0 siblings, 2 replies; 34+ messages in thread
From: Tomas Kalibera @ 2008-03-29  1:36 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Hi Gilles,

Thanks for looking at it. Your analysis is correct: I don't actually have
a CONFIG_PREEMPT_RT kernel, only CONFIG_PREEMPT. Sorry for the confusion.

I've put the kernel config, sources, and binary on the web, so that you
can be sure you are looking at the kernel that is crashing:
http://www.cs.purdue.edu/homes/tkaliber/crash

Thanks,

Tomas





* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-29  1:36     ` Tomas Kalibera
@ 2008-03-29 20:17       ` Gilles Chanteperdrix
  2008-03-30 20:27       ` Gilles Chanteperdrix
  1 sibling, 0 replies; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-03-29 20:17 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

Tomas Kalibera wrote:
 > Hi Gilles,
 > 
 > Thanks for looking at it. Your analysis is correct: I don't actually have
 > a CONFIG_PREEMPT_RT kernel, only CONFIG_PREEMPT. Sorry for the confusion.
 > 
 > I've put the kernel config, sources, and binary on the web, so that you
 > can be sure you are looking at the kernel that is crashing:
 > http://www.cs.purdue.edu/homes/tkaliber/crash

It looks like do_wp_page, the caller of cow_user_page, calls it with the
spinlock unlocked. So nothing prevents a rescheduling from happening and
switching to a real-time process, which can then call fork. Now, I wonder
what prevents do_wp_page from being called in the same conditions...

-- 


					    Gilles.



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-29  1:36     ` Tomas Kalibera
  2008-03-29 20:17       ` Gilles Chanteperdrix
@ 2008-03-30 20:27       ` Gilles Chanteperdrix
  2008-03-31  4:04         ` Tomas Kalibera
  1 sibling, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-03-30 20:27 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 789 bytes --]

Tomas Kalibera wrote:
 > Hi Gilles,
 > 
 > Thanks for looking at it. Your analysis is correct: I don't actually have
 > a CONFIG_PREEMPT_RT kernel, only CONFIG_PREEMPT. Sorry for the confusion.
 > 
 > I've put the kernel config, sources, and binary on the web, so that you
 > can be sure you are looking at the kernel that is crashing:
 > http://www.cs.purdue.edu/homes/tkaliber/crash

After looking at the sources, it appears that kmap_atomic disables
preemption (pagefault_disable() increments the preemption count) and
kunmap_atomic re-enables it. In short, the bug should never happen. What
could happen is that the preemption count is garbled, or that a call to
kmap_atomic is not paired with a kunmap_atomic.

To check whether the problem comes from the preemption count, could you
apply the following patch?

-- 


					    Gilles.

[-- Attachment #2: ipipe-kmap_atomic-bug.diff --]
[-- Type: text/plain, Size: 742 bytes --]

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 1c3bf95..4bb9fc6 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -34,6 +34,7 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
 	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 
+	BUG_ON(type == KM_USER0 && !in_atomic());
 	if (!PageHighMem(page))
 		return page_address(page);
 
@@ -85,6 +86,7 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
 
 	pagefault_disable();
 
+	BUG_ON(type == KM_USER0 && !in_atomic());
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-30 20:27       ` Gilles Chanteperdrix
@ 2008-03-31  4:04         ` Tomas Kalibera
  2008-03-31 20:21           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-03-31  4:04 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core


It crashed on the very same line as before (now line 43, since the patch
adds one line above it).
Tomas

[  189.558776] ------------[ cut here ]------------
[  189.563377] kernel BUG at arch/x86/mm/highmem_32.c:43!
[  189.568491] invalid opcode: 0000 [#1] PREEMPT SMP 
[  189.573285] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 parport_pc lp parport sr_mod cdrom pcspkr iTCO_wdt iTCO_vendor_support shpchp pci_hotplug ipv6 evdev ext3 jbd mbcache sg sd_mod ata_piix usbhid hid floppy ata_generic ahci ohci1394 libata scsi_mod ieee1394 ehci_hcd tg3 uhci_hcd usbcore fuse
[  189.600440] 
[  189.601924] Pid: 4960, comm: ovmtask Not tainted (2.6.24.3xenomaip1 #1)
[  189.608508] EIP: 0060:[<c011a908>] EFLAGS: 00010286 CPU: 0
[  189.613971] EIP is at kmap_atomic_prot+0xb8/0xc0
[  189.618566] EAX: d91a8163 EBX: c2b23500 ECX: fffff000 EDX: c044fecc
[  189.624804] ESI: 00000007 EDI: 00000163 EBP: 08003875 ESP: df673ea0
[  189.631043]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  189.636416] Process ovmtask (pid: 4960, ti=df672000 task=df4d29e0 task.ti=df672000)<0>
[  189.643865] I-pipe domain Linux
[  189.647257] Stack: fffb2000 00000000 c2b2350c c01a96aa fffb7000 fffb6000 df66d278 dfb7a580 
[  189.655648]        dfb7ae40 df846084 df9b7084 08615000 08400000 08615000 f7c3d740 c2b23520 
[  189.664039]        00000000 00000000 c2b2350c c2be8aac fffb3000 08614fff 00000000 00000000 
[  189.672430] Call Trace:
[  189.675045]  [<c01a96aa>] copy_page_range+0x13a/0x560
[  189.680086]  [<c01224ef>] copy_process+0x8df/0x1250
[  189.684951]  [<c012309c>] do_fork+0x4c/0x200
[  189.689211]  [<c01022d2>] sys_clone+0x32/0x40
[  189.693556]  [<c01043a1>] sysenter_past_esp+0x6e/0x72
[  189.698595]  =======================
[  189.702150] Code: 0c c1 fb 05 29 c1 c1 e3 0c 89 c8 09 fb 89 1a 5b 5e 5f c3 89 e0 25 00 e0 ff ff f7 40 14 ff ff ff ef 0f 85 69 ff ff ff 0f 0b eb fe <0f> 0b eb fe 8d 74 26 00 8b 0d f4 b1 45 c0 e9 35 ff ff ff 90 8d 
[  189.721467] EIP: [<c011a908>] kmap_atomic_prot+0xb8/0xc0 SS:ESP 0068:df673ea0
[  189.728669] ---[ end trace 7363976c5f0598cc ]---
[  189.733269] note: ovmtask[4960] exited with preempt_count 1







* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-31  4:04         ` Tomas Kalibera
@ 2008-03-31 20:21           ` Gilles Chanteperdrix
  2008-03-31 20:30             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-03-31 20:21 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 177 bytes --]

Tomas Kalibera wrote:
 > 
 > Crashed on the very same line as before
 > Tomas

OK. Let us look for unbalanced kmap_atomic calls, then. Try this patch instead.

-- 


					    Gilles.

[-- Attachment #2: ipipe-kmap_atomic-bug.2.diff --]
[-- Type: text/plain, Size: 3847 bytes --]

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 1c3bf95..a78494e 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -1,6 +1,11 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
 
+static struct {
+	const char *file;
+	unsigned line;
+} last_km_user0 [NR_CPUS];
+
 void *kmap(struct page *page)
 {
 	might_sleep();
@@ -26,7 +31,8 @@ void kunmap(struct page *page)
  * However when holding an atomic kmap is is not legal to sleep, so atomic
  * kmaps are appropriate for short, tight code paths only.
  */
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -39,7 +45,17 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	BUG_ON(!pte_none(*(kmap_pte-idx)));
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
+
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
 	arch_flush_lazy_mmu_mode();
 
@@ -70,6 +86,10 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 		BUG_ON(vaddr >= (unsigned long)high_memory);
 #endif
 	}
+	if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = NULL;
+		last_km_user0[smp_processor_id()].line = 0;
+	}
 
 	arch_flush_lazy_mmu_mode();
 	pagefault_enable();
@@ -78,7 +98,8 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 /* This is the same as kmap_atomic() but can map memory that doesn't
  * have a struct page associated with it.
  */
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
+void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
+		       const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -87,6 +108,16 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
 	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
 	arch_flush_lazy_mmu_mode();
 
diff --git a/include/asm-x86/highmem.h b/include/asm-x86/highmem.h
index 13cdcd6..57b89f7 100644
--- a/include/asm-x86/highmem.h
+++ b/include/asm-x86/highmem.h
@@ -68,10 +68,16 @@ extern void FASTCALL(kunmap_high(struct page *page));
 
 void *kmap(struct page *page);
 void kunmap(struct page *page);
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot);
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line);
+#define kmap_atomic_prot(page, type, prot) \
+	_kmap_atomic_prot(page, type, prot, __FILE__, __LINE__)
 void *kmap_atomic(struct page *page, enum km_type type);
 void kunmap_atomic(void *kvaddr, enum km_type type);
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
+void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
+		       const char *file, unsigned line);
+#define kmap_atomic_pfn(pfn, type) \
+	_kmap_atomic_pfn(pfn, type, __FILE__, __LINE__)
 struct page *kmap_atomic_to_page(void *ptr);
 
 #ifndef CONFIG_PARAVIRT


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-31 20:21           ` Gilles Chanteperdrix
@ 2008-03-31 20:30             ` Gilles Chanteperdrix
  2008-04-01  0:00               ` Tomas Kalibera
  0 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-03-31 20:30 UTC (permalink / raw)
  To: Tomas Kalibera, xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 319 bytes --]

Gilles Chanteperdrix wrote:
 > Ok. Let us look for unbalanced kmap_atomics then. Try this patch instead.

Just as I hit the reply button, I realized that I had forgotten something:
kmap_atomic itself also has to pass its caller's file and line. So, try
this one instead.

-- 


					    Gilles.

[-- Attachment #2: ipipe-kmap_atomic-bug.3.diff --]
[-- Type: text/plain, Size: 4385 bytes --]

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 1c3bf95..97a5242 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -1,6 +1,11 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
 
+static struct {
+	const char *file;
+	unsigned line;
+} last_km_user0 [NR_CPUS];
+
 void *kmap(struct page *page)
 {
 	might_sleep();
@@ -26,7 +31,8 @@ void kunmap(struct page *page)
  * However when holding an atomic kmap is is not legal to sleep, so atomic
  * kmaps are appropriate for short, tight code paths only.
  */
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -39,16 +45,27 @@ void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-	BUG_ON(!pte_none(*(kmap_pte-idx)));
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
+
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
 	arch_flush_lazy_mmu_mode();
 
 	return (void *)vaddr;
 }
 
-void *kmap_atomic(struct page *page, enum km_type type)
+void *_kmap_atomic(struct page *page, enum km_type type,
+		   const char *file, unsigned line)
 {
-	return kmap_atomic_prot(page, type, kmap_prot);
+	return _kmap_atomic_prot(page, type, kmap_prot, file, line);
 }
 
 void kunmap_atomic(void *kvaddr, enum km_type type)
@@ -70,6 +87,10 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 		BUG_ON(vaddr >= (unsigned long)high_memory);
 #endif
 	}
+	if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = NULL;
+		last_km_user0[smp_processor_id()].line = 0;
+	}
 
 	arch_flush_lazy_mmu_mode();
 	pagefault_enable();
@@ -78,7 +99,8 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 /* This is the same as kmap_atomic() but can map memory that doesn't
  * have a struct page associated with it.
  */
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
+void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
+		       const char *file, unsigned line)
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
@@ -87,6 +109,16 @@ void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
 
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+	if (!pte_none(*(kmap_pte-idx))) {
+		if (type == KM_USER0)
+			printk("KM_USER0 already mapped at %s:%d\n",
+			       last_km_user0[smp_processor_id()].file,
+			       last_km_user0[smp_processor_id()].line);
+		BUG();
+	} else if (type == KM_USER0) {
+		last_km_user0[smp_processor_id()].file = file;
+		last_km_user0[smp_processor_id()].line = line;
+	}
 	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
 	arch_flush_lazy_mmu_mode();
 
diff --git a/include/asm-x86/highmem.h b/include/asm-x86/highmem.h
index 13cdcd6..db09f27 100644
--- a/include/asm-x86/highmem.h
+++ b/include/asm-x86/highmem.h
@@ -68,10 +68,19 @@ extern void FASTCALL(kunmap_high(struct page *page));
 
 void *kmap(struct page *page);
 void kunmap(struct page *page);
-void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot);
-void *kmap_atomic(struct page *page, enum km_type type);
+void *_kmap_atomic_prot(struct page *page, enum km_type type,
+			pgprot_t prot, const char *file, unsigned line);
+#define kmap_atomic_prot(page, type, prot) \
+	_kmap_atomic_prot(page, type, prot, __FILE__, __LINE__)
+void *_kmap_atomic(struct page *page, enum km_type type,
+		   const char *file, unsigned line);
+#define kmap_atomic(page, type) \
+	_kmap_atomic(page, type, __FILE__, __LINE__)
 void kunmap_atomic(void *kvaddr, enum km_type type);
-void *kmap_atomic_pfn(unsigned long pfn, enum km_type type);
+void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
+		       const char *file, unsigned line);
+#define kmap_atomic_pfn(pfn, type) \
+	_kmap_atomic_pfn(pfn, type, __FILE__, __LINE__)
 struct page *kmap_atomic_to_page(void *ptr);
 
 #ifndef CONFIG_PARAVIRT


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-03-31 20:30             ` Gilles Chanteperdrix
@ 2008-04-01  0:00               ` Tomas Kalibera
  2008-04-01  5:52                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-01  0:00 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 614 bytes --]


I added a missing underscore and retried, and none of the debug
messages were printed. I added another one to make sure that there is no
problem getting printk messages to the serial console. The resulting
highmem_32.c and the output are attached.

T




[-- Attachment #2: highmem_32.c --]
[-- Type: text/x-csrc, Size: 3867 bytes --]

#include <linux/highmem.h>
#include <linux/module.h>

static struct {
	const char *file;
	unsigned line;
} last_km_user0 [NR_CPUS];

void *kmap(struct page *page)
{
	might_sleep();
	if (!PageHighMem(page))
		return page_address(page);
	return kmap_high(page);
}

void kunmap(struct page *page)
{
	if (in_interrupt())
		BUG();
	if (!PageHighMem(page))
		return;
	kunmap_high(page);
}

/*
 * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because
 * no global lock is needed and because the kmap code must perform a global TLB
 * invalidation when the kmap pool wraps.
 *
 * However when holding an atomic kmap is is not legal to sleep, so atomic
 * kmaps are appropriate for short, tight code paths only.
 */
void *_kmap_atomic_prot(struct page *page, enum km_type type,
			pgprot_t prot, const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
	pagefault_disable();

	if (!PageHighMem(page))
		return page_address(page);

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0) {
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		} else {
			printk("type is NOT KM_USER0\n");
		}
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}

	set_pte(kmap_pte-idx, mk_pte(page, prot));
	arch_flush_lazy_mmu_mode();

	return (void *)vaddr;
}

void *_kmap_atomic(struct page *page, enum km_type type,
		   const char *file, unsigned line)
{
	return _kmap_atomic_prot(page, type, kmap_prot, file, line);
}

void kunmap_atomic(void *kvaddr, enum km_type type)
{
	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
	enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();

	/*
	 * Force other mappings to Oops if they'll try to access this pte
	 * without first remap it.  Keeping stale mappings around is a bad idea
	 * also, in case the page changes cacheability attributes or becomes
	 * a protected page in a hypervisor.
	 */
	if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
		kpte_clear_flush(kmap_pte-idx, vaddr);
	else {
#ifdef CONFIG_DEBUG_HIGHMEM
		BUG_ON(vaddr < PAGE_OFFSET);
		BUG_ON(vaddr >= (unsigned long)high_memory);
#endif
	}
	if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = NULL;
		last_km_user0[smp_processor_id()].line = 0;
	}

	arch_flush_lazy_mmu_mode();
	pagefault_enable();
}

/* This is the same as kmap_atomic() but can map memory that doesn't
 * have a struct page associated with it.
 */
void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
		       const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	pagefault_disable();

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0)
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}
	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
	arch_flush_lazy_mmu_mode();

	return (void*) vaddr;
}

struct page *kmap_atomic_to_page(void *ptr)
{
	unsigned long idx, vaddr = (unsigned long)ptr;
	pte_t *pte;

	if (vaddr < FIXADDR_START)
		return virt_to_page(ptr);

	idx = virt_to_fix(vaddr);
	pte = kmap_pte - (idx - FIX_KMAP_BEGIN);
	return pte_page(*pte);
}

EXPORT_SYMBOL(kmap);
EXPORT_SYMBOL(kunmap);
EXPORT_SYMBOL(_kmap_atomic);
EXPORT_SYMBOL(kunmap_atomic);
EXPORT_SYMBOL(kmap_atomic_to_page);
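The per-CPU slot scheme used by _kmap_atomic_prot() and kunmap_atomic() above can be modelled in userspace. The sketch below is a hypothetical simulation, not kernel code. The sim_* names, the reduced four-entry type enum and SIM_NR_CPUS are assumptions made for illustration, and a slot value of 0 stands in for pte_none(). It shows the index formula type + KM_TYPE_NR*smp_processor_id() and the check that ends in BUG() when a slot is mapped a second time without an intervening unmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of atomic-kmap types for the sketch. */
enum sim_km_type { SIM_KM_USER0, SIM_KM_USER1, SIM_KM_PTE0, SIM_KM_PTE1, SIM_KM_TYPE_NR };
#define SIM_NR_CPUS 2 /* hypothetical CPU count */

/* One entry per (cpu, type) slot; 0 plays the role of pte_none(). */
static unsigned long sim_kmap_pte[SIM_KM_TYPE_NR * SIM_NR_CPUS];

/* Same formula as the kernel code: idx = type + KM_TYPE_NR * cpu. */
static size_t sim_slot_index(enum sim_km_type type, int cpu)
{
	assert(type < SIM_KM_TYPE_NR && cpu >= 0 && cpu < SIM_NR_CPUS);
	return (size_t)type + (size_t)SIM_KM_TYPE_NR * (size_t)cpu;
}

/* Models _kmap_atomic_prot() for a highmem page: returns false where the
 * kernel would call BUG() because the slot is still occupied. */
static bool sim_kmap_atomic(enum sim_km_type type, int cpu, unsigned long pfn)
{
	size_t idx = sim_slot_index(type, cpu);

	assert(pfn != 0); /* 0 is reserved for "empty" in this model */
	if (sim_kmap_pte[idx] != 0)
		return false;
	sim_kmap_pte[idx] = pfn;
	return true;
}

/* Models the matching-address path of kunmap_atomic(): clear the slot. */
static void sim_kunmap_atomic(enum sim_km_type type, int cpu)
{
	sim_kmap_pte[sim_slot_index(type, cpu)] = 0;
}
```

Because each CPU has its own slot for each type, mapping the same type on another CPU succeeds, while a second map of the same type on the same CPU fails until the slot is cleared.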

[-- Attachment #3: all --]
[-- Type: text/plain, Size: 17463 bytes --]

[  255.285392] ------------[ cut here ]------------
[  255.289992] kernel BUG at arch/x86/mm/highmem_32.c:56!
[  255.295107] invalid opcode: 0000 [#1] PREEMPT SMP 
[  255.299901] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 ipv6 parport_pc lp parport pcspkr iTCO_wdt iTCO_vendor_se
[  255.327057] 
[  255.328538] Pid: 4986, comm: ovmtask Not tainted (2.6.24.3xenomaip3 #2)
[  255.335123] EIP: 0060:[<c011a966>] EFLAGS: 00010286 CPU: 0
[  255.340588] EIP is at _kmap_atomic_prot+0xa6/0x120
[  255.345356] EAX: 00000027 EBX: c2b27520 ECX: 00000000 EDX: 02bcf000
[  255.351594] ESI: fffff000 EDI: 00000007 EBP: 00000007 ESP: decade70
[  255.357832]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  255.363205] Process ovmtask (pid: 4986, ti=decac000 task=df820d90 task.ti=decac000)<0>
[  255.370653] I-pipe domain Linux
[  255.374044] Stack: c039a438 00000010 00000000 0000022b c03a1f9e 00000163 0000022b 00000000 
[  255.382435]        c2b2752c 08003875 c011a9fa c03a1f9e 0000022b fffb2000 c01a9976 0000022b 
[  255.390827]        fffb7000 fffb6000 df472ba8 dee09ac0 dee09740 df3b4084 df3b1084 08615000 
[  255.399221] Call Trace:
[  255.401835]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  255.406440]  [<c01a9976>] copy_page_range+0x146/0x590
[  255.411484]  [<c01225ff>] copy_process+0x8df/0x1250
[  255.416353]  [<c01231ac>] do_fork+0x4c/0x200
[  255.420613]  [<c01022d2>] sys_clone+0x32/0x40
[  255.424960]  [<c01043a1>] sysenter_past_esp+0x6e/0x72
[  255.429998]  =======================
[  255.433554] Code: 04 82 8b 35 f8 4a 3d c0 8d 2c 38 8d 04 ad 00 00 00 00 29 c1 8b 01 85 c0 74 1b 83 ff 03 74 3f c7 04 24 3 
[  255.452899] EIP: [<c011a966>] _kmap_atomic_prot+0xa6/0x120 SS:ESP 0068:decade70
[  255.460280] ---[ end trace 6f16dfbea90303ec ]---
[  255.464881] note: ovmtask[4986] exited with preempt_count 1
[  255.470435] BUG: scheduling while atomic: ovmtask/4986/0x00000002
[  255.476505] Pid: 4986, comm: ovmtask Tainted: G      D 2.6.24.3xenomaip3 #2
[  255.483440]  [<c0313fed>] schedule+0x53d/0x770
[  255.487882]  [<c0140e04>] clockevents_program_event+0xa4/0x120
[  255.493710]  [<c0315229>] rwsem_down_failed_common+0x69/0x190
[  255.499448]  [<c031539a>] rwsem_down_read_failed+0x1a/0x24
[  255.504926]  [<c0315407>] call_rwsem_down_read_failed+0x7/0xc
[  255.510663]  [<c0314a5a>] down_read+0xa/0x10
[  255.514929]  [<c0143a69>] futex_wake+0x19/0xd0
[  255.519369]  [<c011459e>] smp_apic_timer_interrupt+0x4e/0x80
[  255.525020]  [<c027b0bc>] serial_in+0x2c/0xa0
[  255.529374]  [<c0144d98>] do_futex+0x5d8/0xb30
[  255.533816]  [<c015667c>] __xirq_end+0x0/0x50
[  255.538172]  [<c01569da>] __ipipe_unstall_root+0x4a/0x50
[  255.543475]  [<c012493a>] vprintk+0x2ca/0x3b0
[  255.547830]  [<c0145388>] sys_futex+0x98/0x110
[  255.552270]  [<c0121bc7>] mm_release+0x87/0xa0
[  255.556709]  [<c0126073>] exit_mm+0x13/0xf0
[  255.560889]  [<c0127b3f>] do_exit+0x4ef/0x820
[  255.565243]  [<c0105b67>] die+0x267/0x270
[  255.569252]  [<c0105e81>] do_invalid_op+0x81/0x90
[  255.573952]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  255.579084]  [<c01388db>] autoremove_wake_function+0x1b/0x50
[  255.584749]  [<c011b19b>] __wake_up_common+0x4b/0x80
[  255.589708]  [<c011c3be>] __wake_up+0x3e/0x60
[  255.594062]  [<c0315cc2>] _spin_unlock_irqrestore+0x12/0x40
[  255.599625]  [<c01242bb>] wake_up_klogd+0x3b/0x40
[  255.604323]  [<c012493a>] vprintk+0x2ca/0x3b0
[  255.608676]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  255.614417]  [<c0315ecb>] error_code+0x77/0x84
[  255.618860]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  255.623993]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  255.628606]  [<c01a9976>] copy_page_range+0x146/0x590
[  255.633657]  [<c01225ff>] copy_process+0x8df/0x1250
[  255.638532]  [<c01231ac>] do_fork+0x4c/0x200
[  255.642801]  [<c01022d2>] sys_clone+0x32/0x40
[  255.647154]  [<c01043a1>] sysenter_past_esp+0x6e/0x72
[  255.652200]  =======================
[  255.656668] type is NOT KM_USER0
[  255.659886] ------------[ cut here ]------------
[  255.664482] kernel BUG at arch/x86/mm/highmem_32.c:56!
[  255.669596] invalid opcode: 0000 [#2] PREEMPT SMP 
[  255.674389] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 ipv6 parport_pc lp parport pcspkr iTCO_wdt iTCO_vendor_se
[  255.701549] 
[  255.703029] Pid: 4306, comm: syslogd Tainted: G      D (2.6.24.3xenomaip3 #2)
[  255.710132] EIP: 0060:[<c011a966>] EFLAGS: 00010296 CPU: 0
[  255.715593] EIP is at _kmap_atomic_prot+0xa6/0x120
[  255.720360] EAX: 00000027 EBX: c2b28240 ECX: 02bcf000 EDX: 00000000
[  255.726597] ESI: fffff000 EDI: 00000007 EBP: 00000007 ESP: dedf5ebc
[  255.732835]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  255.738208] Process syslogd (pid: 4306, ti=dedf4000 task=decc0c10 task.ti=dedf4000)<0>
[  255.745656] I-pipe domain Linux
[  255.749046] Stack: c039a438 decc0c10 c01388c0 00000a21 c03a1f9e 00000163 00000a21 df806128 
[  255.757438]        c044a720 f7fd73c0 c011a9fa c03a1f9e 00000a21 00000000 c01a8d71 00000a21 
[  255.765829]        00000000 00000002 f7a5c7dc 00000000 b7f48390 df806128 f7fd73c0 df8a8b7c 
[  255.774221] Call Trace:
[  255.776835]  [<c01388c0>] autoremove_wake_function+0x0/0x50
[  255.782391]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  255.786995]  [<c01a8d71>] handle_mm_fault+0x91/0x690
[  255.791946]  [<c01bac7a>] do_readv_writev+0x13a/0x1b0
[  255.796983]  [<c0317b36>] do_page_fault+0x336/0x750
[  255.801849]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  255.807578]  [<c01bad2c>] vfs_writev+0x3c/0x50
[  255.812010]  [<c0315ecb>] error_code+0x77/0x84
[  255.816443]  =======================
[  255.819999] Code: 04 82 8b 35 f8 4a 3d c0 8d 2c 38 8d 04 ad 00 00 00 00 29 c1 8b 01 85 c0 74 1b 83 ff 03 74 3f c7 04 24 3 
[  255.839315] EIP: [<c011a966>] _kmap_atomic_prot+0xa6/0x120 SS:ESP 0068:dedf5ebc
[  255.846634] ---[ end trace 6f16dfbea90303ec ]---
[  255.851233] note: syslogd[4306] exited with preempt_count 1
[  255.856790] type is NOT KM_USER0
[  255.860006] ------------[ cut here ]------------
[  255.864602] kernel BUG at arch/x86/mm/highmem_32.c:56!
[  255.869717] invalid opcode: 0000 [#3] PREEMPT SMP 
[  255.874510] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 ipv6 parport_pc lp parport pcspkr iTCO_wdt iTCO_vendor_se
[  255.901662] 
[  255.903143] Pid: 4306, comm: syslogd Tainted: G      D (2.6.24.3xenomaip3 #2)
[  255.910245] EIP: 0060:[<c011a966>] EFLAGS: 00010282 CPU: 0
[  255.915706] EIP is at _kmap_atomic_prot+0xa6/0x120
[  255.920472] EAX: 00000027 EBX: c2b281a0 ECX: 02bcf000 EDX: 00000000
[  255.926709] ESI: fffff000 EDI: 00000007 EBP: 00000007 ESP: dedf5c80
[  255.932947]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  255.938319] Process syslogd (pid: 4306, ti=dedf4000 task=decc0c10 task.ti=dedf4000)<0>
[  255.945768] I-pipe domain Linux
[  255.949158] Stack: c039a438 c044a5e0 c0449280 000002bc c03a1f9e 00000163 000002bc df9c1128 
[  255.957547]        08048000 00000000 c011a9fa c03a1f9e 000002bc 00000000 c01a7611 000002bc 
[  255.965935]        0804efff 00000000 df9c1128 dedf5d28 00000000 00000001 0804f000 df8a8080 
[  255.974323] Call Trace:
[  255.976937]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  255.981542]  [<c01a7611>] unmap_vmas+0x181/0x5a0
[  255.986148]  [<c01aae6d>] exit_mmap+0x7d/0x120
[  255.990580]  [<c0121c40>] mmput+0x30/0xe0
[  255.994578]  [<c012779c>] do_exit+0x14c/0x820
[  255.998924]  [<c0105b67>] die+0x267/0x270
[  256.002924]  [<c0105e81>] do_invalid_op+0x81/0x90
[  256.007615]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  256.012738]  [<c015667c>] __xirq_end+0x0/0x50
[  256.017083]  [<c01569da>] __ipipe_unstall_root+0x4a/0x50
[  256.022377]  [<c012493a>] vprintk+0x2ca/0x3b0
[  256.026721]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  256.032451]  [<c0315ecb>] error_code+0x77/0x84
[  256.036882]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  256.042005]  [<c01388c0>] autoremove_wake_function+0x0/0x50
[  256.047562]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  256.052165]  [<c01a8d71>] handle_mm_fault+0x91/0x690
[  256.057115]  [<c01bac7a>] do_readv_writev+0x13a/0x1b0
[  256.062153]  [<c0317b36>] do_page_fault+0x336/0x750
[  256.067016]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  256.072744]  [<c01bad2c>] vfs_writev+0x3c/0x50
[  256.077175]  [<c0315ecb>] error_code+0x77/0x84
[  256.081607]  =======================
[  256.085164] Code: 04 82 8b 35 f8 4a 3d c0 8d 2c 38 8d 04 ad 00 00 00 00 29 c1 8b 01 85 c0 74 1b 83 ff 03 74 3f c7 04 24 3 
[  256.104476] EIP: [<c011a966>] _kmap_atomic_prot+0xa6/0x120 SS:ESP 0068:dedf5c80
[  256.111791] ---[ end trace 6f16dfbea90303ec ]---
[  256.116389] Fixing recursive fault but reboot is needed!
[  256.121683] BUG: scheduling while atomic: syslogd/4306/0x00000004
[  256.127753] Pid: 4306, comm: syslogd Tainted: G      D 2.6.24.3xenomaip3 #2
[  256.134688]  [<c0313fed>] schedule+0x53d/0x770
[  256.139130]  [<c0235908>] cfq_free_io_context+0xb8/0xc0
[  256.144351]  [<c0127e39>] do_exit+0x7e9/0x820
[  256.148705]  [<c0124a8b>] printk+0x6b/0xf0
[  256.152800]  [<c0105b67>] die+0x267/0x270
[  256.156807]  [<c0105e81>] do_invalid_op+0x81/0x90
[  256.161506]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  256.166642]  [<c015667c>] __xirq_end+0x0/0x50
[  256.170997]  [<c01569da>] __ipipe_unstall_root+0x4a/0x50
[  256.176301]  [<c012493a>] vprintk+0x2ca/0x3b0
[  256.180654]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  256.186394]  [<c0315ecb>] error_code+0x77/0x84
[  256.190836]  [<c01100d8>] generic_set_all+0x198/0x2e0
[  256.195882]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  256.201016]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  256.205628]  [<c01a7611>] unmap_vmas+0x181/0x5a0
[  256.210243]  [<c01aae6d>] exit_mmap+0x7d/0x120
[  256.214684]  [<c0121c40>] mmput+0x30/0xe0
[  256.218691]  [<c012779c>] do_exit+0x14c/0x820
[  256.223046]  [<c0105b67>] die+0x267/0x270
[  256.227053]  [<c0105e81>] do_invalid_op+0x81/0x90
[  256.231754]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  256.236887]  [<c015667c>] __xirq_end+0x0/0x50
[  256.241242]  [<c01569da>] __ipipe_unstall_root+0x4a/0x50
[  256.246548]  [<c012493a>] vprintk+0x2ca/0x3b0
[  256.250902]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  256.256640]  [<c0315ecb>] error_code+0x77/0x84
[  256.261080]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  256.266212]  [<c01388c0>] autoremove_wake_function+0x0/0x50
[  256.271778]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  256.276391]  [<c01a8d71>] handle_mm_fault+0x91/0x690
[  256.281351]  [<c01bac7a>] do_readv_writev+0x13a/0x1b0
[  256.286397]  [<c0317b36>] do_page_fault+0x336/0x750
[  256.291273]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  256.297011]  [<c01bad2c>] vfs_writev+0x3c/0x50
[  256.301450]  [<c0315ecb>] error_code+0x77/0x84
[  256.305892]  =======================
[  308.807005] type is NOT KM_USER0
[  308.810230] ------------[ cut here ]------------
[  308.814826] kernel BUG at arch/x86/mm/highmem_32.c:56!
[  308.819938] invalid opcode: 0000 [#4] PREEMPT SMP 
[  308.824733] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 ipv6 parport_pc lp parport pcspkr iTCO_wdt iTCO_vendor_se
[  308.851890] 
[  308.853371] Pid: 4944, comm: gdmgreeter Tainted: G      D (2.6.24.3xenomaip3 #2)
[  308.860734] EIP: 0060:[<c011a966>] EFLAGS: 00210296 CPU: 0
[  308.866196] EIP is at _kmap_atomic_prot+0xa6/0x120
[  308.870963] EAX: 00000027 EBX: c2b21ca0 ECX: 02bcf000 EDX: 00000000
[  308.877200] ESI: fffff000 EDI: 00000007 EBP: 00000007 ESP: df69debc
[  308.883439]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  308.888813] Process gdmgreeter (pid: 4944, ti=df69c000 task=decc06a0 task.ti=df69c000)<0>
[  308.896521] I-pipe domain Linux
[  308.899913] Stack: c039a438 f7c1d000 c01c4888 00000a21 c03a1f9e 00000163 00000a21 dfb9f56c 
[  308.908304]        c044a720 df9b6200 c011a9fa c03a1f9e 00000a21 00000001 c01a8d71 00000a21 
[  308.916694]        0100809b 000081a4 00000001 00000000 08354e98 dfb9f56c df9b6200 df9f7080 
[  308.925085] Call Trace:
[  308.927697]  [<c01c4888>] do_path_lookup+0x78/0x1c0
[  308.932562]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  308.937167]  [<c01a8d71>] handle_mm_fault+0x91/0x690
[  308.942118]  [<c0317b36>] do_page_fault+0x336/0x750
[  308.946984]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  308.952714]  [<c0315ecb>] error_code+0x77/0x84
[  308.957147]  =======================
[  308.960703] Code: 04 82 8b 35 f8 4a 3d c0 8d 2c 38 8d 04 ad 00 00 00 00 29 c1 8b 01 85 c0 74 1b 83 ff 03 74 3f c7 04 24 3 
[  308.980015] EIP: [<c011a966>] _kmap_atomic_prot+0xa6/0x120 SS:ESP 0068:df69debc
[  308.987334] ---[ end trace 6f16dfbea90303ec ]---
[  308.991933] note: gdmgreeter[4944] exited with preempt_count 1
[  308.997758] type is NOT KM_USER0
[  309.000975] ------------[ cut here ]------------
[  309.005570] kernel BUG at arch/x86/mm/highmem_32.c:56!
[  309.010683] invalid opcode: 0000 [#5] PREEMPT SMP 
[  309.015476] Modules linked in: rfcomm l2cap bluetooth ppdev sbp2 ipv6 parport_pc lp parport pcspkr iTCO_wdt iTCO_vendor_se
[  309.042627] 
[  309.044107] Pid: 4944, comm: gdmgreeter Tainted: G      D (2.6.24.3xenomaip3 #2)
[  309.051469] EIP: 0060:[<c011a966>] EFLAGS: 00210282 CPU: 0
[  309.056929] EIP is at _kmap_atomic_prot+0xa6/0x120
[  309.061696] EAX: 00000027 EBX: c2b21ca0 ECX: 02bcf000 EDX: 00000000
[  309.067933] ESI: fffff000 EDI: 00000007 EBP: 00000007 ESP: df69dc80
[  309.074171]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  309.079544] Process gdmgreeter (pid: 4944, ti=df69c000 task=decc06a0 task.ti=df69c000)<0>
[  309.087252] I-pipe domain Linux
[  309.090644] Stack: c039a438 c044a5e0 c0449280 000002bc c03a1f9e 00000163 000002bc dfb9f668 
[  309.099031]        08048000 00000000 c011a9fa c03a1f9e 000002bc 00000000 c01a7611 000002bc 
[  309.107418]        08075fff 00000000 dfb9f668 df69dd28 00000000 00000001 08076000 df9f7080 
[  309.115807] Call Trace:
[  309.118421]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  309.123026]  [<c01a7611>] unmap_vmas+0x181/0x5a0
[  309.127631]  [<c01aae6d>] exit_mmap+0x7d/0x120
[  309.132062]  [<c0121c40>] mmput+0x30/0xe0
[  309.136061]  [<c012779c>] do_exit+0x14c/0x820
[  309.140406]  [<c0105b67>] die+0x267/0x270
[  309.144405]  [<c0105e81>] do_invalid_op+0x81/0x90
[  309.149095]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  309.154217]  [<c013007b>] kill_pgrp+0xb/0x20
[  309.158474]  [<c015667c>] __xirq_end+0x0/0x50
[  309.162817]  [<c0110060>] generic_set_all+0x120/0x2e0
[  309.167856]  [<c01569da>] __ipipe_unstall_root+0x4a/0x50
[  309.173150]  [<c012493a>] vprintk+0x2ca/0x3b0
[  309.177496]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  309.183227]  [<c0315ecb>] error_code+0x77/0x84
[  309.187659]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  309.192782]  [<c01c4888>] do_path_lookup+0x78/0x1c0
[  309.197646]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  309.202250]  [<c01a8d71>] handle_mm_fault+0x91/0x690
[  309.207201]  [<c0317b36>] do_page_fault+0x336/0x750
[  309.212067]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  309.217798]  [<c0315ecb>] error_code+0x77/0x84
[  309.222230]  =======================
[  309.225786] Code: 04 82 8b 35 f8 4a 3d c0 8d 2c 38 8d 04 ad 00 00 00 00 29 c1 8b 01 85 c0 74 1b 83 ff 03 74 3f c7 04 24 3 
[  309.245098] EIP: [<c011a966>] _kmap_atomic_prot+0xa6/0x120 SS:ESP 0068:df69dc80
[  309.252414] ---[ end trace 6f16dfbea90303ec ]---
[  309.257013] Fixing recursive fault but reboot is needed!
[  309.262306] BUG: scheduling while atomic: gdmgreeter/4944/0x00000004
[  309.268635] Pid: 4944, comm: gdmgreeter Tainted: G      D 2.6.24.3xenomaip3 #2
[  309.275830]  [<c0313fed>] schedule+0x53d/0x770
[  309.280272]  [<c0235908>] cfq_free_io_context+0xb8/0xc0
[  309.285492]  [<c0127e39>] do_exit+0x7e9/0x820
[  309.289847]  [<c0124a8b>] printk+0x6b/0xf0
[  309.293943]  [<c0105b67>] die+0x267/0x270
[  309.297951]  [<c0105e81>] do_invalid_op+0x81/0x90
[  309.302651]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  309.307784]  [<c015667c>] __xirq_end+0x0/0x50
[  309.312138]  [<c01569da>] __ipipe_unstall_root+0x4a/0x50
[  309.317457]  [<c012493a>] vprintk+0x2ca/0x3b0
[  309.321812]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  309.327550]  [<c0315ecb>] error_code+0x77/0x84
[  309.331992]  [<c01100d8>] generic_set_all+0x198/0x2e0
[  309.337037]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  309.342171]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  309.346784]  [<c01a7611>] unmap_vmas+0x181/0x5a0
[  309.351399]  [<c01aae6d>] exit_mmap+0x7d/0x120
[  309.355841]  [<c0121c40>] mmput+0x30/0xe0
[  309.359849]  [<c012779c>] do_exit+0x14c/0x820
[  309.364204]  [<c0105b67>] die+0x267/0x270
[  309.368211]  [<c0105e81>] do_invalid_op+0x81/0x90
[  309.372911]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  309.378044]  [<c013007b>] kill_pgrp+0xb/0x20
[  309.382311]  [<c015667c>] __xirq_end+0x0/0x50
[  309.386664]  [<c0110060>] generic_set_all+0x120/0x2e0
[  309.391712]  [<c01569da>] __ipipe_unstall_root+0x4a/0x50
[  309.397016]  [<c012493a>] vprintk+0x2ca/0x3b0
[  309.401371]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  309.407109]  [<c0315ecb>] error_code+0x77/0x84
[  309.411550]  [<c011a966>] _kmap_atomic_prot+0xa6/0x120
[  309.416682]  [<c01c4888>] do_path_lookup+0x78/0x1c0
[  309.421556]  [<c011a9fa>] _kmap_atomic+0x1a/0x20
[  309.426170]  [<c01a8d71>] handle_mm_fault+0x91/0x690
[  309.431130]  [<c0317b36>] do_page_fault+0x336/0x750
[  309.436003]  [<c011705d>] __ipipe_handle_exception+0xbd/0x220
[  309.441742]  [<c0315ecb>] error_code+0x77/0x84
[  309.446226]  =======================


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01  0:00               ` Tomas Kalibera
@ 2008-04-01  5:52                 ` Gilles Chanteperdrix
  2008-04-01  7:59                   ` Gilles Chanteperdrix
  2008-04-01 13:54                   ` Tomas Kalibera
  0 siblings, 2 replies; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-01  5:52 UTC
  To: Tomas Kalibera; +Cc: xenomai-core

Tomas Kalibera wrote:
 > 
 > I added a missing underscore and re-tried, and none of the debug 
 > messages was printed. I added another one to make sure that there is not 
 > a problem with getting printk messages to the serial console. The 
 > resulting highmem_32.c and the output is attached.
 > 
 > T

The interesting part of the output is the printk that occurs right
before the first bug; what happens afterwards is of little use. Do you
get any output before the first bug?

-- 


					    Gilles.



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01  5:52                 ` Gilles Chanteperdrix
@ 2008-04-01  7:59                   ` Gilles Chanteperdrix
  2008-04-01 13:54                   ` Tomas Kalibera
  1 sibling, 0 replies; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-01  7:59 UTC
  To: Tomas Kalibera; +Cc: xenomai-core

On Tue, Apr 1, 2008 at 7:52 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> Tomas Kalibera wrote:
>   >
>   > I added a missing underscore and re-tried, and none of the debug
>   > messages was printed. I added another one to make sure that there is not
>   > a problem with getting printk messages to the serial console. The
>   > resulting highmem_32.c and the output is attached.
>   >
>   > T
>
>  The interesting part of the output is the printk that occurs right
>  before the first bug; what happens afterwards is of little use. Do you
>  get any output before the first bug?

Besides the kmap_atomic taking place in cow_user_page, copy_pte_range
makes other kmap_atomic calls, and these use KM_PTE0 and KM_PTE1 as the
type value. So we should track these types in highmem_32.c as well.

-- 
 Gilles
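The suggestion above, extending the debug bookkeeping beyond KM_USER0, can be sketched as a userspace model. The sim_* names, the reduced type enum and the CPU count are hypothetical. The idea shown is to replace last_km_user0[NR_CPUS] with one file/line record per (CPU, type) pair, so that an "already mapped" report for KM_PTE0 or KM_PTE1 could also name the last caller that took the slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of atomic-kmap types and CPU count. */
enum sim_km_type { SIM_KM_USER0, SIM_KM_USER1, SIM_KM_PTE0, SIM_KM_PTE1, SIM_KM_TYPE_NR };
#define SIM_NR_CPUS 2

struct sim_km_owner {
	const char *file;
	unsigned line;
};

/* Generalizes last_km_user0[NR_CPUS]: one record per (cpu, type). */
static struct sim_km_owner sim_last_km[SIM_NR_CPUS][SIM_KM_TYPE_NR];

static struct sim_km_owner *sim_slot_owner(int cpu, enum sim_km_type type)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS && type < SIM_KM_TYPE_NR);
	return &sim_last_km[cpu][type];
}

/* On map: remember the caller, for every type instead of KM_USER0 only. */
static void sim_track_map(int cpu, enum sim_km_type type,
			  const char *file, unsigned line)
{
	struct sim_km_owner *o = sim_slot_owner(cpu, type);

	o->file = file;
	o->line = line;
}

/* On unmap: forget the caller again. */
static void sim_track_unmap(int cpu, enum sim_km_type type)
{
	struct sim_km_owner *o = sim_slot_owner(cpu, type);

	o->file = NULL;
	o->line = 0;
}
```

In highmem_32.c, the "already mapped" printk could then report the recorded file and line for whichever type collided, not only for KM_USER0.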



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01  5:52                 ` Gilles Chanteperdrix
  2008-04-01  7:59                   ` Gilles Chanteperdrix
@ 2008-04-01 13:54                   ` Tomas Kalibera
  2008-04-01 14:03                     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-01 13:54 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Tomas Kalibera wrote:
>  > 
>  > I added a missing underscore and re-tried, and none of the debug 
>  > messages was printed. I added another one to make sure that there is not 
>  > a problem with getting printk messages to the serial console. The 
>  > resulting highmem_32.c and the output is attached.
>  > 
>  > T
>
> The interesting part of the output is the printk that occurs right
> before the first bug; what happens afterwards is of little use. Do you
> get any output before the first bug?
>   
There is no output before the first bug. This is why I added the other 
printk to make sure there is always one on the path leading to the 
BUG(). After I did that, the newly added message ("type is NOT 
KM_USER0") appeared in the serial console log after the bug. This is 
what made me believe that this is the printk we care about, and why I 
included the longer output.

Tomas




* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01 13:54                   ` Tomas Kalibera
@ 2008-04-01 14:03                     ` Gilles Chanteperdrix
  2008-04-01 15:45                       ` Tomas Kalibera
  0 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-01 14:03 UTC
  To: Tomas Kalibera; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]

On Tue, Apr 1, 2008 at 3:54 PM, Tomas Kalibera <kalibera@domain.hid> wrote:
>
> Gilles Chanteperdrix wrote:
>  > Tomas Kalibera wrote:
>  >  >
>  >  > I added a missing underscore and re-tried, and none of the debug
>  >  > messages was printed. I added another one to make sure that there is not
>  >  > a problem with getting printk messages to the serial console. The
>  >  > resulting highmem_32.c and the output is attached.
>  >  >
>  >  > T
>  >
>  > The interesting part of the output is the printk that occurs right
>  > before the first bug; what happens afterwards is of little use. Do you
>  > get any output before the first bug?
>  >
>  There is no output before the first bug. This is why I added the other
>  printk to make sure there is always one on the path leading to the
>  BUG(). After I did that, the newly added message ("type is NOT
>  KM_USER0") appeared in the serial console log after the bug. This is
>  what made me believe that this is the printk we care about, and why I
>  included the longer output.

OK. In the meantime, I think I may have found the reason for the
crash. Try this patch: you should get a printk and a stack trace
instead of the bug.

-- 
 Gilles

[-- Attachment #2: ipipe-kmap_atomic-bug.4.diff --]
[-- Type: application/octet-stream, Size: 503 bytes --]

diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index 1c3bf95..f4bae3b 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -69,6 +69,9 @@ void kunmap_atomic(void *kvaddr, enum km_type type)
 		BUG_ON(vaddr < PAGE_OFFSET);
 		BUG_ON(vaddr >= (unsigned long)high_memory);
 #endif
+		printk("Wrong address passed to kunmap_atomic!\n");
+		show_stack(NULL, NULL);
+		kpte_clear_flush(kmap_pte-idx, __fix_to_virt(FIX_KMAP_BEGIN+idx));
 	}
 
 	arch_flush_lazy_mmu_mode();
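What the patch changes can be illustrated with a single-slot userspace model. The names are hypothetical, and SIM_SLOT_VADDR is an arbitrary stand-in for the slot's fixmap address. In the unpatched kunmap_atomic(), an address that does not match the slot leaves the slot's PTE in place, so the next kmap_atomic() of the same type on that CPU finds it non-empty and hits BUG(). With the patch, the mismatch is reported and the slot is cleared anyway. The model only illustrates the hypothesis behind the patch; it does not show that this is what happens in the crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for __fix_to_virt(FIX_KMAP_BEGIN + idx) of one slot. */
#define SIM_SLOT_VADDR 0xfff60000UL

static unsigned long sim_slot_pte;  /* 0 plays the role of pte_none() */
static int sim_wrong_addr_reports;  /* counts the patch's printk */

/* kmap_atomic() of a highmem page: false where the kernel would BUG(). */
static bool sim_map(unsigned long pfn)
{
	assert(pfn != 0); /* 0 is reserved for "empty" */
	if (sim_slot_pte != 0)
		return false;
	sim_slot_pte = pfn;
	return true;
}

/* kunmap_atomic(): the unpatched code only clears the slot when given the
 * slot's own address; apply_fixup mimics ipipe-kmap_atomic-bug.4.diff. */
static void sim_unmap(unsigned long vaddr, bool apply_fixup)
{
	if (vaddr == SIM_SLOT_VADDR) {
		sim_slot_pte = 0;
	} else if (apply_fixup) {
		sim_wrong_addr_reports++;  /* printk + show_stack */
		sim_slot_pte = 0;          /* kpte_clear_flush on the slot */
	}
}
```

Without the fixup, one mismatched unmap is enough to make the next map of that slot fail. With the fixup, the same sequence succeeds and leaves a report behind.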


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01 14:03                     ` Gilles Chanteperdrix
@ 2008-04-01 15:45                       ` Tomas Kalibera
  2008-04-01 15:58                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-01 15:45 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]


The stack trace is printed even before the kernel has finished booting;
the first occurrence is below. It then repeats so frequently that the
system is unusable (so I could not run the Xenomai task).

I've attached the highmem_32.c I used.

Tomas

...

[   14.232387] I-pipe 2.0-03: pipeline enabled.
[   14.233719] Console: colour VGA+ 80x25
[   14.233723] console [tty0] enabled
[   14.235351] console [ttyS0] enabled
[   14.477149] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[   14.484474] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[   14.725139] Memory: 3588720k/4194304k available (2152k kernel code, 78516k reserved, 904k data, 296k init, 2751016k highmem)
[   14.736370] virtual kernel memory layout:
[   14.736371]     fixmap  : 0xfff52000 - 0xfffff000   ( 692 kB)
[   14.736371]     pkmap   : 0xff800000 - 0xffc00000   (4096 kB)
[   14.736372]     vmalloc : 0xf8800000 - 0xff7fe000   ( 111 MB)
[   14.736373]     lowmem  : 0xc0000000 - 0xf8000000   ( 896 MB)
[   14.736374]       .init : 0xc0404000 - 0xc044e000   ( 296 kB)
[   14.736375]       .data : 0xc031a3a6 - 0xc03fc664   ( 904 kB)
[   14.736375]       .text : 0xc0100000 - 0xc031a3a6   (2152 kB)
[   14.780692] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[   14.848936] Calibrating delay using timer specific routine.. 7583.33 BogoMIPS (lpj=3791666)
[   14.857374] Security Framework initialized
[   14.861493] SELinux:  Disabled at boot.
[   14.865360] Mount-cache hash table entries: 512
[   14.869987] Wrong address passed to kunmap_atomic!
[   14.874796]        c03ffebc 00000000 00000000 00000003 00000000 c0105fbd c0394b4a 00048000 
[   14.883277]        00000003 c011aa9e c039a858 c16f8240 c16f8220 f7c12000 c019e9bd 0000007a 
[   14.891970]        00000044 00000000 c01b7210 f7c11000 00000001 00000001 00000000 c03dbc80 
[   14.900661] Call Trace:
[   14.903348]  [<c0105fbd>] show_stack+0x2d/0x40
[   14.907856]  [<c011aa9e>] kunmap_atomic+0x4e/0xf0
[   14.912617]  [<c019e9bd>] get_page_from_freelist+0x2ed/0x4d0
[   14.918333]  [<c01b7210>] do_ccupdate_local+0x0/0x40
[   14.923358]  [<c019ec47>] __alloc_pages+0x57/0x360
[   14.928208]  [<c01b7e89>] enable_cpucache+0x29/0xb0
[   14.933141]  [<c01d15c5>] alloc_vfsmnt+0x95/0xd0
[   14.937816]  [<c019efec>] get_zeroed_page+0x3c/0x50
[   14.942750]  [<c01bc8f6>] vfs_kern_mount+0x56/0x120
[   14.947682]  [<c01bc9d2>] kern_mount_data+0x12/0x20
[   14.952616]  [<c041842d>] proc_root_init+0x2d/0xb0
[   14.957464]  [<c0404a6d>] start_kernel+0x2fd/0x3a0
[   14.962311]  [<c0404140>] unknown_bootoption+0x0/0x1f0
[   14.967506]  =======================
[   14.971152] monitor/mwait feature present.
[   14.975271] CPU: Trace cache: 12K uops, L1 D cache: 16K
[   14.980553] CPU: L2 cache: 2048K
[   14.983804] CPU: Physical Processor ID: 0
...
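One plausible reading of this flood, offered as an assumption rather than as a conclusion reached in the thread: for a lowmem page, _kmap_atomic_prot() returns page_address(page) without touching the slot. The matching kunmap_atomic() therefore receives a direct-mapped address and takes the else branch, whose CONFIG_DEBUG_HIGHMEM checks accept lowmem addresses. With the patch applied, that branch now prints the message. The trace above is consistent with this reading, since get_zeroed_page() allocates lowmem pages. The model below uses the lowmem range from the boot log; the slot address and the sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Lowmem range taken from the boot log above; the slot address is a
 * hypothetical value inside the printed fixmap range. */
#define SIM_PAGE_OFFSET 0xc0000000UL
#define SIM_HIGH_MEMORY 0xf8000000UL
#define SIM_SLOT_VADDR  0xfff60000UL

static int sim_reports; /* counts "Wrong address passed to kunmap_atomic!" */

/* Like _kmap_atomic_prot(): a lowmem page is returned at its direct-map
 * address and the slot is never used. */
static unsigned long sim_kmap(bool highmem, unsigned long direct_addr)
{
	return highmem ? SIM_SLOT_VADDR : direct_addr;
}

/* Like the patched kunmap_atomic(): every non-slot address takes the else
 * branch, including perfectly valid lowmem addresses. */
static void sim_kunmap(unsigned long vaddr)
{
	if (vaddr != SIM_SLOT_VADDR) {
		/* The CONFIG_DEBUG_HIGHMEM checks would accept this address. */
		assert(vaddr >= SIM_PAGE_OFFSET && vaddr < SIM_HIGH_MEMORY);
		sim_reports++;
	}
}
```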


Gilles Chanteperdrix wrote:
> OK. In the meantime, I think I may have found the reason for the
> crash. Try this patch: you should get a printk and a stack trace
> instead of the bug.
>
>   


[-- Attachment #2: highmem_32.c --]
[-- Type: text/x-csrc, Size: 3962 bytes --]

#include <linux/highmem.h>
#include <linux/module.h>

static struct {
	const char *file;
	unsigned line;
} last_km_user0 [NR_CPUS];

void *kmap(struct page *page)
{
	might_sleep();
	if (!PageHighMem(page))
		return page_address(page);
	return kmap_high(page);
}

void kunmap(struct page *page)
{
	if (in_interrupt())
		BUG();
	if (!PageHighMem(page))
		return;
	kunmap_high(page);
}

/*
 * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because
 * no global lock is needed and because the kmap code must perform a global TLB
 * invalidation when the kmap pool wraps.
 *
 * However when holding an atomic kmap it is not legal to sleep, so atomic
 * kmaps are appropriate for short, tight code paths only.
 */
void *_kmap_atomic_prot(struct page *page, enum km_type type,
			pgprot_t prot, const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
	pagefault_disable();

	if (!PageHighMem(page))
		return page_address(page);

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0)
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}

	set_pte(kmap_pte-idx, mk_pte(page, prot));
	arch_flush_lazy_mmu_mode();

	return (void *)vaddr;
}

void *_kmap_atomic(struct page *page, enum km_type type,
		   const char *file, unsigned line)
{
	return _kmap_atomic_prot(page, type, kmap_prot, file, line);
}

void kunmap_atomic(void *kvaddr, enum km_type type)
{
	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
	enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();

	/*
	 * Force other mappings to Oops if they'll try to access this pte
	 * without first remapping it.  Keeping stale mappings around is a bad idea
	 * also, in case the page changes cacheability attributes or becomes
	 * a protected page in a hypervisor.
	 */
	if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
		kpte_clear_flush(kmap_pte-idx, vaddr);
	else {
#ifdef CONFIG_DEBUG_HIGHMEM
		BUG_ON(vaddr < PAGE_OFFSET);
		BUG_ON(vaddr >= (unsigned long)high_memory);
#endif
		printk("Wrong address passed to kunmap_atomic!\n");
		show_stack(NULL, NULL);
		kpte_clear_flush(kmap_pte-idx, __fix_to_virt(FIX_KMAP_BEGIN+idx));
	}
	if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = NULL;
		last_km_user0[smp_processor_id()].line = 0;
	}

	arch_flush_lazy_mmu_mode();
	pagefault_enable();
}

/* This is the same as kmap_atomic() but can map memory that doesn't
 * have a struct page associated with it.
 */
void *_kmap_atomic_pfn(unsigned long pfn, enum km_type type,
		       const char *file, unsigned line)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	pagefault_disable();

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	if (!pte_none(*(kmap_pte-idx))) {
		if (type == KM_USER0)
			printk("KM_USER0 already mapped at %s:%d\n",
			       last_km_user0[smp_processor_id()].file,
			       last_km_user0[smp_processor_id()].line);
		BUG();
	} else if (type == KM_USER0) {
		last_km_user0[smp_processor_id()].file = file;
		last_km_user0[smp_processor_id()].line = line;
	}
	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
	arch_flush_lazy_mmu_mode();

	return (void*) vaddr;
}

struct page *kmap_atomic_to_page(void *ptr)
{
	unsigned long idx, vaddr = (unsigned long)ptr;
	pte_t *pte;

	if (vaddr < FIXADDR_START)
		return virt_to_page(ptr);

	idx = virt_to_fix(vaddr);
	pte = kmap_pte - (idx - FIX_KMAP_BEGIN);
	return pte_page(*pte);
}

EXPORT_SYMBOL(kmap);
EXPORT_SYMBOL(kunmap);
EXPORT_SYMBOL(_kmap_atomic);
EXPORT_SYMBOL(kunmap_atomic);
EXPORT_SYMBOL(kmap_atomic_to_page);
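The slot arithmetic in `_kmap_atomic_pfn()` and `kunmap_atomic()` above gives every (km_type, CPU) pair its own fixmap page. The index is `idx = type + KM_TYPE_NR*smp_processor_id()`, and on x86_32 `__fix_to_virt(x)` is `FIXADDR_TOP - (x << PAGE_SHIFT)`, so higher indices sit at lower addresses, which is why the PTE is addressed as `kmap_pte-idx`. The following user-space sketch models that arithmetic. The `MODEL_*` constants are hypothetical stand-ins, not the values of this 2.6.24 build.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only. */
#define MODEL_PAGE_SHIFT     12
#define MODEL_KM_TYPE_NR     13          /* km_type slots per CPU (assumed) */
#define MODEL_FIX_KMAP_BEGIN 16          /* first kmap fixmap index (assumed) */
#define MODEL_FIXADDR_TOP    0xfffff000u /* top of the fixmap area (assumed) */

/* Mirrors idx = type + KM_TYPE_NR * smp_processor_id(). */
static unsigned kmap_slot_index(unsigned type, unsigned cpu)
{
	assert(type < MODEL_KM_TYPE_NR);
	return type + MODEL_KM_TYPE_NR * cpu;
}

/* Mirrors __fix_to_virt(FIX_KMAP_BEGIN + idx): slots grow downward. */
static uint32_t kmap_slot_vaddr(unsigned type, unsigned cpu)
{
	unsigned idx = kmap_slot_index(type, cpu);

	return MODEL_FIXADDR_TOP -
	       ((uint32_t)(MODEL_FIX_KMAP_BEGIN + idx) << MODEL_PAGE_SHIFT);
}
```

In this model, no two (type, CPU) pairs share a page. A second `kmap_atomic()` with the same type on the same CPU, issued before the matching `kunmap_atomic()`, therefore finds a live PTE in its slot. That is the condition the debug `BUG()` above reports.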


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01 15:45                       ` Tomas Kalibera
@ 2008-04-01 15:58                         ` Gilles Chanteperdrix
  2008-04-01 21:23                           ` Tomas Kalibera
  0 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-01 15:58 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

On Tue, Apr 1, 2008 at 5:45 PM, Tomas Kalibera <kalibera@domain.hid> wrote:
>
>  The stack trace starts getting printed even before the kernel boots - the
> first occurrence is below. It then repeats printing so frequently that the
> system is unusable (and I could not run the Xenomai task).
>
>  I've attached the highmem_32.c I used.

OK. You can drop the old patches. What happens if you make the
printk, the stack trace and the fixup conditional on type == KM_PTE0 ||
type == KM_PTE1? As in:

if (type == KM_PTE0 || type == KM_PTE1) {
	printk("Wrong address passed to kunmap_atomic!\n");
	show_stack(NULL, NULL);
	kpte_clear_flush(kmap_pte-idx, __fix_to_virt(FIX_KMAP_BEGIN+idx));
}

-- 
 Gilles
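One plausible explanation for the boot-time flood: `kmap_atomic_prot()` returns `page_address(page)` for lowmem pages without touching the fixmap. As a result, `kunmap_atomic()` legitimately receives addresses that do not match the slot, and the unconditional `printk` in the debug version fires on every one of them. The sketch below models the three paths through `kunmap_atomic()` with the suggested filter. `enum unmap_path` and the `is_pte_type` flag (standing for `type == KM_PTE0 || type == KM_PTE1`) are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_MASK (~(uint32_t)0xfff)   /* 4 KiB pages (assumed) */

/* Hypothetical labels for the three paths through kunmap_atomic(). */
enum unmap_path {
	UNMAP_CLEARED_SLOT,   /* address matches the slot: pte cleared */
	UNMAP_IGNORED,        /* mismatch, not a PTE type: nothing done */
	UNMAP_REPORTED_FIXED  /* mismatch, KM_PTE0/KM_PTE1: printk + forced clear */
};

/*
 * kvaddr: address passed to kunmap_atomic();
 * slot_vaddr: __fix_to_virt(FIX_KMAP_BEGIN+idx) for this type and CPU;
 * is_pte_type: models type == KM_PTE0 || type == KM_PTE1.
 */
static enum unmap_path kunmap_path(uint32_t kvaddr, uint32_t slot_vaddr,
				   int is_pte_type)
{
	uint32_t vaddr = kvaddr & MODEL_PAGE_MASK;

	assert((slot_vaddr & ~MODEL_PAGE_MASK) == 0);  /* slots are page aligned */
	if (vaddr == slot_vaddr)
		return UNMAP_CLEARED_SLOT;
	return is_pte_type ? UNMAP_REPORTED_FIXED : UNMAP_IGNORED;
}
```

With the filter, a call such as `kunmap_path(0xc1234000u, 0xfffef000u, 0)` (a lowmem address with a non-PTE type) returns `UNMAP_IGNORED`. A page-table mapping released at the wrong address is still reported, and its slot is forcibly cleared.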



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01 15:58                         ` Gilles Chanteperdrix
@ 2008-04-01 21:23                           ` Tomas Kalibera
  2008-04-02  8:42                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-01 21:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 1966 bytes --]

OK, I got rid of the old patches and applied only this check (attached). 
Now the kernel boots. When I run my Xenomai app, the kernel locks up 
without writing anything to the console.

I tried to get some info via SysRq keys, but the only options that 
worked were reboot and kill. When I used "kill", the kernel started 
printing the message below over and over.

T

[  356.939852] Wrong address passed to kunmap_atomic!
[  356.944621]        df5b1e74 00000000 00000000 00000007 08102077 c0105fbd c0394b3a 0004c000 
[  356.952854]        00000007 c011a9e1 c039a7ec fffb2000 00000000 c2b94e8c c01a99a0 fffb7000 
[  356.961267]        fffb6000 df57cf44 df99be40 df99b040 f7cbe430 df9a3430 43400000 43000000 
[  356.969681] Call Trace:
[  356.972298]  [<c0105fbd>] show_stack+0x2d/0x40
[  356.976736]  [<c011a9e1>] kunmap_atomic+0x91/0xd0
[  356.981434]  [<c01a99a0>] copy_page_range+0x3f0/0x560
[  356.986481]  [<c012252f>] copy_process+0x8df/0x1250
[  356.991353]  [<c01230dc>] do_fork+0x4c/0x200
[  356.995619]  [<c01022d2>] sys_clone+0x32/0x40
[  356.999970]  [<c01043a1>] sysenter_past_esp+0x6e/0x72




[-- Attachment #2: highmem_32.c --]
[-- Type: text/x-csrc, Size: 3014 bytes --]

#include <linux/highmem.h>
#include <linux/module.h>

void *kmap(struct page *page)
{
	might_sleep();
	if (!PageHighMem(page))
		return page_address(page);
	return kmap_high(page);
}

void kunmap(struct page *page)
{
	if (in_interrupt())
		BUG();
	if (!PageHighMem(page))
		return;
	kunmap_high(page);
}

/*
 * kmap_atomic/kunmap_atomic is significantly faster than kmap/kunmap because
 * no global lock is needed and because the kmap code must perform a global TLB
 * invalidation when the kmap pool wraps.
 *
 * However, when holding an atomic kmap it is not legal to sleep, so atomic
 * kmaps are appropriate for short, tight code paths only.
 */
void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t prot)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
	pagefault_disable();

	if (!PageHighMem(page))
		return page_address(page);

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	BUG_ON(!pte_none(*(kmap_pte-idx)));
	set_pte(kmap_pte-idx, mk_pte(page, prot));
	arch_flush_lazy_mmu_mode();

	return (void *)vaddr;
}

void *kmap_atomic(struct page *page, enum km_type type)
{
	return kmap_atomic_prot(page, type, kmap_prot);
}

void kunmap_atomic(void *kvaddr, enum km_type type)
{
	unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
	enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();

	/*
	 * Force other mappings to Oops if they'll try to access this pte
	 * without first remapping it.  Keeping stale mappings around is a bad idea
	 * also, in case the page changes cacheability attributes or becomes
	 * a protected page in a hypervisor.
	 */
	if (vaddr == __fix_to_virt(FIX_KMAP_BEGIN+idx))
		kpte_clear_flush(kmap_pte-idx, vaddr);
	else {
#ifdef CONFIG_DEBUG_HIGHMEM
		BUG_ON(vaddr < PAGE_OFFSET);
		BUG_ON(vaddr >= (unsigned long)high_memory);
#endif
		if (type == KM_PTE0 || type == KM_PTE1) {
			printk("Wrong address passed to kunmap_atomic!\n");
			show_stack(NULL, NULL);
			kpte_clear_flush(kmap_pte-idx, __fix_to_virt(FIX_KMAP_BEGIN+idx));
		}
	}

	arch_flush_lazy_mmu_mode();
	pagefault_enable();
}

/* This is the same as kmap_atomic() but can map memory that doesn't
 * have a struct page associated with it.
 */
void *kmap_atomic_pfn(unsigned long pfn, enum km_type type)
{
	enum fixed_addresses idx;
	unsigned long vaddr;

	pagefault_disable();

	idx = type + KM_TYPE_NR*smp_processor_id();
	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
	set_pte(kmap_pte-idx, pfn_pte(pfn, kmap_prot));
	arch_flush_lazy_mmu_mode();

	return (void*) vaddr;
}

struct page *kmap_atomic_to_page(void *ptr)
{
	unsigned long idx, vaddr = (unsigned long)ptr;
	pte_t *pte;

	if (vaddr < FIXADDR_START)
		return virt_to_page(ptr);

	idx = virt_to_fix(vaddr);
	pte = kmap_pte - (idx - FIX_KMAP_BEGIN);
	return pte_page(*pte);
}

EXPORT_SYMBOL(kmap);
EXPORT_SYMBOL(kunmap);
EXPORT_SYMBOL(kmap_atomic);
EXPORT_SYMBOL(kunmap_atomic);
EXPORT_SYMBOL(kmap_atomic_to_page);


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-01 21:23                           ` Tomas Kalibera
@ 2008-04-02  8:42                             ` Gilles Chanteperdrix
  2008-04-02 15:02                               ` Tomas Kalibera
  0 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-02  8:42 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 547 bytes --]

On Tue, Apr 1, 2008 at 11:23 PM, Tomas Kalibera <kalibera@domain.hid> wrote:
> OK, I got rid of old patches and applied only this check (attached). Now,
> the kernel boots. When I run my Xenomai app, the kernel locks up, without
> writing anything to console.
>
>  I tried to get some info via SysRq keys, though the only working options
> were reboot and kill. If I did "kill", the kernel started repeating all the
> time the message below.

Ok. Get rid of old patches and try this new one. This should hopefully
be the final fix...

-- 
 Gilles

[-- Attachment #2: ipipe-kmap_atomic-bug.5.diff --]
[-- Type: application/octet-stream, Size: 578 bytes --]

diff --git a/mm/memory.c b/mm/memory.c
index e0e7a3d..32d6cb6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -584,8 +584,14 @@ again:
 			if (is_cow_mapping(vma->vm_flags)) {
 				if (((vma->vm_flags|src_mm->def_flags) & (VM_LOCKED|VM_PINNED))
 				    == (VM_LOCKED|VM_PINNED)) {
+				    	arch_leave_lazy_mmu_mode();
+				    	spin_unlock(src_ptl);
+				    	pte_unmap_nested(src_pte);
+				    	add_mm_rss(dst_mm, rss[0], rss[1]);
+				    	pte_unmap_unlock(dst_pte, dst_ptl);
+				    	cond_resched();
 					do_cow_break = 1;
-					break;
+					goto again;
 				}
 			}
 		}
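A likely explanation for the `copy_page_range` trace is the `break` that this diff removes. In a `do { ... } while (dst_pte++, src_pte++, ...)` loop, `break` skips the increments. When the COW-break condition hits on the very first PTE, the exit path's `pte_unmap_nested(src_pte - 1)` (and likewise `pte_unmap_unlock(dst_pte - 1, ...)`) therefore names the entry before the first one. If that first PTE sits at offset 0 in its page-table page, the address falls in the previous page. `kunmap_atomic()` then sees a mismatch and the slot stays mapped, which is consistent with the `BUG` in `kmap_atomic_prot()` reported at the start of the thread. The patch instead releases `src_pte` and `dst_pte` themselves before `goto again`.

The user-space model below covers only this pointer arithmetic. `model_pte_t` (an 8-byte, PAE-style entry), the addresses, and the function names are assumptions; the entry size does not change the conclusion.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
typedef uint64_t model_pte_t;                 /* PAE-style entry (assumed) */
#define MODEL_PTE_SIZE ((uint32_t)sizeof(model_pte_t))

/*
 * Returns the address copy_pte_range() would hand to pte_unmap_nested()
 * after its loop. first: address of the first src_pte; n: entries in the
 * range; break_at: iteration at which the loop breaks (-1 for none).
 * Addresses are plain integers so "one before the start" is well defined.
 */
static uint32_t unmap_arg_after_loop(uint32_t first, int n, int break_at)
{
	uint32_t src_pte = first;
	int i = 0;

	assert(n > 0);
	do {
		if (i == break_at)
			break;      /* skips the increment in the while clause */
		/* copy_one_pte() would run here */
	} while (src_pte += MODEL_PTE_SIZE, ++i != n);

	return src_pte - MODEL_PTE_SIZE;          /* "src_pte - 1" */
}

/* The test kunmap_atomic() applies: is addr inside the slot's page? */
static int same_page(uint32_t addr, uint32_t slot)
{
	return (addr & ~(MODEL_PAGE_SIZE - 1)) == slot;
}
```

In the model, breaking on the first entry of a page-aligned range yields an address one entry below the slot. For example, `unmap_arg_after_loop(0xfffee000u, 4, 0)` returns `0xfffedff8u`, which `same_page()` rejects. The same break at a non-zero offset stays within the page and goes unnoticed.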


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02  8:42                             ` Gilles Chanteperdrix
@ 2008-04-02 15:02                               ` Tomas Kalibera
  2008-04-02 15:07                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-02 15:02 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 824 bytes --]


OK, no change with this patch compared to the previous situation. The 
system boots, but hangs without a stack trace when I run my Xenomai task. 
SysRq is blocked; this time even SysRq-kill did not work, only 
SysRq-reboot did.

Tomas


[-- Attachment #2: memory.c --]
[-- Type: text/x-csrc, Size: 79055 bytes --]

/*
 *  linux/mm/memory.c
 *
 *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
 */

/*
 * demand-loading started 01.12.91 - seems it is high on the list of
 * things wanted, and it should be easy to implement. - Linus
 */

/*
 * Ok, demand-loading was easy, shared pages a little bit trickier. Shared
 * pages started 02.12.91, seems to work. - Linus.
 *
 * Tested sharing by executing about 30 /bin/sh: under the old kernel it
 * would have taken more than the 6M I have free, but it worked well as
 * far as I could see.
 *
 * Also corrected some "invalidate()"s - I wasn't doing enough of them.
 */

/*
 * Real VM (paging to/from disk) started 18.12.91. Much more work and
 * thought has to go into this. Oh, well..
 * 19.12.91  -  works, somewhat. Sometimes I get faults, don't know why.
 *		Found it. Everything seems to work now.
 * 20.12.91  -  Ok, making the swap-device changeable like the root.
 */

/*
 * 05.04.94  -  Multi-page memory management added for v1.1.
 * 		Idea by Alex Bligh (alex@domain.hid)
 *
 * 16.07.99  -  Support of BIGMEM added by Gerhard Wichert, Siemens AG
 *		(Gerhard.Wichert@domain.hid)
 *
 * Aug/Sep 2004 Changed to four level page tables (Andi Kleen)
 */

#include <linux/kernel_stat.h>
#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/mman.h>
#include <linux/swap.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/rmap.h>
#include <linux/module.h>
#include <linux/delayacct.h>
#include <linux/init.h>
#include <linux/writeback.h>
#include <linux/vmalloc.h>

#include <asm/pgalloc.h>
#include <asm/uaccess.h>
#include <asm/tlb.h>
#include <asm/tlbflush.h>
#include <asm/pgtable.h>

#include <linux/swapops.h>
#include <linux/elf.h>

#ifndef CONFIG_NEED_MULTIPLE_NODES
/* use the per-pgdat data instead for discontigmem - mbligh */
unsigned long max_mapnr;
struct page *mem_map;

EXPORT_SYMBOL(max_mapnr);
EXPORT_SYMBOL(mem_map);
#endif

unsigned long num_physpages;
/*
 * A number of key systems in x86 including ioremap() rely on the assumption
 * that high_memory defines the upper bound on direct map memory, the end
 * of ZONE_NORMAL.  Under CONFIG_DISCONTIG this means that max_low_pfn and
 * highstart_pfn must be the same; there must be no gap between ZONE_NORMAL
 * and ZONE_HIGHMEM.
 */
void * high_memory;

EXPORT_SYMBOL(num_physpages);
EXPORT_SYMBOL(high_memory);

int randomize_va_space __read_mostly = 1;

static int __init disable_randmaps(char *s)
{
	randomize_va_space = 0;
	return 1;
}
__setup("norandmaps", disable_randmaps);


/*
 * If a p?d_bad entry is found while walking page tables, report
 * the error, before resetting entry to p?d_none.  Usually (but
 * very seldom) called out from the p?d_none_or_clear_bad macros.
 */

void pgd_clear_bad(pgd_t *pgd)
{
	pgd_ERROR(*pgd);
	pgd_clear(pgd);
}

void pud_clear_bad(pud_t *pud)
{
	pud_ERROR(*pud);
	pud_clear(pud);
}

void pmd_clear_bad(pmd_t *pmd)
{
	pmd_ERROR(*pmd);
	pmd_clear(pmd);
}

/*
 * Note: this doesn't free the actual pages themselves. That
 * has been handled earlier when unmapping all the memory regions.
 */
static void free_pte_range(struct mmu_gather *tlb, pmd_t *pmd)
{
	struct page *page = pmd_page(*pmd);
	pmd_clear(pmd);
	pte_lock_deinit(page);
	pte_free_tlb(tlb, page);
	dec_zone_page_state(page, NR_PAGETABLE);
	tlb->mm->nr_ptes--;
}

static inline void free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
				unsigned long addr, unsigned long end,
				unsigned long floor, unsigned long ceiling)
{
	pmd_t *pmd;
	unsigned long next;
	unsigned long start;

	start = addr;
	pmd = pmd_offset(pud, addr);
	do {
		next = pmd_addr_end(addr, end);
		if (pmd_none_or_clear_bad(pmd))
			continue;
		free_pte_range(tlb, pmd);
	} while (pmd++, addr = next, addr != end);

	start &= PUD_MASK;
	if (start < floor)
		return;
	if (ceiling) {
		ceiling &= PUD_MASK;
		if (!ceiling)
			return;
	}
	if (end - 1 > ceiling - 1)
		return;

	pmd = pmd_offset(pud, start);
	pud_clear(pud);
	pmd_free_tlb(tlb, pmd);
}

static inline void free_pud_range(struct mmu_gather *tlb, pgd_t *pgd,
				unsigned long addr, unsigned long end,
				unsigned long floor, unsigned long ceiling)
{
	pud_t *pud;
	unsigned long next;
	unsigned long start;

	start = addr;
	pud = pud_offset(pgd, addr);
	do {
		next = pud_addr_end(addr, end);
		if (pud_none_or_clear_bad(pud))
			continue;
		free_pmd_range(tlb, pud, addr, next, floor, ceiling);
	} while (pud++, addr = next, addr != end);

	start &= PGDIR_MASK;
	if (start < floor)
		return;
	if (ceiling) {
		ceiling &= PGDIR_MASK;
		if (!ceiling)
			return;
	}
	if (end - 1 > ceiling - 1)
		return;

	pud = pud_offset(pgd, start);
	pgd_clear(pgd);
	pud_free_tlb(tlb, pud);
}

/*
 * This function frees user-level page tables of a process.
 *
 * Must be called with pagetable lock held.
 */
void free_pgd_range(struct mmu_gather **tlb,
			unsigned long addr, unsigned long end,
			unsigned long floor, unsigned long ceiling)
{
	pgd_t *pgd;
	unsigned long next;
	unsigned long start;

	/*
	 * The next few lines have given us lots of grief...
	 *
	 * Why are we testing PMD* at this top level?  Because often
	 * there will be no work to do at all, and we'd prefer not to
	 * go all the way down to the bottom just to discover that.
	 *
	 * Why all these "- 1"s?  Because 0 represents both the bottom
	 * of the address space and the top of it (using -1 for the
	 * top wouldn't help much: the masks would do the wrong thing).
	 * The rule is that addr 0 and floor 0 refer to the bottom of
	 * the address space, but end 0 and ceiling 0 refer to the top.
	 * Comparisons need to use "end - 1" and "ceiling - 1" (though
	 * that end 0 case should be mythical).
	 *
	 * Wherever addr is brought up or ceiling brought down, we must
	 * be careful to reject "the opposite 0" before it confuses the
	 * subsequent tests.  But what about where end is brought down
	 * by PMD_SIZE below? no, end can't go down to 0 there.
	 *
	 * Whereas we round start (addr) and ceiling down, by different
	 * masks at different levels, in order to test whether a table
	 * now has no other vmas using it, so can be freed, we don't
	 * bother to round floor or end up - the tests don't need that.
	 */

	addr &= PMD_MASK;
	if (addr < floor) {
		addr += PMD_SIZE;
		if (!addr)
			return;
	}
	if (ceiling) {
		ceiling &= PMD_MASK;
		if (!ceiling)
			return;
	}
	if (end - 1 > ceiling - 1)
		end -= PMD_SIZE;
	if (addr > end - 1)
		return;

	start = addr;
	pgd = pgd_offset((*tlb)->mm, addr);
	do {
		next = pgd_addr_end(addr, end);
		if (pgd_none_or_clear_bad(pgd))
			continue;
		free_pud_range(*tlb, pgd, addr, next, floor, ceiling);
	} while (pgd++, addr = next, addr != end);
}
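The `- 1` comparisons described in the comment inside `free_pgd_range()` rely on unsigned wrap-around. Here is a minimal sketch with 32-bit values. The helper names are hypothetical; only the comparison itself is taken from `free_pgd_range()`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The "- 1" test from free_pgd_range(): nonzero when end lies above
 * ceiling. As in the kernel, 0 passed as end or ceiling means the top
 * of the (here 32-bit) address space, so ceiling - 1 wraps to the
 * highest address and can never be exceeded.
 */
static int end_past_ceiling(uint32_t end, uint32_t ceiling)
{
	return (uint32_t)(end - 1u) > (uint32_t)(ceiling - 1u);
}

/* The naive comparison, which treats ceiling 0 as the bottom. */
static int naive_end_past_ceiling(uint32_t end, uint32_t ceiling)
{
	return end > ceiling;
}
```

For instance, `end_past_ceiling(0x08000000u, 0)` is 0 because nothing lies above the top. By contrast, `naive_end_past_ceiling(0x08000000u, 0)` is 1, which in `free_pgd_range()` would wrongly pull `end` down by `PMD_SIZE`.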

void free_pgtables(struct mmu_gather **tlb, struct vm_area_struct *vma,
		unsigned long floor, unsigned long ceiling)
{
	while (vma) {
		struct vm_area_struct *next = vma->vm_next;
		unsigned long addr = vma->vm_start;

		/*
		 * Hide vma from rmap and vmtruncate before freeing pgtables
		 */
		anon_vma_unlink(vma);
		unlink_file_vma(vma);

		if (is_vm_hugetlb_page(vma)) {
			hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
				floor, next? next->vm_start: ceiling);
		} else {
			/*
			 * Optimization: gather nearby vmas into one call down
			 */
			while (next && next->vm_start <= vma->vm_end + PMD_SIZE
			       && !is_vm_hugetlb_page(next)) {
				vma = next;
				next = vma->vm_next;
				anon_vma_unlink(vma);
				unlink_file_vma(vma);
			}
			free_pgd_range(tlb, addr, vma->vm_end,
				floor, next? next->vm_start: ceiling);
		}
		vma = next;
	}
}

int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address)
{
	struct page *new = pte_alloc_one(mm, address);
	if (!new)
		return -ENOMEM;

	pte_lock_init(new);
	spin_lock(&mm->page_table_lock);
	if (pmd_present(*pmd)) {	/* Another has populated it */
		pte_lock_deinit(new);
		pte_free(new);
	} else {
		mm->nr_ptes++;
		inc_zone_page_state(new, NR_PAGETABLE);
		pmd_populate(mm, pmd, new);
	}
	spin_unlock(&mm->page_table_lock);
	return 0;
}

int __pte_alloc_kernel(pmd_t *pmd, unsigned long address)
{
	pte_t *new = pte_alloc_one_kernel(&init_mm, address);
	if (!new)
		return -ENOMEM;

	spin_lock(&init_mm.page_table_lock);
	if (pmd_present(*pmd))		/* Another has populated it */
		pte_free_kernel(new);
	else
		pmd_populate_kernel(&init_mm, pmd, new);
	spin_unlock(&init_mm.page_table_lock);
	return 0;
}

static inline void add_mm_rss(struct mm_struct *mm, int file_rss, int anon_rss)
{
	if (file_rss)
		add_mm_counter(mm, file_rss, file_rss);
	if (anon_rss)
		add_mm_counter(mm, anon_rss, anon_rss);
}

/*
 * This function is called to print an error when a bad pte
 * is found. For example, we might have a PFN-mapped pte in
 * a region that doesn't allow it.
 *
 * The calling function must still handle the error.
 */
void print_bad_pte(struct vm_area_struct *vma, pte_t pte, unsigned long vaddr)
{
	printk(KERN_ERR "Bad pte = %08llx, process = %s, "
			"vm_flags = %lx, vaddr = %lx\n",
		(long long)pte_val(pte),
		(vma->vm_mm == current->mm ? current->comm : "???"),
		vma->vm_flags, vaddr);
	dump_stack();
}

static inline int is_cow_mapping(unsigned int flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
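`is_cow_mapping()` accepts private mappings that may become writable: `VM_MAYWRITE` set and `VM_SHARED` clear. In `copy_pte_range()` below, the I-pipe code additionally requires both `VM_LOCKED` and `VM_PINNED` in `vma->vm_flags | src_mm->def_flags`. When that holds, `copy_one_pte()` copies the page into `uncow_page` for the child instead of write-protecting the parent's PTE.

The sketch evaluates only these flag predicates, not the `pte_present()` and `uncow_page == NULL` parts. The `M_*` bit values are hypothetical placeholders, since only their distinctness matters here.

```c
#include <assert.h>

/* Hypothetical distinct bit values; only distinctness matters here. */
#define M_SHARED   0x01u
#define M_MAYWRITE 0x02u
#define M_LOCKED   0x04u
#define M_PINNED   0x08u

/* Same expression as is_cow_mapping(). */
static int model_is_cow(unsigned flags)
{
	return (flags & (M_SHARED | M_MAYWRITE)) == M_MAYWRITE;
}

/* Flag part of the condition under which the I-pipe code copies at fork. */
static int model_eager_cow_break(unsigned vm_flags, unsigned def_flags)
{
	return model_is_cow(vm_flags) &&
	       ((vm_flags | def_flags) & (M_LOCKED | M_PINNED)) ==
		       (M_LOCKED | M_PINNED);
}
```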

/*
 * This function gets the "struct page" associated with a pte.
 *
 * NOTE! Some mappings do not have "struct pages". A raw PFN mapping
 * will have each page table entry just pointing to a raw page frame
 * number, and as far as the VM layer is concerned, those do not have
 * pages associated with them - even if the PFN might point to memory
 * that otherwise is perfectly fine and has a "struct page".
 *
 * The way we recognize those mappings is through the rules set up
 * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set,
 * and the vm_pgoff will point to the first PFN mapped: thus every
 * page that is a raw mapping will always honor the rule
 *
 *	pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT)
 *
 * and if that isn't true, the page has been COW'ed (in which case it
 * _does_ have a "struct page" associated with it even if it is in a
 * VM_PFNMAP range).
 */
struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
{
	unsigned long pfn = pte_pfn(pte);

	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
		if (pfn == vma->vm_pgoff + off)
			return NULL;
		if (!is_cow_mapping(vma->vm_flags))
			return NULL;
	}

#ifdef CONFIG_DEBUG_VM
	/*
	 * Add some anal sanity checks for now. Eventually,
	 * we should just do "return pfn_to_page(pfn)", but
	 * in the meantime we check that we get a valid pfn,
	 * and that the resulting page looks ok.
	 */
	if (unlikely(!pfn_valid(pfn))) {
		print_bad_pte(vma, pte, addr);
		return NULL;
	}
#endif

	/*
	 * NOTE! We still have PageReserved() pages in the page 
	 * tables. 
	 *
	 * The PAGE_ZERO() pages and various VDSO mappings can
	 * cause them to exist.
	 */
	return pfn_to_page(pfn);
}
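The raw-PFN rule quoted in the comment above `vm_normal_page()` is plain arithmetic. The following worked example assumes 4 KiB pages and made-up `vm_start`/`vm_pgoff` values. It models only the first check in `vm_normal_page()`; a non-COW `VM_PFNMAP` vma returns `NULL` regardless.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12   /* 4 KiB pages (assumed) */

/* Expected PFN of a raw remap_pfn_range() mapping at addr. */
static unsigned long expected_raw_pfn(unsigned long vm_start,
				      unsigned long vm_pgoff,
				      unsigned long addr)
{
	assert(addr >= vm_start);
	return vm_pgoff + ((addr - vm_start) >> MODEL_PAGE_SHIFT);
}

/* Nonzero if a PTE holding pfn is still the raw mapping (no struct page). */
static int is_raw_pfn_pte(unsigned long vm_start, unsigned long vm_pgoff,
			  unsigned long addr, unsigned long pfn)
{
	return pfn == expected_raw_pfn(vm_start, vm_pgoff, addr);
}
```

With `vm_start = 0x40000000` and `vm_pgoff = 0x1000`, a PTE at `0x40003abc` holding PFN `0x1003` is still the raw mapping. A PTE holding any other PFN there would have been COW'ed and therefore has a `struct page`.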

static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
{
	/*
	 * If the source page was a PFN mapping, we don't have
	 * a "struct page" for it. We do a best-effort copy by
	 * just copying from the original user address. If that
	 * fails, we just zero-fill it. Live with it.
	 */
	if (unlikely(!src)) {
		void *kaddr = kmap_atomic(dst, KM_USER0);
		void __user *uaddr = (void __user *)(va & PAGE_MASK);

		/*
		 * This really shouldn't fail, because the page is there
		 * in the page tables. But it might just be unreadable,
		 * in which case we just give up and fill the result with
		 * zeroes.
		 */
		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
			memset(kaddr, 0, PAGE_SIZE);
		kunmap_atomic(kaddr, KM_USER0);
		flush_dcache_page(dst);
		return;
		
	}
	copy_user_highpage(dst, src, va, vma);
}

/*
 * copy one vm_area from one task to the other. Assumes the page tables
 * already present in the new task to be cleared in the whole range
 * covered by this vma.
 */

static inline void
copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
	     pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
	     unsigned long addr, int *rss, struct page *uncow_page)
{
	unsigned long vm_flags = vma->vm_flags;
	pte_t pte = *src_pte;
	struct page *page;

	/* pte contains position in swap or file, so copy. */
	if (unlikely(!pte_present(pte))) {
		if (!pte_file(pte)) {
			swp_entry_t entry = pte_to_swp_entry(pte);

			swap_duplicate(entry);
			/* make sure dst_mm is on swapoff's mmlist. */
			if (unlikely(list_empty(&dst_mm->mmlist))) {
				spin_lock(&mmlist_lock);
				if (list_empty(&dst_mm->mmlist))
					list_add(&dst_mm->mmlist,
						 &src_mm->mmlist);
				spin_unlock(&mmlist_lock);
			}
			if (is_write_migration_entry(entry) &&
					is_cow_mapping(vm_flags)) {
				/*
				 * COW mappings require pages in both parent
				 * and child to be set to read.
				 */
				make_migration_entry_read(&entry);
				pte = swp_entry_to_pte(entry);
				set_pte_at(src_mm, addr, src_pte, pte);
			}
		}
		goto out_set_pte;
	}

	/*
	 * If it's a COW mapping, write protect it both
	 * in the parent and the child
	 */
	if (is_cow_mapping(vm_flags)) {
#ifdef CONFIG_IPIPE
		if (uncow_page) {
			struct page *old_page = vm_normal_page(vma, addr, pte);
			cow_user_page(uncow_page, old_page, addr, vma);
			pte = mk_pte(uncow_page, vma->vm_page_prot);
			
			if (vm_flags & VM_SHARED)
				pte = pte_mkclean(pte);
			pte = pte_mkold(pte);

			page_dup_rmap(uncow_page, vma, addr);
			rss[!!PageAnon(uncow_page)]++;
			goto out_set_pte;
		}
#endif /* CONFIG_IPIPE */
		ptep_set_wrprotect(src_mm, addr, src_pte);
		pte = pte_wrprotect(pte);
	}

	/*
	 * If it's a shared mapping, mark it clean in
	 * the child
	 */
	if (vm_flags & VM_SHARED)
		pte = pte_mkclean(pte);
	pte = pte_mkold(pte);

	page = vm_normal_page(vma, addr, pte);
	if (page) {
		get_page(page);
		page_dup_rmap(page, vma, addr);
		rss[!!PageAnon(page)]++;
	}

out_set_pte:
	set_pte_at(dst_mm, addr, dst_pte, pte);
}

static int copy_pte_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		pmd_t *dst_pmd, pmd_t *src_pmd, struct vm_area_struct *vma,
		unsigned long addr, unsigned long end)
{
	pte_t *src_pte, *dst_pte;
	spinlock_t *src_ptl, *dst_ptl;
	int progress = 0;
	struct page *uncow_page = NULL;
	int rss[2];
#ifdef CONFIG_IPIPE
	int do_cow_break = 0;
again:
	if (do_cow_break) {
		uncow_page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
		if (!uncow_page)
			return -ENOMEM;
		do_cow_break = 0;
	}
#else
again:
#endif
	rss[1] = rss[0] = 0;
	dst_pte = pte_alloc_map_lock(dst_mm, dst_pmd, addr, &dst_ptl);
	if (!dst_pte) {
		if (uncow_page)
			page_cache_release(uncow_page);
		return -ENOMEM;
	}
	src_pte = pte_offset_map_nested(src_pmd, addr);
	src_ptl = pte_lockptr(src_mm, src_pmd);
	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
	arch_enter_lazy_mmu_mode();

	do {
		/*
		 * We are holding two locks at this point - either of them
		 * could generate latencies in another task on another CPU.
		 */
		if (progress >= 32) {
			progress = 0;
			if (need_resched() ||
			    need_lockbreak(src_ptl) ||
			    need_lockbreak(dst_ptl))
				break;
		}
		if (pte_none(*src_pte)) {
			progress++;
			continue;
		}
#ifdef CONFIG_IPIPE
		if (likely(uncow_page == NULL) && likely(pte_present(*src_pte))) {
			if (is_cow_mapping(vma->vm_flags)) {
				if (((vma->vm_flags|src_mm->def_flags) & (VM_LOCKED|VM_PINNED))
				    == (VM_LOCKED|VM_PINNED)) {
				    	arch_leave_lazy_mmu_mode();
				    	spin_unlock(src_ptl);
				    	pte_unmap_nested(src_pte);
				    	add_mm_rss(dst_mm, rss[0], rss[1]);
				    	pte_unmap_unlock(dst_pte, dst_ptl);
				    	cond_resched();
					do_cow_break = 1;
					goto again;
				}
			}
		}
#endif
		copy_one_pte(dst_mm, src_mm, dst_pte,
			     src_pte, vma, addr, rss, uncow_page);
		uncow_page = NULL;
		progress += 8;
	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);

	arch_leave_lazy_mmu_mode();
	spin_unlock(src_ptl);
	pte_unmap_nested(src_pte - 1);
	add_mm_rss(dst_mm, rss[0], rss[1]);
	pte_unmap_unlock(dst_pte - 1, dst_ptl);
	cond_resched();
	if (addr != end)
		goto again;
	return 0;
}
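The `progress` counter in `copy_pte_range()` bounds how long both PTE locks are held between checks of `need_resched()` and `need_lockbreak()`. An empty PTE adds 1, a copied one adds 8, and the check runs at the top of the loop once the total reaches 32. The following sketch simulates that accounting. The `present` array is a hypothetical input marking each PTE as populated or `pte_none`. Only the first check point is modeled; the real loop resets `progress` and carries on unless a break is needed.

```c
#include <assert.h>

/*
 * Returns how many PTEs copy_pte_range() handles before its first
 * lock-break check point. present[i] nonzero: PTE i is populated
 * (costs 8); zero: pte_none (costs 1). Returns n if no check point
 * is reached within the range.
 */
static int ptes_before_first_check(const int *present, int n)
{
	int progress = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (progress >= 32)
			return i;           /* the lock-break test would run here */
		progress += present[i] ? 8 : 1;
	}
	return n;
}
```

A fully populated range is thus checked every 4 PTEs, while a run of empty PTEs is checked every 32.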

static inline int copy_pmd_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		pud_t *dst_pud, pud_t *src_pud, struct vm_area_struct *vma,
		unsigned long addr, unsigned long end)
{
	pmd_t *src_pmd, *dst_pmd;
	unsigned long next;

	dst_pmd = pmd_alloc(dst_mm, dst_pud, addr);
	if (!dst_pmd)
		return -ENOMEM;
	src_pmd = pmd_offset(src_pud, addr);
	do {
		next = pmd_addr_end(addr, end);
		if (pmd_none_or_clear_bad(src_pmd))
			continue;
		if (copy_pte_range(dst_mm, src_mm, dst_pmd, src_pmd,
						vma, addr, next))
			return -ENOMEM;
	} while (dst_pmd++, src_pmd++, addr = next, addr != end);
	return 0;
}

static inline int copy_pud_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		pgd_t *dst_pgd, pgd_t *src_pgd, struct vm_area_struct *vma,
		unsigned long addr, unsigned long end)
{
	pud_t *src_pud, *dst_pud;
	unsigned long next;

	dst_pud = pud_alloc(dst_mm, dst_pgd, addr);
	if (!dst_pud)
		return -ENOMEM;
	src_pud = pud_offset(src_pgd, addr);
	do {
		next = pud_addr_end(addr, end);
		if (pud_none_or_clear_bad(src_pud))
			continue;
		if (copy_pmd_range(dst_mm, src_mm, dst_pud, src_pud,
						vma, addr, next))
			return -ENOMEM;
	} while (dst_pud++, src_pud++, addr = next, addr != end);
	return 0;
}

int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
		struct vm_area_struct *vma)
{
	pgd_t *src_pgd, *dst_pgd;
	unsigned long next;
	unsigned long addr = vma->vm_start;
	unsigned long end = vma->vm_end;

	/*
	 * Don't copy ptes where a page fault will fill them correctly.
	 * Fork becomes much lighter when there are big shared or private
	 * readonly mappings. The tradeoff is that copy_page_range is more
	 * efficient than faulting.
	 */
	if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_PFNMAP|VM_INSERTPAGE))) {
		if (!vma->anon_vma)
			return 0;
	}

	if (is_vm_hugetlb_page(vma))
		return copy_hugetlb_page_range(dst_mm, src_mm, vma);

	dst_pgd = pgd_offset(dst_mm, addr);
	src_pgd = pgd_offset(src_mm, addr);
	do {
		next = pgd_addr_end(addr, end);
		if (pgd_none_or_clear_bad(src_pgd))
			continue;
		if (copy_pud_range(dst_mm, src_mm, dst_pgd, src_pgd,
						vma, addr, next))
			return -ENOMEM;
	} while (dst_pgd++, src_pgd++, addr = next, addr != end);
	return 0;
}

static unsigned long zap_pte_range(struct mmu_gather *tlb,
				struct vm_area_struct *vma, pmd_t *pmd,
				unsigned long addr, unsigned long end,
				long *zap_work, struct zap_details *details)
{
	struct mm_struct *mm = tlb->mm;
	pte_t *pte;
	spinlock_t *ptl;
	int file_rss = 0;
	int anon_rss = 0;

	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	arch_enter_lazy_mmu_mode();
	do {
		pte_t ptent = *pte;
		if (pte_none(ptent)) {
			(*zap_work)--;
			continue;
		}

		(*zap_work) -= PAGE_SIZE;

		if (pte_present(ptent)) {
			struct page *page;

			page = vm_normal_page(vma, addr, ptent);
			if (unlikely(details) && page) {
				/*
				 * unmap_shared_mapping_pages() wants to
				 * invalidate cache without truncating:
				 * unmap shared but keep private pages.
				 */
				if (details->check_mapping &&
				    details->check_mapping != page->mapping)
					continue;
				/*
				 * Each page->index must be checked when
				 * invalidating or truncating nonlinear.
				 */
				if (details->nonlinear_vma &&
				    (page->index < details->first_index ||
				     page->index > details->last_index))
					continue;
			}
			ptent = ptep_get_and_clear_full(mm, addr, pte,
							tlb->fullmm);
			tlb_remove_tlb_entry(tlb, pte, addr);
			if (unlikely(!page))
				continue;
			if (unlikely(details) && details->nonlinear_vma
			    && linear_page_index(details->nonlinear_vma,
						addr) != page->index)
				set_pte_at(mm, addr, pte,
					   pgoff_to_pte(page->index));
			if (PageAnon(page))
				anon_rss--;
			else {
				if (pte_dirty(ptent))
					set_page_dirty(page);
				if (pte_young(ptent))
					SetPageReferenced(page);
				file_rss--;
			}
			page_remove_rmap(page, vma);
			tlb_remove_page(tlb, page);
			continue;
		}
		/*
		 * If details->check_mapping, we leave swap entries;
		 * if details->nonlinear_vma, we leave file entries.
		 */
		if (unlikely(details))
			continue;
		if (!pte_file(ptent))
			free_swap_and_cache(pte_to_swp_entry(ptent));
		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
	} while (pte++, addr += PAGE_SIZE, (addr != end && *zap_work > 0));

	add_mm_rss(mm, file_rss, anon_rss);
	arch_leave_lazy_mmu_mode();
	pte_unmap_unlock(pte - 1, ptl);

	return addr;
}

static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
				struct vm_area_struct *vma, pud_t *pud,
				unsigned long addr, unsigned long end,
				long *zap_work, struct zap_details *details)
{
	pmd_t *pmd;
	unsigned long next;

	pmd = pmd_offset(pud, addr);
	do {
		next = pmd_addr_end(addr, end);
		if (pmd_none_or_clear_bad(pmd)) {
			(*zap_work)--;
			continue;
		}
		next = zap_pte_range(tlb, vma, pmd, addr, next,
						zap_work, details);
	} while (pmd++, addr = next, (addr != end && *zap_work > 0));

	return addr;
}

static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
				struct vm_area_struct *vma, pgd_t *pgd,
				unsigned long addr, unsigned long end,
				long *zap_work, struct zap_details *details)
{
	pud_t *pud;
	unsigned long next;

	pud = pud_offset(pgd, addr);
	do {
		next = pud_addr_end(addr, end);
		if (pud_none_or_clear_bad(pud)) {
			(*zap_work)--;
			continue;
		}
		next = zap_pmd_range(tlb, vma, pud, addr, next,
						zap_work, details);
	} while (pud++, addr = next, (addr != end && *zap_work > 0));

	return addr;
}

static unsigned long unmap_page_range(struct mmu_gather *tlb,
				struct vm_area_struct *vma,
				unsigned long addr, unsigned long end,
				long *zap_work, struct zap_details *details)
{
	pgd_t *pgd;
	unsigned long next;

	if (details && !details->check_mapping && !details->nonlinear_vma)
		details = NULL;

	BUG_ON(addr >= end);
	tlb_start_vma(tlb, vma);
	pgd = pgd_offset(vma->vm_mm, addr);
	do {
		next = pgd_addr_end(addr, end);
		if (pgd_none_or_clear_bad(pgd)) {
			(*zap_work)--;
			continue;
		}
		next = zap_pud_range(tlb, vma, pgd, addr, next,
						zap_work, details);
	} while (pgd++, addr = next, (addr != end && *zap_work > 0));
	tlb_end_vma(tlb, vma);

	return addr;
}
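The zap_pmd_range(), zap_pud_range() and unmap_page_range() walkers above share one loop shape. Each asks pXd_addr_end() where the current table's coverage ends, handles that sub-range, then advances. Below is a user-space sketch of that loop, not kernel code. sketch_addr_end() follows the boundary arithmetic of the generic pgd_addr_end() macro. The 4 MiB span (a non-PAE i386 top-level entry) and all sketch_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed span of one top-level entry (non-PAE i386: 4 MiB). */
#define SKETCH_PGDIR_SIZE (4UL << 20)

/* Return the next table boundary after addr, clamped to end.  Comparing
 * "boundary - 1" with "end - 1" keeps the clamp correct if the boundary
 * wraps to 0 at the very top of the address space. */
static unsigned long sketch_addr_end(unsigned long addr, unsigned long end,
                                     unsigned long size)
{
    unsigned long boundary = (addr + size) & ~(size - 1);

    return (boundary - 1 < end - 1) ? boundary : end;
}

/* Walk [addr, end) one table-sized chunk at a time, the way
 * unmap_page_range() does, and return how many chunks were visited. */
static size_t sketch_count_chunks(unsigned long addr, unsigned long end)
{
    size_t chunks = 0;
    unsigned long next;

    assert(addr < end);                 /* mirrors BUG_ON(addr >= end) */
    do {
        next = sketch_addr_end(addr, end, SKETCH_PGDIR_SIZE);
        chunks++;                       /* a real walker descends here */
    } while (addr = next, addr != end);
    return chunks;
}
```

For example, walking [0x300000, 0x900000) visits three chunks: up to 0x400000, up to 0x800000, then up to 0x900000.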

#ifdef CONFIG_PREEMPT
# define ZAP_BLOCK_SIZE	(8 * PAGE_SIZE)
#else
/* No preempt: go for improved straight-line efficiency */
# define ZAP_BLOCK_SIZE	(1024 * PAGE_SIZE)
#endif

/**
 * unmap_vmas - unmap a range of memory covered by a list of vma's
 * @tlbp: address of the caller's struct mmu_gather
 * @vma: the starting vma
 * @start_addr: virtual address at which to start unmapping
 * @end_addr: virtual address at which to end unmapping
 * @nr_accounted: Place number of unmapped pages in vm-accountable vma's here
 * @details: details of nonlinear truncation or shared cache invalidation
 *
 * Returns the end address of the unmapping (restart addr if interrupted).
 *
 * Unmap all pages in the vma list.
 *
 * We aim to not hold locks for too long (for scheduling latency reasons).
 * So zap pages in ZAP_BLOCK_SIZE bytecounts.  This means we need to
 * return the ending mmu_gather to the caller.
 *
 * Only addresses between `start_addr' and `end_addr' will be unmapped.
 *
 * The VMA list must be sorted in ascending virtual address order.
 *
 * unmap_vmas() assumes that the caller will flush the whole unmapped address
 * range after unmap_vmas() returns.  So the only responsibility here is to
 * ensure that any thus-far unmapped pages are flushed before unmap_vmas()
 * drops the lock and schedules.
 */
unsigned long unmap_vmas(struct mmu_gather **tlbp,
		struct vm_area_struct *vma, unsigned long start_addr,
		unsigned long end_addr, unsigned long *nr_accounted,
		struct zap_details *details)
{
	long zap_work = ZAP_BLOCK_SIZE;
	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
	int tlb_start_valid = 0;
	unsigned long start = start_addr;
	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
	int fullmm = (*tlbp)->fullmm;

	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next) {
		unsigned long end;

		start = max(vma->vm_start, start_addr);
		if (start >= vma->vm_end)
			continue;
		end = min(vma->vm_end, end_addr);
		if (end <= vma->vm_start)
			continue;

		if (vma->vm_flags & VM_ACCOUNT)
			*nr_accounted += (end - start) >> PAGE_SHIFT;

		while (start != end) {
			if (!tlb_start_valid) {
				tlb_start = start;
				tlb_start_valid = 1;
			}

			if (unlikely(is_vm_hugetlb_page(vma))) {
				unmap_hugepage_range(vma, start, end);
				zap_work -= (end - start) /
						(HPAGE_SIZE / PAGE_SIZE);
				start = end;
			} else
				start = unmap_page_range(*tlbp, vma,
						start, end, &zap_work, details);

			if (zap_work > 0) {
				BUG_ON(start != end);
				break;
			}

			tlb_finish_mmu(*tlbp, tlb_start, start);

			if (need_resched() ||
				(i_mmap_lock && need_lockbreak(i_mmap_lock))) {
				if (i_mmap_lock) {
					*tlbp = NULL;
					goto out;
				}
				cond_resched();
			}

			*tlbp = tlb_gather_mmu(vma->vm_mm, fullmm);
			tlb_start_valid = 0;
			zap_work = ZAP_BLOCK_SIZE;
		}
	}
out:
	return start;	/* which is now the end (or restart) address */
}
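ZAP_BLOCK_SIZE bounds how much work unmap_vmas() does between opportunities to flush the TLB batch and reschedule. The sketch below is a simplified user-space model of that budget, not the kernel code. Its simplifications are:

- it counts budget in pages rather than bytes;
- it assumes every pte in the range is present and costs one page;
- it ignores the one-unit charge for empty entries (the `(*zap_work)--` in the walkers above).

The sketch_* names are hypothetical.

```c
#include <assert.h>

/* ZAP_BLOCK_SIZE expressed in pages: 8 with CONFIG_PREEMPT, 1024 without. */
#define SKETCH_BLOCK_PREEMPT   8L
#define SKETCH_BLOCK_NOPREEMPT 1024L

/* Count how often unmap_vmas() would reach its "budget exhausted" point
 * (tlb_finish_mmu() plus a possible cond_resched()) while zapping npages
 * present pages, each assumed to charge one page of budget. */
static unsigned long sketch_count_lock_breaks(unsigned long npages,
                                              long block_pages)
{
    long zap_work = block_pages;
    unsigned long done = 0, breaks = 0;

    assert(block_pages > 0);
    while (done != npages) {
        /* Inner zap loop: stop at the end of the range or when the
         * budget runs out, like the loop condition in zap_pte_range(). */
        do {
            zap_work--;
            done++;
        } while (done != npages && zap_work > 0);

        if (zap_work > 0)
            break;                  /* range finished with budget left */
        breaks++;                   /* flush, maybe reschedule */
        zap_work = block_pages;     /* fresh budget for the next block */
    }
    return breaks;
}
```

With the 8-page CONFIG_PREEMPT budget, zapping 20 present pages hits the break point twice. Without preemption, 5000 pages hit it only four times, trading latency for straight-line speed.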

/**
 * zap_page_range - remove user pages in a given range
 * @vma: vm_area_struct holding the applicable pages
 * @address: starting address of pages to zap
 * @size: number of bytes to zap
 * @details: details of nonlinear truncation or shared cache invalidation
 */
unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
		unsigned long size, struct zap_details *details)
{
	struct mm_struct *mm = vma->vm_mm;
	struct mmu_gather *tlb;
	unsigned long end = address + size;
	unsigned long nr_accounted = 0;

	lru_add_drain();
	tlb = tlb_gather_mmu(mm, 0);
	update_hiwater_rss(mm);
	end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
	if (tlb)
		tlb_finish_mmu(tlb, address, end);
	return end;
}

/*
 * Do a quick page-table lookup for a single page.
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
			unsigned int flags)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *ptep, pte;
	spinlock_t *ptl;
	struct page *page;
	struct mm_struct *mm = vma->vm_mm;

	page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
	if (!IS_ERR(page)) {
		BUG_ON(flags & FOLL_GET);
		goto out;
	}

	page = NULL;
	pgd = pgd_offset(mm, address);
	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
		goto no_page_table;

	pud = pud_offset(pgd, address);
	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
		goto no_page_table;
	
	pmd = pmd_offset(pud, address);
	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
		goto no_page_table;

	if (pmd_huge(*pmd)) {
		BUG_ON(flags & FOLL_GET);
		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
		goto out;
	}

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (!ptep)
		goto out;

	pte = *ptep;
	if (!pte_present(pte))
		goto unlock;
	if ((flags & FOLL_WRITE) && !pte_write(pte))
		goto unlock;
	page = vm_normal_page(vma, address, pte);
	if (unlikely(!page))
		goto unlock;

	if (flags & FOLL_GET)
		get_page(page);
	if (flags & FOLL_TOUCH) {
		if ((flags & FOLL_WRITE) &&
		    !pte_dirty(pte) && !PageDirty(page))
			set_page_dirty(page);
		mark_page_accessed(page);
	}
unlock:
	pte_unmap_unlock(ptep, ptl);
out:
	return page;

no_page_table:
	/*
	 * When core dumping an enormous anonymous area that nobody
	 * has touched so far, we don't want to allocate page tables.
	 */
	if (flags & FOLL_ANON) {
		page = ZERO_PAGE(0);
		if (flags & FOLL_GET)
			get_page(page);
		BUG_ON(flags & FOLL_WRITE);
	}
	return page;
}
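follow_page() descends pgd, pud and pmd, jumping to no_page_table as soon as a level is missing. Only then does it read the pte under its lock. The toy below models that descent for a classic two-level (non-PAE i386) layout in user space. The toy_* types, the 10/10/12-bit split and the flag encoding are illustrative assumptions. The toy also omits locking, huge pages and FOLL_* handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Assumed split: 10-bit directory index, 10-bit table index,
 * 12-bit page offset. */
#define TOY_PAGE_SHIFT  12
#define TOY_PTRS        1024u
#define TOY_PRESENT     0x1u
#define TOY_FRAME_MASK  0xFFFFF000u

typedef uint32_t toy_pte_t;            /* frame address | flag bits */

struct toy_pgd {
    toy_pte_t *tables[TOY_PTRS];       /* NULL: no page table allocated */
};

/* Translate vaddr, or return -1 if nothing is mapped there.  Same shape as
 * follow_page(): give up as soon as an upper level is missing, then check
 * the leaf entry's present bit before using it. */
static int64_t toy_follow(const struct toy_pgd *pgd, uint32_t vaddr)
{
    const toy_pte_t *table;
    toy_pte_t pte;

    assert(pgd != NULL);
    table = pgd->tables[vaddr >> 22];
    if (table == NULL)
        return -1;                     /* cf. goto no_page_table */
    pte = table[(vaddr >> TOY_PAGE_SHIFT) & (TOY_PTRS - 1)];
    if (!(pte & TOY_PRESENT))
        return -1;                     /* cf. !pte_present(pte) */
    return (int64_t)(pte & TOY_FRAME_MASK) | (vaddr & ~TOY_FRAME_MASK);
}
```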

int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		unsigned long start, int len, int write, int force,
		struct page **pages, struct vm_area_struct **vmas)
{
	int i;
	unsigned int vm_flags;

	if (len <= 0)
		return 0;
	/* 
	 * Require read or write permissions.
	 * If 'force' is set, we only require the "MAY" flags.
	 */
	vm_flags  = write ? (VM_WRITE | VM_MAYWRITE) : (VM_READ | VM_MAYREAD);
	vm_flags &= force ? (VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);
	i = 0;

	do {
		struct vm_area_struct *vma;
		unsigned int foll_flags;

		vma = find_extend_vma(mm, start);
		if (!vma && in_gate_area(tsk, start)) {
			unsigned long pg = start & PAGE_MASK;
			struct vm_area_struct *gate_vma = get_gate_vma(tsk);
			pgd_t *pgd;
			pud_t *pud;
			pmd_t *pmd;
			pte_t *pte;
			if (write) /* user gate pages are read-only */
				return i ? : -EFAULT;
			if (pg > TASK_SIZE)
				pgd = pgd_offset_k(pg);
			else
				pgd = pgd_offset_gate(mm, pg);
			BUG_ON(pgd_none(*pgd));
			pud = pud_offset(pgd, pg);
			BUG_ON(pud_none(*pud));
			pmd = pmd_offset(pud, pg);
			if (pmd_none(*pmd))
				return i ? : -EFAULT;
			pte = pte_offset_map(pmd, pg);
			if (pte_none(*pte)) {
				pte_unmap(pte);
				return i ? : -EFAULT;
			}
			if (pages) {
				struct page *page = vm_normal_page(gate_vma, start, *pte);
				pages[i] = page;
				if (page)
					get_page(page);
			}
			pte_unmap(pte);
			if (vmas)
				vmas[i] = gate_vma;
			i++;
			start += PAGE_SIZE;
			len--;
			continue;
		}

		if (!vma || (vma->vm_flags & (VM_IO | VM_PFNMAP))
				|| !(vm_flags & vma->vm_flags))
			return i ? : -EFAULT;

		if (is_vm_hugetlb_page(vma)) {
			i = follow_hugetlb_page(mm, vma, pages, vmas,
						&start, &len, i, write);
			continue;
		}

		foll_flags = FOLL_TOUCH;
		if (pages)
			foll_flags |= FOLL_GET;
		if (!write && !(vma->vm_flags & VM_LOCKED) &&
		    (!vma->vm_ops || (!vma->vm_ops->nopage &&
					!vma->vm_ops->fault)))
			foll_flags |= FOLL_ANON;

		do {
			struct page *page;

			/*
			 * If tsk is ooming, cut off its access to large memory
			 * allocations. It has a pending SIGKILL, but it can't
			 * be processed until returning to user space.
			 */
			if (unlikely(test_tsk_thread_flag(tsk, TIF_MEMDIE)))
				return -ENOMEM;

			if (write)
				foll_flags |= FOLL_WRITE;

			cond_resched();
			while (!(page = follow_page(vma, start, foll_flags))) {
				int ret;
				ret = handle_mm_fault(mm, vma, start,
						foll_flags & FOLL_WRITE);
				if (ret & VM_FAULT_ERROR) {
					if (ret & VM_FAULT_OOM)
						return i ? i : -ENOMEM;
					else if (ret & VM_FAULT_SIGBUS)
						return i ? i : -EFAULT;
					BUG();
				}
				if (ret & VM_FAULT_MAJOR)
					tsk->maj_flt++;
				else
					tsk->min_flt++;

				/*
				 * The VM_FAULT_WRITE bit tells us that
				 * do_wp_page has broken COW when necessary,
				 * even if maybe_mkwrite decided not to set
				 * pte_write. We can thus safely do subsequent
				 * page lookups as if they were reads.
				 */
				if (ret & VM_FAULT_WRITE)
					foll_flags &= ~FOLL_WRITE;

				cond_resched();
			}
			if (pages) {
				pages[i] = page;

				flush_anon_page(vma, page, start);
				flush_dcache_page(page);
			}
			if (vmas)
				vmas[i] = vma;
			i++;
			start += PAGE_SIZE;
			len--;
		} while (len && start < vma->vm_end);
	} while (len);
	return i;
}
EXPORT_SYMBOL(get_user_pages);
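get_user_pages() builds its permission mask in two steps:

1. Pick the READ/MAYREAD or WRITE/MAYWRITE pair.
2. Keep only the VM_MAY* bit when `force` is set.

This is why a forced write, as issued through access_process_vm() per the maybe_mkwrite() comment further down, can target a read-only mapping that is merely VM_MAYWRITE. The sketch below reproduces only that mask logic. The SK_VM_* values mirror the usual 2.6-era VM_* bit layout but should be treated as illustrative, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values (mirroring the 2.6-era VM_* layout). */
#define SK_VM_READ     0x01u
#define SK_VM_WRITE    0x02u
#define SK_VM_MAYREAD  0x10u
#define SK_VM_MAYWRITE 0x20u

/* Same two-step mask as get_user_pages(). */
static unsigned sketch_required_flags(bool write, bool force)
{
    unsigned f = write ? (SK_VM_WRITE | SK_VM_MAYWRITE)
                       : (SK_VM_READ | SK_VM_MAYREAD);

    f &= force ? (SK_VM_MAYREAD | SK_VM_MAYWRITE)
               : (SK_VM_READ | SK_VM_WRITE);
    return f;
}

/* The vma passes if it carries the one bit left in the mask. */
static bool sketch_vma_allows(unsigned vma_flags, bool write, bool force)
{
    return (sketch_required_flags(write, force) & vma_flags) != 0;
}
```

A private read-only mapping with READ|MAYREAD|MAYWRITE is rejected for a plain write but accepted for a forced one.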

pte_t * fastcall get_locked_pte(struct mm_struct *mm, unsigned long addr, spinlock_t **ptl)
{
	pgd_t * pgd = pgd_offset(mm, addr);
	pud_t * pud = pud_alloc(mm, pgd, addr);
	if (pud) {
		pmd_t * pmd = pmd_alloc(mm, pud, addr);
		if (pmd)
			return pte_alloc_map_lock(mm, pmd, addr, ptl);
	}
	return NULL;
}

/*
 * This is the old fallback for page remapping.
 *
 * For historical reasons, it only allows reserved pages. Only
 * old drivers should use this, and they needed to mark their
 * pages reserved for the old functions anyway.
 */
static int insert_page(struct mm_struct *mm, unsigned long addr, struct page *page, pgprot_t prot)
{
	int retval;
	pte_t *pte;
	spinlock_t *ptl;  

	retval = -EINVAL;
	if (PageAnon(page))
		goto out;
	retval = -ENOMEM;
	flush_dcache_page(page);
	pte = get_locked_pte(mm, addr, &ptl);
	if (!pte)
		goto out;
	retval = -EBUSY;
	if (!pte_none(*pte))
		goto out_unlock;

	/* Ok, finally just insert the thing.. */
	get_page(page);
	inc_mm_counter(mm, file_rss);
	page_add_file_rmap(page);
	set_pte_at(mm, addr, pte, mk_pte(page, prot));

	retval = 0;
out_unlock:
	pte_unmap_unlock(pte, ptl);
out:
	return retval;
}

/**
 * vm_insert_page - insert single page into user vma
 * @vma: user vma to map to
 * @addr: target user address of this page
 * @page: source kernel page
 *
 * This allows drivers to insert individual pages they've allocated
 * into a user vma.
 *
 * The page has to be a nice clean _individual_ kernel allocation.
 * If you allocate a compound page, you need to have marked it as
 * such (__GFP_COMP), or manually just split the page up yourself
 * (see split_page()).
 *
 * NOTE! Traditionally this was done with "remap_pfn_range()" which
 * took an arbitrary page protection parameter. This doesn't allow
 * that. Your vma protection will have to be set up correctly, which
 * means that if you want a shared writable mapping, you'd better
 * ask for a shared writable mapping!
 *
 * The page does not need to be reserved.
 */
int vm_insert_page(struct vm_area_struct *vma, unsigned long addr, struct page *page)
{
	if (addr < vma->vm_start || addr >= vma->vm_end)
		return -EFAULT;
	if (!page_count(page))
		return -EINVAL;
	vma->vm_flags |= VM_INSERTPAGE;
	return insert_page(vma->vm_mm, addr, page, vma->vm_page_prot);
}
EXPORT_SYMBOL(vm_insert_page);

/**
 * vm_insert_pfn - insert single pfn into user vma
 * @vma: user vma to map to
 * @addr: target user address of this page
 * @pfn: source kernel pfn
 *
 * Similar to vm_insert_page, this allows drivers to insert individual pages
 * they've allocated into a user vma. Same comments apply.
 *
 * This function should only be called from a vm_ops->fault handler, and
 * in that case the handler should return NULL.
 */
int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
		unsigned long pfn)
{
	struct mm_struct *mm = vma->vm_mm;
	int retval;
	pte_t *pte, entry;
	spinlock_t *ptl;

	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
	BUG_ON(is_cow_mapping(vma->vm_flags));

	retval = -ENOMEM;
	pte = get_locked_pte(mm, addr, &ptl);
	if (!pte)
		goto out;
	retval = -EBUSY;
	if (!pte_none(*pte))
		goto out_unlock;

	/* Ok, finally just insert the thing.. */
	entry = pfn_pte(pfn, vma->vm_page_prot);
	set_pte_at(mm, addr, pte, entry);
	update_mmu_cache(vma, addr, entry);

	retval = 0;
out_unlock:
	pte_unmap_unlock(pte, ptl);

out:
	return retval;
}
EXPORT_SYMBOL(vm_insert_pfn);

/*
 * Maps a range of physical memory into the requested pages. The target
 * ptes must already be empty (remap_pte_range() BUGs on an existing entry
 * rather than replacing it). Any references to nonexistent pages result
 * in null mappings (currently treated as "copy-on-access").
 */
static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
			unsigned long addr, unsigned long end,
			unsigned long pfn, pgprot_t prot)
{
	pte_t *pte;
	spinlock_t *ptl;

	pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return -ENOMEM;
	arch_enter_lazy_mmu_mode();
	do {
		BUG_ON(!pte_none(*pte));
		set_pte_at(mm, addr, pte, pfn_pte(pfn, prot));
		pfn++;
	} while (pte++, addr += PAGE_SIZE, addr != end);
	arch_leave_lazy_mmu_mode();
	pte_unmap_unlock(pte - 1, ptl);
	return 0;
}

static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
			unsigned long addr, unsigned long end,
			unsigned long pfn, pgprot_t prot)
{
	pmd_t *pmd;
	unsigned long next;

	pfn -= addr >> PAGE_SHIFT;
	pmd = pmd_alloc(mm, pud, addr);
	if (!pmd)
		return -ENOMEM;
	do {
		next = pmd_addr_end(addr, end);
		if (remap_pte_range(mm, pmd, addr, next,
				pfn + (addr >> PAGE_SHIFT), prot))
			return -ENOMEM;
	} while (pmd++, addr = next, addr != end);
	return 0;
}

static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd,
			unsigned long addr, unsigned long end,
			unsigned long pfn, pgprot_t prot)
{
	pud_t *pud;
	unsigned long next;

	pfn -= addr >> PAGE_SHIFT;
	pud = pud_alloc(mm, pgd, addr);
	if (!pud)
		return -ENOMEM;
	do {
		next = pud_addr_end(addr, end);
		if (remap_pmd_range(mm, pud, addr, next,
				pfn + (addr >> PAGE_SHIFT), prot))
			return -ENOMEM;
	} while (pud++, addr = next, addr != end);
	return 0;
}

/**
 * remap_pfn_range - remap kernel memory to userspace
 * @vma: user vma to map to
 * @addr: target user address to start at
 * @pfn: page frame number of the kernel memory (physical address >> PAGE_SHIFT)
 * @size: size of map area
 * @prot: page protection flags for this mapping
 *
 *  Note: this is only safe if the mm semaphore is held when called.
 */
int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
		    unsigned long pfn, unsigned long size, pgprot_t prot)
{
	pgd_t *pgd;
	unsigned long next;
	unsigned long end = addr + PAGE_ALIGN(size);
	struct mm_struct *mm = vma->vm_mm;
	int err;

	/*
	 * Physically remapped pages are special. Tell the
	 * rest of the world about it:
	 *   VM_IO tells people not to look at these pages
	 *	(accesses can have side effects).
	 *   VM_RESERVED is specified all over the place, because
	 *	in 2.4 it kept swapout's vma scan off this vma; but
	 *	in 2.6 the LRU scan won't even find its pages, so this
	 *	flag means no more than count its pages in reserved_vm,
	 * 	and omit it from core dump, even when VM_IO turned off.
	 *   VM_PFNMAP tells the core MM that the base pages are just
	 *	raw PFN mappings, and do not have a "struct page" associated
	 *	with them.
	 *
	 * There's a horrible special case to handle copy-on-write
	 * behaviour that some programs depend on. We mark the "original"
	 * un-COW'ed pages by matching them up with "vma->vm_pgoff".
	 */
	if (is_cow_mapping(vma->vm_flags)) {
		if (addr != vma->vm_start || end != vma->vm_end)
			return -EINVAL;
		vma->vm_pgoff = pfn;
	}

	vma->vm_flags |= VM_IO | VM_RESERVED | VM_PFNMAP;

	BUG_ON(addr >= end);
	pfn -= addr >> PAGE_SHIFT;
	pgd = pgd_offset(mm, addr);
	flush_cache_range(vma, addr, end);
	do {
		next = pgd_addr_end(addr, end);
		err = remap_pud_range(mm, pgd, addr, next,
				pfn + (addr >> PAGE_SHIFT), prot);
		if (err)
			break;
	} while (pgd++, addr = next, addr != end);
	return err;
}
EXPORT_SYMBOL(remap_pfn_range);
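remap_pfn_range() and its helpers avoid recomputing the pfn at every level with a two-step trick. They first subtract `addr >> PAGE_SHIFT`, then pass `pfn + (addr >> PAGE_SHIFT)` down for each sub-range. The temporary value may wrap below zero, but unsigned arithmetic brings it back. Below is a user-space sketch of that bookkeeping, assuming 4 KiB pages and a page-aligned range. sketch_remap_pfns() is a hypothetical helper that records the pfn each page would receive instead of writing ptes.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)

/* Fill out[i] with the pfn that page i of [addr, end) would map. */
static void sketch_remap_pfns(unsigned long addr, unsigned long end,
                              unsigned long pfn, unsigned long *out)
{
    unsigned long start = addr;

    assert(addr < end);
    assert(((end - addr) & (SK_PAGE_SIZE - 1)) == 0);
    pfn -= addr >> SK_PAGE_SHIFT;          /* may wrap; that is fine */
    for (; addr != end; addr += SK_PAGE_SIZE)
        out[(addr - start) >> SK_PAGE_SHIFT] = pfn + (addr >> SK_PAGE_SHIFT);
}
```

Mapping three pages at 0x40000000 to pfn 5 yields pfns 5, 6 and 7, even though 5 - 0x40000 wraps around in between.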

static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
				     unsigned long addr, unsigned long end,
				     pte_fn_t fn, void *data)
{
	pte_t *pte;
	int err;
	struct page *pmd_page;
	spinlock_t *uninitialized_var(ptl);

	pte = (mm == &init_mm) ?
		pte_alloc_kernel(pmd, addr) :
		pte_alloc_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return -ENOMEM;

	BUG_ON(pmd_huge(*pmd));

	pmd_page = pmd_page(*pmd);

	do {
		err = fn(pte, pmd_page, addr, data);
		if (err)
			break;
	} while (pte++, addr += PAGE_SIZE, addr != end);

	if (mm != &init_mm)
		pte_unmap_unlock(pte-1, ptl);
	return err;
}

static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
				     unsigned long addr, unsigned long end,
				     pte_fn_t fn, void *data)
{
	pmd_t *pmd;
	unsigned long next;
	int err;

	pmd = pmd_alloc(mm, pud, addr);
	if (!pmd)
		return -ENOMEM;
	do {
		next = pmd_addr_end(addr, end);
		err = apply_to_pte_range(mm, pmd, addr, next, fn, data);
		if (err)
			break;
	} while (pmd++, addr = next, addr != end);
	return err;
}

static int apply_to_pud_range(struct mm_struct *mm, pgd_t *pgd,
				     unsigned long addr, unsigned long end,
				     pte_fn_t fn, void *data)
{
	pud_t *pud;
	unsigned long next;
	int err;

	pud = pud_alloc(mm, pgd, addr);
	if (!pud)
		return -ENOMEM;
	do {
		next = pud_addr_end(addr, end);
		err = apply_to_pmd_range(mm, pud, addr, next, fn, data);
		if (err)
			break;
	} while (pud++, addr = next, addr != end);
	return err;
}

/*
 * Scan a region of virtual memory, filling in page tables as necessary
 * and calling a provided function on each leaf page table.
 */
int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
			unsigned long size, pte_fn_t fn, void *data)
{
	pgd_t *pgd;
	unsigned long next;
	unsigned long end = addr + size;
	int err;

	BUG_ON(addr >= end);
	pgd = pgd_offset(mm, addr);
	do {
		next = pgd_addr_end(addr, end);
		err = apply_to_pud_range(mm, pgd, addr, next, fn, data);
		if (err)
			break;
	} while (pgd++, addr = next, addr != end);
	return err;
}
EXPORT_SYMBOL_GPL(apply_to_page_range);
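apply_to_page_range() allocates any missing table levels. It then hands each leaf pte to a caller-supplied pte_fn_t and stops at the first non-zero return. The sketch below models only that callback contract over a flat user-space array. sketch_pte_fn, sketch_apply() and the tagging callback are hypothetical. The real callback also receives the struct page of the page table.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096UL

/* Hypothetical stand-in for pte_fn_t: non-zero return stops the walk. */
typedef int (*sketch_pte_fn)(unsigned long *entry, unsigned long addr,
                             void *data);

/* Visit the slot for each page in [addr, end) of a flat table whose first
 * slot covers base, stopping at the first callback error. */
static int sketch_apply(unsigned long *table, unsigned long base,
                        unsigned long addr, unsigned long end,
                        sketch_pte_fn fn, void *data)
{
    int err;

    assert(addr < end && addr >= base);
    do {
        err = fn(&table[(addr - base) / SK_PAGE_SIZE], addr, data);
        if (err)
            break;
    } while (addr += SK_PAGE_SIZE, addr != end);
    return err;
}

/* Example callback: store a value in each entry, failing with -1 once
 * `limit` entries have been tagged. */
struct sketch_tag_ctx { unsigned long value; int visited; int limit; };

static int sketch_tag(unsigned long *entry, unsigned long addr, void *data)
{
    struct sketch_tag_ctx *ctx = data;

    (void)addr;
    if (ctx->visited == ctx->limit)
        return -1;
    *entry = ctx->value;
    ctx->visited++;
    return 0;
}
```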

/*
 * handle_pte_fault chooses page fault handler according to an entry
 * which was read non-atomically.  Before making any commitment, on
 * those architectures or configurations (e.g. i386 with PAE) which
 * might give a mix of unmatched parts, do_swap_page and do_file_page
 * must check under lock before unmapping the pte and proceeding
 * (but do_wp_page is only called after already making such a check;
 * and do_anonymous_page and do_no_page can safely check later on).
 */
static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
				pte_t *page_table, pte_t orig_pte)
{
	int same = 1;
#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
	if (sizeof(pte_t) > sizeof(unsigned long)) {
		spinlock_t *ptl = pte_lockptr(mm, pmd);
		spin_lock(ptl);
		same = pte_same(*page_table, orig_pte);
		spin_unlock(ptl);
	}
#endif
	pte_unmap(page_table);
	return same;
}

/*
 * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
 * servicing faults for write access.  In the normal case, we always want
 * pte_mkwrite.  But get_user_pages can cause write faults for mappings
 * that do not have writing enabled, when used by access_process_vm.
 */
static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
{
	if (likely(vma->vm_flags & VM_WRITE))
		pte = pte_mkwrite(pte);
	return pte;
}

/*
 * This routine handles present pages, when users try to write
 * to a shared page. It is done by copying the page to a new address
 * and decrementing the shared-page counter for the old page.
 *
 * Note that this routine assumes that the protection checks have been
 * done by the caller (the low-level page fault routine in most cases).
 * Thus we can safely just mark it writable once we've done any necessary
 * COW.
 *
 * We also mark the page dirty at this point even though the page will
 * change only once the write actually happens. This avoids a few races,
 * and potentially makes it more efficient.
 *
 * We enter with non-exclusive mmap_sem (to exclude vma changes,
 * but allow concurrent faults), with pte both mapped and locked.
 * We return with mmap_sem still held, but pte unmapped and unlocked.
 */
static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, pte_t *page_table, pmd_t *pmd,
		spinlock_t *ptl, pte_t orig_pte)
{
	struct page *old_page, *new_page;
	pte_t entry;
	int reuse = 0, ret = 0;
	int page_mkwrite = 0;
	struct page *dirty_page = NULL;

	old_page = vm_normal_page(vma, address, orig_pte);
	if (!old_page)
		goto gotten;

	/*
	 * Take out anonymous pages first, anonymous shared vmas are
	 * not dirty accountable.
	 */
	if (PageAnon(old_page)) {
		if (!TestSetPageLocked(old_page)) {
			reuse = can_share_swap_page(old_page);
			unlock_page(old_page);
		}
	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
					(VM_WRITE|VM_SHARED))) {
		/*
		 * Only catch write-faults on shared writable pages,
		 * read-only shared pages can get COWed by
		 * get_user_pages(.write=1, .force=1).
		 */
		if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
			/*
			 * Notify the address space that the page is about to
			 * become writable so that it can prohibit this or wait
			 * for the page to get into an appropriate state.
			 *
			 * We do this without the lock held, so that it can
			 * sleep if it needs to.
			 */
			page_cache_get(old_page);
			pte_unmap_unlock(page_table, ptl);

			if (vma->vm_ops->page_mkwrite(vma, old_page) < 0)
				goto unwritable_page;

			/*
			 * Since we dropped the lock we need to revalidate
			 * the PTE as someone else may have changed it.  If
			 * they did, we just return, as we can count on the
			 * MMU to tell us if they didn't also make it writable.
			 */
			page_table = pte_offset_map_lock(mm, pmd, address,
							 &ptl);
			page_cache_release(old_page);
			if (!pte_same(*page_table, orig_pte))
				goto unlock;

			page_mkwrite = 1;
		}
		dirty_page = old_page;
		get_page(dirty_page);
		reuse = 1;
	}

	if (reuse) {
		flush_cache_page(vma, address, pte_pfn(orig_pte));
		entry = pte_mkyoung(orig_pte);
		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
		if (ptep_set_access_flags(vma, address, page_table, entry,1))
			update_mmu_cache(vma, address, entry);
		ret |= VM_FAULT_WRITE;
		goto unlock;
	}

	/*
	 * Ok, we need to copy. Oh, well..
	 */
	page_cache_get(old_page);
gotten:
	pte_unmap_unlock(page_table, ptl);

	if (unlikely(anon_vma_prepare(vma)))
		goto oom;
	VM_BUG_ON(old_page == ZERO_PAGE(0));
	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
	if (!new_page)
		goto oom;
	cow_user_page(new_page, old_page, address, vma);

	/*
	 * Re-check the pte - we dropped the lock
	 */
	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (likely(pte_same(*page_table, orig_pte))) {
		if (old_page) {
			page_remove_rmap(old_page, vma);
			if (!PageAnon(old_page)) {
				dec_mm_counter(mm, file_rss);
				inc_mm_counter(mm, anon_rss);
			}
		} else
			inc_mm_counter(mm, anon_rss);
		flush_cache_page(vma, address, pte_pfn(orig_pte));
		entry = mk_pte(new_page, vma->vm_page_prot);
		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
		/*
		 * Clear the pte entry and flush it first, before updating the
		 * pte with the new entry. This will avoid a race condition
		 * seen in the presence of one thread doing SMC and another
		 * thread doing COW.
		 */
		ptep_clear_flush(vma, address, page_table);
		set_pte_at(mm, address, page_table, entry);
		update_mmu_cache(vma, address, entry);
		lru_cache_add_active(new_page);
		page_add_new_anon_rmap(new_page, vma, address);

		/* Free the old page.. */
		new_page = old_page;
		ret |= VM_FAULT_WRITE;
	}
	if (new_page)
		page_cache_release(new_page);
	if (old_page)
		page_cache_release(old_page);
unlock:
	pte_unmap_unlock(page_table, ptl);
	if (dirty_page) {
		if (vma->vm_file)
			file_update_time(vma->vm_file);

		/*
		 * Yes, Virginia, this is actually required to prevent a race
		 * with clear_page_dirty_for_io() from clearing the page dirty
		 * bit after it clears all dirty ptes, but before a racing
		 * do_wp_page installs a dirty pte.
		 *
		 * do_no_page is protected similarly.
		 */
		wait_on_page_locked(dirty_page);
		set_page_dirty_balance(dirty_page, page_mkwrite);
		put_page(dirty_page);
	}
	return ret;
oom:
	if (old_page)
		page_cache_release(old_page);
	return VM_FAULT_OOM;

unwritable_page:
	page_cache_release(old_page);
	return VM_FAULT_SIGBUS;
}
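do_wp_page() is the copy-on-write path. It runs, for example, when a parent or child writes to a private page the two still share after fork(). Stripped of locking, rmap and the pte_same() recheck, its core decision is:

- reuse the page if this mapping is its only user;
- otherwise copy it and drop a reference to the original.

The model below expresses just that decision with a hypothetical refcounted sketch_page. The kernel's actual reuse test, can_share_swap_page(), is more involved than a single counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE 4096

/* Hypothetical refcounted page; not struct page. */
struct sketch_page {
    int refcount;
    unsigned char data[SK_PAGE_SIZE];
};

/* Simplified write fault: return the page this mapping should now write
 * to, or NULL on allocation failure (standing in for VM_FAULT_OOM). */
static struct sketch_page *sketch_write_fault(struct sketch_page *old)
{
    struct sketch_page *new_page;

    assert(old != NULL && old->refcount > 0);
    if (old->refcount == 1)
        return old;                    /* sole user: reuse in place */

    new_page = malloc(sizeof(*new_page));
    if (new_page == NULL)
        return NULL;
    memcpy(new_page->data, old->data, SK_PAGE_SIZE);   /* cf. cow_user_page() */
    new_page->refcount = 1;
    old->refcount--;                   /* this mapping no longer uses it */
    return new_page;
}
```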

/*
 * Helper functions for unmap_mapping_range().
 *
 * __ Notes on dropping i_mmap_lock to reduce latency while unmapping __
 *
 * We have to restart searching the prio_tree whenever we drop the lock,
 * since the iterator is only valid while the lock is held, and anyway
 * a later vma might be split and reinserted earlier while lock dropped.
 *
 * The list of nonlinear vmas could be handled more efficiently, using
 * a placeholder, but handle it in the same way until a need is shown.
 * It is important to search the prio_tree before nonlinear list: a vma
 * may become nonlinear and be shifted from prio_tree to nonlinear list
 * while the lock is dropped; but never shifted from list to prio_tree.
 *
 * In order to make forward progress despite restarting the search,
 * vm_truncate_count is used to mark a vma as now dealt with, so we can
 * quickly skip it next time around.  Since the prio_tree search only
 * shows us those vmas affected by unmapping the range in question, we
 * can't efficiently keep all vmas in step with mapping->truncate_count:
 * so instead reset them all whenever it wraps back to 0 (then go to 1).
 * mapping->truncate_count and vma->vm_truncate_count are protected by
 * i_mmap_lock.
 *
 * In order to make forward progress despite repeatedly restarting some
 * large vma, note the restart_addr from unmap_vmas when it breaks out:
 * and restart from that address when we reach that vma again.  It might
 * have been split or merged, shrunk or extended, but never shifted: so
 * restart_addr remains valid so long as it remains in the vma's range.
 * unmap_mapping_range forces truncate_count to leap over page-aligned
 * values so we can save vma's restart_addr in its truncate_count field.
 */
#define is_restart_addr(truncate_count) (!((truncate_count) & ~PAGE_MASK))
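The comment above explains that truncate_count must leap over page-aligned values, so that a vma's vm_truncate_count can double as a restart address. The sketch below isolates the increment done in unmap_mapping_range(). It bumps the counter, and if the result is page-aligned, bumps it again. It also reports the wrap to 0, which is where the kernel calls reset_vma_truncate_counts(). It assumes 4 KiB pages, and all names are hypothetical.

```c
#include <assert.h>

#define SK_PAGE_SIZE 4096UL
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1))
#define sketch_is_restart_addr(c) (!((c) & ~SK_PAGE_MASK))

/* Advance the counter, never leaving it on a page-aligned value.
 * *wrapped is set when the counter passed through 0. */
static unsigned long sketch_next_truncate_count(unsigned long count,
                                                int *wrapped)
{
    *wrapped = 0;
    count++;
    if (sketch_is_restart_addr(count)) {
        if (count == 0)
            *wrapped = 1;      /* kernel: reset all vm_truncate_counts */
        count++;
    }
    return count;
}
```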

static void reset_vma_truncate_counts(struct address_space *mapping)
{
	struct vm_area_struct *vma;
	struct prio_tree_iter iter;

	vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, 0, ULONG_MAX)
		vma->vm_truncate_count = 0;
	list_for_each_entry(vma, &mapping->i_mmap_nonlinear, shared.vm_set.list)
		vma->vm_truncate_count = 0;
}

static int unmap_mapping_range_vma(struct vm_area_struct *vma,
		unsigned long start_addr, unsigned long end_addr,
		struct zap_details *details)
{
	unsigned long restart_addr;
	int need_break;

	/*
	 * Files that support invalidating or truncating portions of the
	 * file from under mmapped areas must have their ->fault function
	 * return a locked page (and set VM_FAULT_LOCKED in the return).
	 * This provides synchronisation against concurrent unmapping here.
	 */

again:
	restart_addr = vma->vm_truncate_count;
	if (is_restart_addr(restart_addr) && start_addr < restart_addr) {
		start_addr = restart_addr;
		if (start_addr >= end_addr) {
			/* Top of vma has been split off since last time */
			vma->vm_truncate_count = details->truncate_count;
			return 0;
		}
	}

	restart_addr = zap_page_range(vma, start_addr,
					end_addr - start_addr, details);
	need_break = need_resched() ||
			need_lockbreak(details->i_mmap_lock);

	if (restart_addr >= end_addr) {
		/* We have now completed this vma: mark it so */
		vma->vm_truncate_count = details->truncate_count;
		if (!need_break)
			return 0;
	} else {
		/* Note restart_addr in vma's truncate_count field */
		vma->vm_truncate_count = restart_addr;
		if (!need_break)
			goto again;
	}

	spin_unlock(details->i_mmap_lock);
	cond_resched();
	spin_lock(details->i_mmap_lock);
	return -EINTR;
}

static inline void unmap_mapping_range_tree(struct prio_tree_root *root,
					    struct zap_details *details)
{
	struct vm_area_struct *vma;
	struct prio_tree_iter iter;
	pgoff_t vba, vea, zba, zea;

restart:
	vma_prio_tree_foreach(vma, &iter, root,
			details->first_index, details->last_index) {
		/* Skip quickly over those we have already dealt with */
		if (vma->vm_truncate_count == details->truncate_count)
			continue;

		vba = vma->vm_pgoff;
		vea = vba + ((vma->vm_end - vma->vm_start) >> PAGE_SHIFT) - 1;
		/* Assume for now that PAGE_CACHE_SHIFT == PAGE_SHIFT */
		zba = details->first_index;
		if (zba < vba)
			zba = vba;
		zea = details->last_index;
		if (zea > vea)
			zea = vea;

		if (unmap_mapping_range_vma(vma,
			((zba - vba) << PAGE_SHIFT) + vma->vm_start,
			((zea - vba + 1) << PAGE_SHIFT) + vma->vm_start,
				details) < 0)
			goto restart;
	}
}
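unmap_mapping_range_tree() converts the file-page window [first_index, last_index] into a virtual range inside each vma, in three steps:

- vba and vea are the first and last file pages the vma maps;
- zba and zea are their intersection with the window;
- the result is offset from vm_start.

Below is a user-space sketch of that arithmetic. It assumes 4 KiB pages, so PAGE_CACHE_SHIFT == PAGE_SHIFT, as the kernel comment also assumes. struct sketch_vma and the helper are hypothetical.

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12

/* Hypothetical mini-vma: only the fields the tree walk reads. */
struct sketch_vma {
    unsigned long vm_start, vm_end;   /* page-aligned virtual range */
    unsigned long vm_pgoff;           /* file page backing vm_start */
};

/* Clamp [first, last] (file pages) to this vma and convert it to a
 * half-open virtual range.  Returns 0 if there is no overlap. */
static int sketch_file_range_to_vaddr(const struct sketch_vma *vma,
                                      unsigned long first, unsigned long last,
                                      unsigned long *start, unsigned long *end)
{
    unsigned long vba = vma->vm_pgoff;
    unsigned long vea = vba +
        ((vma->vm_end - vma->vm_start) >> SK_PAGE_SHIFT) - 1;
    unsigned long zba = first < vba ? vba : first;
    unsigned long zea = last > vea ? vea : last;

    if (zba > zea)
        return 0;
    *start = ((zba - vba) << SK_PAGE_SHIFT) + vma->vm_start;
    *end = ((zea - vba + 1) << SK_PAGE_SHIFT) + vma->vm_start;
    return 1;
}
```

Consider a 16-page vma at 0x10000000 that maps file pages 4..19. Truncating from page 10 onward yields the virtual range [0x10006000, 0x10010000).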

static inline void unmap_mapping_range_list(struct list_head *head,
					    struct zap_details *details)
{
	struct vm_area_struct *vma;

	/*
	 * In nonlinear VMAs there is no correspondence between virtual address
	 * offset and file offset.  So we must perform an exhaustive search
	 * across *all* the pages in each nonlinear VMA, not just the pages
	 * whose virtual address lies outside the file truncation point.
	 */
restart:
	list_for_each_entry(vma, head, shared.vm_set.list) {
		/* Skip quickly over those we have already dealt with */
		if (vma->vm_truncate_count == details->truncate_count)
			continue;
		details->nonlinear_vma = vma;
		if (unmap_mapping_range_vma(vma, vma->vm_start,
					vma->vm_end, details) < 0)
			goto restart;
	}
}

/**
 * unmap_mapping_range - unmap the portion of all mmaps in the specified address_space corresponding to the specified page range in the underlying file.
 * @mapping: the address space containing mmaps to be unmapped.
 * @holebegin: byte in first page to unmap, relative to the start of
 * the underlying file.  This will be rounded down to a PAGE_SIZE
 * boundary.  Note that this is different from vmtruncate(), which
 * must keep the partial page.  In contrast, we must get rid of
 * partial pages.
 * @holelen: size of prospective hole in bytes.  This will be rounded
 * up to a PAGE_SIZE boundary.  A holelen of zero truncates to the
 * end of the file.
 * @even_cows: 1 when truncating a file, unmap even private COWed pages;
 * but 0 when invalidating pagecache, don't throw away private data.
 */
void unmap_mapping_range(struct address_space *mapping,
		loff_t const holebegin, loff_t const holelen, int even_cows)
{
	struct zap_details details;
	pgoff_t hba = holebegin >> PAGE_SHIFT;
	pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;

	/* Check for overflow. */
	if (sizeof(holelen) > sizeof(hlen)) {
		long long holeend =
			(holebegin + holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
		if (holeend & ~(long long)ULONG_MAX)
			hlen = ULONG_MAX - hba + 1;
	}

	details.check_mapping = even_cows? NULL: mapping;
	details.nonlinear_vma = NULL;
	details.first_index = hba;
	details.last_index = hba + hlen - 1;
	if (details.last_index < details.first_index)
		details.last_index = ULONG_MAX;
	details.i_mmap_lock = &mapping->i_mmap_lock;

	spin_lock(&mapping->i_mmap_lock);

	/* Protect against endless unmapping loops */
	mapping->truncate_count++;
	if (unlikely(is_restart_addr(mapping->truncate_count))) {
		if (mapping->truncate_count == 0)
			reset_vma_truncate_counts(mapping);
		mapping->truncate_count++;
	}
	details.truncate_count = mapping->truncate_count;

	if (unlikely(!prio_tree_empty(&mapping->i_mmap)))
		unmap_mapping_range_tree(&mapping->i_mmap, &details);
	if (unlikely(!list_empty(&mapping->i_mmap_nonlinear)))
		unmap_mapping_range_list(&mapping->i_mmap_nonlinear, &details);
	spin_unlock(&mapping->i_mmap_lock);
}
EXPORT_SYMBOL(unmap_mapping_range);

/**
 * vmtruncate - unmap mappings "freed" by truncate() syscall
 * @inode: inode of the file used
 * @offset: file offset to start truncating
 *
 * NOTE! We have to be ready to update the memory sharing
 * between the file and the memory map for a potential last
 * incomplete page.  Ugly, but necessary.
 */
int vmtruncate(struct inode * inode, loff_t offset)
{
	struct address_space *mapping = inode->i_mapping;
	unsigned long limit;

	if (inode->i_size < offset)
		goto do_expand;
	/*
	 * truncation of in-use swapfiles is disallowed - it would cause
	 * subsequent swapout to scribble on the now-freed blocks.
	 */
	if (IS_SWAPFILE(inode))
		goto out_busy;
	i_size_write(inode, offset);

	/*
	 * unmap_mapping_range is called twice, first simply for efficiency
	 * so that truncate_inode_pages does fewer single-page unmaps. However
	 * after this first call, and before truncate_inode_pages finishes,
	 * it is possible for private pages to be COWed, which remain after
	 * truncate_inode_pages finishes, hence the second unmap_mapping_range
	 * call must be made for correctness.
	 */
	unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
	truncate_inode_pages(mapping, offset);
	unmap_mapping_range(mapping, offset + PAGE_SIZE - 1, 0, 1);
	goto out_truncate;

do_expand:
	limit = current->signal->rlim[RLIMIT_FSIZE].rlim_cur;
	if (limit != RLIM_INFINITY && offset > limit)
		goto out_sig;
	if (offset > inode->i_sb->s_maxbytes)
		goto out_big;
	i_size_write(inode, offset);

out_truncate:
	if (inode->i_op && inode->i_op->truncate)
		inode->i_op->truncate(inode);
	return 0;
out_sig:
	send_sig(SIGXFSZ, current, 0);
out_big:
	return -EFBIG;
out_busy:
	return -ETXTBSY;
}
EXPORT_SYMBOL(vmtruncate);

int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
{
	struct address_space *mapping = inode->i_mapping;

	/*
	 * If the underlying filesystem is not going to provide
	 * a way to truncate a range of blocks (punch a hole) -
	 * we should return failure right now.
	 */
	if (!inode->i_op || !inode->i_op->truncate_range)
		return -ENOSYS;

	mutex_lock(&inode->i_mutex);
	down_write(&inode->i_alloc_sem);
	unmap_mapping_range(mapping, offset, (end - offset), 1);
	truncate_inode_pages_range(mapping, offset, end);
	unmap_mapping_range(mapping, offset, (end - offset), 1);
	inode->i_op->truncate_range(inode, offset, end);
	up_write(&inode->i_alloc_sem);
	mutex_unlock(&inode->i_mutex);

	return 0;
}

/**
 * swapin_readahead - swap in pages in hope we need them soon
 * @entry: swap entry of this memory
 * @addr: address to start
 * @vma: user vma this address belongs to
 *
 * Primitive swap readahead code. We simply read an aligned block of
 * (1 << page_cluster) entries in the swap area. This method is chosen
 * because it doesn't cost us any seek time.  We also make sure to queue
 * the 'original' request together with the readahead ones...
 *
 * This has been extended to use the NUMA policies from the mm triggering
 * the readahead.
 *
 * Caller must hold down_read on the vma->vm_mm if vma is not NULL.
 */
void swapin_readahead(swp_entry_t entry, unsigned long addr,
		      struct vm_area_struct *vma)
{
#ifdef CONFIG_NUMA
	struct vm_area_struct *next_vma = vma ? vma->vm_next : NULL;
#endif
	int i, num;
	struct page *new_page;
	unsigned long offset;

	/*
	 * Get the number of handles we should do readahead io to.
	 */
	num = valid_swaphandles(entry, &offset);
	for (i = 0; i < num; offset++, i++) {
		/* Ok, do the async read-ahead now */
		new_page = read_swap_cache_async(swp_entry(swp_type(entry),
							   offset), vma, addr);
		if (!new_page)
			break;
		page_cache_release(new_page);
#ifdef CONFIG_NUMA
		/*
		 * Find the next applicable VMA for the NUMA policy.
		 */
		addr += PAGE_SIZE;
		if (addr == 0)
			vma = NULL;
		if (vma) {
			if (addr >= vma->vm_end) {
				vma = next_vma;
				next_vma = vma ? vma->vm_next : NULL;
			}
			if (vma && addr < vma->vm_start)
				vma = NULL;
		} else {
			if (next_vma && addr >= next_vma->vm_start) {
				vma = next_vma;
				next_vma = vma->vm_next;
			}
		}
#endif
	}
	lru_add_drain();	/* Push any new pages onto the LRU now */
}

/*
 * We enter with non-exclusive mmap_sem (to exclude vma changes,
 * but allow concurrent faults), and pte mapped but not yet locked.
 * We return with mmap_sem still held, but pte unmapped and unlocked.
 */
static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, pte_t *page_table, pmd_t *pmd,
		int write_access, pte_t orig_pte)
{
	spinlock_t *ptl;
	struct page *page;
	swp_entry_t entry;
	pte_t pte;
	int ret = 0;

	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
		goto out;

	entry = pte_to_swp_entry(orig_pte);
	if (is_migration_entry(entry)) {
		migration_entry_wait(mm, pmd, address);
		goto out;
	}
	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
	page = lookup_swap_cache(entry);
	if (!page) {
		grab_swap_token(); /* Contend for token _before_ read-in */
		swapin_readahead(entry, address, vma);
		page = read_swap_cache_async(entry, vma, address);
		if (!page) {
			/*
			 * Back out if somebody else faulted in this pte
			 * while we released the pte lock.
			 */
			page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
			if (likely(pte_same(*page_table, orig_pte)))
				ret = VM_FAULT_OOM;
			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
			goto unlock;
		}

		/* Had to read the page from swap area: Major fault */
		ret = VM_FAULT_MAJOR;
		count_vm_event(PGMAJFAULT);
	}

	mark_page_accessed(page);
	lock_page(page);
	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);

	/*
	 * Back out if somebody else already faulted in this pte.
	 */
	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (unlikely(!pte_same(*page_table, orig_pte)))
		goto out_nomap;

	if (unlikely(!PageUptodate(page))) {
		ret = VM_FAULT_SIGBUS;
		goto out_nomap;
	}

	/* The page isn't present yet, go ahead with the fault. */

	inc_mm_counter(mm, anon_rss);
	pte = mk_pte(page, vma->vm_page_prot);
	if (write_access && can_share_swap_page(page)) {
		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
		write_access = 0;
	}

	flush_icache_page(vma, page);
	set_pte_at(mm, address, page_table, pte);
	page_add_anon_rmap(page, vma, address);

	swap_free(entry);
	if (vm_swap_full())
		remove_exclusive_swap_page(page);
	unlock_page(page);

	if (write_access) {
		/* XXX: We could OR the do_wp_page code with this one? */
		if (do_wp_page(mm, vma, address,
				page_table, pmd, ptl, pte) & VM_FAULT_OOM)
			ret = VM_FAULT_OOM;
		goto out;
	}

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(vma, address, pte);
unlock:
	pte_unmap_unlock(page_table, ptl);
out:
	return ret;
out_nomap:
	pte_unmap_unlock(page_table, ptl);
	unlock_page(page);
	page_cache_release(page);
	return ret;
}

/*
 * We enter with non-exclusive mmap_sem (to exclude vma changes,
 * but allow concurrent faults), and pte mapped but not yet locked.
 * We return with mmap_sem still held, but pte unmapped and unlocked.
 */
static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, pte_t *page_table, pmd_t *pmd,
		int write_access)
{
	struct page *page;
	spinlock_t *ptl;
	pte_t entry;

	/* Allocate our own private page. */
	pte_unmap(page_table);

	if (unlikely(anon_vma_prepare(vma)))
		goto oom;
	page = alloc_zeroed_user_highpage_movable(vma, address);
	if (!page)
		goto oom;

	entry = mk_pte(page, vma->vm_page_prot);
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);

	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (!pte_none(*page_table))
		goto release;
	inc_mm_counter(mm, anon_rss);
	lru_cache_add_active(page);
	page_add_new_anon_rmap(page, vma, address);
	set_pte_at(mm, address, page_table, entry);

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(vma, address, entry);
unlock:
	pte_unmap_unlock(page_table, ptl);
	return 0;
release:
	page_cache_release(page);
	goto unlock;
oom:
	return VM_FAULT_OOM;
}

/*
 * __do_fault() tries to create a new page mapping. It aggressively
 * tries to share with existing pages, but makes a separate copy if
 * the FAULT_FLAG_WRITE is set in the flags parameter in order to avoid
 * the next page fault.
 *
 * As this is called only for pages that do not currently exist, we
 * do not need to flush old virtual caches or the TLB.
 *
 * We enter with non-exclusive mmap_sem (to exclude vma changes,
 * but allow concurrent faults), and pte neither mapped nor locked.
 * We return with mmap_sem still held, but pte unmapped and unlocked.
 */
static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, pmd_t *pmd,
		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
{
	pte_t *page_table;
	spinlock_t *ptl;
	struct page *page;
	pte_t entry;
	int anon = 0;
	struct page *dirty_page = NULL;
	struct vm_fault vmf;
	int ret;
	int page_mkwrite = 0;

	vmf.virtual_address = (void __user *)(address & PAGE_MASK);
	vmf.pgoff = pgoff;
	vmf.flags = flags;
	vmf.page = NULL;

	BUG_ON(vma->vm_flags & VM_PFNMAP);

	if (likely(vma->vm_ops->fault)) {
		ret = vma->vm_ops->fault(vma, &vmf);
		if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))
			return ret;
	} else {
		/* Legacy ->nopage path */
		ret = 0;
		vmf.page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);
		/* no page was available -- either SIGBUS or OOM */
		if (unlikely(vmf.page == NOPAGE_SIGBUS))
			return VM_FAULT_SIGBUS;
		else if (unlikely(vmf.page == NOPAGE_OOM))
			return VM_FAULT_OOM;
	}

	/*
	 * For consistency in subsequent calls, make the faulted page always
	 * locked.
	 */
	if (unlikely(!(ret & VM_FAULT_LOCKED)))
		lock_page(vmf.page);
	else
		VM_BUG_ON(!PageLocked(vmf.page));

	/*
	 * Should we do an early C-O-W break?
	 */
	page = vmf.page;
	if (flags & FAULT_FLAG_WRITE) {
		if (!(vma->vm_flags & VM_SHARED)) {
			anon = 1;
			if (unlikely(anon_vma_prepare(vma))) {
				ret = VM_FAULT_OOM;
				goto out;
			}
			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE,
						vma, address);
			if (!page) {
				ret = VM_FAULT_OOM;
				goto out;
			}
			copy_user_highpage(page, vmf.page, address, vma);
		} else {
			/*
			 * If the page will be shareable, see if the backing
			 * address space wants to know that the page is about
			 * to become writable
			 */
			if (vma->vm_ops->page_mkwrite) {
				unlock_page(page);
				if (vma->vm_ops->page_mkwrite(vma, page) < 0) {
					ret = VM_FAULT_SIGBUS;
					anon = 1; /* no anon but release vmf.page */
					goto out_unlocked;
				}
				lock_page(page);
				/*
				 * XXX: this is not quite right (racy vs
				 * invalidate) to unlock and relock the page
				 * like this, however a better fix requires
				 * reworking page_mkwrite locking API, which
				 * is better done later.
				 */
				if (!page->mapping) {
					ret = 0;
					anon = 1; /* no anon but release vmf.page */
					goto out;
				}
				page_mkwrite = 1;
			}
		}

	}

	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);

	/*
	 * This silly early PAGE_DIRTY setting removes a race
	 * due to the bad i386 page protection. But it's valid
	 * for other architectures too.
	 *
	 * Note that if FAULT_FLAG_WRITE is set, we either now have
	 * an exclusive copy of the page, or this is a shared mapping,
	 * so we can make it writable and dirty to avoid having to
	 * handle that later.
	 */
	/* Only go through if we didn't race with anybody else... */
	if (likely(pte_same(*page_table, orig_pte))) {
		flush_icache_page(vma, page);
		entry = mk_pte(page, vma->vm_page_prot);
		if (flags & FAULT_FLAG_WRITE)
			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
		set_pte_at(mm, address, page_table, entry);
		if (anon) {
			inc_mm_counter(mm, anon_rss);
			lru_cache_add_active(page);
			page_add_new_anon_rmap(page, vma, address);
		} else {
			inc_mm_counter(mm, file_rss);
			page_add_file_rmap(page);
			if (flags & FAULT_FLAG_WRITE) {
				dirty_page = page;
				get_page(dirty_page);
			}
		}

		/* no need to invalidate: a not-present page won't be cached */
		update_mmu_cache(vma, address, entry);
	} else {
		if (anon)
			page_cache_release(page);
		else
			anon = 1; /* no anon but release vmf.page */
	}

	pte_unmap_unlock(page_table, ptl);

out:
	unlock_page(vmf.page);
out_unlocked:
	if (anon)
		page_cache_release(vmf.page);
	else if (dirty_page) {
		if (vma->vm_file)
			file_update_time(vma->vm_file);

		set_page_dirty_balance(dirty_page, page_mkwrite);
		put_page(dirty_page);
	}

	return ret;
}

static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, pte_t *page_table, pmd_t *pmd,
		int write_access, pte_t orig_pte)
{
	pgoff_t pgoff = (((address & PAGE_MASK)
			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
	unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);

	pte_unmap(page_table);
	return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
}


/*
 * do_no_pfn() tries to create a new page mapping for a page without
 * a struct page backing it
 *
 * As this is called only for pages that do not currently exist, we
 * do not need to flush old virtual caches or the TLB.
 *
 * We enter with non-exclusive mmap_sem (to exclude vma changes,
 * but allow concurrent faults), and pte mapped but not yet locked.
 * We return with mmap_sem still held, but pte unmapped and unlocked.
 *
 * It is expected that the ->nopfn handler always returns the same pfn
 * for a given virtual mapping.
 *
 * Mark this `noinline' to prevent it from bloating the main pagefault code.
 */
static noinline int do_no_pfn(struct mm_struct *mm, struct vm_area_struct *vma,
		     unsigned long address, pte_t *page_table, pmd_t *pmd,
		     int write_access)
{
	spinlock_t *ptl;
	pte_t entry;
	unsigned long pfn;

	pte_unmap(page_table);
	BUG_ON(!(vma->vm_flags & VM_PFNMAP));
	BUG_ON(is_cow_mapping(vma->vm_flags));

	pfn = vma->vm_ops->nopfn(vma, address & PAGE_MASK);
	if (unlikely(pfn == NOPFN_OOM))
		return VM_FAULT_OOM;
	else if (unlikely(pfn == NOPFN_SIGBUS))
		return VM_FAULT_SIGBUS;
	else if (unlikely(pfn == NOPFN_REFAULT))
		return 0;

	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);

	/* Only go through if we didn't race with anybody else... */
	if (pte_none(*page_table)) {
		entry = pfn_pte(pfn, vma->vm_page_prot);
		if (write_access)
			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
		set_pte_at(mm, address, page_table, entry);
	}
	pte_unmap_unlock(page_table, ptl);
	return 0;
}

/*
 * Fault of a previously existing named mapping. Repopulate the pte
 * from the encoded file_pte if possible. This enables swappable
 * nonlinear vmas.
 *
 * We enter with non-exclusive mmap_sem (to exclude vma changes,
 * but allow concurrent faults), and pte mapped but not yet locked.
 * We return with mmap_sem still held, but pte unmapped and unlocked.
 */
static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, pte_t *page_table, pmd_t *pmd,
		int write_access, pte_t orig_pte)
{
	unsigned int flags = FAULT_FLAG_NONLINEAR |
				(write_access ? FAULT_FLAG_WRITE : 0);
	pgoff_t pgoff;

	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
		return 0;

	if (unlikely(!(vma->vm_flags & VM_NONLINEAR) ||
			!(vma->vm_flags & VM_CAN_NONLINEAR))) {
		/*
		 * Page table corrupted: show pte and kill process.
		 */
		print_bad_pte(vma, orig_pte, address);
		return VM_FAULT_OOM;
	}

	pgoff = pte_to_pgoff(orig_pte);
	return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
}

/*
 * These routines also need to handle stuff like marking pages dirty
 * and/or accessed for architectures that don't do it in hardware (most
 * RISC architectures).  The early dirtying is also good on the i386.
 *
 * There is also a hook called "update_mmu_cache()" that architectures
 * with external mmu caches can use to update those (i.e. the Sparc or
 * PowerPC hashed page tables that act as extended TLBs).
 *
 * We enter with non-exclusive mmap_sem (to exclude vma changes,
 * but allow concurrent faults), and pte mapped but not yet locked.
 * We return with mmap_sem still held, but pte unmapped and unlocked.
 */
static inline int handle_pte_fault(struct mm_struct *mm,
		struct vm_area_struct *vma, unsigned long address,
		pte_t *pte, pmd_t *pmd, int write_access)
{
	pte_t entry;
	spinlock_t *ptl;

	entry = *pte;
	if (!pte_present(entry)) {
		if (pte_none(entry)) {
			if (vma->vm_ops) {
				if (vma->vm_ops->fault || vma->vm_ops->nopage)
					return do_linear_fault(mm, vma, address,
						pte, pmd, write_access, entry);
				if (unlikely(vma->vm_ops->nopfn))
					return do_no_pfn(mm, vma, address, pte,
							 pmd, write_access);
			}
			return do_anonymous_page(mm, vma, address,
						 pte, pmd, write_access);
		}
		if (pte_file(entry))
			return do_nonlinear_fault(mm, vma, address,
					pte, pmd, write_access, entry);
		return do_swap_page(mm, vma, address,
					pte, pmd, write_access, entry);
	}

	ptl = pte_lockptr(mm, pmd);
	spin_lock(ptl);
	if (unlikely(!pte_same(*pte, entry)))
		goto unlock;
	if (write_access) {
		if (!pte_write(entry))
			return do_wp_page(mm, vma, address,
					pte, pmd, ptl, entry);
		entry = pte_mkdirty(entry);
	}
	entry = pte_mkyoung(entry);
	if (ptep_set_access_flags(vma, address, pte, entry, write_access)) {
		update_mmu_cache(vma, address, entry);
	} else {
		/*
		 * This is needed only for protection faults but the arch code
		 * is not yet telling us if this is a protection fault or not.
		 * This still avoids useless tlb flushes for .text page faults
		 * with threads.
		 */
		if (write_access)
			flush_tlb_page(vma, address);
	}
unlock:
	pte_unmap_unlock(pte, ptl);
	return 0;
}

/*
 * By the time we get here, we already hold the mm semaphore
 */
int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		unsigned long address, int write_access)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;

	__set_current_state(TASK_RUNNING);

	count_vm_event(PGFAULT);

	if (unlikely(is_vm_hugetlb_page(vma)))
		return hugetlb_fault(mm, vma, address, write_access);

	pgd = pgd_offset(mm, address);
	pud = pud_alloc(mm, pgd, address);
	if (!pud)
		return VM_FAULT_OOM;
	pmd = pmd_alloc(mm, pud, address);
	if (!pmd)
		return VM_FAULT_OOM;
	pte = pte_alloc_map(mm, pmd, address);
	if (!pte)
		return VM_FAULT_OOM;

	return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
}

#ifndef __PAGETABLE_PUD_FOLDED
/*
 * Allocate page upper directory.
 * We've already handled the fast-path in-line.
 */
int __pud_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
{
	pud_t *new = pud_alloc_one(mm, address);
	if (!new)
		return -ENOMEM;

	spin_lock(&mm->page_table_lock);
	if (pgd_present(*pgd))		/* Another has populated it */
		pud_free(new);
	else
		pgd_populate(mm, pgd, new);
	spin_unlock(&mm->page_table_lock);
	return 0;
}
#endif /* __PAGETABLE_PUD_FOLDED */

#ifndef __PAGETABLE_PMD_FOLDED
/*
 * Allocate page middle directory.
 * We've already handled the fast-path in-line.
 */
int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
{
	pmd_t *new = pmd_alloc_one(mm, address);
	if (!new)
		return -ENOMEM;

	spin_lock(&mm->page_table_lock);
#ifndef __ARCH_HAS_4LEVEL_HACK
	if (pud_present(*pud))		/* Another has populated it */
		pmd_free(new);
	else
		pud_populate(mm, pud, new);
#else
	if (pgd_present(*pud))		/* Another has populated it */
		pmd_free(new);
	else
		pgd_populate(mm, pud, new);
#endif /* __ARCH_HAS_4LEVEL_HACK */
	spin_unlock(&mm->page_table_lock);
	return 0;
}
#endif /* __PAGETABLE_PMD_FOLDED */

int make_pages_present(unsigned long addr, unsigned long end)
{
	int ret, len, write;
	struct vm_area_struct * vma;

	vma = find_vma(current->mm, addr);
	if (!vma)
		return -1;
	write = (vma->vm_flags & VM_WRITE) != 0;
	BUG_ON(addr >= end);
	BUG_ON(end > vma->vm_end);
	len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;
	ret = get_user_pages(current, current->mm, addr,
			len, write, 0, NULL, NULL);
	if (ret < 0)
		return ret;
	return ret == len ? 0 : -1;
}

/* 
 * Map a vmalloc()-space virtual address to the physical page.
 */
struct page * vmalloc_to_page(void * vmalloc_addr)
{
	unsigned long addr = (unsigned long) vmalloc_addr;
	struct page *page = NULL;
	pgd_t *pgd = pgd_offset_k(addr);
	pud_t *pud;
	pmd_t *pmd;
	pte_t *ptep, pte;
  
	if (!pgd_none(*pgd)) {
		pud = pud_offset(pgd, addr);
		if (!pud_none(*pud)) {
			pmd = pmd_offset(pud, addr);
			if (!pmd_none(*pmd)) {
				ptep = pte_offset_map(pmd, addr);
				pte = *ptep;
				if (pte_present(pte))
					page = pte_page(pte);
				pte_unmap(ptep);
			}
		}
	}
	return page;
}

EXPORT_SYMBOL(vmalloc_to_page);

/*
 * Map a vmalloc()-space virtual address to the physical page frame number.
 */
unsigned long vmalloc_to_pfn(void * vmalloc_addr)
{
	return page_to_pfn(vmalloc_to_page(vmalloc_addr));
}

EXPORT_SYMBOL(vmalloc_to_pfn);

#if !defined(__HAVE_ARCH_GATE_AREA)

#if defined(AT_SYSINFO_EHDR)
static struct vm_area_struct gate_vma;

static int __init gate_vma_init(void)
{
	gate_vma.vm_mm = NULL;
	gate_vma.vm_start = FIXADDR_USER_START;
	gate_vma.vm_end = FIXADDR_USER_END;
	gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC;
	gate_vma.vm_page_prot = __P101;
	/*
	 * Make sure the vDSO gets into every core dump.
	 * Dumping its contents makes post-mortem fully interpretable later
	 * without matching up the same kernel and hardware config to see
	 * what PC values meant.
	 */
	gate_vma.vm_flags |= VM_ALWAYSDUMP;
	return 0;
}
__initcall(gate_vma_init);
#endif

struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
{
#ifdef AT_SYSINFO_EHDR
	return &gate_vma;
#else
	return NULL;
#endif
}

int in_gate_area_no_task(unsigned long addr)
{
#ifdef AT_SYSINFO_EHDR
	if ((addr >= FIXADDR_USER_START) && (addr < FIXADDR_USER_END))
		return 1;
#endif
	return 0;
}

#endif	/* __HAVE_ARCH_GATE_AREA */

/*
 * Access another process' address space.
 * Source/target buffer must be kernel space.
 * Do not walk the page table directly; use get_user_pages().
 */
int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write)
{
	struct mm_struct *mm;
	struct vm_area_struct *vma;
	struct page *page;
	void *old_buf = buf;

	mm = get_task_mm(tsk);
	if (!mm)
		return 0;

	down_read(&mm->mmap_sem);
	/* ignore errors, just check how much was successfully transferred */
	while (len) {
		int bytes, ret, offset;
		void *maddr;

		ret = get_user_pages(tsk, mm, addr, 1,
				write, 1, &page, &vma);
		if (ret <= 0)
			break;

		bytes = len;
		offset = addr & (PAGE_SIZE-1);
		if (bytes > PAGE_SIZE-offset)
			bytes = PAGE_SIZE-offset;

		maddr = kmap(page);
		if (write) {
			copy_to_user_page(vma, page, addr,
					  maddr + offset, buf, bytes);
			set_page_dirty_lock(page);
		} else {
			copy_from_user_page(vma, page, addr,
					    buf, maddr + offset, bytes);
		}
		kunmap(page);
		page_cache_release(page);
		len -= bytes;
		buf += bytes;
		addr += bytes;
	}
	up_read(&mm->mmap_sem);
	mmput(mm);

	return buf - old_buf;
}

#ifdef CONFIG_IPIPE

static inline int ipipe_pin_pte_range(struct mm_struct *mm, pmd_t *pmd,
				      struct vm_area_struct *vma,
				      unsigned long addr, unsigned long end)
{
	spinlock_t *ptl;
	pte_t *pte;
	
	do {
		pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
		if (!pte)
			continue;

		if (!pte_present(*pte)) {
			pte_unmap_unlock(pte, ptl);
			continue;
		}

		if (do_wp_page(mm, vma, addr, pte, pmd, ptl, *pte) == VM_FAULT_OOM)
			return -ENOMEM;
	} while (addr += PAGE_SIZE, addr != end);
	return 0;
}

static inline int ipipe_pin_pmd_range(struct mm_struct *mm, pud_t *pud,
				      struct vm_area_struct *vma,
				      unsigned long addr, unsigned long end)
{
	unsigned long next;
	pmd_t *pmd;

	pmd = pmd_offset(pud, addr);
	do {
		next = pmd_addr_end(addr, end);
		if (pmd_none_or_clear_bad(pmd))
			continue;
		if (ipipe_pin_pte_range(mm, pmd, vma, addr, next))
			return -ENOMEM;
	} while (pmd++, addr = next, addr != end);
	return 0;
}

static inline int ipipe_pin_pud_range(struct mm_struct *mm, pgd_t *pgd,
				      struct vm_area_struct *vma,
				      unsigned long addr, unsigned long end)
{
	unsigned long next;
	pud_t *pud;

	pud = pud_offset(pgd, addr);
	do {
		next = pud_addr_end(addr, end);
		if (pud_none_or_clear_bad(pud))
			continue;
		if (ipipe_pin_pmd_range(mm, pud, vma, addr, next))
			return -ENOMEM;
	} while (pud++, addr = next, addr != end);
	return 0;
}

int ipipe_disable_ondemand_mappings(struct task_struct *tsk)
{
	unsigned long addr, next, end;
	struct vm_area_struct *vma;
	struct mm_struct *mm;
	int result = 0;
	pgd_t *pgd;

	mm = get_task_mm(tsk);
	if (!mm)
		return -EPERM;

	down_write(&mm->mmap_sem);
	if (mm->def_flags & VM_PINNED)
		goto done_mm;

	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		if (!is_cow_mapping(vma->vm_flags))
			continue;

		addr = vma->vm_start;
		end = vma->vm_end;
		
		pgd = pgd_offset(mm, addr);
		do {
			next = pgd_addr_end(addr, end);
			if (pgd_none_or_clear_bad(pgd))
				continue;
			if (ipipe_pin_pud_range(mm, pgd, vma, addr, next)) {
				result = -ENOMEM;
				goto done_mm;
			}
		} while (pgd++, addr = next, addr != end);
	}
	mm->def_flags |= VM_PINNED;

done_mm:
	up_write(&mm->mmap_sem);
	mmput(mm);
	return result;
}

EXPORT_SYMBOL(ipipe_disable_ondemand_mappings);

#endif
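
ipipe_disable_ondemand_mappings() and its ipipe_pin_*_range() helpers above walk each COW VMA one page-table level at a time. Each step is clamped to the next directory boundary, or to the end of the range, by pgd_addr_end()/pud_addr_end()/pmd_addr_end(). What follows is a minimal userspace sketch of that clamping and the nested walk. It is not kernel code. It assumes 4 KiB pages and 4 MiB directory spans (the 32-bit non-PAE x86 layout). EX_PAGE_SIZE, EX_DIR_SIZE, dir_addr_end(), walk_range() and struct walk_stats are hypothetical names. dir_addr_end() is modeled on the generic pgd_addr_end() macro.

```c
#include <assert.h>

/* Hypothetical sizes: 4 KiB pages, 4 MiB per directory entry
 * (the two-level layout of 32-bit non-PAE x86). */
#define EX_PAGE_SIZE 0x1000UL
#define EX_DIR_SIZE  0x400000UL
#define EX_DIR_MASK  (~(EX_DIR_SIZE - 1))

struct walk_stats {
	unsigned long chunks;	/* directory-level iterations */
	unsigned long pages;	/* page-level iterations */
};

/* Next directory boundary after addr, or end if that comes first.
 * Comparing boundary - 1 with end - 1 keeps the result correct when
 * the boundary wraps to 0 or end is 0 (top of the address space). */
unsigned long dir_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + EX_DIR_SIZE) & EX_DIR_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Two-level walk of [addr, end), shaped like the do/while loops in
 * ipipe_disable_ondemand_mappings(): the outer loop steps by
 * directory chunk, the inner loop by page. */
struct walk_stats walk_range(unsigned long addr, unsigned long end)
{
	struct walk_stats st = { 0, 0 };
	unsigned long next;

	/* Callers must pass a non-empty, page-aligned range, as VMAs are;
	 * otherwise the != termination tests below would never fire. */
	assert(addr != end);
	assert(((addr | end) & (EX_PAGE_SIZE - 1)) == 0);

	do {
		unsigned long a = addr;

		next = dir_addr_end(addr, end);
		st.chunks++;
		do {
			st.pages++;	/* the kernel handles one pte here */
		} while (a += EX_PAGE_SIZE, a != next);
	} while (addr = next, addr != end);
	return st;
}
```

For example, walk_range(0x3FE000UL, 0x402000UL) straddles one 4 MiB boundary. It therefore visits two directory chunks (0x3FE000-0x400000 and 0x400000-0x402000) and four pages in total. The comparison `boundary - 1 < end - 1` is used instead of `boundary < end` so that a walk ending at the very top of the address space (end == 0) still terminates correctly.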

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 15:02                               ` Tomas Kalibera
@ 2008-04-02 15:07                                 ` Gilles Chanteperdrix
  2008-04-02 18:14                                   ` Tomas Kalibera
  2008-04-02 19:38                                   ` Tomas Kalibera
  0 siblings, 2 replies; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-02 15:07 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

On Wed, Apr 2, 2008 at 5:02 PM, Tomas Kalibera <kalibera@domain.hid> wrote:
>
>  OK, no change with this patch compared to the previous situation. The
> system boots, but hangs without a stacktrace when I run my Xenomai task.
> SysRq is blocked, now even SysRq-kill did not work, only SysRq-boot did.

Are you sure you did not keep the stuff in highmem_32.c ?


-- 
 Gilles



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 15:07                                 ` Gilles Chanteperdrix
@ 2008-04-02 18:14                                   ` Tomas Kalibera
  2008-04-02 19:38                                   ` Tomas Kalibera
  1 sibling, 0 replies; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-02 18:14 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core


Hmm, I checked again and did not find a mistake in the experiment 
(I was using neither the old binary nor the old sources). I'm doing a 
clean build from scratch again so that we can be absolutely sure. I can 
then run memtest on the machine...

Tomas





* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 15:07                                 ` Gilles Chanteperdrix
  2008-04-02 18:14                                   ` Tomas Kalibera
@ 2008-04-02 19:38                                   ` Tomas Kalibera
  2008-04-02 19:42                                     ` Bill Gatliff
                                                       ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-02 19:38 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core


Hi Gilles,

I've recompiled the kernel again from scratch and got the same lock up. 
Fix 5 does not help... If you want to inspect the exact kernel I used, 
it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
"p5" in its name.

Tomas





* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 19:38                                   ` Tomas Kalibera
@ 2008-04-02 19:42                                     ` Bill Gatliff
  2008-04-02 19:44                                     ` Gilles Chanteperdrix
  2008-04-02 19:52                                     ` Gilles Chanteperdrix
  2 siblings, 0 replies; 34+ messages in thread
From: Bill Gatliff @ 2008-04-02 19:42 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

Tomas Kalibera wrote:
> Hi Gilles,
> 
> I've recompiled the kernel again from scratch and got the same lock up. 
> Fix 5 does not help... If you want to inspect the exact kernel I used, 
> it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
> "p5" in its name.

You aren't using gcc-4.2 or later, are you?  I've had problems with 
those for building and/or running kernels.  On non-x86 targets, mind 
you, but maybe there's a connection...


b.g.
-- 
Bill Gatliff
bgat@domain.hid



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 19:38                                   ` Tomas Kalibera
  2008-04-02 19:42                                     ` Bill Gatliff
@ 2008-04-02 19:44                                     ` Gilles Chanteperdrix
  2008-04-02 21:45                                       ` Tomas Kalibera
  2008-04-02 19:52                                     ` Gilles Chanteperdrix
  2 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-02 19:44 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

Tomas Kalibera wrote:
 > 
 > Hi Gilles,
 > 
 > I've recompiled the kernel again from scratch and got the same lock up. 
 > Fix 5 does not help... If you want to inspect the exact kernel I used, 
 > it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
 > "p5" in its name.
 > 
 > Tomas

But... do you get the lock-up without the patch ?

-- 


					    Gilles.



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 19:38                                   ` Tomas Kalibera
  2008-04-02 19:42                                     ` Bill Gatliff
  2008-04-02 19:44                                     ` Gilles Chanteperdrix
@ 2008-04-02 19:52                                     ` Gilles Chanteperdrix
  2008-04-02 21:37                                       ` Tomas Kalibera
  2 siblings, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-02 19:52 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

Tomas Kalibera wrote:
 > 
 > Hi Gilles,
 > 
 > I've recompiled the kernel again from scratch and got the same lock up. 
 > Fix 5 does not help... If you want to inspect the exact kernel I used, 
 > it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
 > "p5" in its name.

Permission denied when downloading the kernel configuration.

-- 


					    Gilles.



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 19:52                                     ` Gilles Chanteperdrix
@ 2008-04-02 21:37                                       ` Tomas Kalibera
  0 siblings, 0 replies; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-02 21:37 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Tomas Kalibera wrote:
>  > 
>  > Hi Gilles,
>  > 
>  > I've recompiled the kernel again from scratch and got the same lock up. 
>  > Fix 5 does not help... If you want to inspect the exact kernel I used, 
>  > it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
>  > "p5" in its name.
>
> permission denied to download kernel configuration.
>
>   
Sorry. Permissions fixed.
T





* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 19:44                                     ` Gilles Chanteperdrix
@ 2008-04-02 21:45                                       ` Tomas Kalibera
  2008-04-02 22:34                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-02 21:45 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Tomas Kalibera wrote:
>  > 
>  > Hi Gilles,
>  > 
>  > I've recompiled the kernel again from scratch and got the same lock up. 
>  > Fix 5 does not help... If you want to inspect the exact kernel I used, 
>  > it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
>  > "p5" in its name.
>  > 
>  > Tomas
>
> But... do you get the lock-up without the patch ?
>
>   
No. Or, more precisely, not the same one. With this patch (5), the
system locks up as soon as the application starts. It does not print
any stack trace.
Without the patch, the system becomes unusable when the application
calls clone/fork, and it does produce a stack trace (the ones I sent
you before). It seems more alive (processes start, but crash because
of a garbled preempt_count).

The crashes are perfectly repeatable on my system. So the crashes make
no sense to you, right? I could of course take the defensive path and
try an older kernel or something, but if there is a Xenomai bug, it
would be nice to find it... The same goes for a kernel bug.

Tomas



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 21:45                                       ` Tomas Kalibera
@ 2008-04-02 22:34                                         ` Gilles Chanteperdrix
  2008-04-02 22:53                                           ` Gilles Chanteperdrix
  2008-04-02 23:46                                           ` Tomas Kalibera
  0 siblings, 2 replies; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-02 22:34 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

Tomas Kalibera wrote:
 > Gilles Chanteperdrix wrote:
 > > Tomas Kalibera wrote:
 > >  > 
 > >  > Hi Gilles,
 > >  > 
 > >  > I've recompiled the kernel again from scratch and got the same lock up. 
 > >  > Fix 5 does not help... If you want to inspect the exact kernel I used, 
 > >  > it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
 > >  > "p5" in its name.
 > >  > 
 > >  > Tomas
 > >
 > > But... do you get the lock-up without the patch ?
 > >
 > >   
 > No. Or, more precisely, not the same one. With this patch (5), the 
 > system locks up as soon as the application starts. It  does not print 
 > any stack trace.
 > Without the patch, the system gets to unusable state when the 
 > application calls clone/fork, and it does produce a stack trace (those I 
 > was sending you before). It seems to be more alive (processes start, but 
 > crash, because of garbled preempt_count).
 > 
 > The crashes are perfectly repeatable on the system I have. So, the 
 > crashes make no sense to you, right ?  I can indeed try to go the 
 > defensive path and try an older kernel or something, but if there is a 
 > Xenomai bug, it would be nice to find it...  The same for kernel bug, 
 > indeed.

Of course, we are looking for all bugs. But please tell me: do you get
the lock-up even before fork is called? If not, could you verify that
at least some Xenomai programs run correctly, for instance latency?
Looking at the code, I think I found a bug, but I doubt it could cause a
lockup. The definition of VM_PINNED in include/linux/mm.h collides with
another bit used by Linux (VM_CAN_NONLINEAR is also 0x08000000), so this
definition should be changed from:
#define VM_PINNED 0x08000000
to:
#define VM_PINNED 0x10000000
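
To see why the collision matters: with VM_PINNED equal to VM_CAN_NONLINEAR, an mlocked mapping that merely has VM_CAN_NONLINEAR set passes a "locked and pinned" test just as a truly pinned one does. The sketch below models only that bit test (it mirrors the condition in the mm/memory.c hunk of the patch in this thread); the helper name is hypothetical, and VM_LOCKED = 0x00002000 is assumed from mainline 2.6.24 rather than taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted in this thread; VM_LOCKED is assumed to be
 * 0x00002000 as in mainline 2.6.24 include/linux/mm.h. */
#define VM_LOCKED        0x00002000UL
#define VM_CAN_NONLINEAR 0x08000000UL
#define VM_PINNED_OLD    0x08000000UL   /* colliding definition */
#define VM_PINNED_NEW    0x10000000UL   /* proposed definition */

/* Hypothetical helper mirroring the copy_pte_range() test:
 * ((vm_flags | def_flags) & (VM_LOCKED|VM_PINNED)) == (VM_LOCKED|VM_PINNED) */
static bool locked_and_pinned(unsigned long vm_flags, unsigned long def_flags,
                              unsigned long vm_pinned)
{
    unsigned long mask = VM_LOCKED | vm_pinned;

    assert(vm_pinned != 0);
    return ((vm_flags | def_flags) & mask) == mask;
}
```

With the old value, `locked_and_pinned(VM_LOCKED | VM_CAN_NONLINEAR, 0, VM_PINNED_OLD)` is true even though nothing pinned the mapping; with the new value the same flags no longer match.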

I will now try, if possible, to reproduce the bug on an x86 box of mine
and will keep you informed.

-- 


					    Gilles.



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 22:34                                         ` Gilles Chanteperdrix
@ 2008-04-02 22:53                                           ` Gilles Chanteperdrix
  2008-04-03 17:31                                             ` Tomas Kalibera
  2008-04-02 23:46                                           ` Tomas Kalibera
  1 sibling, 1 reply; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-02 22:53 UTC (permalink / raw)
  To: Tomas Kalibera, xenomai-core

[-- Attachment #1: message body and .signature --]
[-- Type: text/plain, Size: 1936 bytes --]

Gilles Chanteperdrix wrote:
 > Tomas Kalibera wrote:
 >  > Gilles Chanteperdrix wrote:
 >  > > Tomas Kalibera wrote:
 >  > >  > 
 >  > >  > Hi Gilles,
 >  > >  > 
 >  > >  > I've recompiled the kernel again from scratch and got the same lock up. 
 >  > >  > Fix 5 does not help... If you want to inspect the exact kernel I used, 
 >  > >  > it's again at http://www.cs.purdue.edu/~tkaliber/crash, the one with 
 >  > >  > "p5" in its name.
 >  > >  > 
 >  > >  > Tomas
 >  > >
 >  > > But... do you get the lock-up without the patch ?
 >  > >
 >  > >   
 >  > No. Or, more precisely, not the same one. With this patch (5), the 
 >  > system locks up as soon as the application starts. It  does not print 
 >  > any stack trace.
 >  > Without the patch, the system gets to unusable state when the 
 >  > application calls clone/fork, and it does produce a stack trace (those I 
 >  > was sending you before). It seems to be more alive (processes start, but 
 >  > crash, because of garbled preempt_count).
 >  > 
 >  > The crashes are perfectly repeatable on the system I have. So, the 
 >  > crashes make no sense to you, right ?  I can indeed try to go the 
 >  > defensive path and try an older kernel or something, but if there is a 
 >  > Xenomai bug, it would be nice to find it...  The same for kernel bug, 
 >  > indeed.
 > 
 > Of course, we are looking for all bugs. But please tell me: do you get
 > the lock-up even before fork is called ? If not, could you verify that
 > at least some Xenomai programs run correctly, for instance latency ?
 > Looking at the code, I think I found a bug, but I doubt it could cause a
 > lockup. The definition of VM_PINNED in include/linux/mm.h collides with
 > another bit used by Linux, so this defintion should be changed from:
 > #define VM_PINNED 0x08000000
 > to:
 > #define VM_PINNED 0x10000000

Here is a 6th patch for this bug (patch 6 includes patch 5).

-- 


					    Gilles.

[-- Attachment #2: ipipe-kmap_atomic-bug.6.diff --]
[-- Type: text/plain, Size: 1349 bytes --]

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7f27db6..a0c80c7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -104,10 +104,11 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_MAPPED_COPY	0x01000000	/* T if mapped copy of data (nommu mmap) */
 #define VM_INSERTPAGE	0x02000000	/* The vma has had "vm_insert_page()" done on it */
 #define VM_ALWAYSDUMP	0x04000000	/* Always include in core dumps */
-#define VM_PINNED	0x08000000	/* Disable faults for the vma */
 
 #define VM_CAN_NONLINEAR 0x08000000	/* Has ->fault & does nonlinear pages */
 
+#define VM_PINNED	0x10000000	/* Disable faults for the vma */
+
 #ifndef VM_STACK_DEFAULT_FLAGS		/* arch can override this */
 #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index e0e7a3d..32d6cb6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -584,8 +584,14 @@ again:
 			if (is_cow_mapping(vma->vm_flags)) {
 				if (((vma->vm_flags|src_mm->def_flags) & (VM_LOCKED|VM_PINNED))
 				    == (VM_LOCKED|VM_PINNED)) {
+				    	arch_leave_lazy_mmu_mode();
+				    	spin_unlock(src_ptl);
+				    	pte_unmap_nested(src_pte);
+				    	add_mm_rss(dst_mm, rss[0], rss[1]);
+				    	pte_unmap_unlock(dst_pte, dst_ptl);
+				    	cond_resched();
 					do_cow_break = 1;
-					break;
+					goto again;
 				}
 			}
 		}
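
The key change in the mm/memory.c hunk above is that, instead of `break`-ing out of the copy loop when a locked and pinned COW page is found, the patched code first releases what the loop holds (lazy-MMU mode, both page-table locks and pte mappings), updates the RSS counters, calls cond_resched(), and then jumps back to `again`. The sketch below is a user-space model of that drop-then-restart control flow only. It is not kernel code, all names are hypothetical, and because the excerpt does not show what the code at `again:` does with `do_cow_break`, the `deferred_work()` step is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a spinlock: only tracks whether it is held and aborts on
 * double-lock or unlock-without-lock, so lock-discipline bugs show up. */
struct model_lock { bool held; };

static void model_lock_acquire(struct model_lock *l) { assert(!l->held); l->held = true; }
static void model_lock_release(struct model_lock *l) { assert(l->held); l->held = false; }

/* Hypothetical table: a nonzero entry needs work that must not run with
 * the lock held (stand-in for a locked and pinned COW page). */
struct model_table {
    struct model_lock lock;
    int entries[16];
    size_t n;
};

static size_t deferred_work_calls;

/* Placeholder for work assumed to require all locks dropped. */
static void deferred_work(int *entry)
{
    *entry = 0;
    deferred_work_calls++;
}

/* Copies every entry; when one needs deferred work, release the lock first
 * and restart the scan at the same index rather than leaving the loop. */
static size_t copy_range(struct model_table *t)
{
    size_t i = 0, copied = 0;
    int *pending = NULL;

    assert(t->n <= sizeof t->entries / sizeof t->entries[0]);
again:
    if (pending) {                          /* no lock held at this point */
        deferred_work(pending);
        pending = NULL;
    }
    model_lock_acquire(&t->lock);
    for (; i < t->n; i++) {
        if (t->entries[i]) {
            pending = &t->entries[i];
            model_lock_release(&t->lock);   /* drop the lock, then retry */
            goto again;
        }
        copied++;
    }
    model_lock_release(&t->lock);
    return copied;
}
```

The point of the model is that every path out of the loop, including the retry path, releases exactly what it acquired; the `model_lock` assertions turn any imbalance into an immediate abort.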


* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 22:34                                         ` Gilles Chanteperdrix
  2008-04-02 22:53                                           ` Gilles Chanteperdrix
@ 2008-04-02 23:46                                           ` Tomas Kalibera
  2008-04-03  9:04                                             ` Gilles Chanteperdrix
  1 sibling, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-02 23:46 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Of course, we are looking for all bugs. But please tell me: do you get
> the lock-up even before fork is called ? If not, could you verify that
> at least some Xenomai programs run correctly, for instance latency ?
>   
The lock-up with patch 5 happens before fork is called, but after a
real-time task is started by the program. I cannot narrow it down
further without adding more logging. If I run the program under
strace, the lock-up does not happen.

Thinking about it, it could be a bug in my program. If I understand the
concept of Xenomai correctly, I can write a real-time task that starves
the Linux kernel indefinitely, correct? My program definitely does have
bugs, so I'll do more debugging.

The lock-up does NOT happen with latency. But the bug in the kernel
without patch 5 (the one that led to a stack trace after the call to
fork) did not appear in latency either.
> Looking at the code, I think I found a bug, but I doubt it could cause a
> lockup. The definition of VM_PINNED in include/linux/mm.h collides with
> another bit used by Linux, so this defintion should be changed from:
> #define VM_PINNED 0x08000000
> to:
> #define VM_PINNED 0x10000000
>
> I will now try, if possible, to reproduce the bug on a x86 box of mine
> and will keep you informed.
>   
Thanks! I'll build a kernel with patch 6, test again, and test my
application.

Tomas




* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 23:46                                           ` Tomas Kalibera
@ 2008-04-03  9:04                                             ` Gilles Chanteperdrix
  0 siblings, 0 replies; 34+ messages in thread
From: Gilles Chanteperdrix @ 2008-04-03  9:04 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

On Thu, Apr 3, 2008 at 1:46 AM, Tomas Kalibera <kalibera@domain.hid> wrote:
> Gilles Chanteperdrix wrote:
>
> > Of course, we are looking for all bugs. But please tell me: do you get
> > the lock-up even before fork is called ? If not, could you verify that
> > at least some Xenomai programs run correctly, for instance latency ?
> >
> >
>  The lock up with patch 5 happens before fork is called, but after a
> real-time task is started by the program. I don't know better now - I'd have
> to add more logging.  If I run in strace, the lock-up does not happen.
>
>  Thinking about that, it can be a bug in my program. If I understand the
> concept of Xenomai correctly, I can just write a real-time task that would
> starve the Linux kernel indefinitely, correct ? My program definitely does
> have bugs. So I'll do more debugging.

Yes, you can starve Linux, but after 4 seconds the Xenomai watchdog
should trigger. You can also starve Linux with a vanilla Linux
application running with the SCHED_FIFO scheduling policy, but in that
case it is the Linux soft-lockup detector which should trigger. I see
that you have both options enabled, so the lockup is probably
another bug.
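
Whether both detectors are enabled can be read off the kernel .config. A minimal C sketch of such a check follows. The helper name is hypothetical, and the option names CONFIG_XENO_OPT_WATCHDOG (Xenomai watchdog) and CONFIG_DETECT_SOFTLOCKUP (Linux soft-lockup detector) are assumed to be the "two options" meant here; they are not named in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `config` (the text of a kernel
 * .config) contains a line of exactly the form "<option>=y".
 * "# CONFIG_FOO is not set" and "CONFIG_FOO=m" both count as disabled. */
static bool config_option_enabled(const char *config, const char *option)
{
    size_t olen;
    const char *line = config;

    assert(config != NULL && option != NULL);
    olen = strlen(option);
    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t llen = eol ? (size_t)(eol - line) : strlen(line);

        if (llen == olen + 2 && strncmp(line, option, olen) == 0 &&
            line[olen] == '=' && line[olen + 1] == 'y')
            return true;
        line = eol ? eol + 1 : NULL;        /* advance to the next line */
    }
    return false;
}
```

Requiring the whole line to match avoids false positives from options that merely share a prefix (for instance CONFIG_MK6 when asking about a shorter name).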

-- 
 Gilles



* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-02 22:53                                           ` Gilles Chanteperdrix
@ 2008-04-03 17:31                                             ` Tomas Kalibera
  2008-04-05  4:32                                               ` Tomas Kalibera
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-03 17:31 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core

Gilles Chanteperdrix wrote:
>  > Of course, we are looking for all bugs. But please tell me: do you get
>  > the lock-up even before fork is called ? If not, could you verify that
>  > at least some Xenomai programs run correctly, for instance latency ?
>  > Looking at the code, I think I found a bug, but I doubt it could cause a
>  > lockup. The definition of VM_PINNED in include/linux/mm.h collides with
>  > another bit used by Linux, so this defintion should be changed from:
>  > #define VM_PINNED 0x08000000
>  > to:
>  > #define VM_PINNED 0x10000000
>
> Here comes a 6th patch for this bug, (patch 6 includes patch 5).
>
>   
I've tested the 6th patch; the lockup is still there.
As far as I can observe, it behaves exactly like with the 5th patch.

Tomas






* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-03 17:31                                             ` Tomas Kalibera
@ 2008-04-05  4:32                                               ` Tomas Kalibera
  2008-04-05  9:10                                                 ` Jan Kiszka
  0 siblings, 1 reply; 34+ messages in thread
From: Tomas Kalibera @ 2008-04-05  4:32 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-core


Hi,

I've tried a more defensive kernel setup & your patch (no. 6). The lockup
is still there. It happens after a real-time task is started, though I
was unable to track down exactly when: it does not crash in a debugger,
does not crash under strace, breaks SysRq, and printed log messages
seem to be delayed (despite flushing). I tried changing the application
code (e.g. using more default flags when creating a task), but I
did not find a workaround.
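
If the delayed messages are the application's own log file, note that fflush() only hands data to the kernel; if the machine locks up before the page cache reaches the disk, those lines are lost. One common mitigation is to write each line with write(2) and follow it with fsync(2), as in the sketch below (helper names are hypothetical). It cannot help if the lock-up happens before fsync() returns, and the serial console remains the more robust channel for kernel messages.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) a log file for appending. */
static int open_log(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hypothetical helper: append `msg` plus a newline to `fd` and force it
 * to stable storage before returning. Returns 0 on success, -1 on error. */
static int log_line_durably(int fd, const char *msg)
{
    size_t len;

    assert(msg != NULL);
    len = strlen(msg);
    while (len > 0) {                       /* handle short writes */
        ssize_t n = write(fd, msg, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry */
            return -1;
        }
        msg += n;
        len -= (size_t)n;
    }
    if (write(fd, "\n", 1) != 1)
        return -1;
    return fsync(fd);                       /* push the data to the device */
}
```

Calling `log_line_durably()` right before and after each suspicious step (task start, fork) narrows down the last point reached before the lock-up, at the cost of one disk sync per line.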

I've put the kernel on the web again, including the config (the one whose
name contains "xenomaidp6"). It might help to track down the bug...
or maybe not.

Do you have any ideas how to work around it?

Thanks
Tomas





Tomas Kalibera wrote:
> Gilles Chanteperdrix wrote:
>   
>>  > Of course, we are looking for all bugs. But please tell me: do you get
>>  > the lock-up even before fork is called ? If not, could you verify that
>>  > at least some Xenomai programs run correctly, for instance latency ?
>>  > Looking at the code, I think I found a bug, but I doubt it could cause a
>>  > lockup. The definition of VM_PINNED in include/linux/mm.h collides with
>>  > another bit used by Linux, so this defintion should be changed from:
>>  > #define VM_PINNED 0x08000000
>>  > to:
>>  > #define VM_PINNED 0x10000000
>>
>> Here comes a 6th patch for this bug, (patch 6 includes patch 5).
>>
>>   
>>     
> I've tested the 6th patch, the lockup is still there.
> As far as I can observe, it behaves exactly like with the 5th patch.
>
> Tomas
>
>
>
>
>   




* Re: [Xenomai-core] Kernel crash with Xenomai (caused by fork?)
  2008-04-05  4:32                                               ` Tomas Kalibera
@ 2008-04-05  9:10                                                 ` Jan Kiszka
  0 siblings, 0 replies; 34+ messages in thread
From: Jan Kiszka @ 2008-04-05  9:10 UTC (permalink / raw)
  To: Tomas Kalibera; +Cc: xenomai-core

[-- Attachment #1: Type: text/plain, Size: 1485 bytes --]

Tomas Kalibera wrote:
> Hi,
> 
> I've tried a more defensive kernel setup & your patch (no.6). The lockup 
> is still there. It  happens after a realtime task is started, though I 
> was unable to track exactly when -  it does not crash in a debugger, 
> does not crash with strace, breaks SysRq, and printing log messages 
> seems to be delayed (despite flushing). I tried changing the application 
> code (like using more default flags when creating a task, etc). But I 
> did not find a workaround.
> 
> I've put the kernel on the web again, including the config (the one that 
> contains "xenomaidp6"). Maybe it might help to track down the bug... 
> Maybe not.

Jumping in late on this: I didn't find any (user-space) test case for the
observed bug in this thread. Can you provide one? The simpler, the
better. It may even contain bugs itself; it just has to trigger the
kernel oops reliably.
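
A reproducer of the kind requested here might start from a skeleton like the one below. It is only a sketch: it uses plain Linux calls, and the function name is hypothetical. It locks all memory (as Xenomai applications are expected to), dirties a private buffer, and forks so that the kernel has to copy locked copy-on-write pages. A real test case for this bug would additionally start a native-skin Xenomai task (for example with rt_task_create()/rt_task_start()) before forking, since the crash was only reported with the Xenomai application; the thread does not establish whether this skeleton alone triggers it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical reproducer skeleton (no Xenomai calls). Returns 0 if parent
 * and child each kept their own copy of the buffer, nonzero otherwise. */
static int fork_after_mlockall(void)
{
    static unsigned char buf[64 * 1024];   /* private, writable: COW after fork */
    int status = 0;
    pid_t pid;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall (continuing unlocked)");
    memset(buf, 0xa5, sizeof buf);         /* fault all pages in */

    pid = fork();
    if (pid == 0) {
        buf[0] = 0;                        /* child write forces a private copy */
        _exit(buf[1] == 0xa5 ? 0 : 1);
    }
    if (pid > 0 && waitpid(pid, &status, 0) != pid)
        pid = -1;
    munlockall();                          /* undo the lock in the parent */
    if (pid < 0)
        return -1;

    assert(WIFEXITED(status) || WIFSIGNALED(status));
    if (buf[0] != 0xa5)                    /* parent must not see the child's write */
        return 2;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Memory locks are not inherited across fork(), so only the parent's pages are locked; the child's write exercises the copy-on-write path on pages that were locked in the parent.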

Then I saw in your .config that your kernel is optimized for AMD K6. In
order to prepare for "exporting" the bug, could you check that the more
generic CONFIG_M586TSC makes no difference? Also, if you happen to have
a second, different box (w.r.t. CPU type & speed) at hand, it would be
nice to know whether the issue is present there as well. But the latter
is also something we can try once a test case is available. My preferred
target will be QEMU, because it can be debugged quite nicely even if the
box is hopelessly locked up.

Thanks,
Jan

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]


end of thread, other threads:[~2008-04-05  9:10 UTC | newest]

Thread overview: 34+ messages
2008-03-28 21:06 [Xenomai-core] Kernel crash with Xenomai (caused by fork?) Tomas Kalibera
2008-03-28 23:25 ` Gilles Chanteperdrix
2008-03-29  0:08   ` Gilles Chanteperdrix
2008-03-29  1:36     ` Tomas Kalibera
2008-03-29 20:17       ` Gilles Chanteperdrix
2008-03-30 20:27       ` Gilles Chanteperdrix
2008-03-31  4:04         ` Tomas Kalibera
2008-03-31 20:21           ` Gilles Chanteperdrix
2008-03-31 20:30             ` Gilles Chanteperdrix
2008-04-01  0:00               ` Tomas Kalibera
2008-04-01  5:52                 ` Gilles Chanteperdrix
2008-04-01  7:59                   ` Gilles Chanteperdrix
2008-04-01 13:54                   ` Tomas Kalibera
2008-04-01 14:03                     ` Gilles Chanteperdrix
2008-04-01 15:45                       ` Tomas Kalibera
2008-04-01 15:58                         ` Gilles Chanteperdrix
2008-04-01 21:23                           ` Tomas Kalibera
2008-04-02  8:42                             ` Gilles Chanteperdrix
2008-04-02 15:02                               ` Tomas Kalibera
2008-04-02 15:07                                 ` Gilles Chanteperdrix
2008-04-02 18:14                                   ` Tomas Kalibera
2008-04-02 19:38                                   ` Tomas Kalibera
2008-04-02 19:42                                     ` Bill Gatliff
2008-04-02 19:44                                     ` Gilles Chanteperdrix
2008-04-02 21:45                                       ` Tomas Kalibera
2008-04-02 22:34                                         ` Gilles Chanteperdrix
2008-04-02 22:53                                           ` Gilles Chanteperdrix
2008-04-03 17:31                                             ` Tomas Kalibera
2008-04-05  4:32                                               ` Tomas Kalibera
2008-04-05  9:10                                                 ` Jan Kiszka
2008-04-02 23:46                                           ` Tomas Kalibera
2008-04-03  9:04                                             ` Gilles Chanteperdrix
2008-04-02 19:52                                     ` Gilles Chanteperdrix
2008-04-02 21:37                                       ` Tomas Kalibera
