PFs on pages pinned with get_user_pages()

* PFs on pages pinned with get_user_pages()
@ 2009-01-29  8:05 Frank Mehnert
  2009-01-29 12:28 ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Mehnert @ 2009-01-29  8:05 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hi,

Could someone please explain to me under which circumstances a page fault,
either generated from kernel code or from userland code, can occur on
pages which are pinned with get_user_pages()?

So far my understanding was that this can _never_ happen, but I seem to
be wrong. Under high memory pressure I get PFs on such pages raised from
kernel code, and the PFs are handled by do_swap_page(). When this happens,
page_count() is 3 but page_mapped() returns false.

Thanks in advance,

Frank
-- 
Dr.-Ing. Frank Mehnert    Sun Microsystems    http://www.sun.com/

* Re: PFs on pages pinned with get_user_pages()
  2009-01-29  8:05 PFs on pages pinned with get_user_pages() Frank Mehnert
@ 2009-01-29 12:28 ` Peter Zijlstra
  2009-01-29 13:08   ` Frank Mehnert
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-29 12:28 UTC (permalink / raw)
  To: Frank Mehnert; +Cc: Linux Kernel Mailing List

On Thu, 2009-01-29 at 09:05 +0100, Frank Mehnert wrote:
> Hi,
> 
> please could someone explain me under which circumstances a pagefault,
> either generated from kernel code or from userland code, can occur on
> pages which are pinned with get_user_pages()?
> 
> So far my understanding was that this can _never_ happen but I seem to
> be wrong. Under high memory pressure I get PFs on such pages raised from
> kernel code and the PFs are handled by do_swap_page(). When this happens,
> page_count is 3 but page_mapped() returns false.

Under memory pressure the page reclaim will first unmap the physical
page from the virtual address range, and then try to free it.

Obviously the freeing bit fails if you hold a reference to it, but the
unmap will work.

After that, userspace will have to (minor) fault the stuff back in.

Also, that same page reclaim, or pdflush, might decide to write out dirty
data, which will also result in (minor) faults when userspace
re-dirties the pages.

Holding a page reference only prevents the physical page from being
removed from its current mapping (and thereby also pins the mapping).
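
The sequence described above can be sketched as a toy userspace model.
This is not kernel code: struct toy_page and the toy_*() helpers are
hypothetical names, and the reference counting is deliberately simplified
(one reference for the mapping, one for the pin). It only illustrates why
a pinned page can be unmapped but not freed, and why the next access then
takes a minor fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: field and function names are illustrative, not kernel API. */
struct toy_page {
    int  refcount;      /* stands in for page_count() */
    int  mapcount;      /* stands in for page_mapped(): PTEs pointing at the page */
    int  minor_faults;  /* faults that only had to restore the mapping */
    bool freed;
};

/* Pinning takes an extra reference, as get_user_pages() does. */
static void toy_pin(struct toy_page *p)
{
    p->refcount++;
}

/* Reclaim unmaps first, then frees only if no reference is left. */
static bool toy_reclaim(struct toy_page *p)
{
    if (p->mapcount > 0) {
        p->mapcount--;      /* page table entry removed */
        p->refcount--;      /* the mapping's reference is dropped */
    }
    if (p->refcount == 0)
        p->freed = true;    /* nobody holds the page: it can go */
    return p->freed;
}

/* Accessing an unmapped but still resident page: minor fault, no I/O. */
static void toy_touch(struct toy_page *p)
{
    assert(!p->freed);      /* a freed page would need a major fault instead */
    if (p->mapcount == 0) {
        p->mapcount++;
        p->refcount++;
        p->minor_faults++;
    }
}
```

Starting from a page with refcount 1 and mapcount 1, toy_pin() followed by
toy_reclaim() leaves the page unmapped but still allocated, and toy_touch()
then records one minor fault. The same reclaim on an unpinned page frees it.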

* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 12:28 ` Peter Zijlstra
@ 2009-01-29 13:08   ` Frank Mehnert
  2009-01-29 13:43     ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Mehnert @ 2009-01-29 13:08 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Peter,

On Thursday 29 January 2009, Peter Zijlstra wrote:
> On Thu, 2009-01-29 at 09:05 +0100, Frank Mehnert wrote:
> > please could someone explain me under which circumstances a pagefault,
> > either generated from kernel code or from userland code, can occur on
> > pages which are pinned with get_user_pages()?
> >
> > So far my understanding was that this can _never_ happen but I seem to
> > be wrong. Under high memory pressure I get PFs on such pages raised from
> > kernel code and the PFs are handled by do_swap_page(). When this happens,
> > page_count is 3 but page_mapped() returns false.
>
> Under memory pressure the page reclaim will first unmap the physical
> page from the virtual address range, and then try to free it.

Which means the page table entry is removed but the physical page
is not swapped out, right?

> Obviously the freeing bit fails if you hold a reference to it, but the
> unmap will work.

Right.

> After that, userspace will have to (minor) fault the stuff back in.

So do_swap_page() only 'restores' the page table entry, and no further
reading from the swap file is necessary?

> Also, that same page-reclaim, or pdflush might decide to write out dirty
> data, which will also result in (minor) faults when userspace will
> re-dirty the pages.
>
> Having a page reference will only avoid the physical page from getting
> removed from its current mapping (and thereby also pins the mapping).

Question: Is it possible to prevent these minor page faults at all?

Thank you very much for your answer!

Frank
-- 
Dr.-Ing. Frank Mehnert    Sun Microsystems    http://www.sun.com/

* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 13:08   ` Frank Mehnert
@ 2009-01-29 13:43     ` Peter Zijlstra
  2009-01-29 14:02       ` Frank Mehnert
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-29 13:43 UTC (permalink / raw)
  To: Frank Mehnert; +Cc: Linux Kernel Mailing List

On Thu, 2009-01-29 at 14:08 +0100, Frank Mehnert wrote:
> Peter,

(please retain CC's)

> On Thursday 29 January 2009, Peter Zijlstra wrote:
> > On Thu, 2009-01-29 at 09:05 +0100, Frank Mehnert wrote:
> > > please could someone explain me under which circumstances a pagefault,
> > > either generated from kernel code or from userland code, can occur on
> > > pages which are pinned with get_user_pages()?
> > >
> > > So far my understanding was that this can _never_ happen but I seem to
> > > be wrong. Under high memory pressure I get PFs on such pages raised from
> > > kernel code and the PFs are handled by do_swap_page(). When this happens,
> > > page_count is 3 but page_mapped() returns false.
> >
> > Under memory pressure the page reclaim will first unmap the physical
> > page from the virtual address range, and then try to free it.
> 
> Which means the page table entry is removed but the physical page
> is not swapped out, right?

Correct.

> > Obviously the freeing bit fails if you hold a reference to it, but the
> > unmap will work.
> 
> Right.
> 
> > After that, userspace will have to (minor) fault the stuff back in.
> 
> So do_swap_page does only 'restore' the page table entry, no further
> reading from the swapfile is necessary?

Indeed.

> > Also, that same page-reclaim, or pdflush might decide to write out dirty
> > data, which will also result in (minor) faults when userspace will
> > re-dirty the pages.
> >
> > Having a page reference will only avoid the physical page from getting
> > removed from its current mapping (and thereby also pins the mapping).
> 
> Question: Is it possible to prevent these minor page faults at all?

Not without some serious tinkering to the VM -- and in the case of the
dirty fault, not at all.

Why are you asking?


* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 13:43     ` Peter Zijlstra
@ 2009-01-29 14:02       ` Frank Mehnert
  2009-01-29 14:20         ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Mehnert @ 2009-01-29 14:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra

On Thursday 29 January 2009, Peter Zijlstra wrote:
> On Thu, 2009-01-29 at 14:08 +0100, Frank Mehnert wrote:
> > Peter,
>
> (please retain CC's)
>
> > On Thursday 29 January 2009, Peter Zijlstra wrote:
> > > On Thu, 2009-01-29 at 09:05 +0100, Frank Mehnert wrote:
> > > > please could someone explain me under which circumstances a
> > > > pagefault, either generated from kernel code or from userland code,
> > > > can occur on pages which are pinned with get_user_pages()?
> > > >
> > > > So far my understanding was that this can _never_ happen but I seem
> > > > to be wrong. Under high memory pressure I get PFs on such pages
> > > > raised from kernel code and the PFs are handled by do_swap_page().
> > > > When this happens, page_count is 3 but page_mapped() returns false.
> > >
> > > Under memory pressure the page reclaim will first unmap the physical
> > > page from the virtual address range, and then try to free it.
> >
> > Which means the page table entry is removed but the physical page
> > is not swapped out, right?
>
> Correct.
>
> > > Obviously the freeing bit fails if you hold a reference to it, but the
> > > unmap will work.
> >
> > Right.
> >
> > > After that, userspace will have to (minor) fault the stuff back in.

[...]

> > Question: Is it possible to prevent these minor page faults at all?
>
> Not without some serious tinkering to the VM -- and in the case of the
> dirty fault, not at all.
>
> Why are you asking?

I'm one of the VirtualBox developers. We are trying to fix the annoying
kerneloops warning 'BUG: sleeping function called from invalid context'
reported by the Fedora folks. This warning occurs when do_swap_page()
calls lock_page() and in_atomic() returns true.

This warning appears when we touch memory which is pinned with
get_user_pages(). In VT-x/AMD-V mode we are executing some code in the
context of the Linux kernel. To prevent scheduling on the current CPU
core we disable interrupts. preempt_disable() would probably be the
better choice, but this would oops as well if CONFIG_PREEMPT is enabled.

Kind regards,

Frank
-- 
Dr.-Ing. Frank Mehnert    Sun Microsystems    http://www.sun.com/

* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 14:02       ` Frank Mehnert
@ 2009-01-29 14:20         ` Peter Zijlstra
  2009-01-29 14:41           ` Frank Mehnert
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-29 14:20 UTC (permalink / raw)
  To: Frank Mehnert; +Cc: linux-kernel

On Thu, 2009-01-29 at 15:02 +0100, Frank Mehnert wrote:

> I'm one of the VirtualBox developers. We are trying to fix the annoying
> kerneloops warning 'BUG: sleeping function called from invalid context'
> reported by the Fedora folks. This warning occurs when do_swap_page()
> calls lock_page() and in_atomic() returns true.
> 
> This warning appears when we touch into memory which is pinned with
> get_user_pages(). In VT-x/AMD-V mode we are executing some code in the
> context of the Linux kernel. To prevent scheduling of the current CPU
> core we disable the interrupts. preempt_disable() would be probably the
> better choice but this would oops as well if CONFIG_PREEMPT is enabled.

but to get there, you'd have to have called handle_mm_fault() which
requires the mmap_sem, which should also give that might_sleep()
warning.

That aside, is there any reason you have to avoid scheduling? Otherwise
I would just allow it and be done with it.


* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 14:20         ` Peter Zijlstra
@ 2009-01-29 14:41           ` Frank Mehnert
  2009-01-29 14:52             ` Peter Zijlstra
  2009-01-29 14:56             ` [PATCH] x86: add might_sleep() to do_page_fault() Peter Zijlstra
  0 siblings, 2 replies; 16+ messages in thread
From: Frank Mehnert @ 2009-01-29 14:41 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel

On Thursday 29 January 2009, Peter Zijlstra wrote:
> On Thu, 2009-01-29 at 15:02 +0100, Frank Mehnert wrote:
> > I'm one of the VirtualBox developers. We are trying to fix the annoying
> > kerneloops warning 'BUG: sleeping function called from invalid context'
> > reported by the Fedora folks. This warning occurs when do_swap_page()
> > calls lock_page() and in_atomic() returns true.
> >
> > This warning appears when we touch into memory which is pinned with
> > get_user_pages(). In VT-x/AMD-V mode we are executing some code in the
> > context of the Linux kernel. To prevent scheduling of the current CPU
> > core we disable the interrupts. preempt_disable() would be probably the
> > better choice but this would oops as well if CONFIG_PREEMPT is enabled.
>
> but to get there, you'd have to have called handle_mm_fault() which
> requires the mmap_sem, which should also give that might_sleep()
> warning.

The stacktrace is

  __might_sleep()
  lock_page()
  handle_mm_fault()
  do_page_fault()
  error_code

So yes, handle_mm_fault() is called. But I assume that down_read_trylock()
succeeded, so we were never forced to call down_read().

> That aside, is there any reason you have to avoid scheduling? Otherwise
> I would just allow so and be done with it.

The reason is that our code relies on that to keep the CPU state in sync
with the saved state. I fear it is quite difficult to change that...

Kind regards,

Frank
-- 
Dr.-Ing. Frank Mehnert    Sun Microsystems    http://www.sun.com/

* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 14:41           ` Frank Mehnert
@ 2009-01-29 14:52             ` Peter Zijlstra
  2009-01-29 16:03               ` Frank Mehnert
  2009-01-29 14:56             ` [PATCH] x86: add might_sleep() to do_page_fault() Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-29 14:52 UTC (permalink / raw)
  To: Frank Mehnert; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Thu, 2009-01-29 at 15:41 +0100, Frank Mehnert wrote:
> On Thursday 29 January 2009, Peter Zijlstra wrote:
> > On Thu, 2009-01-29 at 15:02 +0100, Frank Mehnert wrote:
> > > I'm one of the VirtualBox developers. We are trying to fix the annoying
> > > kerneloops warning 'BUG: sleeping function called from invalid context'
> > > reported by the Fedora folks. This warning occurs when do_swap_page()
> > > calls lock_page() and in_atomic() returns true.
> > >
> > > This warning appears when we touch into memory which is pinned with
> > > get_user_pages(). In VT-x/AMD-V mode we are executing some code in the
> > > context of the Linux kernel. To prevent scheduling of the current CPU
> > > core we disable the interrupts. preempt_disable() would be probably the
> > > better choice but this would oops as well if CONFIG_PREEMPT is enabled.
> >
> > but to get there, you'd have to have called handle_mm_fault() which
> > requires the mmap_sem, which should also give that might_sleep()
> > warning.
> 
> The stacktrace is
> 
>   __might_sleep()
>   lock_page()
>   handle_mm_fault()
>   do_page_fault()
>   error_code
> 
> So yes, handle_mm_fault() is called. But I assume that down_read_trylock()
> succeeded before we were forced to call down_read().
> 
> > That aside, is there any reason you have to avoid scheduling? Otherwise
> > I would just allow so and be done with it.
> 
> The reason is that our code expects that to ensure syncing of the CPU
> state with the saved state. I fear it is quite difficult to change that...

Ah, is that what KVM uses the preempt notifiers for? Could you too?
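
The idea behind that suggestion can be sketched in plain userspace C. This
is a toy model, not the kernel API: all toy_*() names are hypothetical and
only loosely shaped after the kernel's preempt notifier callbacks
(sched_in/sched_out). Instead of forbidding scheduling, the code gets a
callback when its task is switched out (save the guest CPU state) and
another when it is switched back in (reload it, possibly on another CPU).

```c
#include <assert.h>

struct toy_notifier;

/* Callbacks invoked around a context switch (hypothetical model). */
struct toy_preempt_ops {
    void (*sched_in)(struct toy_notifier *n, int cpu);
    void (*sched_out)(struct toy_notifier *n);
};

struct toy_notifier {
    const struct toy_preempt_ops *ops;
};

struct toy_vcpu {
    struct toy_notifier notifier;   /* must stay the first member, see to_vcpu() */
    int cpu;                        /* CPU holding the guest state, -1 if saved */
    int saves;
    int loads;
};

/* Valid because the notifier is the first member of struct toy_vcpu. */
static struct toy_vcpu *to_vcpu(struct toy_notifier *n)
{
    return (struct toy_vcpu *)n;
}

static void toy_vcpu_sched_out(struct toy_notifier *n)
{
    struct toy_vcpu *v = to_vcpu(n);

    v->cpu = -1;    /* guest state written back to memory */
    v->saves++;
}

static void toy_vcpu_sched_in(struct toy_notifier *n, int cpu)
{
    struct toy_vcpu *v = to_vcpu(n);

    v->cpu = cpu;   /* guest state reloaded on whichever CPU we landed on */
    v->loads++;
}

static const struct toy_preempt_ops toy_vcpu_ops = {
    .sched_in  = toy_vcpu_sched_in,
    .sched_out = toy_vcpu_sched_out,
};

static void toy_vcpu_init(struct toy_vcpu *v, int cpu)
{
    v->notifier.ops = &toy_vcpu_ops;
    v->cpu = cpu;
    v->saves = 0;
    v->loads = 1;   /* the initial load counts as one */
}

/* Models the scheduler preempting the task and resuming it on next_cpu. */
static void toy_preempt(struct toy_notifier *n, int next_cpu)
{
    assert(n->ops != 0);
    n->ops->sched_out(n);
    n->ops->sched_in(n, next_cpu);
}
```

In this model a vcpu initialised on CPU 0 and preempted onto CPU 2 ends up
with one save and two loads, and its state is always either saved or
attached to exactly one CPU, without interrupts ever being disabled.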


* [PATCH] x86: add might_sleep() to do_page_fault()
  2009-01-29 14:41           ` Frank Mehnert
  2009-01-29 14:52             ` Peter Zijlstra
@ 2009-01-29 14:56             ` Peter Zijlstra
  2009-01-29 14:59               ` Ingo Molnar
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-29 14:56 UTC (permalink / raw)
  To: Frank Mehnert; +Cc: linux-kernel, Linus Torvalds, Nick Piggin, Ingo Molnar

VirtualBox calls do_page_fault() from an atomic context but runs into a
might_sleep() way past this point, cure that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/mm/fault.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 67e4df5..bb7f946 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -908,6 +908,11 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 		}
 		down_read(&mm->mmap_sem);
 	}
+	/*
+	 * The above down_read_trylock() might have succeeded in which case
+	 * we'll have missed the might_sleep() from down_read().
+	 */
+	might_sleep();
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {



* Re: [PATCH] x86: add might_sleep() to do_page_fault()
  2009-01-29 14:56             ` [PATCH] x86: add might_sleep() to do_page_fault() Peter Zijlstra
@ 2009-01-29 14:59               ` Ingo Molnar
  2009-01-29 15:02                 ` [PATCH v2] " Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2009-01-29 14:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Frank Mehnert, linux-kernel, Linus Torvalds, Nick Piggin


* Peter Zijlstra <peterz@infradead.org> wrote:

> VirtualBox calls do_page_fault() from an atomic context but runs into a
> might_sleep() way past this point, cure that.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/x86/mm/fault.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 67e4df5..bb7f946 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -908,6 +908,11 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  		}
>  		down_read(&mm->mmap_sem);
>  	}
> +	/*
> +	 * The above down_read_trylock() might have succeeded in which case
> +	 * we'll have missed the might_sleep() from down_read().
> +	 */
> +	might_sleep();

should go into the 'else' branch i guess? In the down_read() case we 
already had the check.

	Ingo

* [PATCH v2] x86: add might_sleep() to do_page_fault()
  2009-01-29 14:59               ` Ingo Molnar
@ 2009-01-29 15:02                 ` Peter Zijlstra
  2009-01-29 15:03                   ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-29 15:02 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Frank Mehnert, linux-kernel, Linus Torvalds, Nick Piggin


> should go into the 'else' branch i guess? In the down_read() case we 
> already had the check.

True.

---
VirtualBox calls do_page_fault() from an atomic context but runs into a
might_sleep() way past this point, cure that.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/mm/fault.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 67e4df5..bfac289 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -907,6 +907,12 @@ void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
 			return;
 		}
 		down_read(&mm->mmap_sem);
+	} else {
+		/*
+		 * The above down_read_trylock() might have succeeded in which
+		 * case we'll have missed the might_sleep() from down_read().
+		 */
+		might_sleep();
 	}
 
 	vma = find_vma(mm, address);
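
The effect of this change can be illustrated with a small userspace model.
It is a sketch under stated assumptions, not the kernel implementation:
the toy_*() functions, atomic_depth and sleep_warnings are hypothetical
stand-ins for the rwsem calls, the atomic-context state and the "sleeping
function called from invalid context" report. The point is that a
successful trylock skips the check hidden in the slow path unless the
fast path checks explicitly.

```c
#include <assert.h>
#include <stdbool.h>

static int atomic_depth;    /* > 0 models running in atomic context */
static int sleep_warnings;  /* how often the debug check fired */

/* Models might_sleep(): complain if sleeping is not allowed here. */
static void toy_might_sleep(void)
{
    if (atomic_depth > 0)
        sleep_warnings++;
}

struct toy_rwsem {
    bool write_locked;
    int  readers;
};

/* Never blocks, so it performs no sleep check at all. */
static bool toy_down_read_trylock(struct toy_rwsem *s)
{
    if (s->write_locked)
        return false;
    s->readers++;
    return true;
}

/* May block, so it checks first. */
static void toy_down_read(struct toy_rwsem *s)
{
    toy_might_sleep();
    s->write_locked = false;    /* model: the writer released while we slept */
    s->readers++;
}

/* Locking step of the fault path, without and with the v2 change. */
static void toy_fault_lock(struct toy_rwsem *s, bool patched)
{
    if (!toy_down_read_trylock(s)) {
        toy_down_read(s);       /* contended: check happens inside */
    } else if (patched) {
        toy_might_sleep();      /* uncontended: check added by the patch */
    }
}
```

With atomic_depth set to 1, the unpatched uncontended path reports nothing,
while the patched path and the contended path each report once.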



* Re: [PATCH v2] x86: add might_sleep() to do_page_fault()
  2009-01-29 15:02                 ` [PATCH v2] " Peter Zijlstra
@ 2009-01-29 15:03                   ` Ingo Molnar
  0 siblings, 0 replies; 16+ messages in thread
From: Ingo Molnar @ 2009-01-29 15:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Frank Mehnert, linux-kernel, Linus Torvalds, Nick Piggin


* Peter Zijlstra <peterz@infradead.org> wrote:

> 
> > should go into the 'else' branch i guess? In the down_read() case we 
> > already had the check.
> 
> True.
> 
> ---
> VirtualBox calls do_page_fault() from an atomic context but runs into a
> might_sleep() way past this point, cure that.
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/x86/mm/fault.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)

Applied to tip/x86/mm, thanks Peter!

	Ingo

* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 14:52             ` Peter Zijlstra
@ 2009-01-29 16:03               ` Frank Mehnert
  2009-01-29 16:11                 ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Mehnert @ 2009-01-29 16:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Thursday 29 January 2009, Peter Zijlstra wrote:
> > > That aside, is there any reason you have to avoid scheduling? Otherwise
> > > I would just allow so and be done with it.
> >
> > The reason is that our code expects that to ensure syncing of the CPU
> > state with the saved state. I fear it is quite difficult to change
> > that...
>
> Ah, is that what KVM uses the preempt notifiers for? Could you too?

Right, that could be an option.

We will try to change our code, which is a big effort as we try
to keep the code as uniform as possible across the different
hosts we support (Linux, Solaris, Windows, Mac OS X).

Just to be sure: There is no other option than disabling interrupts
or calling preempt_disable() to prevent scheduling?

Kind regards,

Frank
-- 
Dr.-Ing. Frank Mehnert    Sun Microsystems    http://www.sun.com/

* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 16:03               ` Frank Mehnert
@ 2009-01-29 16:11                 ` Peter Zijlstra
  2009-01-30 10:34                   ` Frank Mehnert
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-29 16:11 UTC (permalink / raw)
  To: Frank Mehnert; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Thu, 2009-01-29 at 17:03 +0100, Frank Mehnert wrote:
> On Thursday 29 January 2009, Peter Zijlstra wrote:
> > > > That aside, is there any reason you have to avoid scheduling? Otherwise
> > > > I would just allow so and be done with it.
> > >
> > > The reason is that our code expects that to ensure syncing of the CPU
> > > state with the saved state. I fear it is quite difficult to change
> > > that...
> >
> > Ah, is that what KVM uses the preempt notifiers for? Could you too?
> 
> Right, that could be an option.
> 
> We will try to change our code, which is a big effort as we try
> to keep the code as uniform as possible across the different
> hosts we support (Linux, Solaris, Windows, Mac OS X).
> 
> Just to be sure: There is no other option than disabling interrupts
> or calling preempt_disable() to prevent scheduling?

Thing is, lock_page() and down_read() need to be able to schedule(),
so there's no way around that.

So even if there was another way to disable scheduling, you'd still have
the same problem.


* Re: PFs on pages pinned with get_user_pages()
  2009-01-29 16:11                 ` Peter Zijlstra
@ 2009-01-30 10:34                   ` Frank Mehnert
  2009-01-30 10:45                     ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Frank Mehnert @ 2009-01-30 10:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Avi Kivity, Ingo Molnar

On Thursday 29 January 2009, Peter Zijlstra wrote:
> On Thu, 2009-01-29 at 17:03 +0100, Frank Mehnert wrote:
> > On Thursday 29 January 2009, Peter Zijlstra wrote:
> > > > > That aside, is there any reason you have to avoid scheduling?
> > > > > Otherwise I would just allow so and be done with it.
> > > >
> > > > The reason is that our code expects that to ensure syncing of the CPU
> > > > state with the saved state. I fear it is quite difficult to change
> > > > that...
> > >
> > > Ah, is that what KVM uses the preempt notifiers for? Could you too?
> >
> > Right, that could be an option.
> >
> > We will try to change our code, which is a big effort as we try
> > to keep the code as uniform as possible across the different
> > hosts we support (Linux, Solaris, Windows, Mac OS X).
> >
> > Just to be sure: There is no other option than disabling interrupts
> > or calling preempt_disable() to prevent scheduling?
>
> Thing is, lock_page() and down_read() need to be able to schedule(),
> so there's no way around that.
>
> So even if there was another way to disable scheduling, you'd still have
> the same problem.

Yes, makes sense.

Back to my initial question: The problem arises for us because we depend
on permanent mappings of memory which were

 - allocated with alloc_pages() or alloc_page()
 - mapped into ring 3 with remap_pfn_range() and
 - pinned with get_user_pages()

There are potential page faults when touching these ring-3 mappings
from ring 0. So I assume we could prevent such page faults if we access
that memory through ring-0 mappings, right? Unfortunately, the space for
ring-0 mappings (< 1GB) is smaller than userland (~ 3GB), at least on
32-bit systems.

Kind regards,

Frank
-- 
Dr.-Ing. Frank Mehnert    Sun Microsystems    http://www.sun.com/

* Re: PFs on pages pinned with get_user_pages()
  2009-01-30 10:34                   ` Frank Mehnert
@ 2009-01-30 10:45                     ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2009-01-30 10:45 UTC (permalink / raw)
  To: Frank Mehnert; +Cc: linux-kernel, Avi Kivity, Ingo Molnar

On Fri, 2009-01-30 at 11:34 +0100, Frank Mehnert wrote:

> > Thing is, lock_page() and down_read() need to be able to schedule(),
> > so there's no way around that.
> >
> > So even if there was another way to disable scheduling, you'd still have
> > the same problem.
> 
> Yes, makes sense.
> 
> Back to my initial question: The problem arises for us because we depend
> on permanent mappings of memory which were
> 
>  - allocated with alloc_pages() or alloc_page()
>  - mapped into ring 3 with remap_pfn_range() and
>  - pinned with get_user_pages()
> 
> There are potential pagefaults when touching into these ring-3-mappings
> from ring 0. So I assume we could prevent such pagefaults if we access
> that memory from ring-0-mappings, right? Unfortunately, the space for
> ring-0-mappings (< 1GB) is smaller than userland (~ 3GB), at least on
> 32-bit systems.

if you only need to access one or two pages, you could kmap_atomic() the
actual pages from ring-0.

