Re: [git pull] cpus4096 tree, part 3

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@linux-foundation.org>
To: Nick Piggin <npiggin@suse.de>
Cc: mingo@elte.hu, linux-arch@vger.kernel.org,
	torvalds@linux-foundation.org, rusty@rustcorp.com.au,
	travis@sgi.com, linux-kernel@vger.kernel.org,
	venkatesh.pallipadi@intel.com, suresh.b.siddha@intel.com,
	arjan@infradead.org, hpa@zytor.com, tglx@linutronix.de
Subject: Re: [git pull] cpus4096 tree, part 3
Date: Mon, 26 Jan 2009 11:00:54 -0800	[thread overview]
Message-ID: <20090126110054.bdddbf38.akpm@linux-foundation.org> (raw)
In-Reply-To: <20090105011630.GI32239@wotan.suse.de>

On Mon, 5 Jan 2009 02:16:30 +0100
Nick Piggin <npiggin@suse.de> wrote:

> Really cc linux-arch this time
> 
> On Mon, Jan 05, 2009 at 02:14:16AM +0100, Nick Piggin wrote:
> > On Sat, Jan 03, 2009 at 11:37:23PM +0100, Ingo Molnar wrote:
> > > 
> > > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > 
> > > > What happened to Nick's cleanup patch to do_page_fault (a month or two 
> > > > ago? I complained about some of the issues in his first version and 
> > > > asked for some further cleanups, but I think that whole discussion ended 
> > > > with him saying "I am going to add those changes that you suggested (in 
> > > > fact, I already have)".
> > > > 
> > > > And then I didn't see anything further. Maybe I just missed the end 
> > > > result. Or maybe we have it in some -mm branch or something?
> > > 
> > > they would have been in tip/x86/mm and would be upstream now had Nick 
> > > re-sent a v2 series but that never happened. I think they might have 
> > > fallen victim to a serious attention deficit caused by the SLQB patch ;-)
> > 
> > Well, I already added Linus's suggestions but didn't submit it because
> > there was a bit of work going on in that file as far as I could see, both
> > in the x86 tree and in -mm:
> > 
> > (http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.28-rc2/2.6.28-rc2-mm1/broken-out/mm-invoke-oom-killer-from-page-fault.patch)
> > 
> > It isn't a big deal to resolve either way, but I don't want to make Andrew's
> > life harder.
> > 
> > [Yes OK now I'm the guilty one of pushing in an x86 patch not via the
> > x86 tree ;) This one is easy to break in pieces, but I didn't want
> > to create a dependency between the trees]
> > 
> > I didn't really consider it to be urgent, so I was waiting for that patch
> > to go in, but I was still hoping to get this into 2.6.29... This is what
> > it looks like now with your suggestions, and just merged it to your current
> > tree (untested).
> > 
> > I'll cc the linux-arch list here too, because it might be nice to keep these
> > things as structurally similar as possible (and they'll all want to look at
> > the -mm patch above, although I'll probably end up having to write the
> > patches!).
> 
> ---
> Optimise x86's do_page_fault (C entry point for the page fault path).

It took rather a lot of hunting to find this email.  Please do try to
make the email subject match the final patch's title?

>   * This routine handles page faults.  It determines the address,
>   * and the problem, and then passes it off to one of the appropriate
> @@ -583,16 +796,12 @@ asmlinkage
>  #endif
>  void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
>  {
> +	unsigned long address;
>  	struct task_struct *tsk;
>  	struct mm_struct *mm;
>  	struct vm_area_struct *vma;
> -	unsigned long address;
> -	int write, si_code;
> +	int write;
>  	int fault;
> -#ifdef CONFIG_X86_64
> -	unsigned long flags;
> -	int sig;
> -#endif
>  
>  	tsk = current;
>  	mm = tsk->mm;
> @@ -601,9 +810,7 @@ void __kprobes do_page_fault(struct pt_r
>  	/* get the address */
>  	address = read_cr2();
>  
> -	si_code = SEGV_MAPERR;
> -
> -	if (notify_page_fault(regs))
> +	if (unlikely(notify_page_fault(regs)))
>  		return;
>  	if (unlikely(kmmio_fault(regs, address)))
>  		return;
> @@ -638,10 +845,10 @@ void __kprobes do_page_fault(struct pt_r
>  		 * Don't take the mm semaphore here. If we fixup a prefetch
>  		 * fault we could otherwise deadlock.
>  		 */
> -		goto bad_area_nosemaphore;
> +		bad_area_nosemaphore(regs, error_code, address);
> +		return;
>  	}
>  
> -
>  	/*
>  	 * It's safe to allow irq's after cr2 has been saved and the
>  	 * vmalloc fault has been handled.
> @@ -657,17 +864,18 @@ void __kprobes do_page_fault(struct pt_r
>  
>  #ifdef CONFIG_X86_64
>  	if (unlikely(error_code & PF_RSVD))
> -		pgtable_bad(address, regs, error_code);
> +		pgtable_bad(regs, error_code, address);
>  #endif
>  
>  	/*
>  	 * If we're in an interrupt, have no user context or are running in an
>  	 * atomic region then we must not take the fault.
>  	 */
> -	if (unlikely(in_atomic() || !mm))
> -		goto bad_area_nosemaphore;
> +	if (unlikely(in_atomic() || !mm)) {
> +		bad_area_nosemaphore(regs, error_code, address);
> +		return;
> +	}
>  
> -again:
>  	/*
>  	 * When running in the kernel we expect faults to occur only to
>  	 * addresses in user space.  All other faults represent errors in the
> @@ -684,20 +892,26 @@ again:
>  	 * source.  If this is invalid we can skip the address space check,
>  	 * thus avoiding the deadlock.
>  	 */
> -	if (!down_read_trylock(&mm->mmap_sem)) {
> +	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
>  		if ((error_code & PF_USER) == 0 &&
> -		    !search_exception_tables(regs->ip))
> -			goto bad_area_nosemaphore;
> +		    !search_exception_tables(regs->ip)) {
> +			bad_area_nosemaphore(regs, error_code, address);
> +			return;
> +		}
>  		down_read(&mm->mmap_sem);
>  	}
>  
>  	vma = find_vma(mm, address);
> -	if (!vma)
> -		goto bad_area;
> -	if (vma->vm_start <= address)
> +	if (unlikely(!vma)) {
> +		bad_area(regs, error_code, address);
> +		return;
> +	}
> +	if (likely(vma->vm_start <= address))
>  		goto good_area;
> -	if (!(vma->vm_flags & VM_GROWSDOWN))
> -		goto bad_area;
> +	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
> +		bad_area(regs, error_code, address);
> +		return;
> +	}
>  	if (error_code & PF_USER) {
>  		/*
>  		 * Accessing the stack below %sp is always a bug.
> @@ -705,31 +919,25 @@ again:
>  		 * and pusha to work.  ("enter $65535,$31" pushes
>  		 * 32 pointers and then decrements %sp by 65535.)
>  		 */
> -		if (address + 65536 + 32 * sizeof(unsigned long) < regs->sp)
> -			goto bad_area;
> +		if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
> +			bad_area(regs, error_code, address);
> +			return;
> +		}
>  	}
> -	if (expand_stack(vma, address))
> -		goto bad_area;
> -/*
> - * Ok, we have a good vm_area for this memory access, so
> - * we can handle it..
> - */
> +	if (unlikely(expand_stack(vma, address))) {
> +		bad_area(regs, error_code, address);
> +		return;
> +	}
> +
> +	/*
> +	 * Ok, we have a good vm_area for this memory access, so
> +	 * we can handle it..
> +	 */
>  good_area:
> -	si_code = SEGV_ACCERR;
> -	write = 0;
> -	switch (error_code & (PF_PROT|PF_WRITE)) {
> -	default:	/* 3: write, present */
> -		/* fall through */
> -	case PF_WRITE:		/* write, not present */
> -		if (!(vma->vm_flags & VM_WRITE))
> -			goto bad_area;
> -		write++;
> -		break;
> -	case PF_PROT:		/* read, present */
> -		goto bad_area;
> -	case 0:			/* read, not present */
> -		if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
> -			goto bad_area;
> +	write = error_code & PF_WRITE;

What's going on here?  We set `error_code' to PF_WRITE, which is some
x86-specific thing.

> +	if (unlikely(access_error(error_code, write, vma))) {
> +		bad_area_access_error(regs, error_code, address);
> +		return;
>  	}
>  
>  	/*
> @@ -739,11 +947,8 @@ good_area:
>  	 */
>  	fault = handle_mm_fault(mm, vma, address, write);

and then pass it into handle_mm_fault(), which is expecting a bunch of
flags in the FAULT_FLAG_foo domain.

IOW, the code will accidentally set FAULT_FLAG_NONLINEAR!.


Methinks we want something like this,

--- a/arch/x86/mm/fault.c~fix-x86-optimise-x86s-do_page_fault-c-entry-point-for-the-page-fault-path
+++ a/arch/x86/mm/fault.c
@@ -942,7 +942,7 @@ void __kprobes do_page_fault(struct pt_r
 	 * we can handle it..
 	 */
 good_area:
-	write = error_code & PF_WRITE;
+	write = (error_code & PF_WRITE) ? FAULT_FLAG_WRITE : 0;
 	if (unlikely(access_error(error_code, write, vma))) {
 		bad_area_access_error(regs, error_code, address);
 		return;
_


but why did the current code pass testing at all??

next prev parent reply	other threads:[~2009-01-26 19:01 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-01-01  1:19 [PULL] cpumask tree Rusty Russell
2009-01-02 20:06 ` Linus Torvalds
2009-01-02 20:38   ` Ingo Molnar
2009-01-02 23:31     ` Linus Torvalds
2009-01-03 19:38       ` [git pull] cpus4096 tree, part 3 Ingo Molnar
2009-01-03 20:28         ` Linus Torvalds
2009-01-03 20:36           ` Ingo Molnar
2009-01-03 20:56             ` Linus Torvalds
2009-01-03 21:58               ` Ingo Molnar
2009-01-04  3:35               ` Rusty Russell
2009-01-04  4:28                 ` Mike Travis
2009-01-03 21:38             ` Ingo Molnar
2009-01-03 22:00               ` Linus Torvalds
2009-01-03 22:37                 ` Ingo Molnar
2009-01-05  1:14                   ` Nick Piggin
2009-01-05  1:16                     ` Nick Piggin
2009-01-26 19:00                       ` Andrew Morton [this message]
2009-01-26 19:09                         ` Linus Torvalds
2009-01-26 19:30                           ` Andrew Morton
2009-01-26 20:09                         ` Ingo Molnar
2009-01-26 20:44                           ` Andrew Morton
     [not found]                             ` <604427e00901261312w23a1f0f5y61fc5c6cc70297fb@mail.gmail.com>
2009-01-26 23:21                               ` Ingo Molnar
2009-01-26 23:44                                 ` Andrew Morton
2009-01-07 17:30                     ` Ingo Molnar
2009-01-03 20:58           ` Mike Travis
2009-01-03  7:20     ` [PULL] cpumask tree Rusty Russell
2009-01-03 10:52       ` Ingo Molnar
2009-01-03 11:59         ` [PATCH] ia64: cpumask fix for is_affinity_mask_valid() Ingo Molnar
2009-01-03 12:19           ` [PATCH] cpumask: convert RCU implementations, fix Ingo Molnar
2009-01-04  3:43           ` [PATCH] ia64: cpumask fix for is_affinity_mask_valid() Rusty Russell
2009-01-04  4:20             ` Mike Travis
2009-01-04 12:38               ` Ingo Molnar
2009-01-03 14:58         ` [PULL] cpumask tree Mike Travis
2009-01-03 15:06           ` Ingo Molnar
2009-01-03 15:31             ` Mike Travis
2009-01-03 15:47               ` Ingo Molnar
2009-01-03 15:52                 ` Mike Travis
2009-01-03 16:00                 ` Ingo Molnar
2009-01-03 16:09                   ` Mike Travis
2009-01-03 16:42                     ` Ingo Molnar
2009-01-03 16:48                       ` Mike Travis
2009-01-03 17:45                     ` Ingo Molnar
2009-01-03 18:13                       ` Ingo Molnar
2009-01-03 18:14                       ` Mike Travis
2009-01-03  0:23   ` Rusty Russell
2009-01-08 19:10 ` David Daney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090126110054.bdddbf38.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=arjan@infradead.org \
    --cc=hpa@zytor.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=rusty@rustcorp.com.au \
    --cc=suresh.b.siddha@intel.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=travis@sgi.com \
    --cc=venkatesh.pallipadi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox