From: Guillaume Autran <gautran@mrv.com>
Date: Wed, 29 Jun 2005 17:25:33 -0400
To: Marcelo Tosatti
Cc: linux-ppc-embedded
Subject: Re: [PATCH] 8xx: get_mmu_context() for (very) FEW_CONTEXTS and KERNEL_PREEMPT race/starvation issue
List-Id: Linux on Embedded PowerPC Developers Mail List

Hi Marcelo,

Marcelo Tosatti wrote:
>Hi Guillaume,
>
>On Wed, Jun 29, 2005 at 11:32:19AM -0400, Guillaume Autran wrote:
>
>>Benjamin Herrenschmidt wrote:
>>
>>>On Tue, 2005-06-28 at 09:42 -0400, Guillaume Autran wrote:
>>>
>>>>Hi,
>>>>
>>>>I happened to notice a race condition in the mmu_context code for the 8xx
>>>>with very few contexts (16 MMU contexts) and kernel preemption enabled. It
>>>>is hard to reproduce, as it shows up only when many processes are
>>>>created/destroyed and the system is doing a lot of IRQ processing.
>>>>
>>>>In short, one process is trying to steal a context that is in the
>>>>process of being freed (mm->context == NO_CONTEXT) but not completely
>>>>freed (nr_free_contexts == 0).
>>>>The steal_context() function then does not do anything, and the process
>>>>stays in the loop forever.
>>>>
>>>>Anyway, I have a patch that fixes this part. It does not seem to affect
>>>>scheduling latency at all.
>>>>
>>>>Comments are appreciated.
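
For reference, the stealing path in question looks roughly like this
(paraphrased from include/asm-ppc/mmu_context.h of that era; details may
differ):

        /* Sketch of steal_context() (FEW_CONTEXTS case): free up the
         * context currently pointed to by next_mmu_context. If that mm
         * is already half-freed (mm->context == NO_CONTEXT), the
         * destroy_context() call below is a no-op: nr_free_contexts is
         * never incremented and the caller in get_mmu_context() keeps
         * spinning. */
        static inline void steal_context(void)
        {
                struct mm_struct *mm;

                mm = context_mm[next_mmu_context];
                flush_tlb_mm(mm);
                destroy_context(mm);
        }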
  

        
>>>Your patch seems to do a hell of a lot more than fixing this race... What
>>>about just calling preempt_disable() in destroy_context() instead?

>>I'm still a bit confused by "kernel preemption". One thing for sure is
>>that disabling kernel preemption does indeed fix my problem.
>>So, my question is: if a task that is in the middle of being scheduled
>>gets preempted by an IRQ handler, where will it resume execution?
>>Back at the beginning of schedule() or where it left off?

>Execution is resumed exactly where it was interrupted.

In that case, what happens when a higher-priority task steals the context
of the lower-priority task after get_mmu_context() but before
set_context()? When the lower-priority task resumes, its context may no
longer be valid... Do I get this right? The window I am worried about is
in switch_mm(), sketched below.

>>The idea behind my patch was to get rid of that nr_free_contexts counter,
>>which is (I think) redundant with the context_map.

>Apparently it's there to avoid the spinlock on !FEW_CONTEXTS machines.
>
>I suppose that what happens is that get_mmu_context() gets preempted after
>stealing a context (so nr_free_contexts = 0), but before setting
>next_mmu_context to the next entry:
>
>        next_mmu_context = (ctx + 1) & LAST_CONTEXT;
>
>So if the now-running higher-priority task calls switch_mm() (which is
>likely to happen), it loops forever on
>atomic_dec_if_positive(&nr_free_contexts), while steal_context() sees
>"mm->context == NO_CONTEXT".
>
>I think that you should try a preempt_disable()/preempt_enable() pair at
>entry and exit of get_mmu_context() - I suppose around destroy_context()
>is not enough (you can try that also).
>
>spin_lock() ends up calling preempt_disable().
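
If I read the allocation path right, the spot is here (again paraphrased
from include/asm-ppc/mmu_context.h, FEW_CONTEXTS case only; not the exact
source):

        /* Sketch of get_mmu_context(), annotated with the race window. */
        static inline void get_mmu_context(struct mm_struct *mm)
        {
                unsigned long ctx;

                if (mm->context != NO_CONTEXT)
                        return;
                /* Spin until a context is free, stealing one if needed.
                 * If whoever took the last context was preempted before
                 * advancing next_mmu_context (below), steal_context()
                 * keeps picking an mm that is already NO_CONTEXT and we
                 * never leave this loop. */
                while (atomic_dec_if_positive(&nr_free_contexts) < 0)
                        steal_context();
                ctx = next_mmu_context;
                while (test_and_set_bit(ctx, context_map)) {
                        ctx = find_next_zero_bit(context_map,
                                                 LAST_CONTEXT + 1, ctx);
                        if (ctx > LAST_CONTEXT)
                                ctx = 0;
                }
                /* Preemption between the decrement above and this update
                 * is what leaves the counters inconsistent. */
                next_mmu_context = (ctx + 1) & LAST_CONTEXT;
                mm->context = ctx;
                context_mm[ctx] = mm;
        }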

  
I'm going to do it like this instead of my previous attempt:

        /* Setup new userspace context */
        preempt_disable();
        get_mmu_context(next);
        set_context(next->context, next->pgd);
        preempt_enable();

To make sure we don't lose our context in between.
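
If that still isn't enough, I suppose the same pair could also protect the
free side, something like this (a sketch only, based on destroy_context()
in include/asm-ppc/mmu_context.h; not tested):

        /* Keep clearing the context_map bit, resetting mm->context and
         * bumping nr_free_contexts atomic with respect to preemption,
         * so steal_context() never sees a context half-freed. */
        static inline void destroy_context(struct mm_struct *mm)
        {
                preempt_disable();
                if (mm->context != NO_CONTEXT) {
                        clear_bit(mm->context, context_map);
                        mm->context = NO_CONTEXT;
                        atomic_inc(&nr_free_contexts);  /* FEW_CONTEXTS only */
                }
                preempt_enable();
        }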



Thanks.
Guillaume.

-- 
=======================================
Guillaume Autran
Senior Software Engineer
MRV Communications, Inc.
Tel: (978) 952-4932 office
E-mail: gautran@mrv.com
======================================= 