From: Guillaume Autran <gautran@mrv.com>
Date: Wed, 29 Jun 2005 17:25:33 -0400
To: Marcelo Tosatti
Cc: linux-ppc-embedded
Subject: Re: [PATCH] 8xx: get_mmu_context() for (very) FEW_CONTEXTS and KERNEL_PREEMPT race/starvation issue
List-Id: Linux on Embedded PowerPC Developers Mail List

Hi Marcelo,

Marcelo Tosatti wrote:
>Hi Guillaume,
>
>On Wed, Jun 29, 2005 at 11:32:19AM -0400, Guillaume Autran wrote:
>
>>Benjamin Herrenschmidt wrote:
>>
>>>On Tue, 2005-06-28 at 09:42 -0400, Guillaume Autran wrote:
>>>
>>>>Hi,
>>>>
>>>>I happened to notice a race condition in the mmu_context code for the 8xx
>>>>with very few contexts (16 MMU contexts) and kernel preemption enabled. It
>>>>is hard to reproduce, as it shows up only when many processes are
>>>>created/destroyed and the system is doing a lot of IRQ processing.
>>>>
>>>>In short, one process is trying to steal a context that is in the
>>>>process of being freed (mm->context == NO_CONTEXT) but not completely
>>>>freed (nr_free_contexts == 0).
>>>>The steal_context() function then does not do anything, and the process
>>>>stays in the loop forever.
>>>>
>>>>Anyway, I have a patch that fixes this part. It does not seem to affect
>>>>scheduling latency at all.
>>>>
>>>>Comments are appreciated.
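
For reference, the stealing path in question looks roughly like this
(paraphrased from include/asm-ppc/mmu_context.h of that era; details may
differ):

        /* Sketch of steal_context() (FEW_CONTEXTS case): free up the
         * context currently pointed to by next_mmu_context. If that mm
         * is already half-freed (mm->context == NO_CONTEXT), the
         * destroy_context() call below is a no-op: nr_free_contexts is
         * never incremented and the caller in get_mmu_context() keeps
         * spinning. */
        static inline void steal_context(void)
        {
                struct mm_struct *mm;

                mm = context_mm[next_mmu_context];
                flush_tlb_mm(mm);
                destroy_context(mm);
        }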
  

        
>>>Your patch seems to do a hell of a lot more than fixing this race... What
>>>about just calling preempt_disable() in destroy_context() instead?

>>I'm still a bit confused by "kernel preemption". One thing for sure is
>>that disabling kernel preemption does indeed fix my problem.
>>So, my question is: if a task that is in the middle of being scheduled
>>gets preempted by an IRQ handler, where will it resume execution?
>>Back at the beginning of schedule() or where it left off?

>Execution is resumed exactly where it was interrupted.

In that case, what happens when a higher-priority task steals the context
of the lower-priority task after get_mmu_context() but before
set_context()? When the lower-priority task resumes, its context may no
longer be valid... Do I get this right? The window I am worried about is
in switch_mm(), sketched below.

>>The idea behind my patch was to get rid of that nr_free_contexts counter,
>>which is (I think) redundant with the context_map.

>Apparently it's there to avoid the spinlock on !FEW_CONTEXTS machines.
>
>I suppose that what happens is that get_mmu_context() gets preempted after
>stealing a context (so nr_free_contexts = 0), but before setting
>next_mmu_context to the next entry:
>
>        next_mmu_context = (ctx + 1) & LAST_CONTEXT;
>
>So if the now-running higher-priority task calls switch_mm() (which is
>likely to happen), it loops forever on
>atomic_dec_if_positive(&nr_free_contexts), while steal_context() sees
>"mm->context == NO_CONTEXT".
>
>I think that you should try a preempt_disable()/preempt_enable() pair at
>entry and exit of get_mmu_context() - I suppose around destroy_context()
>is not enough (you can try that also).
>
>spin_lock() ends up calling preempt_disable().
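
If I read the allocation path right, the spot is here (again paraphrased
from include/asm-ppc/mmu_context.h, FEW_CONTEXTS case only; not the exact
source):

        /* Sketch of get_mmu_context(), annotated with the race window. */
        static inline void get_mmu_context(struct mm_struct *mm)
        {
                unsigned long ctx;

                if (mm->context != NO_CONTEXT)
                        return;
                /* Spin until a context is free, stealing one if needed.
                 * If whoever took the last context was preempted before
                 * advancing next_mmu_context (below), steal_context()
                 * keeps picking an mm that is already NO_CONTEXT and we
                 * never leave this loop. */
                while (atomic_dec_if_positive(&nr_free_contexts) < 0)
                        steal_context();
                ctx = next_mmu_context;
                while (test_and_set_bit(ctx, context_map)) {
                        ctx = find_next_zero_bit(context_map,
                                                 LAST_CONTEXT + 1, ctx);
                        if (ctx > LAST_CONTEXT)
                                ctx = 0;
                }
                /* Preemption between the decrement above and this update
                 * is what leaves the counters inconsistent. */
                next_mmu_context = (ctx + 1) & LAST_CONTEXT;
                mm->context = ctx;
                context_mm[ctx] = mm;
        }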

  
I'm going to do it like this instead of my previous attempt:

        /* Setup new userspace context */
        preempt_disable();
        get_mmu_context(next);
        set_context(next->context, next->pgd);
        preempt_enable();

To make sure we don't lose our context in between.
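
If that still isn't enough, I suppose the same pair could also protect the
free side, something like this (a sketch only, based on destroy_context()
in include/asm-ppc/mmu_context.h; not tested):

        /* Keep clearing the context_map bit, resetting mm->context and
         * bumping nr_free_contexts atomic with respect to preemption,
         * so steal_context() never sees a context half-freed. */
        static inline void destroy_context(struct mm_struct *mm)
        {
                preempt_disable();
                if (mm->context != NO_CONTEXT) {
                        clear_bit(mm->context, context_map);
                        mm->context = NO_CONTEXT;
                        atomic_inc(&nr_free_contexts);  /* FEW_CONTEXTS only */
                }
                preempt_enable();
        }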



Thanks.
Guillaume.

-- 
=======================================
Guillaume Autran
Senior Software Engineer
MRV Communications, Inc.
Tel: (978) 952-4932 office
E-mail: gautran@mrv.com
======================================= 