From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4A50BF29.4000007@domain.hid> Date: Sun, 05 Jul 2009 16:56:41 +0200 From: Gilles Chanteperdrix MIME-Version: 1.0 References: <4A48FB71.6070506@domain.hid> <4A4CEB11.6020608@domain.hid> <4A4E1BC2.7080707@domain.hid> <4A4E1E5E.1090405@domain.hid> <4A4F85AF.2050705@domain.hid> <4A509637.7070300@domain.hid> In-Reply-To: <4A509637.7070300@domain.hid> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai-core] x86: Endless minor faults List-Id: Xenomai life and development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: xenomai-core Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: >>>> Jan Kiszka wrote: >>>>> Hi again, >>>>> >>>>> this is now basically the patch which seems to stabilized x86 /wrt mmu >>>>> switches again: >>>>> >>>>> There were 3 race windows between setting active_mm of the current task >>>>> and actually switching it (that's a noarch issue), there were several >>>>> calls into switch_mm without proper hard interrupt protection (on archs >>>>> that have no preemptible switch_mm, like x86) and there was a race in >>>>> x86's leave_mm (as Gilles already remarked earlier in this thread - >>>>> while I was looking at an old tree where smp_invalidate_interrupt took >>>>> care of this). >>>>> >>>>> The patch is thought as a basis for further discussions about >>>>> >>>>> o how to solve all the issues for all archs technically (ideally >>>>> without the need to patch noarch parts in an arch-specific way...) >>>>> >>>>> o if anyone thinks there could be more spots like these (I've checked >>>>> the code only for x86 so far) >>>>> diff --git a/fs/aio.c b/fs/aio.c >>>>> index 76da125..0286f0f 100644 >>>>> --- a/fs/aio.c >>>>> +++ b/fs/aio.c >>>>> @@ -618,13 +618,16 @@ static void use_mm(struct mm_struct *mm) >>>>> { >>>>> struct mm_struct *active_mm; >>>>> struct task_struct *tsk = current; >>>>> + unsigned long flags; >>>>> >>>>> task_lock(tsk); >>>>> active_mm = tsk->active_mm; >>>>> atomic_inc(&mm->mm_count); >>>>> tsk->mm = mm; >>>>> + local_irq_save_hw_cond(flags); >>>>> tsk->active_mm = mm; >>>>> - switch_mm(active_mm, mm, tsk); >>>>> + __switch_mm(active_mm, mm, tsk); >>>>> + local_irq_restore_hw_cond(flags); >>>>> task_unlock(tsk); >>>>> >>>>> mmdrop(active_mm); >>>>> diff --git a/fs/exec.c b/fs/exec.c >>>>> index 3b36c69..06591ac 100644 >>>>> --- a/fs/exec.c >>>>> +++ b/fs/exec.c >>>>> @@ -718,6 +718,7 @@ static int exec_mmap(struct mm_struct *mm) >>>>> { >>>>> struct task_struct *tsk; >>>>> struct mm_struct * old_mm, *active_mm; >>>>> + unsigned long flags; >>>>> >>>>> /* Notify parent that we're no longer interested in the old VM */ >>>>> tsk = current; >>>>> @@ -740,8 +741,10 @@ static int exec_mmap(struct mm_struct *mm) >>>>> task_lock(tsk); >>>>> active_mm = tsk->active_mm; >>>>> tsk->mm = mm; >>>>> + local_irq_save_hw_cond(flags); >>>>> tsk->active_mm = mm; >>>>> activate_mm(active_mm, mm); >>>>> + local_irq_restore_hw_cond(flags); >>>>> task_unlock(tsk); >>>>> arch_pick_mmap_layout(mm); >>>>> if (old_mm) { >>>>> diff --git a/kernel/fork.c b/kernel/fork.c >>>>> index 01a836b..cf3b68a 100644 >>>>> --- a/kernel/fork.c >>>>> +++ b/kernel/fork.c >>>>> @@ -1665,11 +1665,14 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags) >>>>> } >>>>> >>>>> if (new_mm) { >>>>> + unsigned long flags; >>>>> mm = current->mm; >>>>> active_mm = current->active_mm; >>>>> current->mm = new_mm; >>>>> + local_irq_save_hw_cond(flags); >>>>> current->active_mm = new_mm; >>>>> activate_mm(active_mm, new_mm); >>>>> + local_irq_restore_hw_cond(flags); >>>>> new_mm = mm; >>>>> } >>>> Ok. These are patches of the noarch code which are dependent on x86 >>>> implementation. >>>> >>>> I think we need something like >>>> >>>> mm_change_enter(flags); >>>> mm_change_leave(flags); >>>> >>>> which would resolve to ipipe_active_mm = NULL on architectures with >>>> unlocked context switches, and to local_irq_save_hw/local_irq_restore on >>>> x86? >>> + we have to change switch_mm in the lockless case, and maybe also its >>> caller (if I look at the arm code), to use ipipe_active_mm in order to >>> decide if to switch or not - see my explanation of the possible race in >>> aio.c. >> This is already the way it is currently working. the "prev" passed to >> switch_mm is always ipipe_active_mm. >> > > Ah, I missed the difference in arm's xnarch_leave_root. Now I got it. > > OK, but this needs to be thought through in details again, specifically > also the enter-root scenarios. I have no safe feeling about it yet, > partly because switch_mm fiddles with ipipe_active_mm as well, but more > generally because we used to have such critical races in this area for a > "fairly" long time. When reentering root with ipipe_active_mm NULL, the TIF_MMSWTICH_INT bit is set, in which case the switch_mm preempted by Xenomai is restarted. -- Gilles.