Re: [PATCH] sched/core: Create new task with twice disabled preemption


From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>,
	Peter Zijlstra <peterz@infradead.org>,
	Kirill Tkhai <ktkhai@parallels.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] sched/core: Create new task with twice disabled preemption
Date: Mon, 17 Feb 2014 13:55:32 +0100
Message-ID: <20140217135532.372ef91e@mschwide>
In-Reply-To: <20140217104005.GB17487@arm.com>

On Mon, 17 Feb 2014 10:40:06 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Mon, Feb 17, 2014 at 09:37:38AM +0000, Martin Schwidefsky wrote:
> > On Fri, 14 Feb 2014 10:52:55 +0000
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > 
> > > On Thu, Feb 13, 2014 at 09:32:22PM +0400, Kirill Tkhai wrote:
> > > > On 13.02.2014 20:00, Peter Zijlstra wrote:
> > > > > On Thu, Feb 13, 2014 at 07:51:56PM +0400, Kirill Tkhai wrote:
> > > > >> For archs without __ARCH_WANT_UNLOCKED_CTXSW set this means
> > > > >> that all newly created tasks execute finish_arch_post_lock_switch()
> > > > >> and post_schedule() with preemption enabled.
> > > > > 
> > > > > That's IA64 and MIPS; do they have a 'good' reason to use this?
> > > > 
> > > > > It seems my description was misleading; I'm sorry if so.
> > > > > 
> > > > > I mean all architectures *except* IA64 and MIPS, that is, all
> > > > > which do not define __ARCH_WANT_UNLOCKED_CTXSW.
> > > > 
> > > > IA64 and MIPS already have preempt_enable() in schedule_tail():
> > > > 
> > > > #ifdef __ARCH_WANT_UNLOCKED_CTXSW
> > > >         /* In this case, finish_task_switch does not reenable preemption */
> > > >         preempt_enable();
> > > > #endif
> > > > 
> > > > Their initial preemption is not decremented in finish_lock_switch().
> > > > 
> > > > So, we speak about x86, ARM64 etc.
> > > > 
> > > > > Look at ARM64's finish_arch_post_lock_switch(). It looks like a
> > > > > task must not be preempted between switch_mm() and this function.
> > > > > But in the case of a new task this is possible.
> > > 
> > > We had a thread about this at the end of last year:
> > > 
> > > https://lkml.org/lkml/2013/11/15/82
> > > 
> > > There is indeed a problem on arm64, something like this (and I think
> > > s390 also needs a fix):
> > > 
> > > 1. switch_mm() via check_and_switch_context() defers the actual mm
> > >    switch by setting TIF_SWITCH_MM
> > > 2. the context switch is considered 'done' by the kernel before
> > >    finish_arch_post_lock_switch() and therefore we can be preempted to a
> > >    new thread before finish_arch_post_lock_switch()
> > > 3. The new thread has the same mm as the preempted thread but we
> > >    actually missed the mm switching in finish_arch_post_lock_switch()
> > >    because TIF_SWITCH_MM is per thread rather than mm
> > >
> > > > > This is the problem I tried to solve. I don't know arm64, and I can't
> > > > > say how serious it is.
> > > 
> > > Have you managed to reproduce this? I don't say it doesn't exist, but I
> > > want to make sure that any patch actually fixes it.
> > > 
> > > So we have more solutions, one of the first two suitable for stable:
> > > 
> > > 1. Propagate the TIF_SWITCH_MM to the next thread (suggested by Martin)
> > 
> > This is what I put in place for s390 but with the name TIF_TLB_WAIT instead
> > of TIF_SWITCH_MM. I took the liberty to add the code to the features branch
> > of the linux-s390 tree including the common code change that is necessary:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=09ddfb4d5602095aad04eada8bc8df59e873a6ef
> 
> I don't see a problem with additional calls to
> finish_arch_post_lock_switch() on arm and arm64 but I would have done
> this in more than one step:
> 
> 1. Introduce finish_switch_mm()
> 2. Convert arm and arm64 to finish_switch_mm() (which means we no longer
>    check whether the interrupts are disabled in switch_mm() to defer the
>    switch)
> 3. Remove generic finish_arch_post_lock_switch() because its
>    functionality has been entirely replaced by finish_switch_mm()
> 
> We probably end up in the same place anyway.

Peter pointed me to finish_arch_post_lock_switch as a replacement for
finish_switch_mm. They are basically doing the same thing and I do not care
too much what the function is called. finish_arch_post_lock_switch is ok from
my point of view. If you want to change it, be aware of the header file hell
you are getting into.
 
> But does this solve the problem of being preempted between switch_mm()
> and finish_arch_post_lock_switch()? I guess we still need the same
> guarantees that both switch_mm() and the hook happen on the same CPU.

By itself no, I do not think so. finish_arch_post_lock_switch is supposed
to be called with no locks held, so preemption should be enabled as well.

> > https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=525d65f8f66ac29136ba6d2336f5a73b038701e2
> 
> That's a way to solve it for s390. I don't particularly like
> transferring the mm switch pending TIF flag to the next task but I think
> it does the job (just personal preference).

It gets the job done. The alternative is another per-cpu bitmap for each
mm. I prefer transferring the TIF flag to the next task; we do that
for the machine check flag TIF_MCCK_PENDING anyway. One more bit to transfer
does not hurt.
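
To make the missed switch and the flag transfer concrete, here is a minimal
user-space sketch in C. It is a toy model, not the s390 or arm64 code: the
struct layouts and every model_* name are hypothetical, and the single
switch_pending bit only stands in for TIF_SWITCH_MM / TIF_TLB_WAIT. It assumes
task A gets its mm switch deferred and is then preempted by task B, which
shares A's mm, before A reaches the finish hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real kernel APIs. */
struct mm { int id; };

struct task {
	struct mm *mm;
	bool switch_pending;	/* stands in for TIF_SWITCH_MM / TIF_TLB_WAIT */
};

struct cpu {
	struct mm *hw_mm;	/* address space actually loaded on the CPU */
};

/* Like a deferring switch_mm(): only flag the incoming task. */
static void model_switch_mm(struct mm *prev, struct mm *next, struct task *tsk)
{
	if (prev != next)
		tsk->switch_pending = true;
}

/* Like finish_arch_post_lock_switch(): perform the deferred switch. */
static void model_finish_switch(struct cpu *cpu, struct task *tsk)
{
	assert(tsk->mm != NULL);
	if (tsk->switch_pending) {
		cpu->hw_mm = tsk->mm;
		tsk->switch_pending = false;
	}
}

/* prev is preempted by next before prev has run the finish hook. */
static void model_preempt(struct cpu *cpu, struct task *prev,
			  struct task *next, bool transfer_flag)
{
	/* Same mm on both tasks, so nothing gets flagged for next here. */
	model_switch_mm(prev->mm, next->mm, next);

	/* The proposed fix: hand the pending bit over to the next task. */
	if (transfer_flag && prev->switch_pending) {
		prev->switch_pending = false;
		next->switch_pending = true;
	}
	model_finish_switch(cpu, next);
}

/* Returns true if the CPU ends up running with the right mm loaded. */
static bool model_race(bool transfer_flag)
{
	struct mm old_mm = { 1 }, new_mm = { 2 };
	struct task a = { &new_mm, false };
	struct task b = { &new_mm, false };
	struct cpu cpu = { &old_mm };

	model_switch_mm(&old_mm, a.mm, &a);		/* switch deferred for A */
	model_preempt(&cpu, &a, &b, transfer_flag);	/* B cuts in before A finishes */
	return cpu.hw_mm == b.mm;
}
```

In this model, model_race(false) leaves the old mm loaded while B runs, and
model_race(true) completes the switch in B's finish hook. The per-cpu bitmap
alternative is not modelled here.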

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
