Re: [PATCH 2/2] s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries

From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries
Date: Thu, 14 Nov 2013 17:33:59 +0100
Message-ID: <20131114173359.2e3cbd60@mschwide>
In-Reply-To: <20131114132223.GG20261@arm.com>

On Thu, 14 Nov 2013 13:22:23 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Thu, Nov 14, 2013 at 08:10:07AM +0000, Martin Schwidefsky wrote:
> > On Wed, 13 Nov 2013 16:16:35 +0000
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > 
> > > On 13 November 2013 08:16, Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> > > > diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
> > > > index 5d1f950..e91afeb 100644
> > > > --- a/arch/s390/include/asm/mmu_context.h
> > > > +++ b/arch/s390/include/asm/mmu_context.h
> > > > @@ -48,13 +48,38 @@ static inline void update_mm(struct mm_struct *mm, struct task_struct *tsk)
> > > >  static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> > > >                              struct task_struct *tsk)
> > > >  {
> > > > -       cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
> > > > -       update_mm(next, tsk);
> > > > +       int cpu = smp_processor_id();
> > > > +
> > > > +       if (prev == next)
> > > > +               return;
> > > > +       if (atomic_inc_return(&next->context.attach_count) >> 16) {
> > > > +               /* Delay update_mm until all TLB flushes are done. */
> > > > +               set_tsk_thread_flag(tsk, TIF_TLB_WAIT);
> > > > +       } else {
> > > > +               cpumask_set_cpu(cpu, mm_cpumask(next));
> > > > +               update_mm(next, tsk);
> > > > +               if (next->context.flush_mm)
> > > > +                       /* Flush pending TLBs */
> > > > +                       __tlb_flush_mm(next);
> > > > +       }
> > > >         atomic_dec(&prev->context.attach_count);
> > > >         WARN_ON(atomic_read(&prev->context.attach_count) < 0);
> > > > -       atomic_inc(&next->context.attach_count);
> > > > -       /* Check for TLBs not flushed yet */
> > > > -       __tlb_flush_mm_lazy(next);
> > > > +}
> > > > +
> > > > +#define finish_switch_mm finish_switch_mm
> > > > +static inline void finish_switch_mm(struct mm_struct *mm,
> > > > +                                   struct task_struct *tsk)
> > > > +{
> > > > +       if (!test_and_clear_tsk_thread_flag(tsk, TIF_TLB_WAIT))
> > > > +               return;
> > > > +
> > > > +       while (atomic_read(&mm->context.attach_count) >> 16)
> > > > +               cpu_relax();
> > > > +
> > > > +       cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
> > > > +       update_mm(mm, tsk);
> > > > +       if (mm->context.flush_mm)
> > > > +               __tlb_flush_mm(mm);
> > > >  }
> > > 
> > > Some care is needed here with preemption (we had this on arm and I
> > > think we need a fix on arm64 as well). Basically you set TIF_TLB_WAIT
> > > on a thread but you get preempted just before finish_switch_mm(). The
> > > new thread has the same mm as the preempted one and switch_mm() exits
> > > early without setting another flag. So finish_switch_mm() wouldn't do
> > > anything but you still switched to the new mm. The fix is to make the
> > > flag per mm rather than thread (see commit bdae73cd374e).
> > 
> > Interesting. For s390 I need to make sure that each task attaching an
> > mm waits for the completion of concurrent TLB flush operations. If the
> > scheduler does not switch the mm I don't care, the mm is still attached.
> 
> I assume the actual hardware mm switch happens via update_mm(). If you
> have a context_switch() to a thread which requires an update_mm() but you
> defer this until finish_switch_mm(), you may be preempted before the
> hardware update. If the new context_switch() schedules a thread with the
> same mm as the preempted one, you no longer call update_mm(). So the new
> thread actually uses an old hardware mm.
 
If the code gets preempted between switch_mm() and finish_switch_mm(),
the worst that can happen is that finish_switch_mm() is called twice.
If the preempted task is picked up again, the task running on the CPU
at that time will do the schedule() call, including the switch_mm()
and the finish_switch_mm(), before returning to the code location where
the preemption interrupted it. I don't see how we could end up with an
incorrect mm.
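
For readers following the quoted hunk, the counter test that decides
between an immediate update_mm() and the deferred path can be modelled
in user space. This is only a sketch, not the s390 code: the split of
attach_count into a low 16-bit attach count and a flush count in the
upper bits is inferred from the ">> 16" tests in the patch, and
struct mm_model, the model_*() helpers and the flusher side are
hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Assumed layout, inferred from the ">> 16" tests in the quoted patch:
 * low 16 bits count attached CPUs, upper bits count in-flight flushes.
 */
#define ATTACH_MASK 0xffff
#define FLUSH_ONE   0x10000

/* Hypothetical stand-in for the mm context; not a kernel structure. */
struct mm_model {
	atomic_int attach_count;
};

/*
 * Models the test in switch_mm(): attach, then report whether a flush
 * is in flight, in which case the hardware switch has to be deferred.
 */
static bool model_attach(struct mm_model *mm)
{
	int val = atomic_fetch_add(&mm->attach_count, 1) + 1;

	return (val >> 16) != 0;
}

/* Models the detach of the previous mm, with the WARN_ON as an assert. */
static void model_detach(struct mm_model *mm)
{
	int old = atomic_fetch_sub(&mm->attach_count, 1);

	assert((old & ATTACH_MASK) > 0);
}

/* Hypothetical flusher side; it is not part of the quoted hunk. */
static void model_flush_begin(struct mm_model *mm)
{
	atomic_fetch_add(&mm->attach_count, FLUSH_ONE);
}

static void model_flush_end(struct mm_model *mm)
{
	atomic_fetch_sub(&mm->attach_count, FLUSH_ONE);
}

/* Models the loop condition in finish_switch_mm(). */
static bool model_must_wait(struct mm_model *mm)
{
	return (atomic_load(&mm->attach_count) >> 16) != 0;
}
```

A caller that gets true from model_attach() corresponds to the
TIF_TLB_WAIT path: it would spin on model_must_wait() before doing the
equivalent of update_mm().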

But back to the original question: would it cause a problem for arm
if we add the two additional calls to finish_arch_post_lock_switch()
to idle_task_exit() and use_mm() ?

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
