All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Peter Zijlstra <peterz@infradead.org>,
	 Andy Lutomirski <luto@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	x86@kernel.org,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/3] x86/mm: fix LAM cr3 mask inconsistency during context switch
Date: Thu, 7 Mar 2024 20:42:18 +0000	[thread overview]
Message-ID: <ZeomquHvZs9-BKKK@google.com> (raw)
In-Reply-To: <ba8a51fe-7b22-46b1-be5f-1e4c837d085c@intel.com>

On Thu, Mar 07, 2024 at 09:36:58AM -0800, Dave Hansen wrote:
> I know we all have different rules, but any time you could spend absorbing:
> 
> 	https://www.kernel.org/doc/html/next/process/maintainer-tip.html

Thanks for the quick review and tips.

I didn't know this existed, I will take a look before respinning.

> 
> would be appreciated, especially:
> 
> > The condensed patch description in the subject line should start with
> > a uppercase letter and should be written in imperative tone.
> 
> 
> On 3/7/24 05:39, Yosry Ahmed wrote:
> > In switch_mm_irqs_off(), we read the 'mm->context.lam_cr3_mask' into
> > 'new_lam', which is later passed to load_new_mm_cr3(). However, there is
> > a call to set_tlbstate_lam_mode() in between which will read
> > 'mm->context.lam_cr3_mask' again and set 'cpu_tlbstate.lam' accordingly.
> > If we race with another thread updating 'mm->context.lam_cr3_mask', the
> > value in 'cpu_tlbstate.lam' could end up being different from CR3.
> 
> Your description is fine (modulo the we's).  But I slightly reworded it
> to make it more plainly readable:
> 
> LAM can only be enabled when a process is single-threaded.  But _kernel_
> threads can temporarily use a single-threaded process's mm.  That means
> that a context-switching kernel thread can race and observe the mm's LAM
> metadata (mm->context.lam_cr3_mask) change.
> 
> The context switch code does two logical things with that metadata:
> populate CR3 and populate 'cpu_tlbstate.lam'.  If it hits this race,
> 'cpu_tlbstate.lam' and CR3 can end up out of sync.
> 
> This de-synchronization is currently harmless.  But it is confusing and
> might lead to warnings or real bugs.

Thanks a lot! I will adopt your version moving forward :)

> 
> --
> 
> > Fix the problem by updating set_tlbstate_lam_mode() to return the LAM
> > mask that was set to 'cpu_tlbstate.lam', and use that mask in
> > switch_mm_irqs_off() when writing CR3. Use READ_ONCE to make sure we
> > read the mask once and use it consistenly.
> 
> Spell checking is also appreciated.

I did run checkpatch. Did it miss something?

> 
> ...
> > -static inline void set_tlbstate_lam_mode(struct mm_struct *mm)
> > +static inline unsigned long set_tlbstate_lam_mode(struct mm_struct *mm)
> >  {
> > -	this_cpu_write(cpu_tlbstate.lam,
> > -		       mm->context.lam_cr3_mask >> X86_CR3_LAM_U57_BIT);
> > +	unsigned long lam = READ_ONCE(mm->context.lam_cr3_mask);
> > +
> > +	this_cpu_write(cpu_tlbstate.lam, lam >> X86_CR3_LAM_U57_BIT);
> >  	this_cpu_write(tlbstate_untag_mask, mm->context.untag_mask);
> > +	return lam;
> >  }
> 
> The comments about races need to be _here_ so that the purpose of the
> READ_ONCE() is clear.
> 
> It would also be nice to call out the rule that this can only
> meaningfully be called once per context switch.

I wanted the comments in switch_mm_irqs_off() where the races actually
matter, but I guess I can make the comment more generic and specify that
the return value is used to write CR3 so we READ_ONCE keeps CR3 and
tlbstate.lam consistent.

> 
> > @@ -633,7 +628,12 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
> >  		barrier();
> >  	}
> >  
> > -	set_tlbstate_lam_mode(next);
> > +	/*
> > +	 * Even if we are not actually switching mm's, another thread could have
> > +	 * updated mm->context.lam_cr3_mask. Make sure tlbstate_lam_cr3_mask()
> > +	 * and the loaded CR3 use the up-to-date mask.
> > +	 */
> 
> I kinda dislike how the comment talks about the details of what
> set_tlbstate_lam_mode() does.  It would be much better to put the meat
> of this comment at the set_tlbstate_lam_mode() definition.

Agreed. I will move most comments to set_tlbstate_lam_mode().

> 
> > +	new_lam = set_tlbstate_lam_mode(next);
> >  	if (need_flush) {
> >  		this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id);
> >  		this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen);
> 
> This is less a complaint about your change and more of the existing
> code, but I wish it was more obvious that set_tlbstate_lam_mode() is
> logically shuffling data (once) from 'next' into the tlbstate.
> 
> The naming makes it sound like it is modifying the tlbstate of 'next'.

We can update the function name to make it more verbose, maybe something
like update_cpu_tlbstate_lam()? We can also try to put "return"
somewhere in the name to imply that it returns the LAM mask it sets, but
I can't make that look pretty.

> 
> But I don't have any particularly brilliant ideas to fix it either.
> Maybe just:
> 
> 	/* new_lam is effectively cpu_tlbstate.lam */
> 
> > @@ -705,7 +705,6 @@ void initialize_tlbstate_and_flush(void)
> >  
> >  	/* LAM expected to be disabled */
> >  	WARN_ON(cr3 & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57));
> > -	WARN_ON(mm_lam_cr3_mask(mm));
> >  
> >  	/*
> >  	 * Assert that CR4.PCIDE is set if needed.  (CR4.PCIDE initialization
> > @@ -724,7 +723,7 @@ void initialize_tlbstate_and_flush(void)
> >  	this_cpu_write(cpu_tlbstate.next_asid, 1);
> >  	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
> >  	this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen);
> > -	set_tlbstate_lam_mode(mm);
> > +	WARN_ON(set_tlbstate_lam_mode(mm));
> 
> The "set_" naming bugs me in both of the sites that get modified here.
> I'd be with a new name that fits better, if we can think of one.

Is it because it's not clear we are updating cpu_tlbstate (in which case
I think update_cpu_tlbstate_lam() is an improvement), or is it because
the function returns a value now? If the latter we can put "return" in
the name somewhere, or keep the function void and pass in an output
parameter.


  parent reply	other threads:[~2024-03-07 20:42 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-07 13:39 [RFC PATCH 0/3] x86/mm: LAM fixups and cleanups Yosry Ahmed
2024-03-07 13:39 ` [RFC PATCH 1/3] x86/mm: fix LAM cr3 mask inconsistency during context switch Yosry Ahmed
2024-03-07 17:22   ` Kirill A. Shutemov
2024-03-07 20:31     ` Yosry Ahmed
2024-03-07 17:36   ` Dave Hansen
2024-03-07 18:49     ` Sean Christopherson
2024-03-07 20:44       ` Yosry Ahmed
2024-03-07 22:12         ` Sean Christopherson
2024-03-07 20:42     ` Yosry Ahmed [this message]
2024-03-07 23:21       ` Yosry Ahmed
2024-03-07 23:32         ` Dave Hansen
2024-03-07 23:37           ` Yosry Ahmed
2024-03-07 13:39 ` [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during context switching Yosry Ahmed
2024-03-07 15:29   ` Dave Hansen
2024-03-07 21:04     ` Yosry Ahmed
2024-03-07 21:39       ` Dave Hansen
2024-03-07 22:29         ` Yosry Ahmed
2024-03-07 22:41           ` Dave Hansen
2024-03-07 22:44             ` Yosry Ahmed
2024-03-08  1:26           ` Yosry Ahmed
2024-03-08  8:09             ` Yosry Ahmed
2024-03-07 17:29   ` Kirill A. Shutemov
2024-03-07 17:56     ` Dave Hansen
2024-03-07 21:08       ` Yosry Ahmed
2024-03-07 21:48         ` Dave Hansen
2024-03-07 22:30           ` Yosry Ahmed
2024-03-08  1:34   ` Andy Lutomirski
2024-03-08  1:47     ` Yosry Ahmed
2024-03-08 14:05       ` Kirill A. Shutemov
2024-03-08 15:23     ` Dave Hansen
2024-03-08 18:18       ` Kirill A. Shutemov
2024-03-09  2:19       ` Yosry Ahmed
2024-03-09 16:34         ` Kirill A. Shutemov
2024-03-09 21:37           ` Yosry Ahmed
2024-03-11 12:42             ` Kirill A. Shutemov
2024-03-11 18:27               ` Yosry Ahmed
2024-03-11  6:09   ` Dan Carpenter
2024-03-11 21:28     ` Yosry Ahmed
2024-03-07 13:39 ` [RFC PATCH 3/3] x86/mm: cleanup prctl_enable_tagged_addr() nr_bits error checking Yosry Ahmed
2024-03-07 17:31   ` Kirill A. Shutemov
2024-03-07 20:27     ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZeomquHvZs9-BKKK@google.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.