public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@suse.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Review of KPTI patchset
Date: Sat, 30 Dec 2017 20:43:07 +0000 (UTC)
Message-ID: <1311401854.45816.1514666587545.JavaMail.zimbra@efficios.com>
In-Reply-To: <alpine.DEB.2.20.1712302043070.1899@nanos>

----- On Dec 30, 2017, at 2:58 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Sat, 30 Dec 2017, Mathieu Desnoyers wrote:
> 
>> Hi Thomas,
>> 
>> Here is some feedback on the KPTI patchset. Sorry for not replying to the
>> patch, I was not CC'd on the original email, and don't have it in my inbox.
> 
> I can bounce you 196 versions if you want.

Oh no, don't worry about this. I'm happy reviewing the resulting patchset
as it is. :)

> 
>> I notice that fill_ldt() sets the desc->type with "|= 1", whereas all
>> other operations on the desc type are done with a type enum based on
>> clearly defined bits. Is the hardcoded "1" on purpose ?
> 
> I don't understand your question. That code does not have any enum involved
> at all:

I think I got mixed up with other "desc" fields within other structures
of desc_defs.h.

> 
>        desc->type              = (info->read_exec_only ^ 1) << 1;
>        desc->type             |= info->contents << 2;
>        /* Set the ACCESS bit so it can be mapped RO */
>        desc->type             |= 1;
> 
> So the |= 1 is completely consistent with the rest of that code.

It indeed seems consistent with the rest of that code, which could use
more comments and documentation. For instance, x86 desc_defs.h
could benefit from extra comments describing the meaning of each bit
near the "type" field.

I guess a counter-argument is that anyone reading through that code
should look up the "segment descriptor" layout in an x86 manual. Not
ideal though.
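
To illustrate the kind of documentation I have in mind, here is a rough
user-space sketch of the same three assignments with the bits named. The
SEG_TYPE_* names and the ldt_desc_type() helper are made up for this
example and do not exist in desc_defs.h; the bit meanings are those of
the 4-bit type field of a code/data segment descriptor as I read them in
the x86 manual.

```c
#include <assert.h>

/* Hypothetical names for the 4-bit type field of a code/data descriptor. */
enum {
	SEG_TYPE_ACCESSED = 1u << 0,	/* set by the CPU on first use */
	SEG_TYPE_RW       = 1u << 1,	/* data: writable, code: readable */
	SEG_TYPE_DC       = 1u << 2,	/* data: expand-down, code: conforming */
	SEG_TYPE_CODE     = 1u << 3,	/* 1 = code segment, 0 = data segment */
};

/*
 * Mirrors the three assignments quoted from fill_ldt().
 * read_exec_only is a 1-bit flag; contents is a 2-bit value
 * (0 = data, 1 = expand-down data, 2 = code, 3 = conforming code).
 */
static unsigned int ldt_desc_type(unsigned int read_exec_only,
				  unsigned int contents)
{
	unsigned int type;

	assert(read_exec_only <= 1);
	assert(contents <= 3);

	type  = (read_exec_only ^ 1) << 1;	/* SEG_TYPE_RW unless read/exec only */
	type |= contents << 2;			/* SEG_TYPE_DC and SEG_TYPE_CODE */
	type |= SEG_TYPE_ACCESSED;		/* the "|= 1" from the patch */
	return type;
}
```

With names like these, the "|= 1" would read as "set the accessed bit"
without a trip to the manual.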

> 
>> arch/x86/include/asm/processor.h:
>> 
>> "+ * With page table isolation enabled, we map the LDT in ... [stay tuned]"
>> 
>> I look forward to publication of the next chapter containing the rest of
>> this sentence. When is it due ? ;)
> 
> Don't know. Lost my crystal ball.

Me too :) It would be helpful to complete this comment though.

[...]

>> @@ -156,6 +271,12 @@ int ldt_dup_context(struct mm_struct *old_mm, struct
>> mm_struct *mm)
>>  	       new_ldt->nr_entries * LDT_ENTRY_SIZE);
>>  	finalize_ldt_struct(new_ldt);
>>  
>> +	retval = map_ldt_struct(mm, new_ldt, 0);
>> +	if (retval) {
>> +		free_ldt_pgtables(mm);
>> +		free_ldt_struct(new_ldt);
>> +		goto out_unlock;
>> +	}
>>  	mm->context.ldt = new_ldt;
>>  
>>  out_unlock:
>> 
>> ^ I don't get why it does "free_ldt_pgtables(mm)" on the mm argument, but
>> it's not done in other error paths. Perhaps it's OK, but ownership seems
>> non-obvious.
> 
> The pagetable for LDT is allocated and populated in the user space visible
> part of a process PGDIR, which obviously is connected to the mm struct....
> 
> Which other error paths are you talking about?

Let's look at the entire function:

> /*
>  * Called on fork from arch_dup_mmap(). Just copy the current LDT state,
>  * the new task is not running, so nothing can be installed.
>  */
> int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm)
> {
>       struct ldt_struct *new_ldt;
>       int retval = 0;
>
>       if (!old_mm)
>               return 0;

If old_mm is NULL, free_ldt_pgtables(mm) is not called.

>
>       mutex_lock(&old_mm->context.lock);
>       if (!old_mm->context.ldt)

If old_mm->context.ldt is NULL, free_ldt_pgtables(mm) is not called.

>               goto out_unlock;
>
>       new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries);
>       if (!new_ldt) {
>               retval = -ENOMEM;

On allocation error, free_ldt_pgtables(mm) is not called.

>               goto out_unlock;
>       }
>
>       memcpy(new_ldt->entries, old_mm->context.ldt->entries,
>              new_ldt->nr_entries * LDT_ENTRY_SIZE);
>       finalize_ldt_struct(new_ldt);
>
>       retval = map_ldt_struct(mm, new_ldt, 0);
>       if (retval) {
>               free_ldt_pgtables(mm);

Here, if map_ldt_struct() fails, then free_ldt_pgtables(mm) is called.

>               free_ldt_struct(new_ldt);

This is in addition to calling free_ldt_struct(), even though
map_ldt_struct() failed... ?

This lack of symmetry makes me uncomfortable, and it may hint at something
fishy.

>               goto out_unlock;
>       }
>       mm->context.ldt = new_ldt;
>
> out_unlock:
>       mutex_unlock(&old_mm->context.lock);
>       return retval;
> }

[...]

> 
>> +	/*
>> +	 * Force the population of PMDs for not yet allocated per cpu
>> +	 * memory like debug store buffers.
>> +	 */
>> +	npages = sizeof(struct debug_store_buffers) / PAGE_SIZE;
>> +	for (; npages; npages--, cea += PAGE_SIZE)
>> +		cea_set_pte(cea, 0, PAGE_NONE);
>> 
>> ^ the code above (in percpu_setup_debug_store()) depends on having
>> struct debug_store_buffers's size being a multiple of PAGE_SIZE. A
>> comment should be added near the structure declaration to document
>> this requirement.
> 
> Hmm. There was a build_bug_on() somewhere which ensured that. That must
> have been lost in one of the gazillion iterations.

A BUILD_BUG_ON() would work as documentation indeed.
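
Something along these lines is what I have in mind, shown here as a
user-space sketch with C11 static_assert standing in for the kernel's
BUILD_BUG_ON(). The structure below is a stand-in, not the real
struct debug_store_buffers; the buffer sizes and the 4 KiB page size are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL		/* assumption: 4 KiB pages */

/* Stand-in for struct debug_store_buffers; sizes are illustrative. */
struct debug_store_buffers_sketch {
	char bts_buffer[16 * SKETCH_PAGE_SIZE];
	char pebs_buffer[16 * SKETCH_PAGE_SIZE];
};

/*
 * The population loop computes npages with an integer division, so any
 * remainder would silently leave the tail of the structure unmapped.
 * Fail the build instead.
 */
static_assert(sizeof(struct debug_store_buffers_sketch) % SKETCH_PAGE_SIZE == 0,
	      "debug_store_buffers must be a multiple of the page size");

/* Same computation as the npages assignment in the quoted hunk. */
static size_t debug_store_npages(void)
{
	return sizeof(struct debug_store_buffers_sketch) / SKETCH_PAGE_SIZE;
}
```

Adding a member that breaks the page-size multiple then fails at compile
time rather than at runtime.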

[...]

> 
>> +/*
>> + * We get here when we do something requiring a TLB invalidation
>> + * but could not go invalidate all of the contexts.  We do the
>> + * necessary invalidation by clearing out the 'ctx_id' which
>> + * forces a TLB flush when the context is loaded.
>> + */
>> +void clear_asid_other(void)
>> +{
>> +	u16 asid;
>> +
>> +	/*
>> +	 * This is only expected to be set if we have disabled
>> +	 * kernel _PAGE_GLOBAL pages.
>> +	 */
>> +	if (!static_cpu_has(X86_FEATURE_PTI)) {
>> +		WARN_ON_ONCE(1);
>> +		return;
>> +	}
>> +
>> +	for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) {
>> +		/* Do not need to flush the current asid */
>> +		if (asid == this_cpu_read(cpu_tlbstate.loaded_mm_asid))
>> +			continue;
>> +		/*
>> +		 * Make sure the next time we go to switch to
>> +		 * this asid, we do a flush:
>> +		 */
>> +		this_cpu_write(cpu_tlbstate.ctxs[asid].ctx_id, 0);
>> +	}
>> +	this_cpu_write(cpu_tlbstate.invalidate_other, false);
>> +}
>> 
>> Can this be called with preemption enabled ? If so, what happens
>> if migrated ?
> 
> No, it can't and if it is then it's a bug and the smp_processor_id() debug
> code will yell at you.

I thought the whole point about this_cpu_*() was that it could be called
with preemption enabled, given that it figures out the per-cpu data offset
using a segment selector prefix. How would smp_processor_id() debug code be
involved here ?
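
To make my migration question concrete: each this_cpu_*() access is
safe on its own, but the function is a sequence of them. Below is a
user-space model of what I am worried about, assuming (hypothetically)
that the task could migrate in the middle of the loop. Everything in it
(the *_model names, the two-CPU array, the migrate_at_asid parameter) is
invented for the illustration and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS	2
#define MODEL_NR_ASIDS	6	/* illustrative, stands in for TLB_NR_DYN_ASIDS */

struct tlb_state_model {
	unsigned short loaded_mm_asid;
	unsigned long long ctx_id[MODEL_NR_ASIDS];
	bool invalidate_other;
};

static struct tlb_state_model cpu_tlbstate_model[MODEL_NR_CPUS];
static int current_cpu_model;	/* CPU the "task" currently runs on */

/* Give every ASID on every CPU a non-zero ctx_id and request invalidation. */
static void model_reset(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		cpu_tlbstate_model[cpu].loaded_mm_asid = 0;
		cpu_tlbstate_model[cpu].invalidate_other = true;
		for (int asid = 0; asid < MODEL_NR_ASIDS; asid++)
			cpu_tlbstate_model[cpu].ctx_id[asid] = 100 * cpu + asid + 1;
	}
	current_cpu_model = 0;
}

/*
 * Same loop shape as clear_asid_other(). Each access goes to whichever
 * CPU is current at that instant. A migrate_at_asid >= 0 moves the task
 * from CPU 0 to CPU 1 just before that iteration; -1 means no migration.
 */
static void clear_asid_other_model(int migrate_at_asid)
{
	assert(migrate_at_asid < MODEL_NR_ASIDS);

	for (int asid = 0; asid < MODEL_NR_ASIDS; asid++) {
		if (asid == migrate_at_asid)
			current_cpu_model = 1;
		struct tlb_state_model *ts = &cpu_tlbstate_model[current_cpu_model];

		if (asid == ts->loaded_mm_asid)
			continue;
		ts->ctx_id[asid] = 0;
	}
	cpu_tlbstate_model[current_cpu_model].invalidate_other = false;
}
```

In this model, migrating at asid 3 leaves ASIDs 1 and 2 on CPU 1 with
their old ctx_id while CPU 1's invalidate_other flag gets cleared, which
is the scenario I was asking about.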

Thanks,

Mathieu


> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
