public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [x86] do_arch_prctl - bug?
@ 2008-11-18 14:33 Eric Lacombe
  2008-11-18 14:45 ` Arjan van de Ven
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Lacombe @ 2008-11-18 14:33 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

Hello,

I would like to know why the ARCH_SET_GS action of sys_arch_prctl, write the 
MSR MSR_KERNEL_GS_BASE and not the MSR MSR_GS_BASE when the variable "doit" 
equals 1? Is that a bug?

In other words, why the following code :
...
	if (doit) {
        	load_gs_index(0);
                ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr)
	} 
...
is not the following one :
...
	if (doit) {
		ret = checking_wrmsrl(MSR_GS_BASE, addr)
        	load_gs_index(0);
	} 
...

I copy for clarity the beginning of the function "do_arch_prctl" :

long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
{ 
        int ret = 0; 
        int doit = task == current;
        int cpu;

        switch (code) { 
        case ARCH_SET_GS:
                if (addr >= TASK_SIZE_OF(task))
                        return -EPERM; 
                cpu = get_cpu();
                /* handle small bases via the GDT because that's faster to 
                   switch. */
                if (addr <= 0xffffffff) {  
                        set_32bit_tls(task, GS_TLS, addr); 
                        if (doit) { 
                                load_TLS(&task->thread, cpu);
                                load_gs_index(GS_TLS_SEL); 
                        }
                        task->thread.gsindex = GS_TLS_SEL; 
                        task->thread.gs = 0;
                } else { 
                        task->thread.gsindex = 0;
                        task->thread.gs = addr;
                        if (doit) {
                              load_gs_index(0);
                              ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr);
                        } 
                }
                put_cpu();
                break;
[...]

Regards,

	Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [x86] do_arch_prctl - bug?
  2008-11-18 14:33 [x86] do_arch_prctl - bug? Eric Lacombe
@ 2008-11-18 14:45 ` Arjan van de Ven
       [not found]   ` <200811181820.04064.goretux@gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Arjan van de Ven @ 2008-11-18 14:45 UTC (permalink / raw)
  To: Eric Lacombe; +Cc: Ingo Molnar, linux-kernel

On Tue, 18 Nov 2008 15:33:32 +0100
Eric Lacombe <goretux@gmail.com> wrote:

> Hello,
> 
> I would like to know why the ARCH_SET_GS action of sys_arch_prctl,
> write the MSR MSR_KERNEL_GS_BASE and not the MSR MSR_GS_BASE when the
> variable "doit" equals 1? Is that a bug?
> 

I don't think it is.
The trick is that we use "swapgs" on entering/leaving the kernel, and
that will "swap" gs with the MSR, so when we return to userspace, GS
gets loaded from the MSR_KERNEL_GS_BASE ...


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [x86] do_arch_prctl - bug?
@ 2008-11-18 17:35 Eric Lacombe
  2008-11-18 23:44 ` Eric Lacombe
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Lacombe @ 2008-11-18 17:35 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, linux-kernel

Le mardi 18 novembre 2008 15:45:56, vous avez écrit :
> On Tue, 18 Nov 2008 15:33:32 +0100
>
> Eric Lacombe <goretux@gmail.com> wrote:
> > Hello,
> >
> > I would like to know why the ARCH_SET_GS action of sys_arch_prctl,
> > write the MSR MSR_KERNEL_GS_BASE and not the MSR MSR_GS_BASE when the
> > variable "doit" equals 1? Is that a bug?
>
> I don't think it is.
> The trick is that we use "swapgs" on entering/leaving the kernel, and
> that will "swap" gs with the MSR, so when we return to userspace, GS
> gets loaded from the MSR_KERNEL_GS_BASE ...

Yeah when we enter the kernel swapgs is used, so the MSR_GS_BASE is switched 
with the MSR_KERNEL_GS_BASE.

In fact, what I certainly misunderstand is why load_gs_index use swapgs 
inside.
>From that function, I trust that only when gs is loaded, its hidden part is 
loaded with the MSR_GS_BASE.

ENTRY(native_load_gs_index)
        CFI_STARTPROC
        pushf
        CFI_ADJUST_CFA_OFFSET 8
        DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
        SWAPGS
gs_change:     
        movl %edi,%gs   
2:      mfence          /* workaround */
        SWAPGS
        popf
        CFI_ADJUST_CFA_OFFSET -8
        ret
        CFI_ENDPROC
ENDPROC(native_load_gs_index)

Regards,

	Eric


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [x86] do_arch_prctl - bug?
       [not found]   ` <200811181820.04064.goretux@gmail.com>
@ 2008-11-18 17:35     ` Eric Lacombe
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Lacombe @ 2008-11-18 17:35 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, linux-kernel

In fact, if what I thought from that function was ok, we will have in fact the 
order of the two following lines inversed :
...
  load_gs_index(0);
  ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr);
...

So that we would have :
...
  ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr);
  load_gs_index(0);
...

Regards,

	Eric

Le mardi 18 novembre 2008 18:20:03 Eric Lacombe, vous avez écrit :
> Le mardi 18 novembre 2008 15:45:56, vous avez écrit :
> > On Tue, 18 Nov 2008 15:33:32 +0100
> >
> > Eric Lacombe <goretux@gmail.com> wrote:
> > > Hello,
> > >
> > > I would like to know why the ARCH_SET_GS action of sys_arch_prctl,
> > > write the MSR MSR_KERNEL_GS_BASE and not the MSR MSR_GS_BASE when the
> > > variable "doit" equals 1? Is that a bug?
> >
> > I don't think it is.
> > The trick is that we use "swapgs" on entering/leaving the kernel, and
> > that will "swap" gs with the MSR, so when we return to userspace, GS
> > gets loaded from the MSR_KERNEL_GS_BASE ...
>
> Yeah when we enter the kernel swapgs is used, so the MSR_GS_BASE is
> switched with the MSR_KERNEL_GS_BASE.
>
> In fact, what I certainly misunderstand is why load_gs_index use swapgs
> inside.
> From that function, I trust that only when gs is loaded, its hidden part is
> loaded with the MSR_GS_BASE.
>
> ENTRY(native_load_gs_index)
>         CFI_STARTPROC
>         pushf
>         CFI_ADJUST_CFA_OFFSET 8
>         DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
>         SWAPGS
> gs_change:
>         movl %edi,%gs
> 2:      mfence          /* workaround */
>         SWAPGS
>         popf
>         CFI_ADJUST_CFA_OFFSET -8
>         ret
>         CFI_ENDPROC
> ENDPROC(native_load_gs_index)
>
> Regards,
>
> 	Eric


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [x86] do_arch_prctl - bug?
  2008-11-18 17:35 Eric Lacombe
@ 2008-11-18 23:44 ` Eric Lacombe
  2008-11-19  1:07   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Lacombe @ 2008-11-18 23:44 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Ingo Molnar, linux-kernel

I look at the Intel docs (vol. 3A) again, and see that in 64 bits mode the 
hidden field gs.base are physically mapped to the MSR, so it seems that in 
order to load gs.base we don't need to load gs (like in 32 bits mode), but 
rather we only need to load the MSR.

So I don't understand the purpose of load_gs_index in that context :

if (doit) {
	load_gs_index(0);
	ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr);
}

Why don't we only load the MSR ?
What is the purpose of calling load_gs_index with 0 as parameter ?

Thanks in advance for your response,

	Eric

> ENTRY(native_load_gs_index)
>         CFI_STARTPROC
>         pushf
>         CFI_ADJUST_CFA_OFFSET 8
>         DISABLE_INTERRUPTS(CLBR_ANY | ~(CLBR_RDI))
>         SWAPGS
> gs_change:
>         movl %edi,%gs
> 2:      mfence          /* workaround */
>         SWAPGS
>         popf
>         CFI_ADJUST_CFA_OFFSET -8
>         ret
>         CFI_ENDPROC
> ENDPROC(native_load_gs_index)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [x86] do_arch_prctl - bug?
  2008-11-18 23:44 ` Eric Lacombe
@ 2008-11-19  1:07   ` Jeremy Fitzhardinge
  2008-11-19  9:23     ` Eric Lacombe
  0 siblings, 1 reply; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-11-19  1:07 UTC (permalink / raw)
  To: Eric Lacombe; +Cc: Arjan van de Ven, Ingo Molnar, linux-kernel

Eric Lacombe wrote:
> I look at the Intel docs (vol. 3A) again, and see that in 64 bits mode the 
> hidden field gs.base are physically mapped to the MSR, so it seems that in 
> order to load gs.base we don't need to load gs (like in 32 bits mode), but 
> rather we only need to load the MSR.
>
> So I don't understand the purpose of load_gs_index in that context :
>
> if (doit) {
> 	load_gs_index(0);
> 	ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr);
> }
>
> Why don't we only load the MSR ?
> What is the purpose of calling load_gs_index with 0 as parameter ?
>   

Because %gs of 0 means "base too large, go to MSR".  If you have a 
32-bit base, then loading it into the gdt and loading %gs with the right 
selector is faster.  wrmsr/rdmsr are slow instructions.

    J

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [x86] do_arch_prctl - bug?
  2008-11-19  1:07   ` Jeremy Fitzhardinge
@ 2008-11-19  9:23     ` Eric Lacombe
  2008-11-19 21:06       ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Lacombe @ 2008-11-19  9:23 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Arjan van de Ven, Ingo Molnar, linux-kernel

Le mercredi 19 novembre 2008 02:07:23 Jeremy Fitzhardinge, vous avez écrit :
> Eric Lacombe wrote:
> > I look at the Intel docs (vol. 3A) again, and see that in 64 bits mode
> > the hidden field gs.base are physically mapped to the MSR, so it seems
> > that in order to load gs.base we don't need to load gs (like in 32 bits
> > mode), but rather we only need to load the MSR.
> >
> > So I don't understand the purpose of load_gs_index in that context :
> >
> > if (doit) {
> > 	load_gs_index(0);
> > 	ret = checking_wrmsrl(MSR_KERNEL_GS_BASE, addr);
> > }
> >
> > Why don't we only load the MSR ?
> > What is the purpose of calling load_gs_index with 0 as parameter ?
>
> Because %gs of 0 means "base too large, go to MSR".  If you have a
> 32-bit base, then loading it into the gdt and loading %gs with the right
> selector is faster.  wrmsr/rdmsr are slow instructions.

Ok, thanks, so I suppose now that only doing :
asm volatile("movl %0,%%gs" :: "r" (0));
could corrupt the address of the PDA that resides actually in the MSR_GS_BASE. 
And that's why load_gs_index is used as it contains "swapgs" before and after 
the "mov to gs".

Is that correct?

Regards,

	Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [x86] do_arch_prctl - bug?
  2008-11-19  9:23     ` Eric Lacombe
@ 2008-11-19 21:06       ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 8+ messages in thread
From: Jeremy Fitzhardinge @ 2008-11-19 21:06 UTC (permalink / raw)
  To: Eric Lacombe; +Cc: Arjan van de Ven, Ingo Molnar, linux-kernel

Eric Lacombe wrote:
> Ok, thanks, so I suppose now that only doing :
> asm volatile("movl %0,%%gs" :: "r" (0));
> could corrupt the address of the PDA that resides actually in the MSR_GS_BASE. 
> And that's why load_gs_index is used as it contains "swapgs" before and after 
> the "mov to gs".
>
> Is that correct?
>   

Yes, loading a selector into a segment register will load the lower 32 
bits of the base from the ldt/gdt into the msr and zero the rest.

    J

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-11-19 21:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-18 14:33 [x86] do_arch_prctl - bug? Eric Lacombe
2008-11-18 14:45 ` Arjan van de Ven
     [not found]   ` <200811181820.04064.goretux@gmail.com>
2008-11-18 17:35     ` Eric Lacombe
  -- strict thread matches above, loose matches on Subject: below --
2008-11-18 17:35 Eric Lacombe
2008-11-18 23:44 ` Eric Lacombe
2008-11-19  1:07   ` Jeremy Fitzhardinge
2008-11-19  9:23     ` Eric Lacombe
2008-11-19 21:06       ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox