virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andi Kleen <ak@muc.de>,
	lkml - Kernel Mailing List <linux-kernel@vger.kernel.org>,
	virtualization <virtualization@lists.osdl.org>
Subject: Re: [PATCH 5/7] Use %gs for per-cpu sections in kernel
Date: Sat, 23 Sep 2006 14:31:22 +1000	[thread overview]
Message-ID: <1158985882.26261.60.camel@localhost.localdomain> (raw)
In-Reply-To: <4514663E.5050707@goop.org>

On Fri, 2006-09-22 at 15:39 -0700, Jeremy Fitzhardinge wrote:
> Rusty Russell wrote:
> > +
> > +	/* Set up GDT entry for 16bit stack */
> > +	stk16_off = (u32)&per_cpu(cpu_16bit_stack, cpu);
> > +	gdt = per_cpu(cpu_gdt_table, cpu);
> > +	*(__u64 *)(&gdt[GDT_ENTRY_ESPFIX_SS]) |=
> > +		((((__u64)stk16_off) << 16) & 0x000000ffffff0000ULL) |
> > +		((((__u64)stk16_off) << 32) & 0xff00000000000000ULL) |
> > +		(CPU_16BIT_STACK_SIZE - 1);
> >   
> 
> This should use pack_descriptor().  I'd never got around to changing it, 
> but it really should.

Yep, agreed.

> > +	/* Complete percpu area setup early, before calling printk(),
> > +	   since it may end up using it indirectly. */
> > +	setup_percpu_for_this_cpu(cpu);
> > +
> >   
> 
> I managed to get all this done in head.S before going into C code; is 
> that not still possible?  Or is there a later patch to do this.

It's possible; it would simplify the C code a little, but I'll have to
see what the asm looks like.

> > +static __cpuinit void setup_percpu_descriptor(struct desc_struct *gdt,
> > +					      unsigned long per_cpu_off)
> > +{
> > +	unsigned limit, flags;
> > +
> > +	limit = (1 << 20);
> > +	flags = 0x8;		/* 4k granularity */
> >   
> 
> Why not set the limit to the percpu section size?  It would avoid having 
> it clipped under Xen.

Sure... there was a couple of other things Xen needs, too, so I thought
I'd do a separate patch (whole page for GDT and the xen page, which
means generic per-cpu setup should use boot_alloc_pages()).

> > +/* Be careful not to use %gs references until this is setup: needs to
> > + * be done on this CPU. */
> > +void __init setup_percpu_for_this_cpu(unsigned int cpu)
> > +{
> > +	struct desc_struct *gdt = per_cpu(cpu_gdt_table, cpu);
> > +	struct Xgt_desc_struct *cpu_gdt_descr = &per_cpu(cpu_gdt_descr, cpu);
> > +
> > +	per_cpu(this_cpu_off, cpu) = __per_cpu_offset[cpu];
> > +	setup_percpu_descriptor(&gdt[GDT_ENTRY_PERCPU],	__per_cpu_offset[cpu]);
> > +	cpu_gdt_descr->address = (unsigned long)gdt;
> > +	cpu_gdt_descr->size = GDT_SIZE - 1;
> > +	load_gdt(cpu_gdt_descr);
> > +	set_kernel_gs();
> > +}
> >   
> 
> Everything except the load_gdt and set_kernel_gs could be done in advance.

Yes.   Which particularly makes sense if this is done in asm, as you
suggested above.

> > +#define percpu_to_op(op,var,val)				\
> > +	do {							\
> > +		typedef typeof(var) T__;			\
> > +		if (0) { T__ tmp__; tmp__ = (val); }		\
> > +		switch (sizeof(var)) {				\
> > +		case 1:						\
> > +			asm(op "b %1,"__percpu_seg"%0"		\
> >   
> 
> So are symbols referencing the .data.percpu section 0-based?  Wouldn't 
> you need to subtract __per_cpu_start from the symbols to get a 0-based 
> segment offset?

I don't think I understand the question.

The .data.percpu section is the "template" per-cpu section, freed along
with other initdata: after setup_percpu_areas() is called, it is not
supposed to be used.  Around that time, the gs segment is set up based
at __per_cpu_offset[cpu], so "%gs:<varname>" accesses the local version.

> Or is the only percpu benefit you're getting from %gs is a slightly 
> quicker way of getting the percpu_offset?  Does that help much?

In generic code, that's true (the arch-specific accessors can do it in 1
insn, not two).  But it's still a help.  This is __raw_get_cpu_var(x)
before:

   3:   89 e0                   mov    %esp,%eax
   5:   25 00 e0 ff ff          and    $0xffffe000,%eax
   a:   8b 40 08                mov    0x8(%eax),%eax
   d:   8b 04 85 00 00 00 00    mov    0x0(,%eax,4),%eax
                        10: R_386_32    __per_cpu_offset
  14:   8b 80 00 00 00 00       mov    0x0(%eax),%eax
                        16: R_386_32    per_cpu__x

And this is after:

  1f:   65 a1 00 00 00 00       mov    %gs:0x0,%eax
                        21: R_386_32    per_cpu__this_cpu_off
  25:   8b 80 00 00 00 00       mov    0x0(%eax),%eax
                        27: R_386_32    per_cpu__x

So we go from 5 instructions, 23 bytes, 3 memory references, to 2
instructions, 12 bytes, 2 memory references (although the extra mem ref
will almost certainly have been in cache).

> > +#define x86_read_percpu(var) percpu_from_op("mov", per_cpu__##var)
> > +#define x86_write_percpu(var,val) percpu_to_op("mov", per_cpu__##var, val)
> > +#define x86_add_percpu(var,val) percpu_to_op("add", per_cpu__##var, val)
> > +#define x86_sub_percpu(var,val) percpu_to_op("sub", per_cpu__##var, val)
> > +#define x86_or_percpu(var,val) percpu_to_op("or", per_cpu__##var, val)  
> 
> Why x86_?  If some other arch implemented a similar mechanism, wouldn't 
> they want to use the same macro names?

Possibly, but for the moment they are very arch specific: we really
don't want them in generic code.  It *might* be worth creating a generic
"read_per_cpu()" which returns a rvalue, but IMHO adding a new thread
model which is all-positive-offset is probably a better long-term plan.

Cheers,
Rusty.
-- 
Help! Save Australia from the worst of the DMCA: http://linux.org.au/law

  reply	other threads:[~2006-09-23  4:31 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-22 11:51 [PATCH 0/7] Using %gs for per-cpu areas on x86 Rusty Russell
2006-09-22 11:53 ` [PATCH 1/7] Use per-cpu GDT tables from early in boot Rusty Russell
2006-09-22 11:55   ` [PATCH 2/7] Rusty Russell
2006-09-22 11:56     ` [PATCH 3/7] Update sys_vm86 to cope with changed pt_regs and %gs usage Rusty Russell
2006-09-22 11:58       ` [PATCH 4/7] Fix places where using %gs changes the usermode ABI Rusty Russell
2006-09-22 11:59         ` [PATCH 5/7] Use %gs for per-cpu sections in kernel Rusty Russell
2006-09-22 12:00           ` [PATCH 6/7] (Optional) implement smp_processor_id() as a per-cpu var Rusty Russell
2006-09-22 12:01             ` [PATCH 7/7] (Optional) implement current " Rusty Russell
2006-09-25  5:29               ` Rusty Russell
2006-09-25  5:27             ` [PATCH 6/7] (Optional) implement smp_processor_id() " Rusty Russell
2006-09-22 12:32           ` [PATCH 5/7] Use %gs for per-cpu sections in kernel Andi Kleen
2006-09-22 22:43             ` Jeremy Fitzhardinge
2006-09-22 23:52               ` Andi Kleen
2006-09-23  4:51             ` Rusty Russell
2006-09-23  8:17               ` Andi Kleen
2006-09-23  8:55                 ` Rusty Russell
2006-09-22 22:39           ` Jeremy Fitzhardinge
2006-09-23  4:31             ` Rusty Russell [this message]
2006-09-25  1:03               ` Jeremy Fitzhardinge
2006-09-25  1:16                 ` Rusty Russell
2006-09-25  1:36                   ` Jeremy Fitzhardinge
2006-09-25  2:51                     ` Rusty Russell
2006-09-25  5:25                       ` Jeremy Fitzhardinge
2006-09-25  6:03                         ` Rusty Russell
2006-09-25  6:25                           ` Jeremy Fitzhardinge
2006-09-25 23:33                             ` Rusty Russell
2006-09-23  8:13             ` Andi Kleen
2006-09-25  1:07               ` Jeremy Fitzhardinge
2006-09-25  1:20                 ` Rusty Russell
2006-09-25  5:26                   ` Rusty Russell
2006-09-22 22:24     ` [PATCH 2/7] Jeremy Fitzhardinge
2006-09-23  4:36       ` Rusty Russell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1158985882.26261.60.camel@localhost.localdomain \
    --to=rusty@rustcorp.com.au \
    --cc=ak@muc.de \
    --cc=jeremy@goop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=virtualization@lists.osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).