From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Jan Beulich" <jbeulich@novell.com>
Subject: Re: [PATCH, RFC] x86: make the GDT per-CPU
Date: Thu, 11 Sep 2008 13:28:07 +0100
Message-ID: <48C92AF7.76E4.0078.0@novell.com>
References: <48C7F75C.76E4.0078.0@novell.com>
	<C4EEB789.26F59%keir.fraser@eu.citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <C4EEB789.26F59%keir.fraser@eu.citrix.com>
Content-Disposition: inline
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser <keir.fraser@eu.citrix.com>
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

>>> Keir Fraser <keir.fraser@eu.citrix.com> 11.09.08 12:54 >>>
>On 10/9/08 15:35, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>> The major issue with supporting a significantly larger number of physical
>> CPUs appears to be the use of per-CPU GDT entries - at present, x86-64
>> could support only up to 126 CPUs (with code changes to also use the
>> top-most GDT page, that would be 254). Instead of trying to go with
>> incremental steps here, by converting the GDT itself to be per-CPU,
>> limitations in that respect go away entirely.
>
>Two thoughts:
>
>Firstly, we don't really need the LDT and TSS GDT slots to be always valid.
>Actually we always initialise the slot immediately before LTR or LLDT. So we
>could even have per-CPU LDT and TSS initialisation share a single slot.
>Then, with the extra reserved page, we'd be good for nearly 512 CPUs.

No, this would break 32-bit at least: The GDT entry for the selector
loaded into TR must remain a valid, busy TSS descriptor for the whole
lifetime of the system. So it can't be shared with the LDT. But even for
64-bit I would fear using the same GDT slot for both LDT and TSS
loading.
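
For reference, the CPU counts quoted in this thread are consistent with
simple slot arithmetic. The sketch below assumes 4 KiB GDT pages of 8-byte
slots, 16-byte LDT/TSS descriptors in long mode (two slots each), and 8
slots set aside for fixed entries such as HYPERVISOR_CS and HYPERVISOR_DS.
The reserved-slot count and all names are illustrative assumptions, not
taken from the patch.

```c
/* Illustrative constants; the names and the fixed-slot count are assumptions. */
#define GDT_SLOT_BYTES   8u     /* one 8-byte descriptor slot */
#define GDT_PAGE_BYTES   4096u  /* one 4 KiB page of GDT */
#define FIXED_SLOTS      8u     /* assumed slots used by fixed entries */
#define SYS_DESC_SLOTS   2u     /* a long-mode LDT/TSS descriptor is 16 bytes */

/*
 * Number of CPUs that fit into `pages` reserved GDT pages when every CPU
 * needs `descs_per_cpu` system descriptors (2 = separate LDT and TSS,
 * 1 = a single slot shared by both).
 */
unsigned int max_cpus(unsigned int pages, unsigned int descs_per_cpu)
{
    unsigned int slots = pages * (GDT_PAGE_BYTES / GDT_SLOT_BYTES);

    return (slots - FIXED_SLOTS) / (descs_per_cpu * SYS_DESC_SLOTS);
}
```

Under these assumptions max_cpus(1, 2) gives 126 and max_cpus(2, 2) gives
254, while a shared LDT/TSS slot on two pages, max_cpus(2, 1), gives 508.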

>Secondly: Actually your patch looks not too bad. But the double LGDT in
>context switch is nasty. But also I do not see why it is necessary?
>Presumably your fear is about using the prev->vcpu_id's mapped GDT in
>next->vcpu_id's page tables? But we should only be relying on GDT entries
>(HYPERVISOR_CS, HYPERVISOR_DS, for example) which are identical in all
>per-CPU GDTs. So why do you need to add that LGDT before CR3 switch at all?

The goal is that the per-CPU descriptor be valid at all times (see the
check_cpu() calls I put in there for debugging). As the double fault handlers
have no way of deriving the current processor other than from that GDT
entry (actually, I think x86-64 could, but didn't so far, so I didn't change
that now), they'd break during that window. While you may argue that
double faults are rare, my point here is that if we ever see one, analyzing
its dump shouldn't be made more difficult than it likely already will be.
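
To illustrate the dependency: a handler that cannot trust its stack can
still identify its CPU if the number is encoded in a GDT descriptor it can
read back. The following is only a sketch, assuming the CPU number is kept
in the descriptor's 20-bit limit field. The patch is not quoted in this
mail, so that encoding and both helper names are hypothetical; only the bit
positions of the limit field are architectural.

```c
#include <stdint.h>

/*
 * Build a raw 8-byte descriptor carrying `limit` in its limit field.
 * All other fields (base, type, flags) are left zero for brevity, so
 * this is not a loadable descriptor, just the limit encoding.
 */
uint64_t desc_with_limit(uint32_t limit)
{
    uint64_t lo16 = limit & 0xffffu;        /* limit[15:0]  -> bits 0..15  */
    uint64_t hi4  = (limit >> 16) & 0xfu;   /* limit[19:16] -> bits 48..51 */

    return lo16 | (hi4 << 48);
}

/* Recover the 20-bit limit (here: the assumed CPU number) from a descriptor. */
uint32_t desc_limit(uint64_t desc)
{
    uint32_t lo16 = (uint32_t)(desc & 0xffffu);
    uint32_t hi4  = (uint32_t)((desc >> 48) & 0xfu);

    return lo16 | (hi4 << 16);
}
```

If the GDT mapping changes underneath the handler before the new GDT is
loaded, a lookup of this kind reads whatever happens to be in that slot,
which is the window described above.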

>You would need to use l1e_write_atomic() in the context-switch code, to make
>sure all VCPU's hypervisor reserved GDT mappings are always valid. Actually
>you must at least use l1e_write() in any case -- it is not safe to not use
>one of those macros on a live pagetable (by which I mean possibly in use by
>some CPU) because a direct write of a PAE pte is not atomic and can cause
>the pte to pass through a bogus intermediate state (which could be bogusly
>prefetched by a CPU into its TLB. Yuk!).

Ah, yes. l1e_write() should be sufficient, though, as the slot(s) that get(s)
written cannot be validly in use on any CPU (for other than speculation).
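
The hazard with a plain store is that a 64-bit PAE entry is written as two
32-bit halves, so another CPU may observe a present entry made of one old
and one new half. A common way to avoid that without an atomic 64-bit
store is to make the entry not-present first. The sketch below shows that
ordering; the struct and function names are hypothetical, and this is not
claimed to be the body of l1e_write() itself.

```c
#include <stdint.h>

/* A PAE page-table entry viewed as two 32-bit halves (little-endian x86). */
struct pae_pte {
    volatile uint32_t lo;   /* bits 0..31, includes the Present bit (bit 0) */
    volatile uint32_t hi;   /* bits 32..63 */
};

/*
 * Update an entry so that it is never visible as present with mismatched
 * halves. The entry is transiently not-present, so this only suits slots
 * that no CPU legitimately depends on during the update.
 */
void pae_pte_write(struct pae_pte *pte, uint64_t val)
{
    pte->lo = 0;                       /* 1: clear Present, entry is now inert */
    pte->hi = (uint32_t)(val >> 32);   /* 2: high half, still not present */
    pte->lo = (uint32_t)val;           /* 3: low half last, publishes the entry */
}

/* Combine both halves again (for inspection only, not an atomic read). */
uint64_t pae_pte_read(const struct pae_pte *pte)
{
    return ((uint64_t)pte->hi << 32) | pte->lo;
}
```

An always-valid mapping, as l1e_write_atomic() is meant to guarantee, would
need a genuinely atomic 64-bit update instead, since step 1 briefly
invalidates the entry.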

Jan