* huh startup_ipi_hook?
From: Eric W. Biederman @ 2007-04-28 7:14 UTC
To: Zachary Amsden; +Cc: virtualization
The current paravirt startup_ipi hook for vmware
commit: ae5da273fe3352febd38658d8d34484cbcfb3423
is quite frankly ridiculous.
In the middle of wakeup_secondary_cpu we have:
    /*
     * Paravirt / VMI wants a startup IPI hook here to set up the
     * target processor state.
     */
    startup_ipi_hook(phys_apicid, (unsigned long) start_secondary,
                     (unsigned long) stack_start.esp);
As far as I can tell from reading this, there is a completely
different mechanism in place to start a secondary processor,
which seems sane.
What doesn't seem sane is bothering to run the rest of the code
for sending an INIT message to a secondary processor. It certainly
does not feel general at all.
I think we should be intercepting this startup call at a higher level,
where we can just say: Start secondary cpu with this stack
and with this esp. Or something like that.
So conceptually I think the concept makes sense but implementation
wise I think what is currently present is totally ridiculous.
Eric
* Re: huh startup_ipi_hook?
From: Jeremy Fitzhardinge @ 2007-04-28 7:22 UTC
To: Eric W. Biederman; +Cc: virtualization

Eric W. Biederman wrote:
> So conceptually I think the concept makes sense but implementation
> wise I think what is currently present is totally ridiculous.

I tend to agree. For Xen I added smp_ops as an adjunct to paravirt_ops,
which is basically the interface defined in linux/smp.h:

    struct smp_ops
    {
        void (*smp_prepare_boot_cpu)(void);
        void (*smp_prepare_cpus)(unsigned max_cpus);
        int (*cpu_up)(unsigned cpu);
        void (*smp_cpus_done)(unsigned max_cpus);

        void (*smp_send_stop)(void);
        void (*smp_send_reschedule)(int cpu);
        int (*smp_call_function_mask)(cpumask_t mask,
                                      void (*func)(void *info), void *info,
                                      int wait);
    };

This is a fairly close match to Xen's requirements. Certainly, anything
APIC-related is useless for us, since there's no APIC emulation going on.

I won't speak for Zach, but his counter-argument is generally along the
lines of "we can just make use of the existing code with a couple of
little hooks near the bottom". But I wonder if the existing genapic
interface can be used (or extended) to cover these cases without
needing to have APIC-level interfaces in paravirt_ops.

Are you reviewing -mm? That's basically OK, but there's newer stuff in
Andi's patch queue.

J
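The smp_ops idea above (generic code calls through one table of function pointers, and each platform supplies its own entries) can be illustrated with a small userspace model. This is a hedged sketch, not the Xen or kernel code: it keeps only three of the members, models cpumask_t as a plain bitmask, and the demo_* backend, online_mask, resched_sent and count_call are hypothetical stand-ins.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the smp_ops pattern: generic code
 * calls through one table, and each platform (native APIC, Xen, ...)
 * fills the table with its own functions. cpumask_t is a bitmask here.
 */
#define NR_CPUS 8
typedef unsigned long cpumask_t;

struct smp_ops {
    int  (*cpu_up)(unsigned cpu);
    void (*smp_send_reschedule)(int cpu);
    int  (*smp_call_function_mask)(cpumask_t mask,
                                   void (*func)(void *info),
                                   void *info, int wait);
};

static cpumask_t online_mask = 1UL;  /* only the boot CPU (0) is online */
static int resched_sent[NR_CPUS];    /* reschedule requests per CPU */

/* Stand-in platform backend: it only updates the state above. */
static int demo_cpu_up(unsigned cpu)
{
    assert(cpu < NR_CPUS);
    online_mask |= 1UL << cpu;
    return 0;
}

static void demo_send_reschedule(int cpu)
{
    resched_sent[cpu]++;
}

static int demo_call_function_mask(cpumask_t mask, void (*func)(void *info),
                                   void *info, int wait)
{
    (void)wait;  /* a real backend would wait for the targets to finish */
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (mask & online_mask & (1UL << cpu))
            func(info);  /* simply run in the caller's context here */
    return 0;
}

static struct smp_ops smp_ops = {
    .cpu_up                 = demo_cpu_up,
    .smp_send_reschedule    = demo_send_reschedule,
    .smp_call_function_mask = demo_call_function_mask,
};

/* Generic wrappers: callers never see which backend is installed. */
static int cpu_up(unsigned cpu)
{
    return smp_ops.cpu_up(cpu);
}

static void smp_send_reschedule(int cpu)
{
    smp_ops.smp_send_reschedule(cpu);
}

static int smp_call_function_mask(cpumask_t mask, void (*func)(void *info),
                                  void *info, int wait)
{
    return smp_ops.smp_call_function_mask(mask, func, info, wait);
}

/* Example payload for smp_call_function_mask: count invocations. */
static void count_call(void *info)
{
    (*(int *)info)++;
}
```

Swapping platforms means installing a different table; nothing above the wrappers changes.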
* Re: huh startup_ipi_hook?
From: Eric W. Biederman @ 2007-04-28 8:06 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>> So conceptually I think the concept makes sense but implementation
>> wise I think what is currently present is totally ridiculous.
>
> I tend to agree. For Xen I added smp_ops as an adjunct to paravirt_ops,
> which is basically the interface defined in linux/smp.h:
>
>     struct smp_ops
>     {
>         void (*smp_prepare_boot_cpu)(void);
>         void (*smp_prepare_cpus)(unsigned max_cpus);
>         int (*cpu_up)(unsigned cpu);
>         void (*smp_cpus_done)(unsigned max_cpus);
>
>         void (*smp_send_stop)(void);
>         void (*smp_send_reschedule)(int cpu);
>         int (*smp_call_function_mask)(cpumask_t mask,
>                                       void (*func)(void *info), void *info,
>                                       int wait);
>     };

That may work, but at first glance it feels a little too high level,
and a little lacking.

What I am certain of is that we need a general ability to send
inter-processor interrupts. Beyond that I haven't looked closely yet.

> This is a fairly close match to Xen's requirements. Certainly, anything
> APIC-related is useless for us, since there's no APIC emulation going on.

I almost agree. Real hardware in a paravirtualized setting is
something we have to deal with. This means that while we may not have
to deal with APICs, we do have to deal with irqs from real hardware,
and there are a lot of implications there.

Partly I suspect you haven't been getting some of the review you could
have because arch/i386 is not that interesting right now. arch/x86_64
is where the code is generally clean, and where new hardware support
work tends to focus.

> I won't speak for Zach, but his counter-argument is generally along the
> lines of "we can just make use of the existing code with a couple of
> little hooks near the bottom". But I wonder if the existing genapic
> interface can be used (or extended) to cover these cases without
> needing to have APIC-level interfaces in paravirt_ops.

Things need to be abstracted properly. Not too high, or we don't share
what should be common. Not too low, or we place the hooks in the wrong
location and we have a Voyager-on-steroids problem.

A big part of my problem with the startup_ipi_hook is that I am not
convinced we were passing it the proper parameters. We care about some
of the work that head.S does, and it was just cavalierly bypassing it.
Maintenance-wise it looked about as easy to maintain as Voyager is
today (way too easy to overlook).

If you are going to feed a cpu a start function and a start stack, you
should hook in where we are feeding the kernel its start function and
start stack.

> Are you reviewing -mm? That's basically OK, but there's newer stuff in
> Andi's patch queue.

I wasn't even trying to review anything. Since I was touching some
stuff in the general area, I was hoping to clean up arch/i386/head.S
while I was in there. It looks like there is currently too much other
activity to do this easily. What you have here is what I stumbled onto.
I figured it best if I took a few minutes, spoke up, and mentioned what
I saw.

Eric
* Re: huh startup_ipi_hook?
From: Jeremy Fitzhardinge @ 2007-04-28 8:26 UTC
To: Eric W. Biederman; +Cc: virtualization

Eric W. Biederman wrote:
> That may work, but at first glance it feels a little too high level,
> and a little lacking.
>
> What I am certain of is that we need a general ability to send
> inter-processor interrupts. Beyond that I haven't looked closely yet.

IPIs are used for three things: function calls, reschedule and TLB
flush. smp_ops covers function calls and reschedule, and paravirt_ops
has a cross-cpu TLB flush operation (which is not implemented as an
IPI under Xen, since Xen knows which real cpus actually have stale
state).

>> This is a fairly close match to Xen's requirements. Certainly, anything
>> APIC-related is useless for us, since there's no APIC emulation going on.
>
> I almost agree. Real hardware in a paravirtualized setting is
> something we have to deal with. This means that while we may not have
> to deal with APICs, we do have to deal with irqs from real hardware,
> and there are a lot of implications there.

In the Xen model, hardware interrupts are mapped to event channels, and
you can arrange for event channel IDs to be mapped directly to hardware
irqs. But this is why I'm very interested in making the irq space
dynamically allocatable, so that we can use event channel IDs directly
as irqs, and easily have disjoint ranges for statically allocated
hardware events and dynamic events.

> Partly I suspect you haven't been getting some of the review you could
> have because arch/i386 is not that interesting right now. arch/x86_64
> is where the code is generally clean, and where new hardware support
> work tends to focus.

That may be. I've been waiting to see the outcome of the 32/64-bit
merge discussions before launching into 64-bit paravirt_ops (though
rostedt and glommer have made a good start on it).

>> I won't speak for Zach, but his counter-argument is generally along the
>> lines of "we can just make use of the existing code with a couple of
>> little hooks near the bottom". But I wonder if the existing genapic
>> interface can be used (or extended) to cover these cases without
>> needing to have APIC-level interfaces in paravirt_ops.
>
> Things need to be abstracted properly. Not too high, or we don't share
> what should be common. Not too low, or we place the hooks in the wrong
> location and we have a Voyager-on-steroids problem.

Yes, and it's tricky in places to have a single interface which is
supposed to deal with both Xen and VMI, since they're often at opposite
ends of the abstraction spectrum. So we end up with a high-level
interface which calls into Xen code and the existing native code, and
then some hooks in the native code to call out to Xen. If the native
code were refactored a little more, I think this would come out fairly
cleanly (i.e., use it as a library of code which talks to hardware and
things (mostly) emulating hardware). Things get a bit strange with VMI,
where it mixes hardware emulation with paravirtualization; the timer
stuff, for example.

> [...] what I stumbled onto. I figured it best if I took
> a few minutes, spoke up, and mentioned what I saw.

Thanks,
J
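Jeremy's point about the cross-cpu TLB flush (an IPI to every target natively, but a single request under Xen because the hypervisor knows which real cpus hold stale state) can be modelled in a few lines. This is an illustrative sketch only, not kernel or Xen code; native_flush_tlb_others, pv_flush_tlb_others and the hv_stale parameter standing in for the hypervisor's knowledge are all hypothetical.

```c
#include <assert.h>

/*
 * Illustrative model of a cross-CPU TLB flush. The native path
 * interrupts every target CPU; the paravirt path makes one request and
 * the "hypervisor" flushes only the CPUs it knows hold stale entries.
 */
#define NR_CPUS 8
typedef unsigned long cpumask_t;

static int ipis_sent;          /* IPIs sent by the native path */
static int hypercalls;         /* requests made by the paravirt path */
static int flushes[NR_CPUS];   /* TLB flushes performed on each CPU */

/* Native: one IPI per CPU in the mask; each target flushes itself. */
static void native_flush_tlb_others(cpumask_t mask)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (mask & (1UL << cpu)) {
            ipis_sent++;
            flushes[cpu]++;
        }
    }
}

/* Paravirt: a single request; only genuinely stale CPUs are flushed. */
static void pv_flush_tlb_others(cpumask_t mask, cpumask_t hv_stale)
{
    hypercalls++;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (mask & hv_stale & (1UL << cpu))
            flushes[cpu]++;
}
```

The design point is that the operation, not the IPI, is what paravirt_ops exposes, so each backend is free to pick its own delivery mechanism.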
* Re: huh startup_ipi_hook?
From: Eric W. Biederman @ 2007-04-28 8:42 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>> That may work, but at first glance it feels a little too high level,
>> and a little lacking.
>>
>> What I am certain of is that we need a general ability to send
>> inter-processor interrupts. Beyond that I haven't looked closely yet.
>
> IPIs are used for three things: function calls, reschedule and TLB
> flush. smp_ops covers function calls and reschedule, and paravirt_ops
> has a cross-cpu TLB flush operation (which is not implemented as an
> IPI under Xen, since Xen knows which real cpus actually have stale
> state).

I was thinking of our magic process specific vectors, and those
aren't quite IPIs. But there are some other uses to add to your list,
though not necessarily in general: irq migration, irq retransmission,
and sending NMIs to shoot down cpus.

> In the Xen model, hardware interrupts are mapped to event channels, and
> you can arrange for event channel IDs to be mapped directly to hardware
> irqs. But this is why I'm very interested in making the irq space
> dynamically allocatable, so that we can use event channel IDs directly
> as irqs, and easily have disjoint ranges for statically allocated
> hardware events and dynamic events.

What I don't understand is how we map MSIs to event channels.
That is going to be an interesting one, because in essence the
drivers decide how many of those the hardware will have.

>> Partly I suspect you haven't been getting some of the review you could
>> have because arch/i386 is not that interesting right now. arch/x86_64
>> is where the code is generally clean, and where new hardware support
>> work tends to focus.
>
> That may be. I've been waiting to see the outcome of the 32/64-bit
> merge discussions before launching into 64-bit paravirt_ops (though
> rostedt and glommer have made a good start on it).

I'm a little interested in that as well. It would be good to have a
common place for the shared code. Although I wonder whether arch/i386
and arch/x86_64 are the only ones that need to be in the discussion:
arch/ia64 has some significant pieces of shared heritage, although
nowhere near as much.

> Yes, and it's tricky in places to have a single interface which is
> supposed to deal with both Xen and VMI, since they're often at opposite
> ends of the abstraction spectrum. So we end up with a high-level
> interface which calls into Xen code and the existing native code, and
> then some hooks in the native code to call out to Xen. If the native
> code were refactored a little more, I think this would come out fairly
> cleanly (i.e., use it as a library of code which talks to hardware and
> things (mostly) emulating hardware). Things get a bit strange with VMI,
> where it mixes hardware emulation with paravirtualization; the timer
> stuff, for example.

Yes.

Eric
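The disjoint irq ranges Jeremy describes (fixed numbers for hardware interrupts, dynamically allocated numbers for event channels) can be sketched as a tiny allocator. The range sizes and the names alloc_dynamic_irq, free_dynamic_irq and irq_is_hardware are hypothetical, and the sketch does not attempt Eric's harder question of how many MSI vectors a driver may ask for.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical split of the irq number space into two disjoint ranges:
 * a fixed range for statically assigned hardware irqs and a range that
 * is handed out on demand (e.g. for event channels).
 */
#define NR_HW_IRQS   16               /* irqs [0, 16): fixed hardware irqs */
#define NR_DYN_IRQS  16               /* irqs [16, 32): allocated on demand */
#define DYN_IRQ_BASE NR_HW_IRQS

static bool dyn_irq_used[NR_DYN_IRQS];

/* Return the lowest free dynamic irq, or -1 if the range is exhausted. */
static int alloc_dynamic_irq(void)
{
    for (int i = 0; i < NR_DYN_IRQS; i++) {
        if (!dyn_irq_used[i]) {
            dyn_irq_used[i] = true;
            return DYN_IRQ_BASE + i;
        }
    }
    return -1;
}

/* Release a dynamic irq; hardware irqs must never be passed here. */
static void free_dynamic_irq(int irq)
{
    assert(irq >= DYN_IRQ_BASE && irq < DYN_IRQ_BASE + NR_DYN_IRQS);
    dyn_irq_used[irq - DYN_IRQ_BASE] = false;
}

/* The dynamic allocator never hands out numbers in the hardware range. */
static bool irq_is_hardware(int irq)
{
    return irq >= 0 && irq < NR_HW_IRQS;
}
```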
* Re: huh startup_ipi_hook?
From: Jeremy Fitzhardinge @ 2007-04-28 8:59 UTC
To: Eric W. Biederman; +Cc: virtualization

Eric W. Biederman wrote:
> I was thinking of our magic process specific vectors, and those
> aren't quite IPIs. But there are some other uses to add to your list,
> though not necessarily in general: irq migration, irq retransmission,
> and sending NMIs to shoot down cpus.

Yes, but I see those as implementation details. In Xen I don't think we
need IPIs for any of those. If a particular implementation needs IPIs,
then it's free to use them.

> What I don't understand is how we map MSIs to event channels.
> That is going to be an interesting one, because in essence the
> drivers decide how many of those the hardware will have.

That's an interesting point. I haven't really looked at giving domains
direct hardware access. It's not something which makes much sense
without a good IOMMU anyway.

> I'm a little interested in that as well. It would be good to have a
> common place for the shared code. Although I wonder whether arch/i386
> and arch/x86_64 are the only ones that need to be in the discussion:
> arch/ia64 has some significant pieces of shared heritage, although
> nowhere near as much.

Well, if i386 and x86_64 make it look like fun, I'm sure ia64 will work
out how to come to the party.

>> Yes, and it's tricky in places to have a single interface which is
>> supposed to deal with both Xen and VMI, since they're often at opposite
>> ends of the abstraction spectrum. So we end up with a high-level
>> interface which calls into Xen code and the existing native code, and
>> then some hooks in the native code to call out to Xen. If the native

s/Xen/VMI/

J
* Re: huh startup_ipi_hook?
From: Zachary Amsden @ 2007-04-30 18:33 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization, Eric W. Biederman

Jeremy Fitzhardinge wrote:
> This is a fairly close match to Xen's requirements. Certainly, anything
> APIC-related is useless for us, since there's no APIC emulation going on.
>
> I won't speak for Zach, but his counter-argument is generally along the
> lines of "we can just make use of the existing code with a couple of
> little hooks near the bottom". But I wonder if the existing genapic
> interface can be used (or extended) to cover these cases without
> needing to have APIC-level interfaces in paravirt_ops.

Because we faithfully emulate the APIC and IO-APIC, that is the
underlying hardware for us, and we don't have a fancy paravirtualized
interrupt controller because there is no need for it. The only
obstruction to this approach is that trapping and emulating APIC access
is slow, and some APIC registers have side effects on read. So we
simply replace APIC read / write with faster hypercalls.

Of course we can create a bunch of new code to use the genapic
interface. It is just a matter of copying apic.c and io-apic.c verbatim
and applying the sed command s/apic/vmi_apic/g. We can easily do this,
but the only point would be to eliminate the low-level APIC access
paravirt-op, which is not a maintenance burden, a performance problem,
or an encumbrance on anyone. So it would be purely a cleanliness thing.
Doubling code to make two separate copies when the interface in question
is already well abstracted and contained in a header file doesn't make
it cleaner, at least to me.

Zach
* Re: huh startup_ipi_hook?
From: Jeremy Fitzhardinge @ 2007-04-30 18:54 UTC
To: Zachary Amsden; +Cc: virtualization, Eric W. Biederman

Zachary Amsden wrote:
> Of course we can create a bunch of new code to use the genapic
> interface. It is just a matter of copying apic.c and io-apic.c
> verbatim and applying the sed command s/apic/vmi_apic/g.

Wouldn't it be cleaner to just change apic.c and io-apic.c to use, say,
apic_ops to get access to the actual hardware? Then you could have
native and vmi versions while sharing the bulk of the code. Isn't that
what genapic is intended to solve anyway?

J
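The apic_ops suggestion amounts to moving raw register access behind a small table so that the shared APIC code is written once. Below is a minimal userspace sketch, assuming a simulated register array in place of the memory-mapped APIC page; apic_ops, ack_apic_irq and the VMI backend's hypercall counter are illustrative names, and only the EOI register offset (0xB0) comes from the real local APIC layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical apic_ops: shared APIC code calls through this table for
 * raw register access, so native and VMI differ only in how a register
 * is read or written. A plain array stands in for the APIC page.
 */
#define APIC_EOI 0xB0                 /* local APIC end-of-interrupt register */

static uint32_t fake_apic_regs[0x400 / 4];
static int hypercalls;                /* accesses made via the "VMI" backend */

struct apic_ops {
    uint32_t (*read)(uint32_t reg);
    void     (*write)(uint32_t reg, uint32_t val);
};

/* Native: would be a volatile MMIO access to the APIC page. */
static uint32_t native_apic_read(uint32_t reg)
{
    return fake_apic_regs[reg >> 2];
}

static void native_apic_write(uint32_t reg, uint32_t val)
{
    fake_apic_regs[reg >> 2] = val;
}

/* VMI-style: same effect, but through a (simulated) hypercall, not a trap. */
static uint32_t vmi_apic_read(uint32_t reg)
{
    hypercalls++;
    return fake_apic_regs[reg >> 2];
}

static void vmi_apic_write(uint32_t reg, uint32_t val)
{
    hypercalls++;
    fake_apic_regs[reg >> 2] = val;
}

static const struct apic_ops native_apic_ops = { native_apic_read, native_apic_write };
static const struct apic_ops vmi_apic_ops = { vmi_apic_read, vmi_apic_write };
static const struct apic_ops *apic_ops = &native_apic_ops;

/* Shared code: written once and identical for both backends. */
static void ack_apic_irq(void)
{
    apic_ops->write(APIC_EOI, 0);
}
```

As Zach's reply points out, the two backends come out nearly identical; what changes is which interface owns the indirection, not how much code exists.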
* Re: huh startup_ipi_hook?
From: Zachary Amsden @ 2007-04-30 20:35 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization, Eric W. Biederman

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Of course we can create a bunch of new code to use the genapic
>> interface. It is just a matter of copying apic.c and io-apic.c
>> verbatim and applying the sed command s/apic/vmi_apic/g.
>
> Wouldn't it be cleaner to just change apic.c and io-apic.c to use, say,
> apic_ops to get access to the actual hardware? Then you could have
> native and vmi versions while sharing the bulk of the code. Isn't that
> what genapic is intended to solve anyway?

But the native and vmi versions would be identical. You would be
moving the apic_read / apic_write operations from paravirt_ops to
apic_ops, which doesn't really solve anything; it just moves them
around.

Zach
* Re: huh startup_ipi_hook?
From: Jeremy Fitzhardinge @ 2007-04-30 21:05 UTC
To: Zachary Amsden; +Cc: virtualization, Eric W. Biederman

Zachary Amsden wrote:
> But the native and vmi versions would be identical. You would be
> moving the apic_read / apic_write operations from paravirt_ops to
> apic_ops, which doesn't really solve anything; it just moves them
> around.

Yes, that's fine. The idea is that paravirt_ops is intended to be a
relatively coherent interface for implementing a paravirtualized guest,
ideally one that shrinks over time. The way VMI uses the apic as part
of its hypervisor interface is a VMI implementation detail which doesn't
live at the same level of abstraction as the rest of paravirt_ops.
What's more, the apic interfaces have no relevance to either lguest or
Xen, and there's simply no meaningful implementation for the operations
other than "hope these don't get called".

I think the more things we can devolve out of paravirt_ops the better,
especially if they make well-defined self-contained interfaces of their
own. I would be open, for example, to moving all the pagetable and
privileged instruction operations out into their own _ops interfaces
(but not right now).

J
* Re: huh startup_ipi_hook?
From: Zachary Amsden @ 2007-04-30 21:40 UTC
To: Jeremy Fitzhardinge; +Cc: virtualization, Eric W. Biederman

Jeremy Fitzhardinge wrote:
> I think the more things we can devolve out of paravirt_ops the better,
> especially if they make well-defined self-contained interfaces of their
> own. I would be open, for example, to moving all the pagetable and
> privileged instruction operations out into their own _ops interfaces
> (but not right now).

I think this trend of moving things into smaller, compact interfaces is
the right way to go, and certainly the apic stuff can devolve, as well
as the privileged / pagetable ops. We can target that for 2.6.23, and it
would go quite a way towards giving x86_64 (even non-x86) paravirt a
better foundation to build on.

Zach
* Re: huh startup_ipi_hook?
From: Andi Kleen @ 2007-04-28 8:45 UTC
To: Eric W. Biederman; +Cc: virtualization

> I think we should be intercepting this startup call at a higher level,
> where we can just say: Start secondary cpu with this stack
> and with this esp. Or something like that.

The reason VMI tends to think lower level is that most of its
interfaces are like real hardware, and much of the code would just be
duplicated if it used high-level interfaces.

I suspect this specific hook could only be made better if smpboot.c
was refactored into more, callable pieces, but I doubt that would be a
big overall improvement.

-Andi
* Re: huh startup_ipi_hook?
From: Eric W. Biederman @ 2007-04-28 9:05 UTC
To: Andi Kleen; +Cc: virtualization

Andi Kleen <ak@suse.de> writes:

>> I think we should be intercepting this startup call at a higher level,
>> where we can just say: Start secondary cpu with this stack
>> and with this esp. Or something like that.
>
> The reason VMI tends to think lower level is that most of its
> interfaces are like real hardware, and much of the code would just be
> duplicated if it used high-level interfaces.
>
> I suspect this specific hook could only be made better if smpboot.c
> was refactored into more, callable pieces, but I doubt that would be a
> big overall improvement.

We have two functions: do_boot_cpu, which does all of the basic setup,
and wakeup_secondary_cpu, which does the arch-specific work to set up a
cpu. (That is the current pre-paravirt factoring.)

Guess what? Instead of the hook being wakeup_secondary_cpu, the hook is
lost in the guts of one of the implementations of wakeup_secondary_cpu,
where the variables it wants to pass are not even available except as
globals. It's ridiculous.

We should have init_wakeup_secondary_cpu, nmi_wakeup_secondary_cpu, and
vmi_wakeup_secondary_cpu. Having init_wakeup_secondary_cpu and
vmi_wakeup_secondary_cpu share code looks like a strong case of false
code sharing, and a major pain.

Further, it skips the early setup in head.S, which is wrong. Either we
need to do it, or it needs to be removed from the smp startup path in
head.S.

Further, because it doesn't expose the difference at the same place
where we set up parameters, and instead does something subtle and
sneaky, it is very easy to overlook, and I suspect it is going to break
just as often as Voyager, because it is hard to see that the
dependencies are there.

So yes, this is a real problem, and maybe we need to do a little
refactoring to make it better, because it doesn't need the initial boot
code page. But there is no way it is a good long-term solution the way
it is. I doubt it does the correct thing to begin with.

Eric
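The factoring Eric proposes, where do_boot_cpu does the generic setup and then calls one of several interchangeable wakeup methods with explicit parameters rather than globals, might look roughly like the userspace sketch below. The three wakeup function names come from his mail; struct boot_params, the recording globals, the simplified do_boot_cpu signature and the function-pointer selection are hypothetical, and each method body only records what it was asked to do.

```c
#include <assert.h>

/*
 * Sketch of "one generic boot path, several wakeup methods". Each
 * method receives the entry point and stack explicitly; here it only
 * records the request so the flow can be checked.
 */
enum wakeup_kind { WAKE_INIT, WAKE_NMI, WAKE_VMI };

struct boot_params {              /* what every method is handed */
    int phys_apicid;
    unsigned long start_eip;      /* where the new CPU starts executing */
    unsigned long start_esp;      /* initial stack for the new CPU */
};

static struct boot_params last;   /* last request, for checking */
static enum wakeup_kind last_kind;

/* Would send INIT + STARTUP IPIs to a real-mode trampoline. */
static int init_wakeup_secondary_cpu(const struct boot_params *p)
{
    last = *p;
    last_kind = WAKE_INIT;
    return 0;
}

/* Would wake the CPU with an NMI. */
static int nmi_wakeup_secondary_cpu(const struct boot_params *p)
{
    last = *p;
    last_kind = WAKE_NMI;
    return 0;
}

/* Would hand the initial state to the hypervisor. */
static int vmi_wakeup_secondary_cpu(const struct boot_params *p)
{
    last = *p;
    last_kind = WAKE_VMI;
    return 0;
}

/* Chosen once during platform setup; native INIT is the default. */
static int (*wakeup_secondary_cpu)(const struct boot_params *) =
    init_wakeup_secondary_cpu;

/* Generic setup stays in one place; only the wakeup step varies. */
static int do_boot_cpu(int apicid, unsigned long entry, unsigned long stack)
{
    struct boot_params p = { apicid, entry, stack };
    return wakeup_secondary_cpu(&p);
}
```

The difference between methods is then visible at the single place where the parameters are set up, which is the property Eric argues the current hook lacks.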
* Re: huh startup_ipi_hook?
From: Zachary Amsden @ 2007-04-30 20:30 UTC
To: Eric W. Biederman; +Cc: virtualization

Eric W. Biederman wrote:
> The current paravirt startup_ipi hook for vmware
> commit: ae5da273fe3352febd38658d8d34484cbcfb3423
> is quite frankly ridiculous.
>
> In the middle of wakeup_secondary_cpu we have:
>
>     /*
>      * Paravirt / VMI wants a startup IPI hook here to set up the
>      * target processor state.
>      */
>     startup_ipi_hook(phys_apicid, (unsigned long) start_secondary,
>                      (unsigned long) stack_start.esp);
>
> As far as I can tell from reading this, there is a completely
> different mechanism in place to start a secondary processor,
> which seems sane.

It is not completely different. The startup mechanism is the same; the
startup state is not.

> What doesn't seem sane is bothering to run the rest of the code
> for sending an INIT message to a secondary processor. It certainly
> does not feel general at all.

We need some wakeup mechanism to launch the APs; we already implement
INIT and STARTUP IPIs for the non-paravirt case, and the startup IPI is
a good match to the wakeup we need, both in the Linux code and in the
hypervisor.

> I think we should be intercepting this startup call at a higher level,
> where we can just say: Start secondary cpu with this stack
> and with this esp. Or something like that.
>
> So conceptually I think the concept makes sense but implementation
> wise I think what is currently present is totally ridiculous.

A heathen notion, conceivably, but not, I hope, an unenlightened one.

We have to support two methods of booting on the same hardware.
Traditional booting does standard SMP startup, which means the BIOS has
put CPUs into a real mode wait loop (basically, cli; hlt, waiting for an
INIT IPI). We have to emulate traditional booting; you might not be
booting a paravirt kernel.

Now here is where the problems begin. The BSP enters paravirt mode. It
switches paravirt-ops over to warm and fuzzy hypercalls. The APs have
no idea about this. In fact, they cannot be switched into paravirt mode
yet: not only might the BSP be running a UP kernel, which could crash
or reboot, but more importantly, they have no code to run.

Unfortunately, they cannot run real mode code either. Once the BSP is
up and running paravirt style, the binary translator which we use to
run privileged code has been hobbled at the knees. This is an
implementation artifact, certainly, and one that is mostly fixed now,
but suffice it to say that interactions between CPUs in paravirt and
non-paravirt mode are currently unsupported at best and unreliable at
worst.

To get out of this real mode loop and into paravirt mode, we have to
switch on the APs at some point. There are major problems lurking here.
To summarize the points so far:

1) We can't start all CPUs at time zero in paravirt mode; we might load
   any kernel, paravirt or non-paravirt.
2) At the time when we are bringing up APs, the BSP is in paravirt mode
   and the APs are halted in real mode.
3) We can't run paravirt mode code on APs without properly initialized
   segment registers for code, stack and data.
4) The i386 architecture provides no way to initialize GDTR or segment
   state on an AP prior to a startup IPI.
5) We can't run real mode code on APs to go through the boot trampoline
   and initialize GDTR because of mixed mode problems.

To solve this, we modify the startup IPI to carry additional
information; it takes almost a full state map and allows the startup
IPI to initialize the protected mode register settings to any value the
OS might want. This is what startup_ipi_hook does: it tells the
hypervisor the initial state to place the AP in when it receives a
startup IPI.

It is the most general startup mechanism you can possibly have, and it
allows you to solve the above combination of constraints on any
protected mode operating system. We use it to bypass head.S completely,
setting control registers and segments and jumping directly into
paravirtualized protected mode on the APs at the C code entry point. It
is arguably cleaner than having some real mode trampoline system.

So yes, we have a very different entry method, and it carries the burden
of maintaining a list of registers and segments describing what the
initial CPU state should look like on the APs. Is it easy to break? Yes.
Jeremy broke it at least twice already when reworking per-cpu state.
Did it affect his code in any way? No. And that is _good_.

Could we hack head.S into a thousand points of light and contort it so
that both protected mode and real mode entry took the same path, running
on some default assumed segment state provided by the hypervisor?
Certainly. Would it make life easier for you to have new entry points
popping up all along head.S that all have to do these initial state
manipulations in slightly different yet co-dependent ways? No.

The best long-term solution is to fix the constraint that introduced
the problem: drop condition 5 above, and make VMI / paravirt entry on
APs start in real mode, just like the standard hardware, and follow the
regular code in head.S. Once we get up to C code, it is a simple matter
to call out to the paravirt-ops code and do the same thing that the BSP
did to get into paravirt mode, and there are no more odd-looking hacks
hanging on the wall. But it is a long-term solution, not something that
is feasible currently.

So that is why it is good that breakage here did not stop Jeremy from
improving the native kernel with per-cpu data segments. There is a
deficiency on our end that did not impede his progress, and the burden
of maintaining code which you (rightfully) feel is ridiculous is
limited to those who have it. That's why I'm listed as a maintainer for
the code: it is not maintenance free, but we certainly would like it to
be hassle free for everyone else.

Zach

tatpratishedhaartham ekatattva abhyasah
"Adherence to single-minded effort prevents these impediments"
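Zach's description of startup_ipi_hook (a STARTUP IPI that carries a nearly complete initial register state) can be illustrated with a small model. Everything here is an assumption for illustration: the struct layout, register_ap_state and deliver_startup_ipi are hypothetical and do not reflect the actual VMI ABI. The only architectural fact relied on is that a plain STARTUP IPI with vector VV starts the AP in real mode with CS = VV00h and IP = 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_APS 8

/*
 * Model: before sending the STARTUP IPI, the guest registers the
 * protected-mode state the AP should start in. Without a registration,
 * the AP starts the hardware way, in real mode at CS = vector << 8.
 */
struct ap_initial_state {
    uint32_t eip, esp;
    uint16_t cs, ds, ss;
    uint32_t cr0, cr3, cr4;
    uint32_t gdt_base;
    uint16_t gdt_limit;
};

struct vcpu {                     /* the few AP registers the model tracks */
    bool protected_mode;
    uint32_t eip, esp;
    uint16_t cs;
};

static struct ap_initial_state pending[MAX_APS];
static bool has_pending[MAX_APS];
static struct vcpu aps[MAX_APS];

/* Guest side: roughly the role startup_ipi_hook plays in the VMI case. */
static void register_ap_state(int apicid, const struct ap_initial_state *s)
{
    assert(apicid >= 0 && apicid < MAX_APS);
    pending[apicid] = *s;
    has_pending[apicid] = true;
}

/* Hypervisor side: deliver a STARTUP IPI with the given vector. */
static void deliver_startup_ipi(int apicid, uint8_t vector)
{
    struct vcpu *v = &aps[apicid];

    if (has_pending[apicid]) {
        /* A real implementation would load every field, incl. GDTR/CRs. */
        const struct ap_initial_state *s = &pending[apicid];
        v->protected_mode = (s->cr0 & 1u) != 0;   /* CR0.PE */
        v->eip = s->eip;
        v->esp = s->esp;
        v->cs  = s->cs;
        has_pending[apicid] = false;
    } else {
        /* Architectural behaviour: real mode, CS = VV00h, IP = 0. */
        v->protected_mode = false;
        v->cs  = (uint16_t)(vector << 8);
        v->eip = 0;
    }
}
```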