* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02 10:35 He, Qing
2006-08-03 6:59 ` Himanshu Raj
0 siblings, 1 reply; 22+ messages in thread
From: He, Qing @ 2006-08-02 10:35 UTC (permalink / raw)
To: Steven Smith, xen-devel; +Cc: sos22
[-- Attachment #1: Type: text/plain, Size: 1149 bytes --]
Thanks Steven, with this patch, the problem's gone.
Qing
>-----Original Message-----
>From: Steven Smith [mailto:sos22@hermes.cam.ac.uk] On Behalf Of Steven Smith
>Sent: 2 August 2006 18:16
>To: He, Qing
>Cc: Steven Smith; sos22@srcf.ucam.org; xen-devel@xensource.com
>Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>
>> >> (XEN) 0, This opcode isn't handled yet!
>> >> (XEN) handle_mmio: failed to decode instruction
>> >> (XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
>> >> (XEN) domain_crash_sync called from platform.c:880
>> >This looks like a problem with hvm_copy. Is this a PAE hypervisor?
>> >
>> Your patch is based on Cset 10735, before applied the patch, I can
>>start and run the image with no problems; but after the patch, this
>>problem can be reproduced every time.
>Sorry, I wasn't trying to shift blame here: the patch I posted
>includes some changes to hvm_copy in the non-PAE case, and I suspect
>that it's those which are causing these problems. Does the attached
>patch help?
>
>(Apply it over the top of the ones I posted previously)
>
>Steven
[-- Attachment #2: unoptimise.hvm_copy.diff --]
[-- Type: application/octet-stream, Size: 1950 bytes --]
diff -r 8ca23cd6190f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h Wed Aug 02 11:09:15 2006 +0100
+++ b/xen/include/asm-x86/shadow.h Wed Aug 02 11:09:37 2006 +0100
@@ -178,11 +178,6 @@ extern void shadow_l2_normal_pt_update(s
((s) >= (L2_PAGETABLE_FIRST_XEN_SLOT & (L2_PAGETABLE_ENTRIES - 1))) )
extern unsigned long gva_to_gpa(unsigned long gva);
-static inline unsigned long gva_to_mfn(unsigned long gva)
-{
- unsigned long gpa = gva_to_gpa(gva);
- return get_mfn_from_gpfn(gpa >> PAGE_SHIFT);
-}
extern void shadow_l3_normal_pt_update(struct domain *d,
paddr_t pa, l3_pgentry_t l3e,
@@ -1740,32 +1735,14 @@ static inline unsigned long gva_to_gpa(u
return l1e_get_paddr(gpte) + (gva & ~PAGE_MASK);
}
+#endif
+
static inline unsigned long gva_to_mfn(unsigned long gva)
{
- l1_pgentry_t l1e;
-
- if (__copy_from_user(&l1e, &shadow_linear_pg_table[l1_linear_offset(gva)],
- sizeof(l1e)) ||
- (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
- (_PAGE_PRESENT | _PAGE_RW) ) {
- struct cpu_user_regs cur;
- /* Error code -> write */
- cur.error_code = 3;
- cur.cs = 0; /* Ring 0 -> hypervisor */
- cur.eflags = 0;
- shadow_fault(gva, &cur);
- if (__copy_from_user(&l1e,
- &shadow_linear_pg_table[l1_linear_offset(gva)],
- sizeof(l1e)) ||
- (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
- (_PAGE_PRESENT | _PAGE_RW) ) {
- return 0;
- }
- }
- return l1e_get_pfn(l1e);
-}
-
-#endif
+ unsigned long gpa = gva_to_gpa(gva);
+ return get_mfn_from_gpfn(gpa >> PAGE_SHIFT);
+}
+
/************************************************************************/
extern void __update_pagetables(struct vcpu *v);
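
For readers following the diff: after this change gva_to_mfn is always the plain two-step translation (guest virtual address to guest physical address, then guest frame to machine frame) rather than the shadow-linear-page-table fast path. A toy Python model of that two-step lookup is sketched below. The two dictionaries and the machine frame number are invented for illustration and do not correspond to real Xen data structures; only the virtual and guest-physical addresses are taken from the log in this thread.

```python
PAGE_SHIFT = 12                       # 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = ~(PAGE_SIZE - 1)          # same meaning as in the C code

# Hypothetical stand-ins for the guest page tables and the
# guest-physical-to-machine map. The machine frame is a made-up value.
GUEST_PAGE_TABLE = {0xF821F: 0x000A9}   # guest virtual frame -> guest physical frame
P2M = {0x000A9: 0x3E400}                # guest physical frame -> machine frame


def gva_to_gpa(gva):
    """Guest virtual address -> guest physical address (None if unmapped)."""
    gpfn = GUEST_PAGE_TABLE.get(gva >> PAGE_SHIFT)
    if gpfn is None:
        return None
    return (gpfn << PAGE_SHIFT) | (gva & ~PAGE_MASK)


def gva_to_mfn(gva):
    """Mirror of the restored helper: translate first, then look up the frame."""
    gpa = gva_to_gpa(gva)
    if gpa is None:
        return None
    return P2M.get(gpa >> PAGE_SHIFT)
```

With these toy tables, gva_to_gpa(0xF821F600) gives 0xA9600, the same va/gpa pair that appears in the "mmio opcode" line quoted above.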
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-08-02 10:35 Paravirtualised drivers for fully virtualised domains He, Qing
@ 2006-08-03 6:59 ` Himanshu Raj
2006-08-03 9:35 ` Steven Smith
0 siblings, 1 reply; 22+ messages in thread
From: Himanshu Raj @ 2006-08-03 6:59 UTC (permalink / raw)
To: He, Qing; +Cc: xen-devel, sos22
Hi Folks,
Could any of you suggest a good recent Linux kernel to work with? The latest
2.6.16.13 native version doesn't boot as an HVM guest (no errors, it just doesn't
show anything on its boot console, though xm list reports it as running). If you
have any hints about this problem, let me know.
Thanks,
Himanshu
On Wed, Aug 02, 2006 at 06:35:31PM +0800, He, Qing wrote:
> Thanks Steven, with this patch, the problem's gone.
>
> Qing
>
> >-----Original Message-----
> >From: Steven Smith [mailto:sos22@hermes.cam.ac.uk] On Behalf Of Steven Smith
> >Sent: 2 August 2006 18:16
> >To: He, Qing
> >Cc: Steven Smith; sos22@srcf.ucam.org; xen-devel@xensource.com
> >Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
> >
> >> >> (XEN) 0, This opcode isn't handled yet!
> >> >> (XEN) handle_mmio: failed to decode instruction
> >> >> (XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
> >> >> (XEN) domain_crash_sync called from platform.c:880
> >> >This looks like a problem with hvm_copy. Is this a PAE hypervisor?
> >> >
> >> Your patch is based on Cset 10735, before applied the patch, I can
> >>start and run the image with no problems; but after the patch, this
> >>problem can be reproduced every time.
> >Sorry, I wasn't trying to shift blame here: the patch I posted
> >includes some changes to hvm_copy in the non-PAE case, and I suspect
> >that it's those which are causing these problems. Does the attached
> >patch help?
> >
> >(Apply it over the top of the ones I posted previously)
> >
> >Steven
--
-------------------------------------------------------------------------
Himanshu Raj
PhD Student, GaTech (www.cc.gatech.edu/~rhim)
I prefer to receive attachments in an open, non-proprietary format.
-------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-08-03 6:59 ` Himanshu Raj
@ 2006-08-03 9:35 ` Steven Smith
2006-08-04 6:13 ` Himanshu Raj
0 siblings, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-08-03 9:35 UTC (permalink / raw)
To: Himanshu Raj; +Cc: xen-devel, sos22, He, Qing
[-- Attachment #1.1: Type: text/plain, Size: 382 bytes --]
> Could any of you suggest a good recent linux kernel to work with?
> The latest 2.6.16.13 native version doesn't boot as an hvm guest (No
> errors, just doesn't show anything on its boot console, xm list
> reports it running thought). If you could hint on this problem, let
> me know.
2.6.16.13 should work. Is the failure mode the same both with and
without my patches?
Steven.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-08-03 9:35 ` Steven Smith
@ 2006-08-04 6:13 ` Himanshu Raj
0 siblings, 0 replies; 22+ messages in thread
From: Himanshu Raj @ 2006-08-04 6:13 UTC (permalink / raw)
To: Steven Smith; +Cc: xen-devel
Yes, that is the case (i.e. the HVM guest without your patch gets stuck the same way).
This happens with the latest version. I have only tried the RHEL4 and Debian Sarge
guest kernels before (both <= 2.6.9) and they work fine. The failure is also quite
intriguing: the guest just seems to go into an infinite loop, with no further
output on the VNC console (or serial), and nothing shows up in the usual logs
(qemu-dm.log, xend.log, xend-debug.log ...). I'm rather stumped by it.
Kindly let me know if anything comes to mind.
Thanks,
Himanshu
On Thu, Aug 03, 2006 at 10:35:51AM +0100, Steven Smith wrote:
> > Could any of you suggest a good recent linux kernel to work with?
> > The latest 2.6.16.13 native version doesn't boot as an hvm guest (No
> > errors, just doesn't show anything on its boot console, xm list
> > reports it running thought). If you could hint on this problem, let
> > me know.
> 2.6.16.13 should work. Is the failure mode the same both with and
> without my patches?
>
> Steven.
--
-------------------------------------------------------------------------
Himanshu Raj
PhD Student, GaTech (www.cc.gatech.edu/~rhim)
I prefer to receive attachments in an open, non-proprietary format.
-------------------------------------------------------------------------
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02 9:49 He, Qing
0 siblings, 0 replies; 22+ messages in thread
From: He, Qing @ 2006-08-02 9:49 UTC (permalink / raw)
To: Steven Smith, xen-devel; +Cc: sos22
>-----Original Message-----
>From: Steven Smith [mailto:sos22@hermes.cam.ac.uk] On Behalf Of Steven Smith
>Sent: 2 August 2006 17:31
>To: He, Qing
>Cc: Steven Smith; xen-devel@lists.xensource.com; sos22@srcf.ucam.org
>Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>
>> When I'm trying to start windows as VMX guest (with no drivers, of
>> course) under this patch, the guests fail. I ran with three images,
>> windows 2000, XP and 2003.
>
>> For 2000 and XP, QEMU windows do not show, there are two lines in the serial
>output:
>> (XEN) Create event channels for vcpu 0.
>> (XEN) Send on unbound Xen event channel?
>Is there anything interesting in /var/log/qemu-dm.* ? Only the most
>recent log file is relevant (which isn't necessarily the one with the
>highest number, unfortunately).
>
>Also, it looks like this is crashing too soon for it to be related to
>what guest you're running. Are all of the disk images the same type
>(file vs. block device) and size?
>
Sorry, these two cases turned out to be configuration errors: some qemu parameters changed after a qemu update. I could run these guests with an earlier changeset, so when they failed to boot I did not consider the possibility of a configuration error.
After changing the configuration they boot now (I have not tested whether they hit the same problem as below).
>> For 2003 guest, QEMU can start, but before the windows start screen
>>shows, it crashes and restarts, complaining about unreasonable mmio
>>opcodes. The serial output is:
>>
>> (XEN) (GUEST: 1) unsupported PCI BIOS function 0x0E
>> (XEN) (GUEST: 1) int13_harddisk: function 15, unmapped device for ELDL=82
>> (XEN) 0, This opcode isn't handled yet!
>> (XEN) handle_mmio: failed to decode instruction
>> (XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
>> (XEN) domain_crash_sync called from platform.c:880
>This looks like a problem with hvm_copy. Is this a PAE hypervisor?
>
Your patch is based on cset 10735. Before applying the patch I can start and run the image with no problems; after applying it, the problem reproduces every time.
It's not a PAE hypervisor, and the qemu log doesn't show much information:
domid: 1
qemu: the number of cpus is 1
shared page at pfn:1ffff, mfn: 3e35f
char device redirected to /dev/pts/2
>> Meanwhile, I don't experience any problems for Linux guest. Do you
>> have any ideas why this happens?
>Some kind of race would be my first guess.
>
>Steven.
Best regards,
Qing
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02 8:23 Zhao, Yunfeng
2006-08-02 8:56 ` Steven Hand
0 siblings, 1 reply; 22+ messages in thread
From: Zhao, Yunfeng @ 2006-08-02 8:23 UTC (permalink / raw)
To: He, Qing, Steven Smith, xen-devel; +Cc: sos22
Qing
Your problem is probably caused by the credit scheduler.
If you use sedf or bvt, you should not hit it.
Thanks
Yunfeng
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-08-02 8:23 Zhao, Yunfeng
@ 2006-08-02 8:56 ` Steven Hand
2006-08-02 9:37 ` Steven Smith
0 siblings, 1 reply; 22+ messages in thread
From: Steven Hand @ 2006-08-02 8:56 UTC (permalink / raw)
To: Zhao, Yunfeng; +Cc: Steven.Hand, xen-devel, sos22, He, Qing
>>I found some issues regarding this patch.
>>When I'm trying to start windows as VMX guest (with no drivers, of course)
>>under this patch, the guests fail. I ran with three images, windows 2000,
>>XP and 2003.
>>
>>For 2000 and XP, QEMU windows do not show, there are two lines in the
>serial
>>output:
>> (XEN) Create event channels for vcpu 0.
>> (XEN) Send on unbound Xen event channel?
As you note yourself, this problem is very unlikely to be anything to do
with the new PV drivers or associated infrastructure.
I think your problem is more likely to be a race condition in the startup
of the qemu-dm helper process. If you check the latest /var/log/qemu-*.log
and see a message something like "xc_get_pfnlist returned -1" this means
that qemu-dm tried to interrogate the domain before it had been created.
(I've seen this myself before but only on non-debug builds of Xen)
We should fix this properly, but for now you can just retry.
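
The "check the log, then retry" workaround described above could be wrapped in a small helper. The sketch below is only an illustration: start_guest and read_log are hypothetical caller-supplied callables (how a guest is actually started and how its qemu-dm log is read are left open), and the marker string is the approximate wording given above, so the real log line may differ.

```python
import time

# Approximate wording from the message above; the real log line may differ.
RACE_MARKER = "xc_get_pfnlist returned -1"


def hit_startup_race(log_text):
    """True if the log text contains the marker of the qemu-dm startup race."""
    return RACE_MARKER in log_text


def start_with_retry(start_guest, read_log, attempts=3, delay=1.0):
    """Start the guest, retrying while the log shows the startup race.

    start_guest and read_log are hypothetical callables supplied by the
    caller. Returns the 1-based attempt that succeeded, or None.
    """
    for attempt in range(1, attempts + 1):
        start_guest()
        if not hit_startup_race(read_log()):
            return attempt
        time.sleep(delay)   # give the toolstack a moment before retrying
    return None
```

A real wrapper would probably also need to wait for qemu-dm to write its log and to clean up the failed domain before retrying; both are outside this sketch.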
cheers,
S.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-08-02 8:56 ` Steven Hand
@ 2006-08-02 9:37 ` Steven Smith
0 siblings, 0 replies; 22+ messages in thread
From: Steven Smith @ 2006-08-02 9:37 UTC (permalink / raw)
To: Steven Hand; +Cc: Zhao, Yunfeng, xen-devel, sos22, He, Qing
[-- Attachment #1.1: Type: text/plain, Size: 540 bytes --]
> >>For 2000 and XP, QEMU windows do not show, there are two lines in the
> >serial
> >>output:
> >> (XEN) Create event channels for vcpu 0.
> >> (XEN) Send on unbound Xen event channel?
> As you note yourself, this problem is very unlikely to be anything to do
> with the new PV drivers or associated infrastructure.
You'd think that, but the PV patch changes the way we send requests to
the device model, and could have produced this behaviour. I thought
I'd got all of the relevant bugs fixed, but apparently not.
Steven.
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-08-02 8:01 He, Qing
2006-08-02 9:30 ` Steven Smith
0 siblings, 1 reply; 22+ messages in thread
From: He, Qing @ 2006-08-02 8:01 UTC (permalink / raw)
To: Steven Smith, xen-devel; +Cc: sos22
Hi Steven,
I found some issues regarding this patch.
When I try to start Windows as a VMX guest (with no drivers, of course) with this patch applied, the guests fail. I tried three images: Windows 2000, XP and 2003.
For 2000 and XP, the QEMU window does not appear, and there are two lines in the serial output:
(XEN) Create event channels for vcpu 0.
(XEN) Send on unbound Xen event channel?
For the 2003 guest, QEMU starts, but before the Windows start screen appears the guest crashes and restarts, complaining about unreasonable mmio opcodes. The serial output is:
(XEN) (GUEST: 1) unsupported PCI BIOS function 0x0E
(XEN) (GUEST: 1) int13_harddisk: function 15, unmapped device for ELDL=82
(XEN) 0, This opcode isn't handled yet!
(XEN) handle_mmio: failed to decode instruction
(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
(XEN) domain_crash_sync called from platform.c:880
(XEN) Domain 1 (vcpu#0) crashed on cpu#2:
(XEN) ----[ Xen-3.0-unstable Not tainted ]----
(XEN) CPU: 2
(XEN) EIP: 0008:[<8081d986>]
(XEN) EFLAGS: 00010202 CONTEXT: hvm
(XEN) eax: 00008008 ebx: 000003ce ecx: 000003ce edx: f821f600
(XEN) esi: 8081d9fa edi: f886ecd0 ebp: f886ecfc esp: f886ecbc
(XEN) cr0: 8001003b cr3: 8f500000
(XEN) ds: 0023 es: 0023 fs: 0030 gs: 0000 ss: 0010 cs: 0008
(XEN) Create event channels for vcpu 0.
(XEN) Send on unbound Xen event channel?
(XEN) (GUEST: 2) HVM Loader
(XEN) (GUEST: 2) Loading ROMBIOS ...
(XEN) (GUEST: 2) Loading Cirrus VGABIOS ...
(XEN) (GUEST: 2) Loading VMXAssist ...
(XEN) (GUEST: 2) VMX go ...
(XEN) (GUEST: 2) VMXAssist (Aug 2 2006)
(XEN) (GUEST: 2) Memory size 512 MB
(XEN) (GUEST: 2) E820 map:
(XEN) (GUEST: 2) 0000000000000000 - 000000000009F800 (RAM)
(XEN) (GUEST: 2) 000000000009F800 - 00000000000A0000 (Reserved)
(XEN) (GUEST: 2) 00000000000A0000 - 00000000000C0000 (Type 16)
(XEN) (GUEST: 2) 00000000000F0000 - 0000000000100000 (Reserved)
(XEN) (GUEST: 2) 0000000000100000 - 000000001FFFE000 (RAM)
(XEN) (GUEST: 2) 000000001FFFE000 - 000000001FFFF000 (Type 18)
(XEN) (GUEST: 2) 000000001FFFF000 - 0000000020000000 (Type 17)
(XEN) (GUEST: 2) 0000000020000000 - 0000000020003000 (ACPI NVS)
(XEN) (GUEST: 2) 0000000020003000 - 000000002000D000 (ACPI Data)
(XEN) (GUEST: 2) 00000000FEC00000 - 0000000100000000 (Type 16)
(XEN) (GUEST: 2)
(XEN) (GUEST: 2) Start BIOS ...
(XEN) (GUEST: 2) Starting emulated 16-bit real-mode: ip=F000:FFF0
(XEN) (GUEST: 2) rombios.c,v 1.138 2005/05/07 15:55:26 vruppert Exp $
(XEN) (GUEST: 2) Remapping master: ICW2 0x8 -> 0x20
(XEN) (GUEST: 2) Remapping slave: ICW2 0x70 -> 0x28
(XEN) (GUEST: 2) VGABios $Id: vgabios.c,v 1.61 2005/05/24 16:50:50 vruppert Exp $
(XEN) (GUEST: 2) HVMAssist BIOS, 1 cpu, $Revision: 1.138 $ $Date: 2005/05/07 15:55:26 $
(XEN) (GUEST: 2)
(XEN) (GUEST: 2) ata0-0: PCHS=16383/16/63 translation=lba LCHS=1024/255/63
(XEN) (GUEST: 2) ata0 master: QEMU HARDDISK ATA-7 Hard-Disk (12289 MBytes)
(XEN) (GUEST: 2) ata0-1: PCHS=3047/16/63 translation=lba LCHS=761/64/63
(XEN) (GUEST: 2) ata0 slave: QEMU HARDDISK ATA-7 Hard-Disk (1500 MBytes)
(XEN) (GUEST: 2) ata1 master: QEMU CD-ROM ATAPI-4 CD-Rom/DVD-Rom
(XEN) (GUEST: 2) ata1 slave: Unknown device
(XEN) (GUEST: 2)
(XEN) (GUEST: 2) Booting from CD-Rom...
(XEN) (GUEST: 2) unsupported PCI BIOS function 0x0E
(XEN) (GUEST: 2) int13_harddisk: function 15, unmapped device for ELDL=82
(XEN) 0, This opcode isn't handled yet!
(XEN) handle_mmio: failed to decode instruction
(XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
(XEN) domain_crash_sync called from platform.c:880
(XEN) Domain 2 (vcpu#0) crashed on cpu#2:
(XEN) ----[ Xen-3.0-unstable Not tainted ]----
(XEN) CPU: 2
(XEN) EIP: 0008:[<8081d986>]
(XEN) EFLAGS: 00010202 CONTEXT: hvm
(XEN) eax: 00008008 ebx: 000003ce ecx: 000003ce edx: f821f600
(XEN) esi: 8081d9fa edi: f886ecd0 ebp: f886ecfc esp: f886ecbc
(XEN) cr0: 8001003b cr3: 2ded8000
(XEN) ds: 0023 es: 0023 fs: 0030 gs: 0000 ss: 0010 cs: 0008
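
A quick sanity check of the numbers in the log above may help when reading it. The sketch below uses only values copied from the "mmio opcode" line and the E820 map; the helper names are made up for illustration. It shows that the virtual and guest-physical addresses share the same page offset, and that the guest-physical address falls in the 0xA0000 - 0xC0000 range that the E820 map lists as "Type 16" (on PC hardware this range is the legacy VGA window, which would fit an emulated MMIO access).

```python
PAGE_SHIFT = 12
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1

# Copied from "mmio opcode: va 0xf821f600, gpa 0xa9600".
VA = 0xF821F600
GPA = 0x000A9600

# Copied from the E820 map printed by VMXAssist: (start, end, type).
E820 = [
    (0x00000000, 0x0009F800, "RAM"),
    (0x0009F800, 0x000A0000, "Reserved"),
    (0x000A0000, 0x000C0000, "Type 16"),
    (0x000F0000, 0x00100000, "Reserved"),
    (0x00100000, 0x1FFFE000, "RAM"),
    (0x1FFFE000, 0x1FFFF000, "Type 18"),
    (0x1FFFF000, 0x20000000, "Type 17"),
    (0x20000000, 0x20003000, "ACPI NVS"),
    (0x20003000, 0x2000D000, "ACPI Data"),
    (0xFEC00000, 0x100000000, "Type 16"),
]


def page_offset(addr):
    """Offset of addr within its 4 KiB page."""
    return addr & PAGE_OFFSET_MASK


def e820_region(addr, table=E820):
    """Type of the E820 entry containing addr, or None if it is in a hole."""
    for start, end, kind in table:
        if start <= addr < end:
            return kind
    return None
```

Both page_offset(VA) and page_offset(GPA) come out as 0x600, so the translation itself looks plausible; the suspicious part of the log line is the instruction bytes, "00 00".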
Meanwhile, I don't experience any problems with Linux guests. Do you have any idea why this happens?
Best regards,
Qing He
>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Steven Smith
>Sent: 26 July 2006 23:35
>To: xen-devel@lists.xensource.com
>Cc: sos22@srcf.ucam.org
>Subject: Re: [Xen-devel] Paravirtualised drivers for fully virtualised domains
>
>I've just put an updated version of these patches up at
>http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2 . There's also an
>equivalent single big patch at
>http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2.combined . Thank you to
>everyone who gave feedback on the previous version.
>
>The main changes since last time are:
>
>-- Support for SMP guests
>-- Support for 64 bit guests on a 64 bit hypervisor
>-- Partial support for 32 bit guests on a 64 bit hypervisor: the network
> interface works, but the block device doesn't.
>
>The block device can be made to work by #define'ing ALIEN_INTERFACES
>in blkif.h, but drivers compiled in that way won't work with 32 on 32.
>The problem here is that blkif_request_t contains extra padding in 64
>bit builds, and so is a different size, and so the block ring layout
>is different.
>
>Other structures with similar problems are handled either by run time
>tests in the drivers (shared_info_t) or translation wrappers in the
>hypervisor (xen_feature_info_t, xen_add_to_physmap_t), but trying to
>do this for the block rings would require far more painful and
>extensive surgery. I'm inclined to stick with multiply compiling the
>frontend drivers in the short term, although it'll obviously need
>doing in a slightly less grotty way.
>
>Steven.
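
The padding problem mentioned in the quoted announcement (a request structure that has a different size in 32-bit and 64-bit builds) can be illustrated with a small layout calculator. The sketch below is written under stated assumptions: the field list is a hypothetical toy header, not the real blkif_request_t, and the only ABI fact used is that 64-bit integers are aligned to 4 bytes inside structs on i386 but to 8 bytes on x86-64.

```python
# (name, size_in_bytes) for a hypothetical request header. This is NOT the
# real blkif_request_t, just a struct with the same kind of layout hazard.
TOY_REQUEST = [("op", 1), ("count", 1), ("handle", 2), ("id", 8)]


def layout(fields, max_align):
    """Return ({name: offset}, total_size).

    Each field is aligned to its own size, capped at max_align; the struct
    size is rounded up to the largest field alignment.
    """
    offsets, cursor, struct_align = {}, 0, 1
    for name, size in fields:
        align = min(size, max_align)
        cursor = (cursor + align - 1) // align * align   # insert padding
        offsets[name] = cursor
        cursor += size
        struct_align = max(struct_align, align)
    total = (cursor + struct_align - 1) // struct_align * struct_align
    return offsets, total


offsets32, size32 = layout(TOY_REQUEST, max_align=4)   # i386-style rules
offsets64, size64 = layout(TOY_REQUEST, max_align=8)   # x86-64-style rules
```

With these rules the 8-byte field sits at offset 4 in the 32-bit layout and at offset 8 in the 64-bit one, so the two builds disagree on both the field offset and the structure size (12 versus 16 bytes). A shared ring of such entries would therefore be laid out differently by each side, which is the kind of mismatch described for the block ring.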
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-08-02 8:01 He, Qing
@ 2006-08-02 9:30 ` Steven Smith
0 siblings, 0 replies; 22+ messages in thread
From: Steven Smith @ 2006-08-02 9:30 UTC (permalink / raw)
To: He, Qing; +Cc: xen-devel, sos22
[-- Attachment #1.1: Type: text/plain, Size: 1430 bytes --]
> When I'm trying to start windows as VMX guest (with no drivers, of
> course) under this patch, the guests fail. I ran with three images,
> windows 2000, XP and 2003.
> For 2000 and XP, QEMU windows do not show, there are two lines in the serial output:
> (XEN) Create event channels for vcpu 0.
> (XEN) Send on unbound Xen event channel?
Is there anything interesting in /var/log/qemu-dm.* ? Only the most
recent log file is relevant (which isn't necessarily the one with the
highest number, unfortunately).
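
Picking "the most recent log file" rather than the highest-numbered one can be done by modification time. A minimal sketch, assuming the logs match the /var/log/qemu-dm.* pattern mentioned above (the function name is made up):

```python
import glob
import os


def newest_log(pattern="/var/log/qemu-dm.*"):
    """Return the most recently modified file matching pattern, or None."""
    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

Sorting by name would pick the highest number; using the modification time avoids that trap.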
Also, it looks like this is crashing too soon for it to be related to
what guest you're running. Are all of the disk images the same type
(file vs. block device) and size?
> For 2003 guest, QEMU can start, but before the windows start screen
>shows, it crashes and restarts, complaining about unreasonable mmio
>opcodes. The serial output is:
>
> (XEN) (GUEST: 1) unsupported PCI BIOS function 0x0E
> (XEN) (GUEST: 1) int13_harddisk: function 15, unmapped device for ELDL=82
> (XEN) 0, This opcode isn't handled yet!
> (XEN) handle_mmio: failed to decode instruction
> (XEN) mmio opcode: va 0xf821f600, gpa 0xa9600, len 2: 00 00
> (XEN) domain_crash_sync called from platform.c:880
This looks like a problem with hvm_copy. Is this a PAE hypervisor?
> Meanwhile, I don't experience any problems for Linux guest. Do you
> have any ideas why this happens?
Some kind of race would be my first guess.
Steven.
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-07-26 22:35 Nakajima, Jun
0 siblings, 0 replies; 22+ messages in thread
From: Nakajima, Jun @ 2006-07-26 22:35 UTC (permalink / raw)
To: Steven Smith, xen-devel; +Cc: sos22
Steven Smith wrote:
> I've just put an updated version of these patches up at
> http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2 . There's also an
> equivalent single big patch at
> http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2.combined . Thank you to
> everyone who gave feedback on the previous version.
>
> The main changes since last time are:
>
> -- Support for SMP guests
> -- Support for 64 bit guests on a 64 bit hypervisor
> -- Partial support for 32 bit guests on a 64 bit hypervisor: the
> network interface works, but the block device doesn't.
>
> The block device can be made to work by #define'ing ALIEN_INTERFACES
> in blkif.h, but drivers compiled in that way won't work with 32 on 32.
> The problem here is that blkif_request_t contains extra padding in 64
> bit builds, and so is a different size, and so the block ring layout
> is different.
When do you expect this to be in the unstable tree? Or which issues must be
resolved before that?
>
> Other structures with similar problems are handled either by run time
> tests in the drivers (shared_info_t) or translation wrappers in the
> hypervisor (xen_feature_info_t, xen_add_to_physmap_t), but trying to
> do this for the block rings would require far more painful and
> extensive surgery. I'm inclined to stick with multiply compiling the
> frontend drivers in the short term, although it'll obviously need
> doing in a slightly less grotty way.
>
> Steven.
Jun
---
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: Paravirtualised drivers for fully virtualised domains
@ 2006-07-19 4:14 Ian Pratt
0 siblings, 0 replies; 22+ messages in thread
From: Ian Pratt @ 2006-07-19 4:14 UTC (permalink / raw)
To: Steve Ofsthun, Steven Smith; +Cc: xen-devel
> >>Have you built the guest environment on anything other than a 2.6.16
> >>version of Linux? We ran into extra work supporting older linux
> >>versions.
> >
> > #ifdef soup will get you back to about 2.6.12-ish without too many
> > problems. These patches don't include that, since it would
> > complicate merging.
>
> I was thinking about SLES9 (2.6.5), RHEL4 (2.6.9), RHEL3 (2.4.21).
Steven's patches should be easy to back port given that we already have
real PV drivers for all these kernels.
Sources for strictly unofficial (non-vendor-supported) Xen ports of these
kernels are available at http://xenbits.xensource.com/kernels :
2.6.5 sles9sp2; 2.6.9 rhel4u1; 2.4.21 rhel3u5
Ian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Paravirtualised drivers for fully virtualised domains
@ 2006-07-18 12:51 Steven Smith
2006-07-18 13:45 ` Ben Thomas
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Steven Smith @ 2006-07-18 12:51 UTC (permalink / raw)
To: xen-devel; +Cc: sos22
[-- Attachment #1.1.1: Type: text/plain, Size: 3748 bytes --]
(The list appears to have eaten my previous attempt to send this.
Apologies if you receive multiple copies.)
The attached patches allow you to use paravirtualised network and
block interfaces from fully virtualised domains, based on Intel's
patches from a few months ago. These are significantly faster than
the equivalent ioemu devices, sometimes by more than an order of
magnitude.
These drivers are explicitly not considered by XenSource to be an
alternative to improving the performance of the ioemu devices.
Rather, work on both will continue in parallel.
To build, apply the three patches to a clean checkout of xen-unstable
and then build Xen, dom0, and the tools in the usual way. To build
the drivers themselves, you first need to build a native kernel for
the guest, and then go
cd xen-unstable.hg/unmodified-drivers/linux-2.6
./mkbuildtree
make -C /usr/src/linux-2.6.16 M=$PWD modules
where /usr/src/linux-2.6.16 is the path to the area where you built
the guest kernel. This should be a native kernel, and not a xenolinux
one. You should end up with four modules. xen-evtchn.ko should be
loaded first, followed by xenbus.ko, and then whichever of xen-vnif.ko
and xen-vbd.ko you need. None of the modules need any arguments.
The xm configuration syntax is exactly the same as it would be for
paravirtualised devices in a paravirtualised domain. For a network
interface, you take your line
vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78' ]
(or whatever) and replace it with
vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78', 'bridge=xenbr0' ]
where bridge=xenbr0 should be some suitable netif configuration
string, as it would be in the PV-on-PV case. Disk is likewise fairly
simple:
disk = [ 'file:/path/to/image,ioemu:hda,w' ]
becomes
disk = [ 'file:/path/to/image,ioemu:hda,w', 'file:/path/to/some/other/image,hde,w' ]
There is a slight complication in that the paravirtualised block
device can't share an IDE controller with an ioemu device, so if you
have an ioemu hda, the paravirtualised device must be hde or later.
This is to avoid confusing the Linux IDE driver.
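
The naming rule above can be written down as a small check. This is only
a sketch under the assumption that disk names have the form "hdX" with a
single lowercase drive letter; the helper pv_disk_name_ok is hypothetical
and is not part of the patches or of the xm tools.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of the patches: returns true if a
 * paravirtualised disk name follows the rule of thumb given above
 * (hde or later, so that it stays off the controllers used by the
 * ioemu hda..hdd devices). Assumes names of the form "hdX". */
static bool pv_disk_name_ok(const char *name)
{
    if (name == NULL || strlen(name) != 3)
        return false;
    if (name[0] != 'h' || name[1] != 'd')
        return false;
    return name[2] >= 'e' && name[2] <= 'z';
}
```

With this check, "hde" and "hdf" are accepted for the paravirtualised
device, while "hda" through "hdd" are rejected.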
Note that having a PV device doesn't imply having a corresponding
ioemu device, and vice versa. Configuring a single backing store to
appear as both an IDE device and a paravirtualised block device is
likely to cause problems; don't do it.
The patches consist of a number of big parts:
-- A version of netback and netfront which can copy packets into
domains rather than doing page flipping. It's much easier to make
this work well with qemu, since the P2M table doesn't need to
change, and it can be faster for some workloads.
The copying interface has been confirmed to work in paravirtualised
domains, but is currently disabled there.
-- Reworking the device model and hypervisor support so that iorequest
completion notifications no longer go to the HVM guest's event
channel mask. This avoids a whole slew of really quite nasty race
conditions.
-- Adding a new device to the qemu PCI bus which is used for
bootstrapping the devices and getting an IRQ.
-- Support for hypercalls from HVM domains.
-- Various shims and fixes to the frontends so that they work without
the rest of the xenolinux infrastructure.
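
To illustrate the first item: in the copying path the backend maps one
page granted by the frontend and copies the packet into it at a delivery
offset, so a packet has to fit in a single page after that offset
(copy_netif.diff discards larger packets as jumbograms). A minimal sketch
of that bound, assuming 4 KiB pages; SKETCH_PAGE_SIZE and
packet_fits_copy_buffer are names made up for illustration and do not
appear in the patches.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages, as on x86 */

/* Returns true if a packet of `size` bytes can be copied into one
 * granted page starting `offset` bytes into that page. */
static bool packet_fits_copy_buffer(size_t size, size_t offset)
{
    if (offset > SKETCH_PAGE_SIZE)
        return false;
    return size <= SKETCH_PAGE_SIZE - offset;
}
```

With the offset of 16 that the frontend in these patches advertises, a
standard 1500-byte frame fits comfortably, while a 9000-byte jumbo frame
does not and would be dropped rather than copied.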
The patches still have a few rough edges, and they're not as easy to
understand as I'd like, but I think they should be mostly
comprehensible and reasonably stable. The plan is to add them to
xen-unstable over the next few weeks, probably before 3.0.3, so any
testing which anyone can do would be helpful.
The Xen and tools changes are also available as a series of smaller
patches at http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/hvm_xen . The
composition of these gives hvm_xen_unstable.diff.
Steven.
[-- Attachment #1.1.2: copy_netif.diff --]
[-- Type: text/plain, Size: 15145 bytes --]
# HG changeset patch
# User sos22@douglas.cl.cam.ac.uk
# Date 1153175686 -3600
# Node ID 7053592c928b488b0c653fb25ce6f73bc6deeb05
# Parent 4726fd416506a34da96888bac0e7c9772c5037e8
Copying netback.
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/common.h
--- a/linux-2.6-xen-sparse/drivers/xen/netback/common.h Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/common.h Mon Jul 17 23:34:46 2006 +0100
@@ -59,6 +59,8 @@ typedef struct netif_st {
/* Unique identifier for this interface. */
domid_t domid;
unsigned int handle;
+ unsigned int rx_flags;
+ unsigned int copy_delivery_offset;
u8 fe_dev_addr[6];
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/netback.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/netback.c Mon Jul 17 23:34:46 2006 +0100
@@ -63,13 +63,17 @@ static struct timer_list net_timer;
#define MAX_PENDING_REQS 256
static struct sk_buff_head rx_queue;
-static multicall_entry_t rx_mcl[NET_RX_RING_SIZE+1];
+static multicall_entry_t rx_mcl[NET_RX_RING_SIZE+3];
static mmu_update_t rx_mmu[NET_RX_RING_SIZE];
-static gnttab_transfer_t grant_rx_op[NET_RX_RING_SIZE];
+static gnttab_transfer_t grant_rx_trans_op[NET_RX_RING_SIZE];
+static gnttab_map_grant_ref_t grant_rx_map_op[NET_RX_RING_SIZE];
+static gnttab_unmap_grant_ref_t grant_rx_unmap_op[NET_RX_RING_SIZE];
static unsigned char rx_notify[NR_IRQS];
static unsigned long mmap_vstart;
#define MMAP_VADDR(_req) (mmap_vstart + ((_req) * PAGE_SIZE))
+
+static void *rx_mmap_area;
#define PKT_PROT_LEN 64
@@ -96,13 +100,12 @@ static struct list_head net_schedule_lis
static struct list_head net_schedule_list;
static spinlock_t net_schedule_list_lock;
+static unsigned long alloc_mfn(void)
+{
#define MAX_MFN_ALLOC 64
-static unsigned long mfn_list[MAX_MFN_ALLOC];
-static unsigned int alloc_index = 0;
-static DEFINE_SPINLOCK(mfn_lock);
-
-static unsigned long alloc_mfn(void)
-{
+ static unsigned long mfn_list[MAX_MFN_ALLOC];
+ static unsigned int alloc_index = 0;
+ static DEFINE_SPINLOCK(mfn_lock);
unsigned long mfn = 0, flags;
struct xen_memory_reservation reservation = {
.nr_extents = MAX_MFN_ALLOC,
@@ -218,73 +221,122 @@ static void net_rx_action(unsigned long
u16 size, id, irq, flags;
multicall_entry_t *mcl;
mmu_update_t *mmu;
- gnttab_transfer_t *gop;
+ gnttab_transfer_t *flip_gop;
+ gnttab_map_grant_ref_t *map_gop;
+ gnttab_unmap_grant_ref_t *unmap_gop;
unsigned long vdata, old_mfn, new_mfn;
- struct sk_buff_head rxq;
+ struct sk_buff_head flip_rxq, copy_rxq;
struct sk_buff *skb;
u16 notify_list[NET_RX_RING_SIZE];
int notify_nr = 0;
int ret;
-
- skb_queue_head_init(&rxq);
+ void *rx_mmap_ptr;
+ netif_rx_request_t *rx_req_p;
+ void *remote_data;
+
+ skb_queue_head_init(&flip_rxq);
+ skb_queue_head_init(&copy_rxq);
mcl = rx_mcl;
mmu = rx_mmu;
- gop = grant_rx_op;
-
+ flip_gop = grant_rx_trans_op;
+ map_gop = grant_rx_map_op;
+ rx_mmap_ptr = rx_mmap_area;
+
+ /* Split the incoming skbs according to whether they need to
+ be page flipped or copied, and build up the first set of
+ hypercall arguments. */
while ((skb = skb_dequeue(&rx_queue)) != NULL) {
netif = netdev_priv(skb->dev);
- vdata = (unsigned long)skb->data;
- old_mfn = virt_to_mfn(vdata);
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- /* Memory squeeze? Back off for an arbitrary while. */
- if ((new_mfn = alloc_mfn()) == 0) {
- if ( net_ratelimit() )
- WPRINTK("Memory squeeze in netback "
- "driver.\n");
- mod_timer(&net_timer, jiffies + HZ);
- skb_queue_head(&rx_queue, skb);
+ size = skb->tail - skb->data;
+ rx_req_p = RING_GET_REQUEST(&netif->rx,
+ netif->rx.req_cons);
+
+ if (netif->rx_flags &&
+ (rx_req_p->flags & NETIF_RXRF_copy_packet)) {
+ if (map_gop - grant_rx_map_op ==
+ ARRAY_SIZE(grant_rx_map_op))
break;
+ if (size > PAGE_SIZE - netif->copy_delivery_offset) {
+ if (net_ratelimit()) {
+ printk("Discarding jumbogram to copying interface\n");
+ }
+ netif_put(netif);
+ dev_kfree_skb(skb);
+ continue;
}
- /*
- * Set the new P2M table entry before reassigning
- * the old data page. Heed the comment in
- * pgtable-2level.h:pte_page(). :-)
- */
- set_phys_to_machine(
- __pa(skb->data) >> PAGE_SHIFT,
- new_mfn);
-
- MULTI_update_va_mapping(mcl, vdata,
- pfn_pte_ma(new_mfn,
- PAGE_KERNEL), 0);
- mcl++;
-
- mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) |
- MMU_MACHPHYS_UPDATE;
- mmu->val = __pa(vdata) >> PAGE_SHIFT;
- mmu++;
- }
-
- gop->mfn = old_mfn;
- gop->domid = netif->domid;
- gop->ref = RING_GET_REQUEST(
- &netif->rx, netif->rx.req_cons)->gref;
- netif->rx.req_cons++;
- gop++;
-
- __skb_queue_tail(&rxq, skb);
-
- /* Filled the batch queue? */
- if ((gop - grant_rx_op) == ARRAY_SIZE(grant_rx_op))
- break;
- }
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- if (mcl == rx_mcl)
- return;
-
+ map_gop->host_addr = (unsigned long)rx_mmap_ptr;
+ map_gop->dom = netif->domid;
+ map_gop->ref = rx_req_p->gref;
+ map_gop->flags = GNTMAP_host_map;
+ map_gop++;
+ rx_mmap_ptr += PAGE_SIZE;
+
+ memcpy(skb->cb, rx_req_p, sizeof(*rx_req_p));
+
+ netif->rx.req_cons++;
+ __skb_queue_tail(&copy_rxq, skb);
+ } else {
+ /* Filled the batch queue? */
+ if ((flip_gop - grant_rx_trans_op) ==
+ ARRAY_SIZE(grant_rx_trans_op))
+ break;
+
+ vdata = (unsigned long)skb->data;
+ old_mfn = virt_to_mfn(vdata);
+
+ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ /* Memory squeeze? Back off for an
+ * arbitrary while. */
+ if ((new_mfn = alloc_mfn()) == 0) {
+ if ( net_ratelimit() )
+ WPRINTK("Memory squeeze in netback "
+ "driver.\n");
+ mod_timer(&net_timer, jiffies + HZ);
+ skb_queue_head(&rx_queue, skb);
+ break;
+ }
+ /*
+ * Set the new P2M table entry before
+ * reassigning the old data page. Heed
+ * the comment in
+ * pgtable-2level.h:pte_page(). :-)
+ */
+ set_phys_to_machine(
+ __pa(skb->data) >> PAGE_SHIFT,
+ new_mfn);
+
+ MULTI_update_va_mapping(mcl, vdata,
+ pfn_pte_ma(new_mfn,
+ PAGE_KERNEL), 0);
+ mcl++;
+
+ mmu->ptr = ((maddr_t)new_mfn << PAGE_SHIFT) |
+ MMU_MACHPHYS_UPDATE;
+ mmu->val = __pa(vdata) >> PAGE_SHIFT;
+ mmu++;
+ }
+
+ flip_gop->mfn = old_mfn;
+ flip_gop->domid = netif->domid;
+ flip_gop->ref = rx_req_p->gref;
+ flip_gop++;
+
+ netif->rx.req_cons++;
+ __skb_queue_tail(&flip_rxq, skb);
+ }
+
+ netif->stats.tx_bytes += size;
+ netif->stats.tx_packets++;
+ }
+
+ if (flip_gop == grant_rx_trans_op && map_gop == grant_rx_map_op) {
+ /* Nothing to do */
+ return;
+ }
+
+ if (mcl != rx_mcl) {
+ /* Did some unmaps -> need a TLB flush */
mcl[-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL;
if (mmu - rx_mmu) {
@@ -296,26 +348,32 @@ static void net_rx_action(unsigned long
mcl++;
}
- ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl);
- BUG_ON(ret != 0);
- }
-
- ret = HYPERVISOR_grant_table_op(GNTTABOP_transfer, grant_rx_op,
- gop - grant_rx_op);
+ BUG_ON(flip_gop == grant_rx_trans_op);
+ MULTI_grant_table_op(mcl, GNTTABOP_transfer,
+ grant_rx_trans_op,
+ flip_gop - grant_rx_trans_op);
+ mcl++;
+ }
+ if (map_gop != grant_rx_map_op) {
+ MULTI_grant_table_op(mcl, GNTTABOP_map_grant_ref,
+ grant_rx_map_op,
+ map_gop - grant_rx_map_op);
+ mcl++;
+ }
+
+ ret = HYPERVISOR_multicall(rx_mcl, mcl - rx_mcl);
BUG_ON(ret != 0);
+ /* Now do all of the page flips */
mcl = rx_mcl;
- gop = grant_rx_op;
- while ((skb = __skb_dequeue(&rxq)) != NULL) {
+ flip_gop = grant_rx_trans_op;
+ while ((skb = __skb_dequeue(&flip_rxq)) != NULL) {
netif = netdev_priv(skb->dev);
size = skb->tail - skb->data;
atomic_set(&(skb_shinfo(skb)->dataref), 1);
skb_shinfo(skb)->nr_frags = 0;
skb_shinfo(skb)->frag_list = NULL;
-
- netif->stats.tx_bytes += size;
- netif->stats.tx_packets++;
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
/* The update_va_mapping() must not fail. */
@@ -325,14 +383,14 @@ static void net_rx_action(unsigned long
/* Check the reassignment error code. */
status = NETIF_RSP_OKAY;
- if (gop->status != 0) {
+ if (flip_gop->status != 0) {
DPRINTK("Bad status %d from grant transfer to DOM%u\n",
- gop->status, netif->domid);
+ flip_gop->status, netif->domid);
/*
* Page no longer belongs to us unless GNTST_bad_page,
* but that should be a fatal error anyway.
*/
- BUG_ON(gop->status == GNTST_bad_page);
+ BUG_ON(flip_gop->status == GNTST_bad_page);
status = NETIF_RSP_ERROR;
}
irq = netif->irq;
@@ -352,7 +410,72 @@ static void net_rx_action(unsigned long
netif_put(netif);
dev_kfree_skb(skb);
- gop++;
+ flip_gop++;
+ }
+
+ /* Now do all of the copies */
+ map_gop = grant_rx_map_op;
+ unmap_gop = grant_rx_unmap_op;
+ skb = ((struct sk_buff *)&copy_rxq)->next;
+ while (skb != (struct sk_buff *)&copy_rxq) {
+ netif = netdev_priv(skb->dev);
+ size = skb->tail - skb->data;
+
+ rx_req_p = (netif_rx_request_t *)skb->cb;
+
+ if (map_gop->status == 0) {
+ remote_data =
+ (void *)(unsigned long)map_gop->host_addr;
+ memcpy(remote_data + 16,
+ skb->data,
+ size);
+ unmap_gop->host_addr = map_gop->host_addr;
+ unmap_gop->dev_bus_addr = 0;
+ unmap_gop->handle = map_gop->handle;
+ unmap_gop++;
+ }
+
+ map_gop++;
+ skb = skb->next;
+ }
+
+ /* Unmap the packets we just copied into */
+ if (unmap_gop != grant_rx_unmap_op) {
+ ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+ grant_rx_unmap_op,
+ unmap_gop - grant_rx_unmap_op);
+ BUG_ON(ret);
+ map_gop = grant_rx_map_op;
+ /* And notify the other side. */
+ while ((skb = __skb_dequeue(&copy_rxq)) != NULL) {
+ netif = netdev_priv(skb->dev);
+ rx_req_p = (netif_rx_request_t *)skb->cb;
+
+ flags = 0;
+ if (skb->ip_summed == CHECKSUM_HW)
+ flags |= (NETRXF_csum_blank |
+ NETRXF_data_validated);
+ else if (skb->proto_data_valid)
+ flags |= NETRXF_data_validated;
+
+ if (map_gop->status)
+ status = NETIF_RSP_ERROR;
+ else
+ status = NETIF_RSP_OKAY;
+
+ irq = netif->irq;
+ if (make_rx_response(netif, rx_req_p->id, status,
+ netif->copy_delivery_offset, size,
+ flags) &&
+ rx_notify[irq] == 0) {
+ rx_notify[irq] = 1;
+ notify_list[notify_nr++] = irq;
+ }
+
+ netif_put(netif);
+ dev_kfree_skb(skb);
+ map_gop++;
+ }
}
while (notify_nr != 0) {
@@ -966,6 +1089,12 @@ static void netif_page_release(struct pa
set_page_count(page, 1);
netif_idx_release(pending_idx);
+}
+
+static void netif_rx_page_release(struct page *page)
+{
+ /* Ready for next use. */
+ set_page_count(page, 1);
}
irqreturn_t netif_be_int(int irq, void *dev_id, struct pt_regs *regs)
@@ -1093,6 +1222,16 @@ static int __init netback_init(void)
SetPageForeign(page, netif_page_release);
}
+ page = balloon_alloc_empty_page_range(NET_RX_RING_SIZE);
+ BUG_ON(page == NULL);
+ rx_mmap_area = pfn_to_kaddr(page_to_pfn(page));
+
+ for (i = 0; i < NET_RX_RING_SIZE; i++) {
+ page = virt_to_page(rx_mmap_area + (i * PAGE_SIZE));
+ set_page_count(page, 1);
+ SetPageForeign(page, netif_rx_page_release);
+ }
+
pending_cons = 0;
pending_prod = MAX_PENDING_REQS;
for (i = 0; i < MAX_PENDING_REQS; i++)
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c
--- a/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netback/xenbus.c Mon Jul 17 23:34:46 2006 +0100
@@ -110,6 +110,18 @@ static int netback_probe(struct xenbus_d
}
#endif
+ err = xenbus_printf(xbt, dev->nodename, "feature-rx-copy", "%d", 1);
+ if (err) {
+ message = "writing feature-rx-copy";
+ goto abort_transaction;
+ }
+
+ err = xenbus_printf(xbt, dev->nodename, "feature-rx-flags", "%d", 1);
+ if (err) {
+ message = "writing feature-rx-flags";
+ goto abort_transaction;
+ }
+
err = xenbus_transaction_end(xbt, 0);
} while (err == -EAGAIN);
@@ -363,6 +375,30 @@ static int connect_rings(struct backend_
if (err) {
xenbus_dev_fatal(dev, err,
"reading %s/ring-ref and event-channel",
+ dev->otherend);
+ return err;
+ }
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend,
+ "use-rx-flags", "%u",
+ &be->netif->rx_flags);
+ if (err == -ENOENT) {
+ be->netif->rx_flags = 0;
+ } else if (err < 0) {
+ xenbus_dev_fatal(dev, err,
+ "reading %s/use-rx-flags",
+ dev->otherend);
+ return err;
+ }
+
+ err = xenbus_scanf(XBT_NIL, dev->otherend,
+ "copy-delivery-offset", "%u",
+ &be->netif->copy_delivery_offset);
+ if (err == -ENOENT) {
+ be->netif->copy_delivery_offset = 0;
+ } else if (err < 0) {
+ xenbus_dev_fatal(dev, err,
+ "reading %s/copy-delivery-offset",
dev->otherend);
return err;
}
diff -r 4726fd416506 -r 7053592c928b linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h Mon Jul 17 22:55:34 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypervisor.h Mon Jul 17 23:34:46 2006 +0100
@@ -200,6 +200,16 @@ MULTI_update_va_mapping(
}
static inline void
+MULTI_grant_table_op(multicall_entry_t *mcl, unsigned int cmd,
+ void *uop, unsigned int count)
+{
+ mcl->op = __HYPERVISOR_grant_table_op;
+ mcl->args[0] = cmd;
+ mcl->args[1] = (unsigned long)uop;
+ mcl->args[2] = count;
+}
+
+static inline void
MULTI_update_va_mapping_otherdomain(
multicall_entry_t *mcl, unsigned long va,
pte_t new_val, unsigned long flags, domid_t domid)
diff -r 4726fd416506 -r 7053592c928b xen/include/public/io/netif.h
--- a/xen/include/public/io/netif.h Mon Jul 17 22:55:34 2006 +0100
+++ b/xen/include/public/io/netif.h Mon Jul 17 23:34:46 2006 +0100
@@ -109,8 +109,12 @@ struct netif_tx_response {
};
typedef struct netif_tx_response netif_tx_response_t;
+#define _NETIF_RXRF_copy_packet (0)
+#define NETIF_RXRF_copy_packet (1U<<_NETIF_RXRF_copy_packet)
+
struct netif_rx_request {
uint16_t id; /* Echoed in response message. */
+ uint16_t flags; /* NETIF_RXRF_* */
grant_ref_t gref; /* Reference to incoming granted frame */
};
typedef struct netif_rx_request netif_rx_request_t;
[-- Attachment #1.1.3: frontend_changes.diff --]
[-- Type: text/plain, Size: 66799 bytes --]
# HG changeset patch
# User sos22@douglas.cl.cam.ac.uk
# Date 1153175939 -3600
# Node ID aa3087ee5769d60d5ab1e368cc062233d364ec8b
# Parent 7053592c928b488b0c653fb25ce6f73bc6deeb05
Frontend parts of PV-on-HVM patches.
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/blkfront/blkfront.c Mon Jul 17 23:38:59 2006 +0100
@@ -46,6 +46,7 @@
#include <xen/interface/grant_table.h>
#include <xen/gnttab.h>
#include <asm/hypervisor.h>
+#include <asm/maddr.h>
#define BLKIF_STATE_DISCONNECTED 0
#define BLKIF_STATE_CONNECTED 1
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/core/gnttab.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/core/gnttab.c Mon Jul 17 23:38:59 2006 +0100
@@ -41,6 +41,13 @@
#include <asm/pgtable.h>
#include <asm/uaccess.h>
#include <asm/synch_bitops.h>
+#include <asm/maddr.h>
+#include <xen/interface/memory.h>
+
+#ifndef CONFIG_XEN
+#include <asm/io.h>
+#include <evtchn-pci.h>
+#endif
/* External tools reserve first few grant table entries. */
#define NR_RESERVED_ENTRIES 8
@@ -350,6 +357,7 @@ void gnttab_cancel_free_callback(struct
}
EXPORT_SYMBOL_GPL(gnttab_cancel_free_callback);
+#ifdef CONFIG_XEN
#ifndef __ia64__
static int map_pte_fn(pte_t *pte, struct page *pmd_page,
unsigned long addr, void *data)
@@ -404,23 +412,49 @@ int gnttab_resume(void)
shared = __va(frames[0] << PAGE_SHIFT);
printk("grant table at %p\n", shared);
#endif
-
- return 0;
-}
+}
+#else /* !CONFIG_XEN */
+int
+gnttab_resume(void)
+{
+ unsigned long frames;
+ int x;
+ struct xen_add_to_physmap xatp;
+
+ frames = alloc_xen_mmio(PAGE_SIZE * NR_GRANT_FRAMES);
+ shared = ioremap(frames, PAGE_SIZE * NR_GRANT_FRAMES);
+ if(!shared){
+ printk("error to ioremap gnttab share frames\n");
+ return -1;
+ }
+ for (x = 0; x < NR_GRANT_FRAMES; x++) {
+ xatp.domid = DOMID_SELF;
+ xatp.idx = x;
+ xatp.space = XENMAPSPACE_grant_table;
+ xatp.gpfn = (frames >> PAGE_SHIFT) + x;
+ BUG_ON(HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp));
+ }
+ return 0;
+}
+#endif
int gnttab_suspend(void)
{
#ifndef __ia64__
+#ifdef CONFIG_XEN
apply_to_page_range(&init_mm, (unsigned long)shared,
PAGE_SIZE * NR_GRANT_FRAMES,
unmap_pte_fn, NULL);
-#endif
-
- return 0;
-}
-
-static int __init gnttab_init(void)
+#else
+ iounmap(shared);
+#endif
+#endif
+
+ return 0;
+}
+
+int __init gnttab_init(void)
{
int i;
@@ -439,4 +473,6 @@ static int __init gnttab_init(void)
return 0;
}
+#ifdef CONFIG_XEN
core_initcall(gnttab_init);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c
--- a/linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/core/xen_proc.c Mon Jul 17 23:38:59 2006 +0100
@@ -1,4 +1,5 @@
+#include <linux/module.h>
#include <linux/config.h>
#include <linux/proc_fs.h>
#include <xen/xen_proc.h>
@@ -12,6 +13,7 @@ struct proc_dir_entry *create_xen_proc_e
panic("Couldn't create /proc/xen");
return create_proc_entry(name, mode, xen_base);
}
+EXPORT_SYMBOL(create_xen_proc_entry);
void remove_xen_proc_entry(const char *name)
{
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c
--- a/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c Mon Jul 17 23:38:59 2006 +0100
@@ -61,6 +61,25 @@
#include <asm/uaccess.h>
#include <xen/interface/grant_table.h>
#include <xen/gnttab.h>
+#include <asm/maddr.h>
+
+/* If we don't have GSO, fake things up so that we never try to use
+ it */
+#ifndef NETIF_F_GSO
+#define netif_needs_gso(dev, skb) 0
+#define NETIF_F_GSO_ROBUST 0
+#define NETIF_F_GSO_SHIFT 16
+#else
+#define HAVE_GSO
+#endif
+
+#ifdef CONFIG_XEN
+#define SKB_PROTO_DATA_VALID(skb) (skb)->proto_data_valid
+#define SET_SKB_PROTO_DATA_VALID(skb, v) do { (skb)->proto_data_valid = (v); } while (0)
+#else
+#define SKB_PROTO_DATA_VALID(skb) 0
+#define SET_SKB_PROTO_DATA_VALID(skb, v) do {} while (0)
+#endif
#define GRANT_INVALID_REF 0
@@ -88,6 +107,7 @@ struct netfront_info {
unsigned int handle;
unsigned int evtchn, irq;
+ unsigned int copyall;
/* Receive-ring batched refills. */
#define RX_MIN_TARGET 8
@@ -148,7 +168,7 @@ static inline unsigned short get_id_from
static int talk_to_backend(struct xenbus_device *, struct netfront_info *);
static int setup_device(struct xenbus_device *, struct netfront_info *);
-static struct net_device *create_netdev(int, struct xenbus_device *);
+static struct net_device *create_netdev(int, int, struct xenbus_device *);
static void netfront_closing(struct xenbus_device *);
@@ -190,14 +210,41 @@ static int __devinit netfront_probe(stru
struct net_device *netdev;
struct netfront_info *info;
unsigned int handle;
+#ifndef CONFIG_XEN
+ unsigned feature_rx_flags;
+#endif
+ unsigned feature_rx_copy;
err = xenbus_scanf(XBT_NIL, dev->nodename, "handle", "%u", &handle);
if (err != 1) {
xenbus_dev_fatal(dev, err, "reading handle");
return err;
}
-
- netdev = create_netdev(handle, dev);
+#ifndef CONFIG_XEN
+ err = xenbus_scanf(XBT_NIL, dev->otherend, "feature-rx-flags", "%u",
+ &feature_rx_flags);
+ if (err == 1) {
+ err = xenbus_scanf(XBT_NIL,
+ dev->otherend,
+ "feature-rx-copy",
+ "%u",
+ &feature_rx_copy);
+ if (err != 1) {
+ feature_rx_copy = 0;
+ err = -EINVAL;
+ }
+ } else {
+ feature_rx_copy = feature_rx_flags = 0;
+ }
+ if (!feature_rx_copy) {
+ xenbus_dev_fatal(dev, err, "need a copy-capable backend");
+ return err;
+ }
+#else
+ feature_rx_copy = 0;
+#endif
+
+ netdev = create_netdev(handle, feature_rx_copy, dev);
if (IS_ERR(netdev)) {
err = PTR_ERR(netdev);
xenbus_dev_fatal(dev, err, "creating netdev");
@@ -300,6 +347,19 @@ again:
"event-channel", "%u", info->evtchn);
if (err) {
message = "writing event-channel";
+ goto abort_transaction;
+ }
+
+ err = xenbus_printf(xbt, dev->nodename, "use-rx-flags", "%u", 1);
+ if (err) {
+ message = "writing use-rx-flags";
+ goto abort_transaction;
+ }
+
+ err = xenbus_printf(xbt, dev->nodename, "copy-delivery-offset", "%u",
+ 16);
+ if (err) {
+ message = "writing copy-delivery-offset";
goto abort_transaction;
}
@@ -550,6 +610,8 @@ static void network_alloc_rx_buffers(str
RING_IDX req_prod = np->rx.req_prod_pvt;
struct xen_memory_reservation reservation;
grant_ref_t ref;
+ netif_rx_request_t *req;
+ int nr_flips;
if (unlikely(!netif_carrier_ok(dev)))
return;
@@ -592,7 +654,7 @@ static void network_alloc_rx_buffers(str
np->rx_target = np->rx_max_target;
refill:
- for (i = 0; ; i++) {
+ for (nr_flips = i = 0; ; i++) {
if ((skb = __skb_dequeue(&np->rx_batch)) == NULL)
break;
@@ -602,17 +664,78 @@ static void network_alloc_rx_buffers(str
np->rx_skbs[id] = skb;
- RING_GET_REQUEST(&np->rx, req_prod + i)->id = id;
ref = gnttab_claim_grant_reference(&np->gref_rx_head);
BUG_ON((signed short)ref < 0);
np->grant_rx_ref[id] = ref;
- gnttab_grant_foreign_transfer_ref(ref,
- np->xbdev->otherend_id,
- __pa(skb->head)>>PAGE_SHIFT);
- RING_GET_REQUEST(&np->rx, req_prod + i)->gref = ref;
- np->rx_pfn_array[i] = virt_to_mfn(skb->head);
+
+ req = RING_GET_REQUEST(&np->rx, req_prod + i);
+ if ( !np->copyall ) {
+ gnttab_grant_foreign_transfer_ref(ref,
+ np->xbdev->otherend_id,
+ __pa(skb->head) >> PAGE_SHIFT);
+ np->rx_pfn_array[nr_flips] = virt_to_mfn(skb->head);
+
+ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ /* Remove this page from map before
+ * passing back to Xen. */
+ set_phys_to_machine(__pa(skb->head) >>
+ PAGE_SHIFT,
+ INVALID_P2M_ENTRY);
+
+ MULTI_update_va_mapping(np->rx_mcl+nr_flips,
+ (unsigned long)skb->head,
+ __pte(0), 0);
+ }
+ nr_flips++;
+ req->flags = 0;
+ } else {
+ gnttab_grant_foreign_access_ref(ref,
+ np->xbdev->otherend_id,
+ virt_to_mfn(skb->head),
+ 0);
+ req->flags = NETIF_RXRF_copy_packet;
+ }
+ req->gref = ref;
+ req->id = id;
+ }
+
+ if ( nr_flips != 0 ) {
+ set_xen_guest_handle(reservation.extent_start,
+ np->rx_pfn_array);
+ reservation.nr_extents = nr_flips;
+ reservation.extent_order = 0;
+ reservation.address_bits = 0;
+ reservation.domid = DOMID_SELF;
+
+ /* Tell the balloon driver what is going on. */
+ balloon_update_driver_allowance(nr_flips);
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ /* After all PTEs have been zapped, flush the
+ * TLB. */
+ np->rx_mcl[nr_flips-1].args[MULTI_UVMFLAGS_INDEX] =
+ UVMF_TLB_FLUSH|UVMF_ALL;
+
+ /* Give away a batch of pages. */
+ np->rx_mcl[nr_flips].op = __HYPERVISOR_memory_op;
+ np->rx_mcl[nr_flips].args[0] =
+ XENMEM_decrease_reservation;
+ np->rx_mcl[nr_flips].args[1] =
+ (unsigned long)&reservation;
+
+ /* Zap PTEs and give away pages in one big
+ * multicall. */
+ (void)HYPERVISOR_multicall(np->rx_mcl, nr_flips + 1);
+
+ /* Check return status of
+ * HYPERVISOR_memory_op(). */
+ if (unlikely(np->rx_mcl[nr_flips].result != nr_flips))
+ panic("Unable to reduce memory reservation (%ld,%d)\n",
+ np->rx_mcl[nr_flips].result, nr_flips);
+ } else {
+ if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
+ &reservation) != i)
+ panic("Unable to reduce memory reservation\n");
/* Remove this page before passing back to Xen. */
set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
INVALID_P2M_ENTRY);
@@ -620,37 +743,9 @@ static void network_alloc_rx_buffers(str
(unsigned long)skb->head,
__pte(0), 0);
}
- }
-
- /* Tell the ballon driver what is going on. */
- balloon_update_driver_allowance(i);
-
- set_xen_guest_handle(reservation.extent_start, np->rx_pfn_array);
- reservation.nr_extents = i;
- reservation.extent_order = 0;
- reservation.address_bits = 0;
- reservation.domid = DOMID_SELF;
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- /* After all PTEs have been zapped, flush the TLB. */
- np->rx_mcl[i-1].args[MULTI_UVMFLAGS_INDEX] =
- UVMF_TLB_FLUSH|UVMF_ALL;
-
- /* Give away a batch of pages. */
- np->rx_mcl[i].op = __HYPERVISOR_memory_op;
- np->rx_mcl[i].args[0] = XENMEM_decrease_reservation;
- np->rx_mcl[i].args[1] = (unsigned long)&reservation;
-
- /* Zap PTEs and give away pages in one big multicall. */
- (void)HYPERVISOR_multicall(np->rx_mcl, i+1);
-
- /* Check return status of HYPERVISOR_memory_op(). */
- if (unlikely(np->rx_mcl[i].result != i))
- panic("Unable to reduce memory reservation\n");
- } else
- if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
- &reservation) != i)
- panic("Unable to reduce memory reservation\n");
+ } else {
+ wmb();
+ }
/* Above is a suitable barrier to ensure backend will see requests. */
np->rx.req_prod_pvt = req_prod + i;
@@ -774,9 +869,10 @@ static int network_start_xmit(struct sk_
if (skb->ip_summed == CHECKSUM_HW) /* local packet? */
tx->flags |= NETTXF_csum_blank | NETTXF_data_validated;
- if (skb->proto_data_valid) /* remote but checksummed? */
+ if (SKB_PROTO_DATA_VALID(skb)) /* remote but checksummed? */
tx->flags |= NETTXF_data_validated;
+#ifdef HAVE_GSO
if (skb_shinfo(skb)->gso_size) {
struct netif_extra_info *gso = (struct netif_extra_info *)
RING_GET_REQUEST(&np->tx, ++i);
@@ -793,6 +889,7 @@ static int network_start_xmit(struct sk_
gso->flags = 0;
extra = gso;
}
+#endif
np->tx.req_prod_pvt = i + 1;
@@ -852,6 +949,8 @@ static int netif_poll(struct net_device
unsigned long flags;
unsigned long mfn;
grant_ref_t ref;
+ unsigned long ret;
+ netif_rx_request_t *req;
spin_lock(&np->rx_lock);
@@ -883,25 +982,50 @@ static int netif_poll(struct net_device
continue;
}
- /* Memory pressure, insufficient buffer headroom, ... */
- if ((mfn = gnttab_end_foreign_transfer_ref(ref)) == 0) {
- if (net_ratelimit())
- WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n",
- rx->id, rx->status);
- RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id =
- rx->id;
- RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref =
- ref;
- np->rx.req_prod_pvt++;
- RING_PUSH_REQUESTS(&np->rx);
- work_done--;
- continue;
+ skb = np->rx_skbs[rx->id];
+
+ if ( !np->copyall ) {
+ /* Memory pressure, insufficient buffer
+ * headroom, ... */
+ if ((mfn = gnttab_end_foreign_transfer_ref(ref)) == 0)
+ {
+ if (net_ratelimit())
+ WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n",
+ rx->id, rx->status);
+ req = RING_GET_REQUEST(&np->rx,
+ np->rx.req_prod_pvt);
+ req->id = rx->id;
+ req->gref = ref;
+ np->rx.req_prod_pvt++;
+ RING_PUSH_REQUESTS(&np->rx);
+ work_done--;
+ continue;
+ }
+ /* Remap the page. */
+ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ MULTI_update_va_mapping(mcl,
+ (unsigned long)skb->head,
+ pfn_pte_ma(mfn,
+ PAGE_KERNEL),
+ 0);
+ mcl++;
+ mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT)
+ | MMU_MACHPHYS_UPDATE;
+ mmu->val = __pa(skb->head) >> PAGE_SHIFT;
+ mmu++;
+
+ set_phys_to_machine(__pa(skb->head)
+ >> PAGE_SHIFT,
+ mfn);
+ }
+ } else {
+ ret = gnttab_end_foreign_access_ref(ref, 0);
+ BUG_ON(!ret);
}
gnttab_release_grant_reference(&np->gref_rx_head, ref);
np->grant_rx_ref[rx->id] = GRANT_INVALID_REF;
- skb = np->rx_skbs[rx->id];
add_id_to_freelist(np->rx_skbs, rx->id);
/* NB. We handle skb overflow later. */
@@ -915,30 +1039,16 @@ static int netif_poll(struct net_device
*/
if (rx->flags & (NETRXF_data_validated|NETRXF_csum_blank)) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
- skb->proto_data_valid = 1;
+ SET_SKB_PROTO_DATA_VALID(skb, 1);
} else {
skb->ip_summed = CHECKSUM_NONE;
- skb->proto_data_valid = 0;
+ SET_SKB_PROTO_DATA_VALID(skb, 0);
}
+#ifdef CONFIG_XEN
skb->proto_csum_blank = !!(rx->flags & NETRXF_csum_blank);
-
+#endif
np->stats.rx_packets++;
np->stats.rx_bytes += rx->status;
-
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- /* Remap the page. */
- MULTI_update_va_mapping(mcl, (unsigned long)skb->head,
- pfn_pte_ma(mfn, PAGE_KERNEL),
- 0);
- mcl++;
- mmu->ptr = ((maddr_t)mfn << PAGE_SHIFT)
- | MMU_MACHPHYS_UPDATE;
- mmu->val = __pa(skb->head) >> PAGE_SHIFT;
- mmu++;
-
- set_phys_to_machine(__pa(skb->head) >> PAGE_SHIFT,
- mfn);
- }
__skb_queue_tail(&rxq, skb);
}
@@ -996,8 +1106,11 @@ static int netif_poll(struct net_device
/* Copy any other fields we already set up. */
nskb->dev = skb->dev;
nskb->ip_summed = skb->ip_summed;
- nskb->proto_data_valid = skb->proto_data_valid;
+ SET_SKB_PROTO_DATA_VALID(nskb,
+ SKB_PROTO_DATA_VALID(skb));
+#ifdef CONFIG_XEN
nskb->proto_csum_blank = skb->proto_csum_blank;
+#endif
}
/* Reinitialise and then destroy the old skbuff. */
@@ -1126,6 +1239,8 @@ static void network_connect(struct net_d
struct netfront_info *np = netdev_priv(dev);
int i, requeue_idx;
struct sk_buff *skb;
+ grant_ref_t gref;
+ netif_rx_request_t *req;
xennet_set_features(dev);
@@ -1159,13 +1274,21 @@ static void network_connect(struct net_d
for (requeue_idx = 0, i = 1; i <= NET_RX_RING_SIZE; i++) {
if ((unsigned long)np->rx_skbs[i] < PAGE_OFFSET)
continue;
- gnttab_grant_foreign_transfer_ref(
- np->grant_rx_ref[i], np->xbdev->otherend_id,
- __pa(np->rx_skbs[i]->data) >> PAGE_SHIFT);
- RING_GET_REQUEST(&np->rx, requeue_idx)->gref =
- np->grant_rx_ref[i];
- RING_GET_REQUEST(&np->rx, requeue_idx)->id = i;
- requeue_idx++;
+ gref = np->grant_rx_ref[i];
+ skb = np->rx_skbs[i];
+ if ( !np->copyall ) {
+ gnttab_grant_foreign_transfer_ref(
+ gref, np->xbdev->otherend_id,
+ __pa(skb->data) >> PAGE_SHIFT);
+ } else {
+ gnttab_grant_foreign_access_ref(
+ gref, np->xbdev->otherend_id,
+ virt_to_mfn(skb->data), 0);
+ }
+ req = RING_GET_REQUEST(&np->rx, requeue_idx);
+ req->gref = gref;
+ req->id = i;
+ requeue_idx++;
}
np->rx.req_prod_pvt = requeue_idx;
@@ -1348,10 +1471,13 @@ static void network_set_multicast_list(s
/** Create a network device.
* @param handle device handle
+ * @param copyall flag; 1 if every packet must be copied, 0 if every packet
+ * must be flipped.
* @param val return parameter for created device
* @return 0 on success, error code otherwise
*/
static struct net_device * __devinit create_netdev(int handle,
+ int copyall,
struct xenbus_device *dev)
{
int i, err = 0;
@@ -1368,6 +1494,7 @@ static struct net_device * __devinit cre
np = netdev_priv(netdev);
np->handle = handle;
np->xbdev = dev;
+ np->copyall = copyall;
netif_carrier_off(netdev);
@@ -1418,7 +1545,11 @@ static struct net_device * __devinit cre
netdev->uninit = netif_uninit;
netdev->change_mtu = xennet_change_mtu;
netdev->weight = 64;
+#ifdef CONFIG_XEN
netdev->features = NETIF_F_IP_CSUM;
+#else
+ netdev->features = 0;
+#endif
SET_ETHTOOL_OPS(netdev, &network_ethtool_ops);
SET_MODULE_OWNER(netdev);
@@ -1581,8 +1712,10 @@ static int __init netif_init(void)
if (!is_running_on_xen())
return -ENODEV;
+#ifdef CONFIG_XEN
if (xen_start_info->flags & SIF_INITDOMAIN)
return 0;
+#endif
IPRINTK("Initialising virtual ethernet driver.\n");
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.c Mon Jul 17 23:38:59 2006 +0100
@@ -39,6 +39,8 @@
#include <xen/xenbus.h>
#include "xenbus_comms.h"
+void *shared_xenstore_buf;
+
static int xenbus_irq;
extern void xenbus_probe(void *);
@@ -49,7 +51,7 @@ DECLARE_WAIT_QUEUE_HEAD(xb_waitq);
static inline struct xenstore_domain_interface *xenstore_domain_interface(void)
{
- return mfn_to_virt(xen_start_info->store_mfn);
+ return shared_xenstore_buf;
}
static irqreturn_t wake_waiting(int irq, void *unused, struct pt_regs *regs)
@@ -129,7 +131,7 @@ int xb_write(const void *data, unsigned
intf->req_prod += avail;
/* This implies mb() before other side sees interrupt. */
- notify_remote_via_evtchn(xen_start_info->store_evtchn);
+ notify_remote_via_evtchn(xen_store_evtchn);
}
return 0;
@@ -180,7 +182,7 @@ int xb_read(void *data, unsigned len)
pr_debug("Finished read of %i bytes (%i to go)\n", avail, len);
/* Implies mb(): they will see new header. */
- notify_remote_via_evtchn(xen_start_info->store_evtchn);
+ notify_remote_via_evtchn(xen_store_evtchn);
}
return 0;
@@ -195,7 +197,7 @@ int xb_init_comms(void)
unbind_from_irqhandler(xenbus_irq, &xb_waitq);
err = bind_evtchn_to_irqhandler(
- xen_start_info->store_evtchn, wake_waiting,
+ xen_store_evtchn, wake_waiting,
0, "xenbus", &xb_waitq);
if (err <= 0) {
printk(KERN_ERR "XENBUS request irq failed %i\n", err);
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_comms.h Mon Jul 17 23:38:59 2006 +0100
@@ -39,5 +39,7 @@ int xb_read(void *data, unsigned len);
int xb_read(void *data, unsigned len);
int xs_input_avail(void);
extern wait_queue_head_t xb_waitq;
+extern void *shared_xenstore_buf;
+extern int xen_store_evtchn;
#endif /* _XENBUS_COMMS_H */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_dev.c Mon Jul 17 23:38:59 2006 +0100
@@ -48,6 +48,7 @@
#include <xen/xenbus.h>
#include <xen/xen_proc.h>
#include <asm/hypervisor.h>
+#include <asm/io.h>
struct xenbus_dev_transaction {
struct list_head list;
@@ -181,7 +182,7 @@ static int xenbus_dev_open(struct inode
{
struct xenbus_dev_data *u;
- if (xen_start_info->store_evtchn == 0)
+ if (xen_store_evtchn == 0)
return -ENOENT;
nonseekable_open(inode, filp);
@@ -232,7 +233,7 @@ static struct file_operations xenbus_dev
.poll = xenbus_dev_poll,
};
-static int __init
+int __init
xenbus_dev_init(void)
{
xenbus_dev_intf = create_xen_proc_entry("xenbus", 0400);
@@ -242,4 +243,6 @@ xenbus_dev_init(void)
return 0;
}
+#ifndef MODULE
__initcall(xenbus_dev_init);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c
--- a/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/xenbus/xenbus_probe.c Mon Jul 17 23:38:59 2006 +0100
@@ -44,6 +44,7 @@
#include <linux/kthread.h>
#include <asm/io.h>
+#include <asm/maddr.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/hypervisor.h>
@@ -51,8 +52,12 @@
#include <xen/xen_proc.h>
#include <xen/evtchn.h>
#include <xen/features.h>
+#include <xen/hvm.h>
#include "xenbus_comms.h"
+
+int xen_store_evtchn;
+static unsigned long xen_store_mfn;
extern struct mutex xenwatch_mutex;
@@ -915,8 +920,7 @@ static int xsd_kva_mmap(struct file *fil
if ((size > PAGE_SIZE) || (vma->vm_pgoff != 0))
return -EINVAL;
- if (remap_pfn_range(vma, vma->vm_start,
- mfn_to_pfn(xen_start_info->store_mfn),
+ if (remap_pfn_range(vma, vma->vm_start, mfn_to_pfn(xen_store_mfn),
size, vma->vm_page_prot))
return -EAGAIN;
@@ -928,7 +932,7 @@ static int xsd_kva_read(char *page, char
{
int len;
- len = sprintf(page, "0x%p", mfn_to_virt(xen_start_info->store_mfn));
+ len = sprintf(page, "0x%p", mfn_to_virt(xen_store_mfn));
*eof = 1;
return len;
}
@@ -938,12 +942,11 @@ static int xsd_port_read(char *page, cha
{
int len;
- len = sprintf(page, "%d", xen_start_info->store_evtchn);
+ len = sprintf(page, "%d", xen_store_evtchn);
*eof = 1;
return len;
}
#endif
-
static int __init xenbus_probe_init(void)
{
@@ -962,7 +965,11 @@ static int __init xenbus_probe_init(void
/*
* Domain0 doesn't have a store_evtchn or store_mfn yet.
*/
+#ifdef CONFIG_XEN
dom0 = (xen_start_info->store_evtchn == 0);
+#else
+ dom0 = 0;
+#endif
if (dom0) {
struct evtchn_alloc_unbound alloc_unbound;
@@ -972,7 +979,7 @@ static int __init xenbus_probe_init(void
if (!page)
return -ENOMEM;
- xen_start_info->store_mfn =
+ xen_store_mfn =
pfn_to_mfn(virt_to_phys((void *)page) >>
PAGE_SHIFT);
@@ -985,7 +992,7 @@ static int __init xenbus_probe_init(void
if (err == -ENOSYS)
goto err;
BUG_ON(err);
- xen_start_info->store_evtchn = alloc_unbound.port;
+ xen_store_evtchn = alloc_unbound.port;
#ifdef CONFIG_PROC_FS
/* And finally publish the above info in /proc/xen */
@@ -1001,8 +1008,21 @@ static int __init xenbus_probe_init(void
if (xsd_port_intf)
xsd_port_intf->read_proc = xsd_port_read;
#endif
- } else
+ shared_xenstore_buf = mfn_to_virt(xen_store_mfn);
+ } else {
xenstored_ready = 1;
+#ifdef CONFIG_XEN
+ xen_store_evtchn = xen_start_info->store_evtchn;
+ xen_store_mfn = xen_start_info->store_mfn;
+ shared_xenstore_buf = mfn_to_virt(xen_store_mfn);
+#else
+ xen_store_evtchn = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN);
+ xen_store_mfn = hvm_get_parameter(HVM_PARAM_STORE_PFN);
+ shared_xenstore_buf = ioremap(xen_store_mfn << PAGE_SHIFT,
+ PAGE_SIZE);
+ xenbus_dev_init();
+#endif
+ }
/* Initialize the interface to xenstore. */
err = xs_init();
@@ -1035,8 +1055,10 @@ static int __init xenbus_probe_init(void
}
postcore_initcall(xenbus_probe_init);
-
-
+MODULE_LICENSE("Dual BSD/GPL");
+
+
+#ifndef MODULE
static int is_disconnected_device(struct device *dev, void *data)
{
struct xenbus_device *xendev = to_xenbus_device(dev);
@@ -1105,3 +1127,4 @@ static int __init wait_for_devices(void)
}
late_initcall(wait_for_devices);
+#endif
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/hypercall.h Mon Jul 17 23:38:59 2006 +0100
@@ -42,6 +42,7 @@
#define __STR(x) #x
#define STR(x) __STR(x)
+#ifdef CONFIG_XEN
#define _hypercall0(type, name) \
({ \
long __res; \
@@ -114,6 +115,92 @@
: "memory" ); \
(type)__res; \
})
+#else
+#define _hypercall0(type, name) \
+({ \
+ long __res; \
+ asm volatile ( \
+ "movl hypercall_page, %%eax\n" \
+ "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+ "call *%%eax" \
+ : "=a" (__res) \
+ : \
+ : "memory" ); \
+ (type)__res; \
+})
+
+#define _hypercall1(type, name, a1) \
+({ \
+ long __res, __ign1; \
+ asm volatile ( \
+ "movl hypercall_page, %%eax\n" \
+ "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+ "call *%%eax" \
+ : "=a" (__res), "=b" (__ign1) \
+ : "1" ((long)(a1)) \
+ : "memory" ); \
+ (type)__res; \
+})
+
+#define _hypercall2(type, name, a1, a2) \
+({ \
+ long __res, __ign1, __ign2; \
+ asm volatile ( \
+ "movl hypercall_page, %%eax\n" \
+ "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+ "call *%%eax" \
+ : "=a" (__res), "=b" (__ign1), "=c" (__ign2) \
+ : "1" ((long)(a1)), "2" ((long)(a2)) \
+ : "memory" ); \
+ (type)__res; \
+})
+
+#define _hypercall3(type, name, a1, a2, a3) \
+({ \
+ long __res, __ign1, __ign2, __ign3; \
+ asm volatile ( \
+ "movl hypercall_page, %%eax\n" \
+ "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+ "call *%%eax" \
+ : "=a" (__res), "=b" (__ign1), "=c" (__ign2), \
+ "=d" (__ign3) \
+ : "1" ((long)(a1)), "2" ((long)(a2)), \
+ "3" ((long)(a3)) \
+ : "memory" ); \
+ (type)__res; \
+})
+
+#define _hypercall4(type, name, a1, a2, a3, a4) \
+({ \
+ long __res, __ign1, __ign2, __ign3, __ign4; \
+ asm volatile ( \
+ "movl hypercall_page, %%eax\n" \
+ "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+ "call *%%eax" \
+ : "=a" (__res), "=b" (__ign1), "=c" (__ign2), \
+ "=d" (__ign3), "=S" (__ign4) \
+ : "1" ((long)(a1)), "2" ((long)(a2)), \
+ "3" ((long)(a3)), "4" ((long)(a4)) \
+ : "memory" ); \
+ (type)__res; \
+})
+
+#define _hypercall5(type, name, a1, a2, a3, a4, a5) \
+({ \
+ long __res, __ign1, __ign2, __ign3, __ign4, __ign5; \
+ asm volatile ( \
+ "movl hypercall_page, %%eax\n" \
+ "addl $"STR(__HYPERVISOR_##name)" * 32, %%eax\n"\
+ "call *%%eax" \
+ : "=a" (__res), "=b" (__ign1), "=c" (__ign2), \
+ "=d" (__ign3), "=S" (__ign4), "=D" (__ign5) \
+ : "1" ((long)(a1)), "2" ((long)(a2)), \
+ "3" ((long)(a3)), "4" ((long)(a4)), \
+ "5" ((long)(a5)) \
+ : "memory" ); \
+ (type)__res; \
+})
+#endif
static inline int
HYPERVISOR_set_trap_table(
@@ -354,6 +441,13 @@ HYPERVISOR_nmi_op(
return _hypercall2(int, nmi_op, op, arg);
}
+static inline unsigned long
+HYPERVISOR_hvm_op(
+ int op, void *arg)
+{
+ return _hypercall2(unsigned long, hvm_op, op, arg);
+}
+
static inline int
HYPERVISOR_callback_op(
int cmd, void *arg)
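
Note on the non-CONFIG_XEN _hypercallN macros above: each one loads the
hypercall_page pointer, adds the hypercall number times 32, and calls the
result. The address arithmetic can be sketched in plain C as below. This is
only an illustration; hypercall_stub_addr and the two constants are
hypothetical names, and the 4 KiB page size is an assumption taken from x86.

```c
#include <assert.h>
#include <stdint.h>

#define HYPERCALL_PAGE_SIZE 4096u /* assumed: one 4 KiB x86 page */
#define HYPERCALL_STUB_SIZE 32u   /* bytes per stub, the "* 32" in the asm */

/* Return the address of the stub for hypercall number nr (hypothetical helper). */
static inline uintptr_t hypercall_stub_addr(uintptr_t page_base, unsigned int nr)
{
    /* A 4 KiB page has room for 4096 / 32 = 128 stubs. */
    assert(nr < HYPERCALL_PAGE_SIZE / HYPERCALL_STUB_SIZE);
    return page_base + (uintptr_t)nr * HYPERCALL_STUB_SIZE;
}
```

Because the base is read from a variable at run time, a module can allocate
the page itself and have it filled in later, which is what evtchn-pci.c does
further down in this patch.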
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/page.h Mon Jul 17 23:38:59 2006 +0100
@@ -20,6 +20,7 @@
#include <xen/interface/xen.h>
#include <xen/features.h>
#include <xen/foreign_page.h>
+#include <asm/maddr.h>
#define arch_free_page(_page,_order) \
({ int foreign = PageForeign(_page); \
@@ -59,123 +60,6 @@
#define clear_user_page(page, vaddr, pg) clear_page(page)
#define copy_user_page(to, from, vaddr, pg) copy_page(to, from)
-
-/**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
-#define INVALID_P2M_ENTRY (~0UL)
-#define FOREIGN_FRAME_BIT (1UL<<31)
-#define FOREIGN_FRAME(m) ((m) | FOREIGN_FRAME_BIT)
-
-extern unsigned long *phys_to_machine_mapping;
-
-#undef machine_to_phys_mapping
-extern unsigned long *machine_to_phys_mapping;
-extern unsigned int machine_to_phys_order;
-
-static inline unsigned long pfn_to_mfn(unsigned long pfn)
-{
- if (xen_feature(XENFEAT_auto_translated_physmap))
- return pfn;
- return phys_to_machine_mapping[(unsigned int)(pfn)] &
- ~FOREIGN_FRAME_BIT;
-}
-
-static inline int phys_to_machine_mapping_valid(unsigned long pfn)
-{
- if (xen_feature(XENFEAT_auto_translated_physmap))
- return 1;
- return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
-}
-
-static inline unsigned long mfn_to_pfn(unsigned long mfn)
-{
- extern unsigned long max_mapnr;
- unsigned long pfn;
-
- if (xen_feature(XENFEAT_auto_translated_physmap))
- return mfn;
-
- if (unlikely((mfn >> machine_to_phys_order) != 0))
- return max_mapnr;
-
- /* The array access can fail (e.g., device space beyond end of RAM). */
- asm (
- "1: movl %1,%0\n"
- "2:\n"
- ".section .fixup,\"ax\"\n"
- "3: movl %2,%0\n"
- " jmp 2b\n"
- ".previous\n"
- ".section __ex_table,\"a\"\n"
- " .align 4\n"
- " .long 1b,3b\n"
- ".previous"
- : "=r" (pfn)
- : "m" (machine_to_phys_mapping[mfn]), "m" (max_mapnr) );
-
- return pfn;
-}
-
-/*
- * We detect special mappings in one of two ways:
- * 1. If the MFN is an I/O page then Xen will set the m2p entry
- * to be outside our maximum possible pseudophys range.
- * 2. If the MFN belongs to a different domain then we will certainly
- * not have MFN in our p2m table. Conversely, if the page is ours,
- * then we'll have p2m(m2p(MFN))==MFN.
- * If we detect a special mapping then it doesn't have a 'struct page'.
- * We force !pfn_valid() by returning an out-of-range pointer.
- *
- * NB. These checks require that, for any MFN that is not in our reservation,
- * there is no PFN such that p2m(PFN) == MFN. Otherwise we can get confused if
- * we are foreign-mapping the MFN, and the other domain as m2p(MFN) == PFN.
- * Yikes! Various places must poke in INVALID_P2M_ENTRY for safety.
- *
- * NB2. When deliberately mapping foreign pages into the p2m table, you *must*
- * use FOREIGN_FRAME(). This will cause pte_pfn() to choke on it, as we
- * require. In all the cases we care about, the FOREIGN_FRAME bit is
- * masked (e.g., pfn_to_mfn()) so behaviour there is correct.
- */
-static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
-{
- extern unsigned long max_mapnr;
- unsigned long pfn = mfn_to_pfn(mfn);
- if ((pfn < max_mapnr)
- && !xen_feature(XENFEAT_auto_translated_physmap)
- && (phys_to_machine_mapping[pfn] != mfn))
- return max_mapnr; /* force !pfn_valid() */
- return pfn;
-}
-
-static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
-{
- if (xen_feature(XENFEAT_auto_translated_physmap)) {
- BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
- return;
- }
- phys_to_machine_mapping[pfn] = mfn;
-}
-
-/* Definitions for machine and pseudophysical addresses. */
-#ifdef CONFIG_X86_PAE
-typedef unsigned long long paddr_t;
-typedef unsigned long long maddr_t;
-#else
-typedef unsigned long paddr_t;
-typedef unsigned long maddr_t;
-#endif
-
-static inline maddr_t phys_to_machine(paddr_t phys)
-{
- maddr_t machine = pfn_to_mfn(phys >> PAGE_SHIFT);
- machine = (machine << PAGE_SHIFT) | (phys & ~PAGE_MASK);
- return machine;
-}
-static inline paddr_t machine_to_phys(maddr_t machine)
-{
- paddr_t phys = mfn_to_pfn(machine >> PAGE_SHIFT);
- phys = (phys << PAGE_SHIFT) | (machine & ~PAGE_MASK);
- return phys;
-}
/*
* These are used to make use of C type-checking..
@@ -254,7 +138,6 @@ static inline unsigned long pgd_val(pgd_
#define pgprot_val(x) ((x).pgprot)
-#define __pte_ma(x) ((pte_t) { (x) } )
#define __pgprot(x) ((pgprot_t) { (x) } )
#endif /* !__ASSEMBLY__ */
@@ -323,11 +206,6 @@ extern int page_is_ram(unsigned long pag
((current->personality & READ_IMPLIES_EXEC) ? VM_EXEC : 0 ) | \
VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
-/* VIRT <-> MACHINE conversion */
-#define virt_to_machine(v) (phys_to_machine(__pa(v)))
-#define virt_to_mfn(v) (pfn_to_mfn(__pa(v) >> PAGE_SHIFT))
-#define mfn_to_virt(m) (__va(mfn_to_pfn(m) << PAGE_SHIFT))
-
#define __HAVE_ARCH_GATE_AREA 1
#endif /* __KERNEL__ */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-2level.h Mon Jul 17 23:38:59 2006 +0100
@@ -45,7 +45,6 @@
#define pte_none(x) (!(x).pte_low)
#define pfn_pte(pfn, prot) __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
-#define pfn_pte_ma(pfn, prot) __pte_ma(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define pfn_pmd(pfn, prot) __pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
/*
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h
--- a/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/pgtable-3level.h Mon Jul 17 23:38:59 2006 +0100
@@ -151,18 +151,6 @@ static inline int pte_none(pte_t pte)
extern unsigned long long __supported_pte_mask;
-static inline pte_t pfn_pte_ma(unsigned long page_nr, pgprot_t pgprot)
-{
- pte_t pte;
-
- pte.pte_high = (page_nr >> (32 - PAGE_SHIFT)) | \
- (pgprot_val(pgprot) >> 32);
- pte.pte_high &= (__supported_pte_mask >> 32);
- pte.pte_low = ((page_nr << PAGE_SHIFT) | pgprot_val(pgprot)) & \
- __supported_pte_mask;
- return pte;
-}
-
static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
{
return pfn_pte_ma(pfn_to_mfn(page_nr), pgprot);
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/xen/xenbus.h
--- a/linux-2.6-xen-sparse/include/xen/xenbus.h Mon Jul 17 23:34:46 2006 +0100
+++ b/linux-2.6-xen-sparse/include/xen/xenbus.h Mon Jul 17 23:38:59 2006 +0100
@@ -295,5 +295,6 @@ void xenbus_dev_fatal(struct xenbus_devi
void xenbus_dev_fatal(struct xenbus_device *dev, int err, const char *fmt,
...);
+int __init xenbus_dev_init(void);
#endif /* _XEN_XENBUS_H */
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/maddr.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/linux-2.6-xen-sparse/include/asm-i386/mach-xen/asm/maddr.h Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,153 @@
+#ifndef _I386_MADDR_H
+#define _I386_MADDR_H
+
+#include <xen/features.h>
+#include <xen/interface/arch-x86_32.h>
+#include <xen/interface/xen.h>
+
+/**** MACHINE <-> PHYSICAL CONVERSION MACROS ****/
+#define INVALID_P2M_ENTRY (~0UL)
+#define FOREIGN_FRAME_BIT (1UL<<31)
+#define FOREIGN_FRAME(m) ((m) | FOREIGN_FRAME_BIT)
+
+extern unsigned long *phys_to_machine_mapping;
+
+#undef machine_to_phys_mapping
+extern unsigned long *machine_to_phys_mapping;
+extern unsigned int machine_to_phys_order;
+
+static inline unsigned long pfn_to_mfn(unsigned long pfn)
+{
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return pfn;
+ return phys_to_machine_mapping[(unsigned int)(pfn)] &
+ ~FOREIGN_FRAME_BIT;
+}
+
+static inline int phys_to_machine_mapping_valid(unsigned long pfn)
+{
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return 1;
+ return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
+}
+
+static inline unsigned long mfn_to_pfn(unsigned long mfn)
+{
+#ifdef CONFIG_XEN
+ extern unsigned long max_mapnr;
+ unsigned long pfn;
+#endif
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return mfn;
+
+#ifndef CONFIG_XEN
+ BUG();
+#else
+ if (unlikely((mfn >> machine_to_phys_order) != 0))
+ return max_mapnr;
+
+ /* The array access can fail (e.g., device space beyond end of RAM). */
+ asm (
+ "1: movl %1,%0\n"
+ "2:\n"
+ ".section .fixup,\"ax\"\n"
+ "3: movl %2,%0\n"
+ " jmp 2b\n"
+ ".previous\n"
+ ".section __ex_table,\"a\"\n"
+ " .align 4\n"
+ " .long 1b,3b\n"
+ ".previous"
+ : "=r" (pfn)
+ : "m" (machine_to_phys_mapping[mfn]), "m" (max_mapnr) );
+
+ return pfn;
+#endif
+}
+
+/*
+ * We detect special mappings in one of two ways:
+ * 1. If the MFN is an I/O page then Xen will set the m2p entry
+ * to be outside our maximum possible pseudophys range.
+ * 2. If the MFN belongs to a different domain then we will certainly
+ * not have MFN in our p2m table. Conversely, if the page is ours,
+ * then we'll have p2m(m2p(MFN))==MFN.
+ * If we detect a special mapping then it doesn't have a 'struct page'.
+ * We force !pfn_valid() by returning an out-of-range pointer.
+ *
+ * NB. These checks require that, for any MFN that is not in our reservation,
+ * there is no PFN such that p2m(PFN) == MFN. Otherwise we can get confused if
+ * we are foreign-mapping the MFN, and the other domain has m2p(MFN) == PFN.
+ * Yikes! Various places must poke in INVALID_P2M_ENTRY for safety.
+ *
+ * NB2. When deliberately mapping foreign pages into the p2m table, you *must*
+ * use FOREIGN_FRAME(). This will cause pte_pfn() to choke on it, as we
+ * require. In all the cases we care about, the FOREIGN_FRAME bit is
+ * masked (e.g., pfn_to_mfn()) so behaviour there is correct.
+ */
+static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
+{
+ extern unsigned long max_mapnr;
+ unsigned long pfn = mfn_to_pfn(mfn);
+ if ((pfn < max_mapnr)
+ && !xen_feature(XENFEAT_auto_translated_physmap)
+ && (phys_to_machine_mapping[pfn] != mfn))
+ return max_mapnr; /* force !pfn_valid() */
+ return pfn;
+}
+
+static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
+{
+ if (xen_feature(XENFEAT_auto_translated_physmap)) {
+ BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
+ return;
+ }
+ phys_to_machine_mapping[pfn] = mfn;
+}
+
+/* Definitions for machine and pseudophysical addresses. */
+#ifdef CONFIG_X86_PAE
+typedef unsigned long long paddr_t;
+typedef unsigned long long maddr_t;
+#else
+typedef unsigned long paddr_t;
+typedef unsigned long maddr_t;
+#endif
+
+static inline maddr_t phys_to_machine(paddr_t phys)
+{
+ maddr_t machine = pfn_to_mfn(phys >> PAGE_SHIFT);
+ machine = (machine << PAGE_SHIFT) | (phys & ~PAGE_MASK);
+ return machine;
+}
+static inline paddr_t machine_to_phys(maddr_t machine)
+{
+ paddr_t phys = mfn_to_pfn(machine >> PAGE_SHIFT);
+ phys = (phys << PAGE_SHIFT) | (machine & ~PAGE_MASK);
+ return phys;
+}
+
+/* VIRT <-> MACHINE conversion */
+#define virt_to_machine(v) (phys_to_machine(__pa(v)))
+#define virt_to_mfn(v) (pfn_to_mfn(__pa(v) >> PAGE_SHIFT))
+#define mfn_to_virt(m) (__va(mfn_to_pfn(m) << PAGE_SHIFT))
+
+#ifdef CONFIG_X86_PAE
+static inline pte_t pfn_pte_ma(unsigned long page_nr, pgprot_t pgprot)
+{
+ pte_t pte;
+
+ pte.pte_high = (page_nr >> (32 - PAGE_SHIFT)) | \
+ (pgprot_val(pgprot) >> 32);
+ pte.pte_high &= (__supported_pte_mask >> 32);
+ pte.pte_low = ((page_nr << PAGE_SHIFT) | pgprot_val(pgprot)) & \
+ __supported_pte_mask;
+ return pte;
+}
+#else
+#define pfn_pte_ma(pfn, prot) __pte_ma(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
+#endif
+
+#define __pte_ma(x) ((pte_t) { (x) } )
+
+#endif /* _I386_MADDR_H */
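
Note on maddr.h above: phys_to_machine() translates only the frame number
and carries the in-page offset across unchanged, and pfn_to_mfn() masks off
FOREIGN_FRAME_BIT before returning. A minimal user-space sketch of the same
arithmetic follows. The toy_ names and the four-entry table are hypothetical,
made up purely for illustration; the real code reads phys_to_machine_mapping
and first checks XENFEAT_auto_translated_physmap.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE (1ul << TOY_PAGE_SHIFT)
#define TOY_FOREIGN_FRAME_BIT (1ul << 31)

/* Hypothetical 4-entry p2m table; entry 2 is marked as a foreign frame. */
static const unsigned long toy_p2m[4] = {
    0x100, 0x2a7, 0x055 | TOY_FOREIGN_FRAME_BIT, 0x3ff
};

/* Look up the machine frame and strip the foreign marker bit. */
static unsigned long toy_pfn_to_mfn(unsigned long pfn)
{
    return toy_p2m[pfn] & ~TOY_FOREIGN_FRAME_BIT;
}

/* Translate the frame number, keep the offset within the page. */
static uint64_t toy_phys_to_machine(uint64_t phys)
{
    uint64_t machine = toy_pfn_to_mfn((unsigned long)(phys >> TOY_PAGE_SHIFT));
    return (machine << TOY_PAGE_SHIFT) | (phys & (TOY_PAGE_SIZE - 1));
}
```

In the HVM build the auto-translated check short-circuits all of this and
pfn_to_mfn() returns its argument unchanged.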
diff -r 7053592c928b -r aa3087ee5769 linux-2.6-xen-sparse/include/xen/hvm.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/linux-2.6-xen-sparse/include/xen/hvm.h Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,17 @@
+/* Simple wrappers around HVM functions */
+#ifndef XEN_HVM_H__
+#define XEN_HVM_H__
+
+#include <xen/interface/hvm/params.h>
+#include <asm/hypercall.h>
+
+static inline unsigned long hvm_get_parameter(int idx)
+{
+ struct xen_hvm_param xhv;
+
+ xhv.domid = DOMID_SELF;
+ xhv.index = idx;
+ return HYPERVISOR_hvm_op(HVMOP_get_param, &xhv);
+}
+
+#endif /* XEN_HVM_H__ */
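
Note on hvm_get_parameter() above: the wrapper hands back the raw hypercall
result as an unsigned long, so a caller cannot tell a failure from a valid
parameter by type alone. One possible caller-side check, assuming failures
come back as small negative errno values (an assumption, not something this
patch states), is sketched below. hvm_result_is_error and TOY_MAX_ERRNO are
hypothetical names.

```c
/* Assumed upper bound on errno magnitudes, as in the kernel's IS_ERR idiom. */
#define TOY_MAX_ERRNO 4095ul

/*
 * Return 1 if res looks like a negative errno squeezed into an unsigned
 * long, 0 otherwise (hypothetical helper).
 */
static int hvm_result_is_error(unsigned long res)
{
    return res >= (unsigned long)-(long)TOY_MAX_ERRNO;
}
```

A caller in xenbus_probe_init() could apply such a check to the store event
channel and store PFN before passing them to ioremap().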
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/Makefile
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/Makefile Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,22 @@
+include $(M)/overrides.mk
+
+obj-$(CONFIG_XEN_EVTCHN_PCI) += evtchn-pci/
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += blkfront/
+obj-$(CONFIG_XEN_NETDEV_FRONTEND) += netfront/
+obj-m += xenbus/
+
+
+debug:
+ chmod +x compile.sh
+ chmod +x mkbuildtree
+ echo $(XEN_DRIVERS_ROOT)
+ echo $(EXTRA_CFLAGS)
+ ./compile.sh
+
+clean:
+ find . -name "*.o" |xargs rm -f
+ find . -name "*.ko" |xargs rm -f
+ find . -name "*.mod.c" |xargs rm -f
+ find . -name ".*.cmd" |xargs rm -f
+ rm .tmp_versions -rf
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/README
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/README Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,7 @@
+To build, run ./mkbuildtree and then
+
+make -C /path/to/kernel/source M=$PWD modules
+
+This produces four modules: xen-evtchn-pci.ko, xenbus.ko, xen-vbd.ko, and
+xen-vnif.ko. Load xen-evtchn-pci first, then xenbus, and then
+whichever of xen-vbd and xen-vnif you need.
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/blkfront/Kbuild
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/blkfront/Kbuild Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,6 @@
+include $(M)/overrides.mk
+
+obj-m += xen-vbd.o
+
+xen-vbd-objs := blkfront.o vbd.o
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/Kbuild
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/Kbuild Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,8 @@
+include $(M)/overrides.mk
+
+obj-m := xen-evtchn-pci.o
+
+EXTRA_CFLAGS += -I$(M)/evtchn-pci
+
+xen-evtchn-pci-objs := evtchn.o evtchn-pci.o gnttab.o xen_proc.o xen_support.o\
+ features.o
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/debuginfo.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/debuginfo.h Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,56 @@
+#ifndef __DEBUG_INFO__
+#define __DEBUG_INFO__
+//#define INSERT_TEST
+//#define VMX_DEBUG_INFO
+//#define KERNEL_DEBUG_INFO
+//#define FREQ_PRINT
+
+#define infotime(seconds, x, a...) \
+{ \
+static unsigned long prevjiffy = 0; \
+ if(time_after(jiffies, prevjiffy + (seconds)*HZ)) { \
+ prevjiffy = jiffies; \
+ vmx_printk(x, ##a); \
+ } \
+}
+
+#ifdef KERNEL_DEBUG_INFO
+#define dprintk(x, a...) \
+ printk("<vbd> " x, ##a)
+#define dprintknl(x, a...) \
+ printk(x, ##a)
+#define dprintkentry(x, a...) \
+ printk("<vbd-entry> " x "\n", ##a)
+#define dprintkexit(x, a...) \
+ printk("<vbd-exit> " x "\n", ##a)
+#ifdef FREQ_PRINT
+#define dprintkfreq(x, a...) \
+ printk("<vbd-freq> " x, ##a)
+#else
+#define dprintkfreq(x, a...)
+#endif
+#elif defined(VMX_DEBUG_INFO)
+#define dprintk(x, a...) \
+ vmx_printk("<vbd> " x, ##a)
+#define dprintknl(x, a...) \
+ vmx_printk(x, ##a)
+#define dprintkentry(x, a...) \
+ vmx_printk("<vbd-entry> " x "\n", ##a)
+#define dprintkexit(x, a...) \
+ vmx_printk("<vbd-exit> " x "\n", ##a)
+#ifdef FREQ_PRINT
+#define dprintkfreq(x, a...) \
+ vmx_printk("<vbd-freq> " x, ##a)
+#else
+#define dprintkfreq(x, a...)
+#endif
+
+#else
+#define dprintk(x, a...)
+#define dprintkentry(x, a...)
+#define dprintkexit(x, a...)
+#define dprintkfreq(x, a...)
+#define dprintknl(x, a...)
+#endif
+int vmx_printk(const char *fmt, ...);
+#endif
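
Note on the infotime() macro in debuginfo.h above: it rate-limits a message
to once per given number of seconds by keeping a static timestamp per call
site. A user-space sketch of the same pattern follows, wrapped in
do { } while (0) so that it behaves as a single statement after an if. All
toy_ names are hypothetical; toy_jiffies stands in for the kernel tick
counter and printf for vmx_printk.

```c
#include <stdio.h>

#define TOY_HZ 100ul

/* Wrap-safe "a is after b", the same idea as the kernel's time_after(). */
#define toy_time_after(a, b) ((long)((b) - (a)) < 0)

static unsigned long toy_jiffies; /* stand-in for the kernel tick counter */
static int toy_prints;            /* counts messages actually emitted */

/* Print at most once every `seconds` seconds per call site. */
#define toy_infotime(seconds, ...)                                    \
    do {                                                              \
        static unsigned long prev;                                    \
        if (toy_time_after(toy_jiffies, prev + (seconds) * TOY_HZ)) { \
            prev = toy_jiffies;                                       \
            toy_prints++;                                             \
            printf(__VA_ARGS__);                                      \
        }                                                             \
    } while (0)

/* Example call site: reports progress no more than once every 2 seconds. */
static void toy_poll(void)
{
    toy_infotime(2, "still polling at tick %lu\n", toy_jiffies);
}
```

Each expansion of the macro gets its own static timestamp, so separate call
sites are throttled independently.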
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.c
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.c Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,299 @@
+/******************************************************************************
+ * evtchn-pci.c
+ * xen event channel fake PCI device driver
+ * Copyright (C) 2005, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/version.h>
+#include <linux/interrupt.h>
+#include <asm/system.h>
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/uaccess.h>
+#include <asm/hypervisor.h>
+#include <xen/interface/memory.h>
+
+#include "evtchn-pci.h"
+
+#define DRV_NAME "xen-evtchn-pci"
+#define DRV_VERSION "0.10"
+#define DRV_RELDATE "03/03/2005"
+
+extern void *hypercall_page;
+
+static int callbackirq = 3; /* legacy mode irq */
+static int nopci = 0;
+static char version[] __devinitdata =
+ KERN_INFO DRV_NAME ":version " DRV_VERSION " " DRV_RELDATE
+ " Xiaofeng. Ling\n";
+
+MODULE_AUTHOR("xiaofeng.ling@intel.com");
+MODULE_DESCRIPTION("Xen evtchn PCI device");
+MODULE_LICENSE("GPL");
+
+MODULE_PARM(nopci, "i");
+MODULE_PARM(callbackirq, "i");
+MODULE_PARM_DESC(callbackirq, "callback irq number for xen event channel");
+
+#define XEN_EVTCHN_VENDOR_ID 0xfffd
+#define XEN_EVTCHN_DEVICE_ID 0x0101
+
+static struct pci_device_id evtchn_pci_tbl[] __devinitdata = {
+ {XEN_EVTCHN_VENDOR_ID, XEN_EVTCHN_DEVICE_ID,
+ PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+ {0,}
+};
+
+MODULE_DEVICE_TABLE(pci, evtchn_pci_tbl);
+
+unsigned long *phys_to_machine_mapping;
+EXPORT_SYMBOL(phys_to_machine_mapping);
+
+static int __init init_xen_info(void)
+{
+ unsigned long shared_info_frame;
+ struct xen_add_to_physmap xatp;
+
+ setup_xen_features();
+
+ shared_info_frame = alloc_xen_mmio(PAGE_SIZE) >> PAGE_SHIFT;
+ xatp.domid = DOMID_SELF;
+ xatp.idx = 0;
+ xatp.space = XENMAPSPACE_shared_info;
+ xatp.gpfn = shared_info_frame;
+ BUG_ON(HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp));
+ HYPERVISOR_shared_info =
+ ioremap(shared_info_frame << PAGE_SHIFT, PAGE_SIZE);
+
+ if (!HYPERVISOR_shared_info)
+ panic("can't map shared info\n");
+
+ dprintk("ioremap shared_info successful\n");
+
+ phys_to_machine_mapping = NULL;
+
+ gnttab_init();
+ evtchn_init();
+
+ return 0;
+}
+
+static void __devexit evtchn_pci_remove(struct pci_dev *pdev)
+{
+ long ioaddr, iolen;
+
+ /* If there is an I/O region, don't forget to release it. */
+ ioaddr = pci_resource_start(pdev, 0);
+ iolen = pci_resource_len(pdev, 0);
+ if (ioaddr != 0)
+ {
+ release_region(ioaddr, iolen);
+ }
+
+ pci_set_drvdata(pdev, NULL);
+ free_irq(pdev->irq, evtchn_interrupt);
+}
+
+extern irqreturn_t evtchn_interrupt(int irq, void *devid, struct pt_regs *regs);
+
+unsigned long evtchn_mmio = 0xc000000;
+unsigned long evtchn_mmio_alloc;
+unsigned long evtchn_mmiolen = 0x1000000;
+
+unsigned long alloc_xen_mmio(unsigned long len)
+{
+ unsigned long addr;
+
+ addr = 0;
+ if (evtchn_mmio_alloc + len <= evtchn_mmiolen)
+ {
+ addr = evtchn_mmio + evtchn_mmio_alloc;
+ evtchn_mmio_alloc += len;
+ } else {
+ panic("ran out of xen mmio space");
+ }
+ return addr;
+}
+
+static int __devinit evtchn_pci_init(struct pci_dev *pdev,
+ const struct pci_device_id *ent)
+{
+ int i, ret, irq;
+ long ioaddr, iolen;
+ long mmio_addr, mmio_len;
+
+ printk(KERN_INFO DRV_NAME ":found evtchn pci device model, do init\n");
+
+#ifndef MODULE
+ static int printed_version;
+ if (!printed_version++)
+ printk(version);
+#endif
+
+ i = pci_enable_device(pdev);
+ if (i)
+ return i;
+
+ ioaddr = pci_resource_start(pdev, 0);
+ iolen = pci_resource_len(pdev, 0);
+
+ mmio_addr = pci_resource_start(pdev, 1);
+ mmio_len = pci_resource_len(pdev, 1);
+
+ if (mmio_addr != 0)
+ {
+ if (request_mem_region(mmio_addr, mmio_len, DRV_NAME) == NULL)
+ {
+ printk(KERN_ERR DRV_NAME ":MEM I/O resource 0x%lx @ 0x%lx busy\n",
+ mmio_len, mmio_addr);
+ return -EBUSY;
+ }
+ evtchn_mmio = mmio_addr;
+ evtchn_mmiolen = mmio_len;
+ }
+ else
+ {
+ printk(KERN_WARNING DRV_NAME ":no MMIO found!\n");
+ }
+
+ irq = pdev->irq;
+ callbackirq = irq;
+
+ /*
+ * maybe some day we may use I/O port for checking status
+ * when sharing interrupts
+ */
+ if (ioaddr != 0)
+ {
+ if (request_region(ioaddr, iolen, DRV_NAME) == NULL)
+ {
+ printk(KERN_ERR DRV_NAME ":I/O resource 0x%lx @ 0x%lx busy\n",
+ iolen, ioaddr);
+ return -EBUSY;
+ }
+
+ hypercall_page = (void *)__get_free_page(GFP_KERNEL);
+ if (!hypercall_page)
+ panic("Cannot get hypercall page.\n");
+ memset(hypercall_page, 0xcc, PAGE_SIZE);
+ asm volatile("outl %%eax, %%dx\n"
+ :
+ : "a" (virt_to_phys(hypercall_page) >> PAGE_SHIFT),
+ "d" (ioaddr)
+ : "memory");
+ }
+ printk(KERN_INFO DRV_NAME ":use irq %d for event channel\n", irq);
+
+ if ((ret = request_irq(irq, evtchn_interrupt, SA_SHIRQ,
+ "xen-evtchn-pci", evtchn_interrupt))) {
+ goto out;
+ }
+
+ if ((ret = init_xen_info()))
+ goto out;
+
+ if ((ret = set_callback_irq(irq)))
+ goto out;
+
+ out:
+ if (ret && hypercall_page)
+ free_page((unsigned long)hypercall_page);
+ return ret;
+}
+
+static struct pci_driver evtchn_driver = {
+ name:DRV_NAME,
+ probe:evtchn_pci_init,
+ remove:__devexit_p(evtchn_pci_remove),
+ id_table:evtchn_pci_tbl,
+};
+
+int __init setup_xen_callback(void)
+{
+ int rc = 0;
+ /* legacy mode: request the callback irq directly, without the PCI device */
+
+ printk(KERN_INFO DRV_NAME ":legacy driver request irq :%d\n", callbackirq);
+ rc = request_irq(callbackirq, evtchn_interrupt, SA_SHIRQ,
+ "xen-evtchn", evtchn_interrupt);
+ if (rc != 0)
+ printk(KERN_ERR DRV_NAME ":request irq error:%d!\n", rc);
+ rc = set_callback_irq(callbackirq);
+ if (rc != 0)
+ printk(KERN_ERR DRV_NAME ":set call back irq error:%d!\n", rc);
+ return rc;
+}
+
+static int __init evtchn_pci_module_init(void)
+{
+ int rc;
+
+ printk(KERN_INFO DRV_NAME ":do xen module support init\n");
+
+/* when a module, this is printed whether or not devices are found in probe */
+#ifdef MODULE
+ printk(version);
+#endif
+
+ if (!nopci)
+ {
+ rc = pci_module_init(&evtchn_driver);
+ if (rc)
+ printk(KERN_INFO DRV_NAME ":No evtchn pci device model found, "
+ "use legacy mode\n");
+ }
+ else
+ {
+ printk(KERN_INFO DRV_NAME ":disable evtchn pci device model "
+ "by module arguments, use legacy mode\n");
+ rc = 1;
+ }
+
+ if (rc)
+ {
+ /* No PCI device, try legacy mode */
+ rc = init_xen_info();
+ if (rc)
+ return rc;
+ rc = setup_xen_callback();
+ if (rc)
+ printk(KERN_ERR DRV_NAME ":setup xen legacy callback fail\n");
+ }
+
+ return rc;
+}
+
+static void __exit evtchn_pci_module_cleanup(void)
+{
+ printk(KERN_INFO DRV_NAME ":Do evtchn module cleanup\n");
+ /* tell the hypervisor to stop raising the callback irq */
+ set_callback_irq(0);
+
+ free_irq(callbackirq, evtchn_interrupt);
+
+ /*TODO: unmap hypercall param share page */
+
+ pci_unregister_driver(&evtchn_driver);
+}
+
+module_init(evtchn_pci_module_init);
+module_exit(evtchn_pci_module_cleanup);
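Reviewer note (not part of the patch): alloc_xen_mmio() in evtchn-pci.c is a bump allocator over the device's MMIO window. The sketch below shows the same idea in user-space C under stated assumptions. The names mmio_window and mmio_window_alloc are hypothetical, and unlike the driver this version reports exhaustion by returning 0 instead of calling panic(), and compares against the remaining space so the addition cannot wrap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the evtchn_mmio, evtchn_mmiolen and
 * evtchn_mmio_alloc globals used by the driver. */
struct mmio_window {
    unsigned long base;   /* start of the window */
    unsigned long len;    /* total size of the window */
    unsigned long used;   /* bytes handed out so far */
};

/* Hand out the next len bytes of the window, or return 0 if they do not fit. */
unsigned long mmio_window_alloc(struct mmio_window *w, unsigned long len)
{
    unsigned long addr;

    assert(w != NULL);
    assert(w->used <= w->len);

    /* Compare against the remaining space so that used + len cannot wrap. */
    if (len > w->len - w->used)
        return 0;

    addr = w->base + w->used;
    w->used += len;
    return addr;
}
```

Nothing is ever freed, which matches the driver: the window is carved up once at start of day and each caller keeps its range for the lifetime of the module.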
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn-pci.h Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,58 @@
+/******************************************************************************
+ * evtchn-pci.h
+ * module driver support in unmodified Linux
+ * Copyright (C) 2004, Intel Corporation. <xiaofeng.ling@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#ifndef __XEN_SUPPORT_H
+#define __XEN_SUPPORT_H
+#include <linux/version.h>
+#include <asm/io.h>
+#include <xen/interface/hvm/params.h>
+
+#include "debuginfo.h"
+
+extern unsigned long *phys_to_machine_mapping;
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
+#else
+#define __user
+#endif
+
+static inline int set_callback_irq(int irq)
+{
+ struct xen_hvm_param a;
+
+ a.domid = DOMID_SELF;
+ a.index = HVM_PARAM_CALLBACK_IRQ;
+ a.value = irq;
+ return HYPERVISOR_hvm_op(HVMOP_set_param, &a);
+}
+
+#define L2_PAGETABLE_SHIFT 22
+unsigned long alloc_xen_mmio(unsigned long len);
+
+int gnttab_init(void);
+void evtchn_init(void);
+void ctrl_if_init(void);
+
+void xen_machphys_update(unsigned long mfn, unsigned long pfn);
+int xen_do_init(void);
+
+void setup_xen_features(void);
+
+#endif
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/evtchn.c
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/evtchn.c Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,200 @@
+/******************************************************************************
+ * evtchn.c
+ *
+ * A simplified event channel for para-drivers in unmodified linux
+ *
+ * Copyright (c) 2002-2005, K A Fraser
+ * Copyright (c) 2005, <xiaofeng.ling@intel.com>
+ *
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <xen/evtchn.h>
+#include <xen/interface/hvm/ioreq.h>
+#include "evtchn-pci.h"
+
+void *hypercall_page;
+
+#define cpu_from_evtchn(port) (0)
+#define MAX_EVTCHN 256
+static struct
+{
+ irqreturn_t(*handler) (int, void *, struct pt_regs *);
+ void *dev_id;
+} evtchns[MAX_EVTCHN];
+
+void mask_evtchn(int port)
+{
+ shared_info_t *s = HYPERVISOR_shared_info;
+ synch_set_bit(port, &s->evtchn_mask[0]);
+}
+EXPORT_SYMBOL(mask_evtchn);
+
+void unmask_evtchn(int port)
+{
+ shared_info_t *s = HYPERVISOR_shared_info;
+ unsigned int cpu = smp_processor_id();
+ vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];
+
+ /* Slow path (hypercall) if this is a non-local port. */
+ if (unlikely(cpu != cpu_from_evtchn(port))) {
+ evtchn_unmask_t op = { .port = port };
+ (void)HYPERVISOR_event_channel_op(EVTCHNOP_unmask,
+ &op);
+ return;
+ }
+
+ synch_clear_bit(port, &s->evtchn_mask[0]);
+
+ /*
+ * The following is basically the equivalent of 'hw_resend_irq'. Just
+ * like a real IO-APIC we 'lose the interrupt edge' if the channel is
+ * masked.
+ */
+ if (synch_test_bit(port, &s->evtchn_pending[0]) &&
+ !synch_test_and_set_bit(port / BITS_PER_LONG,
+ &vcpu_info->evtchn_pending_sel)) {
+ vcpu_info->evtchn_upcall_pending = 1;
+ if (!vcpu_info->evtchn_upcall_mask)
+ force_evtchn_callback();
+ }
+}
+EXPORT_SYMBOL(unmask_evtchn);
+
+unsigned int bind_virq_to_evtchn(int virq)
+{
+ evtchn_bind_virq_t op;
+
+ op.virq = virq;
+ op.vcpu = 0;
+ if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq, &op) != 0)
+ BUG();
+
+ return op.port;
+}
+
+int
+bind_evtchn_to_irqhandler(unsigned int evtchn,
+ irqreturn_t(*handler) (int, void *,
+ struct pt_regs *),
+ unsigned long irqflags, const char *devname,
+ void *dev_id)
+{
+ if (evtchn >= MAX_EVTCHN)
+ return -EINVAL;
+ evtchns[evtchn].handler = handler;
+ evtchns[evtchn].dev_id = dev_id;
+ unmask_evtchn(evtchn);
+ return evtchn;
+}
+
+EXPORT_SYMBOL(bind_evtchn_to_irqhandler);
+
+void unbind_from_irqhandler(unsigned int evtchn, void *dev_id)
+{
+ if (evtchn >= MAX_EVTCHN)
+ return;
+
+ mask_evtchn(evtchn);
+ evtchns[evtchn].handler = NULL;
+}
+
+EXPORT_SYMBOL(unbind_from_irqhandler);
+
+void notify_remote_via_irq(int irq)
+{
+ int evtchn = irq;
+ notify_remote_via_evtchn(evtchn);
+}
+
+EXPORT_SYMBOL(notify_remote_via_irq);
+
+void unbind_evtchn_from_irq(unsigned int evtchn)
+{
+ return;
+}
+
+EXPORT_SYMBOL(unbind_evtchn_from_irq);
+
+#define active_evtchns(cpu,sh,idx) \
+ ((sh)->evtchn_pending[idx] & \
+ ~(sh)->evtchn_mask[idx])
+
+irqreturn_t evtchn_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+{
+ unsigned long l1, l2;
+ unsigned int l1i, l2i, port;
+ int cpu = smp_processor_id();
+ irqreturn_t(*handler) (int, void *, struct pt_regs *);
+ shared_info_t *s = HYPERVISOR_shared_info;
+ vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];
+
+ vcpu_info->evtchn_upcall_pending = 0;
+
+ /* NB. No need for a barrier here -- XCHG is a barrier on x86. */
+ l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
+ while (l1 != 0)
+ {
+ l1i = __ffs(l1);
+ l1 &= ~(1UL << l1i);
+
+ while ((l2 = active_evtchns(cpu, s, l1i)) != 0)
+ {
+ l2i = __ffs(l2);
+
+ port = (l1i * BITS_PER_LONG) + l2i;
+
+ if (port < MAX_EVTCHN && (handler = evtchns[port].handler) != NULL)
+ {
+ clear_evtchn(port);
+ handler(port, evtchns[port].dev_id, regs);
+ }
+ else
+ {
+ evtchn_device_upcall(port);
+ }
+ }
+ }
+
+ return IRQ_HANDLED;
+}
+
+void force_evtchn_callback(void)
+{
+ evtchn_interrupt(0, NULL, NULL);
+}
+
+EXPORT_SYMBOL(force_evtchn_callback);
+
+void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu)
+{
+}
+
+void __init evtchn_init(void)
+{
+
+}
+
+EXPORT_SYMBOL(hypercall_page);
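Reviewer note (not part of the patch): evtchn_interrupt() in evtchn.c walks a two-level bitmap. A per-vcpu selector word says which words of the pending array may have work, and each of those words is then scanned for pending, unmasked ports. The following is a minimal user-space sketch of that scan, not the driver code. The struct evtchn_state layout and the names lowest_bit and scan_pending are hypothetical, and the selector is read and cleared non-atomically here, where the driver uses xchg().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified stand-in for the shared_info/vcpu_info fields. */
struct evtchn_state {
    unsigned long pending_sel;          /* bit i set: pending[i] may have work */
    unsigned long pending[WORD_BITS];   /* one bit per port */
    unsigned long mask[WORD_BITS];      /* set bit: port is masked */
};

/* Index of the lowest set bit; word must be non-zero. */
static unsigned int lowest_bit(unsigned long word)
{
    unsigned int i = 0;

    assert(word != 0);
    while (!(word & 1UL)) {
        word >>= 1;
        i++;
    }
    return i;
}

/* Collect up to max pending, unmasked ports into out[], clearing each one.
 * Returns the number of ports written. */
size_t scan_pending(struct evtchn_state *s, unsigned int *out, size_t max)
{
    size_t n = 0;
    unsigned long l1, l2;

    assert(s != NULL && out != NULL);

    l1 = s->pending_sel;
    s->pending_sel = 0;

    while (l1 != 0) {
        unsigned int l1i = lowest_bit(l1);

        while ((l2 = s->pending[l1i] & ~s->mask[l1i]) != 0) {
            unsigned int l2i = lowest_bit(l2);

            if (n == max) {
                /* Out of room: put back the selector bits not yet drained. */
                s->pending_sel |= l1;
                return n;
            }
            s->pending[l1i] &= ~(1UL << l2i);
            out[n++] = (unsigned int)(l1i * WORD_BITS + l2i);
        }
        /* 1UL, not 1: the shift must be done at the width of the word. */
        l1 &= ~(1UL << l1i);
    }
    return n;
}
```

A masked port stays pending after the scan, which is why unmask_evtchn() has to re-check the pending bit and force a callback.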
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/evtchn-pci/xen_support.c
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/evtchn-pci/xen_support.c Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,53 @@
+/******************************************************************************
+ * xen_support.c
+ * Xen module support functions.
+ * Copyright (C) 2004, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <xen/evtchn.h>
+#include <xen/interface/xen.h>
+#include <asm/hypervisor.h>
+#include "evtchn-pci.h"
+
+shared_info_t *HYPERVISOR_shared_info = NULL;
+EXPORT_SYMBOL(HYPERVISOR_shared_info);
+
+EXPORT_SYMBOL(xen_machphys_update);
+void xen_machphys_update(unsigned long mfn, unsigned long pfn)
+{
+ mmu_update_t u;
+ u.ptr = (mfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
+ u.val = pfn;
+ BUG_ON(HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0);
+}
+
+void balloon_update_driver_allowance(long delta)
+{
+}
+
+EXPORT_SYMBOL(balloon_update_driver_allowance);
+
+void evtchn_device_upcall(int port)
+{
+ printk(KERN_ERR "Error, no device upcall in guest domain (%d)!\n", port);
+ clear_evtchn(port);
+}
+
+EXPORT_SYMBOL (evtchn_device_upcall);
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/mkbuildtree
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/mkbuildtree Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,35 @@
+#! /bin/sh
+
+C=$PWD
+
+XEN=$C/../../xen
+XL=$C/../../linux-2.6-xen-sparse
+
+for d in $(find ${XL}/drivers/xen/ -maxdepth 1 -type d | sed -e 1d); do
+ if ! echo $d | egrep -q back; then
+ lndir $d $(basename $d) > /dev/null 2>&1
+ fi
+done
+
+ln -sf ${XL}/drivers/xen/net_driver_util.c netfront
+
+ln -sf ${XL}/drivers/xen/core/gnttab.c evtchn-pci
+ln -sf ${XL}/drivers/xen/core/features.c evtchn-pci
+ln -sf ${XL}/drivers/xen/core/xen_proc.c evtchn-pci
+
+mkdir -p include
+mkdir -p include/xen
+mkdir -p include/public
+mkdir -p include/asm
+
+lndir -silent ${XL}/include/xen include/xen
+ln -sf ${XEN}/include/public include/xen/interface
+
+# Need to be quite careful here: we don't want the files we link in to
+# risk overriding the native Linux ones (in particular, system.h must
+# be native and not xenolinux).
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/hypervisor.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/hypercall.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/synch_bitops.h include/asm
+ln -sf ${XL}/include/asm-i386/mach-xen/asm/maddr.h include/asm
+
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/netfront/Kbuild
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/netfront/Kbuild Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,4 @@
+include $(M)/overrides.mk
+
+obj-m = xen-vnif.o
+xen-vnif-objs := netfront.o
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/overrides.mk
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/overrides.mk Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,16 @@
+# Hack: we need to use the config which was used to build the kernel,
+# except that that won't have the right headers etc., so duplicate
+# some of the mach-xen infrastructure in here.
+#
+# (i.e. we need the native config for things like -mregparm, but
+# a Xen kernel to find the right headers)
+CONFIG_X86_XEN=y
+CONFIG_XEN_EVTCHN_PCI = m
+CONFIG_XEN_BLKDEV_FRONTEND = m
+CONFIG_XEN_NETDEV_FRONTEND = m
+EXTRA_CFLAGS += -DCONFIG_VMX -DCONFIG_VMX_GUEST -DCONFIG_X86_XEN
+EXTRA_CFLAGS += -DCONFIG_XEN_SHADOW_MODE -DCONFIG_XEN_SHADOW_TRANSLATE
+EXTRA_CFLAGS += -DCONFIG_XEN_BLKDEV_GRANT -DXEN_EVTCHN_MASK_OPS
+EXTRA_CFLAGS += -DCONFIG_XEN_NETDEV_GRANT_RX -DCONFIG_XEN_NETDEV_GRANT_TX
+EXTRA_CFLAGS += -D__XEN_INTERFACE_VERSION__=0x00030202
+EXTRA_CFLAGS += -I$(M)/include
diff -r 7053592c928b -r aa3087ee5769 unmodified_drivers/linux-2.6/xenbus/Kbuild
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/unmodified_drivers/linux-2.6/xenbus/Kbuild Mon Jul 17 23:38:59 2006 +0100
@@ -0,0 +1,9 @@
+include $(M)/overrides.mk
+
+obj-m += xenbus.o
+xenbus-objs =
+xenbus-objs += xenbus_comms.o
+xenbus-objs += xenbus_xs.o
+xenbus-objs += xenbus_probe.o
+xenbus-objs += xenbus_dev.o
+xenbus-objs += xenbus_client.o
[-- Attachment #1.1.4: hvm_xen_unstable.diff --]
[-- Type: text/plain, Size: 76134 bytes --]
diff -r ecb8ff1fcf1f linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c
--- a/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c Fri Jul 14 18:53:27 2006 +0100
+++ b/linux-2.6-xen-sparse/drivers/xen/privcmd/privcmd.c Tue Jul 18 13:43:27 2006 +0100
@@ -270,6 +270,7 @@ static int __init privcmd_init(void)
set_bit(__HYPERVISOR_sched_op_compat, hypercall_permission_map);
set_bit(__HYPERVISOR_event_channel_op_compat,
hypercall_permission_map);
+ set_bit(__HYPERVISOR_hvm_op, hypercall_permission_map);
privcmd_intf = create_xen_proc_entry("privcmd", 0400);
if (privcmd_intf != NULL)
diff -r ecb8ff1fcf1f tools/firmware/hvmloader/hvmloader.c
--- a/tools/firmware/hvmloader/hvmloader.c Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/firmware/hvmloader/hvmloader.c Tue Jul 18 13:43:27 2006 +0100
@@ -31,7 +31,7 @@
#define ROMBIOS_PHYSICAL_ADDRESS 0x000F0000
/* invoke SVM's paged realmode support */
-#define SVM_VMMCALL_RESET_TO_REALMODE 0x00000001
+#define SVM_VMMCALL_RESET_TO_REALMODE 0x80000001
/*
* C runtime start off
@@ -133,15 +133,15 @@ cirrus_check(void)
return inb(0x3C5) == 0x12;
}
-int
-vmmcall(int edi, int esi, int edx, int ecx, int ebx)
+int
+vmmcall(int function, int edi, int esi, int edx, int ecx, int ebx)
{
int eax;
__asm__ __volatile__(
".byte 0x0F,0x01,0xD9"
: "=a" (eax)
- : "a"(0x58454E00), /* XEN\0 key */
+ : "a"(function),
"b"(ebx), "c"(ecx), "d"(edx), "D"(edi), "S"(esi)
);
return eax;
@@ -200,7 +200,7 @@ main(void)
if (check_amd()) {
/* AMD implies this is SVM */
puts("SVM go ...\n");
- vmmcall(SVM_VMMCALL_RESET_TO_REALMODE, 0, 0, 0, 0);
+ vmmcall(SVM_VMMCALL_RESET_TO_REALMODE, 0, 0, 0, 0, 0);
} else {
puts("Loading VMXAssist ...\n");
memcpy((void *)VMXASSIST_PHYSICAL_ADDRESS,
diff -r ecb8ff1fcf1f tools/ioemu/Makefile.target
--- a/tools/ioemu/Makefile.target Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/Makefile.target Tue Jul 18 13:43:27 2006 +0100
@@ -336,6 +336,7 @@ VL_OBJS+= fdc.o mc146818rtc.o serial.o p
VL_OBJS+= fdc.o mc146818rtc.o serial.o pc.o
VL_OBJS+= cirrus_vga.o mixeng.o parallel.o
VL_OBJS+= piix4acpi.o
+VL_OBJS+= xen_evtchn.o
DEFINES += -DHAS_AUDIO
endif
ifeq ($(TARGET_BASE_ARCH), ppc)
diff -r ecb8ff1fcf1f tools/ioemu/hw/pc.c
--- a/tools/ioemu/hw/pc.c Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/hw/pc.c Tue Jul 18 13:43:27 2006 +0100
@@ -819,6 +819,9 @@ static void pc_init1(uint64_t ram_size,
}
#endif /* !CONFIG_DM */
+ if (pci_enabled)
+ pci_xen_evtchn_init(pci_bus);
+
for(i = 0; i < MAX_SERIAL_PORTS; i++) {
if (serial_hds[i]) {
serial_init(&pic_set_irq_new, isa_pic,
diff -r ecb8ff1fcf1f tools/ioemu/target-i386-dm/helper2.c
--- a/tools/ioemu/target-i386-dm/helper2.c Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/ioemu/target-i386-dm/helper2.c Tue Jul 18 13:43:27 2006 +0100
@@ -82,6 +82,10 @@ int xce_handle = -1;
/* which vcpu we are serving */
int send_vcpu = 0;
+/* Local event channel ports polled for notifications, one per vcpu. */
+#define NR_CPUS 32
+evtchn_port_t ioreq_local_port[NR_CPUS];
+
CPUX86State *cpu_x86_init(void)
{
CPUX86State *env;
@@ -105,15 +109,14 @@ CPUX86State *cpu_x86_init(void)
return NULL;
}
- /* FIXME: how about if we overflow the page here? */
for (i = 0; i < vcpus; i++) {
- rc = xc_evtchn_bind_interdomain(
- xce_handle, domid, shared_page->vcpu_iodata[i].vp_eport);
+ rc = xc_evtchn_bind_interdomain(xce_handle, DOMID_XEN,
+ shared_page->vcpu_iodata[i].vp_xen_port);
if (rc == -1) {
fprintf(logfile, "bind interdomain ioctl error %d\n", errno);
return NULL;
}
- shared_page->vcpu_iodata[i].dm_eport = rc;
+ ioreq_local_port[i] = rc;
}
}
@@ -184,10 +187,9 @@ void sp_info()
for (i = 0; i < vcpus; i++) {
req = &(shared_page->vcpu_iodata[i].vp_ioreq);
- term_printf("vcpu %d: event port %d\n", i,
- shared_page->vcpu_iodata[i].vp_eport);
+ term_printf("vcpu %d: event port %d\n", i, ioreq_local_port[i]);
term_printf(" req state: %x, pvalid: %x, addr: %"PRIx64", "
- "data: %"PRIx64", count: %"PRIx64", size: %"PRIx64"\n",
+ "data: %"PRIx64", count: %"PRIx64", size: %"PRIx64"\n",
req->state, req->pdata_valid, req->addr,
req->u.data, req->count, req->size);
term_printf(" IO totally occurred on this vcpu: %"PRIx64"\n",
@@ -201,17 +203,12 @@ static ioreq_t *__cpu_get_ioreq(int vcpu
ioreq_t *req;
req = &(shared_page->vcpu_iodata[vcpu].vp_ioreq);
-
if (req->state == STATE_IOREQ_READY) {
- req->state = STATE_IOREQ_INPROCESS;
- return req;
- }
-
- fprintf(logfile, "False I/O request ... in-service already: "
- "%x, pvalid: %x, port: %"PRIx64", "
- "data: %"PRIx64", count: %"PRIx64", size: %"PRIx64"\n",
- req->state, req->pdata_valid, req->addr,
- req->u.data, req->count, req->size);
+ req->state = STATE_IOREQ_INPROCESS;
+ rmb();
+ return req;
+ }
+
return NULL;
}
@@ -226,7 +223,7 @@ static ioreq_t *cpu_get_ioreq(void)
port = xc_evtchn_pending(xce_handle);
if (port != -1) {
for ( i = 0; i < vcpus; i++ )
- if ( shared_page->vcpu_iodata[i].dm_eport == port )
+ if ( ioreq_local_port[i] == port )
break;
if ( i == vcpus ) {
@@ -447,8 +444,10 @@ void cpu_handle_ioreq(void *opaque)
}
/* No state change if state = STATE_IORESP_HOOK */
- if (req->state == STATE_IOREQ_INPROCESS)
+ if (req->state == STATE_IOREQ_INPROCESS) {
+ mb();
req->state = STATE_IORESP_READY;
+ }
env->send_event = 1;
}
}
@@ -479,8 +478,7 @@ int main_loop(void)
if (env->send_event) {
env->send_event = 0;
- xc_evtchn_notify(xce_handle,
- shared_page->vcpu_iodata[send_vcpu].dm_eport);
+ (void)xc_evtchn_notify(xce_handle, ioreq_local_port[send_vcpu]);
}
}
destroy_hvm_domain();
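Reviewer note (not part of the patch): the helper2.c changes add rmb() after claiming a request and mb() before publishing the response, so that the request payload is not read before the state change is observed and the result is not published before it is written. The sketch below illustrates that ordering with C11 acquire/release atomics, which is a swapped-in technique and not what the patch uses. All names here (ioreq_slot, slot_submit, slot_claim, slot_complete, the REQ_/RESP_ constants) are hypothetical, and the claim step uses a compare-and-swap where the patch does a plain test followed by a store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical state values; the real ones live in the ioreq interface header. */
enum { REQ_INVALID, REQ_READY, REQ_INPROCESS, RESP_READY };

struct ioreq_slot {
    _Atomic int state;
    uint64_t data;      /* request payload, later overwritten by the result */
};

/* Requester: write the payload, then publish it with a release store. */
void slot_submit(struct ioreq_slot *s, uint64_t data)
{
    s->data = data;
    atomic_store_explicit(&s->state, REQ_READY, memory_order_release);
}

/* Device model: claim a ready request. The acquire ordering ensures the
 * payload is read only after the READY state has been observed.
 * Returns 1 if a request was claimed, 0 otherwise. */
int slot_claim(struct ioreq_slot *s)
{
    int expected = REQ_READY;

    return atomic_compare_exchange_strong_explicit(
        &s->state, &expected, REQ_INPROCESS,
        memory_order_acquire, memory_order_relaxed);
}

/* Device model: store the result, then publish the response. */
void slot_complete(struct ioreq_slot *s, uint64_t result)
{
    assert(atomic_load_explicit(&s->state, memory_order_relaxed)
           == REQ_INPROCESS);
    s->data = result;
    atomic_store_explicit(&s->state, RESP_READY, memory_order_release);
}
```

A slot that is not in the READY state is simply not claimed, which mirrors the patch dropping the "False I/O request" log message and returning NULL quietly.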
diff -r ecb8ff1fcf1f tools/libxc/xc_hvm_build.c
--- a/tools/libxc/xc_hvm_build.c Fri Jul 14 18:53:27 2006 +0100
+++ b/tools/libxc/xc_hvm_build.c Tue Jul 18 13:43:27 2006 +0100
@@ -6,12 +6,14 @@
#include <stddef.h>
#include <inttypes.h>
#include "xg_private.h"
+#include "xc_private.h"
#include "xc_elf.h"
#include <stdlib.h>
#include <unistd.h>
#include <zlib.h>
#include <xen/hvm/hvm_info_table.h>
#include <xen/hvm/ioreq.h>
+#include <xen/hvm/params.h>
#define HVM_LOADER_ENTR_ADDR 0x00100000
@@ -52,6 +54,30 @@ loadelfimage(
char *elfbase, int xch, uint32_t dom, unsigned long *parray,
struct domain_setup_info *dsi);
+static void xc_set_hvm_param(int handle,
+ domid_t dom, int param, unsigned long value)
+{
+ DECLARE_HYPERCALL;
+ xen_hvm_param_t arg;
+ int rc;
+
+ hypercall.op = __HYPERVISOR_hvm_op;
+ hypercall.arg[0] = HVMOP_set_param;
+ hypercall.arg[1] = (unsigned long)&arg;
+ arg.domid = dom;
+ arg.index = param;
+ arg.value = value;
+ if ( mlock(&arg, sizeof(arg)) != 0 )
+ {
+ PERROR("Could not lock memory for set parameter");
+ return;
+ }
+ rc = do_xen_hypercall(handle, &hypercall);
+ safe_munlock(&arg, sizeof(arg));
+ if (rc < 0)
+ PERROR("set HVM parameter failed (%d)", rc);
+}
+
static unsigned char build_e820map(void *e820_page, unsigned long long mem_size)
{
struct e820entry *e820entry =
@@ -162,6 +188,8 @@ static int set_hvm_info(int xc_handle, u
set_hvm_info_checksum(va_hvm);
munmap(va_map, PAGE_SIZE);
+
+ xc_set_hvm_param(xc_handle, dom, HVM_PARAM_APIC_ENABLED, apic);
return 0;
}
@@ -275,27 +303,17 @@ static int setup_guest(int xc_handle,
shared_info->vcpu_info[i].evtchn_upcall_mask = 1;
munmap(shared_info, PAGE_SIZE);
- /* Populate the event channel port in the shared page */
+ /* Paranoia */
shared_page_frame = page_array[(v_end >> PAGE_SHIFT) - 1];
if ( (sp = (shared_iopage_t *) xc_map_foreign_range(
xc_handle, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
shared_page_frame)) == 0 )
goto error_out;
memset(sp, 0, PAGE_SIZE);
-
- /* FIXME: how about if we overflow the page here? */
- for ( i = 0; i < vcpus; i++ ) {
- unsigned int vp_eport;
-
- vp_eport = xc_evtchn_alloc_unbound(xc_handle, dom, 0);
- if ( vp_eport < 0 ) {
- PERROR("Couldn't get unbound port from VMX guest.\n");
- goto error_out;
- }
- sp->vcpu_iodata[i].vp_eport = vp_eport;
- }
-
munmap(sp, PAGE_SIZE);
+
+ xc_set_hvm_param(xc_handle, dom, HVM_PARAM_STORE_PFN, (v_end >> PAGE_SHIFT) - 2);
+ xc_set_hvm_param(xc_handle, dom, HVM_PARAM_STORE_EVTCHN, store_evtchn);
*store_mfn = page_array[(v_end >> PAGE_SHIFT) - 2];
if ( xc_clear_domain_page(xc_handle, dom, *store_mfn) )
diff -r ecb8ff1fcf1f xen/arch/x86/dom0_ops.c
--- a/xen/arch/x86/dom0_ops.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/dom0_ops.c Tue Jul 18 13:43:27 2006 +0100
@@ -429,7 +429,7 @@ long arch_do_dom0_op(struct dom0_op *op,
ret = 0;
hypercall_page = map_domain_page(mfn);
- hypercall_page_initialise(hypercall_page);
+ hypercall_page_initialise(d, hypercall_page);
unmap_domain_page(hypercall_page);
put_page_and_type(mfn_to_page(mfn));
diff -r ecb8ff1fcf1f xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/domain.c Tue Jul 18 13:43:27 2006 +0100
@@ -819,7 +819,7 @@ unsigned long hypercall_create_continuat
#if defined(__i386__)
regs->eax = op;
- if ( supervisor_mode_kernel )
+ if ( supervisor_mode_kernel || hvm_guest(current) )
regs->eip &= ~31; /* re-execute entire hypercall entry stub */
else
regs->eip -= 2; /* re-execute 'int 0x82' */
diff -r ecb8ff1fcf1f xen/arch/x86/domain_build.c
--- a/xen/arch/x86/domain_build.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/domain_build.c Tue Jul 18 13:43:27 2006 +0100
@@ -704,7 +704,7 @@ int construct_dom0(struct domain *d,
return -1;
}
- hypercall_page_initialise((void *)hypercall_page);
+ hypercall_page_initialise(d, (void *)hypercall_page);
}
/* Copy the initial ramdisk. */
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/hvm.c Tue Jul 18 13:43:27 2006 +0100
@@ -45,6 +45,9 @@
#include <public/sched.h>
#include <public/hvm/ioreq.h>
#include <public/hvm/hvm_info_table.h>
+#include <xen/event.h>
+#include <xen/hypercall.h>
+#include <xen/guest_access.h>
int hvm_enabled = 0;
@@ -58,6 +61,8 @@ static void hvm_zap_mmio_range(
{
unsigned long i, val = INVALID_MFN;
+ ASSERT(d == current->domain);
+
for ( i = 0; i < nr_pfn; i++ )
{
if ( pfn + i >= 0xfffff )
@@ -67,18 +72,27 @@ static void hvm_zap_mmio_range(
}
}
-static void hvm_map_io_shared_page(struct domain *d)
+static void e820_zap_iommu_callback(struct domain *d,
+ struct e820entry *e,
+ void *ign)
+{
+ if ( e->type == E820_IO )
+ hvm_zap_mmio_range(d, e->addr >> PAGE_SHIFT, e->size >> PAGE_SHIFT);
+}
+
+static void e820_foreach(struct domain *d,
+ void (*cb)(struct domain *d,
+ struct e820entry *e,
+ void *data),
+ void *data)
{
int i;
unsigned char e820_map_nr;
struct e820entry *e820entry;
unsigned char *p;
unsigned long mfn;
- unsigned long gpfn = 0;
-
- local_flush_tlb_pge();
-
- mfn = get_mfn_from_gpfn(E820_MAP_PAGE >> PAGE_SHIFT);
+
+ mfn = gmfn_to_mfn(d, E820_MAP_PAGE >> PAGE_SHIFT);
if (mfn == INVALID_MFN) {
printk("Can not find E820 memory map page for HVM domain.\n");
domain_crash_synchronous();
@@ -95,26 +109,40 @@ static void hvm_map_io_shared_page(struc
for ( i = 0; i < e820_map_nr; i++ )
{
- if ( e820entry[i].type == E820_SHARED_PAGE )
- gpfn = (e820entry[i].addr >> PAGE_SHIFT);
- if ( e820entry[i].type == E820_IO )
- hvm_zap_mmio_range(
- d,
- e820entry[i].addr >> PAGE_SHIFT,
- e820entry[i].size >> PAGE_SHIFT);
- }
-
- if ( gpfn == 0 ) {
- printk("Can not get io request shared page"
- " from E820 memory map for HVM domain.\n");
- unmap_domain_page(p);
- domain_crash_synchronous();
- }
+ cb(d, e820entry + i, data);
+ }
+
unmap_domain_page(p);
-
- /* Initialise shared page */
- mfn = get_mfn_from_gpfn(gpfn);
- if (mfn == INVALID_MFN) {
+}
+
+static void hvm_zap_iommu_pages(struct domain *d)
+{
+ e820_foreach(d, e820_zap_iommu_callback, NULL);
+}
+
+static void e820_map_io_shared_callback(struct domain *d,
+ struct e820entry *e,
+ void *data)
+{
+ unsigned long *mfn = data;
+ if ( e->type == E820_SHARED_PAGE ) {
+ ASSERT(*mfn == INVALID_MFN);
+ *mfn = gmfn_to_mfn(d, e->addr >> PAGE_SHIFT);
+ }
+}
+
+void hvm_map_io_shared_page(struct vcpu *v)
+{
+ unsigned long mfn = INVALID_MFN;
+ void *p;
+ struct domain *d = v->domain;
+
+ if ( d->arch.hvm_domain.shared_page_va )
+ return;
+
+ e820_foreach(d, e820_map_io_shared_callback, &mfn);
+
+ if ( mfn == INVALID_MFN ) {
printk("Can not find io request shared page for HVM domain.\n");
domain_crash_synchronous();
}
@@ -127,59 +155,20 @@ static void hvm_map_io_shared_page(struc
d->arch.hvm_domain.shared_page_va = (unsigned long)p;
}
-static int validate_hvm_info(struct hvm_info_table *t)
-{
- char signature[] = "HVM INFO";
- uint8_t *ptr = (uint8_t *)t;
- uint8_t sum = 0;
- int i;
-
- /* strncmp(t->signature, "HVM INFO", 8) */
- for ( i = 0; i < 8; i++ ) {
- if ( signature[i] != t->signature[i] ) {
- printk("Bad hvm info signature\n");
- return 0;
- }
- }
-
- for ( i = 0; i < t->length; i++ )
- sum += ptr[i];
-
- return (sum == 0);
-}
-
-static void hvm_get_info(struct domain *d)
-{
- unsigned char *p;
- unsigned long mfn;
- struct hvm_info_table *t;
-
- mfn = get_mfn_from_gpfn(HVM_INFO_PFN);
- if ( mfn == INVALID_MFN ) {
- printk("Can not get info page mfn for HVM domain.\n");
- domain_crash_synchronous();
- }
-
- p = map_domain_page(mfn);
- if ( p == NULL ) {
- printk("Can not map info page for HVM domain.\n");
- domain_crash_synchronous();
- }
-
- t = (struct hvm_info_table *)(p + HVM_INFO_OFFSET);
-
- if ( validate_hvm_info(t) ) {
- d->arch.hvm_domain.nr_vcpus = t->nr_vcpus;
- d->arch.hvm_domain.apic_enabled = t->apic_enabled;
- d->arch.hvm_domain.pae_enabled = t->pae_enabled;
- } else {
- printk("Bad hvm info table\n");
- d->arch.hvm_domain.nr_vcpus = 1;
- d->arch.hvm_domain.apic_enabled = 0;
- d->arch.hvm_domain.pae_enabled = 0;
- }
-
- unmap_domain_page(p);
+static void evtchn_callback_func(void *v)
+{
+ hvm_assist_complete(v);
+}
+
+void hvm_create_event_channels(struct vcpu *v)
+{
+ vcpu_iodata_t *p;
+ p = get_vio(v->domain, v->vcpu_id);
+ v->arch.hvm_vcpu.xen_port = p->vp_xen_port =
+ alloc_xen_event_channel(evtchn_callback_func,
+ v,
+ dom0);
+ DPRINTK("Allocated port %d for hvm.\n", v->arch.hvm_vcpu.xen_port);
}
void hvm_setup_platform(struct domain* d)
@@ -196,8 +185,7 @@ void hvm_setup_platform(struct domain* d
domain_crash_synchronous();
}
- hvm_map_io_shared_page(d);
- hvm_get_info(d);
+ hvm_zap_iommu_pages(d);
platform = &d->arch.hvm_domain;
pic_init(&platform->vpic, pic_irq_request, &platform->interrupt_request);
@@ -329,6 +317,59 @@ void hvm_print_line(struct vcpu *v, cons
pbuf[(*index)++] = c;
}
+void hvm_release_assist_channel(struct vcpu *v)
+{
+ release_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+}
+
+#if defined(__i386__)
+typedef unsigned long hvm_hypercall_handler(unsigned long, unsigned long,
+ unsigned long, unsigned long,
+ unsigned long);
+#define HYPERCALL(x) [ __HYPERVISOR_ ## x ] = (hvm_hypercall_handler *) do_ ## x
+static hvm_hypercall_handler *hvm_hypercall_table[] = {
+ HYPERCALL(mmu_update),
+ HYPERCALL(memory_op),
+ HYPERCALL(multicall),
+ HYPERCALL(update_va_mapping),
+ HYPERCALL(event_channel_op_compat),
+ HYPERCALL(xen_version),
+ HYPERCALL(grant_table_op),
+ HYPERCALL(event_channel_op),
+ HYPERCALL(hvm_op)
+};
+#undef HYPERCALL
+
+void hvm_do_hypercall(struct cpu_user_regs *pregs)
+{
+ if (pregs->eax >= ARRAY_SIZE(hvm_hypercall_table) ||
+ !hvm_hypercall_table[pregs->eax]) {
+ DPRINTK("HVM vcpu %d:%d did a bad hypercall %d.\n",
+ current->domain->domain_id, current->vcpu_id,
+ pregs->eax);
+ pregs->eax = -ENOSYS;
+ } else {
+ pregs->eax = hvm_hypercall_table[pregs->eax](pregs->ebx, pregs->ecx,
+ pregs->edx, pregs->esi,
+ pregs->edi);
+ }
+}
+#else
+void hvm_do_hypercall(struct cpu_user_regs *pregs)
+{
+ printk("not supported yet!\n");
+}
+#endif
+
+/* Initialise a hypercall transfer page for an HVM domain using
+ paravirtualised drivers. */
+void hvm_hypercall_page_initialise(struct domain *d,
+ void *hypercall_page)
+{
+ hvm_funcs.init_hypercall_page(d, hypercall_page);
+}
+
+
/*
* only called in HVM domain BSP context
* when booting, vcpuid is always equal to apic_id
@@ -372,6 +413,57 @@ int hvm_bringup_ap(int vcpuid, int tramp
xfree(ctxt);
+ return rc;
+}
+
+long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg)
+
+{
+ long rc = 0;
+
+ switch (op)
+ {
+ case HVMOP_set_param:
+ case HVMOP_get_param:
+ {
+ struct xen_hvm_param a;
+ struct domain *d;
+
+ if ( copy_from_guest(&a, arg, 1) )
+ return -EFAULT;
+
+ if ( a.index < 0 || a.index >= HVM_NR_PARAMS ) {
+ return -EINVAL;
+ }
+
+ if ( a.domid == DOMID_SELF ) {
+ get_knownalive_domain(current->domain);
+ d = current->domain;
+ } else if ( IS_PRIV(current->domain) ) {
+ d = find_domain_by_id(a.domid);
+ if ( !d ) {
+ return -ESRCH;
+ }
+ } else {
+ return -EPERM;
+ }
+
+ if ( op == HVMOP_set_param ) {
+ rc = 0;
+ d->arch.hvm_domain.params[a.index] = a.value;
+ } else {
+ rc = d->arch.hvm_domain.params[a.index];
+ }
+
+ put_domain(d);
+ return rc;
+ }
+ default:
+ {
+ DPRINTK("Bad HVM op %ld.\n", op);
+ rc = -EINVAL;
+ }
+ }
return rc;
}
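Reviewer note (not part of the patch): hvm_do_hypercall() and do_hvm_op() both index a fixed-size array with a guest-supplied number, so the bounds test has to reject an index equal to the array size as well as anything larger. A minimal sketch of a sparse dispatch table with that check follows. The handler names, demo_table and dispatch() are hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef long op_handler(unsigned long arg);

static long op_double(unsigned long arg) { return (long)(arg * 2); }
static long op_negate(unsigned long arg) { return -(long)arg; }

/* Sparse table built with designated initialisers; slot 1 is left NULL. */
static op_handler *const demo_table[] = {
    [0] = op_double,
    [2] = op_negate,
};

/* Call table[nr](arg), or return -ENOSYS for an out-of-range or empty slot.
 * Valid indices are 0 .. entries - 1, so the test is >=, not >. */
long dispatch(op_handler *const *table, size_t entries,
              unsigned long nr, unsigned long arg)
{
    assert(table != NULL);

    if (nr >= entries || table[nr] == NULL)
        return -ENOSYS;
    return table[nr](arg);
}
```

With a three-entry table, nr == 3 is rejected here, whereas a test written as nr > entries would read one element past the end.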
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/intercept.c
--- a/xen/arch/x86/hvm/intercept.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/intercept.c Tue Jul 18 13:43:27 2006 +0100
@@ -211,7 +211,7 @@ void hlt_timer_fn(void *data)
{
struct vcpu *v = data;
- evtchn_set_pending(v, iopacket_port(v));
+ hvm_prod_vcpu(v);
}
static __inline__ void missed_ticks(struct periodic_time *pt)
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/io.c
--- a/xen/arch/x86/hvm/io.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/io.c Tue Jul 18 13:43:27 2006 +0100
@@ -687,85 +687,18 @@ void hvm_io_assist(struct vcpu *v)
p = &vio->vp_ioreq;
- /* clear IO wait HVM flag */
- if ( test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) ) {
- if ( p->state == STATE_IORESP_READY ) {
- p->state = STATE_INVALID;
- clear_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
-
- if ( p->type == IOREQ_TYPE_PIO )
- hvm_pio_assist(regs, p, io_opp);
- else {
- hvm_mmio_assist(regs, p, io_opp);
- hvm_load_cpu_guest_regs(v, regs);
- }
-
- /* Copy register changes back into current guest state. */
- memcpy(guest_cpu_user_regs(), regs, HVM_CONTEXT_STACK_BYTES);
- }
- /* else an interrupt send event raced us */
- }
-}
-
-/*
- * On exit from hvm_wait_io, we're guaranteed not to be waiting on
- * I/O response from the device model.
- */
-void hvm_wait_io(void)
-{
- struct vcpu *v = current;
- struct domain *d = v->domain;
- int port = iopacket_port(v);
-
- for ( ; ; )
- {
- /* Clear master flag, selector flag, event flag each in turn. */
- v->vcpu_info->evtchn_upcall_pending = 0;
- clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
- smp_mb__after_clear_bit();
- if ( test_and_clear_bit(port, &d->shared_info->evtchn_pending[0]) )
- hvm_io_assist(v);
-
- /* Need to wait for I/O responses? */
- if ( !test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
- break;
-
- do_sched_op_compat(SCHEDOP_block, 0);
- }
-
- /*
- * Re-set the selector and master flags in case any other notifications
- * are pending.
- */
- if ( d->shared_info->evtchn_pending[port/BITS_PER_LONG] )
- set_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
- if ( v->vcpu_info->evtchn_pending_sel )
- v->vcpu_info->evtchn_upcall_pending = 1;
-}
-
-void hvm_safe_block(void)
-{
- struct vcpu *v = current;
- struct domain *d = v->domain;
- int port = iopacket_port(v);
-
- for ( ; ; )
- {
- /* Clear master flag & selector flag so we will wake from block. */
- v->vcpu_info->evtchn_upcall_pending = 0;
- clear_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
- smp_mb__after_clear_bit();
-
- /* Event pending already? */
- if ( test_bit(port, &d->shared_info->evtchn_pending[0]) )
- break;
-
- do_sched_op_compat(SCHEDOP_block, 0);
- }
-
- /* Reflect pending event in selector and master flags. */
- set_bit(port/BITS_PER_LONG, &v->vcpu_info->evtchn_pending_sel);
- v->vcpu_info->evtchn_upcall_pending = 1;
+ if (p->state == STATE_IORESP_READY) {
+ p->state = STATE_INVALID;
+ if (p->type == IOREQ_TYPE_PIO)
+ hvm_pio_assist(regs, p, io_opp);
+ else {
+ hvm_mmio_assist(regs, p, io_opp);
+ hvm_load_cpu_guest_regs(v, regs);
+ }
+
+ /* Copy register changes back into current guest state. */
+ memcpy(guest_cpu_user_regs(), regs, HVM_CONTEXT_STACK_BYTES);
+ }
}
/*
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/platform.c
--- a/xen/arch/x86/hvm/platform.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/platform.c Tue Jul 18 13:43:27 2006 +0100
@@ -669,6 +669,37 @@ int inst_copy_from_guest(unsigned char *
return inst_len;
}
+static void hvm_send_assist_req(struct vcpu *v)
+{
+ ioreq_t *p;
+
+ ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
+ spin_lock(&v->pause_lock);
+ if ( v->pause_count++ == 0 )
+ set_bit(_VCPUF_paused, &v->vcpu_flags);
+ spin_unlock(&v->pause_lock);
+ set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
+ mb();
+ p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+ if (unlikely(p->state != STATE_INVALID)) {
+ /* This indicates a bug in the device model. Crash the
+ domain. */
+ printf("Device model set bad IO state %d.\n", p->state);
+ domain_crash(v->domain);
+ return;
+ }
+ vcpu_sleep_nosync(v);
+ wmb();
+ p->state = STATE_IOREQ_READY;
+ notify_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+}
+
+/* Wake up a vcpu which is waiting for interrupts to come in */
+void hvm_prod_vcpu(struct vcpu *v)
+{
+ vcpu_unblock(v);
+}
+
void send_pio_req(struct cpu_user_regs *regs, unsigned long port,
unsigned long count, int size, long value, int dir, int pvalid)
{
@@ -682,13 +713,11 @@ void send_pio_req(struct cpu_user_regs *
domain_crash_synchronous();
}
- if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
- printf("HVM I/O has not yet completed\n");
- domain_crash_synchronous();
- }
- set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
-
p = &vio->vp_ioreq;
+ if (p->state != STATE_INVALID) {
+ printf("WARNING: send pio with something already pending (%d)?\n",
+ p->state);
+ }
p->dir = dir;
p->pdata_valid = pvalid;
@@ -714,15 +743,11 @@ void send_pio_req(struct cpu_user_regs *
return;
}
- p->state = STATE_IOREQ_READY;
-
- evtchn_send(iopacket_port(v));
- hvm_wait_io();
-}
-
-void send_mmio_req(
- unsigned char type, unsigned long gpa,
- unsigned long count, int size, long value, int dir, int pvalid)
+ hvm_send_assist_req(v);
+}
+
+static void send_mmio_req(unsigned char type, unsigned long gpa,
+ unsigned long count, int size, long value, int dir, int pvalid)
{
struct vcpu *v = current;
vcpu_iodata_t *vio;
@@ -739,12 +764,10 @@ void send_mmio_req(
p = &vio->vp_ioreq;
- if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
- printf("HVM I/O has not yet completed\n");
- domain_crash_synchronous();
- }
-
- set_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags);
+ if (p->state != STATE_INVALID) {
+ printf("WARNING: send mmio with something already pending (%d)?\n",
+ p->state);
+ }
p->dir = dir;
p->pdata_valid = pvalid;
@@ -770,10 +793,7 @@ void send_mmio_req(
return;
}
- p->state = STATE_IOREQ_READY;
-
- evtchn_send(iopacket_port(v));
- hvm_wait_io();
+ hvm_send_assist_req(v);
}
static void mmio_operands(int type, unsigned long gpa, struct instruction *inst,
@@ -1035,6 +1055,108 @@ void handle_mmio(unsigned long va, unsig
}
}
+void hvm_assist_complete(struct vcpu *v)
+{
+ ioreq_t *p;
+ /* The device model just sent an event channel message to us. Either:
+
+ a) It just finished processing a request, or
+ b) it wants us to send an interrupt into the guest.
+
+ We only need to handle case (b) explicitly if there is no pending
+ IO request from us to the device model (since if there is, we'll
+ pick up the interrupt when the request completes). */
+ p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+ if (p->state == STATE_IORESP_READY) {
+ /* There's a race here, in that the device model could set
+ p->state while we're not looking, but we don't care, since
+ that would imply that *this* notification is not related to
+ that state transition, and so there'll be another one along
+ shortly. */
+ if (test_and_clear_bit(ARCH_HVM_IO_WAIT,
+ &v->arch.hvm_vcpu.ioflags)) {
+ /* Just completed a wait-for-io, so we can unpause the
+ vcpu. It'll pick up the response when it returns. */
+ vcpu_unpause(v);
+ return;
+ } else {
+ /* Someone got in and processed the response before us.
+ Just to be on the safe side, treat this as an interrupt
+ delivery. */
+ /* (the other path implicitly does interrupt delivery as
+ the vcpu returns to the guest) */
+ }
+ }
+
+ /* Evtchn message must have been for interrupt delivery. */
+ hvm_prod_vcpu(v);
+ smp_send_event_check_cpu(v->processor);
+}
+
+#define MIN(x,y) ((x)<(y)?(x):(y))
+
+/* Note that copy_{to,from}_user_hvm don't set the A and D bits on
+ PTEs, and require the PTE to be writable even when they're only
+ trying to read from it. The guest is expected to deal with
+ this. */
+unsigned long copy_to_user_hvm(void *to, const void *from, unsigned len)
+{
+ unsigned long mfn;
+ unsigned long va;
+ void *map;
+ unsigned long off_in_page;
+ unsigned long chunk_size;
+
+ ASSERT(hvm_guest(current));
+ va = (unsigned long)to;
+ off_in_page = va % PAGE_SIZE;
+ while (len != 0) {
+ mfn = gva_to_mfn(va);
+ if (!mfn)
+ break;
+ map = map_domain_page(mfn);
+ if (!map)
+ break;
+ chunk_size = MIN(len, PAGE_SIZE - off_in_page);
+ memcpy(map + off_in_page, from, chunk_size);
+ unmap_domain_page(map);
+ off_in_page = 0;
+ len -= chunk_size;
+ from += chunk_size;
+ va += chunk_size;
+ }
+ return len;
+}
+
+unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len)
+{
+ unsigned long mfn;
+ unsigned long va;
+ void *map;
+ unsigned long off_in_page;
+ unsigned long chunk_size;
+
+ ASSERT(hvm_guest(current));
+ va = (unsigned long)from;
+ off_in_page = va % PAGE_SIZE;
+ while (len != 0) {
+ mfn = gva_to_mfn(va);
+ if (!mfn)
+ break;
+ map = map_domain_page(mfn);
+ if (!map)
+ break;
+ chunk_size = MIN(len, PAGE_SIZE - off_in_page);
+ memcpy(to, map + off_in_page, chunk_size);
+ unmap_domain_page(map);
+ off_in_page = 0;
+ len -= chunk_size;
+ to += chunk_size;
+ va += chunk_size;
+ }
+ return len;
+}
+
/*
* Local variables:
* mode: C
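The copy_{to,from}_user_hvm loops in the platform.c hunk above walk the guest buffer one page at a time, because consecutive guest virtual pages need not be backed by consecutive machine frames. The following is a small standalone model of that chunking, under loud assumptions: the 16-byte page size, demo_frames, demo_map and demo_map_page are all invented for the example and stand in for gva_to_mfn() plus map_domain_page().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16u   /* tiny pages so the chunking is easy to see */
#define DEMO_NR_PAGES  4u

/* Hypothetical "machine frames" backing the guest. */
static unsigned char demo_frames[DEMO_NR_PAGES][DEMO_PAGE_SIZE];

/* Hypothetical translation table: virtual page n lives in frame
 * demo_map[n]. Deliberately scrambled. */
static const unsigned demo_map[DEMO_NR_PAGES] = { 2, 0, 3, 1 };

/* Stand-in for gva_to_mfn() + map_domain_page(); NULL means "unmapped". */
static unsigned char *demo_map_page(unsigned long va)
{
    unsigned long vpn = va / DEMO_PAGE_SIZE;

    if (vpn >= DEMO_NR_PAGES)
        return NULL;
    return demo_frames[demo_map[vpn]];
}

/* Copy len bytes to guest virtual address va, one page-bounded chunk at
 * a time. Returns the number of bytes NOT copied (0 on full success). */
size_t demo_copy_to_guest(unsigned long va, const void *from, size_t len)
{
    const unsigned char *src = from;
    size_t off_in_page = va % DEMO_PAGE_SIZE;

    while (len != 0) {
        unsigned char *map = demo_map_page(va);
        size_t chunk;

        if (map == NULL)
            break;                        /* translation failed: stop here */
        chunk = DEMO_PAGE_SIZE - off_in_page;
        if (chunk > len)
            chunk = len;
        memcpy(map + off_in_page, src, chunk);
        off_in_page = 0;                  /* later chunks are page aligned */
        len -= chunk;
        src += chunk;
        va  += chunk;
    }
    return len;
}
```

As in the patch, a failed translation part-way through leaves the earlier chunks written and reports the remainder, so a caller only needs to compare the result against zero.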
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/svm/svm.c
--- a/xen/arch/x86/hvm/svm/svm.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/svm/svm.c Tue Jul 18 13:43:27 2006 +0100
@@ -25,6 +25,7 @@
#include <xen/sched.h>
#include <xen/irq.h>
#include <xen/softirq.h>
+#include <xen/hypercall.h>
#include <asm/current.h>
#include <asm/io.h>
#include <asm/shadow.h>
@@ -456,6 +457,28 @@ void svm_init_ap_context(struct vcpu_gue
ctxt->flags = VGCF_HVM_GUEST;
}
+static void svm_init_hypercall_page(struct domain *d, void *hypercall_page)
+{
+ char *p;
+ int i;
+
+ memset(hypercall_page, 0, PAGE_SIZE);
+
+ for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+ {
+ p = (char *)(hypercall_page + (i * 32));
+ *(u8 *)(p + 0) = 0xb8; /* mov imm32, %eax */
+ *(u32 *)(p + 1) = i;
+ *(u8 *)(p + 5) = 0x0f; /* vmmcall */
+ *(u8 *)(p + 6) = 0x01;
+ *(u8 *)(p + 7) = 0xd9;
+ *(u8 *)(p + 8) = 0xc3; /* ret */
+ }
+
+ /* Don't support HYPERVISOR_iret at the moment */
+ *(u16 *)(hypercall_page + (__HYPERVISOR_iret * 32)) = 0x0b0f; /* ud2 */
+}
+
int start_svm(void)
{
u32 eax, ecx, edx;
@@ -503,6 +526,8 @@ int start_svm(void)
hvm_funcs.instruction_length = svm_instruction_length;
hvm_funcs.get_guest_ctrl_reg = svm_get_ctrl_reg;
hvm_funcs.init_ap_context = svm_init_ap_context;
+
+ hvm_funcs.init_hypercall_page = svm_init_hypercall_page;
hvm_enabled = 1;
@@ -2085,7 +2110,7 @@ static inline void svm_vmexit_do_hlt(str
next_wakeup = next_pit;
if ( next_wakeup != - 1 )
set_timer(&current->arch.hvm_svm.hlt_timer, next_wakeup);
- hvm_safe_block();
+ do_sched_op_compat(SCHEDOP_block, 0);
}
@@ -2314,33 +2339,39 @@ static int svm_do_vmmcall(struct vcpu *v
inst_len = __get_instruction_length(vmcb, INSTR_VMCALL, NULL);
ASSERT(inst_len > 0);
- /* VMMCALL sanity check */
- if (vmcb->cpl > get_vmmcall_cpl(regs->edi))
- {
- printf("VMMCALL CPL check failed\n");
- return -1;
- }
-
- /* handle the request */
- switch (regs->edi)
- {
- case VMMCALL_RESET_TO_REALMODE:
- if (svm_do_vmmcall_reset_to_realmode(v, regs))
- {
- printf("svm_do_vmmcall_reset_to_realmode() failed\n");
+ if (regs->eax & 0x80000000) {
+ /* VMMCALL sanity check */
+ if (vmcb->cpl > get_vmmcall_cpl(regs->edi))
+ {
+ printf("VMMCALL CPL check failed\n");
return -1;
}
-
- /* since we just reset the VMCB, return without adjusting the eip */
- return 0;
- case VMMCALL_DEBUG:
- printf("DEBUG features not implemented yet\n");
- break;
- default:
- break;
- }
-
- hvm_print_line(v, regs->eax); /* provides the current domain */
+
+ /* handle the request */
+ switch (regs->eax)
+ {
+ case VMMCALL_RESET_TO_REALMODE:
+ if (svm_do_vmmcall_reset_to_realmode(v, regs))
+ {
+ printf("svm_do_vmmcall_reset_to_realmode() failed\n");
+ return -1;
+ }
+ /* since we just reset the VMCB, return without adjusting
+ * the eip */
+ return 0;
+
+ case VMMCALL_DEBUG:
+ printf("DEBUG features not implemented yet\n");
+ break;
+ default:
+ break;
+ }
+
+ hvm_print_line(v, regs->eax); /* provides the current domain */
+ } else {
+ /* It's a hypercall */
+ hvm_do_hypercall(regs);
+ }
__update_guest_eip(vmcb, inst_len);
return 0;
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/svm/vmcb.c
--- a/xen/arch/x86/hvm/svm/vmcb.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/svm/vmcb.c Tue Jul 18 13:43:27 2006 +0100
@@ -370,18 +370,6 @@ void svm_do_launch(struct vcpu *v)
if (v->vcpu_id == 0)
hvm_setup_platform(v->domain);
- if ( evtchn_bind_vcpu(iopacket_port(v), v->vcpu_id) < 0 )
- {
- printk("HVM domain bind port %d to vcpu %d failed!\n",
- iopacket_port(v), v->vcpu_id);
- domain_crash_synchronous();
- }
-
- HVM_DBG_LOG(DBG_LEVEL_1, "eport: %x", iopacket_port(v));
-
- clear_bit(iopacket_port(v),
- &v->domain->shared_info->evtchn_mask[0]);
-
if (hvm_apic_support(v->domain))
vlapic_init(v);
init_timer(&v->arch.hvm_svm.hlt_timer,
@@ -455,9 +443,10 @@ void svm_do_resume(struct vcpu *v)
pickup_deactive_ticks(pt);
}
- if ( test_bit(iopacket_port(v), &d->shared_info->evtchn_pending[0]) ||
- test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
- hvm_wait_io();
+ if (test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags)) {
+ hvm_io_assist(v);
+ ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
+ }
/* We can't resume the guest if we're waiting on I/O */
ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vlapic.c
--- a/xen/arch/x86/hvm/vlapic.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vlapic.c Tue Jul 18 13:43:27 2006 +0100
@@ -33,6 +33,7 @@
#include <xen/sched.h>
#include <asm/current.h>
#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
/* XXX remove this definition after GFW enabled */
#define VLAPIC_NO_BIOS
@@ -63,7 +64,7 @@ int vlapic_find_highest_irr(struct vlapi
int hvm_apic_support(struct domain *d)
{
- return d->arch.hvm_domain.apic_enabled;
+ return d->arch.hvm_domain.params[HVM_PARAM_APIC_ENABLED];
}
s_time_t get_apictime_scheduled(struct vcpu *v)
@@ -223,7 +224,7 @@ static int vlapic_accept_irq(struct vcpu
"level trig mode for vector %d\n", vector);
set_bit(vector, &vlapic->tmr[0]);
}
- evtchn_set_pending(v, iopacket_port(v));
+ hvm_prod_vcpu(v);
result = 1;
break;
@@ -367,7 +368,7 @@ int vlapic_check_vector(struct vlapic *v
return 1;
}
-void vlapic_ipi(struct vlapic *vlapic)
+static void vlapic_ipi(struct vlapic *vlapic)
{
unsigned int dest = (vlapic->icr_high >> 24) & 0xff;
unsigned int short_hand = (vlapic->icr_low >> 18) & 3;
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/io.c
--- a/xen/arch/x86/hvm/vmx/io.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/io.c Tue Jul 18 13:43:27 2006 +0100
@@ -142,6 +142,7 @@ asmlinkage void vmx_intr_assist(void)
struct hvm_domain *plat=&v->domain->arch.hvm_domain;
struct periodic_time *pt = &plat->pl_time.periodic_tm;
struct hvm_virpic *pic= &plat->vpic;
+ int callback_irq;
unsigned int idtv_info_field;
unsigned long inst_len;
int has_ext_irq;
@@ -152,6 +153,15 @@ asmlinkage void vmx_intr_assist(void)
if ( (v->vcpu_id == 0) && pt->enabled && pt->pending_intr_nr ) {
pic_set_irq(pic, pt->irq, 0);
pic_set_irq(pic, pt->irq, 1);
+ }
+
+ callback_irq = v->domain->arch.hvm_domain.params[HVM_PARAM_CALLBACK_IRQ];
+ if ( callback_irq != 0 &&
+ local_events_need_delivery() ) {
+ /* inject para-device callback irq */
+ v->vcpu_info->evtchn_upcall_mask = 1;
+ pic_set_irq(pic, callback_irq, 0);
+ pic_set_irq(pic, callback_irq, 1);
}
has_ext_irq = cpu_has_pending_irq(v);
@@ -220,7 +230,7 @@ asmlinkage void vmx_intr_assist(void)
void vmx_do_resume(struct vcpu *v)
{
- struct domain *d = v->domain;
+ ioreq_t *p;
struct periodic_time *pt = &v->domain->arch.hvm_domain.pl_time.periodic_tm;
vmx_stts();
@@ -234,9 +244,13 @@ void vmx_do_resume(struct vcpu *v)
pickup_deactive_ticks(pt);
}
- if ( test_bit(iopacket_port(v), &d->shared_info->evtchn_pending[0]) ||
- test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags) )
- hvm_wait_io();
+ p = &get_vio(v->domain, v->vcpu_id)->vp_ioreq;
+ if (p->state == STATE_IORESP_READY)
+ hvm_io_assist(v);
+ if (p->state != STATE_INVALID) {
+ printf("Weird HVM iorequest state %d.\n", p->state);
+ domain_crash(v->domain);
+ }
/* We can't resume the guest if we're waiting on I/O */
ASSERT(!test_bit(ARCH_HVM_IO_WAIT, &v->arch.hvm_vcpu.ioflags));
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/vmcs.c
--- a/xen/arch/x86/hvm/vmx/vmcs.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/vmcs.c Tue Jul 18 13:43:27 2006 +0100
@@ -245,18 +245,6 @@ static void vmx_do_launch(struct vcpu *v
if (v->vcpu_id == 0)
hvm_setup_platform(v->domain);
- if ( evtchn_bind_vcpu(iopacket_port(v), v->vcpu_id) < 0 )
- {
- printk("VMX domain bind port %d to vcpu %d failed!\n",
- iopacket_port(v), v->vcpu_id);
- domain_crash_synchronous();
- }
-
- HVM_DBG_LOG(DBG_LEVEL_1, "eport: %x", iopacket_port(v));
-
- clear_bit(iopacket_port(v),
- &v->domain->shared_info->evtchn_mask[0]);
-
__asm__ __volatile__ ("mov %%cr0,%0" : "=r" (cr0) : );
error |= __vmwrite(GUEST_CR0, cr0);
diff -r ecb8ff1fcf1f xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/hvm/vmx/vmx.c Tue Jul 18 13:43:27 2006 +0100
@@ -25,6 +25,7 @@
#include <xen/irq.h>
#include <xen/softirq.h>
#include <xen/domain_page.h>
+#include <xen/hypercall.h>
#include <asm/current.h>
#include <asm/io.h>
#include <asm/shadow.h>
@@ -139,6 +140,7 @@ static void vmx_relinquish_guest_resourc
kill_timer(&VLAPIC(v)->vlapic_timer);
xfree(VLAPIC(v));
}
+ hvm_release_assist_channel(v);
}
kill_timer(&d->arch.hvm_domain.pl_time.periodic_tm.timer);
@@ -669,6 +671,28 @@ static int check_vmx_controls(u32 ctrls,
return 1;
}
+static void vmx_init_hypercall_page(struct domain *d, void *hypercall_page)
+{
+ char *p;
+ int i;
+
+ memset(hypercall_page, 0, PAGE_SIZE);
+
+ for ( i = 0; i < (PAGE_SIZE / 32); i++ )
+ {
+ p = (char *)(hypercall_page + (i * 32));
+ *(u8 *)(p + 0) = 0xb8; /* mov imm32, %eax */
+ *(u32 *)(p + 1) = i;
+ *(u8 *)(p + 5) = 0x0f; /* vmcall */
+ *(u8 *)(p + 6) = 0x01;
+ *(u8 *)(p + 7) = 0xc1;
+ *(u8 *)(p + 8) = 0xc3; /* ret */
+ }
+
+ /* Don't support HYPERVISOR_iret at the moment */
+ *(u16 *)(hypercall_page + (__HYPERVISOR_iret * 32)) = 0x0b0f; /* ud2 */
+}
+
int start_vmx(void)
{
u32 eax, edx;
@@ -748,6 +772,8 @@ int start_vmx(void)
hvm_funcs.get_guest_ctrl_reg = vmx_get_ctrl_reg;
hvm_funcs.init_ap_context = vmx_init_ap_context;
+
+ hvm_funcs.init_hypercall_page = vmx_init_hypercall_page;
hvm_enabled = 1;
@@ -1968,7 +1994,7 @@ void vmx_vmexit_do_hlt(void)
next_wakeup = next_pit;
if ( next_wakeup != - 1 )
set_timer(&current->arch.hvm_vmx.hlt_timer, next_wakeup);
- hvm_safe_block();
+ do_sched_op_compat(SCHEDOP_block, 0);
}
static inline void vmx_vmexit_do_extint(struct cpu_user_regs *regs)
@@ -2138,11 +2164,10 @@ asmlinkage void vmx_vmexit_handler(struc
* (1) We can get an exception (e.g. #PG) in the guest, or
* (2) NMI
*/
- int error;
unsigned int vector;
unsigned long va;
- if ((error = __vmread(VM_EXIT_INTR_INFO, &vector))
+ if (__vmread(VM_EXIT_INTR_INFO, &vector)
|| !(vector & INTR_INFO_VALID_MASK))
__hvm_bug(&regs);
vector &= INTR_INFO_VECTOR_MASK;
@@ -2215,7 +2240,7 @@ asmlinkage void vmx_vmexit_handler(struc
(unsigned long)regs.ecx, (unsigned long)regs.edx,
(unsigned long)regs.esi, (unsigned long)regs.edi);
- if (!(error = vmx_do_page_fault(va, &regs))) {
+ if (!vmx_do_page_fault(va, &regs)) {
/*
* Inject #PG using Interruption-Information Fields
*/
@@ -2273,16 +2298,16 @@ asmlinkage void vmx_vmexit_handler(struc
__update_guest_eip(inst_len);
break;
}
-#if 0 /* keep this for debugging */
case EXIT_REASON_VMCALL:
+ {
__get_instruction_length(inst_len);
__vmread(GUEST_RIP, &eip);
__vmread(EXIT_QUALIFICATION, &exit_qualification);
- hvm_print_line(v, regs.eax); /* provides the current domain */
+ hvm_do_hypercall(&regs);
__update_guest_eip(inst_len);
break;
-#endif
+ }
case EXIT_REASON_CR_ACCESS:
{
__vmread(GUEST_RIP, &eip);
@@ -2323,7 +2348,6 @@ asmlinkage void vmx_vmexit_handler(struc
case EXIT_REASON_MWAIT_INSTRUCTION:
__hvm_bug(&regs);
break;
- case EXIT_REASON_VMCALL:
case EXIT_REASON_VMCLEAR:
case EXIT_REASON_VMLAUNCH:
case EXIT_REASON_VMPTRLD:
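Both svm_init_hypercall_page() and vmx_init_hypercall_page() lay the page out the same way: one 32-byte slot per hypercall number, each holding a short stub that loads the number into %eax, traps to the hypervisor and returns. Here is a standalone sketch of that layout, assuming nothing beyond the byte values visible in the two hunks. demo_init_hypercall_page and its third_byte parameter are hypothetical, introduced only to show the two vendors side by side, and the HYPERVISOR_iret ud2 override is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_STUB_SIZE 32u

/* Fill a page with one stub per hypercall number:
 *   b8 <imm32>     load the hypercall number into %eax
 *   0f 01 <byte>   vmcall (c1, VMX) or vmmcall (d9, SVM)
 *   c3             ret
 * The rest of each 32-byte slot stays zero. */
void demo_init_hypercall_page(uint8_t *page, uint8_t third_byte)
{
    uint32_t i;

    assert(page != NULL);
    assert(third_byte == 0xc1 || third_byte == 0xd9);
    memset(page, 0, DEMO_PAGE_SIZE);

    for (i = 0; i < DEMO_PAGE_SIZE / DEMO_STUB_SIZE; i++) {
        uint8_t *p = page + i * DEMO_STUB_SIZE;

        p[0] = 0xb8;
        /* imm32 in little-endian order, written byte by byte so the
         * sketch does not depend on the host's endianness. */
        p[1] = (uint8_t)(i & 0xff);
        p[2] = (uint8_t)((i >> 8) & 0xff);
        p[3] = (uint8_t)((i >> 16) & 0xff);
        p[4] = (uint8_t)((i >> 24) & 0xff);
        p[5] = 0x0f;
        p[6] = 0x01;
        p[7] = third_byte;
        p[8] = 0xc3;
    }
}
```

A guest would then issue hypercall number nr by calling the address page + nr * 32, which is why the slot size has to match whatever the guest-side code assumes.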
diff -r ecb8ff1fcf1f xen/arch/x86/mm.c
--- a/xen/arch/x86/mm.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/mm.c Tue Jul 18 13:43:27 2006 +0100
@@ -2982,7 +2982,12 @@ long arch_memory_op(int op, XEN_GUEST_HA
if ( copy_from_guest(&xatp, arg, 1) )
return -EFAULT;
- if ( (d = find_domain_by_id(xatp.domid)) == NULL )
+ if ( xatp.domid == DOMID_SELF ) {
+ d = current->domain;
+ get_knownalive_domain(d);
+ } else if ( !IS_PRIV(current->domain) )
+ return -EPERM;
+ else if ( (d = find_domain_by_id(xatp.domid)) == NULL )
return -ESRCH;
switch ( xatp.space )
diff -r ecb8ff1fcf1f xen/arch/x86/x86_32/entry.S
--- a/xen/arch/x86/x86_32/entry.S Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/x86_32/entry.S Tue Jul 18 13:43:27 2006 +0100
@@ -656,6 +656,7 @@ ENTRY(hypercall_table)
.long do_xenoprof_op
.long do_event_channel_op
.long do_physdev_op
+ .long do_hvm_op /* 34 */
.rept NR_hypercalls-((.-hypercall_table)/4)
.long do_ni_hypercall
.endr
@@ -695,6 +696,7 @@ ENTRY(hypercall_args_table)
.byte 2 /* do_xenoprof_op */
.byte 2 /* do_event_channel_op */
.byte 2 /* do_physdev_op */
+ .byte 2 /* do_hvm_op */ /* 34 */
.rept NR_hypercalls-(.-hypercall_args_table)
.byte 0 /* do_ni_hypercall */
.endr
diff -r ecb8ff1fcf1f xen/arch/x86/x86_32/traps.c
--- a/xen/arch/x86/x86_32/traps.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/arch/x86/x86_32/traps.c Tue Jul 18 13:43:27 2006 +0100
@@ -486,9 +486,11 @@ static void hypercall_page_initialise_ri
*(u16 *)(p+ 6) = 0x82cd; /* int $0x82 */
}
-void hypercall_page_initialise(void *hypercall_page)
-{
- if ( supervisor_mode_kernel )
+void hypercall_page_initialise(struct domain *d, void *hypercall_page)
+{
+ if ( hvm_guest(d->vcpu[0]) )
+ hvm_hypercall_page_initialise(d, hypercall_page);
+ else if ( supervisor_mode_kernel )
hypercall_page_initialise_ring0_kernel(hypercall_page);
else
hypercall_page_initialise_ring1_kernel(hypercall_page);
diff -r ecb8ff1fcf1f xen/common/event_channel.c
--- a/xen/common/event_channel.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/common/event_channel.c Tue Jul 18 13:43:27 2006 +0100
@@ -46,6 +46,104 @@
goto out; \
} while ( 0 )
+#define NR_XEN_EVENT_CHANNELS 32
+#define XECS_FREE 0 /* Not in use at all */
+#define XECS_UNBOUND 1 /* Allocated but not bound to */
+#define XECS_BOUND 2 /* Bound to somewhere in domain-space */
+#define XECS_HBOUND 3 /* Half bound: Xen is trying to tear this
+ down, but a domain is still attached */
+struct xen_evtchn {
+ int state;
+
+ void (*fire)(void *d); /* called when dom0 tries to send on this
+ event channel. */
+ void *data;
+
+ struct domain *dom; /* Who is allowed to bind/currently bound */
+ int dom_port;
+};
+
+static struct xen_evtchn xen_event_channels[NR_XEN_EVENT_CHANNELS];
+/* Leaf lock protecting the xen_event_channels array. */
+static spinlock_t xen_event_channel_lock = SPIN_LOCK_UNLOCKED;
+
+int alloc_xen_event_channel(void (*f)(void *d),
+ void *data,
+ struct domain *d)
+{
+ int ind;
+
+ spin_lock(&xen_event_channel_lock);
+ for (ind = 0; ind < NR_XEN_EVENT_CHANNELS; ind++)
+ if ( xen_event_channels[ind].state == XECS_FREE )
+ break;
+ if ( ind == NR_XEN_EVENT_CHANNELS ) {
+ printf("Out of Xen event channels?\n");
+ ind = -1;
+ goto out;
+ }
+ xen_event_channels[ind].state = XECS_UNBOUND;
+ xen_event_channels[ind].fire = f;
+ xen_event_channels[ind].data = data;
+ xen_event_channels[ind].dom = d;
+ out:
+ spin_unlock(&xen_event_channel_lock);
+ return ind;
+}
+
+void release_xen_event_channel(int ind)
+{
+ spin_lock(&xen_event_channel_lock);
+ switch ( xen_event_channels[ind].state ) {
+ case XECS_UNBOUND:
+ xen_event_channels[ind].state = XECS_FREE;
+ break;
+ case XECS_BOUND:
+ xen_event_channels[ind].state = XECS_HBOUND;
+ break;
+ case XECS_HBOUND:
+ panic("Double free of Xen event channel.\n");
+ case XECS_FREE:
+ printf("Attempt to free non-allocated Xen event channel %d?\n",
+ ind);
+ default:
+ BUG();
+ }
+
+ spin_unlock(&xen_event_channel_lock);
+}
+
+void notify_xen_event_channel(int port)
+{
+ struct xen_evtchn *xchn = xen_event_channels + port;
+ struct domain *d = NULL;
+ struct evtchn *chn;
+
+ /* We rely on our caller to ensure that nobody's trying to tear
+ the channel down from inside Xen while it's being signalled on.
+ That means that the only transition the channel could make is
+ from BOUND to UNBOUND or vice-versa. Neither of those changes
+ the dom field, so we can read it without taking a lock. This
+ simplifies the lock ordering a bit. */
+ d = xchn->dom;
+ ASSERT(d);
+ if ( !get_domain(d) )
+ return;
+ spin_lock(&d->evtchn_lock);
+ spin_lock(&xen_event_channel_lock);
+ if ( xchn->state != XECS_UNBOUND ) {
+ BUG_ON(xchn->state != XECS_BOUND);
+ BUG_ON(d != xchn->dom);
+ chn = evtchn_from_port(d, xchn->dom_port);
+ if ( chn->state == ECS_XEN )
+ evtchn_set_pending(d->vcpu[chn->notify_vcpu_id],
+ xchn->dom_port);
+ } else
+ printf("Send on unbound Xen event channel?\n");
+
+ spin_unlock(&d->evtchn_lock);
+ spin_unlock(&xen_event_channel_lock);
+}
static int virq_is_global(int virq)
{
@@ -134,6 +232,44 @@ static long evtchn_alloc_unbound(evtchn_
}
+static long evtchn_bind_xen(struct domain *ld, int xen_port)
+{
+ long rc = 0;
+ struct evtchn *lchn;
+ struct xen_evtchn *rchn;
+ int lport;
+
+ if ( xen_port < 0 || xen_port >= NR_XEN_EVENT_CHANNELS )
+ return -EINVAL;
+
+ spin_lock(&ld->evtchn_lock);
+ spin_lock(&xen_event_channel_lock);
+
+ rchn = xen_event_channels + xen_port;
+ if ( rchn->state != XECS_UNBOUND || rchn->dom != ld )
+ ERROR_EXIT(-EINVAL);
+
+ if ( (lport = get_free_port(ld)) < 0 )
+ ERROR_EXIT(lport);
+ lchn = evtchn_from_port(ld, lport);
+ lchn->state = ECS_XEN;
+ lchn->u.xen_port = xen_port;
+
+ rchn->state = XECS_BOUND;
+ rchn->dom_port = lport;
+
+ /* Somewhat ugly hack to avoid lost wakeups if we've tried to
+ notify this port before anyone got around to binding it. */
+ evtchn_set_pending(ld->vcpu[lchn->notify_vcpu_id], lport);
+ rc = lport;
+
+ out:
+ spin_unlock(&xen_event_channel_lock);
+ spin_unlock(&ld->evtchn_lock);
+
+ return rc;
+}
+
static long evtchn_bind_interdomain(evtchn_bind_interdomain_t *bind)
{
struct evtchn *lchn, *rchn;
@@ -147,6 +283,15 @@ static long evtchn_bind_interdomain(evtc
if ( rdom == DOMID_SELF )
rdom = current->domain->domain_id;
+
+ if ( rdom == DOMID_XEN ) {
+ rc = evtchn_bind_xen(ld, rport);
+ if ( rc >= 0 ) {
+ bind->local_port = rc;
+ rc = 0;
+ }
+ return rc;
+ }
if ( (rd = find_domain_by_id(rdom)) == NULL )
return -ESRCH;
@@ -317,11 +462,12 @@ static long evtchn_bind_pirq(evtchn_bind
static long __evtchn_close(struct domain *d1, int port1)
{
- struct domain *d2 = NULL;
- struct vcpu *v;
- struct evtchn *chn1, *chn2;
- int port2;
- long rc = 0;
+ struct domain *d2 = NULL;
+ struct vcpu *v;
+ struct evtchn *chn1, *chn2;
+ int port2;
+ long rc = 0;
+ struct xen_evtchn *xchn;
again:
spin_lock(&d1->evtchn_lock);
@@ -409,6 +555,19 @@ static long __evtchn_close(struct domain
chn2->u.unbound.remote_domid = d1->domain_id;
break;
+ case ECS_XEN:
+ spin_lock(&xen_event_channel_lock);
+ xchn = xen_event_channels + chn1->u.xen_port;
+ BUG_ON(xchn->dom != d1);
+ if ( xchn->state == XECS_HBOUND )
+ xchn->state = XECS_FREE;
+ else if (xchn->state == XECS_BOUND)
+ xchn->state = XECS_UNBOUND;
+ else
+ BUG();
+ spin_unlock(&xen_event_channel_lock);
+ break;
+
default:
BUG();
}
@@ -442,6 +601,7 @@ long evtchn_send(unsigned int lport)
struct evtchn *lchn, *rchn;
struct domain *ld = current->domain, *rd;
int rport, ret = 0;
+ struct xen_evtchn *xchn;
spin_lock(&ld->evtchn_lock);
@@ -465,6 +625,16 @@ long evtchn_send(unsigned int lport)
break;
case ECS_UNBOUND:
/* silently drop the notification */
+ break;
+ case ECS_XEN:
+ xchn = xen_event_channels + lchn->u.xen_port;
+ spin_lock(&xen_event_channel_lock);
+ if ( xchn->state != XECS_HBOUND )
+ {
+ BUG_ON(xchn->state != XECS_BOUND);
+ xchn->fire(xchn->data);
+ }
+ spin_unlock(&xen_event_channel_lock);
break;
default:
ret = -EINVAL;
@@ -596,6 +766,11 @@ static long evtchn_status(evtchn_status_
chn->u.interdomain.remote_dom->domain_id;
status->u.interdomain.port = chn->u.interdomain.remote_port;
break;
+ case ECS_XEN:
+ status->status = EVTCHNSTAT_interdomain;
+ status->u.interdomain.dom = DOMID_XEN;
+ status->u.interdomain.port = chn->u.xen_port;
+ break;
case ECS_PIRQ:
status->status = EVTCHNSTAT_pirq;
status->u.pirq = chn->u.pirq;
@@ -649,6 +824,7 @@ long evtchn_bind_vcpu(unsigned int port,
case ECS_UNBOUND:
case ECS_INTERDOMAIN:
case ECS_PIRQ:
+ case ECS_XEN:
chn->notify_vcpu_id = vcpu_id;
break;
default:
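The Xen-side event channel states in the event_channel.c hunks above form a small state machine: Xen allocates (FREE to UNBOUND), a domain binds (UNBOUND to BOUND), and then either side can go away first, with HBOUND covering the case where Xen has released its end but the domain still holds a port. The sketch below models just those transitions. It is an illustration only: the demo_* names are invented, and invalid transitions return -1 here, whereas the patch uses panic() or BUG() for them.

```c
#include <assert.h>
#include <stddef.h>

enum demo_xecs {
    DEMO_XECS_FREE,     /* not in use */
    DEMO_XECS_UNBOUND,  /* allocated by Xen, no domain port attached */
    DEMO_XECS_BOUND,    /* a domain port is attached */
    DEMO_XECS_HBOUND    /* Xen released it, domain still attached */
};

/* Xen allocates the channel: FREE -> UNBOUND. */
int demo_alloc(enum demo_xecs *s)
{
    assert(s != NULL);
    if (*s != DEMO_XECS_FREE)
        return -1;
    *s = DEMO_XECS_UNBOUND;
    return 0;
}

/* A domain binds a local port to it: UNBOUND -> BOUND. */
int demo_bind(enum demo_xecs *s)
{
    assert(s != NULL);
    if (*s != DEMO_XECS_UNBOUND)
        return -1;
    *s = DEMO_XECS_BOUND;
    return 0;
}

/* Xen releases its end: UNBOUND -> FREE, BOUND -> HBOUND. */
int demo_release(enum demo_xecs *s)
{
    assert(s != NULL);
    switch (*s) {
    case DEMO_XECS_UNBOUND: *s = DEMO_XECS_FREE;   return 0;
    case DEMO_XECS_BOUND:   *s = DEMO_XECS_HBOUND; return 0;
    default:                return -1;  /* double free or never allocated */
    }
}

/* The domain closes its port: BOUND -> UNBOUND, HBOUND -> FREE. */
int demo_close(enum demo_xecs *s)
{
    assert(s != NULL);
    switch (*s) {
    case DEMO_XECS_BOUND:  *s = DEMO_XECS_UNBOUND; return 0;
    case DEMO_XECS_HBOUND: *s = DEMO_XECS_FREE;    return 0;
    default:               return -1;  /* nothing was bound */
    }
}
```

The point of HBOUND is that a slot only returns to FREE once both Xen and the domain have let go of it, in whichever order that happens.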
diff -r ecb8ff1fcf1f xen/common/memory.c
--- a/xen/common/memory.c Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/common/memory.c Tue Jul 18 13:43:27 2006 +0100
@@ -158,6 +158,9 @@ guest_remove_page(
}
page = mfn_to_page(mfn);
+ if ( IS_XEN_HEAP_FRAME(page) )
+ return 0;
+
if ( unlikely(!get_page(page, d)) )
{
DPRINTK("Bad page free for domain %u\n", d->domain_id);
diff -r ecb8ff1fcf1f xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/domain.h Tue Jul 18 13:43:27 2006 +0100
@@ -55,7 +55,7 @@ extern void toggle_guest_mode(struct vcp
* Initialise a hypercall-transfer page. The given pointer must be mapped
* in Xen virtual address space (accesses are not validated or checked).
*/
-extern void hypercall_page_initialise(void *);
+extern void hypercall_page_initialise(struct domain *d, void *);
struct arch_domain
{
diff -r ecb8ff1fcf1f xen/include/asm-x86/guest_access.h
--- a/xen/include/asm-x86/guest_access.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/guest_access.h Tue Jul 18 13:43:27 2006 +0100
@@ -8,6 +8,8 @@
#define __ASM_X86_GUEST_ACCESS_H__
#include <asm/uaccess.h>
+#include <asm/hvm/support.h>
+#include <asm/hvm/guest_access.h>
/* Is the guest handle a NULL reference? */
#define guest_handle_is_null(hnd) ((hnd).p == NULL)
@@ -28,6 +30,8 @@
#define copy_to_guest_offset(hnd, off, ptr, nr) ({ \
const typeof(ptr) _x = (hnd).p; \
const typeof(ptr) _y = (ptr); \
+ hvm_guest(current) ? \
+ copy_to_user_hvm(_x+(off), _y, sizeof(*_x)*(nr)) : \
copy_to_user(_x+(off), _y, sizeof(*_x)*(nr)); \
})
@@ -38,6 +42,8 @@
#define copy_from_guest_offset(ptr, hnd, off, nr) ({ \
const typeof(ptr) _x = (hnd).p; \
const typeof(ptr) _y = (ptr); \
+ hvm_guest(current) ? \
+ copy_from_user_hvm(_y, _x+(off), sizeof(*_x)*(nr)) :\
copy_from_user(_y, _x+(off), sizeof(*_x)*(nr)); \
})
@@ -45,6 +51,8 @@
#define copy_field_to_guest(hnd, ptr, field) ({ \
const typeof(&(ptr)->field) _x = &(hnd).p->field; \
const typeof(&(ptr)->field) _y = &(ptr)->field; \
+ hvm_guest(current) ? \
+ copy_to_user_hvm(_x, _y, sizeof(*_x)) : \
copy_to_user(_x, _y, sizeof(*_x)); \
})
@@ -52,6 +60,8 @@
#define copy_field_from_guest(ptr, hnd, field) ({ \
const typeof(&(ptr)->field) _x = &(hnd).p->field; \
const typeof(&(ptr)->field) _y = &(ptr)->field; \
+ hvm_guest(current) ? \
+ copy_from_user_hvm(_y, _x, sizeof(*_x)) : \
copy_from_user(_y, _x, sizeof(*_x)); \
})
@@ -60,29 +70,37 @@
* Allows use of faster __copy_* functions.
*/
#define guest_handle_okay(hnd, nr) \
- array_access_ok((hnd).p, (nr), sizeof(*(hnd).p))
+ (hvm_guest(current) || array_access_ok((hnd).p, (nr), sizeof(*(hnd).p)))
#define __copy_to_guest_offset(hnd, off, ptr, nr) ({ \
const typeof(ptr) _x = (hnd).p; \
const typeof(ptr) _y = (ptr); \
+ hvm_guest(current) ? \
+ copy_to_user_hvm(_x+(off), _y, sizeof(*_x)*(nr)) : \
__copy_to_user(_x+(off), _y, sizeof(*_x)*(nr)); \
})
#define __copy_from_guest_offset(ptr, hnd, off, nr) ({ \
const typeof(ptr) _x = (hnd).p; \
const typeof(ptr) _y = (ptr); \
+ hvm_guest(current) ? \
+ copy_from_user_hvm(_y, _x+(off),sizeof(*_x)*(nr)) : \
__copy_from_user(_y, _x+(off), sizeof(*_x)*(nr)); \
})
#define __copy_field_to_guest(hnd, ptr, field) ({ \
const typeof(&(ptr)->field) _x = &(hnd).p->field; \
const typeof(&(ptr)->field) _y = &(ptr)->field; \
+ hvm_guest(current) ? \
+ copy_to_user_hvm(_x, _y, sizeof(*_x)) : \
__copy_to_user(_x, _y, sizeof(*_x)); \
})
#define __copy_field_from_guest(ptr, hnd, field) ({ \
const typeof(&(ptr)->field) _x = &(hnd).p->field; \
const typeof(&(ptr)->field) _y = &(ptr)->field; \
+ hvm_guest(current) ? \
+ copy_from_user_hvm(_y, _x, sizeof(*_x)) : \
__copy_from_user(_y, _x, sizeof(*_x)); \
})
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/domain.h
--- a/xen/include/asm-x86/hvm/domain.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/domain.h Tue Jul 18 13:43:27 2006 +0100
@@ -27,17 +27,15 @@
#include <asm/hvm/vpit.h>
#include <asm/hvm/vlapic.h>
#include <asm/hvm/vioapic.h>
+#include <public/hvm/params.h>
#define HVM_PBUF_SIZE 80
struct hvm_domain {
unsigned long shared_page_va;
- unsigned int nr_vcpus;
- unsigned int apic_enabled;
- unsigned int pae_enabled;
s64 tsc_frequency;
struct pl_time pl_time;
-
+
struct hvm_virpic vpic;
struct hvm_vioapic vioapic;
struct hvm_io_handler io_handler;
@@ -48,6 +46,8 @@ struct hvm_domain {
int pbuf_index;
char pbuf[HVM_PBUF_SIZE];
+
+ unsigned long params[HVM_NR_PARAMS];
};
#endif /* __ASM_X86_HVM_DOMAIN_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/hvm.h
--- a/xen/include/asm-x86/hvm/hvm.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/hvm.h Tue Jul 18 13:43:27 2006 +0100
@@ -61,6 +61,8 @@ struct hvm_function_table {
void (*init_ap_context)(struct vcpu_guest_context *ctxt,
int vcpuid, int trampoline_vector);
+
+ void (*init_hypercall_page)(struct domain *d, void *hypercall_page);
};
extern struct hvm_function_table hvm_funcs;
@@ -75,12 +77,20 @@ hvm_disable(void)
hvm_funcs.disable();
}
+void hvm_create_event_channels(struct vcpu *v);
+void hvm_map_io_shared_page(struct vcpu *v);
+
static inline int
hvm_initialize_guest_resources(struct vcpu *v)
{
- if ( hvm_funcs.initialize_guest_resources )
- return hvm_funcs.initialize_guest_resources(v);
- return 0;
+ int ret = 1;
+ if (hvm_funcs.initialize_guest_resources)
+ ret = hvm_funcs.initialize_guest_resources(v);
+ if (ret == 1) {
+ hvm_map_io_shared_page(v);
+ hvm_create_event_channels(v);
+ }
+ return ret;
}
static inline void
@@ -121,6 +131,9 @@ hvm_instruction_length(struct vcpu *v)
return hvm_funcs.instruction_length(v);
}
+void hvm_hypercall_page_initialise(struct domain *d,
+ void *hypercall_page);
+
static inline unsigned long
hvm_get_guest_ctrl_reg(struct vcpu *v, unsigned int num)
{
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/io.h
--- a/xen/include/asm-x86/hvm/io.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/io.h Tue Jul 18 13:43:27 2006 +0100
@@ -150,14 +150,14 @@ static inline int irq_masked(unsigned lo
#endif
extern void handle_mmio(unsigned long, unsigned long);
-extern void hvm_wait_io(void);
-extern void hvm_safe_block(void);
extern void hvm_io_assist(struct vcpu *v);
extern void pic_irq_request(void *data, int level);
extern void hvm_pic_assist(struct vcpu *v);
extern int cpu_get_interrupt(struct vcpu *v, int *type);
extern int cpu_has_pending_irq(struct vcpu *v);
+void hvm_release_assist_channel(struct vcpu *v);
+
// XXX - think about this, maybe use bit 30 of the mfn to signify an MMIO frame.
#define mmio_space(gpa) (!VALID_MFN(get_mfn_from_gpfn((gpa) >> PAGE_SHIFT)))
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/support.h
--- a/xen/include/asm-x86/hvm/support.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/support.h Tue Jul 18 13:43:27 2006 +0100
@@ -42,11 +42,6 @@ static inline vcpu_iodata_t *get_vio(str
static inline vcpu_iodata_t *get_vio(struct domain *d, unsigned long cpu)
{
return &get_sp(d)->vcpu_iodata[cpu];
-}
-
-static inline int iopacket_port(struct vcpu *v)
-{
- return get_vio(v->domain, v->vcpu_id)->vp_eport;
}
/* XXX these are really VMX specific */
@@ -148,4 +143,9 @@ extern void hvm_print_line(struct vcpu *
extern void hvm_print_line(struct vcpu *v, const char c);
extern void hlt_timer_fn(void *data);
+void hvm_prod_vcpu(struct vcpu *v);
+void hvm_assist_complete(struct vcpu *v);
+
+void hvm_do_hypercall(struct cpu_user_regs *pregs);
+
#endif /* __ASM_X86_HVM_SUPPORT_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/svm/vmmcall.h
--- a/xen/include/asm-x86/hvm/svm/vmmcall.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/svm/vmmcall.h Tue Jul 18 13:43:27 2006 +0100
@@ -23,11 +23,11 @@
#define __ASM_X86_HVM_SVM_VMMCALL_H__
/* VMMCALL command fields */
-#define VMMCALL_CODE_CPL_MASK 0xC0000000
-#define VMMCALL_CODE_MBZ_MASK 0x3FFF0000
+#define VMMCALL_CODE_CPL_MASK 0x60000000
+#define VMMCALL_CODE_MBZ_MASK 0x1FFF0000
#define VMMCALL_CODE_COMMAND_MASK 0x0000FFFF
-#define MAKE_VMMCALL_CODE(cpl,func) ((cpl << 30) | (func))
+#define MAKE_VMMCALL_CODE(cpl,func) ((cpl << 29) | (func) | 0x80000000)
/* CPL=0 VMMCALL Requests */
#define VMMCALL_RESET_TO_REALMODE MAKE_VMMCALL_CODE(0,1)
@@ -38,7 +38,7 @@
/* return the cpl required for the vmmcall cmd */
static inline int get_vmmcall_cpl(int cmd)
{
- return (cmd & VMMCALL_CODE_CPL_MASK) >> 30;
+ return (cmd & VMMCALL_CODE_CPL_MASK) >> 29;
}
#endif /* __ASM_X86_HVM_SVM_VMMCALL_H__ */
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/vcpu.h
--- a/xen/include/asm-x86/hvm/vcpu.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/hvm/vcpu.h Tue Jul 18 13:43:27 2006 +0100
@@ -38,6 +38,8 @@ struct hvm_vcpu {
/* For AP startup */
unsigned long init_sipi_sipi_state;
+ int xen_port;
+
/* Flags */
int flag_dr_dirty;
diff -r ecb8ff1fcf1f xen/include/asm-x86/shadow.h
--- a/xen/include/asm-x86/shadow.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/asm-x86/shadow.h Tue Jul 18 13:43:27 2006 +0100
@@ -1733,6 +1733,32 @@ static inline unsigned long gva_to_gpa(u
return l1e_get_paddr(gpte) + (gva & ~PAGE_MASK);
}
+
+static inline unsigned long gva_to_mfn(unsigned long gva)
+{
+ l1_pgentry_t l1e;
+
+ if (__copy_from_user(&l1e, &shadow_linear_pg_table[l1_linear_offset(gva)],
+ sizeof(l1e)) ||
+ (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
+ (_PAGE_PRESENT | _PAGE_RW) ) {
+ struct cpu_user_regs cur;
+ /* Error code -> write */
+ cur.error_code = 3;
+ cur.cs = 0; /* Ring 0 -> hypervisor */
+ cur.eflags = 0;
+ shadow_fault(gva, &cur);
+ if (__copy_from_user(&l1e,
+ &shadow_linear_pg_table[l1_linear_offset(gva)],
+ sizeof(l1e)) ||
+ (l1e_get_flags(l1e) & (_PAGE_PRESENT | _PAGE_RW)) !=
+ (_PAGE_PRESENT | _PAGE_RW) ) {
+ return 0;
+ }
+ }
+ return l1e_get_pfn(l1e);
+}
+
#endif
/************************************************************************/
diff -r ecb8ff1fcf1f xen/include/public/hvm/ioreq.h
--- a/xen/include/public/hvm/ioreq.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/public/hvm/ioreq.h Tue Jul 18 13:43:27 2006 +0100
@@ -27,7 +27,6 @@
#define STATE_IOREQ_READY 1
#define STATE_IOREQ_INPROCESS 2
#define STATE_IORESP_READY 3
-#define STATE_IORESP_HOOK 4
#define IOREQ_TYPE_PIO 0 /* pio */
#define IOREQ_TYPE_COPY 1 /* mmio ops */
@@ -67,10 +66,8 @@ typedef struct global_iodata global_ioda
typedef struct global_iodata global_iodata_t;
struct vcpu_iodata {
- struct ioreq vp_ioreq;
- /* Event channel port */
- unsigned int vp_eport; /* VMX vcpu uses this to notify DM */
- unsigned int dm_eport; /* DM uses this to notify VMX vcpu */
+ ioreq_t vp_ioreq;
+ int vp_xen_port;
};
typedef struct vcpu_iodata vcpu_iodata_t;
diff -r ecb8ff1fcf1f xen/include/public/xen.h
--- a/xen/include/public/xen.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/public/xen.h Tue Jul 18 13:43:27 2006 +0100
@@ -66,6 +66,7 @@
#define __HYPERVISOR_xenoprof_op 31
#define __HYPERVISOR_event_channel_op 32
#define __HYPERVISOR_physdev_op 33
+#define __HYPERVISOR_hvm_op 34
/* Architecture-specific hypercall definitions. */
#define __HYPERVISOR_arch_0 48
diff -r ecb8ff1fcf1f xen/include/xen/event.h
--- a/xen/include/xen/event.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/event.h Tue Jul 18 13:43:27 2006 +0100
@@ -44,4 +44,10 @@ extern long evtchn_send(unsigned int lpo
/* Bind a local event-channel port to the specified VCPU. */
extern long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id);
+int alloc_xen_event_channel(void (*f)(void *d),
+ void *data,
+ struct domain *d);
+void release_xen_event_channel(int ind);
+void notify_xen_event_channel(int port);
+
#endif /* __XEN_EVENT_H__ */
diff -r ecb8ff1fcf1f xen/include/xen/hypercall.h
--- a/xen/include/xen/hypercall.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/hypercall.h Tue Jul 18 13:43:27 2006 +0100
@@ -87,4 +87,9 @@ do_nmi_op(
unsigned int cmd,
XEN_GUEST_HANDLE(void) arg);
+extern long
+do_hvm_op(
+ unsigned long op,
+ XEN_GUEST_HANDLE(void) arg);
+
#endif /* __XEN_HYPERCALL_H__ */
diff -r ecb8ff1fcf1f xen/include/xen/sched.h
--- a/xen/include/xen/sched.h Fri Jul 14 18:53:27 2006 +0100
+++ b/xen/include/xen/sched.h Tue Jul 18 13:43:27 2006 +0100
@@ -36,6 +36,7 @@ struct evtchn
#define ECS_PIRQ 4 /* Channel is bound to a physical IRQ line. */
#define ECS_VIRQ 5 /* Channel is bound to a virtual IRQ line. */
#define ECS_IPI 6 /* Channel is bound to a virtual IPI line. */
+#define ECS_XEN 7 /* Channel ends in Xen */
u16 state; /* ECS_* */
u16 notify_vcpu_id; /* VCPU for local delivery notification */
union {
@@ -48,6 +49,7 @@ struct evtchn
} interdomain; /* state == ECS_INTERDOMAIN */
u16 pirq; /* state == ECS_PIRQ */
u16 virq; /* state == ECS_VIRQ */
+ int xen_port; /* state == ECS_XEN */
} u;
};
diff -r ecb8ff1fcf1f tools/ioemu/hw/xen_evtchn.c
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/ioemu/hw/xen_evtchn.c Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,160 @@
+/*
+ * XEN event channel fake PCI device
+ *
+ * Copyright (c) 2003-2004 Intel Corp.
+ * Copyright (c) 2006 XenSource
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include "vl.h"
+
+#include <xenguest.h>
+#include <xc_private.h>
+
+extern FILE *logfile;
+
+extern int domid;
+extern int xc_handle;
+
+static unsigned ioport_base;
+
+static void evtchn_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+{
+ DECLARE_DOM0_OP;
+ int rc;
+
+ switch (addr - ioport_base) {
+ case 0:
+ fprintf(logfile, "Init hypercall page %x, addr %x.\n", val, addr);
+ op.u.hypercall_init.domain = domid;
+ op.u.hypercall_init.gmfn = val;
+ op.cmd = DOM0_HYPERCALL_INIT;
+ rc = xc_dom0_op(xc_handle, &op);
+ fprintf(logfile, "result -> %d.\n", rc);
+ break;
+ default:
+ fprintf(logfile, "Write to bad port %x (base %x) on evtchn device.\n",
+ addr, ioport_base);
+ break;
+ }
+}
+
+static uint32_t evtchn_ioport_read(void *opaque, uint32_t addr)
+{
+ return 0;
+}
+
+static void evtchn_map(PCIDevice *pci_dev, int region_num,
+ uint32_t addr, uint32_t size, int type)
+{
+ ioport_base = addr;
+ register_ioport_write(addr, 16, 4, evtchn_ioport_write, NULL);
+ register_ioport_read(addr, 16, 1, evtchn_ioport_read, NULL);
+}
+
+static uint32_t xen_mmio_read(void *opaque, target_phys_addr_t addr)
+{
+ fprintf(logfile, "Warning: try read from evtchn mmio space\n");
+ return 0;
+}
+
+static void xen_mmio_write(void *opaque, target_phys_addr_t addr,
+ uint32_t val)
+{
+ fprintf(logfile, "Warning: try write to evtchn mmio space\n");
+ return;
+}
+
+static CPUReadMemoryFunc *xen_evtchn_mmio_read[3] = {
+ xen_mmio_read,
+ xen_mmio_read,
+ xen_mmio_read,
+};
+
+static CPUWriteMemoryFunc *xen_evtchn_mmio_write[3] = {
+ xen_mmio_write,
+ xen_mmio_write,
+ xen_mmio_write,
+};
+
+static void xen_evtchn_pci_mmio_map(PCIDevice *d, int region_num,
+ uint32_t addr, uint32_t size, int type)
+{
+ int mmio_io_addr;
+
+ mmio_io_addr = cpu_register_io_memory(0,
+ xen_evtchn_mmio_read,
+ xen_evtchn_mmio_write, NULL);
+
+ cpu_register_physical_memory(addr, 0x1000000, mmio_io_addr);
+}
+
+struct pci_config_header {
+ unsigned short vendor_id;
+ unsigned short device_id;
+ unsigned short command;
+ unsigned short status;
+ unsigned char revision;
+ unsigned char api;
+ unsigned char subclass;
+ unsigned char class;
+ unsigned char cache_line_size; /* Units of 32 bit words */
+ unsigned char latency_timer; /* In units of bus cycles */
+ unsigned char header_type; /* Should be 0 */
+ unsigned char bist; /* Built in self test */
+ unsigned long base_address_regs[6];
+ unsigned long reserved1;
+ unsigned long reserved2;
+ unsigned long rom_addr;
+ unsigned long reserved3;
+ unsigned long reserved4;
+ unsigned char interrupt_line;
+ unsigned char interrupt_pin;
+ unsigned char min_gnt;
+ unsigned char max_lat;
+};
+
+void pci_xen_evtchn_init(PCIBus *bus)
+{
+ PCIDevice *d;
+ struct pci_config_header *pch;
+
+ printf("Register xen evtchn.\n");
+ d = pci_register_device(bus, "xen-evtchn", sizeof(PCIDevice), -1, NULL,
+ NULL);
+ pch = (struct pci_config_header *)d->config;
+ pch->vendor_id = 0xfffd;
+ pch->device_id = 0x0101;
+ pch->command = 3; /* IO and memory access */
+ pch->revision = 0;
+ pch->api = 0;
+ pch->subclass = 0x80; /* Other */
+ pch->class = 0xff; /* Unclassified device class */
+ pch->header_type = 0;
+ pch->interrupt_pin = 1;
+
+ pci_register_io_region(d, 0, 0x100, PCI_ADDRESS_SPACE_IO, evtchn_map);
+
+ /* Reserve 16MB of MMIO address space for shared memory. */
+ pci_register_io_region(d, 1, 0x1000000, PCI_ADDRESS_SPACE_MEM_PREFETCH,
+ xen_evtchn_pci_mmio_map);
+
+ register_savevm("evtchn", 0, 1, generic_pci_save, generic_pci_load, d);
+ printf("Done register evtchn.\n");
+}
diff -r ecb8ff1fcf1f xen/include/asm-x86/hvm/guest_access.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/asm-x86/hvm/guest_access.h Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,7 @@
+#ifndef __ASM_X86_HVM_GUEST_ACCESS_H__
+#define __ASM_X86_HVM_GUEST_ACCESS_H__
+
+unsigned long copy_to_user_hvm(void *to, const void *from, unsigned len);
+unsigned long copy_from_user_hvm(void *to, const void *from, unsigned len);
+
+#endif /* __ASM_X86_HVM_GUEST_ACCESS_H__ */
diff -r ecb8ff1fcf1f xen/include/public/hvm/params.h
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/xen/include/public/hvm/params.h Tue Jul 18 13:43:27 2006 +0100
@@ -0,0 +1,22 @@
+#ifndef PARAMS_H__
+#define PARAMS_H__
+
+#define HVM_NR_PARAMS 4
+
+#define HVM_PARAM_CALLBACK_IRQ 0
+#define HVM_PARAM_STORE_PFN 1
+#define HVM_PARAM_STORE_EVTCHN 2
+#define HVM_PARAM_APIC_ENABLED 3
+
+#define HVMOP_set_param 0
+#define HVMOP_get_param 1
+
+struct xen_hvm_param {
+ domid_t domid;
+ unsigned index;
+ unsigned long value;
+};
+typedef struct xen_hvm_param xen_hvm_param_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_param_t);
+
+#endif /* PARAMS_H__ */
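Note on the vmmcall.h hunk earlier in this patch: the CPL field moves from bits 31:30 to bits 30:29, and bit 31 is now always set. A minimal standalone sketch of that encoding follows, assuming 32-bit command words. The constants are copied from the hunk; the helper function names and VMMCALL_SKETCH_TAG_BIT are illustrative only, and the guess that bit 31 keeps VMMCALL commands apart from small hypercall numbers is an inference, not something the patch states.

```c
#include <stdint.h>

/* Masks as they appear in the patched vmmcall.h. */
#define VMMCALL_CODE_CPL_MASK     0x60000000u
#define VMMCALL_CODE_MBZ_MASK     0x1FFF0000u
#define VMMCALL_CODE_COMMAND_MASK 0x0000FFFFu

/* Illustrative name: the patch ORs the literal 0x80000000 in directly. */
#define VMMCALL_SKETCH_TAG_BIT    0x80000000u

/* Build a command word: CPL in bits 30:29, function in bits 15:0. */
static inline uint32_t make_vmmcall_code(uint32_t cpl, uint32_t func)
{
    return (cpl << 29) | func | VMMCALL_SKETCH_TAG_BIT;
}

/* Recover the privilege level required for the command. */
static inline unsigned int get_vmmcall_cpl(uint32_t cmd)
{
    return (cmd & VMMCALL_CODE_CPL_MASK) >> 29;
}

/* Recover the function number. */
static inline unsigned int get_vmmcall_command(uint32_t cmd)
{
    return cmd & VMMCALL_CODE_COMMAND_MASK;
}
```

With this layout a CPL=0 request for function 1 encodes as 0x80000001, and the must-be-zero field (bits 28:16) stays clear for any function number that fits in 16 bits.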
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
[-- Attachment #2: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-07-18 12:51 Steven Smith
@ 2006-07-18 13:45 ` Ben Thomas
2006-07-18 16:00 ` Steve Ofsthun
2006-07-26 15:34 ` Steven Smith
2 siblings, 0 replies; 22+ messages in thread
From: Ben Thomas @ 2006-07-18 13:45 UTC (permalink / raw)
To: Steven Smith; +Cc: xen-devel, sos22
[-- Attachment #1.1: Type: text/plain, Size: 5131 bytes --]
Steven,
This is very interesting. Thanks for posting it. It appears to closely
parallel work that we've been doing here and have mentioned at times on this
list. I believe that Steve Ofsthun posted some suggested patches in this
area a little while back. I'm looking forward to seeing how your patches
address the issues raised in the feedback given to Steve.
I'm also interested to see how you resolve the 32/64-bit issues. I know that
there's some sensitivity here, as one of the pieces of feedback about our
recent XI shadow posting was the current lack of 32-bit support. A 64-bit
hypervisor on an HVM-capable machine should be capable of concurrent support
of both 32- and 64-bit guest domains.
Thanks for posting this. We're looking forward to seeing your final
submissions.
Thanks!
-b
On 7/18/06, Steven Smith <sos22-xen@srcf.ucam.org> wrote:
>
> (The list appears to have eaten my previous attempt to send this.
> Apologies if you receive multiple copies.)
>
> The attached patches allow you to use paravirtualised network and
> block interfaces from fully virtualised domains, based on Intel's
> patches from a few months ago. These are significantly faster than
> the equivalent ioemu devices, sometimes by more than an order of
> magnitude.
>
> These drivers are explicitly not considered by XenSource to be an
> alternative to improving the performance of the ioemu devices.
> Rather, work on both will continue in parallel.
>
> To build, apply the three patches to a clean checkout of xen-unstable
> and then build Xen, dom0, and the tools in the usual way. To build
> the drivers themselves, you first need to build a native kernel for
> the guest, and then go
>
> cd xen-unstable.hg/unmodified-drivers/linux-2.6
> ./mkbuildtree
> make -C /usr/src/linux-2.6.16 M=$PWD modules
>
> where /usr/src/linux-2.6.16 is the path to the area where you built
> the guest kernel. This should be a native kernel, and not a xenolinux
> one. You should end up with four modules. xen-evtchn.ko should be
> loaded first, followed by xenbus.ko, and then whichever of xen-vnif.ko
> and xen-vbd.ko you need. None of the modules need any arguments.
>
> The xm configuration syntax is exactly the same as it would be for
> paravirtualised devices in a paravirtualised domain. For a network
> interface, you take your line
>
> vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78' ]
>
> (or whatever) and replace it with
>
> vif= [ 'type=ioemu,mac=00:16:3E:C1:CA:78', 'bridge=xenbr0' ]
>
> where bridge=xenbr0 should be some suitable netif configuration
> string, as it would be in the PV-on-PV case. Disk is likewise fairly
> simple:
>
> disk = [ 'file:/path/to/image,ioemu:hda,w' ]
>
> becomes
>
> disk = [ 'file:/path/to/image,ioemu:hda,w',
> 'file:/path/to/some/other/image,hde,w' ]
>
> There is a slight complication in that the paravirtualised block
> device can't share an IDE controller with an ioemu device, so if you
> have an ioemu hda, the paravirtualised device must be hde or later.
> This is to avoid confusing the Linux IDE driver.
>
> Note that having a PV device doesn't imply having a corresponding
> ioemu device, and vice versa. Configuring a single backing store to
> appear as both an IDE device and a paravirtualised block device is
> likely to cause problems; don't do it.
>
>
>
> The patches consist of a number of big parts:
>
> -- A version of netback and netfront which can copy packets into
> domains rather than doing page flipping. It's much easier to make
> this work well with qemu, since the P2M table doesn't need to
> change, and it can be faster for some workloads.
>
> The copying interface has been confirmed to work in paravirtualised
> domains, but is currently disabled there.
>
> -- Reworking the device model and hypervisor support so that iorequest
> completion notifications no longer go to the HVM guest's event
> channel mask. This avoids a whole slew of really quite nasty race
> conditions
>
> -- Adding a new device to the qemu PCI bus which is used for
> bootstrapping the devices and getting an IRQ.
>
> -- Support for hypercalls from HVM domains
>
> -- Various shims and fixes to the frontends so that they work without
> the rest of the xenolinux infrastructure.
>
> The patches still have a few rough edges, and they're not as easy to
> understand as I'd like, but I think they should be mostly
> comprehensible and reasonably stable. The plan is to add them to
> xen-unstable over the next few weeks, probably before 3.0.3, so any
> testing which anyone can do would be helpful.
>
> The Xen and tools changes are also available as a series of smaller
> patches at http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/hvm_xen . The
> composition of these gives hvm_xen_unstable.diff.
>
> Steven.
[-- Attachment #1.2: Type: text/html, Size: 6010 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-07-18 12:51 Steven Smith
2006-07-18 13:45 ` Ben Thomas
@ 2006-07-18 16:00 ` Steve Ofsthun
2006-07-18 16:23 ` Mark Williamson
2006-07-18 20:34 ` Steven Smith
2006-07-26 15:34 ` Steven Smith
2 siblings, 2 replies; 22+ messages in thread
From: Steve Ofsthun @ 2006-07-18 16:00 UTC (permalink / raw)
To: Steven Smith; +Cc: xen-devel, sos22
Steven Smith wrote:
> The attached patches allow you to use paravirtualised network and
> block interfaces from fully virtualised domains, based on Intel's
> patches from a few months ago. These are significantly faster than
> the equivalent ioemu devices, sometimes by more than an order of
> magnitude.
Excellent work Steven!
I've been working on a similar set of patches and your effort seems
quite comprehensive. I do have a few questions:
Can you comment on the testing matrix you used? In particular, does
this patch address both 32-bit and 64-bit hypervisors? Can 32-bit
guests make 64-bit hypercalls?
Have you built the guest environment on anything other than a 2.6.16
version of Linux? We ran into extra work supporting older Linux versions.
You did some work to make xenbus a loadable module in the guest domains.
Can this be used to make xenbus loadable in Domain 0?
> These drivers are explicitly not considered by XenSource to be an
> alternative to improving the performance of the ioemu devices.
> Rather, work on both will continue in parallel.
I agree. Both activities are worth developing.
> There is a slight complication in that the paravirtualised block
> device can't share an IDE controller with an ioemu device, so if you
> have an ioemu hda, the paravirtualised device must be hde or later.
> This is to avoid confusing the Linux IDE driver.
>
> Note that having a PV device doesn't imply having a corresponding
> ioemu device, and vice versa. Configuring a single backing store to
> appear as both an IDE device and a paravirtualised block device is
> likely to cause problems; don't do it.
Several problems exist here:
Domain 0 buffer cache coherency issues can cause catastrophic file
system corruption. This is due to the backend accessing the backing
device directly, and QEMU accessing the device through buffered reads
and writes. We are working on a patch to convert QEMU to use O_DIRECT
whenever possible. This solves the cache coherency issue.
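To illustrate the O_DIRECT idea, here is a minimal sketch, assuming Linux and a 4096-byte alignment. It is not the QEMU patch mentioned above (which is not part of this thread); open_uncached(), read_block() and SKETCH_BLOCK are hypothetical names. O_DIRECT requires the buffer, file offset and length to be suitably aligned, and some filesystems reject the flag with EINVAL, so the sketch falls back to buffered I/O in that case.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* asks glibc to expose O_DIRECT */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* not available: degrade to buffered I/O */
#endif

#define SKETCH_BLOCK 4096      /* assumed alignment and transfer size */

/*
 * Open a backing file for reading, bypassing the page cache when the
 * filesystem allows it. *is_direct reports which mode was obtained.
 */
static int open_uncached(const char *path, int *is_direct)
{
    int fd = open(path, O_RDONLY | O_DIRECT);

    *is_direct = (fd >= 0 && O_DIRECT != 0);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);      /* buffered fallback */
    return fd;
}

/*
 * Read block number blk into a freshly allocated, aligned buffer.
 * Returns the byte count from pread(), or -1 on failure. On success
 * the caller owns *out and must free() it.
 */
static ssize_t read_block(int fd, off_t blk, void **out)
{
    void *buf = NULL;
    ssize_t n;

    if (posix_memalign(&buf, SKETCH_BLOCK, SKETCH_BLOCK) != 0)
        return -1;
    n = pread(fd, buf, SKETCH_BLOCK, blk * (off_t)SKETCH_BLOCK);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}
```

Because reads opened this way go to the device rather than to Domain 0's buffer cache, they should see what the backend wrote directly. Writes would need the same alignment treatment and are left out of the sketch.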
Actually presenting two copies of the same device to Linux can cause
its own problems. Mounting using LABEL= will complain about duplicate
labels. However, using the device names directly seems to work. With
this approach it is possible to decide in the guest whether to mount
a device as an emulated disk or a PV disk.
> The patches consist of a number of big parts:
>
> -- A version of netback and netfront which can copy packets into
> domains rather than doing page flipping. It's much easier to make
> this work well with qemu, since the P2M table doesn't need to
> change, and it can be faster for some workloads.
Recent patches to change QEMU to dynamically map memory may make this
easier. We still avoid it to prevent large guest pages from being
broken up (under the XI shadow code).
> The copying interface has been confirmed to work in paravirtualised
> domains, but is currently disabled there.
>
> -- Reworking the device model and hypervisor support so that iorequest
> completion notifications no longer go to the HVM guest's event
> channel mask. This avoids a whole slew of really quite nasty race
> conditions
This is great news. We were filtering iorequest bits out during guest
event notification delivery. Your method is much cleaner.
> -- Adding a new device to the qemu PCI bus which is used for
> bootstrapping the devices and getting an IRQ.
Have you thought about supporting more than one IRQ? We are experimenting
with an IRQ per device class (BUS, NIC, VBD).
> -- Support for hypercalls from HVM domains
>
> -- Various shims and fixes to the frontends so that they work without
> the rest of the xenolinux infrastructure.
>
> The patches still have a few rough edges, and they're not as easy to
> understand as I'd like, but I think they should be mostly
> comprehensible and reasonably stable. The plan is to add them to
> xen-unstable over the next few weeks, probably before 3.0.3, so any
> testing which anyone can do would be helpful.
This is a very good start!
Steve
--
Steve Ofsthun - Virtual Iron Software, Inc.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-07-18 16:00 ` Steve Ofsthun
@ 2006-07-18 16:23 ` Mark Williamson
2006-07-18 20:34 ` Steven Smith
1 sibling, 0 replies; 22+ messages in thread
From: Mark Williamson @ 2006-07-18 16:23 UTC (permalink / raw)
To: xen-devel; +Cc: Steve Ofsthun, sos22
> > These drivers are explicitly not considered by XenSource to be an
> > alternative to improving the performance of the ioemu devices.
> > Rather, work on both will continue in parallel.
>
> I agree. Both activities are worth developing.
There's lots still to be done to make the ioemu devices work better. Even if
some users wish to use PV drivers directly, others will still want the
simplicity of working "out of the box".
> Actually presenting two copies of the same device to linux can cause
> its own problems. Mounting using LABEL= will complain about duplicate
> labels. However, using the device names directly seems to work. With
> this approach it is possible to decide in the guest whether to mount
> a device as an emulated disk or a PV disk.
We should *really* have interlocks in dom0 to prevent a guest from accessing
both simultaneously :-)
Initially, we could just allow the user to configure a device as one model or
the other, not both (using a check in Xend, as we do for mounted partitions,
etc.). To support what you propose we'd probably have to add a little control
plane stuff, but I think it'd be worth it to avoid too many people damaging
stuff!
To mangle a quote I once saw online: duplicate device access can be used to
hunt both foot and game, but only one will feed your family.
Cheers,
Mark
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Paravirtualised drivers for fully virtualised domains
2006-07-18 16:00 ` Steve Ofsthun
2006-07-18 16:23 ` Mark Williamson
@ 2006-07-18 20:34 ` Steven Smith
2006-07-18 23:24 ` Steve Ofsthun
1 sibling, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-07-18 20:34 UTC (permalink / raw)
To: Steve Ofsthun; +Cc: xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 5367 bytes --]
> >The attached patches allow you to use paravirtualised network and
> >block interfaces from fully virtualised domains, based on Intel's
> >patches from a few months ago. These are significantly faster than
> >the equivalent ioemu devices, sometimes by more than an order of
> >magnitude.
> I've been working on a similar set of patches and your effort seems
> quite comprehensive.
Yeah, we (XenSource and Virtual Iron) really need to do a better job
of coordinating who's working on what. :)
> I do have a few questions:
>
> Can you comment on the testing matrix you used? In particular, does
> this patch address both 32-bit and 64-bit hypervisors? Can 32-bit
> guests make 64-bit hypercalls?
This set of patches only deals with the 32-bit case. Further, the PAE
case depends on Tim Deegan's new shadow mode posted last week.
Sorry, I should have said that in the initial post.
> Have you built the guest environment on anything other than a 2.6.16
> version of Linux? We ran into extra work supporting older linux versions.
#ifdef soup will get you back to about 2.6.12-ish without too many
problems. These patches don't include that, since it would complicate
merging.
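For readers unfamiliar with the "#ifdef soup" pattern, a minimal sketch follows. It is not taken from the patches; compat_path() and the 2.6.12 stand-in value are hypothetical, chosen so the sketch builds outside a kernel tree. The KERNEL_VERSION packing matches the macro the kernel provides in <linux/version.h>.

```c
/* Same packing the kernel uses in <linux/version.h>. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif

/*
 * Assumption for this sketch only: pretend we are building against
 * 2.6.12. In a real module this value comes from <linux/version.h>.
 */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(2, 6, 12)
#endif

#if LINUX_VERSION_CODE < KERNEL_VERSION(2, 6, 16)
/* Older kernel: route through a (hypothetical) compatibility shim. */
static const char *compat_path(void)
{
    return "compat shim for pre-2.6.16 kernels";
}
#else
/* 2.6.16 or later: use the interface the drivers were written for. */
static const char *compat_path(void)
{
    return "native 2.6.16+ interface";
}
#endif
```

Each kernel interface that changed between releases needs its own block like this, which is why carrying the shims would complicate merging.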
> You did some work to make xenbus a loadable module in the guest domains.
> Can this be used to make xenbus loadable in Domain 0?
I can't see any immediate reason why not, but it's not clear to me why
that would be useful.
> >There is a slight complication in that the paravirtualised block
> >device can't share an IDE controller with an ioemu device, so if you
> >have an ioemu hda, the paravirtualised device must be hde or later.
> >This is to avoid confusing the Linux IDE driver.
> >
> >Note that having a PV device doesn't imply having a corresponding
> >ioemu device, and vice versa. Configuring a single backing store to
> >appear as both an IDE device and a paravirtualised block device is
> >likely to cause problems; don't do it.
> Domain 0 buffer cache coherency issues can cause catastrophic file
> system corruption. This is due to the backend accessing the backing
> device directly, and QEMU accessing the device through buffered
> reads and writes. We are working on a patch to convert QEMU to use
> O_DIRECT whenever possible. This solves the cache coherency issue.
I wasn't aware of these issues. I was much more worried about domU
trying to cache the devices twice, and those caches getting out of
sync. It's pretty much the usual problem of configuring a device into
two domains and then having them trip over each other. Do you have a
plan for dealing with this?
> Actually presenting two copies of the same device to linux can cause
> its own problems. Mounting using LABEL= will complain about duplicate
> labels. However, using the device names directly seems to work. With
> this approach it is possible to decide in the guest whether to mount
> a device as an emulated disk or a PV disk.
My plan here was to just not support VMs which mix paravirtualised and
ioemulated devices, requiring the user to load the PV drivers from an
initrd. Of course, you have to load the initrd somehow, but the
bootloader should only be reading the disk, which makes the coherency
issues much easier. As a last resort, rombios could learn about the
PV devices, but I'd rather avoid that if possible.
Your way would be preferable, though, if it works.
> >The patches consist of a number of big parts:
> >
> >-- A version of netback and netfront which can copy packets into
> > domains rather than doing page flipping. It's much easier to make
> > this work well with qemu, since the P2M table doesn't need to
> > change, and it can be faster for some workloads.
> Recent patches to change QEMU to dynamically map memory may make this
> easier.
Yes, agreed. It should be possible to add this in later in a
backwards-compatible fashion.
> >-- Reworking the device model and hypervisor support so that iorequest
> > completion notifications no longer go to the HVM guest's event
> > channel mask. This avoids a whole slew of really quite nasty race
> > conditions
> This is great news. We were filtering iorequest bits out during guest
> event notification delivery. Your method is much cleaner.
Thank you.
> >-- Adding a new device to the qemu PCI bus which is used for
> > bootstrapping the devices and getting an IRQ.
> Have you thought about supporting more than one IRQ. We are experimenting
> with an IRQ per device class (BUS, NIC, VBD).
I considered it, but it wasn't obvious that there would be much
benefit. You can potentially scan a smaller part of the pending event
channel mask, but that's fairly quick already.
Steven.
> >-- Support for hypercalls from HVM domains
> >
> >-- Various shims and fixes to the frontends so that they work without
> > the rest of the xenolinux infrastructure.
> >
> >The patches still have a few rough edges, and they're not as easy to
> >understand as I'd like, but I think they should be mostly
> >comprehensible and reasonably stable. The plan is to add them to
> >xen-unstable over the next few weeks, probably before 3.0.3, so any
> >testing which anyone can do would be helpful.
>
> This is a very good start!
>
> Steve
> --
> Steve Ofsthun - Virtual Iron Software, Inc.
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
[-- Attachment #2: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: Paravirtualised drivers for fully virtualised domains
2006-07-18 20:34 ` Steven Smith
@ 2006-07-18 23:24 ` Steve Ofsthun
2006-07-19 6:50 ` Gerd Hoffmann
0 siblings, 1 reply; 22+ messages in thread
From: Steve Ofsthun @ 2006-07-18 23:24 UTC (permalink / raw)
To: Steven Smith; +Cc: xen-devel
Steven Smith wrote:
>>Have you built the guest environment on anything other than a 2.6.16
>>version of Linux? We ran into extra work supporting older linux versions.
>
> #ifdef soup will get you back to about 2.6.12-ish without too many
> problems. These patches don't include that, since it would complicate
> merging.
I was thinking about SLES9 (2.6.5), RHEL4 (2.6.9), RHEL3 (2.4.21).
>>You did some work to make xenbus a loadable module in the guest domains.
>>Can this be used to make xenbus loadable in Domain 0?
>
> I can't see any immediate reason why not, but it's not clear to me why
> that would be useful.
It just makes it easier to insert alternate bus implementations.
>>Domain 0 buffer cache coherency issues can cause catastrophic file
>>system corruption. This is due to the backend accessing the backing
>>device directly, and QEMU accessing the device through buffered
>>reads and writes. We are working on a patch to convert QEMU to use
>>O_DIRECT whenever possible. This solves the cache coherency issue.
>
> I wasn't aware of these issues. I was much more worried about domU
> trying to cache the devices twice, and those caches getting out of
> sync. It's pretty much the usual problem of configuring a device into
> two domains and then having them trip over each other. Do you have a
> plan for dealing with this?
We eliminate any buffer cache use in domain 0 for backing store objects.
This prevents double caching and reduces domain 0's memory footprint.
We don't restrict multiple domain access to the same "raw" backing
object. Real hardware allows this (at least for SCSI/FC). This may be
necessary for shared storage clustering.
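
As a rough illustration of opening a backing object without going through the domain 0 buffer cache, here is a minimal sketch. It is not the QEMU patch mentioned above; the helper names and the 4096-byte alignment are assumptions made for the example, and the real alignment requirement depends on the device and filesystem. The fallback to buffered I/O mirrors the "whenever possible" wording.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* not exposed here: degrade to buffered I/O */
#endif

#define DEMO_ALIGN 4096         /* assumed alignment, hypothetical value */

/* Hypothetical helper: try O_DIRECT first, fall back to buffered access
 * if the filesystem rejects it with EINVAL. */
static int demo_open_backing(const char *path, int *is_direct)
{
    int fd = open(path, O_RDWR | O_DIRECT);
    if (fd >= 0) {
        *is_direct = (O_DIRECT != 0);
        return fd;
    }
    if (errno != EINVAL)
        return -1;
    *is_direct = 0;
    return open(path, O_RDWR);
}

/* Hypothetical helper: O_DIRECT transfers generally need an aligned
 * buffer and a length that is a multiple of the alignment. */
static void *demo_alloc_io_buffer(size_t len)
{
    void *buf = NULL;
    assert(len % DEMO_ALIGN == 0);
    if (posix_memalign(&buf, DEMO_ALIGN, len) != 0)
        return NULL;
    return buf;
}
```

A caller would read and write through the returned descriptor using buffers from demo_alloc_io_buffer(), with offsets kept on DEMO_ALIGN boundaries.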
>>Actually presenting two copies of the same device to linux can cause
>>its own problems. Mounting using LABEL= will complain about duplicate
>>labels. However, using the device names directly seems to work. With
>>this approach it is possible to decide in the guest whether to mount
>>a device as an emulated disk or a PV disk.
>
> My plan here was to just not support VMs which mix paravirtualised and
> ioemulated devices, requiring the user to load the PV drivers from an
> initrd. Of course, you have to load the initrd somehow, but the
> bootloader should only be reading the disk, which makes the coherency
> issues much easier. As a last resort, rombios could learn about the
> PV devices, but I'd rather avoid that if possible.
>
> Your way would be preferable, though, if it works.
We currently only allow this for the boot device (mainly to avoid the
rombios work you mention). In addition, we make the qemu device only
visible to the rombios (and not the guest O/S) by controlling the IDE
probe logic in qemu.
>>>-- Adding a new device to the qemu PCI bus which is used for
>>> bootstrapping the devices and getting an IRQ.
>>
>>Have you thought about supporting more than one IRQ? We are experimenting
>>with an IRQ per device class (BUS, NIC, VBD).
>
> I considered it, but it wasn't obvious that there would be much
> benefit. You can potentially scan a smaller part of the pending event
> channel mask, but that's fairly quick already.
The main benefit we see is for legacy Linux variants that limit each IRQ
to one CPU. Allowing additional IRQs increases the possible interrupt
processing concurrency. In addition, one interrupt class can't starve
another (on SMP guests).
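
The trade-off being discussed can be sketched roughly as follows. This is not Xen's event channel code; the word count, the function name and the mask layout are all hypothetical. With a single IRQ the handler scans the whole pending bitmap (an all-ones mask); with one IRQ per device class, each handler scans only the ports belonging to its class and can run on a different CPU.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WORDS 4    /* hypothetical: 4 x 64 = 256 event channel ports */

typedef void (*demo_handler_t)(unsigned port);

/* Dispatch every port that is both pending and selected by class_mask,
 * clearing those pending bits. Returns the number of ports handled.
 * Not atomic: a real handler would need atomic bit operations.
 * Relies on the GCC/clang builtin __builtin_ctzll. */
static unsigned demo_scan_pending(uint64_t pending[DEMO_WORDS],
                                  const uint64_t class_mask[DEMO_WORDS],
                                  demo_handler_t handler)
{
    unsigned handled = 0;

    for (unsigned w = 0; w < DEMO_WORDS; w++) {
        uint64_t bits = pending[w] & class_mask[w];

        pending[w] &= ~bits;            /* acknowledge what we take */
        while (bits) {
            unsigned bit = (unsigned)__builtin_ctzll(bits);

            bits &= bits - 1;           /* drop the lowest set bit */
            if (handler)
                handler(w * 64 + bit);
            handled++;
        }
    }
    return handled;
}
```

As Steven notes, the scan itself is cheap either way; the sketch only shows that a per-class mask leaves other classes' pending bits untouched for their own handlers.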
Steve
--
Steve Ofsthun - Virtual Iron Software, Inc.
* Re: Paravirtualised drivers for fully virtualised domains
2006-07-18 12:51 Steven Smith
2006-07-18 13:45 ` Ben Thomas
2006-07-18 16:00 ` Steve Ofsthun
@ 2006-07-26 15:34 ` Steven Smith
2006-08-08 9:42 ` Steven Smith
2 siblings, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-07-26 15:34 UTC (permalink / raw)
To: xen-devel; +Cc: sos22
[-- Attachment #1.1: Type: text/plain, Size: 1263 bytes --]
I've just put an updated version of these patches up at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2 . There's also an
equivalent single big patch at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev2.combined . Thank you to
everyone who gave feedback on the previous version.
The main changes since last time are:
-- Support for SMP guests
-- Support for 64 bit guests on a 64 bit hypervisor
-- Partial support for 32 bit guests on a 64 bit hypervisor: the network
interface works, but the block device doesn't.
The block device can be made to work by #define'ing ALIEN_INTERFACES
in blkif.h, but drivers compiled in that way won't work with 32 on 32.
The problem here is that blkif_request_t contains extra padding in 64
bit builds, and so is a different size, and so the block ring layout
is different.
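
To make the padding issue concrete, here is a minimal sketch. The structs below are simplified stand-ins, not the real blkif_request_t, and the type names are made up for the example. On i386 a 64-bit integer only needs 4-byte alignment, while on x86_64 it needs 8-byte alignment, so the same field list ends up with different offsets and a different total size. The typedefs force each alignment explicitly (GCC/clang attribute) so that both layouts can be compared in one build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: a 64-bit integer with i386-style (4-byte) and
 * x86_64-style (8-byte) alignment. */
typedef uint64_t __attribute__((aligned(4))) demo_u64_abi32_t;
typedef uint64_t __attribute__((aligned(8))) demo_u64_abi64_t;

/* Simplified request header as a 32-bit build would lay it out. */
struct demo_req_32 {
    uint8_t          operation;
    uint8_t          nr_segments;
    uint16_t         handle;
    demo_u64_abi32_t id;             /* offset 4: no padding needed */
    demo_u64_abi32_t sector_number;  /* offset 12 */
};

/* The same fields as a 64-bit build would lay them out. */
struct demo_req_64 {
    uint8_t          operation;
    uint8_t          nr_segments;
    uint16_t         handle;
    demo_u64_abi64_t id;             /* offset 8: 4 bytes of padding first */
    demo_u64_abi64_t sector_number;  /* offset 16 */
};

/* Number of padding bytes the 64-bit layout inserts before 'id'. */
static size_t demo_pad_bytes(void)
{
    assert(sizeof(struct demo_req_64) > sizeof(struct demo_req_32));
    return offsetof(struct demo_req_64, id) -
           offsetof(struct demo_req_32, id);
}
```

Because ring entries are laid out back to back, a per-entry size difference like this shifts every slot after the first, which is why a frontend and backend built for different word sizes disagree about the ring.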
Other structures with similar problems are handled either by run time
tests in the drivers (shared_info_t) or translation wrappers in the
hypervisor (xen_feature_info_t, xen_add_to_physmap_t), but trying to
do this for the block rings would require far more painful and
extensive surgery. I'm inclined to stick with multiply compiling the
frontend drivers in the short term, although it'll obviously need
doing in a slightly less grotty way.
Steven.
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: Paravirtualised drivers for fully virtualised domains
2006-07-26 15:34 ` Steven Smith
@ 2006-08-08 9:42 ` Steven Smith
2006-08-09 18:05 ` Steve Dobbelstein
0 siblings, 1 reply; 22+ messages in thread
From: Steven Smith @ 2006-08-08 9:42 UTC (permalink / raw)
To: xen-devel; +Cc: sos22
[-- Attachment #1.1: Type: text/plain, Size: 215 bytes --]
I just put a new version of the PV-on-HVM patches up at
http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev8 . These are against
10968:51c227428166 and are otherwise largely unchanged from the
previous versions.
Steven.
[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
* Re: Paravirtualised drivers for fully virtualised domains
2006-08-08 9:42 ` Steven Smith
@ 2006-08-09 18:05 ` Steve Dobbelstein
0 siblings, 0 replies; 22+ messages in thread
From: Steve Dobbelstein @ 2006-08-09 18:05 UTC (permalink / raw)
To: Steven Smith; +Cc: xen-devel
Steven Smith <sos22-xen@srcf.ucam.org> wrote on 08/08/2006 04:42:15 AM:
> I just put a new version of the PV-on-HVM patches up at
> http://www.cl.cam.ac.uk/~sos22/pv-on-hvm/rev8 . These are against
> 10968:51c227428166 and are otherwise largely unchanged from the
> previous versions.
>
> Steven.
I have been running some informal performance tests on the rev8 patches.
Thought I'd share my findings thus far.
I am finding that disk performance (sequential/random read/write) with the
PV xen-vbd driver in an HVM domain is pretty much equal to that of a PV
domain. Cool. Not surprising, but cool nonetheless.
At the moment I'm having trouble running a network test (netperf) of the PV
xen-vnif driver within our testing framework. I'll post those findings
when I get some reliable numbers. Testing on the rev2 version of the
patches showed pretty much equal network performance between running on a
PV driver in an HVM domain and a PV domain.
I am noticing two odd behaviors with the rev8 patches, though.
1. When I try to create a PV domain, the domain hangs on bootup displaying
repeated messages to the console:
netfront: Bad rx response id 1.
netfront: Bad rx response id 0.
netfront: Bad rx response id 1.
netfront: Bad rx response id 0.
...
I had to reboot from an unpatched changeset 10968 build to get the
performance numbers for a PV domain. (Hence, I am not comparing numbers
from the exact same code base, which is one reason why the tests are
"informal".)
I haven't dug into the cause of this problem yet.
2. When I destroy the HVM domain it stays in the zombie state.
dib:~ # xm list
Name ID Mem(MiB) VCPUs State Time(s)
Domain-0 0 768 1 r----- 2328.4
Zombie-hvm1 1 768 1 -----d 1502.6
I'm not sure how to debug this one. Any pointers would be helpful.
Steve D.
Thread overview: 22+ messages
2006-08-02 10:35 Paravirtualised drivers for fully virtualised domains He, Qing
2006-08-03 6:59 ` Himanshu Raj
2006-08-03 9:35 ` Steven Smith
2006-08-04 6:13 ` Himanshu Raj
-- strict thread matches above, loose matches on Subject: below --
2006-08-02 9:49 He, Qing
2006-08-02 8:23 Zhao, Yunfeng
2006-08-02 8:56 ` Steven Hand
2006-08-02 9:37 ` Steven Smith
2006-08-02 8:01 He, Qing
2006-08-02 9:30 ` Steven Smith
2006-07-26 22:35 Nakajima, Jun
2006-07-19 4:14 Ian Pratt
2006-07-18 12:51 Steven Smith
2006-07-18 13:45 ` Ben Thomas
2006-07-18 16:00 ` Steve Ofsthun
2006-07-18 16:23 ` Mark Williamson
2006-07-18 20:34 ` Steven Smith
2006-07-18 23:24 ` Steve Ofsthun
2006-07-19 6:50 ` Gerd Hoffmann
2006-07-26 15:34 ` Steven Smith
2006-08-08 9:42 ` Steven Smith
2006-08-09 18:05 ` Steve Dobbelstein