All of lore.kernel.org
 help / color / mirror / Atom feed
* [ALSA - driver 0001953]: Onboard Soundcard (STAC9221) records only noise
From: bugtrack @ 2006-03-22  7:04 UTC (permalink / raw)
  To: alsa-devel


The following issue has been SUBMITTED.
======================================================================
<https://bugtrack.alsa-project.org/alsa-bug/view.php?id=1953> 
======================================================================
Reported By:                derquader
Assigned To:                tiwai
======================================================================
Project:                    ALSA - driver
Issue ID:                   1953
Category:                   PCI - hda-intel
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     assigned
Distribution:               Opensuse 10.0
Kernel Version:             2.6.13-15.8
======================================================================
Date Submitted:             03-22-2006 08:04 CET
Last Modified:              03-22-2006 08:04 CET
======================================================================
Summary:                    Onboard Soundcard (STAC9221) records only noise
Description: 
I am using a Intel 945GCZ board with an integrated Sigmatel STAC9221
soundcard.

When I try to record sound from my microphone I only get noise (and if I
listen very closely there is my voice very quiet in the background).

I have tried to set the amplification to maxium which gives me loud noise
but does not make my voice sound any better :-/

(Recording works well with WinXP)
======================================================================

Issue History
Date Modified  Username       Field                    Change              
======================================================================
03-22-06 08:04 derquader      New Issue                                    
03-22-06 08:04 derquader      Distribution              => Opensuse 10.0   
03-22-06 08:04 derquader      Kernel Version            => 2.6.13-15.8     
======================================================================




-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642

^ permalink raw reply

* RE: ALSA with OSK
From: Ishigami, Tatsuya @ 2006-03-22  7:01 UTC (permalink / raw)
  To: Dirk Behme; +Cc: linux-omap-open-source

Hello,

 During the OSS days, I think I had similar error on certain 
sampling rate, while other rates were just OK.  

 It might be interesting to try other sampling rates if you 
have the MP3 encoder handy.

 Regards,
 Ishigami

-----Original Message-----
From: Behalf Of Dirk Behme
Sent: Wednesday, March 22, 2006 3:53 PM
To: lamikr@cc.jyu.fi; linux-omap-open-source@linux.omap.com
Subject: Re: ALSA with OSK

lamikr wrote:
>>output: ioctl(SNDCTL_DSP_SYNC): Invalid argument
>
> How about if you try to play wav with aplay? Or by applying 3 alsa
> patches send to list 2 weeks ago?

Thanks! Does aplay play mp3? I applied the patches, but no 
difference:

# madplay /mnt/nfs/foo.mp3
MPEG Audio Decoder 0.15.2 (beta) - Copyright (C) 2000-2004 
Robert Leslie et al.
          Title: Foo
          Artist: Foo
           Album: Various - New Project
           Track: 1
           Genre: Blues
output: ioctl(SNDCTL_DSP_SYNC): Invalid argument
# lsmod | grep snd
snd_pcm_oss 47136 0 - Live 0xbf0ba000
snd_mixer_oss 14752 1 snd_pcm_oss, Live 0xbf0b5000
snd_omap_alsa_aic23 15804 0 - Live 0xbf0b0000
snd_pcm 79240 2 snd_pcm_oss,snd_omap_alsa_aic23, Live 0xbf09b000
snd_timer 21796 1 snd_pcm, Live 0xbf094000
snd_page_alloc 6312 1 snd_pcm, Live 0xbf091000
snd 45592 5 
snd_pcm_oss,snd_mixer_oss,snd_omap_alsa_aic23,snd_pcm,snd_timer, 
Live 0xbf084000
soundcore 8196 1 snd, Live 0xbf080000
#

Any other working mp3 player?

Best regards

Dirk

BTW: Any plans to apply the pending patches?
_______________________________________________
Linux-omap-open-source mailing list
Linux-omap-open-source@linux.omap.com
http://linux.omap.com/mailman/listinfo/linux-omap-open-source

^ permalink raw reply

* Re: 2.6.16-rc5: known regressions [TP 600X S3, vanilla DSDT]
From: Sanjoy Mahajan @ 2006-03-22  7:00 UTC (permalink / raw)
  To: Yu, Luming
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Tom Seeley,
	Dave Jones, Jiri Slaby, michael, mchehab, Brian Marete,
	Ryan Phillips, gregkh, Brown, Len, linux-acpi, Mark Lord,
	Randy Dunlap, jgarzik, Duncan, Pavlik Vojtech, Meelis Roos
In-Reply-To: <3ACA40606221794F80A5670F0AF15F840B417BB4@pdsmsx403>

I tried the following kernels (all with only THM0):

1. No other changes: It hangs, as before and as expected.

2. Commented out a large chunk of the UPDT() method:

diff -r ac2b38909dfa -r c431c477d3b6 dsdt/600x.dsl
--- a/dsdt/600x.dsl	Tue Mar 21 17:12:47 2006 -0500
+++ b/dsdt/600x.dsl	Wed Mar 22 00:22:21 2006 -0500
@@ -4132,19 +4132,6 @@ DefinitionBlock ("DSDT.aml", "DSDT", 1, 
                                 If (Acquire (I2CM, 0x0064)) {}
                                 Else
                                 {
-                                    Store (I2RB (Zero, 0x01, 0x04), Local7)
-                                    If (Local7)
-                                    {
-                                        Fatal (0x01, 0x80000003, Local7)
-                                    }
-                                    Else
-                                    {
-                                        Store (HBS0, TMP0)
-                                        Store (HBS2, TMP2)
-                                        Store (HBS6, TMP6)
-                                        Store (HBS7, TMP7)
-                                    }
-
                                     Release (I2CM)
                                 }
                             }

So now it just grabs and releases the I2CM lock.  

This kernel hung on the first sleep, but I couldn't reproduce that
behavior.  I tried two more boots, and each time it never hung.  I even
thought it might depend on the result of previous boots, so I tried
kernel #1 again and got the same hang, and then tried #2.  But it still
wouldn't hang.  So I went back through the serial console logs to check
whether I was hallucinating, and I was not.  This kernel had indeed hung
on the first sleep, but only the first time I booted it.

So I'm going to assume that it's okay, and that if it isn't okay, it's
because of another, more intermitten bug.

3. Commented out EC0.UPDT() call in THM0._THM, and this kernel was fine,
   which is how it behaved a couple days ago, and is what I expected.

I'm about to try a smaller change (continuing the bisect):

diff -r ac2b38909dfa -r f10a309b8385 dsdt/600x.dsl
--- a/dsdt/600x.dsl	Tue Mar 21 17:12:47 2006 -0500
+++ b/dsdt/600x.dsl	Wed Mar 22 01:44:12 2006 -0500
@@ -4137,13 +4137,6 @@ DefinitionBlock ("DSDT.aml", "DSDT", 1, 
                                     {
                                         Fatal (0x01, 0x80000003, Local7)
                                     }
-                                    Else
-                                    {
-                                        Store (HBS0, TMP0)
-                                        Store (HBS2, TMP2)
-                                        Store (HBS6, TMP6)
-                                        Store (HBS7, TMP7)
-                                    }
 
                                     Release (I2CM)
                                 }


After that bisection there's not much more to change in UPDT().
However, I can drill down into I2RB() because of this line in UPDT():

    Store (I2RB (Zero, 0x01, 0x04), Local7)

But I have no idea what's safe to experiment with in I2RB():

                    Method (I2RB, 3, NotSerialized)
                    {
                        Store (Arg0, HCSL)
                        Store (ShiftLeft (Arg1, 0x01), HMAD)
                        Store (Arg2, HMCM)
                        Store (0x0B, HMPR)
                        Return (CHKS ())
                    }


All those lines look like tricky hardware manipulations.

By the way, which debug_{level,layer} settings will show the lines of
the human-readable DSDT as they are executed?

-Sanjoy

^ permalink raw reply

* Re: [RFC, PATCH] avoid some atomics in open()/close() for monothreaded processes
From: Eric Dumazet @ 2006-03-22  6:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel
In-Reply-To: <20060321224140.7e40a380.akpm@osdl.org>

Andrew Morton a écrit :
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>> Goal : Avoid some locking/unlocking 'struct files_struct'->file_lock for mono 
>> threaded processes.
>>
>> We define files_multithreaded() function .
>>
>> static inline int files_multithreaded(const struct files_struct *files)
>> {
>>         return sizeof(files->file_lock) > 0 && atomic_read(&files->count) > 1;
>> }
> 
> That's bascially sizeof(spinlock_t).  That's architecture dependent and
> varies wildly according to the day of week.

I used sizeof(files->file_lock) instead of sizeof(spinlock_t) because I found 
it more explicit , while not using ugly ifdefs.

> 
> It _might_ work in all situations - probably you checked that.  But I still
> wouldn't do it because it might break in the future.  Let's be explicit and
> stick the appropriate ifdefs in there.
> 
> I'd also consider renaming it to files_shared() - processes are
> multithreaded, not data structures.

Thanks for the feedback, I will redo the patch and test it on various 
platforms before resubmit (including performance data :) )

> 
> Once you're done with that we should change fget_light() and fput_light() to
> use this helper.  Separate patch.

Hum... this discussion is not relevant with fget_light() unless I mistaken.

Nowadays, this function doesnt take spinlock thanks to RCU

Eric

^ permalink raw reply

* [patch 08/23] ACPI: Add "ACPI" to motherboard resources in /proc/io{mem,port}
From: akpm @ 2006-03-22  6:30 UTC (permalink / raw)
  To: len.brown; +Cc: linux-acpi, akpm, bjorn.helgaas


From: Bjorn Helgaas <bjorn.helgaas@hp.com>

Add "ACPI" to motherboard resource allocation names, so people have a clue
about where to look.  And remove some trailing spaces.

Changes these /proc/iomem entries from this:

    ff5c1004-ff5c1007 : PM_TMR
    ff5c1008-ff5c100b : PM1a_EVT_BLK
    ff5c100c-ff5c100d : PM1a_CNT_BLK
    ff5c1010-ff5c1013 : GPE0_BLK
    ff5c1014-ff5c1017 : GPE1_BLK

to this:

    ff5c1004-ff5c1007 : ACPI PM_TMR
    ff5c1008-ff5c100b : ACPI PM1a_EVT_BLK
    ff5c100c-ff5c100d : ACPI PM1a_CNT_BLK
    ff5c1010-ff5c1013 : ACPI GPE0_BLK
    ff5c1014-ff5c1017 : ACPI GPE1_BLK

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/acpi/motherboard.c |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)

diff -puN drivers/acpi/motherboard.c~acpi-add-acpi-to-motherboard-resources-in-proc-iomemport drivers/acpi/motherboard.c
--- devel/drivers/acpi/motherboard.c~acpi-add-acpi-to-motherboard-resources-in-proc-iomemport	2006-03-21 22:30:05.000000000 -0800
+++ devel-akpm/drivers/acpi/motherboard.c	2006-03-21 22:30:05.000000000 -0800
@@ -37,7 +37,7 @@ ACPI_MODULE_NAME("acpi_motherboard")
 #define ACPI_MB_HID2			"PNP0C02"
 /**
  * Doesn't care about legacy IO ports, only IO ports beyond 0x1000 are reserved
- * Doesn't care about the failure of 'request_region', since other may reserve 
+ * Doesn't care about the failure of 'request_region', since other may reserve
  * the io ports as well
  */
 #define IS_RESERVED_ADDR(base, len) \
@@ -46,7 +46,7 @@ ACPI_MODULE_NAME("acpi_motherboard")
 /*
  * Clearing the flag (IORESOURCE_BUSY) allows drivers to use
  * the io ports if they really know they can use it, while
- * still preventing hotplug PCI devices from using it. 
+ * still preventing hotplug PCI devices from using it.
  */
 static acpi_status acpi_reserve_io_ranges(struct acpi_resource *res, void *data)
 {
@@ -138,39 +138,39 @@ static void __init acpi_request_region (
 static void __init acpi_reserve_resources(void)
 {
 	acpi_request_region(&acpi_gbl_FADT->xpm1a_evt_blk,
-			       acpi_gbl_FADT->pm1_evt_len, "PM1a_EVT_BLK");
+			       acpi_gbl_FADT->pm1_evt_len, "ACPI PM1a_EVT_BLK");
 
 	acpi_request_region(&acpi_gbl_FADT->xpm1b_evt_blk,
-			       acpi_gbl_FADT->pm1_evt_len, "PM1b_EVT_BLK");
+			       acpi_gbl_FADT->pm1_evt_len, "ACPI PM1b_EVT_BLK");
 
 	acpi_request_region(&acpi_gbl_FADT->xpm1a_cnt_blk,
-			       acpi_gbl_FADT->pm1_cnt_len, "PM1a_CNT_BLK");
+			       acpi_gbl_FADT->pm1_cnt_len, "ACPI PM1a_CNT_BLK");
 
 	acpi_request_region(&acpi_gbl_FADT->xpm1b_cnt_blk,
-			       acpi_gbl_FADT->pm1_cnt_len, "PM1b_CNT_BLK");
+			       acpi_gbl_FADT->pm1_cnt_len, "ACPI PM1b_CNT_BLK");
 
 	if (acpi_gbl_FADT->pm_tm_len == 4)
-		acpi_request_region(&acpi_gbl_FADT->xpm_tmr_blk, 4, "PM_TMR");
+		acpi_request_region(&acpi_gbl_FADT->xpm_tmr_blk, 4, "ACPI PM_TMR");
 
 	acpi_request_region(&acpi_gbl_FADT->xpm2_cnt_blk,
-			       acpi_gbl_FADT->pm2_cnt_len, "PM2_CNT_BLK");
+			       acpi_gbl_FADT->pm2_cnt_len, "ACPI PM2_CNT_BLK");
 
 	/* Length of GPE blocks must be a non-negative multiple of 2 */
 
 	if (!(acpi_gbl_FADT->gpe0_blk_len & 0x1))
 		acpi_request_region(&acpi_gbl_FADT->xgpe0_blk,
-			       acpi_gbl_FADT->gpe0_blk_len, "GPE0_BLK");
+			       acpi_gbl_FADT->gpe0_blk_len, "ACPI GPE0_BLK");
 
 	if (!(acpi_gbl_FADT->gpe1_blk_len & 0x1))
 		acpi_request_region(&acpi_gbl_FADT->xgpe1_blk,
-			       acpi_gbl_FADT->gpe1_blk_len, "GPE1_BLK");
+			       acpi_gbl_FADT->gpe1_blk_len, "ACPI GPE1_BLK");
 }
 
 static int __init acpi_motherboard_init(void)
 {
 	acpi_bus_register_driver(&acpi_motherboard_driver1);
 	acpi_bus_register_driver(&acpi_motherboard_driver2);
-	/* 
+	/*
 	 * Guarantee motherboard IO reservation first
 	 * This module must run after scan.c
 	 */
_

^ permalink raw reply

* [patch 07/23] ACPI: request correct fixed hardware resource type (MMIO vs I/O port)
From: akpm @ 2006-03-22  6:30 UTC (permalink / raw)
  To: len.brown; +Cc: linux-acpi, akpm, bjorn.helgaas


From: Bjorn Helgaas <bjorn.helgaas@hp.com>

ACPI supports fixed hardware (PM_TMR, GPE blocks, etc) in either I/O port
or MMIO space, but used to always request the regions from I/O space
because it didn't check the address_space_id.

Sample ACPI fixed hardware in MMIO space (HP rx2600), was incorrectly
reported in /proc/ioports, now reported in /proc/iomem:

    ff5c1004-ff5c1007 : PM_TMR
    ff5c1008-ff5c100b : PM1a_EVT_BLK
    ff5c100c-ff5c100d : PM1a_CNT_BLK
    ff5c1010-ff5c1013 : GPE0_BLK
    ff5c1014-ff5c1017 : GPE1_BLK

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 drivers/acpi/motherboard.c |   41 +++++++++++++++++++----------------
 1 files changed, 23 insertions(+), 18 deletions(-)

diff -puN drivers/acpi/motherboard.c~acpi-request-correct-fixed-hardware-resource-type-mmio-vs-i-o-port drivers/acpi/motherboard.c
--- devel/drivers/acpi/motherboard.c~acpi-request-correct-fixed-hardware-resource-type-mmio-vs-i-o-port	2006-03-21 22:30:04.000000000 -0800
+++ devel-akpm/drivers/acpi/motherboard.c	2006-03-21 22:30:04.000000000 -0800
@@ -123,41 +123,46 @@ static struct acpi_driver acpi_motherboa
 		},
 };
 
+static void __init acpi_request_region (struct acpi_generic_address *addr,
+	unsigned int length, char *desc)
+{
+	if (!addr->address || !length)
+		return;
+
+	if (addr->address_space_id == ACPI_ADR_SPACE_SYSTEM_IO)
+		request_region(addr->address, length, desc);
+	else if (addr->address_space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY)
+		request_mem_region(addr->address, length, desc);
+}
+
 static void __init acpi_reserve_resources(void)
 {
-	if (acpi_gbl_FADT->xpm1a_evt_blk.address && acpi_gbl_FADT->pm1_evt_len)
-		request_region(acpi_gbl_FADT->xpm1a_evt_blk.address,
+	acpi_request_region(&acpi_gbl_FADT->xpm1a_evt_blk,
 			       acpi_gbl_FADT->pm1_evt_len, "PM1a_EVT_BLK");
 
-	if (acpi_gbl_FADT->xpm1b_evt_blk.address && acpi_gbl_FADT->pm1_evt_len)
-		request_region(acpi_gbl_FADT->xpm1b_evt_blk.address,
+	acpi_request_region(&acpi_gbl_FADT->xpm1b_evt_blk,
 			       acpi_gbl_FADT->pm1_evt_len, "PM1b_EVT_BLK");
 
-	if (acpi_gbl_FADT->xpm1a_cnt_blk.address && acpi_gbl_FADT->pm1_cnt_len)
-		request_region(acpi_gbl_FADT->xpm1a_cnt_blk.address,
+	acpi_request_region(&acpi_gbl_FADT->xpm1a_cnt_blk,
 			       acpi_gbl_FADT->pm1_cnt_len, "PM1a_CNT_BLK");
 
-	if (acpi_gbl_FADT->xpm1b_cnt_blk.address && acpi_gbl_FADT->pm1_cnt_len)
-		request_region(acpi_gbl_FADT->xpm1b_cnt_blk.address,
+	acpi_request_region(&acpi_gbl_FADT->xpm1b_cnt_blk,
 			       acpi_gbl_FADT->pm1_cnt_len, "PM1b_CNT_BLK");
 
-	if (acpi_gbl_FADT->xpm_tmr_blk.address && acpi_gbl_FADT->pm_tm_len == 4)
-		request_region(acpi_gbl_FADT->xpm_tmr_blk.address, 4, "PM_TMR");
+	if (acpi_gbl_FADT->pm_tm_len == 4)
+		acpi_request_region(&acpi_gbl_FADT->xpm_tmr_blk, 4, "PM_TMR");
 
-	if (acpi_gbl_FADT->xpm2_cnt_blk.address && acpi_gbl_FADT->pm2_cnt_len)
-		request_region(acpi_gbl_FADT->xpm2_cnt_blk.address,
+	acpi_request_region(&acpi_gbl_FADT->xpm2_cnt_blk,
 			       acpi_gbl_FADT->pm2_cnt_len, "PM2_CNT_BLK");
 
 	/* Length of GPE blocks must be a non-negative multiple of 2 */
 
-	if (acpi_gbl_FADT->xgpe0_blk.address && acpi_gbl_FADT->gpe0_blk_len &&
-	    !(acpi_gbl_FADT->gpe0_blk_len & 0x1))
-		request_region(acpi_gbl_FADT->xgpe0_blk.address,
+	if (!(acpi_gbl_FADT->gpe0_blk_len & 0x1))
+		acpi_request_region(&acpi_gbl_FADT->xgpe0_blk,
 			       acpi_gbl_FADT->gpe0_blk_len, "GPE0_BLK");
 
-	if (acpi_gbl_FADT->xgpe1_blk.address && acpi_gbl_FADT->gpe1_blk_len &&
-	    !(acpi_gbl_FADT->gpe1_blk_len & 0x1))
-		request_region(acpi_gbl_FADT->xgpe1_blk.address,
+	if (!(acpi_gbl_FADT->gpe1_blk_len & 0x1))
+		acpi_request_region(&acpi_gbl_FADT->xgpe1_blk,
 			       acpi_gbl_FADT->gpe1_blk_len, "GPE1_BLK");
 }
 
_

^ permalink raw reply

* [Qemu-devel] QEMU crash
From: georg.skillas @ 2006-03-22  6:56 UTC (permalink / raw)
  To: qemu-devel


[-- Attachment #1.1: Type: text/plain, Size: 609 bytes --]

Hello 

I am using version 0.7.2 with kqemu upto date without any problems. 
However in the morning (computer was running all night with a qemu 
calculation) I found the following error displayed and had to shut down 
qemu. What gives?

George Skillas


A short translation: Command "0x0048..." points to memory at "0x0049..." 
Due to an I/O error in "0x00...20c" the data was not written to RAM.
 
-- - - -  -  -  -   -   -    -       -        -
Degussa AG               Tel.: +49 6181 595223
Georg Skillas            FAX : +49 6181 5975223
VT-C 1024-319
Rodenbacher Chaussee 4
D-63457 Hanau-Wolfgang
Germany

[-- Attachment #1.2: Type: text/html, Size: 1156 bytes --]

[-- Attachment #2: 20060322_crash.PNG --]
[-- Type: application/octet-stream, Size: 15634 bytes --]

^ permalink raw reply

* xm86 syscall
From: D. Hazelton @ 2006-03-22  6:54 UTC (permalink / raw)
  To: Linux Kernel Mailing List

I've been looking through the kernel, reading the man pages, studying the 
headers and I still cannot find enough information on the vm86 syscall in 
order to be able to use it in one of my private projects. Can anyone help me 
with information on how it's setup to handle interrupts and the rest?

DRH

^ permalink raw reply

* Re: ALSA with OSK
From: Dirk Behme @ 2006-03-22  6:53 UTC (permalink / raw)
  To: lamikr, linux-omap-open-source
In-Reply-To: <442072B2.5050005@cc.jyu.fi>

lamikr wrote:
>>output: ioctl(SNDCTL_DSP_SYNC): Invalid argument
>
> How about if you try to play wav with aplay? Or by applying 3 alsa
> patches send to list 2 weeks ago?

Thanks! Does aplay play mp3? I applied the patches, but no 
difference:

# madplay /mnt/nfs/foo.mp3
MPEG Audio Decoder 0.15.2 (beta) - Copyright (C) 2000-2004 
Robert Leslie et al.
          Title: Foo
          Artist: Foo
           Album: Various - New Project
           Track: 1
           Genre: Blues
output: ioctl(SNDCTL_DSP_SYNC): Invalid argument
# lsmod | grep snd
snd_pcm_oss 47136 0 - Live 0xbf0ba000
snd_mixer_oss 14752 1 snd_pcm_oss, Live 0xbf0b5000
snd_omap_alsa_aic23 15804 0 - Live 0xbf0b0000
snd_pcm 79240 2 snd_pcm_oss,snd_omap_alsa_aic23, Live 0xbf09b000
snd_timer 21796 1 snd_pcm, Live 0xbf094000
snd_page_alloc 6312 1 snd_pcm, Live 0xbf091000
snd 45592 5 
snd_pcm_oss,snd_mixer_oss,snd_omap_alsa_aic23,snd_pcm,snd_timer, 
Live 0xbf084000
soundcore 8196 1 snd, Live 0xbf080000
#

Any other working mp3 player?

Best regards

Dirk

BTW: Any plans to apply the pending patches?

^ permalink raw reply

* [RFC PATCH 01/35] Add XEN config options and disable unsupported config options.
From: Chris Wright @ 2006-03-22  6:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 00-config-xen --]
[-- Type: text/plain, Size: 4210 bytes --]

The XEN config option is selected from the i386 subarch menu by
choosing the X86_XEN "Xen-compatible" subarch.

The XEN_SHADOW_MODE option defines the memory virtualization mode for
the kernel -- with it enabled, the kernel expects the hypervisor to
perform translation between pseudo-physical and machine addresses on
its behalf.

The disabled config options are:
- DOUBLEFAULT: are trapped by Xen and not virtualized
- HZ: defaults to 100 in Xen VMs
- Power management: not supported in unprivileged VMs
- SMP: not supported in this set of patches
- X86_{UP,LOCAL,IO}_APIC: not supported in unprivileged VMs

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/Kconfig      |   18 ++++++++++++++----
 drivers/xen/Kconfig    |   21 +++++++++++++++++++++
 kernel/Kconfig.hz      |    4 ++--
 kernel/Kconfig.preempt |    1 +
 4 files changed, 38 insertions(+), 6 deletions(-)

--- xen-subarch-2.6.orig/arch/i386/Kconfig
+++ xen-subarch-2.6/arch/i386/Kconfig
@@ -58,6 +58,12 @@ config X86_PC
 	help
 	  Choose this option if your computer is a standard PC or compatible.
 
+config X86_XEN
+	bool "Xen-compatible"
+	help
+	  Choose this option if you plan to run this kernel on top of the
+	  Xen Hypervisor.
+
 config X86_ELAN
 	bool "AMD Elan"
 	help
@@ -175,6 +181,7 @@ config HPET_EMULATE_RTC
 
 config SMP
 	bool "Symmetric multi-processing support"
+	depends on !X86_XEN
 	---help---
 	  This enables support for systems with more than one CPU. If you have
 	  a system with only one CPU, like most personal computers, say N. If
@@ -230,7 +237,7 @@ source "kernel/Kconfig.preempt"
 
 config X86_UP_APIC
 	bool "Local APIC support on uniprocessors"
-	depends on !SMP && !(X86_VISWS || X86_VOYAGER)
+	depends on !SMP && !(X86_VISWS || X86_VOYAGER || X86_XEN)
 	help
 	  A local APIC (Advanced Programmable Interrupt Controller) is an
 	  integrated interrupt controller in the CPU. If you have a single-CPU
@@ -255,12 +262,12 @@ config X86_UP_IOAPIC
 
 config X86_LOCAL_APIC
 	bool
-	depends on X86_UP_APIC || ((X86_VISWS || SMP) && !X86_VOYAGER)
+	depends on X86_UP_APIC || ((X86_VISWS || SMP) && !(X86_VOYAGER || X86_XEN))
 	default y
 
 config X86_IO_APIC
 	bool
-	depends on X86_UP_IOAPIC || (SMP && !(X86_VISWS || X86_VOYAGER))
+	depends on X86_UP_IOAPIC || (SMP && !(X86_VISWS || X86_VOYAGER || X86_XEN))
 	default y
 
 config X86_VISWS_APIC
@@ -743,6 +750,7 @@ config HOTPLUG_CPU
 config DOUBLEFAULT
 	default y
 	bool "Enable doublefault exception handler" if EMBEDDED
+	depends on !X86_XEN
 	help
           This option allows trapping of rare doublefault exceptions that
           would otherwise cause a system to silently reboot. Disabling this
@@ -753,7 +761,7 @@ endmenu
 
 
 menu "Power management options (ACPI, APM)"
-	depends on !X86_VOYAGER
+	depends on !(X86_VOYAGER || X86_XEN)
 
 source kernel/power/Kconfig
 
@@ -1075,6 +1083,8 @@ source "security/Kconfig"
 
 source "crypto/Kconfig"
 
+source "drivers/xen/Kconfig"
+
 source "lib/Kconfig"
 
 #
--- xen-subarch-2.6.orig/kernel/Kconfig.hz
+++ xen-subarch-2.6/kernel/Kconfig.hz
@@ -3,7 +3,7 @@
 #
 
 choice
-	prompt "Timer frequency"
+	prompt "Timer frequency" if !XEN
 	default HZ_250
 	help
 	 Allows the configuration of the timer frequency. It is customary
@@ -40,7 +40,7 @@ endchoice
 
 config HZ
 	int
-	default 100 if HZ_100
+	default 100 if HZ_100 || XEN
 	default 250 if HZ_250
 	default 1000 if HZ_1000
 
--- xen-subarch-2.6.orig/kernel/Kconfig.preempt
+++ xen-subarch-2.6/kernel/Kconfig.preempt
@@ -35,6 +35,7 @@ config PREEMPT_VOLUNTARY
 
 config PREEMPT
 	bool "Preemptible Kernel (Low-Latency Desktop)"
+	depends on !XEN
 	help
 	  This option reduces the latency of the kernel by making
 	  all kernel code (that is not executing in a critical section)
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/Kconfig
@@ -0,0 +1,21 @@
+#
+# This Kconfig describe xen options
+#
+
+mainmenu "Xen Configuration"
+
+config XEN
+	bool
+	default y if X86_XEN
+	help
+	  This is the Linux Xen port.
+
+if XEN
+
+config XEN_SHADOW_MODE
+	bool
+	default y
+	help
+	  Fakes out a shadow mode kernel
+
+endif

--

^ permalink raw reply

* [RFC PATCH 10/35] Add a new head.S start-of-day file for booting on Xen.
From: Chris Wright @ 2006-03-22  6:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 09-i386-head.S --]
[-- Type: text/plain, Size: 7230 bytes --]

When running on Xen, the kernel is started with paging enabled.  Also
don't check for cpu features which are present on all cpus supported
by Xen.

Don't define segments which are not supported when running on Xen.

Define the __xen_guest section which exports information about the
kernel to the domain builder.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/Makefile          |    7 +
 arch/i386/mach-xen/Makefile |    2 
 arch/i386/mach-xen/head.S   |  184 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 191 insertions(+), 2 deletions(-)

--- xen-subarch-2.6.orig/arch/i386/Makefile
+++ xen-subarch-2.6/arch/i386/Makefile
@@ -45,6 +45,9 @@ CFLAGS				+= $(shell if [ $(call cc-vers
 
 CFLAGS += $(cflags-y)
 
+# Default subarch head files
+head-y := arch/i386/kernel/head.o arch/i386/kernel/init_task.o
+
 # Default subarch .c files
 mcore-y  := mach-default
 
@@ -71,6 +74,8 @@ mcore-$(CONFIG_X86_SUMMIT)  := mach-defa
 # Xen subarch support
 mflags-$(CONFIG_X86_XEN)	:= -Iinclude/asm-i386/mach-xen
 mcore-$(CONFIG_X86_XEN)		:= mach-xen
+head-$(CONFIG_X86_XEN)		:= arch/i386/mach-xen/head.o \
+				   arch/i386/kernel/init_task.o
 
 # generic subarchitecture
 mflags-$(CONFIG_X86_GENERICARCH) := -Iinclude/asm-i386/mach-generic
@@ -85,8 +90,6 @@ core-$(CONFIG_X86_ES7000)	:= arch/i386/m
 # default subarch .h files
 mflags-y += -Iinclude/asm-i386/mach-default
 
-head-y := arch/i386/kernel/head.o arch/i386/kernel/init_task.o
-
 libs-y 					+= arch/i386/lib/
 core-y					+= arch/i386/kernel/ \
 					   arch/i386/mm/ \
--- xen-subarch-2.6.orig/arch/i386/mach-xen/Makefile
+++ xen-subarch-2.6/arch/i386/mach-xen/Makefile
@@ -2,6 +2,8 @@
 # Makefile for the linux kernel.
 #
 
+extra-y				:= head.o
+
 obj-y				:= setup.o
  
 setup-y				:= ../mach-default/setup.o
--- /dev/null
+++ xen-subarch-2.6/arch/i386/mach-xen/head.S
@@ -0,0 +1,184 @@
+
+
+.text
+#include <linux/config.h>
+#include <linux/threads.h>
+#include <linux/linkage.h>
+#include <asm/segment.h>
+#include <asm/page.h>
+#include <asm/thread_info.h>
+#include <asm/asm-offsets.h>
+#include <xen/interface/arch-x86_32.h>
+
+/*
+ * References to members of the new_cpu_data structure.
+ */
+
+#define X86		new_cpu_data+CPUINFO_x86
+#define X86_VENDOR	new_cpu_data+CPUINFO_x86_vendor
+#define X86_MODEL	new_cpu_data+CPUINFO_x86_model
+#define X86_MASK	new_cpu_data+CPUINFO_x86_mask
+#define X86_HARD_MATH	new_cpu_data+CPUINFO_hard_math
+#define X86_CPUID	new_cpu_data+CPUINFO_cpuid_level
+#define X86_CAPABILITY	new_cpu_data+CPUINFO_x86_capability
+#define X86_VENDOR_ID	new_cpu_data+CPUINFO_x86_vendor_id
+
+ENTRY(startup_32)
+	movl %esi,xen_start_info
+	cld
+
+	/* Set up the stack pointer */
+	movl $(init_thread_union+THREAD_SIZE),%esp
+
+	/* get vendor info */
+	xorl %eax,%eax			# call CPUID with 0 -> return vendor ID
+	cpuid
+	movl %eax,X86_CPUID		# save CPUID level
+	movl %ebx,X86_VENDOR_ID		# lo 4 chars
+	movl %edx,X86_VENDOR_ID+4	# next 4 chars
+	movl %ecx,X86_VENDOR_ID+8	# last 4 chars
+
+	movl $1,%eax		# Use the CPUID instruction to get CPU type
+	cpuid
+	movb %al,%cl		# save reg for future use
+	andb $0x0f,%ah		# mask processor family
+	movb %ah,X86
+	andb $0xf0,%al		# mask model
+	shrb $4,%al
+	movb %al,X86_MODEL
+	andb $0x0f,%cl		# mask mask revision
+	movb %cl,X86_MASK
+	movl %edx,X86_CAPABILITY
+
+	movb $1,X86_HARD_MATH
+
+	xorl %eax,%eax			# Clear FS/GS and LDT
+	movl %eax,%fs
+	movl %eax,%gs
+	cld			# gcc2 wants the direction flag cleared at all times
+
+	call start_kernel
+L6:
+	jmp L6			# main should never return here, but
+				# just in case, we know what happens.
+
+#define HYPERCALL_PAGE_OFFSET 0x1000
+.org HYPERCALL_PAGE_OFFSET
+ENTRY(hypercall_page)
+.skip 0x1000
+
+/*
+ * Real beginning of normal "text" segment
+ */
+ENTRY(stext)
+ENTRY(_stext)
+
+/*
+ * BSS section
+ */
+.section ".bss.page_aligned","w"
+ENTRY(swapper_pg_dir)
+	.fill 1024,4,0
+ENTRY(empty_zero_page)
+	.fill 4096,1,0
+
+/*
+ * This starts the data section.
+ */
+.data
+
+	ALIGN
+	.word 0				# 32 bit align gdt_desc.address
+	.globl cpu_gdt_descr
+cpu_gdt_descr:
+	.word GDT_SIZE
+	.long cpu_gdt_table
+
+	.fill NR_CPUS-1,8,0		# space for the other GDT descriptors
+
+/*
+ * The Global Descriptor Table contains 28 quadwords, per-CPU.
+ */
+	.align PAGE_SIZE_asm
+ENTRY(cpu_gdt_table)
+	.quad 0x0000000000000000	/* NULL descriptor */
+	.quad 0x0000000000000000	/* 0x0b reserved */
+	.quad 0x0000000000000000	/* 0x13 reserved */
+	.quad 0x0000000000000000	/* 0x1b reserved */
+	.quad 0x0000000000000000	/* 0x20 unused */
+	.quad 0x0000000000000000	/* 0x28 unused */
+	.quad 0x0000000000000000	/* 0x33 TLS entry 1 */
+	.quad 0x0000000000000000	/* 0x3b TLS entry 2 */
+	.quad 0x0000000000000000	/* 0x43 TLS entry 3 */
+	.quad 0x0000000000000000	/* 0x4b reserved */
+	.quad 0x0000000000000000	/* 0x53 reserved */
+	.quad 0x0000000000000000	/* 0x5b reserved */
+
+	.quad 0x00cf9a000000ffff	/* 0x60 kernel 4GB code at 0x00000000 */
+	.quad 0x00cf92000000ffff	/* 0x68 kernel 4GB data at 0x00000000 */
+	.quad 0x00cffa000000ffff	/* 0x73 user 4GB code at 0x00000000 */
+	.quad 0x00cff2000000ffff	/* 0x7b user 4GB data at 0x00000000 */
+
+	.quad 0x0000000000000000	/* 0x80 TSS descriptor */
+	.quad 0x0000000000000000	/* 0x88 LDT descriptor */
+
+	/*
+	 * Segments used for calling PnP BIOS have byte granularity.
+	 * They code segments and data segments have fixed 64k limits,
+	 * the transfer segment sizes are set at run time.
+	 */
+	.quad 0x0000000000000000	/* 0x90 32-bit code */
+	.quad 0x0000000000000000	/* 0x98 16-bit code */
+	.quad 0x0000000000000000	/* 0xa0 16-bit data */
+	.quad 0x0000000000000000	/* 0xa8 16-bit data */
+	.quad 0x0000000000000000	/* 0xb0 16-bit data */
+
+	/*
+	 * The APM segments have byte granularity and their bases
+	 * are set at run time.  All have 64k limits.
+	 */
+	.quad 0x0000000000000000	/* 0xb8 APM CS    code */
+	.quad 0x0000000000000000	/* 0xc0 APM CS 16 code (16 bit) */
+	.quad 0x0000000000000000	/* 0xc8 APM DS    data */
+
+	.quad 0x0000000000000000	/* 0xd0 - ESPFIX 16-bit SS */
+	.quad 0x0000000000000000	/* 0xd8 - unused */
+	.quad 0x0000000000000000	/* 0xe0 - unused */
+	.quad 0x0000000000000000	/* 0xe8 - unused */
+	.quad 0x0000000000000000	/* 0xf0 - unused */
+	.quad 0x0000000000000000	/* 0xf8 - GDT entry 31: double-fault TSS */
+
+	/* Be sure this is zeroed to avoid false validations in Xen */
+	.fill PAGE_SIZE_asm / 8 - GDT_ENTRIES,8,0
+
+
+/*
+ * __xen_guest information
+ */
+.macro utoa value
+ .if (\value) < 0 || (\value) >= 0x10
+	utoa (((\value)>>4)&0x0fffffff)
+ .endif
+ .if ((\value) & 0xf) < 10
+  .byte '0' + ((\value) & 0xf)
+ .else
+  .byte 'A' + ((\value) & 0xf) - 10
+ .endif
+.endm
+
+.section __xen_guest
+	.ascii	"GUEST_OS=linux,GUEST_VER=2.6"
+	.ascii	",XEN_VER=xen-3.0"
+	.ascii	",VIRT_BASE=0x"
+		utoa __PAGE_OFFSET
+	.ascii	",HYPERCALL_PAGE=0x"
+		utoa ((__PHYSICAL_START+HYPERCALL_PAGE_OFFSET)>>PAGE_SHIFT)
+	.ascii  ",FEATURES=!writable_page_tables"
+	.ascii	         "|!auto_translated_physmap"
+#ifdef CONFIG_X86_PAE
+	.ascii	",PAE=yes"
+#else
+	.ascii	",PAE=no"
+#endif
+	.ascii	",LOADER=generic"
+	.byte	0

--

^ permalink raw reply

* [RFC PATCH 04/35] Hypervisor interface header files.
From: Chris Wright @ 2006-03-22  6:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 03-i386-hypervisor-interface --]
[-- Type: text/plain, Size: 10820 bytes --]

Define macros and inline functions for doing hypercalls into the
hypervisor.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 include/asm-i386/hypercall.h  |  306 ++++++++++++++++++++++++++++++++++++++++++
 include/asm-i386/hypervisor.h |   67 +++++++++
 2 files changed, 373 insertions(+)

--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/hypercall.h
@@ -0,0 +1,306 @@
+/******************************************************************************
+ * hypercall.h
+ *
+ * Linux-specific hypervisor handling.
+ *
+ * Copyright (c) 2002-2004, K A Fraser
+ *
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __HYPERCALL_H__
+#define __HYPERCALL_H__
+
+#include <linux/config.h>
+
+#include <xen/interface/xen.h>
+#include <xen/interface/sched.h>
+
+#define __STR(x) #x
+#define STR(x) __STR(x)
+
+#define _hypercall0(type, name)			\
+({						\
+	long __res;				\
+	asm volatile (				\
+		"call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"\
+		: "=a" (__res)			\
+		:				\
+		: "memory" );			\
+	(type)__res;				\
+})
+
+#define _hypercall1(type, name, a1)				\
+({								\
+	long __res, __ign1;					\
+	asm volatile (						\
+		"call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"\
+		: "=a" (__res), "=b" (__ign1)			\
+		: "1" ((long)(a1))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall2(type, name, a1, a2)				\
+({								\
+	long __res, __ign1, __ign2;				\
+	asm volatile (						\
+		"call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"\
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2)	\
+		: "1" ((long)(a1)), "2" ((long)(a2))		\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall3(type, name, a1, a2, a3)			\
+({								\
+	long __res, __ign1, __ign2, __ign3;			\
+	asm volatile (						\
+		"call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"\
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2), 	\
+		"=d" (__ign3)					\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall4(type, name, a1, a2, a3, a4)			\
+({								\
+	long __res, __ign1, __ign2, __ign3, __ign4;		\
+	asm volatile (						\
+		"call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"\
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2),	\
+		"=d" (__ign3), "=S" (__ign4)			\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3)), "4" ((long)(a4))		\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+#define _hypercall5(type, name, a1, a2, a3, a4, a5)		\
+({								\
+	long __res, __ign1, __ign2, __ign3, __ign4, __ign5;	\
+	asm volatile (						\
+		"call hypercall_page + ("STR(__HYPERVISOR_##name)" * 32)"\
+		: "=a" (__res), "=b" (__ign1), "=c" (__ign2),	\
+		"=d" (__ign3), "=S" (__ign4), "=D" (__ign5)	\
+		: "1" ((long)(a1)), "2" ((long)(a2)),		\
+		"3" ((long)(a3)), "4" ((long)(a4)),		\
+		"5" ((long)(a5))				\
+		: "memory" );					\
+	(type)__res;						\
+})
+
+static inline int
+HYPERVISOR_set_trap_table(
+	struct trap_info *table)
+{
+	return _hypercall1(int, set_trap_table, table);
+}
+
+static inline int
+HYPERVISOR_mmu_update(
+	struct mmu_update *req, int count, int *success_count, domid_t domid)
+{
+	return _hypercall4(int, mmu_update, req, count, success_count, domid);
+}
+
+static inline int
+HYPERVISOR_mmuext_op(
+	struct mmuext_op *op, int count, int *success_count, domid_t domid)
+{
+	return _hypercall4(int, mmuext_op, op, count, success_count, domid);
+}
+
+static inline int
+HYPERVISOR_set_gdt(
+	unsigned long *frame_list, int entries)
+{
+	return _hypercall2(int, set_gdt, frame_list, entries);
+}
+
+static inline int
+HYPERVISOR_stack_switch(
+	unsigned long ss, unsigned long esp)
+{
+	return _hypercall2(int, stack_switch, ss, esp);
+}
+
+static inline int
+HYPERVISOR_set_callbacks(
+	unsigned long event_selector, unsigned long event_address,
+	unsigned long failsafe_selector, unsigned long failsafe_address)
+{
+	return _hypercall4(int, set_callbacks,
+			   event_selector, event_address,
+			   failsafe_selector, failsafe_address);
+}
+
+static inline int
+HYPERVISOR_fpu_taskswitch(
+	int set)
+{
+	return _hypercall1(int, fpu_taskswitch, set);
+}
+
+static inline int
+HYPERVISOR_sched_op(
+	int cmd, unsigned long arg)
+{
+	return _hypercall2(int, sched_op, cmd, arg);
+}
+
+static inline long
+HYPERVISOR_set_timer_op(
+	u64 timeout)
+{
+	unsigned long timeout_hi = (unsigned long)(timeout>>32);
+	unsigned long timeout_lo = (unsigned long)timeout;
+	return _hypercall2(long, set_timer_op, timeout_lo, timeout_hi);
+}
+
+static inline int
+HYPERVISOR_set_debugreg(
+	int reg, unsigned long value)
+{
+	return _hypercall2(int, set_debugreg, reg, value);
+}
+
+static inline unsigned long
+HYPERVISOR_get_debugreg(
+	int reg)
+{
+	return _hypercall1(unsigned long, get_debugreg, reg);
+}
+
+static inline int
+HYPERVISOR_update_descriptor(
+	u64 ma, u64 desc)
+{
+	return _hypercall4(int, update_descriptor, ma, ma>>32, desc, desc>>32);
+}
+
+static inline int
+HYPERVISOR_memory_op(
+	unsigned int cmd, void *arg)
+{
+	return _hypercall2(int, memory_op, cmd, arg);
+}
+
+static inline int
+HYPERVISOR_multicall(
+	void *call_list, int nr_calls)
+{
+	return _hypercall2(int, multicall, call_list, nr_calls);
+}
+
+static inline int
+HYPERVISOR_update_va_mapping(
+	unsigned long va, pte_t new_val, unsigned long flags)
+{
+	unsigned long pte_hi = 0;
+#ifdef CONFIG_X86_PAE
+	pte_hi = new_val.pte_high;
+#endif
+	return _hypercall4(int, update_va_mapping, va,
+			   new_val.pte_low, pte_hi, flags);
+}
+
+static inline int
+HYPERVISOR_event_channel_op(
+	void *op)
+{
+	return _hypercall1(int, event_channel_op, op);
+}
+
+static inline int
+HYPERVISOR_xen_version(
+	int cmd, void *arg)
+{
+	return _hypercall2(int, xen_version, cmd, arg);
+}
+
+static inline int
+HYPERVISOR_console_io(
+	int cmd, int count, char *str)
+{
+	return _hypercall3(int, console_io, cmd, count, str);
+}
+
+static inline int
+HYPERVISOR_physdev_op(
+	void *physdev_op)
+{
+	return _hypercall1(int, physdev_op, physdev_op);
+}
+
+static inline int
+HYPERVISOR_grant_table_op(
+	unsigned int cmd, void *uop, unsigned int count)
+{
+	return _hypercall3(int, grant_table_op, cmd, uop, count);
+}
+
+static inline int
+HYPERVISOR_update_va_mapping_otherdomain(
+	unsigned long va, pte_t new_val, unsigned long flags, domid_t domid)
+{
+	unsigned long pte_hi = 0;
+#ifdef CONFIG_X86_PAE
+	pte_hi = new_val.pte_high;
+#endif
+	return _hypercall5(int, update_va_mapping_otherdomain, va,
+			   new_val.pte_low, pte_hi, flags, domid);
+}
+
+static inline int
+HYPERVISOR_vm_assist(
+	unsigned int cmd, unsigned int type)
+{
+	return _hypercall2(int, vm_assist, cmd, type);
+}
+
+static inline int
+HYPERVISOR_vcpu_op(
+	int cmd, int vcpuid, void *extra_args)
+{
+	return _hypercall3(int, vcpu_op, cmd, vcpuid, extra_args);
+}
+
+static inline int
+HYPERVISOR_suspend(
+	unsigned long srec)
+{
+	return _hypercall3(int, sched_op, SCHEDOP_shutdown,
+			   SHUTDOWN_suspend, srec);
+}
+
+static inline int
+HYPERVISOR_nmi_op(
+	unsigned long op,
+	unsigned long arg)
+{
+	return _hypercall2(int, nmi_op, op, arg);
+}
+
+#endif /* __HYPERCALL_H__ */
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/hypervisor.h
@@ -0,0 +1,67 @@
+/******************************************************************************
+ * hypervisor.h
+ *
+ * Linux-specific hypervisor handling.
+ *
+ * Copyright (c) 2002-2004, K A Fraser
+ *
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __HYPERVISOR_H__
+#define __HYPERVISOR_H__
+
+#include <linux/config.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/version.h>
+
+#include <xen/interface/xen.h>
+#include <xen/interface/version.h>
+#include <xen/features.h>
+
+#include <asm/ptrace.h>
+#include <asm/page.h>
+#if defined(__i386__)
+#  ifdef CONFIG_X86_PAE
+#   include <asm-generic/pgtable-nopud.h>
+#  else
+#   include <asm-generic/pgtable-nopmd.h>
+#  endif
+#endif
+
+/* arch/i386/kernel/setup.c */
+extern struct shared_info *HYPERVISOR_shared_info;
+extern struct start_info *xen_start_info;
+
+/* arch/i386/mach-xen/evtchn.c */
+/* Force a proper event-channel callback from Xen. */
+extern void force_evtchn_callback(void);
+
+/* Turn jiffies into Xen system time. */
+u64 jiffies_to_st(unsigned long jiffies);
+
+#include <asm/hypercall.h>
+
+#define xen_init()	(0)
+
+#endif /* __HYPERVISOR_H__ */

--

^ permalink raw reply

* [RFC PATCH 06/35] Add vmlinuz build target.
From: Chris Wright @ 2006-03-22  6:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 05-boot-xen --]
[-- Type: text/plain, Size: 2022 bytes --]

The vmlinuz image is a stripped and compressed kernel image, it is
smaller than the vmlinux image and the Xen domain builder supports
loading compressed images directly.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/Makefile      |    5 +++--
 arch/i386/boot/Makefile |   10 +++++++++-
 2 files changed, 12 insertions(+), 3 deletions(-)

--- xen-subarch-2.6.orig/arch/i386/Makefile
+++ xen-subarch-2.6/arch/i386/Makefile
@@ -105,15 +105,16 @@ CPPFLAGS += $(mflags-y)
 boot := arch/i386/boot
 
 .PHONY: zImage bzImage compressed zlilo bzlilo \
-	zdisk bzdisk fdimage fdimage144 fdimage288 install
+	zdisk bzdisk fdimage fdimage144 fdimage288 install vmlinuz
 
 all: bzImage
 
 # KBUILD_IMAGE specify target image being built
                     KBUILD_IMAGE := $(boot)/bzImage
 zImage zlilo zdisk: KBUILD_IMAGE := arch/i386/boot/zImage
+vmlinuz:            KBUILD_IMAGE := $(boot)/vmlinuz
 
-zImage bzImage: vmlinux
+zImage bzImage vmlinuz: vmlinux
 	$(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
 
 compressed: zImage
--- xen-subarch-2.6.orig/arch/i386/boot/Makefile
+++ xen-subarch-2.6/arch/i386/boot/Makefile
@@ -26,7 +26,7 @@ SVGA_MODE := -DSVGA_MODE=NORMAL_VGA
 #RAMDISK := -DRAMDISK=512
 
 targets		:= vmlinux.bin bootsect bootsect.o \
-		   setup setup.o zImage bzImage
+		   setup setup.o zImage bzImage vmlinuz
 subdir- 	:= compressed
 
 hostprogs-y	:= tools/build
@@ -100,5 +100,13 @@ zlilo: $(BOOTIMAGE)
 	cp System.map $(INSTALL_PATH)/
 	if [ -x /sbin/lilo ]; then /sbin/lilo; else /etc/lilo/install; fi
 
+$(obj)/vmlinuz: $(obj)/vmlinux-stripped FORCE
+	$(call if_changed,gzip)
+	@echo 'Kernel: $@ is ready' ' (#'`cat .version`')'
+
+$(obj)/vmlinux-stripped: OBJCOPYFLAGS := -g --strip-unneeded
+$(obj)/vmlinux-stripped: vmlinux FORCE
+	$(call if_changed,objcopy)
+
 install:
 	sh $(srctree)/$(src)/install.sh $(KERNELRELEASE) $(BOOTIMAGE) System.map "$(INSTALL_PATH)"

--

^ permalink raw reply

* [RFC PATCH 23/35] Add support for Xen event channels.
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 22-evtchn --]
[-- Type: text/plain, Size: 31967 bytes --]

Support Xen event channels instead of the i8259 PIC.

Event channels are used to inject events into the kernel, either from
the hypervisor or from another VM.  The injected events are mapped to
interrupts.

If an event needs to be injected, the hypervisor causes an upcall into
the kernel.  The upcall handler then scans the event pending bitmap
and calls do_IRQ for each pending event.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/kernel/Makefile               |    6 
 arch/i386/mach-xen/evtchn.c             |  844 ++++++++++++++++++++++++++++++++
 drivers/xen/core/Makefile               |    2 
 include/asm-i386/hw_irq.h               |    4 
 include/asm-i386/mach-xen/irq_vectors.h |  126 ++++
 include/xen/evtchn.h                    |  113 ++++
 6 files changed, 1093 insertions(+), 2 deletions(-)

--- xen-subarch-2.6.orig/arch/i386/kernel/Makefile
+++ xen-subarch-2.6/arch/i386/kernel/Makefile
@@ -5,7 +5,7 @@
 extra-y := head.o init_task.o vmlinux.lds
 
 obj-y	:= process.o semaphore.o signal.o entry.o traps.o irq.o \
-		ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
+		ptrace.o time.o ioport.o ldt.o setup.o hw_irq.o sys_i386.o \
 		pci-dma.o i386_ksyms.o i387.o dmi_scan.o bootflag.o \
 		quirks.o i8237.o topology.o
 
@@ -42,6 +42,10 @@ EXTRA_AFLAGS   := -traditional
 
 obj-$(CONFIG_SCx200)		+= scx200.o
 
+hw_irq-y			:= i8259.o
+
+hw_irq-$(CONFIG_XEN)		:= ../mach-xen/evtchn.o
+
 # vsyscall.o contains the vsyscall DSO images as __initdata.
 # We must build both images before we can assemble it.
 # Note: kbuild does not track this dependency due to usage of .incbin
--- xen-subarch-2.6.orig/include/asm-i386/hw_irq.h
+++ xen-subarch-2.6/include/asm-i386/hw_irq.h
@@ -68,7 +68,9 @@ extern atomic_t irq_mis_count;
 
 #define IO_APIC_IRQ(x) (((x) >= 16) || ((1<<(x)) & io_apic_irqs))
 
-#if defined(CONFIG_X86_IO_APIC)
+#if defined(CONFIG_X86_XEN)
+extern void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i);
+#elif defined(CONFIG_X86_IO_APIC)
 static inline void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i)
 {
 	if (IO_APIC_IRQ(i))
--- /dev/null
+++ xen-subarch-2.6/arch/i386/mach-xen/evtchn.c
@@ -0,0 +1,844 @@
+/******************************************************************************
+ * evtchn.c
+ * 
+ * Communication via Xen event channels.
+ * 
+ * Copyright (c) 2002-2005, K A Fraser
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/config.h>
+#include <linux/module.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+#include <linux/sched.h>
+#include <linux/kernel_stat.h>
+#include <linux/version.h>
+#include <asm/atomic.h>
+#include <asm/system.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/event_channel.h>
+#include <xen/interface/physdev.h>
+#include <asm/hypervisor.h>
+#include <xen/evtchn.h>
+#include <linux/mc146818rtc.h> /* RTC_IRQ */
+
+/*
+ * This lock protects updates to the following mapping and reference-count
+ * arrays. The lock does not need to be acquired to read the mapping tables.
+ */
+static spinlock_t irq_mapping_update_lock;
+
+/* IRQ <-> event-channel mappings. */
+static int evtchn_to_irq[NR_EVENT_CHANNELS];
+
+/* Packed IRQ information: binding type, sub-type index, and event channel. */
+static u32 irq_info[NR_IRQS];
+/* Binding types. */
+enum { IRQT_UNBOUND, IRQT_PIRQ, IRQT_VIRQ, IRQT_IPI, IRQT_EVTCHN };
+/* Constructor for packed IRQ information. */
+#define mk_irq_info(type, index, evtchn)				\
+	(((u32)(type) << 24) | ((u32)(index) << 16) | (u32)(evtchn))
+/* Convenient shorthand for packed representation of an unbound IRQ. */
+#define IRQ_UNBOUND	mk_irq_info(IRQT_UNBOUND, 0, 0)
+/* Accessor macros for packed IRQ information. */
+#define evtchn_from_irq(irq) ((u16)(irq_info[irq]))
+#define index_from_irq(irq)  ((u8)(irq_info[irq] >> 16))
+#define type_from_irq(irq)   ((u8)(irq_info[irq] >> 24))
+
+/* IRQ <-> VIRQ mapping. */
+DEFINE_PER_CPU(int, virq_to_irq[NR_VIRQS]);
+
+/* IRQ <-> IPI mapping. */
+#ifndef NR_IPIS
+#define NR_IPIS 1
+#endif
+DEFINE_PER_CPU(int, ipi_to_irq[NR_IPIS]);
+
+/* Reference counts for bindings to IRQs. */
+static int irq_bindcount[NR_IRQS];
+
+/* Bitmap indicating which PIRQs require Xen to be notified on unmask. */
+static unsigned long pirq_needs_unmask_notify[NR_PIRQS/sizeof(unsigned long)];
+
+#ifdef CONFIG_SMP
+
+static u8 cpu_evtchn[NR_EVENT_CHANNELS];
+static unsigned long cpu_evtchn_mask[NR_CPUS][NR_EVENT_CHANNELS/BITS_PER_LONG];
+
+#define active_evtchns(cpu,sh,idx)		\
+	((sh)->evtchn_pending[idx] &		\
+	 cpu_evtchn_mask[cpu][idx] &		\
+	 ~(sh)->evtchn_mask[idx])
+
+static void bind_evtchn_to_cpu(unsigned int chn, unsigned int cpu)
+{
+	clear_bit(chn, (unsigned long *)cpu_evtchn_mask[cpu_evtchn[chn]]);
+	set_bit(chn, (unsigned long *)cpu_evtchn_mask[cpu]);
+	cpu_evtchn[chn] = cpu;
+}
+
+static void init_evtchn_cpu_bindings(void)
+{
+	/* By default all event channels notify CPU#0. */
+	memset(cpu_evtchn, 0, sizeof(cpu_evtchn));
+	memset(cpu_evtchn_mask[0], ~0, sizeof(cpu_evtchn_mask[0]));
+}
+
+#define cpu_from_evtchn(evtchn)		(cpu_evtchn[evtchn])
+
+#else
+
+#define active_evtchns(cpu,sh,idx)		\
+	((sh)->evtchn_pending[idx] &		\
+	 ~(sh)->evtchn_mask[idx])
+#define bind_evtchn_to_cpu(chn,cpu)	((void)0)
+#define init_evtchn_cpu_bindings()	((void)0)
+#define cpu_from_evtchn(evtchn)		(0)
+
+#endif
+
+/* Upcall to generic IRQ layer. */
+#ifdef CONFIG_X86
+extern fastcall unsigned int do_IRQ(struct pt_regs *regs);
+#if defined (__i386__)
+static inline void exit_idle(void) {}
+#define IRQ_REG orig_eax
+#elif defined (__x86_64__)
+#include <asm/idle.h>
+#define IRQ_REG orig_rax
+#endif
+#define do_IRQ(irq, regs) do {			\
+	(regs)->IRQ_REG = (irq);		\
+	do_IRQ((regs));				\
+} while (0)
+#endif
+
+/* Xen will never allocate port zero for any purpose. */
+#define VALID_EVTCHN(chn)	((chn) != 0)
+
+/*
+ * Force a proper event-channel callback from Xen after clearing the
+ * callback mask. We do this in a very simple manner, by making a call
+ * down into Xen. The pending flag will be checked by Xen on return.
+ */
+void force_evtchn_callback(void)
+{
+	(void)HYPERVISOR_xen_version(0, NULL);
+}
+EXPORT_SYMBOL(force_evtchn_callback);
+
+/* NB. Interrupts are disabled on entry. */
+asmlinkage void evtchn_do_upcall(struct pt_regs *regs)
+{
+	unsigned long  l1, l2;
+	unsigned int   l1i, l2i, port;
+	int            irq, cpu = smp_processor_id();
+	struct shared_info *s = HYPERVISOR_shared_info;
+	struct vcpu_info *vcpu_info = &s->vcpu_info[cpu];
+
+	vcpu_info->evtchn_upcall_pending = 0;
+
+	/* NB. No need for a barrier here -- XCHG is a barrier on x86. */
+	l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
+	while (l1 != 0) {
+		l1i = __ffs(l1);
+		l1 &= ~(1UL << l1i);
+
+		while ((l2 = active_evtchns(cpu, s, l1i)) != 0) {
+			l2i = __ffs(l2);
+
+			port = (l1i * BITS_PER_LONG) + l2i;
+			if ((irq = evtchn_to_irq[port]) != -1)
+				do_IRQ(irq, regs);
+			else {
+				exit_idle();
+#ifdef CONFIG_XEN_EVTCHN_DEVICE
+				evtchn_device_upcall(port);
+#else
+				mask_evtchn(port);
+#endif
+			}
+		}
+	}
+}
+
+static int find_unbound_irq(void)
+{
+	int irq;
+
+	for (irq = 0; irq < NR_IRQS; irq++)
+		if (irq_bindcount[irq] == 0)
+			break;
+
+	if (irq == NR_IRQS) {
+		printk(KERN_ERR "No available IRQ to bind to: increase NR_IRQS!\n");
+		irq = -EINVAL;
+	}
+
+	return irq;
+}
+
+static int bind_evtchn_to_irq(unsigned int evtchn)
+{
+	int irq;
+
+	spin_lock(&irq_mapping_update_lock);
+
+	if ((irq = evtchn_to_irq[evtchn]) == -1) {
+		irq = find_unbound_irq();
+		if (irq < 0)
+			goto out;
+		evtchn_to_irq[evtchn] = irq;
+		irq_info[irq] = mk_irq_info(IRQT_EVTCHN, 0, evtchn);
+	}
+
+	irq_bindcount[irq]++;
+out:
+	spin_unlock(&irq_mapping_update_lock);
+
+	return irq;
+}
+
+static int bind_virq_to_irq(unsigned int virq, unsigned int cpu)
+{
+	struct evtchn_op op = { .cmd = EVTCHNOP_bind_virq };
+	int evtchn, irq;
+
+	spin_lock(&irq_mapping_update_lock);
+
+	if ((irq = per_cpu(virq_to_irq, cpu)[virq]) == -1) {
+		irq = find_unbound_irq();
+		if (irq < 0)
+			goto out;
+
+		op.u.bind_virq.virq = virq;
+		op.u.bind_virq.vcpu = cpu;
+		BUG_ON(HYPERVISOR_event_channel_op(&op) != 0);
+		evtchn = op.u.bind_virq.port;
+
+		evtchn_to_irq[evtchn] = irq;
+		irq_info[irq] = mk_irq_info(IRQT_VIRQ, virq, evtchn);
+
+		per_cpu(virq_to_irq, cpu)[virq] = irq;
+
+		bind_evtchn_to_cpu(evtchn, cpu);
+	}
+
+	irq_bindcount[irq]++;
+out:
+	spin_unlock(&irq_mapping_update_lock);
+
+	return irq;
+}
+
+static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
+{
+	struct evtchn_op op = { .cmd = EVTCHNOP_bind_ipi };
+	int evtchn, irq;
+
+	spin_lock(&irq_mapping_update_lock);
+
+	if ((irq = per_cpu(ipi_to_irq, cpu)[ipi]) == -1) {
+		irq = find_unbound_irq();
+		if (irq < 0)
+			goto out;
+
+		op.u.bind_ipi.vcpu = cpu;
+		BUG_ON(HYPERVISOR_event_channel_op(&op) != 0);
+		evtchn = op.u.bind_ipi.port;
+
+		evtchn_to_irq[evtchn] = irq;
+		irq_info[irq] = mk_irq_info(IRQT_IPI, ipi, evtchn);
+
+		per_cpu(ipi_to_irq, cpu)[ipi] = irq;
+
+		bind_evtchn_to_cpu(evtchn, cpu);
+	}
+
+	irq_bindcount[irq]++;
+out:
+	spin_unlock(&irq_mapping_update_lock);
+
+	return irq;
+}
+
+static void unbind_from_irq(unsigned int irq)
+{
+	struct evtchn_op op = { .cmd = EVTCHNOP_close };
+	int evtchn = evtchn_from_irq(irq);
+
+	spin_lock(&irq_mapping_update_lock);
+
+	if ((--irq_bindcount[irq] == 0) && VALID_EVTCHN(evtchn)) {
+		op.u.close.port = evtchn;
+		BUG_ON(HYPERVISOR_event_channel_op(&op) != 0);
+
+		switch (type_from_irq(irq)) {
+		case IRQT_VIRQ:
+			per_cpu(virq_to_irq, cpu_from_evtchn(evtchn))
+				[index_from_irq(irq)] = -1;
+			break;
+		case IRQT_IPI:
+			per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn))
+				[index_from_irq(irq)] = -1;
+			break;
+		default:
+			break;
+		}
+
+		/* Closed ports are implicitly re-bound to VCPU0. */
+		bind_evtchn_to_cpu(evtchn, 0);
+
+		evtchn_to_irq[evtchn] = -1;
+		irq_info[irq] = IRQ_UNBOUND;
+	}
+
+	spin_unlock(&irq_mapping_update_lock);
+}
+
+int bind_evtchn_to_irqhandler(
+	unsigned int evtchn,
+	irqreturn_t (*handler)(int, void *, struct pt_regs *),
+	unsigned long irqflags,
+	const char *devname,
+	void *dev_id)
+{
+	unsigned int irq;
+	int retval;
+
+	irq = bind_evtchn_to_irq(evtchn);
+	if (irq < 0)
+		goto out;
+
+	retval = request_irq(irq, handler, irqflags, devname, dev_id);
+	if (retval != 0) {
+		unbind_from_irq(irq);
+		return retval;
+	}
+out:
+	return irq;
+}
+EXPORT_SYMBOL(bind_evtchn_to_irqhandler);
+
+int bind_virq_to_irqhandler(
+	unsigned int virq,
+	unsigned int cpu,
+	irqreturn_t (*handler)(int, void *, struct pt_regs *),
+	unsigned long irqflags,
+	const char *devname,
+	void *dev_id)
+{
+	unsigned int irq;
+	int retval;
+
+	irq = bind_virq_to_irq(virq, cpu);
+	if (irq < 0)
+		goto out;
+
+	retval = request_irq(irq, handler, irqflags, devname, dev_id);
+	if (retval != 0) {
+		unbind_from_irq(irq);
+		return retval;
+	}
+out:
+	return irq;
+}
+EXPORT_SYMBOL(bind_virq_to_irqhandler);
+
+int bind_ipi_to_irqhandler(
+	unsigned int ipi,
+	unsigned int cpu,
+	irqreturn_t (*handler)(int, void *, struct pt_regs *),
+	unsigned long irqflags,
+	const char *devname,
+	void *dev_id)
+{
+	unsigned int irq;
+	int retval;
+
+	irq = bind_ipi_to_irq(ipi, cpu);
+	if (irq < 0)
+		goto out;
+
+	retval = request_irq(irq, handler, irqflags, devname, dev_id);
+	if (retval != 0) {
+		unbind_from_irq(irq);
+		return retval;
+	}
+out:
+	return irq;
+}
+EXPORT_SYMBOL(bind_ipi_to_irqhandler);
+
+void unbind_from_irqhandler(unsigned int irq, void *dev_id)
+{
+	free_irq(irq, dev_id);
+	unbind_from_irq(irq);
+}
+EXPORT_SYMBOL(unbind_from_irqhandler);
+
+#ifdef CONFIG_SMP
+static void do_nothing_function(void *ign)
+{
+}
+#endif
+
+/* Rebind an evtchn so that it gets delivered to a specific cpu */
+static void rebind_irq_to_cpu(unsigned irq, unsigned tcpu)
+{
+	struct evtchn_op op = { .cmd = EVTCHNOP_bind_vcpu };
+	int evtchn;
+
+	spin_lock(&irq_mapping_update_lock);
+
+	evtchn = evtchn_from_irq(irq);
+	if (!VALID_EVTCHN(evtchn)) {
+		spin_unlock(&irq_mapping_update_lock);
+		return;
+	}
+
+	/* Send future instances of this interrupt to other vcpu. */
+	op.u.bind_vcpu.port = evtchn;
+	op.u.bind_vcpu.vcpu = tcpu;
+
+	/*
+	 * If this fails, it usually just indicates that we're dealing with a 
+	 * virq or IPI channel, which don't actually need to be rebound. Ignore
+	 * it, but don't do the xenlinux-level rebind in that case.
+	 */
+	if (HYPERVISOR_event_channel_op(&op) >= 0)
+		bind_evtchn_to_cpu(evtchn, tcpu);
+
+	spin_unlock(&irq_mapping_update_lock);
+
+	/*
+	 * Now send the new target processor a NOP IPI. When this returns, it
+	 * will check for any pending interrupts, and so service any that got 
+	 * delivered to the wrong processor by mistake.
+	 * 
+	 * XXX: The only time this is called with interrupts disabled is from
+	 * the hotplug/hotunplug path. In that case, all cpus are stopped with 
+	 * interrupts disabled, and the missed interrupts will be picked up
+	 * when they start again. This is kind of a hack.
+	 */
+	if (!irqs_disabled())
+		smp_call_function(do_nothing_function, NULL, 0, 0);
+}
+
+
+static void set_affinity_irq(unsigned irq, cpumask_t dest)
+{
+	unsigned tcpu = first_cpu(dest);
+	rebind_irq_to_cpu(irq, tcpu);
+}
+
+/*
+ * Interface to generic handling in irq.c
+ */
+
+static unsigned int startup_dynirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		unmask_evtchn(evtchn);
+	return 0;
+}
+
+static void shutdown_dynirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		mask_evtchn(evtchn);
+}
+
+static void enable_dynirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		unmask_evtchn(evtchn);
+}
+
+static void disable_dynirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		mask_evtchn(evtchn);
+}
+
+static void ack_dynirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn)) {
+		mask_evtchn(evtchn);
+		clear_evtchn(evtchn);
+	}
+}
+
+static void end_dynirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn) && !(irq_desc[irq].status & IRQ_DISABLED))
+		unmask_evtchn(evtchn);
+}
+
+static struct hw_interrupt_type dynirq_type = {
+	"Dynamic-irq",
+	startup_dynirq,
+	shutdown_dynirq,
+	enable_dynirq,
+	disable_dynirq,
+	ack_dynirq,
+	end_dynirq,
+	set_affinity_irq
+};
+
+static inline void pirq_unmask_notify(int pirq)
+{
+	struct physdev_op op;
+	if (unlikely(test_bit(pirq, &pirq_needs_unmask_notify[0]))) {
+		op.cmd = PHYSDEVOP_IRQ_UNMASK_NOTIFY;
+		(void)HYPERVISOR_physdev_op(&op);
+	}
+}
+
+static inline void pirq_query_unmask(int pirq)
+{
+	struct physdev_op op;
+	op.cmd = PHYSDEVOP_IRQ_STATUS_QUERY;
+	op.u.irq_status_query.irq = pirq;
+	(void)HYPERVISOR_physdev_op(&op);
+	clear_bit(pirq, &pirq_needs_unmask_notify[0]);
+	if (op.u.irq_status_query.flags & PHYSDEVOP_IRQ_NEEDS_UNMASK_NOTIFY)
+		set_bit(pirq, &pirq_needs_unmask_notify[0]);
+}
+
+/*
+ * On startup, if there is no action associated with the IRQ then we are
+ * probing. In this case we should not share with others as it will confuse us.
+ */
+#define probing_irq(_irq) (irq_desc[(_irq)].action == NULL)
+
+static unsigned int startup_pirq(unsigned int irq)
+{
+	struct evtchn_op op = { .cmd = EVTCHNOP_bind_pirq };
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		goto out;
+
+	op.u.bind_pirq.pirq  = irq;
+	/* NB. We are happy to share unless we are probing. */
+	op.u.bind_pirq.flags = probing_irq(irq) ? 0 : BIND_PIRQ__WILL_SHARE;
+	if (HYPERVISOR_event_channel_op(&op) != 0) {
+		if (!probing_irq(irq))
+			printk(KERN_INFO "Failed to obtain physical IRQ %d\n",
+			       irq);
+		return 0;
+	}
+	evtchn = op.u.bind_pirq.port;
+
+	pirq_query_unmask(irq_to_pirq(irq));
+
+	bind_evtchn_to_cpu(evtchn, 0);
+	evtchn_to_irq[evtchn] = irq;
+	irq_info[irq] = mk_irq_info(IRQT_PIRQ, irq, evtchn);
+
+ out:
+	unmask_evtchn(evtchn);
+	pirq_unmask_notify(irq_to_pirq(irq));
+
+	return 0;
+}
+
+static void shutdown_pirq(unsigned int irq)
+{
+	struct evtchn_op op = { .cmd = EVTCHNOP_close };
+	int evtchn = evtchn_from_irq(irq);
+
+	if (!VALID_EVTCHN(evtchn))
+		return;
+
+	mask_evtchn(evtchn);
+
+	op.u.close.port = evtchn;
+	BUG_ON(HYPERVISOR_event_channel_op(&op) != 0);
+
+	bind_evtchn_to_cpu(evtchn, 0);
+	evtchn_to_irq[evtchn] = -1;
+	irq_info[irq] = IRQ_UNBOUND;
+}
+
+static void enable_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn)) {
+		unmask_evtchn(evtchn);
+		pirq_unmask_notify(irq_to_pirq(irq));
+	}
+}
+
+static void disable_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		mask_evtchn(evtchn);
+}
+
+static void ack_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn)) {
+		mask_evtchn(evtchn);
+		clear_evtchn(evtchn);
+	}
+}
+
+static void end_pirq(unsigned int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn) && !(irq_desc[irq].status & IRQ_DISABLED)) {
+		unmask_evtchn(evtchn);
+		pirq_unmask_notify(irq_to_pirq(irq));
+	}
+}
+
+static struct hw_interrupt_type pirq_type = {
+	"Phys-irq",
+	startup_pirq,
+	shutdown_pirq,
+	enable_pirq,
+	disable_pirq,
+	ack_pirq,
+	end_pirq,
+	set_affinity_irq
+};
+
+void hw_resend_irq(struct hw_interrupt_type *h, unsigned int i)
+{
+	int evtchn = evtchn_from_irq(i);
+	struct shared_info *s = HYPERVISOR_shared_info;
+	if (!VALID_EVTCHN(evtchn))
+		return;
+	BUG_ON(!test_bit(evtchn, &s->evtchn_mask[0]));
+	synch_set_bit(evtchn, &s->evtchn_pending[0]);
+}
+
+void notify_remote_via_irq(int irq)
+{
+	int evtchn = evtchn_from_irq(irq);
+
+	if (VALID_EVTCHN(evtchn))
+		notify_remote_via_evtchn(evtchn);
+}
+EXPORT_SYMBOL(notify_remote_via_irq);
+
+void mask_evtchn(int port)
+{
+	struct shared_info *s = HYPERVISOR_shared_info;
+	synch_set_bit(port, &s->evtchn_mask[0]);
+}
+EXPORT_SYMBOL(mask_evtchn);
+
+void unmask_evtchn(int port)
+{
+	struct shared_info *s = HYPERVISOR_shared_info;
+	unsigned int cpu = smp_processor_id();
+	struct vcpu_info *vcpu_info = &s->vcpu_info[cpu];
+
+	/* Slow path (hypercall) if this is a non-local port. */
+	if (unlikely(cpu != cpu_from_evtchn(port))) {
+		struct evtchn_op op = { .cmd = EVTCHNOP_unmask,
+				   .u.unmask.port = port };
+		(void)HYPERVISOR_event_channel_op(&op);
+		return;
+	}
+
+	synch_clear_bit(port, &s->evtchn_mask[0]);
+
+	/*
+	 * The following is basically the equivalent of 'hw_resend_irq'. Just
+	 * like a real IO-APIC we 'lose the interrupt edge' if the channel is
+	 * masked.
+	 */
+	if (test_bit(port, &s->evtchn_pending[0]) &&
+	    !synch_test_and_set_bit(port / BITS_PER_LONG,
+				    &vcpu_info->evtchn_pending_sel)) {
+		vcpu_info->evtchn_upcall_pending = 1;
+		if (!vcpu_info->evtchn_upcall_mask)
+			force_evtchn_callback();
+	}
+}
+EXPORT_SYMBOL(unmask_evtchn);
+
+void irq_resume(void)
+{
+	struct evtchn_op op;
+	int cpu, pirq, virq, ipi, irq, evtchn;
+
+	init_evtchn_cpu_bindings();
+
+	/* New event-channel space is not 'live' yet. */
+	for (evtchn = 0; evtchn < NR_EVENT_CHANNELS; evtchn++)
+		mask_evtchn(evtchn);
+
+	/* Check that no PIRQs are still bound. */
+	for (pirq = 0; pirq < NR_PIRQS; pirq++)
+		BUG_ON(irq_info[pirq_to_irq(pirq)] != IRQ_UNBOUND);
+
+	/* Secondary CPUs must have no VIRQ or IPI bindings. */
+	for (cpu = 1; cpu < NR_CPUS; cpu++) {
+		for (virq = 0; virq < NR_VIRQS; virq++)
+			BUG_ON(per_cpu(virq_to_irq, cpu)[virq] != -1);
+		for (ipi = 0; ipi < NR_IPIS; ipi++)
+			BUG_ON(per_cpu(ipi_to_irq, cpu)[ipi] != -1);
+	}
+
+	/* No IRQ <-> event-channel mappings. */
+	for (irq = 0; irq < NR_IRQS; irq++)
+		irq_info[irq] &= ~0xFFFF; /* zap event-channel binding */
+	for (evtchn = 0; evtchn < NR_EVENT_CHANNELS; evtchn++)
+		evtchn_to_irq[evtchn] = -1;
+
+	/* Primary CPU: rebind VIRQs automatically. */
+	for (virq = 0; virq < NR_VIRQS; virq++) {
+		if ((irq = per_cpu(virq_to_irq, 0)[virq]) == -1)
+			continue;
+
+		BUG_ON(irq_info[irq] != mk_irq_info(IRQT_VIRQ, virq, 0));
+
+		/* Get a new binding from Xen. */
+		memset(&op, 0, sizeof(op));
+		op.cmd              = EVTCHNOP_bind_virq;
+		op.u.bind_virq.virq = virq;
+		op.u.bind_virq.vcpu = 0;
+		BUG_ON(HYPERVISOR_event_channel_op(&op) != 0);
+		evtchn = op.u.bind_virq.port;
+
+		/* Record the new mapping. */
+		evtchn_to_irq[evtchn] = irq;
+		irq_info[irq] = mk_irq_info(IRQT_VIRQ, virq, evtchn);
+
+		/* Ready for use. */
+		unmask_evtchn(evtchn);
+	}
+
+	/* Primary CPU: rebind IPIs automatically. */
+	for (ipi = 0; ipi < NR_IPIS; ipi++) {
+		if ((irq = per_cpu(ipi_to_irq, 0)[ipi]) == -1)
+			continue;
+
+		BUG_ON(irq_info[irq] != mk_irq_info(IRQT_IPI, ipi, 0));
+
+		/* Get a new binding from Xen. */
+		memset(&op, 0, sizeof(op));
+		op.cmd = EVTCHNOP_bind_ipi;
+		op.u.bind_ipi.vcpu = 0;
+		BUG_ON(HYPERVISOR_event_channel_op(&op) != 0);
+		evtchn = op.u.bind_ipi.port;
+
+		/* Record the new mapping. */
+		evtchn_to_irq[evtchn] = irq;
+		irq_info[irq] = mk_irq_info(IRQT_IPI, ipi, evtchn);
+
+		/* Ready for use. */
+		unmask_evtchn(evtchn);
+	}
+}
+
+void init_8259A(int auto_eoi)
+{
+}
+
+void __init init_ISA_irqs (void)
+{
+}
+
+void __init init_IRQ(void)
+{
+	int i;
+	int cpu;
+
+	irq_ctx_init(0);
+
+	spin_lock_init(&irq_mapping_update_lock);
+
+	init_evtchn_cpu_bindings();
+
+	/* No VIRQ or IPI bindings. */
+	for (cpu = 0; cpu < NR_CPUS; cpu++) {
+		for (i = 0; i < NR_VIRQS; i++)
+			per_cpu(virq_to_irq, cpu)[i] = -1;
+		for (i = 0; i < NR_IPIS; i++)
+			per_cpu(ipi_to_irq, cpu)[i] = -1;
+	}
+
+	/* No event-channel -> IRQ mappings. */
+	for (i = 0; i < NR_EVENT_CHANNELS; i++) {
+		evtchn_to_irq[i] = -1;
+		mask_evtchn(i); /* No event channels are 'live' right now. */
+	}
+
+	/* No IRQ -> event-channel mappings. */
+	for (i = 0; i < NR_IRQS; i++)
+		irq_info[i] = IRQ_UNBOUND;
+
+	/* Dynamic IRQ space is currently unbound. Zero the refcnts. */
+	for (i = 0; i < NR_DYNIRQS; i++) {
+		irq_bindcount[dynirq_to_irq(i)] = 0;
+
+		irq_desc[dynirq_to_irq(i)].status  = IRQ_DISABLED;
+		irq_desc[dynirq_to_irq(i)].action  = NULL;
+		irq_desc[dynirq_to_irq(i)].depth   = 1;
+		irq_desc[dynirq_to_irq(i)].handler = &dynirq_type;
+	}
+
+	/* Phys IRQ space is statically bound (1:1 mapping). Nail refcnts. */
+	for (i = 0; i < NR_PIRQS; i++) {
+		irq_bindcount[pirq_to_irq(i)] = 1;
+
+#ifdef RTC_IRQ
+		/* If not domain 0, force our RTC driver to fail its probe. */
+		if ((i == RTC_IRQ) &&
+		    !(xen_start_info->flags & SIF_INITDOMAIN))
+			continue;
+#endif
+
+		irq_desc[pirq_to_irq(i)].status  = IRQ_DISABLED;
+		irq_desc[pirq_to_irq(i)].action  = NULL;
+		irq_desc[pirq_to_irq(i)].depth   = 1;
+		irq_desc[pirq_to_irq(i)].handler = &pirq_type;
+	}
+}
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/core/Makefile
@@ -0,0 +1,2 @@
+
+obj-y   := evtchn.o
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/mach-xen/irq_vectors.h
@@ -0,0 +1,126 @@
+/*
+ * This file should contain #defines for all of the interrupt vector
+ * numbers used by this architecture.
+ *
+ * In addition, there are some standard defines:
+ *
+ *	FIRST_EXTERNAL_VECTOR:
+ *		The first free place for external interrupts
+ *
+ *	SYSCALL_VECTOR:
+ *		The IRQ vector a syscall makes the user to kernel transition
+ *		under.
+ *
+ *	TIMER_IRQ:
+ *		The IRQ number the timer interrupt comes in at.
+ *
+ *	NR_IRQS:
+ *		The total number of interrupt vectors (including all the
+ *		architecture specific interrupts) needed.
+ *
+ */
+#ifndef _ASM_IRQ_VECTORS_H
+#define _ASM_IRQ_VECTORS_H
+
+/*
+ * IDT vectors usable for external interrupt sources start
+ * at 0x20:
+ */
+#define FIRST_EXTERNAL_VECTOR	0x20
+
+#define SYSCALL_VECTOR		0x80
+
+/*
+ * Vectors 0x20-0x2f are used for ISA interrupts.
+ */
+
+#if 0
+/*
+ * Special IRQ vectors used by the SMP architecture, 0xf0-0xff
+ *
+ *  some of the following vectors are 'rare', they are merged
+ *  into a single vector (CALL_FUNCTION_VECTOR) to save vector space.
+ *  TLB, reschedule and local APIC vectors are performance-critical.
+ *
+ *  Vectors 0xf0-0xfa are free (reserved for future Linux use).
+ */
+#define SPURIOUS_APIC_VECTOR	0xff
+#define ERROR_APIC_VECTOR	0xfe
+#define INVALIDATE_TLB_VECTOR	0xfd
+#define RESCHEDULE_VECTOR	0xfc
+#define CALL_FUNCTION_VECTOR	0xfb
+
+#define THERMAL_APIC_VECTOR	0xf0
+/*
+ * Local APIC timer IRQ vector is on a different priority level,
+ * to work around the 'lost local interrupt if more than 2 IRQ
+ * sources per level' errata.
+ */
+#define LOCAL_TIMER_VECTOR	0xef
+#endif
+
+#define SPURIOUS_APIC_VECTOR	0xff
+#define ERROR_APIC_VECTOR	0xfe
+#define THERMAL_APIC_VECTOR	0xf0
+
+/*
+ * First APIC vector available to drivers: (vectors 0x30-0xee)
+ * we start at 0x31 to spread out vectors evenly between priority
+ * levels. (0x80 is the syscall vector)
+ */
+#define FIRST_DEVICE_VECTOR	0x31
+#define FIRST_SYSTEM_VECTOR	0xef
+
+/*
+ * 16 8259A IRQ's, 208 potential APIC interrupt sources.
+ * Right now the APIC is mostly only used for SMP.
+ * 256 vectors is an architectural limit. (we can have
+ * more than 256 devices theoretically, but they will
+ * have to use shared interrupts)
+ * Since vectors 0x00-0x1f are used/reserved for the CPU,
+ * the usable vector space is 0x20-0xff (224 vectors)
+ */
+
+#define RESCHEDULE_VECTOR	0
+#define CALL_FUNCTION_VECTOR	1
+#define NR_IPIS			2
+
+/*
+ * The maximum number of vectors supported by i386 processors
+ * is limited to 256. For processors other than i386, NR_VECTORS
+ * should be changed accordingly.
+ */
+#define NR_VECTORS 256
+
+#define FPU_IRQ			13
+
+#define	FIRST_VM86_IRQ		3
+#define LAST_VM86_IRQ		15
+#define invalid_vm86_irq(irq)	((irq) < 3 || (irq) > 15)
+
+/*
+ * The flat IRQ space is divided into two regions:
+ *  1. A one-to-one mapping of real physical IRQs. This space is only used
+ *     if we have physical device-access privilege. This region is at the 
+ *     start of the IRQ space so that existing device drivers do not need
+ *     to be modified to translate physical IRQ numbers into our IRQ space.
+ *  3. A dynamic mapping of inter-domain and Xen-sourced virtual IRQs. These
+ *     are bound using the provided bind/unbind functions.
+ */
+
+#define PIRQ_BASE		0
+#define NR_PIRQS		256
+
+#define DYNIRQ_BASE		(PIRQ_BASE + NR_PIRQS)
+#define NR_DYNIRQS		256
+
+#define NR_IRQS			(NR_PIRQS + NR_DYNIRQS)
+#define NR_IRQ_VECTORS		NR_IRQS
+
+#define pirq_to_irq(_x)		((_x) + PIRQ_BASE)
+#define irq_to_pirq(_x)		((_x) - PIRQ_BASE)
+
+#define dynirq_to_irq(_x)	((_x) + DYNIRQ_BASE)
+#define irq_to_dynirq(_x)	((_x) - DYNIRQ_BASE)
+
+#endif /* _ASM_IRQ_VECTORS_H */
--- /dev/null
+++ xen-subarch-2.6/include/xen/evtchn.h
@@ -0,0 +1,113 @@
+/******************************************************************************
+ * evtchn.h
+ * 
+ * Communication via Xen event channels.
+ * Also definitions for the device that demuxes notifications to userspace.
+ * 
+ * Copyright (c) 2004-2005, K A Fraser
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __ASM_EVTCHN_H__
+#define __ASM_EVTCHN_H__
+
+#include <linux/config.h>
+#include <linux/interrupt.h>
+#include <asm/hypervisor.h>
+#include <asm/ptrace.h>
+#include <asm/bitops.h>
+#include <xen/interface/event_channel.h>
+#include <linux/smp.h>
+
+/*
+ * LOW-LEVEL DEFINITIONS
+ */
+
+/*
+ * Dynamically bind an event source to an IRQ-like callback handler.
+ * On some platforms this may not be implemented via the Linux IRQ subsystem.
+ * The IRQ argument passed to the callback handler is the same as returned
+ * from the bind call. It may not correspond to a Linux IRQ number.
+ * Returns IRQ or negative errno.
+ * UNBIND: Takes IRQ to unbind from; automatically closes the event channel.
+ */
+extern int bind_evtchn_to_irqhandler(
+	unsigned int evtchn,
+	irqreturn_t (*handler)(int, void *, struct pt_regs *),
+	unsigned long irqflags,
+	const char *devname,
+	void *dev_id);
+extern int bind_virq_to_irqhandler(
+	unsigned int virq,
+	unsigned int cpu,
+	irqreturn_t (*handler)(int, void *, struct pt_regs *),
+	unsigned long irqflags,
+	const char *devname,
+	void *dev_id);
+extern int bind_ipi_to_irqhandler(
+	unsigned int ipi,
+	unsigned int cpu,
+	irqreturn_t (*handler)(int, void *, struct pt_regs *),
+	unsigned long irqflags,
+	const char *devname,
+	void *dev_id);
+
+/*
+ * Common unbind function for all event sources. Takes IRQ to unbind from.
+ * Automatically closes the underlying event channel (even for bindings
+ * made with bind_evtchn_to_irqhandler()).
+ */
+extern void unbind_from_irqhandler(unsigned int irq, void *dev_id);
+
+extern void irq_resume(void);
+
+/* Entry point for notifications into Linux subsystems. */
+asmlinkage void evtchn_do_upcall(struct pt_regs *regs);
+
+/* Entry point for notifications into the userland character device. */
+extern void evtchn_device_upcall(int port);
+
+extern void mask_evtchn(int port);
+extern void unmask_evtchn(int port);
+
+static inline void clear_evtchn(int port)
+{
+	struct shared_info *s = HYPERVISOR_shared_info;
+	synch_clear_bit(port, &s->evtchn_pending[0]);
+}
+
+static inline void notify_remote_via_evtchn(int port)
+{
+	struct evtchn_op op;
+	op.cmd         = EVTCHNOP_send,
+	op.u.send.port = port;
+	(void)HYPERVISOR_event_channel_op(&op);
+}
+
+/*
+ * Unlike notify_remote_via_evtchn(), this is safe to use across
+ * save/restore. Notifications on a broken connection are silently dropped.
+ */
+extern void notify_remote_via_irq(int irq);
+
+#endif /* __ASM_EVTCHN_H__ */

--

^ permalink raw reply

* Re: Cloning from sites with 404 overridden
From: Junio C Hamano @ 2006-03-22  6:47 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git
In-Reply-To: <e5bfff550603212206k1924d352xcbdb0e5a11b88a50@mail.gmail.com>

"Marco Costalba" <mcostalba@gmail.com> writes:

> Perhaps I am proposing a total idiocy, I don't know git-fetch
> internals, but wouldn't be better to avoid trying to download a non
> existing object? So to fix the problem at the origin?
>
> I don't know if it is possible to list contents before try to download
> so to avoid asking for a non existing object.

There is no way for the downloader to know if the upstream
repository has packed which object.  What is happening is that
the commit walker asks for loose object first because it does
not know.  Upon getting a "no such file" (or in the case of
misconfigured HTTP server that does not say 404, "corrupt
object"), it then checks if the object appears in the pack by
downloading the pack index.  It can tell what objects are in the
packs by looking at the pack index and downloads the pack that
contains needed object.

^ permalink raw reply

* [RFC PATCH 14/35] subarch modify CPU capabilities
From: Chris Wright @ 2006-03-22  6:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 13-i386-cpu --]
[-- Type: text/plain, Size: 2959 bytes --]

Allow subarchitectures to modify CPU capabilities during bootstrap CPU
identification. Add a subarch implementation for Xen which hides
features unsupported by the hypervisor.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/kernel/cpu/common.c               |    3 +++
 include/asm-i386/mach-default/mach_cpu.h    |    5 +++++
 include/asm-i386/mach-xen/mach_cpu.h        |    2 ++
 include/asm-i386/mach-xen/setup_arch_post.h |   15 +++++++++++++++
 include/asm-i386/mach-xen/setup_arch_pre.h  |    1 +
 5 files changed, 26 insertions(+)

--- xen-subarch-2.6.orig/arch/i386/kernel/cpu/common.c
+++ xen-subarch-2.6/arch/i386/kernel/cpu/common.c
@@ -16,6 +16,7 @@
 #include <asm/apic.h>
 #include <mach_apic.h>
 #endif
+#include <mach_cpu.h>
 
 #include "cpu.h"
 
@@ -420,6 +421,8 @@ void __devinit identify_cpu(struct cpuin
 				c->x86_vendor, c->x86_model);
 	}
 
+	machine_specific_modify_cpu_capabilities(c);
+
 	/* Now the feature flags better reflect actual CPU features! */
 
 	printk(KERN_DEBUG "CPU: After all inits, caps:");
--- xen-subarch-2.6.orig/include/asm-i386/mach-xen/setup_arch_post.h
+++ xen-subarch-2.6/include/asm-i386/mach-xen/setup_arch_post.h
@@ -18,6 +18,19 @@ static char * __init machine_specific_me
 	return "Xen";
 }
 
+void __devinit machine_specific_modify_cpu_capabilities(struct cpuinfo_x86 *c)
+{
+	clear_bit(X86_FEATURE_VME, c->x86_capability);
+	clear_bit(X86_FEATURE_DE, c->x86_capability);
+	clear_bit(X86_FEATURE_PSE, c->x86_capability);
+	clear_bit(X86_FEATURE_PGE, c->x86_capability);
+	clear_bit(X86_FEATURE_SEP, c->x86_capability);
+	clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+	if (!(xen_start_info->flags & SIF_PRIVILEGED))
+		clear_bit(X86_FEATURE_MTRR, c->x86_capability);
+	c->hlt_works_ok = 0;
+}
+
 static void __init machine_specific_arch_setup(void)
 {
 	struct physdev_op op;
@@ -30,6 +43,8 @@ static void __init machine_specific_arch
 	    __KERNEL_CS, (unsigned long)hypervisor_callback,
 	    __KERNEL_CS, (unsigned long)failsafe_callback);
 
+	machine_specific_modify_cpu_capabilities(&boot_cpu_data);
+
 	init_pg_tables_end = __pa(xen_start_info->pt_base) +
 		PFN_PHYS(xen_start_info->nr_pt_frames);
 
--- xen-subarch-2.6.orig/include/asm-i386/mach-xen/setup_arch_pre.h
+++ xen-subarch-2.6/include/asm-i386/mach-xen/setup_arch_pre.h
@@ -1,6 +1,7 @@
 
 #include <xen/interface/xen.h>
 #include <asm/hypervisor.h>
+#include <mach_cpu.h>
 
 struct start_info *xen_start_info;
 EXPORT_SYMBOL(xen_start_info);
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/mach-default/mach_cpu.h
@@ -0,0 +1,5 @@
+
+static inline void
+machine_specific_modify_cpu_capabilities(struct cpuinfo_x86 *c)
+{
+}
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/mach-xen/mach_cpu.h
@@ -0,0 +1,2 @@
+
+extern void machine_specific_modify_cpu_capabilities(struct cpuinfo_x86 *);

--

^ permalink raw reply

* [RFC PATCH 11/35] Add support for Xen to entry.S.
From: Chris Wright @ 2006-03-22  6:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 10-i386-entry.S --]
[-- Type: text/plain, Size: 11119 bytes --]

- change cli/sti
- change test for user mode return to work for kernel mode in ring1
- check hypervisor saved event mask on return from exception
- add entry points for the hypervisor upcall handlers
- avoid math emulation check when running on Xen
- add nmi handler for running on Xen

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/kernel/entry.S |  228 +++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 211 insertions(+), 17 deletions(-)

--- xen-subarch-2.6.orig/arch/i386/kernel/entry.S
+++ xen-subarch-2.6/arch/i386/kernel/entry.S
@@ -75,8 +75,42 @@ DF_MASK		= 0x00000400 
 NT_MASK		= 0x00004000
 VM_MASK		= 0x00020000
 
+#ifndef CONFIG_XEN
+#define DISABLE_INTERRUPTS	cli
+#define ENABLE_INTERRUPTS	sti
+#else
+#include <xen/interface/xen.h>
+
+EVENT_MASK	= 0x2E
+
+/* Pseudo-eflags. */
+NMI_MASK	= 0x80000000
+
+/* Offsets into shared_info_t. */
+#define evtchn_upcall_pending		/* 0 */
+#define evtchn_upcall_mask		1
+
+#define sizeof_vcpu_shift		6
+
+#ifdef CONFIG_SMP
+#define GET_VCPU_INFO		movl TI_cpu(%ebp),%esi			; \
+				shl  $sizeof_vcpu_shift,%esi		; \
+				addl HYPERVISOR_shared_info,%esi
+#else
+#define GET_VCPU_INFO		movl HYPERVISOR_shared_info,%esi
+#endif
+
+#define __DISABLE_INTERRUPTS	movb $1,evtchn_upcall_mask(%esi)
+#define __ENABLE_INTERRUPTS	movb $0,evtchn_upcall_mask(%esi)
+#define DISABLE_INTERRUPTS	GET_VCPU_INFO				; \
+				__DISABLE_INTERRUPTS
+#define ENABLE_INTERRUPTS	GET_VCPU_INFO				; \
+				__ENABLE_INTERRUPTS
+#define __TEST_PENDING		testb $0xFF,evtchn_upcall_pending(%esi)
+#endif
+
 #ifdef CONFIG_PREEMPT
-#define preempt_stop		cli
+#define preempt_stop		DISABLE_INTERRUPTS
 #else
 #define preempt_stop
 #define resume_kernel		restore_nocheck
@@ -145,10 +179,10 @@ ret_from_intr:
 	GET_THREAD_INFO(%ebp)
 	movl EFLAGS(%esp), %eax		# mix EFLAGS and CS
 	movb CS(%esp), %al
-	testl $(VM_MASK | 3), %eax
+	testl $(VM_MASK | USER_MODE_MASK), %eax
 	jz resume_kernel
 ENTRY(resume_userspace)
- 	cli				# make sure we don't miss an interrupt
+	DISABLE_INTERRUPTS		# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
 					# between sampling and the iret
 	movl TI_flags(%ebp), %ecx
@@ -159,7 +193,7 @@ ENTRY(resume_userspace)
 
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
-	cli
+	DISABLE_INTERRUPTS
 	cmpl $0,TI_preempt_count(%ebp)	# non-zero preempt_count ?
 	jnz restore_nocheck
 need_resched:
@@ -179,7 +213,7 @@ need_resched:
 ENTRY(sysenter_entry)
 	movl TSS_sysenter_esp0(%esp),%esp
 sysenter_past_esp:
-	sti
+	ENABLE_INTERRUPTS
 	pushl $(__USER_DS)
 	pushl %ebp
 	pushfl
@@ -209,7 +243,7 @@ sysenter_past_esp:
 	jae syscall_badsys
 	call *sys_call_table(,%eax,4)
 	movl %eax,EAX(%esp)
-	cli
+	DISABLE_INTERRUPTS
 	movl TI_flags(%ebp), %ecx
 	testw $_TIF_ALLWORK_MASK, %cx
 	jne syscall_exit_work
@@ -217,7 +251,7 @@ sysenter_past_esp:
 	movl EIP(%esp), %edx
 	movl OLDESP(%esp), %ecx
 	xorl %ebp,%ebp
-	sti
+	ENABLE_INTERRUPTS
 	sysexit
 
 
@@ -236,7 +270,7 @@ syscall_call:
 	call *sys_call_table(,%eax,4)
 	movl %eax,EAX(%esp)		# store the return value
 syscall_exit:
-	cli				# make sure we don't miss an interrupt
+	DISABLE_INTERRUPTS		# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
 					# between sampling and the iret
 	movl TI_flags(%ebp), %ecx
@@ -244,6 +278,7 @@ syscall_exit:
 	jne syscall_exit_work
 
 restore_all:
+#ifndef CONFIG_XEN
 	movl EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
 	# Warning: OLDSS(%esp) contains the wrong/random values if we
 	# are returning to the kernel.
@@ -254,12 +289,25 @@ restore_all:
 	cmpl $((4 << 8) | 3), %eax
 	je ldt_ss			# returning to user-space with LDT SS
 restore_nocheck:
+#else
+restore_nocheck:
+	testl $(VM_MASK|NMI_MASK), EFLAGS(%esp)
+	jnz hypervisor_iret
+	movb EVENT_MASK(%esp), %al
+	notb %al			# %al == ~saved_mask
+	GET_VCPU_INFO
+	andb evtchn_upcall_mask(%esi),%al
+	andb $1,%al			# %al == mask & ~saved_mask
+	jnz restore_all_enable_events	#     != 0 => reenable event delivery
+#endif
 	RESTORE_REGS
 	addl $4, %esp
 1:	iret
 .section .fixup,"ax"
 iret_exc:
-	sti
+#ifndef CONFIG_XEN
+	ENABLE_INTERRUPTS
+#endif
 	pushl $0			# no error code
 	pushl $do_iret_error
 	jmp error_code
@@ -269,6 +317,7 @@ iret_exc:
 	.long 1b,iret_exc
 .previous
 
+#ifndef CONFIG_XEN
 ldt_ss:
 	larl OLDSS(%esp), %eax
 	jnz restore_nocheck
@@ -281,7 +330,7 @@ ldt_ss:
 	 * CPUs, which we can try to work around to make
 	 * dosemu and wine happy. */
 	subl $8, %esp		# reserve space for switch16 pointer
-	cli
+	DISABLE_INTERRUPTS
 	movl %esp, %eax
 	/* Set up the 16bit stack frame with switch32 pointer on top,
 	 * and a switch16 pointer on top of the current frame. */
@@ -293,6 +342,13 @@ ldt_ss:
 	.align 4
 	.long 1b,iret_exc
 .previous
+#else
+hypervisor_iret:
+	andl $~NMI_MASK, EFLAGS(%esp)
+	RESTORE_REGS
+	addl $4, %esp
+	jmp  hypercall_page + (__HYPERVISOR_iret * 32)
+#endif
 
 	# perform work that needs to be done immediately before resumption
 	ALIGN
@@ -301,7 +357,7 @@ work_pending:
 	jz work_notifysig
 work_resched:
 	call schedule
-	cli				# make sure we don't miss an interrupt
+	DISABLE_INTERRUPTS		# make sure we don't miss an interrupt
 					# setting need_resched or sigpending
 					# between sampling and the iret
 	movl TI_flags(%ebp), %ecx
@@ -353,7 +409,7 @@ syscall_trace_entry:
 syscall_exit_work:
 	testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP), %cl
 	jz work_pending
-	sti				# could let do_syscall_trace() call
+	ENABLE_INTERRUPTS		# could let do_syscall_trace() call
 					# schedule() instead
 	movl %esp, %eax
 	movl $1, %edx
@@ -373,6 +429,7 @@ syscall_badsys:
 	movl $-ENOSYS,EAX(%esp)
 	jmp resume_userspace
 
+#ifndef CONFIG_XEN
 #define FIXUP_ESPFIX_STACK \
 	movl %esp, %eax; \
 	/* switch to 32bit stack using the pointer on top of 16bit stack */ \
@@ -431,6 +488,9 @@ ENTRY(name)				\
 
 /* The include is where all of the SMP etc. interrupts come from */
 #include "entry_arch.h"
+#else
+#define UNWIND_ESPFIX_STACK
+#endif
 
 ENTRY(divide_error)
 	pushl $0			# no error code
@@ -462,6 +522,126 @@ error_code:
 	call *%edi
 	jmp ret_from_exception
 
+#ifdef CONFIG_XEN
+# A note on the "critical region" in our callback handler.
+# We want to avoid stacking callback handlers due to events occurring
+# during handling of the last event. To do this, we keep events disabled
+# until we've done all processing. HOWEVER, we must enable events before
+# popping the stack frame (can't be done atomically) and so it would still
+# be possible to get enough handler activations to overflow the stack.
+# Although unlikely, bugs of that kind are hard to track down, so we'd
+# like to avoid the possibility.
+# So, on entry to the handler we detect whether we interrupted an
+# existing activation in its critical region -- if so, we pop the current
+# activation and restart the handler using the previous one.
+ENTRY(hypervisor_callback)
+	pushl %eax
+	SAVE_ALL
+	movl EIP(%esp),%eax
+	cmpl $scrit,%eax
+	jb   11f
+	cmpl $ecrit,%eax
+	jb   critical_region_fixup
+11:	push %esp
+	call evtchn_do_upcall
+	add  $4,%esp
+	jmp  ret_from_intr
+
+        ALIGN
+restore_all_enable_events:
+	__ENABLE_INTERRUPTS
+scrit:	/**** START OF CRITICAL REGION ****/
+	__TEST_PENDING
+	jnz  14f			# process more events if necessary...
+	RESTORE_REGS
+	addl $4, %esp
+1:	iret
+.section .fixup,"ax"
+2:	pushl $0
+	pushl $do_iret_error
+	jmp error_code
+.previous
+.section __ex_table,"a"
+	.align 4
+	.long 1b,2b
+.previous
+14:	__DISABLE_INTERRUPTS
+	jmp  11b
+ecrit:  /**** END OF CRITICAL REGION ****/
+# [How we do the fixup]. We want to merge the current stack frame with the
+# just-interrupted frame. How we do this depends on where in the critical
+# region the interrupted handler was executing, and so how many saved
+# registers are in each frame. We do this quickly using the lookup table
+# 'critical_fixup_table'. For each byte offset in the critical region, it
+# provides the number of bytes which have already been popped from the
+# interrupted stack frame.
+critical_region_fixup:
+	addl $critical_fixup_table-scrit,%eax
+	movzbl (%eax),%eax		# %eax contains num bytes popped
+	cmpb $0xff,%al                  # 0xff => vcpu_info critical region
+	jne  15f
+	GET_THREAD_INFO(%ebp)
+        xorl %eax,%eax
+15:	mov  %esp,%esi
+	add  %eax,%esi			# %esi points at end of src region
+	mov  %esp,%edi
+	add  $0x34,%edi			# %edi points at end of dst region
+	mov  %eax,%ecx
+	shr  $2,%ecx			# convert words to bytes
+	je   17f			# skip loop if nothing to copy
+16:	subl $4,%esi			# pre-decrementing copy loop
+	subl $4,%edi
+	movl (%esi),%eax
+	movl %eax,(%edi)
+	loop 16b
+17:	movl %edi,%esp			# final %edi is top of merged stack
+	jmp  11b
+
+critical_fixup_table:
+	.byte 0xff,0xff,0xff		# testb $0xff,(%esi) = __TEST_PENDING
+	.byte 0xff,0xff			# jnz  14f
+	.byte 0x00			# pop  %ebx
+	.byte 0x04			# pop  %ecx
+	.byte 0x08			# pop  %edx
+	.byte 0x0c			# pop  %esi
+	.byte 0x10			# pop  %edi
+	.byte 0x14			# pop  %ebp
+	.byte 0x18			# pop  %eax
+	.byte 0x1c			# pop  %ds
+	.byte 0x20			# pop  %es
+	.byte 0x24,0x24,0x24		# add  $4,%esp
+	.byte 0x28			# iret
+	.byte 0xff,0xff,0xff,0xff	# movb $1,1(%esi)
+	.byte 0x00,0x00			# jmp  11b
+
+# Hypervisor uses this for application faults while it executes.
+ENTRY(failsafe_callback)
+1:	popl %ds
+2:	popl %es
+3:	popl %fs
+4:	popl %gs
+	subl $4,%esp
+	SAVE_ALL
+	jmp  ret_from_exception
+.section .fixup,"ax";	\
+6:	movl $0,(%esp);	\
+	jmp 1b;		\
+7:	movl $0,(%esp);	\
+	jmp 2b;		\
+8:	movl $0,(%esp);	\
+	jmp 3b;		\
+9:	movl $0,(%esp);	\
+	jmp 4b;		\
+.previous;		\
+.section __ex_table,"a";\
+	.align 4;	\
+	.long 1b,6b;	\
+	.long 2b,7b;	\
+	.long 3b,8b;	\
+	.long 4b,9b;	\
+.previous
+#endif
+
 ENTRY(coprocessor_error)
 	pushl $0
 	pushl $do_coprocessor_error
@@ -475,17 +655,19 @@ ENTRY(simd_coprocessor_error)
 ENTRY(device_not_available)
 	pushl $-1			# mark this as an int
 	SAVE_ALL
+#ifndef CONFIG_XEN
 	movl %cr0, %eax
 	testl $0x4, %eax		# EM (math emulation bit)
-	jne device_not_available_emulate
-	preempt_stop
-	call math_state_restore
-	jmp ret_from_exception
-device_not_available_emulate:
+	je device_available_emulate
 	pushl $0			# temporary storage for ORIG_EIP
 	call math_emulate
 	addl $4, %esp
 	jmp ret_from_exception
+device_available_emulate:
+#endif
+	preempt_stop
+	call math_state_restore
+	jmp ret_from_exception
 
 /*
  * Debug traps and NMI can happen at the one SYSENTER instruction
@@ -521,6 +703,8 @@ debug_stack_correct:
 	call do_debug
 	jmp ret_from_exception
 	.previous .text
+
+#ifndef CONFIG_XEN
 /*
  * NMI is doubly nasty. It can happen _while_ we're handling
  * a debug fault, and the debug fault hasn't yet been able to
@@ -591,6 +775,16 @@ nmi_16bit_stack:
 	.align 4
 	.long 1b,iret_exc
 .previous
+#else
+ENTRY(nmi)
+	pushl %eax
+	SAVE_ALL
+	xorl %edx,%edx		# zero error code
+	movl %esp,%eax		# pt_regs pointer
+	call do_nmi
+	orl  $NMI_MASK, EFLAGS(%esp)
+	jmp restore_all
+#endif
 
 KPROBE_ENTRY(int3)
 	pushl $-1			# mark this as an int

--

^ permalink raw reply

* [RFC PATCH 26/35] Add Xen subarch reboot support
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 25-reboot --]
[-- Type: text/plain, Size: 7315 bytes --]

Add remote reboot capability, so that a virtual machine can be
rebooted, halted or 'powered off' by external management tools.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/kernel/Makefile   |    1 
 arch/i386/mach-xen/reboot.c |  265 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 266 insertions(+)

--- xen-subarch-2.6.orig/arch/i386/kernel/Makefile
+++ xen-subarch-2.6/arch/i386/kernel/Makefile
@@ -49,6 +49,7 @@ hw_irq-y			:= i8259.o
 
 hw_irq-$(CONFIG_XEN)		:= ../mach-xen/evtchn.o
 time-$(CONFIG_XEN)		:= ../mach-xen/time.o
+reboot-$(CONFIG_XEN)		:= ../mach-xen/reboot.o
 
 # vsyscall.o contains the vsyscall DSO images as __initdata.
 # We must build both images before we can assemble it.
--- /dev/null
+++ xen-subarch-2.6/arch/i386/mach-xen/reboot.c
@@ -0,0 +1,265 @@
+#define __KERNEL_SYSCALLS__
+#include <linux/version.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/unistd.h>
+#include <linux/module.h>
+#include <linux/reboot.h>
+#include <linux/sysrq.h>
+#include <linux/stringify.h>
+#include <asm/irq.h>
+#include <asm/mmu_context.h>
+#include <xen/evtchn.h>
+#include <asm/hypervisor.h>
+#ifdef CONFIG_XEN_XENBUS
+#include <xen/xenbus.h>
+#endif
+#include <linux/cpu.h>
+#include <linux/kthread.h>
+#include <xen/xencons.h>
+
+#if defined(__i386__) || defined(__x86_64__)
+/*
+ * Power off function, if any
+ */
+void (*pm_power_off)(void);
+EXPORT_SYMBOL(pm_power_off);
+#endif
+
+#define SHUTDOWN_INVALID  -1
+#define SHUTDOWN_POWEROFF  0
+#define SHUTDOWN_REBOOT    1
+#define SHUTDOWN_SUSPEND   2
+/* Code 3 is SHUTDOWN_CRASH, which we don't use because the domain can only
+ * report a crash, not be instructed to crash!
+ * HALT is the same as POWEROFF, as far as we're concerned.  The tools use
+ * the distinction when we return the reason code to them.
+ */
+#define SHUTDOWN_HALT      4
+
+void machine_emergency_restart(void)
+{
+	/* We really want to get pending console data out before we die. */
+	xencons_force_flush();
+	HYPERVISOR_sched_op(SCHEDOP_shutdown, SHUTDOWN_reboot);
+}
+
+void machine_restart(char * __unused)
+{
+	machine_emergency_restart();
+}
+
+void machine_halt(void)
+{
+	machine_power_off();
+}
+
+void machine_power_off(void)
+{
+	/* We really want to get pending console data out before we die. */
+	xencons_force_flush();
+	HYPERVISOR_sched_op(SCHEDOP_shutdown, SHUTDOWN_poweroff);
+}
+
+int reboot_thru_bios = 0;	/* for dmi_scan.c */
+EXPORT_SYMBOL(machine_restart);
+EXPORT_SYMBOL(machine_halt);
+EXPORT_SYMBOL(machine_power_off);
+
+
+/******************************************************************************
+ * Stop/pickle callback handling.
+ */
+
+#ifdef CONFIG_XEN_XENBUS
+/* Ignore multiple shutdown requests. */
+static int shutting_down = SHUTDOWN_INVALID;
+static void __shutdown_handler(void *unused);
+static DECLARE_WORK(shutdown_work, __shutdown_handler, NULL);
+#endif
+
+#ifdef CONFIG_XEN_XENBUS
+static int shutdown_process(void *__unused)
+{
+	static char *envp[] = { "HOME=/", "TERM=linux",
+				"PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
+	static char *restart_argv[]  = { "/sbin/reboot", NULL };
+	static char *poweroff_argv[] = { "/sbin/poweroff", NULL };
+
+	extern asmlinkage long sys_reboot(int magic1, int magic2,
+					  unsigned int cmd, void *arg);
+
+	daemonize("shutdown");
+
+	switch (shutting_down) {
+	case SHUTDOWN_POWEROFF:
+	case SHUTDOWN_HALT:
+		if (execve("/sbin/poweroff", poweroff_argv, envp) < 0) {
+			sys_reboot(LINUX_REBOOT_MAGIC1,
+				   LINUX_REBOOT_MAGIC2,
+				   LINUX_REBOOT_CMD_POWER_OFF,
+				   NULL);
+		}
+		break;
+
+	case SHUTDOWN_REBOOT:
+		if (execve("/sbin/reboot", restart_argv, envp) < 0) {
+			sys_reboot(LINUX_REBOOT_MAGIC1,
+				   LINUX_REBOOT_MAGIC2,
+				   LINUX_REBOOT_CMD_RESTART,
+				   NULL);
+		}
+		break;
+	}
+
+	shutting_down = SHUTDOWN_INVALID; /* could try again */
+
+	return 0;
+}
+
+static void __shutdown_handler(void *unused)
+{
+	int err = 0;
+
+	if (shutting_down != SHUTDOWN_SUSPEND)
+		err = kernel_thread(shutdown_process, NULL,
+				    CLONE_FS | CLONE_FILES);
+
+	if (err < 0) {
+		printk(KERN_WARNING "Error creating shutdown process (%d): "
+		       "retrying...\n", -err);
+		schedule_delayed_work(&shutdown_work, HZ/2);
+	}
+}
+
+static void shutdown_handler(struct xenbus_watch *watch,
+			     const char **vec, unsigned int len)
+{
+	char *str;
+	xenbus_transaction_t xbt;
+	int err;
+
+	if (shutting_down != SHUTDOWN_INVALID)
+		return;
+
+ again:
+	err = xenbus_transaction_start(&xbt);
+	if (err)
+		return;
+	str = (char *)xenbus_read(xbt, "control", "shutdown", NULL);
+	/* Ignore read errors and empty reads. */
+	if (XENBUS_IS_ERR_READ(str)) {
+		xenbus_transaction_end(xbt, 1);
+		return;
+	}
+
+	xenbus_write(xbt, "control", "shutdown", "");
+
+	err = xenbus_transaction_end(xbt, 0);
+	if (err == -EAGAIN) {
+		kfree(str);
+		goto again;
+	}
+
+	if (strcmp(str, "poweroff") == 0)
+		shutting_down = SHUTDOWN_POWEROFF;
+	else if (strcmp(str, "reboot") == 0)
+		shutting_down = SHUTDOWN_REBOOT;
+	else if (strcmp(str, "suspend") == 0)
+		shutting_down = SHUTDOWN_SUSPEND;
+	else if (strcmp(str, "halt") == 0)
+		shutting_down = SHUTDOWN_HALT;
+	else {
+		printk("Ignoring shutdown request: %s\n", str);
+		shutting_down = SHUTDOWN_INVALID;
+	}
+
+	if (shutting_down != SHUTDOWN_INVALID)
+		schedule_work(&shutdown_work);
+
+	kfree(str);
+}
+
+#ifdef CONFIG_MAGIC_SYSRQ
+static void sysrq_handler(struct xenbus_watch *watch, const char **vec,
+			  unsigned int len)
+{
+	char sysrq_key = '\0';
+	xenbus_transaction_t xbt;
+	int err;
+
+ again:
+	err = xenbus_transaction_start(&xbt);
+	if (err)
+		return;
+	if (!xenbus_scanf(xbt, "control", "sysrq", "%c", &sysrq_key)) {
+		printk(KERN_ERR "Unable to read sysrq code in "
+		       "control/sysrq\n");
+		xenbus_transaction_end(xbt, 1);
+		return;
+	}
+
+	if (sysrq_key != '\0')
+		xenbus_printf(xbt, "control", "sysrq", "%c", '\0');
+
+	err = xenbus_transaction_end(xbt, 0);
+	if (err == -EAGAIN)
+		goto again;
+
+	if (sysrq_key != '\0') {
+		handle_sysrq(sysrq_key, NULL, NULL);
+	}
+}
+#endif
+
+static struct xenbus_watch shutdown_watch = {
+	.node = "control/shutdown",
+	.callback = shutdown_handler
+};
+
+#ifdef CONFIG_MAGIC_SYSRQ
+static struct xenbus_watch sysrq_watch = {
+	.node ="control/sysrq",
+	.callback = sysrq_handler
+};
+#endif
+
+static struct notifier_block xenstore_notifier;
+
+static int setup_shutdown_watcher(struct notifier_block *notifier,
+                                  unsigned long event,
+                                  void *data)
+{
+	int err1 = 0;
+#ifdef CONFIG_MAGIC_SYSRQ
+	int err2 = 0;
+#endif
+
+	err1 = register_xenbus_watch(&shutdown_watch);
+#ifdef CONFIG_MAGIC_SYSRQ
+	err2 = register_xenbus_watch(&sysrq_watch);
+#endif
+
+	if (err1)
+		printk(KERN_ERR "Failed to set shutdown watcher\n");
+
+#ifdef CONFIG_MAGIC_SYSRQ
+	if (err2)
+		printk(KERN_ERR "Failed to set sysrq watcher\n");
+#endif
+
+	return NOTIFY_DONE;
+}
+
+static int __init setup_shutdown_event(void)
+{
+
+	xenstore_notifier.notifier_call = setup_shutdown_watcher;
+
+	register_xenstore_notifier(&xenstore_notifier);
+
+	return 0;
+}
+
+subsys_initcall(setup_shutdown_event);
+#endif

--

^ permalink raw reply

* [RFC PATCH 32/35] Add Xen driver utility functions.
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 31-driver-util --]
[-- Type: text/plain, Size: 3408 bytes --]

Allocate/destroy a 'vmalloc' VM area: alloc_vm_area and free_vm_area
The alloc function ensures that page tables are constructed for the
region of kernel virtual address space and mapped into init_mm.

Lock an area so that PTEs are accessible in the current address space:
lock_vm_area and unlock_vm_area
The lock function prevents context switches to a lazy mm that doesn't
have the area mapped into its page tables.  It also ensures that the
page tables are mapped into the current mm by causing the page fault
handler to copy the page directory pointers from init_mm into the
current mm.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 drivers/xen/Makefile      |    2 +
 drivers/xen/util.c        |   70 ++++++++++++++++++++++++++++++++++++++++++++++
 include/xen/driver_util.h |   16 ++++++++++
 3 files changed, 88 insertions(+)

--- xen-subarch-2.6.orig/drivers/xen/Makefile
+++ xen-subarch-2.6/drivers/xen/Makefile
@@ -1,2 +1,4 @@
 
+obj-y	+= util.o
+
 obj-y	+= console/
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/util.c
@@ -0,0 +1,70 @@
+#include <linux/config.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <asm/uaccess.h>
+#include <xen/driver_util.h>
+
+static int f(pte_t *pte, struct page *pte_page, unsigned long addr, void *data)
+{
+	/* generic_page_range() does all the hard work. */
+	return 0;
+}
+
+struct vm_struct *alloc_vm_area(unsigned long size)
+{
+	struct vm_struct *area;
+
+	area = get_vm_area(size, VM_IOREMAP);
+	if (area == NULL)
+		return NULL;
+
+	/*
+	 * This ensures that page tables are constructed for this region
+	 * of kernel virtual address space and mapped into init_mm.
+	 */
+	if (generic_page_range(&init_mm, (unsigned long)area->addr,
+			       area->size, f, NULL)) {
+		free_vm_area(area);
+		return NULL;
+	}
+
+	return area;
+}
+EXPORT_SYMBOL_GPL(alloc_vm_area);
+
+void free_vm_area(struct vm_struct *area)
+{
+	struct vm_struct *ret;
+	ret = remove_vm_area(area->addr);
+	BUG_ON(ret != area);
+	kfree(area);
+}
+EXPORT_SYMBOL_GPL(free_vm_area);
+
+void lock_vm_area(struct vm_struct *area)
+{
+	unsigned long i;
+	char c;
+
+	/*
+	 * Prevent context switch to a lazy mm that doesn't have this area
+	 * mapped into its page tables.
+	 */
+	preempt_disable();
+
+	/*
+	 * Ensure that the page tables are mapped into the current mm. The
+	 * page-fault path will copy the page directory pointers from init_mm.
+	 */
+	for (i = 0; i < area->size; i += PAGE_SIZE)
+		(void)__get_user(c, (char __user *)area->addr + i);
+}
+EXPORT_SYMBOL_GPL(lock_vm_area);
+
+void unlock_vm_area(struct vm_struct *area)
+{
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(unlock_vm_area);
--- /dev/null
+++ xen-subarch-2.6/include/xen/driver_util.h
@@ -0,0 +1,16 @@
+
+#ifndef __ASM_XEN_DRIVER_UTIL_H__
+#define __ASM_XEN_DRIVER_UTIL_H__
+
+#include <linux/config.h>
+#include <linux/vmalloc.h>
+
+/* Allocate/destroy a 'vmalloc' VM area. */
+extern struct vm_struct *alloc_vm_area(unsigned long size);
+extern void free_vm_area(struct vm_struct *area);
+
+/* Lock an area so that PTEs are accessible in the current address space. */
+extern void lock_vm_area(struct vm_struct *area);
+extern void unlock_vm_area(struct vm_struct *area);
+
+#endif /* __ASM_XEN_DRIVER_UTIL_H__ */

--

^ permalink raw reply

* [RFC PATCH 24/35] subarch support for mask value for irq nubmers
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 23-i386-irq-mask --]
[-- Type: text/plain, Size: 2601 bytes --]

Abstract the mask value for irq numbers into subarch-specific headers.
Xen extends the IRQ numbering space to include room for dynamically
allocated virtual interrupts (in the range 256-511), which requires a
more permissive mask value.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 arch/i386/kernel/irq.c                      |    2 +-
 include/asm-i386/mach-default/irq_vectors.h |    2 ++
 include/asm-i386/mach-visws/irq_vectors.h   |    2 ++
 include/asm-i386/mach-voyager/irq_vectors.h |    2 ++
 include/asm-i386/mach-xen/irq_vectors.h     |    8 ++++++--
 5 files changed, 13 insertions(+), 3 deletions(-)

--- xen-subarch-2.6.orig/arch/i386/kernel/irq.c
+++ xen-subarch-2.6/arch/i386/kernel/irq.c
@@ -54,7 +54,7 @@ static union irq_ctx *softirq_ctx[NR_CPU
 fastcall unsigned int do_IRQ(struct pt_regs *regs)
 {	
 	/* high bits used in ret_from_ code */
-	int irq = regs->orig_eax & 0xff;
+	int irq = regs->orig_eax & DO_IRQ_MASK;
 #ifdef CONFIG_4KSTACKS
 	union irq_ctx *curctx, *irqctx;
 	u32 *isp;
--- xen-subarch-2.6.orig/include/asm-i386/mach-default/irq_vectors.h
+++ xen-subarch-2.6/include/asm-i386/mach-default/irq_vectors.h
@@ -86,6 +86,8 @@
 
 #include "irq_vectors_limits.h"
 
+#define DO_IRQ_MASK		255
+
 #define FPU_IRQ			13
 
 #define	FIRST_VM86_IRQ		3
--- xen-subarch-2.6.orig/include/asm-i386/mach-visws/irq_vectors.h
+++ xen-subarch-2.6/include/asm-i386/mach-visws/irq_vectors.h
@@ -53,6 +53,8 @@
 #define NR_IRQS 224
 #define NR_IRQ_VECTORS NR_IRQS
 
+#define DO_IRQ_MASK		255
+
 #define FPU_IRQ			13
 
 #define	FIRST_VM86_IRQ		3
--- xen-subarch-2.6.orig/include/asm-i386/mach-voyager/irq_vectors.h
+++ xen-subarch-2.6/include/asm-i386/mach-voyager/irq_vectors.h
@@ -59,6 +59,8 @@
 #define NR_IRQS 224
 #define NR_IRQ_VECTORS NR_IRQS
 
+#define DO_IRQ_MASK 			255
+
 #define FPU_IRQ				13
 
 #define	FIRST_VM86_IRQ		3
--- xen-subarch-2.6.orig/include/asm-i386/mach-xen/irq_vectors.h
+++ xen-subarch-2.6/include/asm-i386/mach-xen/irq_vectors.h
@@ -109,14 +109,18 @@
  */
 
 #define PIRQ_BASE		0
-#define NR_PIRQS		256
+#define PIRQ_BITS		8
+#define NR_PIRQS		(1 << PIRQ_BITS)
 
 #define DYNIRQ_BASE		(PIRQ_BASE + NR_PIRQS)
-#define NR_DYNIRQS		256
+#define DYNIRQ_BITS		8
+#define NR_DYNIRQS		(1 << DYNIRQ_BITS)
 
 #define NR_IRQS			(NR_PIRQS + NR_DYNIRQS)
 #define NR_IRQ_VECTORS		NR_IRQS
 
+#define DO_IRQ_MASK		__IRQ_MASK(PIRQ_BITS + DYNIRQ_BITS)
+
 #define pirq_to_irq(_x)		((_x) + PIRQ_BASE)
 #define irq_to_pirq(_x)		((_x) - PIRQ_BASE)
 

--

^ permalink raw reply

* Re: [RFC, PATCH] avoid some atomics in open()/close() for monothreaded processes
From: Andrew Morton @ 2006-03-22  6:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel
In-Reply-To: <4420ED66.5060703@cosmosbay.com>

Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> Goal : Avoid some locking/unlocking 'struct files_struct'->file_lock for mono 
> threaded processes.
> 
> We define files_multithreaded() function .
> 
> static inline int files_multithreaded(const struct files_struct *files)
> {
>         return sizeof(files->file_lock) > 0 && atomic_read(&files->count) > 1;
> }

That's bascially sizeof(spinlock_t).  That's architecture dependent and
varies wildly according to the day of week.

It _might_ work in all situations - probably you checked that.  But I still
wouldn't do it because it might break in the future.  Let's be explicit and
stick the appropriate ifdefs in there.

I'd also consider renaming it to files_shared() - processes are
multithreaded, not data structures.

Once you're done with that we should change fget_light() and fput_light() to
use this helper.  Separate patch.


^ permalink raw reply

* [RFC PATCH 35/35] Add Xen virtual block device driver.
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 34-blkfront --]
[-- Type: text/plain, Size: 36520 bytes --]

The block device frontend driver allows the kernel to access block
devices exported exported by a virtual machine containing a physical
block device driver.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 drivers/block/Kconfig           |    2 
 drivers/xen/Kconfig.blk         |   14 
 drivers/xen/Makefile            |    1 
 drivers/xen/blkfront/Makefile   |    5 
 drivers/xen/blkfront/blkfront.c |  812 ++++++++++++++++++++++++++++++++++++++++
 drivers/xen/blkfront/block.h    |  152 +++++++
 drivers/xen/blkfront/vbd.c      |  316 +++++++++++++++
 7 files changed, 1302 insertions(+)

--- xen-subarch-2.6.orig/drivers/block/Kconfig
+++ xen-subarch-2.6/drivers/block/Kconfig
@@ -450,6 +450,8 @@ config CDROM_PKTCDVD_WCACHE
 
 source "drivers/s390/block/Kconfig"
 
+source "drivers/xen/Kconfig.blk"
+
 config ATA_OVER_ETH
 	tristate "ATA over Ethernet support"
 	depends on NET
--- xen-subarch-2.6.orig/drivers/xen/Makefile
+++ xen-subarch-2.6/drivers/xen/Makefile
@@ -5,4 +5,5 @@ obj-y	+= util.o
 obj-y	+= console/
 obj-y	+= xenbus/
 
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	+= blkfront/
 obj-$(CONFIG_XEN_NETDEV_FRONTEND)	+= netfront/
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/Kconfig.blk
@@ -0,0 +1,14 @@
+menu "Xen block device drivers"
+        depends on XEN
+
+config XEN_BLKDEV_FRONTEND
+	tristate "Block device frontend driver"
+	depends on XEN
+	default y
+	help
+	  The block device frontend driver allows the kernel to access block
+	  devices exported from a device driver virtual machine. Unless you
+	  are building a dedicated device driver virtual machine, then you
+	  almost certainly want to say Y here.
+
+endmenu
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/Makefile
@@ -0,0 +1,5 @@
+
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND)	:= xenblk.o
+
+xenblk-objs := blkfront.o vbd.o
+
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/blkfront.c
@@ -0,0 +1,812 @@
+/******************************************************************************
+ * blkfront.c
+ * 
+ * XenLinux virtual block device driver.
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004, Christian Limpach
+ * Copyright (c) 2004, Andrew Warfield
+ * Copyright (c) 2005, Christopher Clark
+ * Copyright (c) 2005, XenSource Ltd
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/version.h>
+#include "block.h"
+#include <linux/cdrom.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <scsi/scsi.h>
+#include <xen/evtchn.h>
+#include <xen/xenbus.h>
+#include <xen/interface/grant_table.h>
+#include <xen/gnttab.h>
+#include <asm/hypervisor.h>
+
+#define BLKIF_STATE_DISCONNECTED 0
+#define BLKIF_STATE_CONNECTED    1
+#define BLKIF_STATE_SUSPENDED    2
+
+#define MAXIMUM_OUTSTANDING_BLOCK_REQS \
+    (BLKIF_MAX_SEGMENTS_PER_REQUEST * BLK_RING_SIZE)
+#define GRANT_INVALID_REF	0
+
+static void connect(struct blkfront_info *);
+static void blkfront_closing(struct xenbus_device *);
+static int blkfront_remove(struct xenbus_device *);
+static int talk_to_backend(struct xenbus_device *, struct blkfront_info *);
+static int setup_blkring(struct xenbus_device *, struct blkfront_info *);
+
+static void kick_pending_request_queues(struct blkfront_info *);
+
+static irqreturn_t blkif_int(int irq, void *dev_id, struct pt_regs *ptregs);
+static void blkif_restart_queue(void *arg);
+static void blkif_recover(struct blkfront_info *);
+static void blkif_completion(struct blk_shadow *);
+static void blkif_free(struct blkfront_info *, int);
+
+
+/**
+ * Entry point to this code when a new device is created.  Allocate the basic
+ * structures and the ring buffer for communication with the backend, and
+ * inform the backend of the appropriate details for those.  Switch to
+ * Initialised state.
+ */
+static int blkfront_probe(struct xenbus_device *dev,
+			  const struct xenbus_device_id *id)
+{
+	int err, vdevice, i;
+	struct blkfront_info *info;
+
+	/* FIXME: Use dynamic device id if this is not set. */
+	err = xenbus_scanf(XBT_NULL, dev->nodename,
+			   "virtual-device", "%i", &vdevice);
+	if (err != 1) {
+		xenbus_dev_fatal(dev, err, "reading virtual-device");
+		return err;
+	}
+
+	info = kmalloc(sizeof(*info), GFP_KERNEL);
+	if (!info) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating info structure");
+		return -ENOMEM;
+	}
+
+	memset(info, 0, sizeof(*info));
+	info->xbdev = dev;
+	info->vdevice = vdevice;
+	info->connected = BLKIF_STATE_DISCONNECTED;
+	INIT_WORK(&info->work, blkif_restart_queue, (void *)info);
+
+	for (i = 0; i < BLK_RING_SIZE; i++)
+		info->shadow[i].req.id = i+1;
+	info->shadow[BLK_RING_SIZE-1].req.id = 0x0fffffff;
+
+	/* Front end dir is a number, which is used as the id. */
+	info->handle = simple_strtoul(strrchr(dev->nodename,'/')+1, NULL, 0);
+	dev->data = info;
+
+	err = talk_to_backend(dev, info);
+	if (err) {
+		kfree(info);
+		dev->data = NULL;
+		return err;
+	}
+
+	return 0;
+}
+
+
+/**
+ * We are reconnecting to the backend, due to a suspend/resume, or a backend
+ * driver restart.  We tear down our blkif structure and recreate it, but
+ * leave the device-layer structures intact so that this is transparent to the
+ * rest of the kernel.
+ */
+static int blkfront_resume(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+	int err;
+
+	DPRINTK("blkfront_resume: %s\n", dev->nodename);
+
+	blkif_free(info, 1);
+
+	err = talk_to_backend(dev, info);
+	if (!err)
+		blkif_recover(info);
+
+	return err;
+}
+
+
+/* Common code used when first setting up, and when resuming. */
+static int talk_to_backend(struct xenbus_device *dev,
+			   struct blkfront_info *info)
+{
+	const char *message = NULL;
+	xenbus_transaction_t xbt;
+	int err;
+
+	/* Create shared ring, alloc event channel. */
+	err = setup_blkring(dev, info);
+	if (err)
+		goto out;
+
+again:
+	err = xenbus_transaction_start(&xbt);
+	if (err) {
+		xenbus_dev_fatal(dev, err, "starting transaction");
+		goto destroy_blkring;
+	}
+
+	err = xenbus_printf(xbt, dev->nodename,
+			    "ring-ref","%u", info->ring_ref);
+	if (err) {
+		message = "writing ring-ref";
+		goto abort_transaction;
+	}
+	err = xenbus_printf(xbt, dev->nodename,
+			    "event-channel", "%u", info->evtchn);
+	if (err) {
+		message = "writing event-channel";
+		goto abort_transaction;
+	}
+
+	err = xenbus_switch_state(dev, xbt, XenbusStateInitialised);
+	if (err)
+		goto abort_transaction;
+
+	err = xenbus_transaction_end(xbt, 0);
+	if (err) {
+		if (err == -EAGAIN)
+			goto again;
+		xenbus_dev_fatal(dev, err, "completing transaction");
+		goto destroy_blkring;
+	}
+
+	return 0;
+
+ abort_transaction:
+	xenbus_transaction_end(xbt, 1);
+	if (message)
+		xenbus_dev_fatal(dev, err, "%s", message);
+ destroy_blkring:
+	blkif_free(info, 0);
+ out:
+	return err;
+}
+
+
+static int setup_blkring(struct xenbus_device *dev,
+			 struct blkfront_info *info)
+{
+	struct blkif_sring *sring;
+	int err;
+
+	info->ring_ref = GRANT_INVALID_REF;
+
+	sring = (struct blkif_sring *)__get_free_page(GFP_KERNEL);
+	if (!sring) {
+		xenbus_dev_fatal(dev, -ENOMEM, "allocating shared ring");
+		return -ENOMEM;
+	}
+	SHARED_RING_INIT(sring);
+	FRONT_RING_INIT(&info->ring, sring, PAGE_SIZE);
+
+	err = xenbus_grant_ring(dev, virt_to_mfn(info->ring.sring));
+	if (err < 0) {
+		free_page((unsigned long)sring);
+		info->ring.sring = NULL;
+		goto fail;
+	}
+	info->ring_ref = err;
+
+	err = xenbus_alloc_evtchn(dev, &info->evtchn);
+	if (err)
+		goto fail;
+
+	err = bind_evtchn_to_irqhandler(
+		info->evtchn, blkif_int, SA_SAMPLE_RANDOM, "blkif", info);
+	if (err <= 0) {
+		xenbus_dev_fatal(dev, err,
+				 "bind_evtchn_to_irqhandler failed");
+		goto fail;
+	}
+	info->irq = err;
+
+	return 0;
+fail:
+	blkif_free(info, 0);
+	return err;
+}
+
+
+/**
+ * Callback received when the backend's state changes.
+ */
+static void backend_changed(struct xenbus_device *dev,
+			    XenbusState backend_state)
+{
+	struct blkfront_info *info = dev->data;
+	struct block_device *bd;
+
+	DPRINTK("blkfront:backend_changed.\n");
+
+	switch (backend_state) {
+	case XenbusStateUnknown:
+	case XenbusStateInitialising:
+	case XenbusStateInitWait:
+	case XenbusStateInitialised:
+	case XenbusStateClosed:
+		break;
+
+	case XenbusStateConnected:
+		connect(info);
+		break;
+
+	case XenbusStateClosing:
+		bd = bdget(info->dev);
+		if (bd == NULL)
+			xenbus_dev_fatal(dev, -ENODEV, "bdget failed");
+
+		down(&bd->bd_sem);
+		if (info->users > 0)
+			xenbus_dev_error(dev, -EBUSY,
+					 "Device in use; refusing to close");
+		else
+			blkfront_closing(dev);
+		up(&bd->bd_sem);
+		bdput(bd);
+		break;
+	}
+}
+
+
+/* ** Connection ** */
+
+
+/*
+ * Invoked when the backend is finally 'ready' (and has told produced
+ * the details about the physical device - #sectors, size, etc).
+ */
+static void connect(struct blkfront_info *info)
+{
+	unsigned long sectors, sector_size;
+	unsigned int binfo;
+	int err;
+
+	if ((info->connected == BLKIF_STATE_CONNECTED) ||
+	    (info->connected == BLKIF_STATE_SUSPENDED) )
+		return;
+
+	DPRINTK("blkfront.c:connect:%s.\n", info->xbdev->otherend);
+
+	err = xenbus_gather(XBT_NULL, info->xbdev->otherend,
+			    "sectors", "%lu", &sectors,
+			    "info", "%u", &binfo,
+			    "sector-size", "%lu", &sector_size,
+			    NULL);
+	if (err) {
+		xenbus_dev_fatal(info->xbdev, err,
+				 "reading backend fields at %s",
+				 info->xbdev->otherend);
+		return;
+	}
+
+	err = xlvbd_add(sectors, info->vdevice, binfo, sector_size, info);
+	if (err) {
+		xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s",
+		                 info->xbdev->otherend);
+		return;
+	}
+
+	(void)xenbus_switch_state(info->xbdev, XBT_NULL, XenbusStateConnected);
+
+	/* Kick pending requests. */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = BLKIF_STATE_CONNECTED;
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+
+	add_disk(info->gd);
+}
+
+/**
+ * Handle the change of state of the backend to Closing.  We must delete our
+ * device-layer structures now, to ensure that writes are flushed through to
+ * the backend.  Once is this done, we can switch to Closed in
+ * acknowledgement.
+ */
+static void blkfront_closing(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+
+	DPRINTK("blkfront_closing: %s removed\n", dev->nodename);
+
+	xlvbd_del(info);
+
+	xenbus_switch_state(dev, XBT_NULL, XenbusStateClosed);
+}
+
+
+static int blkfront_remove(struct xenbus_device *dev)
+{
+	struct blkfront_info *info = dev->data;
+
+	DPRINTK("blkfront_remove: %s removed\n", dev->nodename);
+
+	blkif_free(info, 0);
+
+	kfree(info);
+
+	return 0;
+}
+
+
+static inline int GET_ID_FROM_FREELIST(
+	struct blkfront_info *info)
+{
+	unsigned long free = info->shadow_free;
+	BUG_ON(free > BLK_RING_SIZE);
+	info->shadow_free = info->shadow[free].req.id;
+	info->shadow[free].req.id = 0x0fffffee; /* debug */
+	return free;
+}
+
+static inline void ADD_ID_TO_FREELIST(
+	struct blkfront_info *info, unsigned long id)
+{
+	info->shadow[id].req.id  = info->shadow_free;
+	info->shadow[id].request = 0;
+	info->shadow_free = id;
+}
+
+static inline void flush_requests(struct blkfront_info *info)
+{
+	int notify;
+
+	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&info->ring, notify);
+
+	if (notify)
+		notify_remote_via_irq(info->irq);
+}
+
+static void kick_pending_request_queues(struct blkfront_info *info)
+{
+	if (!RING_FULL(&info->ring)) {
+		/* Re-enable calldowns. */
+		blk_start_queue(info->rq);
+		/* Kick things off immediately. */
+		do_blkif_request(info->rq);
+	}
+}
+
+static void blkif_restart_queue(void *arg)
+{
+	struct blkfront_info *info = (struct blkfront_info *)arg;
+	spin_lock_irq(&blkif_io_lock);
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+}
+
+static void blkif_restart_queue_callback(void *arg)
+{
+	struct blkfront_info *info = (struct blkfront_info *)arg;
+	schedule_work(&info->work);
+}
+
+int blkif_open(struct inode *inode, struct file *filep)
+{
+	struct blkfront_info *info = inode->i_bdev->bd_disk->private_data;
+	info->users++;
+	return 0;
+}
+
+
+int blkif_release(struct inode *inode, struct file *filep)
+{
+	struct blkfront_info *info = inode->i_bdev->bd_disk->private_data;
+	info->users--;
+	if (info->users == 0) {
+		/* Check whether we have been instructed to close.  We will
+		   have ignored this request initially, as the device was
+		   still mounted. */
+		struct xenbus_device * dev = info->xbdev;
+		XenbusState state = xenbus_read_driver_state(dev->otherend);
+
+		if (state == XenbusStateClosing)
+			blkfront_closing(dev);
+	}
+	return 0;
+}
+
+
+int blkif_ioctl(struct inode *inode, struct file *filep,
+                unsigned command, unsigned long argument)
+{
+	int i;
+
+	DPRINTK_IOCTL("command: 0x%x, argument: 0x%lx, dev: 0x%04x\n",
+		      command, (long)argument, inode->i_rdev);
+
+	switch (command) {
+	case HDIO_GETGEO:
+		/* return ENOSYS to use defaults */
+		return -ENOSYS;
+
+	case CDROMMULTISESSION:
+		DPRINTK("FIXME: support multisession CDs later\n");
+		for (i = 0; i < sizeof(struct cdrom_multisession); i++)
+			if (put_user(0, (char __user *)(argument + i)))
+				return -EFAULT;
+		return 0;
+
+	default:
+		/*printk(KERN_ALERT "ioctl %08x not supported by Xen blkdev\n",
+		  command);*/
+		return -EINVAL; /* same return as native Linux */
+	}
+
+	return 0;
+}
+
+
+/*
+ * blkif_queue_request
+ *
+ * request block io
+ *
+ * id: for guest use only.
+ * operation: BLKIF_OP_{READ,WRITE,PROBE}
+ * buffer: buffer to read/write into. this should be a
+ *   virtual address in the guest os.
+ */
+static int blkif_queue_request(struct request *req)
+{
+	struct blkfront_info *info = req->rq_disk->private_data;
+	unsigned long buffer_mfn;
+	struct blkif_request *ring_req;
+	struct bio *bio;
+	struct bio_vec *bvec;
+	int idx;
+	unsigned long id;
+	unsigned int fsect, lsect;
+	int ref;
+	grant_ref_t gref_head;
+
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
+		return 1;
+
+	if (gnttab_alloc_grant_references(
+		BLKIF_MAX_SEGMENTS_PER_REQUEST, &gref_head) < 0) {
+		gnttab_request_free_callback(
+			&info->callback,
+			blkif_restart_queue_callback,
+			info,
+			BLKIF_MAX_SEGMENTS_PER_REQUEST);
+		return 1;
+	}
+
+	/* Fill out a communications ring structure. */
+	ring_req = RING_GET_REQUEST(&info->ring, info->ring.req_prod_pvt);
+	id = GET_ID_FROM_FREELIST(info);
+	info->shadow[id].request = (unsigned long)req;
+
+	ring_req->id = id;
+	ring_req->operation = rq_data_dir(req) ?
+		BLKIF_OP_WRITE : BLKIF_OP_READ;
+	ring_req->sector_number = (blkif_sector_t)req->sector;
+	ring_req->handle = info->handle;
+
+	ring_req->nr_segments = 0;
+	rq_for_each_bio (bio, req) {
+		bio_for_each_segment (bvec, bio, idx) {
+			BUG_ON(ring_req->nr_segments
+			       == BLKIF_MAX_SEGMENTS_PER_REQUEST);
+			buffer_mfn = page_to_phys(bvec->bv_page) >> PAGE_SHIFT;
+			fsect = bvec->bv_offset >> 9;
+			lsect = fsect + (bvec->bv_len >> 9) - 1;
+			/* install a grant reference. */
+			ref = gnttab_claim_grant_reference(&gref_head);
+			BUG_ON(ref == -ENOSPC);
+
+			gnttab_grant_foreign_access_ref(
+				ref,
+				info->xbdev->otherend_id,
+				buffer_mfn,
+				rq_data_dir(req) );
+
+			info->shadow[id].frame[ring_req->nr_segments] =
+				mfn_to_pfn(buffer_mfn);
+
+			ring_req->seg[ring_req->nr_segments] =
+				(struct blkif_request_segment) {
+					.gref       = ref,
+					.first_sect = fsect,
+					.last_sect  = lsect };
+
+			ring_req->nr_segments++;
+		}
+	}
+
+	info->ring.req_prod_pvt++;
+
+	/* Keep a private copy so we can reissue requests when recovering. */
+	info->shadow[id].req = *ring_req;
+
+	gnttab_free_grant_references(gref_head);
+
+	return 0;
+}
+
+/*
+ * do_blkif_request
+ *  read a block; request is in a request queue
+ */
+void do_blkif_request(request_queue_t *rq)
+{
+	struct blkfront_info *info = NULL;
+	struct request *req;
+	int queued;
+
+	DPRINTK("Entered do_blkif_request\n");
+
+	queued = 0;
+
+	while ((req = elv_next_request(rq)) != NULL) {
+		info = req->rq_disk->private_data;
+		if (!blk_fs_request(req)) {
+			end_request(req, 0);
+			continue;
+		}
+
+		if (RING_FULL(&info->ring))
+			goto wait;
+
+		DPRINTK("do_blk_req %p: cmd %p, sec %lx, "
+			"(%u/%li) buffer:%p [%s]\n",
+			req, req->cmd, req->sector, req->current_nr_sectors,
+			req->nr_sectors, req->buffer,
+			rq_data_dir(req) ? "write" : "read");
+
+
+		blkdev_dequeue_request(req);
+		if (blkif_queue_request(req)) {
+			blk_requeue_request(rq, req);
+		wait:
+			/* Avoid pointless unplugs. */
+			blk_stop_queue(rq);
+			break;
+		}
+
+		queued++;
+	}
+
+	if (queued != 0)
+		flush_requests(info);
+}
+
+
+static irqreturn_t blkif_int(int irq, void *dev_id, struct pt_regs *ptregs)
+{
+	struct request *req;
+	struct blkif_response *bret;
+	RING_IDX i, rp;
+	unsigned long flags;
+	struct blkfront_info *info = (struct blkfront_info *)dev_id;
+
+	spin_lock_irqsave(&blkif_io_lock, flags);
+
+	if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) {
+		spin_unlock_irqrestore(&blkif_io_lock, flags);
+		return IRQ_HANDLED;
+	}
+
+ again:
+	rp = info->ring.sring->rsp_prod;
+	rmb(); /* Ensure we see queued responses up to 'rp'. */
+
+	for (i = info->ring.rsp_cons; i != rp; i++) {
+		unsigned long id;
+		int ret;
+
+		bret = RING_GET_RESPONSE(&info->ring, i);
+		id   = bret->id;
+		req  = (struct request *)info->shadow[id].request;
+
+		blkif_completion(&info->shadow[id]);
+
+		ADD_ID_TO_FREELIST(info, id);
+
+		switch (bret->operation) {
+		case BLKIF_OP_READ:
+		case BLKIF_OP_WRITE:
+			if (unlikely(bret->status != BLKIF_RSP_OKAY))
+				DPRINTK("Bad return from blkdev data "
+					"request: %x\n", bret->status);
+
+			ret = end_that_request_first(
+				req, (bret->status == BLKIF_RSP_OKAY),
+				req->hard_nr_sectors);
+			BUG_ON(ret);
+			end_that_request_last(
+				req, (bret->status == BLKIF_RSP_OKAY));
+			break;
+		default:
+			BUG();
+		}
+	}
+
+	info->ring.rsp_cons = i;
+
+	if (i != info->ring.req_prod_pvt) {
+		int more_to_do;
+		RING_FINAL_CHECK_FOR_RESPONSES(&info->ring, more_to_do);
+		if (more_to_do)
+			goto again;
+	} else
+		info->ring.sring->rsp_event = i + 1;
+
+	kick_pending_request_queues(info);
+
+	spin_unlock_irqrestore(&blkif_io_lock, flags);
+
+	return IRQ_HANDLED;
+}
+
+static void blkif_free(struct blkfront_info *info, int suspend)
+{
+	/* Prevent new requests being issued until we fix things up. */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = suspend ?
+		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
+	spin_unlock_irq(&blkif_io_lock);
+
+	/* Free resources associated with old device channel. */
+	if (info->ring_ref != GRANT_INVALID_REF) {
+		gnttab_end_foreign_access(info->ring_ref, 0,
+					  (unsigned long)info->ring.sring);
+		info->ring_ref = GRANT_INVALID_REF;
+		info->ring.sring = NULL;
+	}
+	if (info->irq)
+		unbind_from_irqhandler(info->irq, info);
+	info->evtchn = info->irq = 0;
+
+}
+
+static void blkif_completion(struct blk_shadow *s)
+{
+	int i;
+	for (i = 0; i < s->req.nr_segments; i++)
+		gnttab_end_foreign_access(s->req.seg[i].gref, 0, 0UL);
+}
+
+static void blkif_recover(struct blkfront_info *info)
+{
+	int i;
+	struct blkif_request *req;
+	struct blk_shadow *copy;
+	int j;
+
+	/* Stage 1: Make a safe copy of the shadow state. */
+	copy = kmalloc(sizeof(info->shadow), GFP_KERNEL | __GFP_NOFAIL);
+	memcpy(copy, info->shadow, sizeof(info->shadow));
+
+	/* Stage 2: Set up free list. */
+	memset(&info->shadow, 0, sizeof(info->shadow));
+	for (i = 0; i < BLK_RING_SIZE; i++)
+		info->shadow[i].req.id = i+1;
+	info->shadow_free = info->ring.req_prod_pvt;
+	info->shadow[BLK_RING_SIZE-1].req.id = 0x0fffffff;
+
+	/* Stage 3: Find pending requests and requeue them. */
+	for (i = 0; i < BLK_RING_SIZE; i++) {
+		/* Not in use? */
+		if (copy[i].request == 0)
+			continue;
+
+		/* Grab a request slot and copy shadow state into it. */
+		req = RING_GET_REQUEST(
+			&info->ring, info->ring.req_prod_pvt);
+		*req = copy[i].req;
+
+		/* We get a new request id, and must reset the shadow state. */
+		req->id = GET_ID_FROM_FREELIST(info);
+		memcpy(&info->shadow[req->id], &copy[i], sizeof(copy[i]));
+
+		/* Rewrite any grant references invalidated by susp/resume. */
+		for (j = 0; j < req->nr_segments; j++)
+			gnttab_grant_foreign_access_ref(
+				req->seg[j].gref,
+				info->xbdev->otherend_id,
+				pfn_to_mfn(info->shadow[req->id].frame[j]),
+				rq_data_dir(
+					(struct request *)
+					info->shadow[req->id].request));
+		info->shadow[req->id].req = *req;
+
+		info->ring.req_prod_pvt++;
+	}
+
+	kfree(copy);
+
+	(void)xenbus_switch_state(info->xbdev, XBT_NULL, XenbusStateConnected);
+
+	/* Now safe for us to use the shared ring */
+	spin_lock_irq(&blkif_io_lock);
+	info->connected = BLKIF_STATE_CONNECTED;
+	spin_unlock_irq(&blkif_io_lock);
+
+	/* Send off requeued requests */
+	flush_requests(info);
+
+	/* Kick any other new requests queued since we resumed */
+	spin_lock_irq(&blkif_io_lock);
+	kick_pending_request_queues(info);
+	spin_unlock_irq(&blkif_io_lock);
+}
+
+
+/* ** Driver Registration ** */
+
+
+static struct xenbus_device_id blkfront_ids[] = {
+	{ "vbd" },
+	{ "" }
+};
+
+
+static struct xenbus_driver blkfront = {
+	.name = "vbd",
+	.owner = THIS_MODULE,
+	.ids = blkfront_ids,
+	.probe = blkfront_probe,
+	.remove = blkfront_remove,
+	.resume = blkfront_resume,
+	.otherend_changed = backend_changed,
+};
+
+
+static int __init xlblk_init(void)
+{
+	if (xen_init() < 0)
+		return -ENODEV;
+
+	return xenbus_register_frontend(&blkfront);
+}
+module_init(xlblk_init);
+
+
+static void xlblk_exit(void)
+{
+	return xenbus_unregister_driver(&blkfront);
+}
+module_exit(xlblk_exit);
+
+MODULE_LICENSE("Dual BSD/GPL");
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/block.h
@@ -0,0 +1,152 @@
+/******************************************************************************
+ * block.h
+ * 
+ * Shared definitions between all levels of XenLinux Virtual block devices.
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004-2005, Christian Limpach
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_DRIVERS_BLOCK_H__
+#define __XEN_DRIVERS_BLOCK_H__
+
+#include <linux/config.h>
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/hdreg.h>
+#include <linux/blkdev.h>
+#include <linux/major.h>
+#include <linux/devfs_fs_kernel.h>
+#include <asm/hypervisor.h>
+#include <xen/xenbus.h>
+#include <xen/gnttab.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/io/blkif.h>
+#include <xen/interface/io/ring.h>
+#include <asm/io.h>
+#include <asm/atomic.h>
+#include <asm/uaccess.h>
+
+#if 1
+#define IPRINTK(fmt, args...) \
+    printk(KERN_INFO "xen_blk: " fmt, ##args)
+#else
+#define IPRINTK(fmt, args...) ((void)0)
+#endif
+
+#if 1
+#define WPRINTK(fmt, args...) \
+    printk(KERN_WARNING "xen_blk: " fmt, ##args)
+#else
+#define WPRINTK(fmt, args...) ((void)0)
+#endif
+
+#define DPRINTK(_f, _a...) pr_debug(_f, ## _a)
+
+#if 0
+#define DPRINTK_IOCTL(_f, _a...) printk(KERN_ALERT _f, ## _a)
+#else
+#define DPRINTK_IOCTL(_f, _a...) ((void)0)
+#endif
+
+struct xlbd_type_info
+{
+	int partn_shift;
+	int disks_per_major;
+	char *devname;
+	char *diskname;
+};
+
+struct xlbd_major_info
+{
+	int major;
+	int index;
+	int usage;
+	struct xlbd_type_info *type;
+};
+
+struct blk_shadow {
+	struct blkif_request req;
+	unsigned long request;
+	unsigned long frame[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+};
+
+#define BLK_RING_SIZE __RING_SIZE((struct blkif_sring *)0, PAGE_SIZE)
+
+/*
+ * We have one of these per vbd, whether ide, scsi or 'other'.  They
+ * hang in private_data off the gendisk structure. We may end up
+ * putting all kinds of interesting stuff here :-)
+ */
+struct blkfront_info
+{
+	struct xenbus_device *xbdev;
+	dev_t dev;
+ 	struct gendisk *gd;
+	int vdevice;
+	blkif_vdev_t handle;
+	int connected;
+	int ring_ref;
+	struct blkif_front_ring ring;
+	unsigned int evtchn, irq;
+	struct xlbd_major_info *mi;
+	request_queue_t *rq;
+	struct work_struct work;
+	struct gnttab_free_callback callback;
+	struct blk_shadow shadow[BLK_RING_SIZE];
+	unsigned long shadow_free;
+
+	/**
+	 * The number of people holding this device open.  We won't allow a
+	 * hot-unplug unless this is 0.
+	 */
+	int users;
+};
+
+extern spinlock_t blkif_io_lock;
+
+extern int blkif_open(struct inode *inode, struct file *filep);
+extern int blkif_release(struct inode *inode, struct file *filep);
+extern int blkif_ioctl(struct inode *inode, struct file *filep,
+                       unsigned command, unsigned long argument);
+extern int blkif_check(dev_t dev);
+extern int blkif_revalidate(dev_t dev);
+extern void do_blkif_request (request_queue_t *rq);
+
+/* Virtual block device subsystem. */
+/* Note that xlvbd_add doesn't call add_disk for you: you're expected
+   to call add_disk on info->gd once the disk is properly connected
+   up. */
+int xlvbd_add(blkif_sector_t capacity, int device,
+	      u16 vdisk_info, u16 sector_size, struct blkfront_info *info);
+void xlvbd_del(struct blkfront_info *info);
+
+#endif /* __XEN_DRIVERS_BLOCK_H__ */
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/blkfront/vbd.c
@@ -0,0 +1,316 @@
+/******************************************************************************
+ * vbd.c
+ * 
+ * XenLinux virtual block device driver (xvd).
+ * 
+ * Copyright (c) 2003-2004, Keir Fraser & Steve Hand
+ * Modifications by Mark A. Williamson are (c) Intel Research Cambridge
+ * Copyright (c) 2004-2005, Christian Limpach
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "block.h"
+#include <linux/blkdev.h>
+#include <linux/list.h>
+
+#define BLKIF_MAJOR(dev) ((dev)>>8)
+#define BLKIF_MINOR(dev) ((dev) & 0xff)
+
+/*
+ * For convenience we distinguish between ide, scsi and 'other' (i.e.,
+ * potentially combinations of the two) in the naming scheme and in a few other
+ * places.
+ */
+
+#define NUM_IDE_MAJORS 10
+#define NUM_SCSI_MAJORS 9
+#define NUM_VBD_MAJORS 1
+
+static struct xlbd_type_info xlbd_ide_type = {
+	.partn_shift = 6,
+	.disks_per_major = 2,
+	.devname = "ide",
+	.diskname = "hd",
+};
+
+static struct xlbd_type_info xlbd_scsi_type = {
+	.partn_shift = 4,
+	.disks_per_major = 16,
+	.devname = "sd",
+	.diskname = "sd",
+};
+
+static struct xlbd_type_info xlbd_vbd_type = {
+	.partn_shift = 4,
+	.disks_per_major = 16,
+	.devname = "xvd",
+	.diskname = "xvd",
+};
+
+static struct xlbd_major_info *major_info[NUM_IDE_MAJORS + NUM_SCSI_MAJORS +
+					 NUM_VBD_MAJORS];
+
+#define XLBD_MAJOR_IDE_START	0
+#define XLBD_MAJOR_SCSI_START	(NUM_IDE_MAJORS)
+#define XLBD_MAJOR_VBD_START	(NUM_IDE_MAJORS + NUM_SCSI_MAJORS)
+
+#define XLBD_MAJOR_IDE_RANGE	XLBD_MAJOR_IDE_START ... XLBD_MAJOR_SCSI_START - 1
+#define XLBD_MAJOR_SCSI_RANGE	XLBD_MAJOR_SCSI_START ... XLBD_MAJOR_VBD_START - 1
+#define XLBD_MAJOR_VBD_RANGE	XLBD_MAJOR_VBD_START ... XLBD_MAJOR_VBD_START + NUM_VBD_MAJORS - 1
+
+/* Information about our VBDs. */
+#define MAX_VBDS 64
+static LIST_HEAD(vbds_list);
+
+static struct block_device_operations xlvbd_block_fops =
+{
+	.owner = THIS_MODULE,
+	.open = blkif_open,
+	.release = blkif_release,
+	.ioctl  = blkif_ioctl,
+};
+
+spinlock_t blkif_io_lock = SPIN_LOCK_UNLOCKED;
+
+static struct xlbd_major_info *
+xlbd_alloc_major_info(int major, int minor, int index)
+{
+	struct xlbd_major_info *ptr;
+
+	ptr = kmalloc(sizeof(struct xlbd_major_info), GFP_KERNEL);
+	if (ptr == NULL)
+		return NULL;
+
+	memset(ptr, 0, sizeof(struct xlbd_major_info));
+
+	ptr->major = major;
+
+	switch (index) {
+	case XLBD_MAJOR_IDE_RANGE:
+		ptr->type = &xlbd_ide_type;
+		ptr->index = index - XLBD_MAJOR_IDE_START;
+		break;
+	case XLBD_MAJOR_SCSI_RANGE:
+		ptr->type = &xlbd_scsi_type;
+		ptr->index = index - XLBD_MAJOR_SCSI_START;
+		break;
+	case XLBD_MAJOR_VBD_RANGE:
+		ptr->type = &xlbd_vbd_type;
+		ptr->index = index - XLBD_MAJOR_VBD_START;
+		break;
+	}
+
+	printk("Registering block device major %i\n", ptr->major);
+	if (register_blkdev(ptr->major, ptr->type->devname)) {
+		WPRINTK("can't get major %d with name %s\n",
+			ptr->major, ptr->type->devname);
+		kfree(ptr);
+		return NULL;
+	}
+
+	devfs_mk_dir(ptr->type->devname);
+	major_info[index] = ptr;
+	return ptr;
+}
+
+static struct xlbd_major_info *
+xlbd_get_major_info(int vdevice)
+{
+	struct xlbd_major_info *mi;
+	int major, minor, index;
+
+	major = BLKIF_MAJOR(vdevice);
+	minor = BLKIF_MINOR(vdevice);
+
+	switch (major) {
+	case IDE0_MAJOR: index = 0; break;
+	case IDE1_MAJOR: index = 1; break;
+	case IDE2_MAJOR: index = 2; break;
+	case IDE3_MAJOR: index = 3; break;
+	case IDE4_MAJOR: index = 4; break;
+	case IDE5_MAJOR: index = 5; break;
+	case IDE6_MAJOR: index = 6; break;
+	case IDE7_MAJOR: index = 7; break;
+	case IDE8_MAJOR: index = 8; break;
+	case IDE9_MAJOR: index = 9; break;
+	case SCSI_DISK0_MAJOR: index = 10; break;
+	case SCSI_DISK1_MAJOR ... SCSI_DISK7_MAJOR:
+		index = 11 + major - SCSI_DISK1_MAJOR;
+		break;
+	case SCSI_CDROM_MAJOR: index = 18; break;
+	default: index = 19; break;
+	}
+
+	mi = ((major_info[index] != NULL) ? major_info[index] :
+	      xlbd_alloc_major_info(major, minor, index));
+	if (mi)
+		mi->usage++;
+	return mi;
+}
+
+static void
+xlbd_put_major_info(struct xlbd_major_info *mi)
+{
+	mi->usage--;
+	/* XXX: release major if 0 */
+}
+
+static int
+xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
+{
+	request_queue_t *rq;
+
+	rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
+	if (rq == NULL)
+		return -1;
+
+	elevator_init(rq, "noop");
+
+	/* Hard sector size and max sectors impersonate the equiv. hardware. */
+	blk_queue_hardsect_size(rq, sector_size);
+	blk_queue_max_sectors(rq, 512);
+
+	/* Each segment in a request is up to an aligned page in size. */
+	blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
+	blk_queue_max_segment_size(rq, PAGE_SIZE);
+
+	/* Ensure a merged request will fit in a single I/O ring slot. */
+	blk_queue_max_phys_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+	blk_queue_max_hw_segments(rq, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+
+	/* Make sure buffer addresses are sector-aligned. */
+	blk_queue_dma_alignment(rq, 511);
+
+	gd->queue = rq;
+
+	return 0;
+}
+
+static int
+xlvbd_alloc_gendisk(int minor, blkif_sector_t capacity, int vdevice,
+		    u16 vdisk_info, u16 sector_size,
+		    struct blkfront_info *info)
+{
+	struct gendisk *gd;
+	struct xlbd_major_info *mi;
+	int nr_minors = 1;
+	int err = -ENODEV;
+
+	BUG_ON(info->gd != NULL);
+	BUG_ON(info->mi != NULL);
+	BUG_ON(info->rq != NULL);
+
+	mi = xlbd_get_major_info(vdevice);
+	if (mi == NULL)
+		goto out;
+	info->mi = mi;
+
+	if ((minor & ((1 << mi->type->partn_shift) - 1)) == 0)
+		nr_minors = 1 << mi->type->partn_shift;
+
+	gd = alloc_disk(nr_minors);
+	if (gd == NULL)
+		goto out;
+
+	if (nr_minors > 1)
+		sprintf(gd->disk_name, "%s%c", mi->type->diskname,
+			'a' + mi->index * mi->type->disks_per_major +
+			(minor >> mi->type->partn_shift));
+	else
+		sprintf(gd->disk_name, "%s%c%d", mi->type->diskname,
+			'a' + mi->index * mi->type->disks_per_major +
+			(minor >> mi->type->partn_shift),
+			minor & ((1 << mi->type->partn_shift) - 1));
+
+	gd->major = mi->major;
+	gd->first_minor = minor;
+	gd->fops = &xlvbd_block_fops;
+	gd->private_data = info;
+	gd->driverfs_dev = &(info->xbdev->dev);
+	set_capacity(gd, capacity);
+
+	if (xlvbd_init_blk_queue(gd, sector_size)) {
+		del_gendisk(gd);
+		goto out;
+	}
+
+	info->rq = gd->queue;
+
+	if (vdisk_info & VDISK_READONLY)
+		set_disk_ro(gd, 1);
+
+	if (vdisk_info & VDISK_REMOVABLE)
+		gd->flags |= GENHD_FL_REMOVABLE;
+
+	if (vdisk_info & VDISK_CDROM)
+		gd->flags |= GENHD_FL_CD;
+
+	info->gd = gd;
+
+	return 0;
+
+ out:
+	if (mi)
+		xlbd_put_major_info(mi);
+	info->mi = NULL;
+	return err;
+}
+
+int
+xlvbd_add(blkif_sector_t capacity, int vdevice, u16 vdisk_info,
+	  u16 sector_size, struct blkfront_info *info)
+{
+	struct block_device *bd;
+	int err = 0;
+
+	info->dev = MKDEV(BLKIF_MAJOR(vdevice), BLKIF_MINOR(vdevice));
+
+	bd = bdget(info->dev);
+	if (bd == NULL)
+		return -ENODEV;
+
+	err = xlvbd_alloc_gendisk(BLKIF_MINOR(vdevice), capacity, vdevice,
+				  vdisk_info, sector_size, info);
+
+	bdput(bd);
+	return err;
+}
+
+void
+xlvbd_del(struct blkfront_info *info)
+{
+	if (info->mi == NULL)
+		return;
+
+	BUG_ON(info->gd == NULL);
+	del_gendisk(info->gd);
+	put_disk(info->gd);
+	info->gd = NULL;
+
+	xlbd_put_major_info(info->mi);
+	info->mi = NULL;
+
+	BUG_ON(info->rq == NULL);
+	blk_cleanup_queue(info->rq);
+	info->rq = NULL;
+}

--

^ permalink raw reply

* [RFC PATCH 21/35] subarch TLB support
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 20-i386-tlbflush --]
[-- Type: text/plain, Size: 5220 bytes --]

Paravirtualize TLB flushes by using the flush interfaces provided by
the hypervisor. These hide the details of cross-CPU shootdowns and
allow significant optimisations (for example, by avoiding shooting
down on virtual CPUs that are descheduled). This is considerably
faster in most cases than performing virtual IPIs in the guest kernel.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 include/asm-i386/mach-default/mach_tlbflush.h |   59 ++++++++++++++++++++++++++
 include/asm-i386/mach-xen/mach_tlbflush.h     |   25 +++++++++++
 include/asm-i386/tlbflush.h                   |   55 ------------------------
 3 files changed, 85 insertions(+), 54 deletions(-)

--- xen-subarch-2.6.orig/include/asm-i386/tlbflush.h
+++ xen-subarch-2.6/include/asm-i386/tlbflush.h
@@ -5,64 +5,11 @@
 #include <linux/mm.h>
 #include <asm/processor.h>
 
-#define __flush_tlb()							\
-	do {								\
-		unsigned int tmpreg;					\
-									\
-		__asm__ __volatile__(					\
-			"movl %%cr3, %0;              \n"		\
-			"movl %0, %%cr3;  # flush TLB \n"		\
-			: "=r" (tmpreg)					\
-			:: "memory");					\
-	} while (0)
-
-/*
- * Global pages have to be flushed a bit differently. Not a real
- * performance problem because this does not happen often.
- */
-#define __flush_tlb_global()						\
-	do {								\
-		unsigned int tmpreg, cr4, cr4_orig;			\
-									\
-		__asm__ __volatile__(					\
-			"movl %%cr4, %2;  # turn off PGE     \n"	\
-			"movl %2, %1;                        \n"	\
-			"andl %3, %1;                        \n"	\
-			"movl %1, %%cr4;                     \n"	\
-			"movl %%cr3, %0;                     \n"	\
-			"movl %0, %%cr3;  # flush TLB        \n"	\
-			"movl %2, %%cr4;  # turn PGE back on \n"	\
-			: "=&r" (tmpreg), "=&r" (cr4), "=&r" (cr4_orig)	\
-			: "i" (~X86_CR4_PGE)				\
-			: "memory");					\
-	} while (0)
-
 extern unsigned long pgkern_mask;
 
-# define __flush_tlb_all()						\
-	do {								\
-		if (cpu_has_pge)					\
-			__flush_tlb_global();				\
-		else							\
-			__flush_tlb();					\
-	} while (0)
-
 #define cpu_has_invlpg	(boot_cpu_data.x86 > 3)
 
-#define __flush_tlb_single(addr) \
-	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
-
-#ifdef CONFIG_X86_INVLPG
-# define __flush_tlb_one(addr) __flush_tlb_single(addr)
-#else
-# define __flush_tlb_one(addr)						\
-	do {								\
-		if (cpu_has_invlpg)					\
-			__flush_tlb_single(addr);			\
-		else							\
-			__flush_tlb();					\
-	} while (0)
-#endif
+#include <mach_tlbflush.h>
 
 /*
  * TLB flushing:
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/mach-default/mach_tlbflush.h
@@ -0,0 +1,59 @@
+#ifndef __ASM_MACH_TLBFLUSH_H
+#define __ASM_MACH_TLBFLUSH_H
+
+#define __flush_tlb()							\
+	do {								\
+		unsigned int tmpreg;					\
+									\
+		__asm__ __volatile__(					\
+			"movl %%cr3, %0;              \n"		\
+			"movl %0, %%cr3;  # flush TLB \n"		\
+			: "=r" (tmpreg)					\
+			:: "memory");					\
+	} while (0)
+
+/*
+ * Global pages have to be flushed a bit differently. Not a real
+ * performance problem because this does not happen often.
+ */
+#define __flush_tlb_global()						\
+	do {								\
+		unsigned int tmpreg, cr4, cr4_orig;			\
+									\
+		__asm__ __volatile__(					\
+			"movl %%cr4, %2;  # turn off PGE     \n"	\
+			"movl %2, %1;                        \n"	\
+			"andl %3, %1;                        \n"	\
+			"movl %1, %%cr4;                     \n"	\
+			"movl %%cr3, %0;                     \n"	\
+			"movl %0, %%cr3;  # flush TLB        \n"	\
+			"movl %2, %%cr4;  # turn PGE back on \n"	\
+			: "=&r" (tmpreg), "=&r" (cr4), "=&r" (cr4_orig)	\
+			: "i" (~X86_CR4_PGE)				\
+			: "memory");					\
+	} while (0)
+
+#define __flush_tlb_all()						\
+	do {								\
+		if (cpu_has_pge)					\
+			__flush_tlb_global();				\
+		else							\
+			__flush_tlb();					\
+	} while (0)
+
+#define __flush_tlb_single(addr) \
+	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
+
+#ifdef CONFIG_X86_INVLPG
+# define __flush_tlb_one(addr) __flush_tlb_single(addr)
+#else
+# define __flush_tlb_one(addr)						\
+	do {								\
+		if (cpu_has_invlpg)					\
+			__flush_tlb_single(addr);			\
+		else							\
+			__flush_tlb();					\
+	} while (0)
+#endif
+
+#endif /* __ASM_MACH_TLBFLUSH_H */
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/mach-xen/mach_tlbflush.h
@@ -0,0 +1,25 @@
+#ifndef __ASM_MACH_TLBFLUSH_H
+#define __ASM_MACH_TLBFLUSH_H
+
+static inline void xen_tlb_flush(void)
+{
+        struct mmuext_op op;
+        op.cmd = MMUEXT_TLB_FLUSH_LOCAL;
+        BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
+}
+
+static inline void xen_invlpg(unsigned long ptr)
+{
+        struct mmuext_op op;
+        op.cmd = MMUEXT_INVLPG_LOCAL;
+        op.arg1.linear_addr = ptr & PAGE_MASK;
+        BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
+}
+
+#define __flush_tlb() xen_tlb_flush()
+#define __flush_tlb_global() xen_tlb_flush()
+#define __flush_tlb_all() xen_tlb_flush()
+#define __flush_tlb_single(addr) xen_invlpg(addr)
+#define __flush_tlb_one(addr) __flush_tlb_single(addr)
+
+#endif /* __ASM_MACH_TLBFLUSH_H */

--

^ permalink raw reply

* [RFC PATCH 29/35] Add the Xen virtual console driver.
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 28-xen-console --]
[-- Type: text/plain, Size: 21185 bytes --]

This provides a bootstrap and ongoing emergency console which is
intended to be available from very early during boot and at all times
thereafter, in contrast with alternatives such as UDP-based syslogd,
or logging in via ssh. The protocol is based on a simple shared-memory
ring buffer.

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 drivers/Makefile                   |    1 
 drivers/char/tty_io.c              |    7 
 drivers/xen/Makefile               |    2 
 drivers/xen/console/Makefile       |    2 
 drivers/xen/console/console.c      |  633 +++++++++++++++++++++++++++++++++++++
 drivers/xen/console/xencons_ring.c |  115 ++++++
 include/xen/xencons.h              |   14 
 7 files changed, 773 insertions(+), 1 deletion(-)

--- xen-subarch-2.6.orig/drivers/Makefile
+++ xen-subarch-2.6/drivers/Makefile
@@ -34,6 +34,7 @@ obj-y				+= base/ block/ misc/ mfd/ net/
 obj-$(CONFIG_NUBUS)		+= nubus/
 obj-$(CONFIG_ATM)		+= atm/
 obj-$(CONFIG_PPC_PMAC)		+= macintosh/
+obj-$(CONFIG_XEN)		+= xen/
 obj-$(CONFIG_IDE)		+= ide/
 obj-$(CONFIG_FC4)		+= fc4/
 obj-$(CONFIG_SCSI)		+= scsi/
--- xen-subarch-2.6.orig/drivers/char/tty_io.c
+++ xen-subarch-2.6/drivers/char/tty_io.c
@@ -132,6 +132,8 @@ LIST_HEAD(tty_drivers);			/* linked list
    vt.c for deeply disgusting hack reasons */
 DECLARE_MUTEX(tty_sem);
 
+int console_use_vt = 1;
+
 #ifdef CONFIG_UNIX98_PTYS
 extern struct tty_driver *ptm_driver;	/* Unix98 pty masters; for /dev/ptmx */
 extern int pty_limit;		/* Config limit on Unix98 ptys */
@@ -2054,7 +2056,7 @@ retry_open:
 		goto got_driver;
 	}
 #ifdef CONFIG_VT
-	if (device == MKDEV(TTY_MAJOR,0)) {
+	if (console_use_vt && (device == MKDEV(TTY_MAJOR,0))) {
 		extern struct tty_driver *console_driver;
 		driver = console_driver;
 		index = fg_console;
@@ -3237,6 +3239,8 @@ static int __init tty_init(void)
 #endif
 
 #ifdef CONFIG_VT
+	if (!console_use_vt)
+		goto out_vt;
 	cdev_init(&vc0_cdev, &console_fops);
 	if (cdev_add(&vc0_cdev, MKDEV(TTY_MAJOR, 0), 1) ||
 	    register_chrdev_region(MKDEV(TTY_MAJOR, 0), 1, "/dev/vc/0") < 0)
@@ -3245,6 +3249,7 @@ static int __init tty_init(void)
 	class_device_create(tty_class, NULL, MKDEV(TTY_MAJOR, 0), NULL, "tty0");
 
 	vty_init();
+ out_vt:
 #endif
 	return 0;
 }
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/Makefile
@@ -0,0 +1,2 @@
+
+obj-y	+= console/
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/console/Makefile
@@ -0,0 +1,2 @@
+
+obj-y	:= console.o xencons_ring.o
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/console/console.c
@@ -0,0 +1,633 @@
+/******************************************************************************
+ * console.c
+ * 
+ * Virtual console driver.
+ * 
+ * Copyright (c) 2002-2004, K A Fraser.
+ * 
+ * This file may be distributed separately from the Linux kernel, or
+ * incorporated into other software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/config.h>
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/signal.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <linux/tty.h>
+#include <linux/tty_flip.h>
+#include <linux/serial.h>
+#include <linux/major.h>
+#include <linux/ptrace.h>
+#include <linux/ioport.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/console.h>
+#include <linux/bootmem.h>
+#include <linux/sysrq.h>
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/uaccess.h>
+#include <xen/interface/xen.h>
+#include <xen/interface/event_channel.h>
+#include <asm/hypervisor.h>
+#include <xen/evtchn.h>
+#include <xen/xencons.h>
+
+/*
+ * Modes:
+ *  'xencons=off'  [XC_OFF]:     Console is disabled.
+ *  'xencons=tty'  [XC_TTY]:     Console attached to '/dev/tty[0-9]+'.
+ *  'xencons=ttyS' [XC_SERIAL]:  Console attached to '/dev/ttyS[0-9]+'.
+ *                 [XC_DEFAULT]: DOM0 -> XC_SERIAL ; all others -> XC_TTY.
+ * 
+ * NB. In mode XC_TTY, we create dummy consoles for tty2-63. This suppresses
+ * warnings from standard distro startup scripts.
+ */
+static enum { XC_OFF, XC_DEFAULT, XC_TTY, XC_SERIAL } xc_mode = XC_DEFAULT;
+static int xc_num = -1;
+
+#ifdef CONFIG_MAGIC_SYSRQ
+static unsigned long sysrq_requested;
+extern int sysrq_enabled;
+#endif
+
+static int __init xencons_setup(char *str)
+{
+	char *q;
+	int n;
+
+	if (!strncmp(str, "ttyS", 4))
+		xc_mode = XC_SERIAL;
+	else if (!strncmp(str, "tty", 3))
+		xc_mode = XC_TTY;
+	else if (!strncmp(str, "off", 3))
+		xc_mode = XC_OFF;
+
+	switch (xc_mode) {
+	case XC_SERIAL:
+		n = simple_strtol(str+4, &q, 10);
+		if (q > (str + 4))
+			xc_num = n;
+		break;
+	case XC_TTY:
+		n = simple_strtol(str+3, &q, 10);
+		if (q > (str + 3))
+			xc_num = n;
+		break;
+	default:
+		break;
+	}
+
+	return 1;
+}
+__setup("xencons=", xencons_setup);
+
+/* The kernel and user-land drivers share a common transmit buffer. */
+static unsigned int wbuf_size = 4096;
+#define WBUF_MASK(_i) ((_i)&(wbuf_size-1))
+static char *wbuf;
+static unsigned int wc, wp; /* write_cons, write_prod */
+
+static int __init xencons_bufsz_setup(char *str)
+{
+	unsigned int goal;
+	goal = simple_strtoul(str, NULL, 0);
+	while (wbuf_size < goal)
+		wbuf_size <<= 1;
+	return 1;
+}
+__setup("xencons_bufsz=", xencons_bufsz_setup);
+
+/* This lock protects accesses to the common transmit buffer. */
+static spinlock_t xencons_lock = SPIN_LOCK_UNLOCKED;
+
+/* Common transmit-kick routine. */
+static void __xencons_tx_flush(void);
+
+static struct tty_driver *xencons_driver;
+
+/******************** Kernel console driver ********************************/
+
+static void kcons_write(
+	struct console *c, const char *s, unsigned int count)
+{
+	int           i = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+
+	while (i < count) {
+		for (; i < count; i++) {
+			if ((wp - wc) >= (wbuf_size - 1))
+				break;
+			if ((wbuf[WBUF_MASK(wp++)] = s[i]) == '\n')
+				wbuf[WBUF_MASK(wp++)] = '\r';
+		}
+
+		__xencons_tx_flush();
+	}
+
+	spin_unlock_irqrestore(&xencons_lock, flags);
+}
+
+static void kcons_write_dom0(
+	struct console *c, const char *s, unsigned int count)
+{
+	int rc;
+
+	while ((count > 0) &&
+	       ((rc = HYPERVISOR_console_io(
+			CONSOLEIO_write, count, (char *)s)) > 0)) {
+		count -= rc;
+		s += rc;
+	}
+}
+
+static struct tty_driver *kcons_device(struct console *c, int *index)
+{
+	*index = 0;
+	return xencons_driver;
+}
+
+static struct console kcons_info = {
+	.device	= kcons_device,
+	.flags	= CON_PRINTBUFFER,
+	.index	= -1,
+};
+
+#define __RETCODE 0
+static int __init xen_console_init(void)
+{
+	if (xen_init() < 0)
+		return __RETCODE;
+
+	if (xen_start_info->flags & SIF_INITDOMAIN) {
+		if (xc_mode == XC_DEFAULT)
+			xc_mode = XC_SERIAL;
+		kcons_info.write = kcons_write_dom0;
+		if (xc_mode == XC_SERIAL)
+			kcons_info.flags |= CON_ENABLED;
+	} else {
+		if (xc_mode == XC_DEFAULT)
+			xc_mode = XC_TTY;
+		kcons_info.write = kcons_write;
+	}
+
+	switch (xc_mode) {
+	case XC_SERIAL:
+		strcpy(kcons_info.name, "ttyS");
+		if (xc_num == -1)
+			xc_num = 0;
+		break;
+
+	case XC_TTY:
+		strcpy(kcons_info.name, "tty");
+		if (xc_num == -1)
+			xc_num = 1;
+		break;
+
+	default:
+		return __RETCODE;
+	}
+
+	wbuf = alloc_bootmem(wbuf_size);
+
+	register_console(&kcons_info);
+
+	return __RETCODE;
+}
+console_initcall(xen_console_init);
+
+/*** Useful function for console debugging -- goes straight to Xen. ***/
+asmlinkage int xprintk(const char *fmt, ...)
+{
+	va_list args;
+	int printk_len;
+	static char printk_buf[1024];
+
+	/* Emit the output into the temporary buffer */
+	va_start(args, fmt);
+	printk_len = vsnprintf(printk_buf, sizeof(printk_buf), fmt, args);
+	va_end(args);
+
+	/* Send the processed output directly to Xen. */
+	kcons_write_dom0(NULL, printk_buf, printk_len);
+
+	return 0;
+}
+
+/*** Forcibly flush console data before dying. ***/
+void xencons_force_flush(void)
+{
+	int sz;
+
+	/* Emergency console is synchronous, so there's nothing to flush. */
+	if (xen_start_info->flags & SIF_INITDOMAIN)
+		return;
+
+	/* Spin until console data is flushed through to the daemon. */
+	while (wc != wp) {
+		int sent = 0;
+		if ((sz = wp - wc) == 0)
+			continue;
+		sent = xencons_ring_send(&wbuf[WBUF_MASK(wc)], sz);
+		if (sent > 0)
+			wc += sent;
+	}
+}
+
+
+/******************** User-space console driver (/dev/console) ************/
+
+#define DRV(_d)         (_d)
+#define TTY_INDEX(_tty) ((_tty)->index)
+
+static struct termios *xencons_termios[MAX_NR_CONSOLES];
+static struct termios *xencons_termios_locked[MAX_NR_CONSOLES];
+static struct tty_struct *xencons_tty;
+static int xencons_priv_irq;
+static char x_char;
+
+void xencons_rx(char *buf, unsigned len, struct pt_regs *regs)
+{
+	int           i;
+	unsigned long flags;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+	if (xencons_tty == NULL)
+		goto out;
+
+	for (i = 0; i < len; i++) {
+#ifdef CONFIG_MAGIC_SYSRQ
+		if (sysrq_enabled) {
+			if (buf[i] == '\x0f') { /* ^O */
+				sysrq_requested = jiffies;
+				continue; /* don't print the sysrq key */
+			} else if (sysrq_requested) {
+				unsigned long sysrq_timeout =
+					sysrq_requested + HZ*2;
+				sysrq_requested = 0;
+				if (time_before(jiffies, sysrq_timeout)) {
+					spin_unlock_irqrestore(
+						&xencons_lock, flags);
+					handle_sysrq(
+						buf[i], regs, xencons_tty);
+					spin_lock_irqsave(
+						&xencons_lock, flags);
+					continue;
+				}
+			}
+		}
+#endif
+		tty_insert_flip_char(xencons_tty, buf[i], 0);
+	}
+	tty_flip_buffer_push(xencons_tty);
+
+ out:
+	spin_unlock_irqrestore(&xencons_lock, flags);
+}
+
+static void __xencons_tx_flush(void)
+{
+	int sent, sz, work_done = 0;
+
+	if (x_char) {
+		if (xen_start_info->flags & SIF_INITDOMAIN)
+			kcons_write_dom0(NULL, &x_char, 1);
+		else
+			while (x_char)
+				if (xencons_ring_send(&x_char, 1) == 1)
+					break;
+		x_char = 0;
+		work_done = 1;
+	}
+
+	while (wc != wp) {
+		sz = wp - wc;
+		if (sz > (wbuf_size - WBUF_MASK(wc)))
+			sz = wbuf_size - WBUF_MASK(wc);
+		if (xen_start_info->flags & SIF_INITDOMAIN) {
+			kcons_write_dom0(NULL, &wbuf[WBUF_MASK(wc)], sz);
+			wc += sz;
+		} else {
+			sent = xencons_ring_send(&wbuf[WBUF_MASK(wc)], sz);
+			if (sent == 0)
+				break;
+			wc += sent;
+		}
+		work_done = 1;
+	}
+
+	if (work_done && (xencons_tty != NULL)) {
+		wake_up_interruptible(&xencons_tty->write_wait);
+		if ((xencons_tty->flags & (1 << TTY_DO_WRITE_WAKEUP)) &&
+		    (xencons_tty->ldisc.write_wakeup != NULL))
+			(xencons_tty->ldisc.write_wakeup)(xencons_tty);
+	}
+}
+
+void xencons_tx(void)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+	__xencons_tx_flush();
+	spin_unlock_irqrestore(&xencons_lock, flags);
+}
+
+/* Privileged receive callback and transmit kicker. */
+static irqreturn_t xencons_priv_interrupt(int irq, void *dev_id,
+                                          struct pt_regs *regs)
+{
+	static char rbuf[16];
+	int         l;
+
+	while ((l = HYPERVISOR_console_io(CONSOLEIO_read, 16, rbuf)) > 0)
+		xencons_rx(rbuf, l, regs);
+
+	xencons_tx();
+
+	return IRQ_HANDLED;
+}
+
+static int xencons_write_room(struct tty_struct *tty)
+{
+	return wbuf_size - (wp - wc);
+}
+
+static int xencons_chars_in_buffer(struct tty_struct *tty)
+{
+	return wp - wc;
+}
+
+static void xencons_send_xchar(struct tty_struct *tty, char ch)
+{
+	unsigned long flags;
+
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+	x_char = ch;
+	__xencons_tx_flush();
+	spin_unlock_irqrestore(&xencons_lock, flags);
+}
+
+static void xencons_throttle(struct tty_struct *tty)
+{
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	if (I_IXOFF(tty))
+		xencons_send_xchar(tty, STOP_CHAR(tty));
+}
+
+static void xencons_unthrottle(struct tty_struct *tty)
+{
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	if (I_IXOFF(tty)) {
+		if (x_char != 0)
+			x_char = 0;
+		else
+			xencons_send_xchar(tty, START_CHAR(tty));
+	}
+}
+
+static void xencons_flush_buffer(struct tty_struct *tty)
+{
+	unsigned long flags;
+
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+	wc = wp = 0;
+	spin_unlock_irqrestore(&xencons_lock, flags);
+}
+
+static inline int __xencons_put_char(int ch)
+{
+	char _ch = (char)ch;
+	if ((wp - wc) == wbuf_size)
+		return 0;
+	wbuf[WBUF_MASK(wp++)] = _ch;
+	return 1;
+}
+
+static int xencons_write(
+	struct tty_struct *tty,
+	const unsigned char *buf,
+	int count)
+{
+	int i;
+	unsigned long flags;
+
+	if (TTY_INDEX(tty) != 0)
+		return count;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+
+	for (i = 0; i < count; i++)
+		if (!__xencons_put_char(buf[i]))
+			break;
+
+	if (i != 0)
+		__xencons_tx_flush();
+
+	spin_unlock_irqrestore(&xencons_lock, flags);
+
+	return i;
+}
+
+static void xencons_put_char(struct tty_struct *tty, u_char ch)
+{
+	unsigned long flags;
+
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+	(void)__xencons_put_char(ch);
+	spin_unlock_irqrestore(&xencons_lock, flags);
+}
+
+static void xencons_flush_chars(struct tty_struct *tty)
+{
+	unsigned long flags;
+
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+	__xencons_tx_flush();
+	spin_unlock_irqrestore(&xencons_lock, flags);
+}
+
+static void xencons_wait_until_sent(struct tty_struct *tty, int timeout)
+{
+	unsigned long orig_jiffies = jiffies;
+
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	while (DRV(tty->driver)->chars_in_buffer(tty)) {
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(1);
+		if (signal_pending(current))
+			break;
+		if (timeout && time_after(jiffies, orig_jiffies + timeout))
+			break;
+	}
+
+	set_current_state(TASK_RUNNING);
+}
+
+static int xencons_open(struct tty_struct *tty, struct file *filp)
+{
+	unsigned long flags;
+
+	if (TTY_INDEX(tty) != 0)
+		return 0;
+
+	spin_lock_irqsave(&xencons_lock, flags);
+	tty->driver_data = NULL;
+	if (xencons_tty == NULL)
+		xencons_tty = tty;
+	__xencons_tx_flush();
+	spin_unlock_irqrestore(&xencons_lock, flags);
+
+	return 0;
+}
+
+static void xencons_close(struct tty_struct *tty, struct file *filp)
+{
+	unsigned long flags;
+
+	if (TTY_INDEX(tty) != 0)
+		return;
+
+	if (tty->count == 1) {
+		tty->closing = 1;
+		tty_wait_until_sent(tty, 0);
+		if (DRV(tty->driver)->flush_buffer != NULL)
+			DRV(tty->driver)->flush_buffer(tty);
+		if (tty->ldisc.flush_buffer != NULL)
+			tty->ldisc.flush_buffer(tty);
+		tty->closing = 0;
+		spin_lock_irqsave(&xencons_lock, flags);
+		xencons_tty = NULL;
+		spin_unlock_irqrestore(&xencons_lock, flags);
+	}
+}
+
+static struct tty_operations xencons_ops = {
+	.open = xencons_open,
+	.close = xencons_close,
+	.write = xencons_write,
+	.write_room = xencons_write_room,
+	.put_char = xencons_put_char,
+	.flush_chars = xencons_flush_chars,
+	.chars_in_buffer = xencons_chars_in_buffer,
+	.send_xchar = xencons_send_xchar,
+	.flush_buffer = xencons_flush_buffer,
+	.throttle = xencons_throttle,
+	.unthrottle = xencons_unthrottle,
+	.wait_until_sent = xencons_wait_until_sent,
+};
+
+static int __init xencons_init(void)
+{
+	int rc;
+
+	if (xen_init() < 0)
+		return -ENODEV;
+
+	if (xc_mode == XC_OFF)
+		return 0;
+
+	xencons_ring_init();
+
+	xencons_driver = alloc_tty_driver((xc_mode == XC_SERIAL) ?
+					  1 : MAX_NR_CONSOLES);
+	if (xencons_driver == NULL)
+		return -ENOMEM;
+
+	DRV(xencons_driver)->name            = "xencons";
+	DRV(xencons_driver)->major           = TTY_MAJOR;
+	DRV(xencons_driver)->type            = TTY_DRIVER_TYPE_SERIAL;
+	DRV(xencons_driver)->subtype         = SERIAL_TYPE_NORMAL;
+	DRV(xencons_driver)->init_termios    = tty_std_termios;
+	DRV(xencons_driver)->flags           =
+		TTY_DRIVER_REAL_RAW |
+		TTY_DRIVER_RESET_TERMIOS |
+		TTY_DRIVER_NO_DEVFS;
+	DRV(xencons_driver)->termios         = xencons_termios;
+	DRV(xencons_driver)->termios_locked  = xencons_termios_locked;
+
+	if (xc_mode == XC_SERIAL) {
+		DRV(xencons_driver)->name        = "ttyS";
+		DRV(xencons_driver)->minor_start = 64 + xc_num;
+		DRV(xencons_driver)->name_base   = 0 + xc_num;
+	} else {
+		DRV(xencons_driver)->name        = "tty";
+		DRV(xencons_driver)->minor_start = xc_num;
+		DRV(xencons_driver)->name_base   = xc_num;
+	}
+
+	tty_set_operations(xencons_driver, &xencons_ops);
+
+	if ((rc = tty_register_driver(DRV(xencons_driver))) != 0) {
+		printk("WARNING: Failed to register Xen virtual "
+		       "console driver as '%s%d'\n",
+		       DRV(xencons_driver)->name,
+		       DRV(xencons_driver)->name_base);
+		put_tty_driver(xencons_driver);
+		xencons_driver = NULL;
+		return rc;
+	}
+
+	tty_register_device(xencons_driver, 0, NULL);
+
+	if (xen_start_info->flags & SIF_INITDOMAIN) {
+		xencons_priv_irq = bind_virq_to_irqhandler(
+			VIRQ_CONSOLE,
+			0,
+			xencons_priv_interrupt,
+			0,
+			"console",
+			NULL);
+		BUG_ON(xencons_priv_irq < 0);
+	}
+
+	printk("Xen virtual console successfully installed as %s%d\n",
+	       DRV(xencons_driver)->name,
+	       DRV(xencons_driver)->name_base );
+
+	return 0;
+}
+
+module_init(xencons_init);
--- /dev/null
+++ xen-subarch-2.6/drivers/xen/console/xencons_ring.c
@@ -0,0 +1,115 @@
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/signal.h>
+#include <linux/sched.h>
+#include <linux/interrupt.h>
+#include <linux/tty.h>
+#include <linux/tty_flip.h>
+#include <linux/serial.h>
+#include <linux/major.h>
+#include <linux/ptrace.h>
+#include <linux/ioport.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+
+#include <asm/hypervisor.h>
+#include <xen/evtchn.h>
+#include <xen/xencons.h>
+#include <linux/wait.h>
+#include <linux/interrupt.h>
+#include <linux/sched.h>
+#include <linux/err.h>
+#include <xen/interface/io/console.h>
+
+static int xencons_irq;
+
+static inline struct xencons_interface *xencons_interface(void)
+{
+	return mfn_to_virt(xen_start_info->console_mfn);
+}
+
+static inline void notify_daemon(void)
+{
+	/* Use evtchn: this is called early, before irq is set up. */
+	notify_remote_via_evtchn(xen_start_info->console_evtchn);
+}
+
+int xencons_ring_send(const char *data, unsigned len)
+{
+	int sent = 0;
+	struct xencons_interface *intf = xencons_interface();
+	XENCONS_RING_IDX cons, prod;
+
+	cons = intf->out_cons;
+	prod = intf->out_prod;
+	mb();
+	BUG_ON((prod - cons) > sizeof(intf->out));
+
+	while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
+		intf->out[MASK_XENCONS_IDX(prod++, intf->out)] = data[sent++];
+
+	wmb();
+	intf->out_prod = prod;
+
+	notify_daemon();
+
+	return sent;
+}
+
+static irqreturn_t handle_input(int irq, void *unused, struct pt_regs *regs)
+{
+	struct xencons_interface *intf = xencons_interface();
+	XENCONS_RING_IDX cons, prod;
+
+	cons = intf->in_cons;
+	prod = intf->in_prod;
+	mb();
+	BUG_ON((prod - cons) > sizeof(intf->in));
+
+	while (cons != prod) {
+		xencons_rx(intf->in+MASK_XENCONS_IDX(cons,intf->in), 1, regs);
+		cons++;
+	}
+
+	mb();
+	intf->in_cons = cons;
+
+	notify_daemon();
+
+	xencons_tx();
+
+	return IRQ_HANDLED;
+}
+
+int xencons_ring_init(void)
+{
+	int err;
+
+	if (xencons_irq)
+		unbind_from_irqhandler(xencons_irq, NULL);
+	xencons_irq = 0;
+
+	if (!xen_start_info->console_evtchn)
+		return 0;
+
+	err = bind_evtchn_to_irqhandler(
+		xen_start_info->console_evtchn,
+		handle_input, 0, "xencons", NULL);
+	if (err <= 0) {
+		printk(KERN_ERR "XEN console request irq failed %i\n", err);
+		return err;
+	}
+
+	xencons_irq = err;
+
+	/* In case we have in-flight data after save/restore... */
+	notify_daemon();
+
+	return 0;
+}
+
+void xencons_resume(void)
+{
+	(void)xencons_ring_init();
+}
--- /dev/null
+++ xen-subarch-2.6/include/xen/xencons.h
@@ -0,0 +1,14 @@
+#ifndef __ASM_XENCONS_H__
+#define __ASM_XENCONS_H__
+
+void xencons_force_flush(void);
+void xencons_resume(void);
+
+/* Interrupt work hooks. Receive data, or kick data out. */
+void xencons_rx(char *buf, unsigned len, struct pt_regs *regs);
+void xencons_tx(void);
+
+int xencons_ring_init(void);
+int xencons_ring_send(const char *data, unsigned len);
+
+#endif /* __ASM_XENCONS_H__ */

--

^ permalink raw reply

* [RFC PATCH 20/35] subarch stack pointer update
From: Chris Wright @ 2006-03-22  6:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: xen-devel, virtualization, Ian Pratt, Christian Limpach
In-Reply-To: <20060322063040.960068000@sorel.sous-sol.org>

[-- Attachment #1: 19-i386-tss --]
[-- Type: text/plain, Size: 1734 bytes --]

Register the new kernel ('ring 0') stack pointer with the hypervisor
during context switch. 

Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
 include/asm-i386/mach-default/mach_processor.h |    8 ++++++++
 include/asm-i386/mach-xen/mach_processor.h     |    9 +++++++++
 include/asm-i386/processor.h                   |    3 +++
 3 files changed, 20 insertions(+)

--- xen-subarch-2.6.orig/include/asm-i386/processor.h
+++ xen-subarch-2.6/include/asm-i386/processor.h
@@ -472,6 +472,8 @@ struct thread_struct {
 	.io_bitmap_ptr = NULL,						\
 }
 
+#include <mach_processor.h>
+
 /*
  * Note that the .io_bitmap member must be extra-big. This is because
  * the CPU will access an additional byte beyond the end of the IO
@@ -494,6 +496,7 @@ static inline void load_esp0(struct tss_
 		tss->ss1 = thread->sysenter_cs;
 		wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
 	}
+	mach_load_esp0(tss, thread);
 }
 
 #define start_thread(regs, new_eip, new_esp) do {		\
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/mach-xen/mach_processor.h
@@ -0,0 +1,9 @@
+#ifndef __ASM_MACH_PROCESSOR_H
+#define __ASM_MACH_PROCESSOR_H
+
+static inline void mach_load_esp0(struct tss_struct *tss, struct thread_struct *thread)
+{
+	HYPERVISOR_stack_switch(tss->ss0, tss->esp0);
+}
+
+#endif /* __ASM_MACH_PROCESSOR_H */
--- /dev/null
+++ xen-subarch-2.6/include/asm-i386/mach-default/mach_processor.h
@@ -0,0 +1,8 @@
+#ifndef __ASM_MACH_PROCESSOR_H
+#define __ASM_MACH_PROCESSOR_H
+
+static inline void mach_load_esp0(struct tss_struct *tss, struct thread_struct *thread)
+{
+}
+
+#endif /* __ASM_MACH_PROCESSOR_H */

--

^ permalink raw reply


This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.