* RE: special video mode numbers
@ 2007-07-19 11:53 Jan Beulich
  2007-07-20 21:33 ` Buffered IO for IO? Zulauf, John
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2007-07-19 11:53 UTC (permalink / raw)
  To: Kaushik_Barde; +Cc: xen-devel

>On the same line of thought, 
>
>Is it possible to change default mode to one of vesa modes (118 or 11b)
>instead of text mode?

I'd suggest not doing so - whatever mode is chosen, I don't think there's a
formal guarantee that it'll be available. Further, Linux doesn't do so either.

>If so, (in 3.0.4) setup-xen.c, what does video_mode 3 means? Is it the
>correct place to make this change?

Mode 3 was VGA color text 80x25 originally, but got overloaded by derived
modes (80x43, 80x50, and all the less standard modes) later, so today it
really at best means 80 columns, color, and a varying number of lines. But
I don't think you want to backport all the necessary support code to 3.0.4...

>I don't see screen_info used beyond dom0_init_screen_info, is this code
>deprecated?

???

Jan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Buffered IO for IO?
  2007-07-19 11:53 special video mode numbers Jan Beulich
@ 2007-07-20 21:33 ` Zulauf, John
  2007-07-21  9:44   ` Keir Fraser
  0 siblings, 1 reply; 11+ messages in thread
From: Zulauf, John @ 2007-07-20 21:33 UTC (permalink / raw)
  To: xen-devel


Has anyone experimented with adding Buffered IO support for "out"
instructions?  Currently, the buffered io page is only used for mmio
writes (and then only to vga space).  It seems quite straightforward to
add.

Two questions:

(1) Does buffering actually have a measurable performance impact in its
current use?

(2) Has anyone experimented with adding COM port ioreq buffering?

If not I'll give it a rip and shout out what I find.

John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Buffered IO for IO?
  2007-07-20 21:33 ` Buffered IO for IO? Zulauf, John
@ 2007-07-21  9:44   ` Keir Fraser
  2007-07-21  9:50     ` Mats Petersson
  2007-07-21 10:59     ` Trolle Selander
  0 siblings, 2 replies; 11+ messages in thread
From: Keir Fraser @ 2007-07-21  9:44 UTC (permalink / raw)
  To: Zulauf, John, xen-devel




On 20/7/07 22:33, "Zulauf, John" <john.zulauf@intel.com> wrote:

> Has anyone experimented with adding Buffered IO support for "out"
> instructions?  Currently, the buffered io pages is only used for mmio
> writes (and then only to vga space).  It seems quite straight-forward to
> add.

Is it safe to buffer, and hence arbitrarily delay, any I/O port write?

 -- Keir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Buffered IO for IO?
  2007-07-21  9:44   ` Keir Fraser
@ 2007-07-21  9:50     ` Mats Petersson
  2007-07-21 10:59     ` Trolle Selander
  1 sibling, 0 replies; 11+ messages in thread
From: Mats Petersson @ 2007-07-21  9:50 UTC (permalink / raw)
  To: Keir Fraser, Zulauf, John, xen-devel

At 10:44 21/07/2007, Keir Fraser wrote:



>On 20/7/07 22:33, "Zulauf, John" <john.zulauf@intel.com> wrote:
>
> > Has anyone experimented with adding Buffered IO support for "out"
> > instructions?  Currently, the buffered io pages is only used for mmio
> > writes (and then only to vga space).  It seems quite straight-forward to
> > add.
>
>Is it safe to buffer, and hence arbitrarily delay, any I/O port write?


That would depend on the actual port - some are OK to delay, others 
are not. E.g. OUT to the serial port FIFO would be OK to delay for a 
bit, but the next IN from the status register would require 
preceding OUTs to be flushed (and processed) before the IN can be 
correctly assessed - as otherwise the serial port may look like it's 
got an infinite FIFO, and/or that data has already been sent, which is 
likely to NOT be the case.

To be perfect, you'd need a separate set of rules for each type of 
device, but I think it can be simplified to an "OUTs must be processed 
before INs can be processed" rule - so a long stream of OUT instructions 
could be batched up, but as soon as an IN happens, the batched OUTs 
will need to be processed.
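
A minimal C sketch of the "OUTs before INs" rule, to make the batching concrete. All names here (out_batch, guest_out, guest_in, OUT_BATCH_MAX) are made up for illustration and are not Xen or qemu-dm interfaces; device_out() is only a stand-in that counts what the device model would have seen.

```c
#include <assert.h>
#include <stdint.h>

#define OUT_BATCH_MAX 16   /* hypothetical batch size */

struct out_req { uint16_t port; uint8_t value; };

struct out_batch {
    struct out_req pending[OUT_BATCH_MAX];
    unsigned int count;      /* queued, not yet seen by the device model */
    unsigned int processed;  /* total OUTs handed to the device model */
};

/* Stand-in for delivering one write to the device model. */
static void device_out(struct out_batch *b, const struct out_req *r)
{
    (void)r;
    b->processed++;
}

/* Deliver every queued OUT, in order, and empty the batch. */
static void flush_outs(struct out_batch *b)
{
    for (unsigned int i = 0; i < b->count; i++)
        device_out(b, &b->pending[i]);
    b->count = 0;
}

/* An OUT is only queued; the guest can resume immediately. */
static void guest_out(struct out_batch *b, uint16_t port, uint8_t value)
{
    if (b->count == OUT_BATCH_MAX)
        flush_outs(b);               /* no room left: drain first */
    b->pending[b->count].port = port;
    b->pending[b->count].value = value;
    b->count++;
}

/* An IN must see the effect of all earlier OUTs, so flush before reading. */
static uint8_t guest_in(struct out_batch *b, uint16_t port)
{
    flush_outs(b);
    assert(b->count == 0);
    (void)port;
    return 0;                        /* placeholder for the device's reply */
}
```

Note that this sketch only covers ordering against INs; it does nothing about interrupts or MMIO reads.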

How much there is to gain from this would be relatively easy to assess 
by counting the number of OUTs between each IN - I suspect that 
there are a few OUTs per IN, so there would be some gain from just 
returning to the guest after an OUT.

The real trouble, of course, comes if there are devices that use a 
mixture of IOIO and MMIO, where an IOIO is used to send data, and 
status is read from MMIO... This would complicate matters by adding a 
rule of "MMIO read must flush batched OUT". The only suspect device I 
can think of here is an IDE controller with DMA capabilities - I 
haven't looked at those, so I don't know if they mix IOIO and MMIO.

--
Mats


>  -- Keir
>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xensource.com
>http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Buffered IO for IO?
  2007-07-21  9:44   ` Keir Fraser
  2007-07-21  9:50     ` Mats Petersson
@ 2007-07-21 10:59     ` Trolle Selander
  2007-07-21 11:08       ` Keir Fraser
  1 sibling, 1 reply; 11+ messages in thread
From: Trolle Selander @ 2007-07-21 10:59 UTC (permalink / raw)
  To: Zulauf, John; +Cc: xen-devel


Safety would depend on how the emulated device works. For serial ports in
particular, it's definitely not safe, since depending on the model of UART
emulated, and the settings of the UART control registers, every outb may
result in a serial interrupt and UART register changes that will have to be
processed before any further io can be done.
It's possible that there might be some performance to be gained by
"upgrading" the emulated UART to a 16550A or better, and doing buffered IO
for the FIFO. Earlier this year I was experimenting with a patch that made
the qemu-dm serial emulation into a 16550A with FIFO, but though the patch
did fix some compatibility issues with software that assumed a 16550A UART
in the HVM guest I'm working with, serial performance actually got
noticeably _worse_, so I never bothered submitting it. Implementing the FIFO
with buffered IO would possibly make it work better, but I don't see how it
could be done without moving at least part of the serial device model into
the hypervisor, which just strikes me as more trouble than it's worth.

/Trolle

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Buffered IO for IO?
  2007-07-21 10:59     ` Trolle Selander
@ 2007-07-21 11:08       ` Keir Fraser
  2007-07-23 18:49         ` Zulauf, John
  0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2007-07-21 11:08 UTC (permalink / raw)
  To: Trolle Selander, Zulauf, John; +Cc: xen-devel


Yes, it strikes me that this cannot be done safely without providing a set
of 'proxy device models' in the hypervisor that know when it is safe to
buffer and when the buffers must be flushed, according to native hardware
behaviour.

 -- Keir

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Buffered IO for IO?
  2007-07-21 11:08       ` Keir Fraser
@ 2007-07-23 18:49         ` Zulauf, John
       [not found]           ` <BD262A443AD428499D90AF8368C4528D8A189E@fmsmsx411.amr.corp. intel.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Zulauf, John @ 2007-07-23 18:49 UTC (permalink / raw)
  To: Keir Fraser, Trolle Selander; +Cc: xen-devel


Thanks for the comments.  Frankly, I'm guessing the bulk of the time in
the COM port IO is VMEXIT time, and that saving the qemu round-trip would
have only a marginal effect**.

 

As for reads flushing writes, this happens automatically as a
result of how the buffered_io page works (assuming one sticks to
this design for IO buffering).  If dir == IOREQ_READ, the attempt to
buffer the IO request fails, so hvm_send_assist_req is
invoked.  When qemu catches the "notify" event for the READ, it first
dispatches *all* of the buffered io requests before dispatching the
READ.  Thus order is preserved and inb is synchronous from the vcpu's
point of view.

 

As for controlling outbound FIFO depth, adding a per-range "max_depth"
test to the "queue is full" test already in use for mmio buffering would
be straightforward.
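
A rough C sketch of what such a per-range depth test might look like. Everything in it (struct io_range, may_buffer_out, ring_used, ring_slots) is hypothetical and not taken from the Xen source; it only shows a depth limit combined with a "queue is full" check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one bufferable port range. */
struct io_range {
    uint16_t first_port;
    uint16_t last_port;
    unsigned int max_depth;  /* most writes allowed to sit unprocessed */
    unsigned int queued;     /* writes from this range currently queued */
};

/* Returns true if an OUT to 'port' may be queued instead of being sent
 * synchronously.  ring_used and ring_slots model the existing
 * "queue is full" test; both names are assumptions. */
static bool may_buffer_out(const struct io_range *r, uint16_t port,
                           unsigned int ring_used, unsigned int ring_slots)
{
    assert(r->first_port <= r->last_port);
    if (port < r->first_port || port > r->last_port)
        return false;                 /* not a bufferable port */
    if (ring_used >= ring_slots)
        return false;                 /* shared queue is full */
    return r->queued < r->max_depth;  /* per-range depth limit */
}
```

With max_depth set to 1 for port 0x3F8, this would model the one-byte write "window" case; a larger value would model a FIFO.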

 

The interrupt issues are more concerning.  A one-byte write "window" at
3F8 doesn't seem to have this issue (cf.
ftp://ftp.phil.uni-sb.de/pub/staff/chris/The_Serial_Port).

 

But I agree that proxy device models are not desirable when not
performance critical. Regardless, they wouldn't be supported directly
through a simple "hvm_buffered_io_intercept" call.  This would be more
suited to the approach used in hvm_mmio_intercept to do the lapic
emulation.

 

 

John

 

** For those interested, I'm looking at the performance of using Windbg
for Guest domain debug, and the time to do the serial port based
initialization of a kernel debug session. Starting a WinDBG session on a
Windows guest OS takes several minutes. Any suggestions to optimize that
process would be gladly entertained.

 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Buffered IO for IO?
       [not found]           ` <BD262A443AD428499D90AF8368C4528D8A189E@fmsmsx411.amr.corp. intel.com>
@ 2007-07-23 19:00             ` Mats Petersson
  2007-07-23 19:38               ` Zulauf, John
  0 siblings, 1 reply; 11+ messages in thread
From: Mats Petersson @ 2007-07-23 19:00 UTC (permalink / raw)
  To: Zulauf, John, Keir Fraser, Trolle Selander; +Cc: xen-devel

At 19:49 23/07/2007, Zulauf, John wrote:
>Thanks for the comments.  Frankly, I'm guessing the bulk of the time 
>in the COM port IO is VMEXIT time, and that saving qemu round-trip 
>would be a marginal effect**.

I guess the question of how much of the time is spent where depends 
on the setup. One thing you may want to try is to ensure that the 
guest domain(s) and Dom0 don't share the same CPU (core) - by giving 
Dom0 its own CPU (core) to run on, you eliminate the possibility that 
some other guest is still using Dom0's CPU when you want QEMU to run. 
If you have MANY HVM domains, you may also want to give more than a 
single core to Dom0.

>
>As for the read's flushing writes, this happens automatically as a 
>result of how the buffered_io page works (and assuming one sticks to 
>this design for IO buffering).  If dir == IOREQ_READ then attempt to 
>buffered the IO request will fail.  Thus, hvm_send_assist_req is 
>invoked.  When qemu catches the "notify" event of the READ it firsts 
>dispatches *all* of the buffered io requests before dispatching the 
>READ. Thus order is preserved and inb are synchronous from the vcpu 
>point of view.

Yes, that's the trivial case. But what about a write to 0x3F8 (send 
data) and code that goes to sleep, waiting for an IRQ to say that the 
data has been sent? There may not be a read of any port in the serial 
port in between - thanks to Trolle for reminding me of this type of operation.

--
Mats

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Buffered IO for IO?
  2007-07-23 19:00             ` Mats Petersson
@ 2007-07-23 19:38               ` Zulauf, John
       [not found]                 ` <BD262A443AD428499D90AF8368C4528D8A18BE@fmsmsx411.amr.corp. intel.com>
  2007-07-24  6:31                 ` Keir Fraser
  0 siblings, 2 replies; 11+ messages in thread
From: Zulauf, John @ 2007-07-23 19:38 UTC (permalink / raw)
  To: Mats Petersson, Keir Fraser, Trolle Selander; +Cc: xen-devel

I'm running on an 8-core system with currently only two HVM domains
(each currently with a single VCPU). Neither top on Dom0 nor xm top
seems to indicate qemu-dm as the performance bottleneck.  However, I'm
not sure about round-trip latency through the xenstore to qemu and back.

As for the interrupt handling, buffered IO is on a 100ms(?) timer in
qemu-dm, so we're not looking at a deadlock.  Buffered IO handling
appears to handle this case as well.  

However, if the COM port code is in a tight write/sleep/intr loop, this
is going to be tragic w.r.t. performance (80 bps!).  So it's not a clear
win, and would need *something* (a new hvm_op to control interrupt
generation on buffered io ops?) in order to not run the risk of being
vastly slower.
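
For reference, the 80 bps figure is just one byte per flush at the (estimated) 100 ms flush period: 10 bytes per second, times 8 bits. A small C sketch of that arithmetic, where worst_case_bps is a made-up helper name and the 100 ms period is the unconfirmed guess from above:

```c
#include <assert.h>

/* Worst-case throughput in bits per second when the guest writes
 * bytes_per_flush bytes and then sleeps until the next buffered-IO
 * flush, which happens once every flush_period_ms milliseconds. */
static unsigned int worst_case_bps(unsigned int flush_period_ms,
                                   unsigned int bytes_per_flush)
{
    assert(flush_period_ms > 0);
    /* 1000 ms per second, 8 data bits per byte */
    return (1000u * bytes_per_flush * 8u) / flush_period_ms;
}
```

Here worst_case_bps(100, 1) evaluates to 80, and letting 16 bytes queue up per flush would still only give 1280.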

So, this is definitely neither obvious, easy, nor a clear win.

Thanks to all.

John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Buffered IO for IO?
       [not found]                 ` <BD262A443AD428499D90AF8368C4528D8A18BE@fmsmsx411.amr.corp. intel.com>
@ 2007-07-23 20:07                   ` Mats Petersson
  0 siblings, 0 replies; 11+ messages in thread
From: Mats Petersson @ 2007-07-23 20:07 UTC (permalink / raw)
  To: Zulauf, John, Keir Fraser, Trolle Selander; +Cc: xen-devel

At 20:38 23/07/2007, Zulauf, John wrote:
>I'm running on an 8-core system with currently only two HVM domains
>(with currently single VCPU each). Both top on Dom0 and xm top, don't
>seem to indicate qemu-dm as the performance bottleneck.  However, I'm
>not sure about roundtrip latency through the xenstore to qemu and back.


Not sure where Xenstore is involved in this, but what I'm suggesting 
with having a dedicated core for Dom0 is that there's no chance that 
the DomU is "competing" with Dom0 for CPU on that core. By locking 
DomUs off a particular core, and locking Dom0 to a particular core, 
you guarantee that there's no need to "world-switch" the Dom0 
CPU core. A world-switch involves a lot of memory latencies, which 
may not show up "anywhere", but do still take a lot of time.


>As for the interrupt handling, buffered IO is on a 100ms(?) timer in
>qemu-dm, so we're not looking at a deadlock.  Buffered IO handling
>appears to handle this case as well.
>
>However, if the comport code is in a write/sleep/intr/ tight loop, this
>is going to be tragic w.r.t. performance.  (80bps!)  So it's not a clear
>win, and would need *something* (a new hvm_op to control interrupt
>generation on buffered io ops?) in order to not run the risk of being
>vastly slower.


That could easily lead to a "timeout" if the sender is expecting a 
reply within a set amount of time at a much higher speed, so yes, 
you'd need some sort of "enable/disable" functionality at the very least.

There are a few things I can think of that would make sense to do here:
1. Make a mock-up where IO-writes to ONLY 0x3F8 are buffered for 
(say) up to 16 writes.
2. Add some code to just count the number of reads/writes in a row to 
0x3F8..0x3FF ports[1].
3. Measure the average time (e.g. TSC) for a number of 0x3F8 writes 
and see how much time is spent in communicating from the IOIO handler 
until you get back to HVM-code.

[1] Something like this:

struct {
    int direction;          // Direction of the current run (0 or 1).
    int current_run;        // Length of the run in progress.
    int no_runs;            // Completed runs, both directions together.
    int count[2];           // Accesses in completed runs, per direction.
    int max_run_length[2];  // Longest completed run, per direction.
} portdata[8] = { {-1}, {-1}, {-1}, {-1}, {-1}, {-1}, {-1}, {-1}
};    // set direction to "not valid value".

// direction must be 0 or 1, since it indexes count[] and max_run_length[].
void count_io_action(int portno, int direction)
{
    if ((portno & 0xFFF8) == 0x3F8) {
        portno &= 0x7;   // Get which port it is.
        if (direction == portdata[portno].direction)
            portdata[portno].current_run++;
        else {
            // Direction changed: account for the run that just ended.
            if (portdata[portno].direction != -1) {
                int prev = portdata[portno].direction;
                int run = portdata[portno].current_run;

                portdata[portno].count[prev] += run;
                portdata[portno].no_runs++;
                if (portdata[portno].max_run_length[prev] < run)
                    portdata[portno].max_run_length[prev] = run;
            }
            portdata[portno].current_run = 1;
            portdata[portno].direction = direction;
        }
    }
}

With this you can get the average run length and max run length for 
the different ports. It would tell you which of the ports are most 
often accessed.
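
A small sketch of how the average could be derived from those counters. The struct run_totals below just mirrors the relevant fields of one portdata entry and average_run_length is a made-up helper; since no_runs counts runs in both directions together, this gives the average over both directions, and the run still in progress is not included until the direction changes.

```c
#include <assert.h>

/* Mirrors the per-port counters kept by count_io_action(). */
struct run_totals {
    int no_runs;    /* completed runs, both directions together */
    int count[2];   /* accesses in completed runs, per direction */
};

/* Average length of a completed run on one port, over both directions.
 * Returns 0.0 when no run has completed yet. */
static double average_run_length(const struct run_totals *t)
{
    assert(t->no_runs >= 0);
    if (t->no_runs == 0)
        return 0.0;
    return (double)(t->count[0] + t->count[1]) / t->no_runs;
}
```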

Taking a sample of "all the 0x3Fx port accesses" for a large-ish 
number of accesses could also be beneficial (I think xentrace can do 
that for you).

--
Mats


>So, this is definitely neither obvious, easy, nor a clear win.
>
>Thanks to all.
>
>John
>
>-----Original Message-----
>From: mats petersson [mailto:mats.o.petersson@googlemail.com] On Behalf
>Of Mats Petersson
>Sent: Monday, July 23, 2007 12:01 PM
>To: Zulauf, John; Keir Fraser; Trolle Selander
>Cc: xen-devel@lists.xensource.com
>Subject: RE: [Xen-devel] Buffered IO for IO?
>
>At 19:49 23/07/2007, Zulauf, John wrote:
> >Content-class: urn:content-classes:message
> >Content-Type: multipart/alternative;
> >         boundary="----_=_NextPart_001_01C7CD5A.31A74761"
> >
> >Thanks for the comments.  Frankly, I'm guessing the bulk of the time
> >in the COM port IO is VMEXIT time, and that saving qemu round-trip
> >would be a marginal effect**.
>
>I guess the question of how much of the time is spent where depends
>on the setup. One thing you may want to try is to ensure that the
>guest domain(s) and Dom0 don't share the same CPU (core): by giving
>Dom0 its own CPU (core) to run on, you eliminate the possibility that
>some other guest is still using Dom0's CPU when you want QEMU to run.
>If you have MANY HVM domains, you may also want to give more than a
>single core to Dom0.
>
> >
> >As for reads flushing writes, this happens automatically as a
> >result of how the buffered_io page works (and assuming one sticks to
> >this design for IO buffering).  If dir == IOREQ_READ then the attempt to
> >buffer the IO request will fail.  Thus, hvm_send_assist_req is
> >invoked.  When qemu catches the "notify" event of the READ it first
> >dispatches *all* of the buffered io requests before dispatching the
> >READ. Thus order is preserved and inb is synchronous from the vcpu's
> >point of view.
>
>Yes, that's the trivial case. But what about a write to 0x3F8 (send
>data) and code that goes to sleep, waiting for an IRQ to say that the
>data has been sent? There may not be a read of any port in the serial
>port in between - thanks to Trolle for reminding me of this type of
>operation.
>
>--
>Mats
>
> >
> >As for controlling outbound FIFO depth, adding a per range
> >"max_depth" test to the "queue is full" test already in use for mmio
> >buffering would be straightforward.
> >
> >The interrupt issues are more concerning.  A one byte write "window"
> >at 3F8 doesn't seem to have this issue (c.f.)
> >ftp://ftp.phil.uni-sb.de/pub/staff/chris/The_Serial_Port
> >
> >But I agree that proxy device models are not desirable when not
> >performance critical. Regardless, they wouldn't be supported
> >directly through a simple "hvm_buffered_io_intercept" call.  This
> >would be more suited to the approach used in hvm_mmio_intercept to
> >do the lapic emulation.
> >
> >
> >John
> >
> >** For those interested, I'm looking at the performance of using
> >Windbg for Guest domain debug, and the time to do the serial port
> >based initialization of a kernel debug session. Starting a WinDBG
> >session on a Windows guest OS takes several minutes. Any suggestions
> >to optimize that process would be gladly entertained.
> >
> >
> >----------
> >From: Keir Fraser [mailto:keir@xensource.com]
> >Sent: Saturday, July 21, 2007 4:09 AM
> >To: Trolle Selander; Zulauf, John
> >Cc: xen-devel@lists.xensource.com
> >Subject: Re: [Xen-devel] Buffered IO for IO?
> >
> >Yes, it strikes me that this cannot be done safely without providing
> >a set of 'proxy device models' in the hypervisor that know when it
> >is safe to buffer and when the buffers must be flushed, according to
> >native hardware behaviour.
> >
> >  -- Keir
> >
> >On 21/7/07 11:59, "Trolle Selander" <trolle.selander@gmail.com> wrote:
> >Safety would depend on how the emulated device works. For serial
> >ports in particular, it's definitely not safe, since depending on
> >the model of UART emulated, and the settings of the UART control
> >registers, every outb may result in a serial interrupt and UART
> >register changes that will have to be processed before any further
> >io can be done.
> >It's possible that there might be some performance to be gained by
> >"upgrading" the emulated UART to a 16550A or better, and doing
> >buffered IO for the FIFO. Earlier this year I was experimenting with
> >a patch that made the qemu-dm serial emulation into a 16550A with
> >FIFO, but though the patch did fix some compatibility issues with
> >software that assumed a 16550A UART in the HVM guest I'm working
> >with, serial performance actually got noticeably _worse_, so I never
> >bothered submitting it. Implementing the FIFO with buffered IO would
> >possibly make it work better, but I don't see how it could be done
> >without moving at least part of the serial device model into the
> >hypervisor, which just strikes me as more trouble than it's worth.
> >
> >/Trolle
> >
> >On 7/21/07, Keir Fraser <keir@xensource.com> wrote:
> >
> >
> >
> >On 20/7/07 22:33, "Zulauf, John" <john.zulauf@intel.com> wrote:
> >
> > > Has anyone experimented with adding Buffered IO support for "out"
> > > instructions?  Currently, the buffered io pages is only used for mmio
> > > writes (and then only to vga space).  It seems quite straight-forward to
> > > add.
> >
> >Is it safe to buffer, and hence arbitrarily delay, any I/O port write?
> >
> >  -- Keir
> >
> >
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@lists.xensource.com
> >http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Buffered IO for IO?
  2007-07-23 19:38               ` Zulauf, John
       [not found]                 ` <BD262A443AD428499D90AF8368C4528D8A18BE@fmsmsx411.amr.corp. intel.com>
@ 2007-07-24  6:31                 ` Keir Fraser
  1 sibling, 0 replies; 11+ messages in thread
From: Keir Fraser @ 2007-07-24  6:31 UTC (permalink / raw)
  To: Zulauf, John, Mats Petersson, Trolle Selander; +Cc: xen-devel

On 23/7/07 20:38, "Zulauf, John" <john.zulauf@intel.com> wrote:

> I'm running on an 8-core system with currently only two HVM domains
> (with currently single VCPU each). Both top on Dom0 and xm top, don't
> seem to indicate qemu-dm as the performance bottleneck.  However, I'm
> not sure about roundtrip latency through the xenstore to qemu and back.

Well, maybe you are just seeing VMEXIT time then. You might try adding some
RDTSC timestamps, from when a notification is sent to qemu until when your
HVM vcpu is rewoken, and see what the latency looks like. It shouldn't be
all that bad if dom0 is on another cpu and is otherwise fairly idle.
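
A rough sketch of the bookkeeping such a measurement needs, assuming two
TSC values are sampled by the caller: one when the notification goes out
and one when the vcpu resumes. The names (latency_stats, latency_record,
latency_mean) are hypothetical, and the TSC read itself is left out so the
helper does not depend on any particular macro or intrinsic. If the two
samples are taken on different physical CPUs, their TSCs may not be in
sync, so treat small deltas with suspicion.

```c
#include <stdint.h>

/* Accumulated round-trip latency, in TSC cycles. */
struct latency_stats {
    uint64_t samples;       /* number of round trips recorded */
    uint64_t total_cycles;  /* sum of all deltas */
    uint64_t max_cycles;    /* worst single round trip */
};

/* Record one round trip. start_tsc is sampled when the request is sent
 * to qemu, end_tsc when the vcpu is woken again. Unsigned subtraction
 * keeps the delta correct across a counter wrap. */
void latency_record(struct latency_stats *s,
                    uint64_t start_tsc, uint64_t end_tsc)
{
    uint64_t delta = end_tsc - start_tsc;

    s->samples++;
    s->total_cycles += delta;
    if (delta > s->max_cycles)
        s->max_cycles = delta;
}

/* Mean round-trip latency in cycles; 0 if nothing was recorded. */
uint64_t latency_mean(const struct latency_stats *s)
{
    return s->samples ? s->total_cycles / s->samples : 0;
}
```

Comparing the mean against the cost of a bare VMEXIT/VMENTRY pair would
show how much of the time is really spent on the qemu round trip.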

Also, you could see how many characters are sent and received to start a
windbg session. If there are a surprisingly large number of bytes received
then it may be the receive direction is causing more trouble than the
transmit direction.
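
One way to count the characters, sketched under a few assumptions: the
guest uses COM1 at 0x3F8, and every access to the data register while
DLAB is clear moves one byte (a read with no data pending would be
overcounted). The struct and function names are hypothetical. Offset 0
is the data register only while the DLAB bit (bit 7 of the line control
register at offset 3) is clear, so writes to that register are tracked
as well.

```c
#include <stdint.h>

#define COM1_BASE     0x3F8
#define UART_DATA     0      /* THR on write, RBR on read (DLAB = 0) */
#define UART_LCR      3      /* line control register */
#define UART_LCR_DLAB 0x80   /* divisor latch access bit */

struct serial_byte_count {
    int dlab;                /* last DLAB value written by the guest */
    unsigned long tx_bytes;  /* writes to the data register */
    unsigned long rx_bytes;  /* reads from the data register */
};

/* Call once per port access; value is only meaningful for writes. */
void serial_count_access(struct serial_byte_count *c, uint16_t port,
                         int is_read, uint8_t value)
{
    unsigned int reg;

    if ((port & 0xFFF8) != COM1_BASE)
        return;                        /* not a COM1 register */
    reg = port & 0x7;

    if (reg == UART_LCR && !is_read) {
        c->dlab = (value & UART_LCR_DLAB) != 0;
        return;
    }
    if (reg == UART_DATA && !c->dlab) {
        if (is_read)
            c->rx_bytes++;
        else
            c->tx_bytes++;
    }
}
```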

 -- Keir

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-07-24  6:31 UTC | newest]

Thread overview: 11+ messages
2007-07-19 11:53 special video mode numbers Jan Beulich
2007-07-20 21:33 ` Buffered IO for IO? Zulauf, John
2007-07-21  9:44   ` Keir Fraser
2007-07-21  9:50     ` Mats Petersson
2007-07-21 10:59     ` Trolle Selander
2007-07-21 11:08       ` Keir Fraser
2007-07-23 18:49         ` Zulauf, John
     [not found]           ` <BD262A443AD428499D90AF8368C4528D8A189E@fmsmsx411.amr.corp. intel.com>
2007-07-23 19:00             ` Mats Petersson
2007-07-23 19:38               ` Zulauf, John
     [not found]                 ` <BD262A443AD428499D90AF8368C4528D8A18BE@fmsmsx411.amr.corp. intel.com>
2007-07-23 20:07                   ` Mats Petersson
2007-07-24  6:31                 ` Keir Fraser
