* [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
@ 2007-08-09 17:45 Guy Zana
2007-08-10 2:58 ` [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0) Tian, Kevin
2007-08-10 7:01 ` [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Keir Fraser
0 siblings, 2 replies; 17+ messages in thread
From: Guy Zana @ 2007-08-09 17:45 UTC (permalink / raw)
To: xen-devel; +Cc: Alex Novik
We propose the following method in order to support interdomain
interrupt sharing, where one of the domains is an HVM assigned with a
pass-through device. This method is limited in a way that we can support
sharing between just two domains: dom0 and an HVM. This method is based
on changing polarity.
Terminology
=========
Change polarity algorithm (CPA) - An algorithm in which polarity inversion
is used for EOI recognition. For details see
http://lists.xensource.com/archives/html/xen-devel/2007-05/msg01148.html
PLINE - Physical Line. This is the reflection of the physical line. By
changing polarity we know the physical line's status.
VLINE - Virtual Line. This is the HVM virtual line.
PT Device - A pass-through PCI device assigned to the HVM.
Dom0 Device - A PCI device assigned to dom0 (by default).
Interrupt Sharing - Two or more PCI devices whose INTx lines are
connected to the same IOAPIC pin (OR wired) and that are assigned to
different domains.
Re-occurring interrupts - The PLINE is held asserted while the IOAPIC
fires interrupts continuously.
Spurious interrupts - Within a domain context, an interrupt that passed
through the ISR chain without being handled.
NOTE: A single PCI device can not be assigned to more than one domain
simultaneously.
When a single device is assigned to an HVM, using CPA, we update the
HVM's VLINE according to the PLINE state (both hold the same value) thus
providing complete reflection. It is trivial to see how more than one
device that shares the same line could be assigned to the HVM (using the
same CPA).
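A minimal Python sketch of the PLINE tracking described above, as a toy model rather than Xen code. It assumes the IOAPIC pin raises an interrupt whenever the physical level matches the currently programmed polarity; the names `PolarityTrackedLine` and `sample` are hypothetical, and the real CPA is described in the thread linked under Terminology.

```python
class PolarityTrackedLine:
    """Toy model of CPA: flip the pin polarity on every interrupt so that
    the next interrupt fires when the physical line changes level."""

    def __init__(self):
        self.pline = False            # software copy of the physical line
        self.fire_on_asserted = True  # polarity currently programmed in the pin
        self.interrupts = 0           # interrupts taken so far

    def sample(self, physical_level):
        """Present the physical level to the modelled pin.

        Assumption of this model: the pin raises an interrupt only when the
        level matches the programmed polarity. Returns PLINE afterwards.
        """
        if physical_level == self.fire_on_asserted:
            self._on_interrupt()
        return self.pline

    def _on_interrupt(self):
        self.interrupts += 1
        # The line has reached the level we were watching for.
        self.pline = self.fire_on_asserted
        # Invert polarity so the next interrupt reports the opposite edge.
        self.fire_on_asserted = not self.fire_on_asserted


line = PolarityTrackedLine()
levels = (False, True, True, True, False, False, True)
trace = [line.sample(level) for level in levels]
```

In this run PLINE follows the physical level at every sample, while the line being held asserted for three samples costs only one interrupt. With a single HVM owner, VLINE would simply be set to the returned PLINE value.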
In general, we should consider the situation where N devices from Dom0
share the same line with M devices from the HVM. There are 3 possible
cases:
1. N=0, i.e. this line belongs to HVM devices. This case is already
solved with CPA.
2. M=0, i.e. this line belongs to Dom0 devices. This is basic dom0
functionality.
3. N != 0, M != 0. This is the situation that we want to handle now;
from now on we'll refer to it as an interdomain shared
interrupt.
However, our method could be extended to handle all of
the above cases.
Problems related to Interdomain Interrupt Sharing
====================================
* Spurious interrupts.
* Interrupt starvation.
* When we use CPA, we do not get re-occurring interrupts; this
should be taken into account.
* Even if a shared interrupt was handled by a domain-specific ISR, it is
not guaranteed that the PLINE will be deasserted.
* Interrupt storming - _Physical_ storming is solved transparently by
CPA.
Goals
=====
* Give both the HVM and Dom0 a chance to handle the interrupt.
* Update the HVM's VLINE correctly when sharing an interrupt.
* Avoid spurious interrupts, or at least minimize the number of such
interrupts injected into the HVM.
* Keep interrupt latency reasonable.
Proffered Method
=============
1. We obtain the shared line's assertion state by using CPA: at each
assert/deassert event we save the line's state.
2. We perform most of the logic in a periodic timer module.
Modules
======
1. Timer module. Periodic callback that does all the logic processing.
2. Xen interrupt handler. The handler is replaced by CPA, which updates PLINE.
3. Dom0 ISR chain. At the end of the chain, we know whether the
interrupt was handled or not, and update the status in Xen using a
hypercall.
States
=====
1. Idle. The PLINE is deasserted. This is the "relaxed" state: we are
waiting for an interrupt to arrive.
2. In Dom0. The interrupt is currently being handled by Dom0. The event was
sent to Dom0 and the Dom0 ISR chain is processing it.
3. Process Interrupt. The interrupt was handled by Dom0. Dom0 got back
to us with the results of the handling. Now we need to decide what to do
next. This state can be reached only from state [2].
State machine
===========
The timer callback implements the state machine; it is frozen while we are
in the idle state.
The "events" described below are polled by the timer. We also perform
changes in dom0's ISR chain in order to generate these "events".
The following events are handled:
A. PLINE is deasserted. This event moves the state machine to the _Idle_
state from any state.
This can happen in one of 2 cases:
1. Initialization.
2. As a result of PLINE deassertion. If PLINE went down, it means that
we're done.
B. Idle state and PLINE is asserted. In this case the interrupt is
injected into Dom0. The state machine moves to "In Dom0". We always
let dom0 try to handle the interrupt first, thus logically creating
an interdomain ISR chain beginning with dom0.
C. "In Dom0" and PLINE status is asserted (We read the status from a
timer). Do nothing. We don't know what to do with this interrupt yet.
D. "Process Interrupt" and PLINE is asserted.
A few cases are possible:
1. If Dom0 successfully handled the last interrupt and the interrupt
wasn't injected into the HVM, inject the interrupt into Dom0 and move to
state "In Dom0". This is the Dom0 interrupt, keep injecting into Dom0.
2. If Dom0 successfully handled the last interrupt and the interrupt was
injected into the HVM, deassert the HVM vline, and re-inject the
interrupt into Dom0. Move to state "In Dom0".
(This is done in order to solve the case where the HVM was handling the
interrupt, but the line didn't get deasserted because a Dom0 device
asserted it before a PT device deasserted it (as a result of the HVM
handling). In this case we assume that the HVM is done with it and now
it's Dom0's turn.)
3. If Dom0 didn't successfully handle the last interrupt and the
interrupt was not injected into the HVM, inject the interrupt into the
HVM and stay in the same state. This is an HVM's interrupt. Dom0
rejected it.
4. If Dom0 didn't successfully handle the last interrupt and the
interrupt was injected into the HVM, inject the interrupt into Dom0 and
move to state "In Dom0". The HVM is not done with the current interrupt yet.
E. "Process Interrupt" and PLINE is deasserted. Deassert the HVM
interrupt (if necessary) and move to Idle. We handled the interrupt;
prepare for the next one.
The main idea here is to inject the interrupt into Dom0 when we don't
know what to do with it. If Dom0 takes the ownership, then let it handle
the interrupt. If not, we inject it into the HVM. We recognize that none
of the PT devices is asserting the line any longer either by PLINE
deassertion or by Dom0 taking ownership back.
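The events A-E above can be sketched as a small Python model. This is only an illustration under stated assumptions, not the Neocleus implementation: the names `SharedLine`, `tick` and `dom0_report` are hypothetical, actions are returned as strings instead of being performed, and the HVM VLINE is assumed to be deasserted on every return to Idle (the text says this explicitly only for event E).

```python
from enum import Enum


class State(Enum):
    IDLE = 1
    IN_DOM0 = 2
    PROCESS = 3   # "Process Interrupt"


class SharedLine:
    def __init__(self):
        self.state = State.IDLE
        self.dom0_handled = False   # result of Dom0's last ISR chain run
        self.hvm_injected = False   # True while the HVM VLINE is asserted

    def dom0_report(self, handled):
        """Models the hypercall made at the end of Dom0's ISR chain."""
        if self.state is State.IN_DOM0:
            self.dom0_handled = handled
            self.state = State.PROCESS

    def tick(self, pline):
        """One timer callback; returns the list of actions taken."""
        actions = []
        if not pline:
            # Events A and E: the line went down, so we are done.
            if self.hvm_injected:
                actions.append("deassert_hvm_vline")
                self.hvm_injected = False
            self.state = State.IDLE
        elif self.state is State.IDLE:
            # Event B: Dom0 always gets the first try.
            actions.append("inject_dom0")
            self.state = State.IN_DOM0
        elif self.state is State.PROCESS:
            if not self.dom0_handled and not self.hvm_injected:
                # D.3: Dom0 rejected it, so give it to the HVM and stay here.
                actions.append("assert_hvm_vline")
                self.hvm_injected = True
            else:
                if self.dom0_handled and self.hvm_injected:
                    # D.2: assume the HVM is done; it is Dom0's turn.
                    actions.append("deassert_hvm_vline")
                    self.hvm_injected = False
                # D.1, D.2 and D.4 all end by re-injecting into Dom0.
                actions.append("inject_dom0")
                self.state = State.IN_DOM0
        # Event C (In Dom0, PLINE asserted): do nothing.
        return actions
```

Driving it with an interrupt that belongs to the HVM gives `inject_dom0`, then (after Dom0 reports "not handled") `assert_hvm_vline`, then `inject_dom0` again via D.4 if the line is still up at the next tick.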
Any ideas and comments are welcome.
Best regards,
Alex Novik,
Neocleus.
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-09 17:45 [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Guy Zana
@ 2007-08-10 2:58 ` Tian, Kevin
2007-08-10 10:10 ` Guy Zana
2007-08-10 7:01 ` [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Keir Fraser
1 sibling, 1 reply; 17+ messages in thread
From: Tian, Kevin @ 2007-08-10 2:58 UTC (permalink / raw)
To: Guy Zana, xen-devel; +Cc: Alex Novik
Hi, Guy,
Thanks for the very good description.
Basically I think this should work, but I have the following concerns:
- How to choose the timeout value?
A small timeout may result in more spurious injections and a performance
penalty, while a large timeout may not satisfy driver expectations for
high-speed devices.
- How to cope with the existing irq sharing mechanism for PV driver domains?
The existing approach between a PV driver domain and dom0 is based
on a trigger point, i.e. the guest EOI: keep an insertion count and track
the guest's response. The timeout mechanism is different, and I guess it
would be difficult for the two paths to share logic.
How about a mixed sharing case, say among dom0/PV domain/
HVM domain?
- Interrupt delay within the HVM may be exaggerated under some special
conditions: if the HVM is not ready to handle the injection at D.3 (e.g.
blocked in I/O emulation), D.4 will later cancel the previous injection at
the next timeout. The HVM then gets a re-injection only at the next D.3, and
it may or may not be delayed again depending on its status at that time.
Did you run some heavy workloads and observe any complaints?
But anyway, I think a timeout is the only way to support the shared irq
case (without MSI, and if we do want to allow it), though with a
performance cost. :-)
Thanks,
Kevin
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-09 17:45 [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Guy Zana
2007-08-10 2:58 ` [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0) Tian, Kevin
@ 2007-08-10 7:01 ` Keir Fraser
2007-08-10 7:04 ` Keir Fraser
1 sibling, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 7:01 UTC (permalink / raw)
To: Guy Zana, xen-devel; +Cc: Alex Novik
On 9/8/07 18:45, "Guy Zana" <guy@neocleus.com> wrote:
> The main idea here is to inject the interrupt into Dom0 when we don't
> know what to do with it. If Dom0 takes the ownership, then let it handle
> the interrupt. If not, we inject it into the HVM. We recognize that all
> of the PT devices are not asserting the line by PLINE deassertion or by
> Dom0 taking the ownership back to it.
This needs dom0 kernel changes and does not solve the general sharing
problem (among multiple HVM domains, or among HVM domains and PV domains
other than dom0). Could you somehow track which guest is most likely to
handle the interrupt, deliver to it first, and then detect the immediate
re-interrupt if it EOIs without handling? Plus have a timeout if it does not
EOI in reasonable time?
-- Keir
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-10 7:01 ` [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Keir Fraser
@ 2007-08-10 7:04 ` Keir Fraser
2007-08-10 7:15 ` [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0) Tian, Kevin
2007-08-10 10:22 ` [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Guy Zana
0 siblings, 2 replies; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 7:04 UTC (permalink / raw)
To: Guy Zana, xen-devel; +Cc: Alex Novik
On 10/8/07 08:01, "Keir Fraser" <keir@xensource.com> wrote:
> On 9/8/07 18:45, "Guy Zana" <guy@neocleus.com> wrote:
>
>> The main idea here is to inject the interrupt into Dom0 when we don't
>> know what to do with it. If Dom0 takes the ownership, then let it handle
>> the interrupt. If not, we inject it into the HVM. We recognize that all
>> of the PT devices are not asserting the line by PLINE deassertion or by
>> Dom0 taking the ownership back to it.
>
> This needs dom0 kernel changes and does not solve the general sharing
> problem (among multiple HVM domains, or among HVM domains and PV domains
> other than dom0). Could you somehow track which guest is most likely to
> handle the interrupt, deliver to it first, and then detect the immediate
> re-interrupt if it EOIs without handling? Plus have a timeout if it does not
> EOI in reasonable time?
My thought here is a simple priority list with move-to-back of the frontmost
domain when we deliver him the interrupt but he does not deassert the line
either in reasonable time or by the time he EOIs the interrupt. This is
simple generic logic needing no PV guest changes.
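A minimal Python sketch of the move-to-back idea, under simplifying assumptions that are not in the mail: a domain always services its own device when the interrupt is delivered to it, the "reasonable time" timeout is left out, and the function name `deliver` and the domain names are hypothetical.

```python
from collections import deque


def deliver(priority, asserting):
    """Deliver one shared interrupt.

    priority:  deque of domain names; the frontmost domain is tried first.
    asserting: domains whose device is currently asserting the line.
    Returns the domains the interrupt was delivered to, in order.
    """
    pending = set(asserting)
    if not pending <= set(priority):
        raise ValueError("asserting domain missing from priority list")
    delivered = []
    while pending:
        front = priority[0]
        delivered.append(front)
        pending.discard(front)      # front services its own device, if any
        if pending:                 # line still asserted after front's EOI
            priority.rotate(-1)     # move the frontmost domain to the back
    return delivered


prio = deque(["hvm1", "dom0", "hvm2"])
first = deliver(prio, {"dom0"})     # hvm1 is tried first and moved to the back
second = deliver(prio, {"dom0"})    # dom0 is now frontmost
```

After the first interrupt the list has adapted, so a second interrupt from the same device is delivered to a single domain and the list does not change.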
-- Keir
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-10 7:04 ` Keir Fraser
@ 2007-08-10 7:15 ` Tian, Kevin
2007-08-10 7:37 ` Keir Fraser
2007-08-10 10:22 ` [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Guy Zana
1 sibling, 1 reply; 17+ messages in thread
From: Tian, Kevin @ 2007-08-10 7:15 UTC (permalink / raw)
To: Keir Fraser, Guy Zana, xen-devel; +Cc: Alex Novik
>From: Keir Fraser
>Sent: August 10, 2007 15:05
>
>On 10/8/07 08:01, "Keir Fraser" <keir@xensource.com> wrote:
>
>> On 9/8/07 18:45, "Guy Zana" <guy@neocleus.com> wrote:
>>
>>> The main idea here is to inject the interrupt into Dom0 when we don't
>>> know what to do with it. If Dom0 takes the ownership, then let it handle
>>> the interrupt. If not, we inject it into the HVM. We recognize that all
>>> of the PT devices are not asserting the line by PLINE deassertion or by
>>> Dom0 taking the ownership back to it.
>>
>> This needs dom0 kernel changes and does not solve the general sharing
>> problem (among multiple HVM domains, or among HVM domains and PV domains
>> other than dom0). Could you somehow track which guest is most likely to
>> handle the interrupt, deliver to it first, and then detect the immediate
>> re-interrupt if it EOIs without handling? Plus have a timeout if it does not
>> EOI in reasonable time?
>
>My thought here is a simple priority list with move-to-back of the frontmost
>domain when we deliver him the interrupt but he does not deassert the line
>either in reasonable time or by the time he EOIs the interrupt. This is
>simple generic logic needing no PV guest changes.
>
> -- Keir
>
How is the priority defined?
What is a reasonable time, given different device requirements?
PV irq sharing takes responses from all sharing sides, while Guy's RFC
only takes dom0's response. Your suggestion is much simpler,
relying on a timeout only, but what do you expect the final performance
to be?
Thanks,
Kevin
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-10 7:15 ` [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0) Tian, Kevin
@ 2007-08-10 7:37 ` Keir Fraser
2007-08-10 8:02 ` Tian, Kevin
0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 7:37 UTC (permalink / raw)
To: Tian, Kevin, Guy Zana, xen-devel; +Cc: Alex Novik
On 10/8/07 08:15, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> My thought here is a simple priority list with move-to-back of the
>> frontmost
>> domain when we deliver him the interrupt but he does not deassert the
>> line
>> either in reasonable time or by the time he EOIs the interrupt. This is
>> simple generic logic needing no PV guest changes.
>>
>> -- Keir
>
> How is the priority defined?
It is defined dynamically by the move-to-back policy of the priority list.
> What's reasonable time for different device requirement?
For the timeout? Actually I'm not sure how important having a timeout
actually is -- unless in the worst case it can reset the PCI device and
ensure the line is quiesced in that way. Otherwise a non-responsive guest is
unlikely to deassert its device and hence you cannot timeout and re-enable
the interrupt line anyway. I consider this to be a secondary issue in
implementing shared interrupts, and can reasonably be left until later.
> PV irq sharing takes response from all shared side, and Guy's RFC
> only takes dom0's response. Now your suggestion is much simpler
> toward timeout only, but what do you expect the final performance
> to be?
The timeout isn't part of this method's normal operation. The usual case
will be that we deliver to just one guest -- at the front of our priority
list -- and it was the correct single guest to deliver the interrupt to. In
which case the list does not change, and if using the polarity-change method
from Neocleus we would take the usual two interrupts per device assertion
(one on +ve edge, one on -ve edge), or just one interrupt if we use the
existing Xen late-EOI method or Intel's dummy-EOI method.
We take potentially two interrupts if the highest-prio domain is not the
service domain for this particular interrupt. In this case we move the domain
to the back of the list and continue to deliver until the line is deasserted.
The Neocleus polarity-change method works really nicely here because we take
no second interrupt until the physical INTx line is actually deasserted (and
hence the interrupt is serviced, and our delivery algorithm hence terminates).
Using Xen/Intel methods of EOI'ing we have to somehow detect the immediate
re-interrupt on EOI (which will happen because the physical INTx line is
still asserted).
Worst case is where multiple devices are issuing interrupts simultaneously,
of course. In this case we do truly *need* to issue the interrupt to
multiple guests. This will work, but be a bit slow. I think this is true of
the Neocleus algorithm too, however.
In conclusion, my algorithm works well when I run through it in my head. :-)
-- Keir
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-10 7:37 ` Keir Fraser
@ 2007-08-10 8:02 ` Tian, Kevin
2007-08-10 8:16 ` Keir Fraser
0 siblings, 1 reply; 17+ messages in thread
From: Tian, Kevin @ 2007-08-10 8:02 UTC (permalink / raw)
To: Keir Fraser, Guy Zana, xen-devel; +Cc: Alex Novik
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: August 10, 2007 15:37
>> How is the priority defined?
>
>It is defined dynamically by the move-to-back policy of the priority list.
Considering sharing between a high-speed device and a low-speed
device, a simple move-to-back policy (upon EOI) is not the most efficient.
At the least, we could take interrupt frequency as one factor of priority too.
>
>> What's reasonable time for different device requirement?
>
>For the timeout? Actually I'm not sure how important having a timeout
>actually is -- unless in the worst case it can reset the PCI device and
>ensure the line is quiesced in that way. Otherwise a non-responsive guest is
>unlikely to deassert its device and hence you cannot timeout and re-enable
>the interrupt line anyway. I consider this to be a secondary issue in
>implementing shared interrupts, and can reasonably be left until later.
>
It seems you are talking about a bogus case where the guest is not willing
to handle the injection (e.g. driver unload) but leaves the device in the
asserted state. Yes, if such a bogus condition happens, there's nothing to do
except disable the physical RTE.
My question, though, is about the efficiency of the timeout under different
conditions. Say the top of the list is an HVM domain at the time, and the
HVM domain has its vRTE masked (driver unload, or a previous injection is
still being handled). In this case we may not want to inject now and wait the
same 'reasonable time' for a non-response; instead, move-to-back can
take effect immediately.
>> PV irq sharing takes response from all shared side, and Guy's RFC
>> only takes dom0's response. Now your suggestion is much simpler
>> toward timeout only, but what do you expect the final performance
>> to be?
>
>The timeout isn't part of this method's normal operation. The usual case
>will be that we deliver to just one guest -- at the front of our priority
>list -- and it was the correct single guest to deliver the interrupt to. In
This is hard to tell, since there is no clue for checking whether it's the
right one, due to the randomness of interrupt occurrence.
>
>Worst case is where multiple devices are issuing interrupts simultaneously,
>of course. In this case we do truly *need* to issue the interrupt to
>multiple guests. This will work, but be a bit slow. I think this is true of
>the Neocleus algorithm too, however.
>
>In conclusion, my algorithm works well when I run through it in my head. :-)
>
Definitely, this is a workable approach and can be applied to both
solutions. My concern is just how it behaves considering performance. :-)
Thanks,
Kevin
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-10 8:02 ` Tian, Kevin
@ 2007-08-10 8:16 ` Keir Fraser
2007-08-10 8:41 ` Tian, Kevin
0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 8:16 UTC (permalink / raw)
To: Tian, Kevin, Guy Zana, xen-devel; +Cc: Alex Novik
On 10/8/07 09:02, "Tian, Kevin" <kevin.tian@intel.com> wrote:
> Considering the sharing between high-speed device and low-speed
> device, simple move-to-back policy (once EOI) is not most efficient.
> At least we can take interrupt frequency as one factor of priority too.
My assumption would be that any given interrupt is due to only one device,
and that in this case it is always most probable that the interrupting
device is the high-speed one. Whenever a low-speed device interrupt occurs
that will slow things down because we will deliver to the high-speed driver
first, wait for unmask/EOI, then see the line is not deasserted, then move
high-speed device to back, and re-deliver to low-speed device. Plus, on the
next interrupt you will deliver to the low-speed device first even though it
is most likely a high-speed device interrupt. Clearly we could be smarter
here (only move-to-back after N failures, for example). I'm not convinced
the extra complexity is worth it though - I think this kind of scenario is
rare enough. I'd like to see a simple sharing method measured and found
wanting before adding extra heuristics.
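For reference, the "only move-to-back after N failures" variant mentioned above could look roughly like the Python sketch below. It is an illustration, not a proposal to add the heuristic: the class name `StickyPriorityList`, the `demote_after` parameter and the rule of counting consecutive misses by the frontmost domain are all assumptions, and each interrupt is assumed to have exactly one owning domain.

```python
class StickyPriorityList:
    """Priority list that demotes the frontmost domain only after it has
    failed to own `demote_after` consecutive interrupts."""

    def __init__(self, domains, demote_after):
        self.order = list(domains)
        self.demote_after = demote_after
        self.front_misses = 0   # consecutive interrupts the front did not own

    def deliver(self, owner):
        """Deliver one interrupt raised by `owner`'s device.

        Returns the domains tried, in order, until the line is deasserted.
        """
        if owner not in self.order:
            raise ValueError("unknown domain")
        tried = self.order[: self.order.index(owner) + 1]
        if tried[0] == owner:
            self.front_misses = 0
        else:
            self.front_misses += 1
            if self.front_misses >= self.demote_after:
                # Move-to-back, but only after repeated misses.
                self.order.append(self.order.pop(0))
                self.front_misses = 0
        return tried


plist = StickyPriorityList(["fast", "slow"], demote_after=2)
plist.deliver("slow")   # one stray slow-device interrupt: order is kept
plist.deliver("fast")   # the fast device is still tried first
```

With `demote_after=1` this degenerates to the plain move-to-back policy, so the simple method can be measured first and the threshold raised only if it is found wanting.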
> While my question is about the efficiency of timeout under different
> condition. Say the top of the list is HVM domain at the time, and
> HVM domain has vRTE masked (driver unload, or previous injection is
> in handle), in this case we may not want to inject now and wait same
> 'reasonable time' for non-response and instead move-to-back can
> make effect immediately.
Okay, yes, the driver-unloaded case at least needs to be handled. But it
seems to me that the timeout here could be in the hundreds of milliseconds,
minimum. It should be an extremely occasional event that the timeout is
needed.
>> The timeout isn't part of this method's normal operation. The usual case
>> will be that we deliver to just one guest -- at the front of our priority
>> list -- and it was the correct single guest to deliver the interrupt to. In
>
> This is hard to tell, since no clue to check whether it's right one due
> to randomness of interrupt occurrence.
Well yes. My interest here is in working well for one active device at a
time (i.e. other devices are basically quiescent). Or, if there are multiple
devices active at a time, only one is delivering a really significant number
of interrupts. If you have multiple high-speed devices and want maximum
performance, I think people know to avoid shared interrupts for those
devices if possible, by shuffling PCI cards and so on.
-- Keir
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-10 8:16 ` Keir Fraser
@ 2007-08-10 8:41 ` Tian, Kevin
2007-08-10 8:52 ` Keir Fraser
0 siblings, 1 reply; 17+ messages in thread
From: Tian, Kevin @ 2007-08-10 8:41 UTC (permalink / raw)
To: Keir Fraser, Guy Zana, xen-devel; +Cc: Alex Novik
>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: August 10, 2007 16:16
>
>On 10/8/07 09:02, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>rare enough. I'd like to see a simple sharing method measured and found
>wanting before adding extra heuristics.
Sure, let's start simple. Just a reminder: for drivers with a
timeout check on expected interrupt delivery, that slow path may
increase the chance of complaints, though this is also not solved
when not sharing.
>
>> While my question is about the efficiency of timeout under different
>> condition. Say the top of the list is HVM domain at the time, and
>> HVM domain has vRTE masked (driver unload, or previous injection is
>> in handle), in this case we may not want to inject now and wait same
>> 'reasonable time' for non-response and instead move-to-back can
>> make effect immediately.
>
>Okay, yes, the driver-unloaded case at least needs to be handled. But it
>seems to me that the timeout here could be in the hundreds of milliseconds,
>minimum. It should be an extremely occasional event that the timeout is
>needed.
I can agree with 'occasional' but not 'extremely occasional'. :-) The HVM, if
at the head of the list, may be in a blocked state waiting for Qemu to
respond, while at the same time Qemu may be waiting for a driver (e.g. disk
r/w) and the driver may be waiting for an interrupt. In such a condition, the
1st injection into the HVM will time out anyway, and only the next injection
can get handled, after dom0 gets its interrupt. Just consider that such an
inter-domain dependency may make the case worse...
>
>>> The timeout isn't part of this method's normal operation. The usual case
>>> will be that we deliver to just one guest -- at the front of our priority
>>> list -- and it was the correct single guest to deliver the interrupt to. In
>>
>> This is hard to tell, since no clue to check whether it's right one due
>> to randomness of interrupt occurrence.
>
>Well yes. My interest here is in working well for one active device at a
>time (ie. Other devices are basically quiescent). Or, if there are multiple
>devices active at a time, only one is delivering a really significant number
>of interrupts. If you have multiple high-speed devices and want maximum
>performance, I think people know to avoid shared interrupts for those
>devices if possible, by shuffling PCI cards and so on.
>
If we are clear about keeping such an assumption, then the simplest approach
is best, after warning the user. :-)
Thanks,
Kevin
* Re: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-10 8:41 ` Tian, Kevin
@ 2007-08-10 8:52 ` Keir Fraser
0 siblings, 0 replies; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 8:52 UTC (permalink / raw)
To: Tian, Kevin, Guy Zana, xen-devel; +Cc: Alex Novik
On 10/8/07 09:41, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>> Okay, yes, the driver-unloaded case at least needs to be handled. But it
>> seems to me that the timeout here could be in the hundreds of
>> milliseconds,
>> minimum. It should be an extremely occasional event that the timeout is
>> needed.
>
> I can agree with 'occasional' but not 'extremely occasional'. :-) HVM, if
> in head of the list, may be in block state waiting Qemu to respond, while
> at same time Qemu may wait for driver (like disk r/w) and driver may
> wait for interrupt. In such condition, 1st injection into HVM will cause
> timeout anyway and only next injection can get handled after dom0 gets
> its interrupt. Just think that such inter-domain-dependency may make
> the case worse...
Oh, I see. That's another separate case to deal with. We'd attempt delivery
to HVM, time out to dom0, then we would see interrupt is still asserted
and... I guess we'd re-set the timeout on the HVM guest a few times, perhaps
with some backoff. This case is a bit of a pain. :-(
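The "re-set the timeout a few times, perhaps with some backoff" idea could be sketched roughly as below. This is only an illustration: the function name, the base timeout, the doubling factor and the cap are all assumptions, not values proposed in this thread or taken from Xen.

```python
def hvm_retry_timeouts(base_ms, retries, factor=2, cap_ms=None):
    """Return the timeout (in ms) for each successive HVM delivery attempt.

    Hypothetical helper: every time delivery to the HVM times out and the
    line is still asserted after dom0 has had its turn, the next attempt
    waits `factor` times longer, optionally capped at `cap_ms`.
    """
    timeouts = []
    current = base_ms
    for _ in range(retries):
        timeouts.append(current if cap_ms is None else min(current, cap_ms))
        current *= factor
    return timeouts


if __name__ == "__main__":
    # Four attempts starting at an assumed 100 ms, doubling each time.
    print(hvm_retry_timeouts(100, 4))
```

Capping the timeout bounds the worst-case delay for a guest that really is waiting on dom0, at the cost of more spurious attempts.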
-- Keir
* RE: [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0)
2007-08-10 2:58 ` [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0) Tian, Kevin
@ 2007-08-10 10:10 ` Guy Zana
0 siblings, 0 replies; 17+ messages in thread
From: Guy Zana @ 2007-08-10 10:10 UTC (permalink / raw)
To: Tian, Kevin, xen-devel; +Cc: Alex Novik
Thanks Kevin for all of your comments, I agree with them all.
First, most of the work here was done by Alex Novik, not me :)
More comments below...
Thanks,
Guy.
> -----Original Message-----
> From: Tian, Kevin [mailto:kevin.tian@intel.com]
> Sent: Friday, August 10, 2007 5:59 AM
> To: Guy Zana; xen-devel@lists.xensource.com
> Cc: Alex Novik
> Subject: RE: [Xen-devel] [RFC] Pass-through Interdomain
> Interrupts Sharing(HVM/Dom0)
>
> Hi, Guy,
> Thanks for very good description.
>
> Basically I think this should work, but with following concerns:
>
> - How to choose the timeout value?
> Small timeout may result more spurious injection and
> performance penalty, while large timeout may not satisfy
> driver expectation to high-speed device.
>
That's a good point. The spurious-vs-starving trade-off is exactly opposite for the HVM and dom0. For an HVM that holds a vline, a large timeout value results in more spurious interrupts, since the line is held asserted.
The timeout value could be adaptive: increased (made slower) any time it fires and decides to do nothing, and decreased any time it takes a decision. This may complicate things even further.
Does the IOAPIC have a timeout value for firing an interrupt when the line is held asserted? Is using that value feasible?
Freezing the timer is logically the same as masking the IOAPIC.
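As a rough illustration of the adaptive timeout idea above, here is a minimal sketch. The class name, the bounds and the multiplicative step are assumptions made for the example; nothing here is implemented in Xen.

```python
class AdaptiveTimeout:
    """Toy model of an adaptive timeout (all parameters are assumed values).

    The timeout grows when the timer fires and nothing needs to be done,
    and shrinks when the timer fires and a decision is taken.
    """

    def __init__(self, initial_ms=100.0, min_ms=10.0, max_ms=1000.0, step=2.0):
        self.value_ms = initial_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.step = step

    def on_fire(self, took_decision):
        """Update and return the timeout after the timer fired."""
        if took_decision:
            # The timer was useful: fire sooner next time.
            self.value_ms = max(self.min_ms, self.value_ms / self.step)
        else:
            # The timer did nothing: slow it down.
            self.value_ms = min(self.max_ms, self.value_ms * self.step)
        return self.value_ms


if __name__ == "__main__":
    timer = AdaptiveTimeout()
    for decided in (False, False, True):
        print(timer.on_fire(decided))
```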
> - How to cope with existing irq sharing mechanism for PV
> driver domain?
> Existing approach between PV driver domain and dom0 is
> based on some trigger point, i.e, guest EOI. Keep insertion
> count and track guest response. Timeout mechanism is
> different, and I guess two paths are difficult to share logic.
>
> How about a mixed sharing case, say among dom0/PV
> domain/ HVM domain?
Sharing is problematic between multiple domains, at least when an HVM is involved. I guess you'll only infrequently want to assign more than two devices sharing the same line to different domains other than dom0; I see the M devices left to dom0 more as a nuisance.
I haven't given it a lot of thought, but you could probably allow PV domains in the proposed shared interdomain ISR chain: inject the interrupt into all of the PV domains and dom0 (simultaneously), OR their handling-status results, and take action based on that value. Sharing a line between 2 or more HVMs is much more difficult to solve.
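The "inject to all PV domains and dom0, then OR the results" idea could look roughly like this. It is a sketch only: `deliver_to_pv_chain` and the per-domain handler callables are hypothetical stand-ins for whatever mechanism would report a domain's handling status.

```python
def deliver_to_pv_chain(handlers):
    """Deliver to every PV domain and dom0, then OR the handled flags.

    `handlers` maps a domain name to a callable that returns True if that
    domain's ISR chain handled the interrupt (hypothetical interface).
    Every handler is called; there is no short-circuit on the first True.
    """
    results = {name: bool(handler()) for name, handler in handlers.items()}
    handled_by_any = any(results.values())
    return handled_by_any, results


if __name__ == "__main__":
    handled, per_domain = deliver_to_pv_chain({
        "dom0": lambda: False,
        "pv1": lambda: True,
    })
    print(handled, per_domain)
```

If the ORed value is false, none of the PV domains or dom0 claimed the interrupt, which is the point at which the HVM would be considered.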
>
> - interrupt delay within HVM may be exaggerated under some
> special condition, if HVM is not ready to handle the
> injection at D.3 (like blocked in I/O emulation) while later
> D.4 will cancel previous injection at next timeout. Then only
> at next D.3 HVM gets re-injection again and it may or may not
> be delayed again upon status at that time.
I'm not sure I understood -
In a D3 -> D4 -> D3 event cycle the HVM's vline stays asserted. Dom0 always gets a chance to check whether the interrupt is its own, but the vline stays asserted until dom0 handles it or until the pline is deasserted.
The HVM will be ready when it unmasks the IOAPIC's pin and its VCPU is executing.
It doesn't matter whether you choose to assert or deassert its vline. In the meantime the timer will fire, and that will eventually create spurious interrupts in dom0. But one assumption we made is that we can't avoid spurious interrupts, and we would rather get them in dom0.
>
> Did you run some heavy workload and observe any complains?
We haven't implemented it yet :-)
Thanks for the great comments!
Guy.
* RE: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-10 7:04 ` Keir Fraser
2007-08-10 7:15 ` [RFC] Pass-through Interdomain Interrupts Sharing(HVM/Dom0) Tian, Kevin
@ 2007-08-10 10:22 ` Guy Zana
2007-08-10 11:21 ` Keir Fraser
1 sibling, 1 reply; 17+ messages in thread
From: Guy Zana @ 2007-08-10 10:22 UTC (permalink / raw)
To: Keir Fraser, xen-devel; +Cc: Alex Novik
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: Friday, August 10, 2007 10:05 AM
> To: Guy Zana; xen-devel@lists.xensource.com
> Cc: Alex Novik
> Subject: Re: [Xen-devel] [RFC] Pass-through Interdomain
> Interrupts Sharing (HVM/Dom0)
>
> My thought here is a simple priority list with move-to-back
> of the frontmost domain when we deliver him the interrupt but
> he does not deassert the line either in reasonable time or by
> the time he EOIs the interrupt. This is simple generic logic
> needing no PV guest changes.
>
Even if the HVM handled the interrupt successfully, it doesn't mean that the pline will be deasserted (another device, assigned to another domain, may have asserted it while the HVM processed the interrupt). You can't tell whether the HVM handled the interrupt successfully or not. How does this method overcome this?
Btw, with the method we proposed you could add PV domains to the interdomain ISR chain, but it may not contain more than one HVM.
Thanks,
Guy.
* Re: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-10 10:22 ` [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0) Guy Zana
@ 2007-08-10 11:21 ` Keir Fraser
2007-08-10 11:50 ` Guy Zana
0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 11:21 UTC (permalink / raw)
To: Guy Zana, xen-devel; +Cc: Alex Novik
On 10/8/07 11:22, "Guy Zana" <guy@neocleus.com> wrote:
>> My thought here is a simple priority list with move-to-back
>> of the frontmost domain when we deliver him the interrupt but
>> he does not deassert the line either in reasonable time or by
>> the time he EOIs the interrupt. This is simple generic logic
>> needing no PV guest changes.
>
> Even if the HVM handled the interrupt successfully, it doesn't mean that the
> pline will be deasserted (if another device assigned to another domain
> asserted it while the HVM processed the interrupt).You can't tell whether the
> HVM handled the interrupt successfully or not. How this method overcome this?
It would cycle through the priority list, moving frontmost to back at each
stage, until the line is deasserted.
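A minimal sketch of that cycling, to make the idea concrete. The names `service_shared_line`, `deliver` and `line_asserted` are hypothetical, and the "reasonable time or EOI" wait is collapsed into a single check after each delivery.

```python
from collections import deque


def service_shared_line(priority, line_asserted, deliver, max_rounds=None):
    """Cycle through the priority list until the line is deasserted.

    priority      -- deque of domain names, frontmost = highest priority
    line_asserted -- callable returning True while the physical line is high
    deliver       -- callable that injects the interrupt into one domain
    Returns the list of domains the interrupt was delivered to, in order.
    """
    delivered = []
    limit = len(priority) if max_rounds is None else max_rounds
    while line_asserted() and len(delivered) < limit:
        domain = priority[0]
        deliver(domain)
        delivered.append(domain)
        if line_asserted():
            # The frontmost domain did not clear the line: move it to the back.
            priority.rotate(-1)
    return delivered


if __name__ == "__main__":
    pending = {"hvm"}                    # only the HVM's device is interrupting
    order = deque(["dom0", "hvm"])
    print(service_shared_line(order, lambda: bool(pending), pending.discard))
    print(order)                         # the HVM is now frontmost
```

In this toy model a domain that clears its own device can still be moved back if another device keeps the line asserted, which is the concern raised in the quoted question.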
> Btw, with the method we proposed you could add PV domains to the interdomain
> ISR chain, but it may not contain more than one HVM.
Well, that kind of sucks doesn't it. And yet your method is significantly
more complicated than my approach, at least as described in your email.
Simple and more general wins the day, unless your approach handles more
cases or has better performance?
-- Keir
* RE: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-10 11:21 ` Keir Fraser
@ 2007-08-10 11:50 ` Guy Zana
2007-08-10 13:18 ` Keir Fraser
0 siblings, 1 reply; 17+ messages in thread
From: Guy Zana @ 2007-08-10 11:50 UTC (permalink / raw)
To: Keir Fraser, xen-devel; +Cc: Alex Novik
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: Friday, August 10, 2007 2:22 PM
> To: Guy Zana; xen-devel@lists.xensource.com
> Cc: Alex Novik
> Subject: Re: [Xen-devel] [RFC] Pass-through Interdomain
> Interrupts Sharing (HVM/Dom0)
>
> On 10/8/07 11:22, "Guy Zana" <guy@neocleus.com> wrote:
>
> >> My thought here is a simple priority list with move-to-back of the
> >> frontmost domain when we deliver him the interrupt but he does not
> >> deassert the line either in reasonable time or by the time he EOIs
> >> the interrupt. This is simple generic logic needing no PV guest
> >> changes.
> >
> > Even if the HVM handled the interrupt successfully, it doesn't mean
> > that the pline will be deasserted (if another device assigned to
> > another domain asserted it while the HVM processed the
> interrupt).You
> > can't tell whether the HVM handled the interrupt
> successfully or not. How this method overcome this?
>
> It would cycle through the priority list, moving frontmost to
> back at each stage, until the line is deasserted.
1. When will you deassert the HVM vline?
2. How do you avoid HVM spurious interrupts?
Will you raise the line again?
It is still getting complicated, and doesn't handle all cases.
>
> > Btw, with the method we proposed you could add PV domains to the
> > interdomain ISR chain, but it may not contain more than one HVM.
>
> Well, that kind of sucks doesn't it. And yet your method is
> significantly more complicated than my approach, at least as
> described in your email.
> Simple and more general wins the day, unless your approach
> handles more cases or has better performance?
>
I'm really here to find the best method.
Your method just doesn't avoid HVM spurious interrupts; I think avoiding them is a _must_.
The priority list is a good addition, for PV guests. If you want to avoid spurious interrupts in the HVM, the HVM must be the last in the list, which is what we did, but we started simple (with dom0 and a single HVM).
If you tell me that HVM spurious interrupts are not that important, I'll agree to go with your method.
> -- Keir
>
>
* Re: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-10 11:50 ` Guy Zana
@ 2007-08-10 13:18 ` Keir Fraser
2007-08-10 15:51 ` Guy Zana
0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 13:18 UTC (permalink / raw)
To: Guy Zana, xen-devel; +Cc: Alex Novik
On 10/8/07 12:50, "Guy Zana" <guy@neocleus.com> wrote:
>> It would cycle through the priority list, moving frontmost to
>> back at each stage, until the line is deasserted.
>
> 1. When will you deassert the HVM vline?
I would turn vline assertions into pulses: the line would be asserted only
instantaneously, to get latched by the VPIC/VIOAPIC. Actually I think this
question is quite separate from whatever method we use for interrupt
sharing: when would you deassert the vline when the interrupt is *not*
shared? Whatever method we choose should be extendable to the shared case,
and applied to whichever HVM guest we are currently choosing to deliver the
interrupt to. So, whether the interrupt is shared or not, I see no value in
modelling the state of the level-triggered vline.
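To illustrate what "pulse and latch" means here, a deliberately simplified toy model follows. The class and method names are hypothetical and this is not the Xen VPIC/VIOAPIC code; it only shows that the line level is not tracked and that a pulse sets a pending bit which stays set until the guest takes the interrupt.

```python
class LatchedPin:
    """Toy model of a virtual interrupt pin driven by pulses (assumption).

    No line level is stored: a pulse only sets `pending`, and several
    pulses arriving before the guest takes the interrupt collapse into one.
    """

    def __init__(self):
        self.pending = False

    def pulse(self):
        """Assert the vline instantaneously; only the latch remembers it."""
        self.pending = True

    def take(self):
        """Guest accepts the interrupt; returns True if one was pending."""
        was_pending = self.pending
        self.pending = False
        return was_pending


if __name__ == "__main__":
    pin = LatchedPin()
    pin.pulse()
    pin.pulse()
    print(pin.take(), pin.take())
```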
> 2. How do you avoid HVM spurious interrupts?
I avoid most of them by the fact that a HVM guest that is not handling
interrupts will get pushed down the priority list. Of course this won't get
rid of all spurious interrupts, but I'd expect it to get rid of enough
(e.g., at least 50% even in some worst cases I can think of). So the
question is: how sensitive is Windows to spurious interrupts? I know that
Linux needs something like 99% of interrupts to be spurious for it to
generate a warning. If Windows is similar then my approach would work just
fine.
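The kind of check being discussed, a guest warning only when nearly all interrupts on a line go unhandled, can be sketched as a windowed counter. The window size and the 99% threshold below are illustrative parameters only; this is not the actual Linux or Windows logic.

```python
class SpuriousMonitor:
    """Warn when too large a share of interrupts in a window is unhandled.

    `window` and `threshold_pct` are assumed values for illustration.
    """

    def __init__(self, window=100, threshold_pct=99):
        self.window = window
        self.threshold_pct = threshold_pct
        self.total = 0
        self.unhandled = 0

    def record(self, handled):
        """Record one interrupt; return True if the window ends in a warning."""
        self.total += 1
        if not handled:
            self.unhandled += 1
        if self.total < self.window:
            return False
        # Integer comparison avoids floating-point rounding at the boundary.
        warn = self.unhandled * 100 >= self.threshold_pct * self.total
        self.total = 0
        self.unhandled = 0
        return warn


if __name__ == "__main__":
    monitor = SpuriousMonitor()
    # Half of the interrupts are spurious: well below the assumed threshold.
    print(any(monitor.record(i % 2 == 0) for i in range(100)))
```

With such a threshold, a scheme that removes even half of the spurious interrupts stays far from the warning level, which is the argument made above.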
-- Keir
> Will you raise the line again?
> It is still getting complicated, and doesn't handle all cases.
>
>>
>>> Btw, with the method we proposed you could add PV domains to the
>>> interdomain ISR chain, but it may not contain more than one HVM.
>>
>> Well, that kind of sucks doesn't it. And yet your method is
>> significantly more complicated than my approach, at least as
>> described in your email.
>> Simple and more general wins the day, unless your approach
>> handles more cases or has better performance?
>>
>
> I'm really here to find the best method.
>
> In your method, you just don't avoid HVM spurious interrupts, I think this is
> a _must_.
> The priority list is a good addition, for PV guests. If you want to avoid
> spurious interrupts in the HVM, the HVM must be the last in the list, which is
> what we did, but started simple (with dom0 and a single hvm).
>
> If you'll tell me that HVM spurious interrupts is not that important I'll
> agree to go with your method.
* RE: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-10 13:18 ` Keir Fraser
@ 2007-08-10 15:51 ` Guy Zana
2007-08-10 16:00 ` Keir Fraser
0 siblings, 1 reply; 17+ messages in thread
From: Guy Zana @ 2007-08-10 15:51 UTC (permalink / raw)
To: Keir Fraser, xen-devel; +Cc: Alex Novik
> -----Original Message-----
> From: Keir Fraser [mailto:keir@xensource.com]
> Sent: Friday, August 10, 2007 4:18 PM
> To: Guy Zana; xen-devel@lists.xensource.com
> Cc: Alex Novik
> Subject: Re: [Xen-devel] [RFC] Pass-through Interdomain
> Interrupts Sharing (HVM/Dom0)
>
> On 10/8/07 12:50, "Guy Zana" <guy@neocleus.com> wrote:
>
> >> It would cycle through the priority list, moving frontmost
> to back at
> >> each stage, until the line is deasserted.
> >
> > 1. When will you deassert the HVM vline?
>
> I would turn vline assertions into pulses: the line would be
> asserted only instantaneously, to get latched by the
> VPIC/VIOAPIC. Actually I think this question is quite
> separate from whatever method we use for interrupt
> sharing: when would you deassert the vline when the interrupt
> is *not* shared? Whatever method we choose should be
> extendable to the shared case, and applied to whichever HVM
> guest we are currently choosing to deliver the interrupt to.
> So, whether the interrupt is shared or not, I see no value in
> modelling the state of the level-triggered vline.
Sounds good actually :-)
>
> > 2. How do you avoid HVM spurious interrupts?
>
> I avoid most of them by the fact that a HVM guest that is not
> handling interrupts will get pushed down the priority list.
> Of course this won't get rid of all spurious interrupts, but
> I'd expect it to get rid of enough (e.g., at least 50% even
> in some worst cases I can think of). So the question is: how
> sensitive is Windows to spurious interrupts? I know that
> Linux needs something like 99% of interrupts to be spurious
> for it to generate a warning. If Windows is similar then my
> approach would work just fine.
From what I saw, Windows XP is not that sensitive to spurious interrupts (at least for ISA interrupts). In general, Windows tries hard to survive :-)
We'll have to check whether a prioritized list will suffice; it would be simple, I agree.
But you still do bad stuff and hope it goes unnoticed, which sounds like a recipe for voodoo; it should be well tested at least.
Thanks,
Guy.
* Re: [RFC] Pass-through Interdomain Interrupts Sharing (HVM/Dom0)
2007-08-10 15:51 ` Guy Zana
@ 2007-08-10 16:00 ` Keir Fraser
0 siblings, 0 replies; 17+ messages in thread
From: Keir Fraser @ 2007-08-10 16:00 UTC (permalink / raw)
To: Guy Zana, xen-devel; +Cc: Alex Novik
On 10/8/07 16:51, "Guy Zana" <guy@neocleus.com> wrote:
> From what I saw, Windows XP is not that sensitive to spurious interrupts (at
> least for ISA interrupts). In general, Windows tries hard to survive :-)
> We'll have to check if a prioritize list will suffice, it would be simple, I
> agree.
> But you still do bad stuff and hope it'll go unnoticed, sounds like a recipe
> for voodoo, it should be well tested at least.
This whole PCI passthru feature is a recipe for voodoo ;-)
-- Keir