From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=35524 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OHMcV-0004Bs-1o
	for qemu-devel@nongnu.org; Wed, 26 May 2010 15:50:29 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <blauwirbel@gmail.com>) id 1OHMcJ-0003n6-AJ
	for qemu-devel@nongnu.org; Wed, 26 May 2010 15:50:16 -0400
Received: from mail-px0-f173.google.com ([209.85.212.173]:56123)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <blauwirbel@gmail.com>) id 1OHMcJ-0003mk-1k
	for qemu-devel@nongnu.org; Wed, 26 May 2010 15:50:15 -0400
Received: by pxi2 with SMTP id 2so978pxi.4
	for <qemu-devel@nongnu.org>; Wed, 26 May 2010 12:50:14 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <4BFC3028.6030303@codemonkey.ws>
References: <cover.1274732025.git.jan.kiszka@web.de>
	<e39866bcfe2ebc72f9a02c95d308271036b560f5.1274732025.git.jan.kiszka@web.de>
	<AANLkTinP2QheAYmPzPfIZ0qeOcyCv_L5CRoif4XXe5qt@mail.gmail.com>
	<4BFC3028.6030303@codemonkey.ws>
From: Blue Swirl <blauwirbel@gmail.com>
Date: Wed, 26 May 2010 19:49:53 +0000
Message-ID: <AANLkTilgVDPP0tZAkJrAQN45hqB5g17RLTC3ONQPjKEp@mail.gmail.com>
Subject: Re: [Qemu-devel] Re: [RFT][PATCH 07/15] qemu_irq: Add IRQ handlers
	with delivery feedback
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Jan Kiszka <jan.kiszka@siemens.com>, Jan Kiszka <jan.kiszka@web.de>, qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>

On Tue, May 25, 2010 at 8:16 PM, Anthony Liguori <anthony@codemonkey.ws> wr=
ote:
> On 05/25/2010 02:09 PM, Blue Swirl wrote:
>>
>> On Mon, May 24, 2010 at 8:13 PM, Jan Kiszka<jan.kiszka@web.de> =C2=A0wro=
te:
>>
>>>
>>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>>
>>> This allows to communicate potential IRQ coalescing during delivery fro=
m
>>> the sink back to the source. Targets that support IRQ coalescing
>>> workarounds need to register handlers that return the appropriate
>>> QEMU_IRQ_* code, and they have to propergate the code across all IRQ
>>> redirections. If the IRQ source receives a QEMU_IRQ_COALESCED, it can
>>> apply its workaround. If multiple sinks exist, the source may only
>>> consider an IRQ coalesced if all other sinks either report
>>> QEMU_IRQ_COALESCED as well or QEMU_IRQ_MASKED.
>>>
>>
>> No real devices are interested whether any of their output lines are
>> even connected. This would introduce a new signal type, bidirectional
>> multi-level, which is not correct.
>>
>
> I don't think it's really an issue of correct, but I wouldn't disagree to=
 a
> suggestion that we ought to introduce a new signal type for this type of
> bidirectional feedback. =C2=A0Maybe it's qemu_coalesced_irq and has a sim=
ilar
> interface as qemu_irq.

I'd rather try harder to find a solution for the problem without this.

>> I think the real solution to coalescing is put the logic inside one
>> device, in this case APIC because it has the information about irq
>> delivery. APIC could monitor incoming RTC irqs for frequency
>> information and whether they get delivered or not. If not, an internal
>> timer is installed which injects the lost irqs.
>>
>> Of course, no real device could do such de-coalescing, but with this
>> approach, the voodoo is contained to insides of one device, APIC.
>>
>> We should also take a step back to think what was the cause of lost
>> irqs, IIRC uneven execution rate in QEMU.
>
> Not only that. =C2=A0The pathological case is where a host is limited to =
a 1khz
> timer frequency and the guest requests a 1khz timer frequency. =C2=A0Prac=
tically
> speaking, there is no way we'll ever be able to adjust timers to reinject
> lost interrupts because of the host timer limitation.

But APIC knows when the guest has acked a timer irq and so can
immediately issue another one to cover a lost irq. APIC could also
arrange a CPU exit. The irqs would not arrive at even rate, but in
this pathological case, the host can't ever schedule any delays less
than 1ms.

>> =C2=A0Could this be fixed or taken
>> into account in timer handling? For example, CPU loop could analyze
>> the wall clock time between CPU exits and use that to offset the
>> timers. Thus the timer frequency (in wall clock time) could be made to
>> correspond a bit more to VCPU execution rate.
>>
>
> A lot of what motivates the timer reinjection work is very old linux kern=
els
> that had fixed userspace timer frequencies. =C2=A0On newer host kernels, =
it's
> probably not nearly as important except when you get into pathological ca=
ses
> like exposing a high frequency HPET timer to the guest where you cannot k=
eep
> up with the host.

There will always be some delays and overhead because of the
emulation. We can either hide this completely from the guest (by
stretching the guest time) or we can try to keep the guest time base
in synch with host but then the guest can't expect native class
performance to be available between timer irqs.

On a native hardware, you could have a 1kHz timer and a kernel task
always running for max. 999us with interrupts disabled, then for at
least 1us the interrupts are enabled and the timer interrupt handler
gets a chance to run. On QEMU, such a small time window may be
impossible to emulate. No coalescing would help in this case. With a
cycle counting emulation it's possible, but the speed would probably
be slower than native. With the interrupt coalescing you may be able
to cover some easy cases.

>
> Regards,
>
> Anthony Liguori
>
>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>> ---
>>> =C2=A0hw/irq.c | =C2=A0 38 +++++++++++++++++++++++++++++---------
>>> =C2=A0hw/irq.h | =C2=A0 22 +++++++++++++++-------
>>> =C2=A02 files changed, 44 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/hw/irq.c b/hw/irq.c
>>> index 7703f62..db2cce6 100644
>>> --- a/hw/irq.c
>>> +++ b/hw/irq.c
>>> @@ -26,19 +26,27 @@
>>>
>>> =C2=A0struct IRQState {
>>> =C2=A0 =C2=A0 qemu_irq_handler handler;
>>> + =C2=A0 =C2=A0qemu_irq_fb_handler feedback_handler;
>>> =C2=A0 =C2=A0 void *opaque;
>>> =C2=A0 =C2=A0 int n;
>>> =C2=A0};
>>>
>>> -void qemu_set_irq(qemu_irq irq, int level)
>>> +int qemu_set_irq(qemu_irq irq, int level)
>>> =C2=A0{
>>> - =C2=A0 =C2=A0if (!irq)
>>> - =C2=A0 =C2=A0 =C2=A0 =C2=A0return;
>>> -
>>> - =C2=A0 =C2=A0irq->handler(irq->opaque, irq->n, level);
>>> + =C2=A0 =C2=A0if (!irq) {
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0return 0;
>>> + =C2=A0 =C2=A0}
>>> + =C2=A0 =C2=A0if (irq->feedback_handler) {
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0return irq->feedback_handler(irq->opaque, =
irq->n, level);
>>> + =C2=A0 =C2=A0} else {
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0irq->handler(irq->opaque, irq->n, level);
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0return QEMU_IRQ_DELIVERED;
>>> + =C2=A0 =C2=A0}
>>> =C2=A0}
>>>
>>> -qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, i=
nt
>>> n)
>>> +static qemu_irq *allocate_irqs(qemu_irq_handler handler,
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 qemu_irq_fb_handler feedback_handler,
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 void *opaque, int n)
>>> =C2=A0{
>>> =C2=A0 =C2=A0 qemu_irq *s;
>>> =C2=A0 =C2=A0 struct IRQState *p;
>>> @@ -48,6 +56,7 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler=
,
>>> void *opaque, int n)
>>> =C2=A0 =C2=A0 p =3D (struct IRQState *)qemu_mallocz(sizeof(struct IRQSt=
ate) * n);
>>> =C2=A0 =C2=A0 for (i =3D 0; i< =C2=A0n; i++) {
>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 p->handler =3D handler;
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0p->feedback_handler =3D feedback_handler;
>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 p->opaque =3D opaque;
>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 p->n =3D i;
>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 s[i] =3D p;
>>> @@ -56,22 +65,33 @@ qemu_irq *qemu_allocate_irqs(qemu_irq_handler
>>> handler, void *opaque, int n)
>>> =C2=A0 =C2=A0 return s;
>>> =C2=A0}
>>>
>>> +qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaque, i=
nt
>>> n)
>>> +{
>>> + =C2=A0 =C2=A0return allocate_irqs(handler, NULL, opaque, n);
>>> +}
>>> +
>>> +qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler,
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0void *opaque=
, int n)
>>> +{
>>> + =C2=A0 =C2=A0return allocate_irqs(NULL, handler, opaque, n);
>>> +}
>>> +
>>> =C2=A0void qemu_free_irqs(qemu_irq *s)
>>> =C2=A0{
>>> =C2=A0 =C2=A0 qemu_free(s[0]);
>>> =C2=A0 =C2=A0 qemu_free(s);
>>> =C2=A0}
>>>
>>> -static void qemu_notirq(void *opaque, int line, int level)
>>> +static int qemu_notirq(void *opaque, int line, int level)
>>> =C2=A0{
>>> =C2=A0 =C2=A0 struct IRQState *irq =3D opaque;
>>>
>>> - =C2=A0 =C2=A0irq->handler(irq->opaque, irq->n, !level);
>>> + =C2=A0 =C2=A0return qemu_set_irq(irq, !level);
>>> =C2=A0}
>>>
>>> =C2=A0qemu_irq qemu_irq_invert(qemu_irq irq)
>>> =C2=A0{
>>> =C2=A0 =C2=A0 /* The default state for IRQs is low, so raise the output=
 now. =C2=A0*/
>>> =C2=A0 =C2=A0 qemu_irq_raise(irq);
>>> - =C2=A0 =C2=A0return qemu_allocate_irqs(qemu_notirq, irq, 1)[0];
>>> + =C2=A0 =C2=A0return allocate_irqs(NULL, qemu_notirq, irq, 1)[0];
>>> =C2=A0}
>>> diff --git a/hw/irq.h b/hw/irq.h
>>> index 5daae44..eee03e6 100644
>>> --- a/hw/irq.h
>>> +++ b/hw/irq.h
>>> @@ -3,15 +3,18 @@
>>>
>>> =C2=A0/* Generic IRQ/GPIO pin infrastructure. =C2=A0*/
>>>
>>> -/* FIXME: Rmove one of these. =C2=A0*/
>>> +#define QEMU_IRQ_DELIVERED =C2=A0 =C2=A0 =C2=A00
>>> +#define QEMU_IRQ_COALESCED =C2=A0 =C2=A0 =C2=A0(-1)
>>> +#define QEMU_IRQ_MASKED =C2=A0 =C2=A0 =C2=A0 =C2=A0 (-2)
>>> +
>>> =C2=A0typedef void (*qemu_irq_handler)(void *opaque, int n, int level);
>>> -typedef void SetIRQFunc(void *opaque, int irq_num, int level);
>>> +typedef int (*qemu_irq_fb_handler)(void *opaque, int n, int level);
>>>
>>> -void qemu_set_irq(qemu_irq irq, int level);
>>> +int qemu_set_irq(qemu_irq irq, int level);
>>>
>>> -static inline void qemu_irq_raise(qemu_irq irq)
>>> +static inline int qemu_irq_raise(qemu_irq irq)
>>> =C2=A0{
>>> - =C2=A0 =C2=A0qemu_set_irq(irq, 1);
>>> + =C2=A0 =C2=A0return qemu_set_irq(irq, 1);
>>> =C2=A0}
>>>
>>> =C2=A0static inline void qemu_irq_lower(qemu_irq irq)
>>> @@ -19,14 +22,19 @@ static inline void qemu_irq_lower(qemu_irq irq)
>>> =C2=A0 =C2=A0 qemu_set_irq(irq, 0);
>>> =C2=A0}
>>>
>>> -static inline void qemu_irq_pulse(qemu_irq irq)
>>> +static inline int qemu_irq_pulse(qemu_irq irq)
>>> =C2=A0{
>>> - =C2=A0 =C2=A0qemu_set_irq(irq, 1);
>>> + =C2=A0 =C2=A0int ret;
>>> +
>>> + =C2=A0 =C2=A0ret =3D qemu_set_irq(irq, 1);
>>> =C2=A0 =C2=A0 qemu_set_irq(irq, 0);
>>> + =C2=A0 =C2=A0return ret;
>>> =C2=A0}
>>>
>>> =C2=A0/* Returns an array of N IRQs. =C2=A0*/
>>> =C2=A0qemu_irq *qemu_allocate_irqs(qemu_irq_handler handler, void *opaq=
ue, int
>>> n);
>>> +qemu_irq *qemu_allocate_feedback_irqs(qemu_irq_fb_handler handler,
>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0void *opaque=
, int n);
>>> =C2=A0void qemu_free_irqs(qemu_irq *s);
>>>
>>> =C2=A0/* Returns a new IRQ with opposite polarity. =C2=A0*/
>>> --
>>> 1.6.0.2
>>>
>>>
>>>
>>
>>
>
>