xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Interrupt latency measurement technique
@ 2014-08-27  9:14 Andrii Tseglytskyi
  2014-08-27 10:07 ` Andrii Tseglytskyi
  2014-08-28  1:07 ` Ian Campbell
  0 siblings, 2 replies; 7+ messages in thread
From: Andrii Tseglytskyi @ 2014-08-27  9:14 UTC (permalink / raw)
  To: xen-devel@lists.xen.org, Ian Campbell, Julien Grall,
	Stefano Stabellini

Hi,

I need to measure IRQ latency introduced by Xen: (GIC -> Xen IRQ
handler -> Dom0 IRQ handler)
I need to know how many time I spend in Xen IRQ handler.

Can someone comment - is the following algorithm works:

- in function xen/arch/arm/gic.c: gic_interrupt() store timer counter value:
xen_timer_val = READ_SYSREG64(CNTPCT_EL0) - READ_SYSREG64(CNTVOFF_EL2)

- in any IRQ handler in dom0 store timer counter value:
dom0_timer_val = READ_SYSREG64(CNTPCT_EL0)

- calculate time diff in nanoseconds:
time_diff_ns = ticks_to_ns(dom0_timer_val - xen_timer_val)

Using this technique I measured display IRQ latency and got about
- 20 to 30 usec latency on 1 GHz MPU frequency
- 10 to 20 usec latencyon 1.5 GHz MPU frequency

Are these numbers expectable? Can this technique be used?

Regards,
Andrii

-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Interrupt latency measurement technique
  2014-08-27  9:14 [RFC] Interrupt latency measurement technique Andrii Tseglytskyi
@ 2014-08-27 10:07 ` Andrii Tseglytskyi
  2014-08-29 18:07   ` Jonathan Fraser
  2014-08-28  1:07 ` Ian Campbell
  1 sibling, 1 reply; 7+ messages in thread
From: Andrii Tseglytskyi @ 2014-08-27 10:07 UTC (permalink / raw)
  To: xen-devel@lists.xen.org, Ian Campbell, Julien Grall,
	Stefano Stabellini

Test setup is the following:

Platform - Jacinto6 (OMAP5) with two ARMv7 cores (Cortex A15), 1.5 Gb of RAM
Hypervisor - Xen 4.4 stable
Dom0 - Linux 3.8 running with 2 vcpus, 128 Mb RAM
DomU was not launched during the test

Regards,
Andrii


On Wed, Aug 27, 2014 at 12:14 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> Hi,
>
> I need to measure IRQ latency introduced by Xen: (GIC -> Xen IRQ
> handler -> Dom0 IRQ handler)
> I need to know how many time I spend in Xen IRQ handler.
>
> Can someone comment - is the following algorithm works:
>
> - in function xen/arch/arm/gic.c: gic_interrupt() store timer counter value:
> xen_timer_val = READ_SYSREG64(CNTPCT_EL0) - READ_SYSREG64(CNTVOFF_EL2)
>
> - in any IRQ handler in dom0 store timer counter value:
> dom0_timer_val = READ_SYSREG64(CNTPCT_EL0)
>
> - calculate time diff in nanoseconds:
> time_diff_ns = ticks_to_ns(dom0_timer_val - xen_timer_val)
>
> Using this technique I measured display IRQ latency and got about
> - 20 to 30 usec latency on 1 GHz MPU frequency
> - 10 to 20 usec latencyon 1.5 GHz MPU frequency
>
> Are these numbers expectable? Can this technique be used?
>
> Regards,
> Andrii
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Interrupt latency measurement technique
  2014-08-27  9:14 [RFC] Interrupt latency measurement technique Andrii Tseglytskyi
  2014-08-27 10:07 ` Andrii Tseglytskyi
@ 2014-08-28  1:07 ` Ian Campbell
  2014-08-28  7:58   ` Andrii Tseglytskyi
  1 sibling, 1 reply; 7+ messages in thread
From: Ian Campbell @ 2014-08-28  1:07 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, Stefano Stabellini, xen-devel@lists.xen.org

On Wed, 2014-08-27 at 12:14 +0300, Andrii Tseglytskyi wrote:
> Hi,
> 
> I need to measure IRQ latency introduced by Xen: (GIC -> Xen IRQ
> handler -> Dom0 IRQ handler)
> I need to know how many time I spend in Xen IRQ handler.
> 
> Can someone comment - is the following algorithm works:
> 
> - in function xen/arch/arm/gic.c: gic_interrupt() store timer counter value:
> xen_timer_val = READ_SYSREG64(CNTPCT_EL0) - READ_SYSREG64(CNTVOFF_EL2)
> 
> - in any IRQ handler in dom0 store timer counter value:
> dom0_timer_val = READ_SYSREG64(CNTPCT_EL0)
> 
> - calculate time diff in nanoseconds:
> time_diff_ns = ticks_to_ns(dom0_timer_val - xen_timer_val)
> 
> Using this technique I measured display IRQ latency and got about
> - 20 to 30 usec latency on 1 GHz MPU frequency
> - 10 to 20 usec latencyon 1.5 GHz MPU frequency
> 
> Are these numbers expectable?

How do they compare to just native Linux?

One thing to watch out for is the virtualised offset of dom0's timer, I
can't remember if this is expected to be zero or non-zero, if the latter
then you will need to subtract it out.

>  Can this technique be used?

One thing which it misses is the time from the IRQ line going high to
reaching gic_interrupt. I don't know if that is a concern for you.

Not sure what embedded folks usually do here (you probably know better
than me). One technique I've heard of is to inject an interrupt via an
external GPIO input and toggle another GPIO in response, then you can
use a scope to observe the difference.

Ian.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Interrupt latency measurement technique
  2014-08-28  1:07 ` Ian Campbell
@ 2014-08-28  7:58   ` Andrii Tseglytskyi
  2014-08-28 17:47     ` Ian Campbell
  0 siblings, 1 reply; 7+ messages in thread
From: Andrii Tseglytskyi @ 2014-08-28  7:58 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Julien Grall, Stefano Stabellini, xen-devel@lists.xen.org

Hi Ian

On Thu, Aug 28, 2014 at 4:07 AM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Wed, 2014-08-27 at 12:14 +0300, Andrii Tseglytskyi wrote:
>> Hi,
>>
>> I need to measure IRQ latency introduced by Xen: (GIC -> Xen IRQ
>> handler -> Dom0 IRQ handler)
>> I need to know how many time I spend in Xen IRQ handler.
>>
>> Can someone comment - is the following algorithm works:
>>
>> - in function xen/arch/arm/gic.c: gic_interrupt() store timer counter value:
>> xen_timer_val = READ_SYSREG64(CNTPCT_EL0) - READ_SYSREG64(CNTVOFF_EL2)
>>
>> - in any IRQ handler in dom0 store timer counter value:
>> dom0_timer_val = READ_SYSREG64(CNTPCT_EL0)
>>
>> - calculate time diff in nanoseconds:
>> time_diff_ns = ticks_to_ns(dom0_timer_val - xen_timer_val)
>>
>> Using this technique I measured display IRQ latency and got about
>> - 20 to 30 usec latency on 1 GHz MPU frequency
>> - 10 to 20 usec latencyon 1.5 GHz MPU frequency
>>
>> Are these numbers expectable?
>
> How do they compare to just native Linux?
>

Never measured native Linux. I just need to measure overhead, introduced by Xen.
So I take a time difference between git_interrupt() in Xen and
gic_interrupt() in Linux


> One thing to watch out for is the virtualised offset of dom0's timer, I
> can't remember if this is expected to be zero or non-zero, if the latter
> then you will need to subtract it out.
>

Yes, I do READ_SYSREG64(CNTVOFF_EL2), it is virtualized offset of dom0
timer, and it is not zero, always

>>  Can this technique be used?
>
> One thing which it misses is the time from the IRQ line going high to
> reaching gic_interrupt. I don't know if that is a concern for you.
>

I know. But the only way to measure above gic_interrupt() call is an
oscilloscope measurement, You already mentioned it below.

> Not sure what embedded folks usually do here (you probably know better
> than me). One technique I've heard of is to inject an interrupt via an
> external GPIO input and toggle another GPIO in response, then you can
> use a scope to observe the difference.

Yes. This is the most precise method. But I'm looking for pure software method.

Thank you for your comments,

Regards,
Andrii

>
> Ian.
>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Interrupt latency measurement technique
  2014-08-28  7:58   ` Andrii Tseglytskyi
@ 2014-08-28 17:47     ` Ian Campbell
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Campbell @ 2014-08-28 17:47 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, Stefano Stabellini, xen-devel@lists.xen.org

On Thu, 2014-08-28 at 10:58 +0300, Andrii Tseglytskyi wrote:
> Hi Ian
> 
> On Thu, Aug 28, 2014 at 4:07 AM, Ian Campbell <ian.campbell@citrix.com> wrote:
> > On Wed, 2014-08-27 at 12:14 +0300, Andrii Tseglytskyi wrote:
> >> Hi,
> >>
> >> I need to measure IRQ latency introduced by Xen: (GIC -> Xen IRQ
> >> handler -> Dom0 IRQ handler)
> >> I need to know how many time I spend in Xen IRQ handler.
> >>
> >> Can someone comment - is the following algorithm works:
> >>
> >> - in function xen/arch/arm/gic.c: gic_interrupt() store timer counter value:
> >> xen_timer_val = READ_SYSREG64(CNTPCT_EL0) - READ_SYSREG64(CNTVOFF_EL2)
> >>
> >> - in any IRQ handler in dom0 store timer counter value:
> >> dom0_timer_val = READ_SYSREG64(CNTPCT_EL0)
> >>
> >> - calculate time diff in nanoseconds:
> >> time_diff_ns = ticks_to_ns(dom0_timer_val - xen_timer_val)
> >>
> >> Using this technique I measured display IRQ latency and got about
> >> - 20 to 30 usec latency on 1 GHz MPU frequency
> >> - 10 to 20 usec latencyon 1.5 GHz MPU frequency
> >>
> >> Are these numbers expectable?
> >
> > How do they compare to just native Linux?
> >
> 
> Never measured native Linux. I just need to measure overhead, introduced by Xen.
> So I take a time difference between git_interrupt() in Xen and
> gic_interrupt() in Linux.

OK. I'm afraid I don't have much of a reference point for these numbers
so I don't know if 30usec is e.g .1% additional overhead or 100%.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Interrupt latency measurement technique
  2014-08-27 10:07 ` Andrii Tseglytskyi
@ 2014-08-29 18:07   ` Jonathan Fraser
  2014-08-29 20:11     ` Andrii Tseglytskyi
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Fraser @ 2014-08-29 18:07 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Stefano Stabellini, Jonathan Fraser, Ian Campbell, Julien Grall,
	xen-devel@lists.xen.org

I need to make the same measurements.
How did you pass the value of 'xen_timer_val' to the dom0 kernel?  I can
store it in a scratch hardware register in the SoC, but how did you do
it?

Thanks,
Jon


On Wed, 2014-08-27 at 13:07 +0300, Andrii Tseglytskyi wrote:
> Test setup is the following:
> 
> Platform - Jacinto6 (OMAP5) with two ARMv7 cores (Cortex A15), 1.5 Gb of RAM
> Hypervisor - Xen 4.4 stable
> Dom0 - Linux 3.8 running with 2 vcpus, 128 Mb RAM
> DomU was not launched during the test
> 
> Regards,
> Andrii
> 
> 
> On Wed, Aug 27, 2014 at 12:14 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
> > Hi,
> >
> > I need to measure IRQ latency introduced by Xen: (GIC -> Xen IRQ
> > handler -> Dom0 IRQ handler)
> > I need to know how many time I spend in Xen IRQ handler.
> >
> > Can someone comment - is the following algorithm works:
> >
> > - in function xen/arch/arm/gic.c: gic_interrupt() store timer counter value:
> > xen_timer_val = READ_SYSREG64(CNTPCT_EL0) - READ_SYSREG64(CNTVOFF_EL2)
> >
> > - in any IRQ handler in dom0 store timer counter value:
> > dom0_timer_val = READ_SYSREG64(CNTPCT_EL0)
> >
> > - calculate time diff in nanoseconds:
> > time_diff_ns = ticks_to_ns(dom0_timer_val - xen_timer_val)
> >
> > Using this technique I measured display IRQ latency and got about
> > - 20 to 30 usec latency on 1 GHz MPU frequency
> > - 10 to 20 usec latencyon 1.5 GHz MPU frequency
> >
> > Are these numbers expectable? Can this technique be used?
> >
> > Regards,
> > Andrii
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> 
> 
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC] Interrupt latency measurement technique
  2014-08-29 18:07   ` Jonathan Fraser
@ 2014-08-29 20:11     ` Andrii Tseglytskyi
  0 siblings, 0 replies; 7+ messages in thread
From: Andrii Tseglytskyi @ 2014-08-29 20:11 UTC (permalink / raw)
  To: Jonathan Fraser
  Cc: Julien Grall, Ian Campbell, Stefano Stabellini,
	xen-devel@lists.xen.org

Hi Jonatan

I didn't pass xen_timer_val to dom0. I stored time counters in the
static buffer in Xen, and after in dom0, and then compare them using
Exel.
During printing of buffer I can do with counters whatever I want -
convert them to nanosecond, etc

Let me show the code which does this in Xen:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 39e0f2f..340dfa5 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -847,12 +847,39 @@ static void do_sgi(struct cpu_user_regs *regs,
int othercpu, enum gic_sgi sgi)
     GICC[GICC_DIR] = sgi;
 }

+
+s_time_t ticks_to_ns(uint64_t ticks);
+
+static const u32 test_irq = 57; <-- IRQ, which I profiled (Display
IRQ on my platform)
+static u32 total_irqs;
+
+#define NUM_TICKS_TO_STORE  100
+struct irq_tick {
+    u64 cnt_tick;
+    u64 cnt_off;
+};
+
+static struct irq_tick irq_ticks[NUM_TICKS_TO_STORE]; <-- buffer to store ticks
+static u32 offs;
+
+static void print_irq_ticks(void)
+{
+    u32 i;
+    for ( i = 0; i < NUM_TICKS_TO_STORE; i++ )
+    {
+        printk("[%02u] irq %u tick -> %llu off -> %llu off_ns -> %llu
dom_tick %llu dom_tick_ns -> %llu cpu_khz %lu total %u\n",
+                i, test_irq, irq_ticks[i].cnt_tick,
irq_ticks[i].cnt_off, ticks_to_ns(irq_ticks[i].cnt_off),
+                irq_ticks[i].cnt_tick - irq_ticks[i].cnt_off,
+                ticks_to_ns(irq_ticks[i].cnt_tick -
irq_ticks[i].cnt_off), cpu_khz, total_irqs);
+    }
+}
+
 /* Accept an interrupt from the GIC and dispatch its handler */
 void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
 {
     uint32_t intack;
     unsigned int irq;
-
+    static u32 step;

     do  {
         intack = GICC[GICC_IAR];
@@ -860,6 +887,22 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)

         if ( likely(irq >= 16 && irq < 1021) )
         {
+            if ( irq == test_irq )
+            {
+                total_irqs++;
+                irq_ticks[step].cnt_tick = READ_SYSREG64(CNTPCT_EL0);
 <-- Timer base value
+                irq_ticks[step].cnt_off = READ_SYSREG64(CNTVOFF_EL2);
 <--Time offset for domain, dom0 in my case
+
+                step++;
+                if ( step % NUM_TICKS_TO_STORE == 0 )
+                {
+                    print_irq_ticks();
+                    step = 0;
+                }
+            }

CNTVOFF_EL2 register contains constant value, it is a difference
between tick in Xen and tick in dom0. If you subtract base tick value
and this offset you'll get tick value in dom0.
After, in dom0 you check CNTPCT_EL0 and that's it. You have 2 tick
values for dom0 - one from Xen, second from dom0 itself.

My diff for dom0 contains some overhead - I copy pasted some Xen code there:

diff --git a/drivers/video/omap2/dss/dispc-compat.c
b/drivers/video/omap2/dss/dispc-compat.c
index 4b31058..71c807e 100644
--- a/drivers/video/omap2/dss/dispc-compat.c
+++ b/drivers/video/omap2/dss/dispc-compat.c
@@ -255,6 +255,63 @@ static void print_irq_status(u32 status)
 #undef PIS
 }

+
+#define CNTPCT          p15,0,c14       /* Time counter value */
+#define CNTPCT_EL0      CNTPCT
+
+#define __CP64(r1, r2, coproc, opc, crm) coproc, opc, r1, r2, crm
+#define CP64(r, name...) __CP64(r, name)
+#define LOAD_CP64(r, name...)  "mrrc " __stringify(CP64(%r, %H##r, name)) ";"
+#define READ_CP64(name...) ({                                   \
+     register uint64_t _r;                                       \
+     asm volatile(LOAD_CP64(0, name) : "=r" (_r));               \
+     _r; })
+#define READ_SYSREG64(R...)     READ_CP64(R)
+
+static u64 muldiv64(u64 a, u32 b, u32 c)
+{
+#define mod_64(x, y) ((x) - (y) * div64_u64(x, y))
+       union {
+               u64 ll;
+               struct {
+                       u32 low, high;
+               } l;
+       } u, res;
+       u64 rl, rh;
+
+       u.ll = a;
+       rl = (u64)u.l.low * (u64)b;
+       rh = (u64)u.l.high * (u64)b;
+       rh += (rl >> 32);
+       res.l.high = div64_u64(rh, c);
+       res.l.low = div64_u64(((mod_64(rh, c) << 32) + (rl & 0xffffffff)), c);
+       return res.ll;
+}
+
+static const u32 cpu_khz = 6144;
+static inline s64 ticks_to_ns(uint64_t ticks)
+{
+#define SECONDS(_s)     ((s64)((_s)  * 1000000000ULL))
+
+    return muldiv64(ticks, SECONDS(1), 1000 * cpu_khz);
+}
+
+static const u32 test_irq = 57;
+static u32 total_irqs;
+
+#define NUM_TICKS_TO_STORE  100
+static u64 irq_ticks[NUM_TICKS_TO_STORE];
+
+static void print_irq_ticks(void)
+{
+    u32 i;
+    for ( i = 0; i < NUM_TICKS_TO_STORE; i++ )
+    {
+        printk("(Dom0)[%02u] irq %u tick -> %lld tick_ns %lld cpu_khz
%u total %u\n",
+                i, test_irq, irq_ticks[i], ticks_to_ns(irq_ticks[i]),
cpu_khz, total_irqs);
+    }
+}
+
 /* Called from dss.c. Note that we don't touch clocks here,
  * but we presume they are on because we got an IRQ. However,
  * an irq handler may turn the clocks off, so we may not have
@@ -267,6 +324,17 @@ static irqreturn_t omap_dispc_irq_handler(int
irq, void *arg)
        u32 unhandled_errors;
        struct omap_dispc_isr_data *isr_data;
        struct omap_dispc_isr_data registered_isr[DISPC_MAX_NR_ISRS];
+    static u32 step;
+
+    total_irqs++;
+    irq_ticks[step] = READ_SYSREG64(CNTPCT_EL0);
+
+    step++;
+    if ( step % NUM_TICKS_TO_STORE == 0 )
+    {
+        print_irq_ticks();
+        step = 0;
+    }


It would be nice to know whether this works for you.

Regards,
Andrii

On Fri, Aug 29, 2014 at 9:07 PM, Jonathan Fraser <jfraser@broadcom.com> wrote:
> I need to make the same measurements.
> How did you pass the value of 'xen_timer_val' to the dom0 kernel?  I can
> store it in a scratch hardware register in the SoC, but how did you do
> it?
>
> Thanks,
> Jon
>
>
> On Wed, 2014-08-27 at 13:07 +0300, Andrii Tseglytskyi wrote:
>> Test setup is the following:
>>
>> Platform - Jacinto6 (OMAP5) with two ARMv7 cores (Cortex A15), 1.5 Gb of RAM
>> Hypervisor - Xen 4.4 stable
>> Dom0 - Linux 3.8 running with 2 vcpus, 128 Mb RAM
>> DomU was not launched during the test
>>
>> Regards,
>> Andrii
>>
>>
>> On Wed, Aug 27, 2014 at 12:14 PM, Andrii Tseglytskyi
>> <andrii.tseglytskyi@globallogic.com> wrote:
>> > Hi,
>> >
>> > I need to measure IRQ latency introduced by Xen: (GIC -> Xen IRQ
>> > handler -> Dom0 IRQ handler)
>> > I need to know how many time I spend in Xen IRQ handler.
>> >
>> > Can someone comment - is the following algorithm works:
>> >
>> > - in function xen/arch/arm/gic.c: gic_interrupt() store timer counter value:
>> > xen_timer_val = READ_SYSREG64(CNTPCT_EL0) - READ_SYSREG64(CNTVOFF_EL2)
>> >
>> > - in any IRQ handler in dom0 store timer counter value:
>> > dom0_timer_val = READ_SYSREG64(CNTPCT_EL0)
>> >
>> > - calculate time diff in nanoseconds:
>> > time_diff_ns = ticks_to_ns(dom0_timer_val - xen_timer_val)
>> >
>> > Using this technique I measured display IRQ latency and got about
>> > - 20 to 30 usec latency on 1 GHz MPU frequency
>> > - 10 to 20 usec latencyon 1.5 GHz MPU frequency
>> >
>> > Are these numbers expectable? Can this technique be used?
>> >
>> > Regards,
>> > Andrii
>> >
>> > --
>> >
>> > Andrii Tseglytskyi | Embedded Dev
>> > GlobalLogic
>> > www.globallogic.com
>>
>>
>>
>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-08-29 20:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-27  9:14 [RFC] Interrupt latency measurement technique Andrii Tseglytskyi
2014-08-27 10:07 ` Andrii Tseglytskyi
2014-08-29 18:07   ` Jonathan Fraser
2014-08-29 20:11     ` Andrii Tseglytskyi
2014-08-28  1:07 ` Ian Campbell
2014-08-28  7:58   ` Andrii Tseglytskyi
2014-08-28 17:47     ` Ian Campbell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).