All of lore.kernel.org
 help / color / mirror / Atom feed
* Very large value from get_nsec_offset() in timer_interrupt
@ 2006-04-20 22:47 Ryan Harper
  2006-04-21 16:16 ` Ryan Harper
  0 siblings, 1 reply; 5+ messages in thread
From: Ryan Harper @ 2006-04-20 22:47 UTC (permalink / raw)
  To: xen-devel

I've been trying to debug live-locking of some 64-bit paravirt guests
for the past week and after finally getting gdb to give a backtrace,
I've been able to do some investigation.

The guest makes no forward progress as seen from the user's perspective
(console provides no output), but the kernel is always running and
consuming cpu time according to xm list.

Examining the vcpu context I was able to glean that it was usually
processing time of some sort, either in timer_interrupt, or do_timer
(more often).  After looking at some xentrace records, everything seemed
to be fine from Xen's perspective in that I was seeing timer interrupts
on the proper periodic schedule.  This left me with the guest.  

I attached to a live-locked guest and got the following backtrace:

(gdb) bt
#0  0xffffffff8014e296 in softlockup_tick (regs=0xffff88002efdfd48) at kernel/softlockup.c:52
#1  0xffffffff80134cb9 in do_timer (regs=0xffff88002efdfd48) at kernel/timer.c:947
#2  0xffffffff8010f325 in timer_interrupt (irq=788397384, dev_id=0x989680, regs=0xffff88002efdfd48)
    at arch/x86_64/kernel/../../i386/kernel/time-xen.c:674
#3  0xffffffff8014e5b9 in handle_IRQ_event (irq=256, regs=0xffff88002efdfd48, action=0xffff880001dfe600)
    at kernel/irq/handle.c:88
#4  0xffffffff8014e6b2 in __do_IRQ (irq=256, regs=0xffff88002efdfd48) at kernel/irq/handle.c:173
#5  0xffffffff8010d6ae in do_IRQ (regs=0xffff88002efdfd48) at arch/x86_64/kernel/irq-xen.c:105
#6  0xffffffff80249c50 in evtchn_do_upcall (regs=0xffff88002efdfd48) at drivers/xen/core/evtchn.c:215
#7  0xffffffff8010b87e in do_hypervisor_callback ()
#8  0xffff88002efdfd48 in ?? ()
#9  0x0000000000000000 in ?? ()

Jumping into frame 2:

(gdb) frame 2
#2  0xffffffff8010f325 in timer_interrupt (irq=788397384, dev_id=0x989680, regs=0xffff88002efdfd48)
    at arch/x86_64/kernel/../../i386/kernel/time-xen.c:674
674                     do_timer(regs);


Which is in this loop (with some of my hackery):
(gdb) list
669             ticks = 0ULL;
670             while (delta >= NS_PER_TICK) {
671                     ticks++;
672                     delta -= NS_PER_TICK;
673                     processed_system_time += NS_PER_TICK;
674                     do_timer(regs);
675             }

Checking out delta:

(gdb) p delta
$5 = 6139533539238523629

>From the source code, NS_PER_TICK is 1000000000LL/HZ.

With those values, it is no surprise why the guest is always busy.
Adding some more debugging where we set delta, I captured the two values
used to create delta.

(gdb) list timer_interrupt
610     s64 ns_offset;
611     u64 ticks;
612     u64 shadow_system_timestamp;
613
614     irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
615     {
616             s64 delta, delta_cpu, stolen, blocked;
617             u64 sched_time;
618             int i, cpu = smp_processor_id();
619             struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
(gdb)
620             struct vcpu_runstate_info *runstate = &per_cpu(runstate, cpu);
621
622             write_seqlock(&xtime_lock);
623
624             do {
625                     get_time_values_from_xen();
626
627                     /* Obtain a consistent snapshot of elapsed wallclock cycles. */
628                     shadow_system_timestamp = shadow->system_timestamp;
629                     ns_offset = get_nsec_offset(shadow);
(gdb)
630                     //delta = delta_cpu =
631                     //      shadow->system_timestamp + get_nsec_offset(shadow);
632                     delta = delta_cpu = shadow_system_timestamp + ns_offset;
633                     delta     -= processed_system_time;
634                     delta_cpu -= per_cpu(processed_system_time, cpu);

The value of the system_timestap and the offset are:
(gdb) p shadow_system_timestamp
$6 = 5295129204996
(gdb) p ns_offset
$7 = 6148883847633798812

ns_offset is calculated from get_nsec_offset().  I'm going to dig
a little further, but I wanted to get what I'm seeing out there.


-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Very large value from get_nsec_offset() in timer_interrupt
  2006-04-20 22:47 Very large value from get_nsec_offset() in timer_interrupt Ryan Harper
@ 2006-04-21 16:16 ` Ryan Harper
  2006-04-21 19:53   ` [PATCH] update_vcpu_system_time work-around Ryan Harper
  0 siblings, 1 reply; 5+ messages in thread
From: Ryan Harper @ 2006-04-21 16:16 UTC (permalink / raw)
  To: xen-devel

* Ryan Harper <ryanh@us.ibm.com> [2006-04-20 17:47]:
> 
> ns_offset is calculated from get_nsec_offset().  I'm going to dig
> a little further, but I wanted to get what I'm seeing out there.

static u64 get_nsec_offset(struct shadow_time_info *shadow)
{
   u64 now, delta;
   rdtscll(now);
   delta = now - shadow->tsc_timestamp;
   return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift);
}

For some currently unknown reason, now is < shadow->tsc_timestamp;  This
results in a very large delta value, which is scaled up even larger.

The output from the multi-cpu clock info dump doesn't show anything
bogus:

(XEN) Min = 68672793737292 ; Max = 68672793747633 ; Diff = 10341 (10 microseconds)
(XEN) Min = 68899728719792 ; Max = 68899728727283 ; Diff = 7491 (7 microseconds)

Any thoughts on why the shadow copy of the tsc_timestamp would be > than
the value returned from reading the tsc would be helpful.

Very likely this is related to the hardware, this is a 4-node x460
32-way.  I've not reproduced this on anything but multi-node setups.  

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] update_vcpu_system_time work-around
  2006-04-21 16:16 ` Ryan Harper
@ 2006-04-21 19:53   ` Ryan Harper
  2006-04-22  9:30     ` Keir Fraser
  0 siblings, 1 reply; 5+ messages in thread
From: Ryan Harper @ 2006-04-21 19:53 UTC (permalink / raw)
  To: xen-devel

* Ryan Harper <ryanh@us.ibm.com> [2006-04-21 14:03]:
> * Ryan Harper <ryanh@us.ibm.com> [2006-04-20 17:47]:
> > 
> > ns_offset is calculated from get_nsec_offset().  I'm going to dig
> > a little further, but I wanted to get what I'm seeing out there.
> 
> static u64 get_nsec_offset(struct shadow_time_info *shadow)
> {
>    u64 now, delta;
>    rdtscll(now);
>    delta = now - shadow->tsc_timestamp;
>    return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift);
> }
> 
> For some currently unknown reason, now is < shadow->tsc_timestamp;  This
> results in a very large delta value, which is scaled up even larger.
> 
> The output from the multi-cpu clock info dump doesn't show anything
> bogus:
> 
> (XEN) Min = 68672793737292 ; Max = 68672793747633 ; Diff = 10341 (10 microseconds)
> (XEN) Min = 68899728719792 ; Max = 68899728727283 ; Diff = 7491 (7 microseconds)
> 
> Any thoughts on why the shadow copy of the tsc_timestamp would be > than
> the value returned from reading the tsc would be helpful.
> 
> Very likely this is related to the hardware, this is a 4-node x460
> 32-way.  I've not reproduced this on anything but multi-node setups.  

I've used this patch to work-around an issue that I don't have root
cause on, but hopefully should help shed some light on the real issue.
When I launch a domain with a cpus string which invokes
set_vcpu_affinity() calls, the result is that when the guest time is
updated, the wrong cpu's tsc is being used.  This leads to the guest
getting calculating a bogus delta value when the guest processor's tsc
is slower than the tsc of the processor xen grabs the shadow timestamp
from.  

Launching the following domain:

[root@x460-3 ~]# xm create /tmp/dom1.cfg -c -n
Using config file "/tmp/dom1.cfg".
(vm
    (name dom1)
    (memory 768)
    (ssidref 131074)
    (on_poweroff destroy)
    (on_reboot destroy)
    (on_crash destroy)
    (vcpus 2)
    (cpus '2,3')
    (image
        (linux
            (kernel /boot/vmlinuz-2.6.16-xenU-smp)
            (root '/dev/sda1 ro')
            (args 4)
        )
    )
    (device (vbd (uname phy:/dev/virt-blkdev-backend/dom1) (dev sda1) (mode w)))
    (device (vbd (uname phy:/dev/sdc1) (dev sda2) (mode r)))
    (device (vif))
)

[root@x460-3 ~]# xm vcpu-list 1
Name                              ID  VCPU  CPU  State  Time(s)  CPU Affinity
dom1                               1     0    2   -b-      13.6  2
dom1                               1     1    3   -b-       7.4  3


Here is some output I see on the multi-node box console where I was
encountering the bugus delta values:

(XEN) ACK! Comparing DOM1 VCPU0 's tsc_timestamp to CPU31 instead of CPU2, fixing...
(XEN) ACK! Comparing DOM1 VCPU0 's tsc_timestamp to CPU31 instead of CPU2, fixing...
(XEN) ACK! Comparing DOM1 VCPU0 's tsc_timestamp to CPU31 instead of CPU2, fixing...
(XEN) ACK! Comparing DOM1 VCPU0 's tsc_timestamp to CPU31 instead of CPU2, fixing...
(XEN) ACK! Comparing DOM1 VCPU1 's tsc_timestamp to CPU0 instead of CPU3, fixing...
(XEN) ACK! Comparing DOM1 VCPU1 's tsc_timestamp to CPU0 instead of CPU3, fixing...
(XEN) ACK! Comparing DOM1 VCPU0 's tsc_timestamp to CPU31 instead of CPU2, fixing...
(XEN) ACK! Updating DOM1 VCPU0 's tsc_timestamp from CPU31 instead of CPU2, fixing...
(XEN) ACK! Comparing DOM1 VCPU1 's tsc_timestamp to CPU0 instead of CPU3, fixing...
(XEN) ACK! Updating DOM1 VCPU1 's tsc_timestamp from CPU0 instead of CPU3, fixing...
(XEN) ACK! Comparing DOM1 VCPU0 's tsc_timestamp to CPU31 instead of CPU2, fixing...

This work-around is only masking the real issue.  I'm hoping someone
else can help me track down why update_vcpu_system_time() is being
called from the wrong CPU after pinning operations.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com


diffstat output:
 time.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
---
diff -r 65894fff3649 xen/arch/x86/time.c
--- a/xen/arch/x86/time.c	Fri Apr 21 13:03:07 2006
+++ b/xen/arch/x86/time.c	Fri Apr 21 21:38:07 2006
@@ -672,10 +672,18 @@
 
 static inline void __update_vcpu_system_time(struct vcpu *v)
 {
+    unsigned int cpu = smp_processor_id();
     struct cpu_time       *t;
     struct vcpu_time_info *u;
 
-    t = &cpu_time[smp_processor_id()];
+    if ( v->processor != cpu ) {
+        printk("ACK! Updating DOM%d VCPU%d 's tsc_timestamp from "
+               "CPU%d instead of CPU%d, fixing...\n", 
+               v->domain->domain_id, v->vcpu_id, cpu, v->processor);
+        cpu = v->processor;
+    }
+
+    t = &cpu_time[cpu];
     u = &v->domain->shared_info->vcpu_info[v->vcpu_id].time;
 
     version_update_begin(&u->version);
@@ -690,8 +698,15 @@
 
 void update_vcpu_system_time(struct vcpu *v)
 {
+    unsigned int cpu = smp_processor_id();
+    if ( v->processor != cpu ) {
+        printk("ACK! Comparing DOM%d VCPU%d 's tsc_timestamp to "
+               "CPU%d instead of CPU%d, fixing...\n", 
+               v->domain->domain_id, v->vcpu_id, cpu, v->processor);
+        cpu = v->processor;
+    }
     if ( v->domain->shared_info->vcpu_info[v->vcpu_id].time.tsc_timestamp != 
-         cpu_time[smp_processor_id()].local_tsc_stamp )
+         cpu_time[cpu].local_tsc_stamp )
         __update_vcpu_system_time(v);
 }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] update_vcpu_system_time work-around
  2006-04-21 19:53   ` [PATCH] update_vcpu_system_time work-around Ryan Harper
@ 2006-04-22  9:30     ` Keir Fraser
  2006-04-24 15:06       ` Ryan Harper
  0 siblings, 1 reply; 5+ messages in thread
From: Keir Fraser @ 2006-04-22  9:30 UTC (permalink / raw)
  To: Ryan Harper; +Cc: xen-devel


On 21 Apr 2006, at 20:53, Ryan Harper wrote:

> This work-around is only masking the real issue.  I'm hoping someone
> else can help me track down why update_vcpu_system_time() is being
> called from the wrong CPU after pinning operations.

Looks like dom_timer_fn() could cause this. The 
update_vcpu_system_time() there is unnecessary anyway. I'll check in a 
fix.

  -- Keir

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] update_vcpu_system_time work-around
  2006-04-22  9:30     ` Keir Fraser
@ 2006-04-24 15:06       ` Ryan Harper
  0 siblings, 0 replies; 5+ messages in thread
From: Ryan Harper @ 2006-04-24 15:06 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

* Keir Fraser <Keir.Fraser@cl.cam.ac.uk> [2006-04-22 04:35]:
> 
> On 21 Apr 2006, at 20:53, Ryan Harper wrote:
> 
> >This work-around is only masking the real issue.  I'm hoping someone
> >else can help me track down why update_vcpu_system_time() is being
> >called from the wrong CPU after pinning operations.
> 
> Looks like dom_timer_fn() could cause this. The 
> update_vcpu_system_time() there is unnecessary anyway. I'll check in a 
> fix.

That fixed it.  Thanks Keir.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2006-04-24 15:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-20 22:47 Very large value from get_nsec_offset() in timer_interrupt Ryan Harper
2006-04-21 16:16 ` Ryan Harper
2006-04-21 19:53   ` [PATCH] update_vcpu_system_time work-around Ryan Harper
2006-04-22  9:30     ` Keir Fraser
2006-04-24 15:06       ` Ryan Harper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.