All of lore.kernel.org
 help / color / mirror / Atom feed
* Soft lockup/Time went backwards in latest unstable
@ 2007-02-23 19:10 Derek Murray
  2007-02-23 23:08 ` Keir Fraser
  0 siblings, 1 reply; 6+ messages in thread
From: Derek Murray @ 2007-02-23 19:10 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2625 bytes --]

I'm running Xen on a Lenovo Thinkpad T60p, based on an Intel Core Duo  
T2500 processor. Since upgrading to the latest unstable, the console  
is filled with "Time went backwards" messages. An example:

Timer ISR/1: Time went backwards: delta=-52249969 delta_cpu=67750031  
shadow=197099583154 off=168169166 processed=197320000000  
cpu_processed=197200000000
0: 197310000000
1: 197200000000
Timer ISR/1: Time went backwards: delta=-76271442 delta_cpu=73728558  
shadow=197099583154 off=224147323 processed=197400000000  
cpu_processed=197250000000
0: 197390000000
1: 197250000000
Timer ISR/1: Time went backwards: delta=-90285138 delta_cpu=69714862  
shadow=197099583154 off=280133627 processed=197470000000  
cpu_processed=197310000000
0: 197460000000
1: 197310000000

In addition, I noticed a soft lockup, with the following stack trace:

BUG: soft lockup detected on CPU#0!
[<c014ef57>] softlockup_tick+0x97/0xd0
[<c01097ea>] timer_interrupt+0x2fa/0x6b0
[<ee06a875>] sd_rw_intr+0x75/0x2e0 [sd_mod]
[<c014f2c3>] handle_IRQ_event+0x33/0xa0
[<c014f3d8>] __do_IRQ+0xa8/0x120
[<c010733f>] do_IRQ+0x3f/0xa0
[<c0244b4e>] evtchn_do_upcall+0xbe/0x100
[<c010580d>] hypervisor_callback+0x3d/0x48
[<c020007b>] cfb_imageblit+0x10b/0x530
[<c0243f5a>] force_evtchn_callback+0xa/0x10
[<ee07c63f>] acpi_processor_idle+0x26d/0x422 [processor]
[<ee07c3d2>] acpi_processor_idle+0x0/0x422 [processor]
[<c01035a1>] cpu_idle+0x71/0xd0
[<c03b482d>] start_kernel+0x39d/0x480
[<c03b4220>] unknown_bootoption+0x0/0x270

and

BUG: soft lockup detected on CPU#1!
[<c014ef57>] softlockup_tick+0x97/0xd0
[<c01097ea>] timer_interrupt+0x2fa/0x6b0
[<c014f2c3>] handle_IRQ_event+0x33/0xa0
[<c014f3d8>] __do_IRQ+0xa8/0x120
[<c010733f>] do_IRQ+0x3f/0xa0
[<c0107344>] do_IRQ+0x44/0xa0
[<c0244b4e>] evtchn_do_upcall+0xbe/0x100
[<c010580d>] hypervisor_callback+0x3d/0x48
[<c0243f5a>] force_evtchn_callback+0xa/0x10
[<ee07c63f>] acpi_processor_idle+0x26d/0x422 [processor]
[<ee07c3d2>] acpi_processor_idle+0x0/0x422 [processor]
[<c01035a1>] cpu_idle+0x71/0xd0

It appears that this issue was introduced in Xen 3.0.4, as this  
exhibits the same issue. 3.0.3 works fine. The version of unstable  
that I was using previously (which did not have this issue) worked  
with kernel version 2.6.16.29; 3.0.4 comes with 2.6.16.33. Could the  
problem have been introduced in the move from 2.6.16.29 to 2.6.16.33  
(or somewhere in between)?

I'm not sure how to go about handling this apparent bug, though I am  
prepared to offer any assistance necessary in fixing it. I've  
attached my xm dmesg, in case that's any help.

Thanks in advance.

Regards,

Derek Murray.

[-- Attachment #2: xen_dmesg --]
[-- Type: application/octet-stream, Size: 3380 bytes --]

 __  __            _____  ___                     _        _     _      
 \ \/ /___ _ __   |___ / / _ \    _   _ _ __  ___| |_ __ _| |__ | | ___ 
  \  // _ \ '_ \    |_ \| | | |__| | | | '_ \/ __| __/ _` | '_ \| |/ _ \
  /  \  __/ | | |  ___) | |_| |__| |_| | | | \__ \ || (_| | |_) | |  __/
 /_/\_\___|_| |_| |____(_)___/    \__,_|_| |_|___/\__\__,_|_.__/|_|\___|
                                                                        
 http://www.cl.cam.ac.uk/netos/xen
 University of Cambridge Computer Laboratory

 Xen version 3.0-unstable (root@site) (gcc version 4.1.0 (SUSE Linux)) Fri Feb 23 12:15:08 GMT 2007
 Latest ChangeSet: unavailable

(XEN) Command line: /boot/xen.gz 
(XEN)  0000000000000000 - 000000000009f000 (usable)
(XEN)  000000000009f000 - 00000000000a0000 (reserved)
(XEN)  00000000000d2000 - 00000000000d4000 (reserved)
(XEN)  00000000000dc000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000003fed0000 (usable)
(XEN)  000000003fed0000 - 000000003fee3000 (ACPI data)
(XEN)  000000003fee3000 - 000000003ff00000 (ACPI NVS)
(XEN)  000000003ff00000 - 0000000040000000 (reserved)
(XEN)  00000000f0000000 - 00000000f4000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec10000 (reserved)
(XEN)  00000000fed00000 - 00000000fed00400 (reserved)
(XEN)  00000000fed14000 - 00000000fed1a000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed90000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff800000 - 0000000100000000 (reserved)
(XEN) System RAM: 1022MB (1046972kB)
(XEN) Domain heap initialised: DMA width 30 bits
(XEN) Xen heap: 10MB (10248kB)
(XEN) PAE enabled, limit: 16 GB
(XEN) Processor #0 6:14 APIC version 20
(XEN) Processor #1 6:14 APIC version 20
(XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 1995.036 MHz processor.
(XEN) VMXON is done
(XEN) CPU0: Intel Genuine Intel(R) CPU           T2500  @ 2.00GHz stepping 08
(XEN) Mapping cpu 0 to node 255
(XEN) Booting processor 1/1 eip 90000
(XEN) Mapping cpu 1 to node 255
(XEN) VMXON is done
(XEN) CPU1: Intel Genuine Intel(R) CPU           T2500  @ 2.00GHz stepping 08
(XEN) Total of 2 processors activated.
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using new ACK method
(XEN) Platform timer is 14.318MHz HPET
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 32-bit, PAE, lsb
(XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0xc0100000 -> 0xc044653c
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   000000003e000000->000000003f000000 (236579 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: c0100000->c044653c
(XEN)  Init. ramdisk: c0447000->c0974600
(XEN)  Phys-Mach map: c0975000->c0a6008c
(XEN)  Start info:    c0a61000->c0a6146c
(XEN)  Page tables:   c0a62000->c0a6d000
(XEN)  Boot stack:    c0a6d000->c0a6e000
(XEN)  TOTAL:         c0000000->c0c00000
(XEN)  ENTRY ADDRESS: c0100000
(XEN) Dom0 has maximum 2 VCPUs
(XEN) Initrd len 0x52d600, start at 0xc0447000
(XEN) Scrubbing Free RAM: done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).


[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Soft lockup/Time went backwards in latest unstable
  2007-02-23 19:10 Soft lockup/Time went backwards in latest unstable Derek Murray
@ 2007-02-23 23:08 ` Keir Fraser
  2007-02-26 12:39   ` Derek Murray
  0 siblings, 1 reply; 6+ messages in thread
From: Keir Fraser @ 2007-02-23 23:08 UTC (permalink / raw)
  To: Derek Murray, xen-devel

On 23/2/07 19:10, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote:

> I'm not sure how to go about handling this apparent bug, though I am
> prepared to offer any assistance necessary in fixing it. I've
> attached my xm dmesg, in case that's any help.

You should be able to run a 3.0.3 dom0 on 3.0.4 Xen and vice versa. By
trying both combinations you should be able to work out whether it is the
Xen upgrade or the Linux upgrade which introduces the problem.

 -- Keir

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Soft lockup/Time went backwards in latest unstable
  2007-02-23 23:08 ` Keir Fraser
@ 2007-02-26 12:39   ` Derek Murray
  2007-02-26 13:22     ` Keir Fraser
  0 siblings, 1 reply; 6+ messages in thread
From: Derek Murray @ 2007-02-26 12:39 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel


On 23 Feb 2007, at 23:08, Keir Fraser wrote:

> You should be able to run a 3.0.3 dom0 on 3.0.4 Xen and vice versa. By
> trying both combinations you should be able to work out whether it  
> is the
> Xen upgrade or the Linux upgrade which introduces the problem.

It appears I was being hasty in blaming 3.0.4 for introducing the  
bug. I tested the following configurations:

Xen 3.0.3_0: works with 3.0.3 kernel (2.6.16.29-xen) and 3.0.4 kernel  
(2.6.16.33-xen). Fails with -unstable kernel (2.6.18).
Xen 3.0.4_1: works with 3.0.3 and 3.0.4 kernels. Fails with -unstable  
kernel.
Xen unstable: works with 3.0.3 and 3.0.4 kernels. Fails with - 
unstable kernel.

So, 3.0.3 and 3.0.4 work as distributed, but the latest -unstable is  
broken.

The reason I attributed the blame to 3.0.4 was because I had seen the  
same bug in -unstable during the 3.0.3-3.0.4 interregnum; however,  
this bug seems to have been fixed for the 3.0.4 release, and re- 
emerged since with the transition to 2.6.18.

Is there a pattern for how these bugs have been fixed in the past, so  
that I could go about trying to make a patch?

Regards,

Derek Murray.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Soft lockup/Time went backwards in latest unstable
  2007-02-26 12:39   ` Derek Murray
@ 2007-02-26 13:22     ` Keir Fraser
  2007-02-28 11:25       ` Derek Murray
  0 siblings, 1 reply; 6+ messages in thread
From: Keir Fraser @ 2007-02-26 13:22 UTC (permalink / raw)
  To: Derek Murray; +Cc: xen-devel




On 26/2/07 12:39, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote:

> The reason I attributed the blame to 3.0.4 was because I had seen the
> same bug in -unstable during the 3.0.3-3.0.4 interregnum; however,
> this bug seems to have been fixed for the 3.0.4 release, and re-
> emerged since with the transition to 2.6.18.
> 
> Is there a pattern for how these bugs have been fixed in the past, so
> that I could go about trying to make a patch?

Unfortunately not. There were big changes to Linux's time handling between
2.6.16 and 2.6.18 so bugs may hav ebeen introduced while porting out own
timer code. It might be worth diffing the file time-xen.c from working and
non-working Linux kernels. It's weird that the failure mode is bad on the
T60p yet noone else has reported this bug, nor has our testing reproduced
it. :-(

 -- Keir

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Soft lockup/Time went backwards in latest unstable
  2007-02-26 13:22     ` Keir Fraser
@ 2007-02-28 11:25       ` Derek Murray
  2007-02-28 12:03         ` Keir Fraser
  0 siblings, 1 reply; 6+ messages in thread
From: Derek Murray @ 2007-02-28 11:25 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On 26 Feb 2007, at 13:22, Keir Fraser wrote:

> There were big changes to Linux's time handling between
> 2.6.16 and 2.6.18 so bugs may hav ebeen introduced while porting  
> out own
> timer code. It might be worth diffing the file time-xen.c from  
> working and
> non-working Linux kernels. It's weird that the failure mode is bad  
> on the
> T60p yet noone else has reported this bug, nor has our testing  
> reproduced
> it. :-(

Some more information on the bug hunt in progress, in case anyone  
with more experience has any ideas:

* Changeset 13508 (i.e. the last one before the transition to a  
2.6.17 kernel) fails in the same way as the current unstable, but it  
uses a 2.6.16.33 kernel.
* Changeset 13213 (which introduced the idle=poll option in the  
kernel when running on Xen) also fails with a 2.6.16.33 kernel.
* Changeset 13216 fails to boot at all (or, at least, locks up for >=  
30 seconds on boot).
* Changeset 13217 ("Make sure we always have a sensible idle  
function; this fixed problems") fails with time going backwards.
* Changeset 13212, however, works.

The only relevant changes between changesets 13212 and 13213 apply to  
the file linux-2.6-xen-sparse/arch/i386/kernel/process-xen.c (and the  
corresponding x86_64 file), to which a poll_idle() function has been  
added. I don't quite understand the significance of this, but could  
it be responsible for the bug that I'm seeing?

Regards,

Derek Murray.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Soft lockup/Time went backwards in latest unstable
  2007-02-28 11:25       ` Derek Murray
@ 2007-02-28 12:03         ` Keir Fraser
  0 siblings, 0 replies; 6+ messages in thread
From: Keir Fraser @ 2007-02-28 12:03 UTC (permalink / raw)
  To: Derek Murray, Keir Fraser; +Cc: xen-devel

On 28/2/07 11:25, "Derek Murray" <Derek.Murray@cl.cam.ac.uk> wrote:

> The only relevant changes between changesets 13212 and 13213 apply to
> the file linux-2.6-xen-sparse/arch/i386/kernel/process-xen.c (and the
> corresponding x86_64 file), to which a poll_idle() function has been
> added. I don't quite understand the significance of this, but could
> it be responsible for the bug that I'm seeing?

You are almost certainly picking up acpi_processor_idle() or apm_cpu_idle().
You might want to check this. I don't think any good could come of this. :-)

I'll have a think about the best fix...

 -- Keir

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-02-28 12:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-02-23 19:10 Soft lockup/Time went backwards in latest unstable Derek Murray
2007-02-23 23:08 ` Keir Fraser
2007-02-26 12:39   ` Derek Murray
2007-02-26 13:22     ` Keir Fraser
2007-02-28 11:25       ` Derek Murray
2007-02-28 12:03         ` Keir Fraser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.