* ntp hangs odd-num domains
@ 2004-02-16 17:35 David Becker
  2004-02-16 17:42 ` Keir Fraser
  0 siblings, 1 reply; 6+ messages in thread
From: David Becker @ 2004-02-16 17:35 UTC (permalink / raw)
  To: xen-devel


This is kinda quirky. 

Sometimes my guests would hang after ntp started.  With some playing around
I determined this happens only in odd-numbered domains.  Even-numbered
domains run fine, but when ntpd starts in the odd domains, existing ssh
connections hang and I can't make new ssh connections.  The afflicted
domain does respond to ping and to 'xc_dom_control.py shutdown'.

This happens if I run the guests serially using the same root partition.

Domain 0 is not running ntp.  I suppose the clock is a shared resource
so I ought to only have DOM0 set it, and not the guests.

This is with the xeno-1.2.bk tree.


As for HIGHMEM4G, I am interested in working on it, but it will be a
couple of weeks before I have an opportunity.



-------------------------------------------------------
SF.Net is sponsored by: Speed Start Your Linux Apps Now.
Build and deploy apps & Web services for Linux with
a free DVD software kit from IBM. Click Now!
http://ads.osdn.com/?ad_id=1356&alloc_id=3438&op=click

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ntp hangs odd-num domains
  2004-02-16 17:35 ntp hangs odd-num domains David Becker
@ 2004-02-16 17:42 ` Keir Fraser
  2004-02-16 18:11   ` ntp hangs CPU1 David Becker
  0 siblings, 1 reply; 6+ messages in thread
From: Keir Fraser @ 2004-02-16 17:42 UTC (permalink / raw)
  To: David Becker; +Cc: xen-devel

> 
> This is kinda quirky. 
> 
> Sometimes my guests would hang after ntp started.  With some playing around
> I determined this happens only in odd-numbered domains.  Even-numbered
> domains run fine, but when ntpd starts in the odd domains, existing ssh
> connections hang and I can't make new ssh connections.  The afflicted
> domain does respond to ping and to 'xc_dom_control.py shutdown'.
> 
> This happens if I run the guests serially using the same root partition.
> 
> Domain 0 is not running ntp.  I suppose the clock is a shared resource
> so I ought to only have DOM0 set it, and not the guests.
> 
> This is with the xeno-1.2.bk tree.

Is the box a dual? It could be an SMP-related issue -- odd domains
would run on CPU1 by default.
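A minimal sketch of the default placement described above, assuming (this is not taken from the Xen source) that new domains are assigned to CPUs round-robin in creation order, which, with sequentially allocated domain IDs, reduces to the domain ID modulo the CPU count. The `default_cpu` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical model, not Xen code: assume domains are placed round-robin,
# so domain N starts on CPU (N mod ncpus).
default_cpu() {
    domid=$1
    ncpus=$2
    echo $(( domid % ncpus ))
}

# On a dual (ncpus=2), every odd domain ID maps to CPU1.
for dom in 1 2 3 4 5; do
    echo "domain $dom -> CPU$(default_cpu "$dom" 2)"
done
```

Under this model the symptom would track the CPU rather than the domain number, so pinning a domain to the other CPU is enough to tell the two explanations apart.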

Running NTP on non-privileged domains should work -- by default a
non-privileged domain will sync against Xen (and Xen is in turn synced
against ntpd in DOM0). If a non-priv domain runs an NTP daemon then it
essentially detaches itself from Xen wall-time.

 -- Keir



* Re: ntp hangs CPU1
  2004-02-16 17:42 ` Keir Fraser
@ 2004-02-16 18:11   ` David Becker
  2004-02-16 18:19     ` Keir Fraser
  0 siblings, 1 reply; 6+ messages in thread
From: David Becker @ 2004-02-16 18:11 UTC (permalink / raw)
  To: xen-devel


" Is the box a dual? It could be an SMP-related issue -- odd domains
" would run on CPU1 by default.

Aha, yes, it is a dual.  And, sure enough, I can start/stop ntp fine on CPU0,
then pin the domain to CPU1 and hang it by starting ntp.





* Re: ntp hangs CPU1
  2004-02-16 18:11   ` ntp hangs CPU1 David Becker
@ 2004-02-16 18:19     ` Keir Fraser
  2004-02-16 18:35       ` David Becker
  0 siblings, 1 reply; 6+ messages in thread
From: Keir Fraser @ 2004-02-16 18:19 UTC (permalink / raw)
  To: David Becker; +Cc: xen-devel

> 
> " Is the box a dual? It could be an SMP-related issue -- odd domains
> " would run on CPU1 by default.
> 
> Aha, yes, it is a dual.  And, sure enough, I can start/stop ntp fine on CPU0,
> then pin the domain to CPU1 and hang it by starting ntp.

Is this on Redhat 9? I can have a go at reproducing it tomorrow -- if
I can do that then it shouldn't be that hard to fix.

 -- Keir



* Re: ntp hangs CPU1
  2004-02-16 18:19     ` Keir Fraser
@ 2004-02-16 18:35       ` David Becker
  2004-02-17  2:53         ` Ian Pratt
  0 siblings, 1 reply; 6+ messages in thread
From: David Becker @ 2004-02-16 18:35 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel


" Is this on Redhat 9?

I'm running debian/testing.  That particular guest was not entirely
up-to-date, running  ntp-4.0.99g / libc-2.3.1

So I updated it to ntp-4.1.2a / libc-2.3.2  and the problem has gone away.
Now I can restart ntpd on either cpu.




* Re: ntp hangs CPU1
  2004-02-16 18:35       ` David Becker
@ 2004-02-17  2:53         ` Ian Pratt
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Pratt @ 2004-02-17  2:53 UTC (permalink / raw)
  To: David Becker; +Cc: Keir Fraser, xen-devel, Ian.Pratt


> I'm running debian/testing.  That particular guest was not entirely
> up-to-date, running  ntp-4.0.99g / libc-2.3.1
> 
> So I updated it to ntp-4.1.2a / libc-2.3.2  and the problem has gone away.
> Now I can restart ntpd on either cpu.

Bizarre. 

We mostly use RH9, which ships ntp-4.1.2 and libc-2.3.2.

It's hard to understand why the CPU the domain is running on
makes a difference with ntp-4.0.99. There's obviously a
Xen/Xenolinux bug that this particular version tickles...

If you've got a serial line connected, please can you try hitting
'q' when you've got a couple of domains in the dead state. That
should at least tell us whether they're looping or waiting for an
event.

It might be worth doing the following to see if the system calls
are any different between the two ntp versions, and also to see
what the system call before the hang is:
 strace ntpd [args] >/dev/console 2>&1
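Comparing two full traces by eye is tedious, so here is a small sketch of one way to reduce each trace to a list of syscall names, assuming the trace has been saved to a file (for example by capturing the console output of the command above, or with strace's `-o` option for the working version). The `syscall_names`, `last_syscall` and `compare_traces` helpers are hypothetical; signal notices and `<... resumed>` continuation lines are simply skipped.

```shell
#!/bin/sh
# Hypothetical helpers for comparing strace logs of two ntpd versions.

# syscall_names FILE: print the syscall name from each strace line,
# dropping an optional "[pid NNNN] " or leading "NNNN " PID prefix and
# skipping lines that are not syscall entries (signals, exits,
# "<... resumed>" continuations).
syscall_names() {
    sed -n \
        -e 's/^\[pid *[0-9]*] //' \
        -e 's/^[0-9][0-9]* *//' \
        -e 's/^\([a-z_][a-z_0-9]*\)(.*/\1/p' \
        "$1"
}

# last_syscall FILE: the final syscall entry, i.e. the call ntpd was
# in when the trace output stopped.
last_syscall() {
    syscall_names "$1" | tail -n 1
}

# compare_traces OLD NEW: diff the syscall sequences of two traces.
# Returns diff's exit status (0 if the sequences are identical).
compare_traces() {
    old_names=$(mktemp) || return 1
    new_names=$(mktemp) || { rm -f "$old_names"; return 1; }
    syscall_names "$1" > "$old_names"
    syscall_names "$2" > "$new_names"
    diff "$old_names" "$new_names"
    status=$?
    rm -f "$old_names" "$new_names"
    return $status
}
```

For example, `compare_traces old.trace new.trace` (hypothetical file names) shows where the two syscall sequences diverge, and `last_syscall old.trace` names the call in progress when the hang occurred.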


Cheers,
Ian




end of thread

Thread overview: 6+ messages
2004-02-16 17:35 ntp hangs odd-num domains David Becker
2004-02-16 17:42 ` Keir Fraser
2004-02-16 18:11   ` ntp hangs CPU1 David Becker
2004-02-16 18:19     ` Keir Fraser
2004-02-16 18:35       ` David Becker
2004-02-17  2:53         ` Ian Pratt
