All of lore.kernel.org
 help / color / mirror / Atom feed
* [uml-devel] actually useful backtrace from a CPU-chewing hang
@ 2008-04-25 19:59 Nix
  2008-04-25 22:06 ` Nix
  0 siblings, 1 reply; 27+ messages in thread
From: Nix @ 2008-04-25 19:59 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

I suspect this can go wrong anywhere, but it happens to have been a
CBQ-triggered gettimeofday() while sending an arp that did it. (My ADSL
router pretty much bombs the poor damn machine with ARP packets all the
time.)

#0  getnstimeofday (ts=0x8217d10) at include/linux/time.h:182
#1  0x080824b4 in ktime_get_ts (ts=0x8217d10) at kernel/hrtimer.c:122
#2  0x0808244b in ktime_get () at kernel/hrtimer.c:58
#3  0x08148d93 in cbq_dequeue (sch=0xdde3800) at include/net/pkt_sched.h:53
#4  0x08143355 in __qdisc_run (dev=0xded1800) at net/sched/sch_generic.c:85
#5  0x08137991 in dev_queue_xmit (skb=0xdc89380) at include/net/pkt_sched.h:89
#6  0x08185537 in arp_xmit (skb=0xdc89380) at net/ipv4/arp.c:668
#7  0x08185574 in arp_send (type=1, ptype=2054, dest_ip=3457507285, dev=0xded1800, src_ip=2668538048, dest_hw=0xdd65244 "", src_hw=0xded192c "\002`\227yâÁ", target_hw=0x0) at net/ipv4/arp.c:694
#8  0x08184fb2 in arp_solicit (neigh=0xdd65220, skb=0x0) at net/ipv4/arp.c:379
#9  0x0813c898 in neigh_timer_handler (arg=232149536) at net/core/neighbour.c:866
#10 0x08077afd in run_timer_softirq (h=0x823a3ec) at kernel/timer.c:664
#11 0x080740a4 in __do_softirq () at kernel/softirq.c:234
#12 0x0807412e in do_softirq () at kernel/softirq.c:271
#13 0x08074186 in irq_exit () at kernel/softirq.c:310
#14 0x080576fd in do_IRQ (irq=0, regs=0x8217ec4) at arch/um/kernel/irq.c:336
#15 0x080590a0 in timer_handler (sig=26, regs=0x8217ec4) at arch/um/kernel/time.c:28
#16 0x08064c84 in real_alarm_handler (sc=0x0) at arch/um/os-Linux/signal.c:93
#17 0x08064cac in alarm_handler (sig=26, sc=0x0) at arch/um/os-Linux/signal.c:108
#18 0x080654a0 in deliver_alarm () at arch/um/os-Linux/time.c:117
#19 0x08065640 in idle_sleep (nsecs=<value optimized out>) at arch/um/os-Linux/time.c:181
#20 0x080580d4 in default_idle () at arch/um/kernel/process.c:248
#21 0x080580fe in cpu_idle () at arch/um/kernel/process.c:256
#22 0x081afdee in rest_init () at init/main.c:453
#23 0x0804953b in start_kernel () at init/main.c:650
#24 0x0804a7e1 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:46
#25 0x08064261 in run_kernel_thread (fn=0x804a7b5 <start_kernel_proc>, arg=0x0, jmp_ptr=0x821d184) at arch/um/os-Linux/process.c:267
#26 0x08057ee0 in new_thread_handler () at arch/um/kernel/process.c:151
#27 0x00000000 in ?? ()

(gdb) frame 1
#1  0x080824b4 in ktime_get_ts (ts=0x8217d10) at kernel/hrtimer.c:122
122                     getnstimeofday(ts);
(gdb) print *ts
$3 = {tv_sec = -1963377075, tv_nsec = 320917000}

It's stuck inside timespec_add_ns(), never leaving: this has been
inlined, so spying on its locals or parameters with GDB is pretty much a
no-hoper :/

Still, it's obvious enough to me that __get_nsec_offset() is returning a
negative value in getnstimeofday(), so timespec_add_ns() is looping,
well, not *forever*, but near enough, trying to push that u64 back down
to zero in units of NSEC_PER_SEC...

I hope this gives you a bit more of a clue.


-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-04-25 19:59 [uml-devel] actually useful backtrace from a CPU-chewing hang Nix
@ 2008-04-25 22:06 ` Nix
  2008-04-26 18:31   ` Nix
  0 siblings, 1 reply; 27+ messages in thread
From: Nix @ 2008-04-25 22:06 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 25 Apr 2008, nix@esperi.org.uk told this:

> I suspect this can go wrong anywhere, but it happens to have been a
> CBQ-triggered gettimeofday() while sending an arp that did it. (My ADSL
> router pretty much bombs the poor damn machine with ARP packets all the
> time.)

Woo, it's happening a lot tonight.

The CBQ/ARP problem recurred, and also this:

(gdb) bt
#0  0xb7f87410 in __kernel_vsyscall ()
#1  0x45f27b36 in gettimeofday () from /lib/libc.so.6
#2  0x08065417 in os_nsecs () at arch/um/os-Linux/time.c:79
#3  0x080590fc in itimer_read () at arch/um/kernel/time.c:77
#4  0x08084a5f in update_wall_time () at include/linux/clocksource.h:167
#5  0x08077ba1 in do_timer (ticks=1) at kernel/timer.c:929
#6  0x08086350 in tick_periodic (cpu=0) at kernel/time/tick-common.c:66
#7  0x08086382 in tick_handle_periodic (dev=0x821d384) at kernel/time/tick-common.c:82
#8  0x080590ed in um_timer (irq=0, dev=0x0) at arch/um/kernel/time.c:70
#9  0x0808d42f in handle_IRQ_event (irq=0, action=0xdc49480) at kernel/irq/handle.c:140
#10 0x0808d4ad in __do_IRQ (irq=0) at kernel/irq/handle.c:236
#11 0x080576f8 in do_IRQ (irq=0, regs=0x8217ec4) at arch/um/kernel/irq.c:335
#12 0x080590a0 in timer_handler (sig=26, regs=0x8217ec4) at arch/um/kernel/time.c:28
#13 0x08064c84 in real_alarm_handler (sc=0x0) at arch/um/os-Linux/signal.c:93
#14 0x08064cac in alarm_handler (sig=26, sc=0x0) at arch/um/os-Linux/signal.c:108
#15 0x080654a0 in deliver_alarm () at arch/um/os-Linux/time.c:117
#16 0x08065640 in idle_sleep (nsecs=<value optimized out>) at arch/um/os-Linux/time.c:181
#17 0x080580d4 in default_idle () at arch/um/kernel/process.c:248
#18 0x080580fe in cpu_idle () at arch/um/kernel/process.c:256
#19 0x081afdee in rest_init () at init/main.c:453
#20 0x0804953b in start_kernel () at init/main.c:650
#21 0x0804a7e1 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:46
#22 0x08064261 in run_kernel_thread (fn=0x804a7b5 <start_kernel_proc>, arg=0x0, jmp_ptr=0x821d184) at arch/um/os-Linux/process.c:267
#23 0x08057ee0 in new_thread_handler () at arch/um/kernel/process.c:151
#24 0x00000000 in ?? ()

It looks to me like os_nsecs()/gettimeofday()/et al are the only really
important parts of this backtrace.

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-04-25 22:06 ` Nix
@ 2008-04-26 18:31   ` Nix
  2008-04-27 14:02     ` Nix
  2008-04-28 16:44     ` Jeff Dike
  0 siblings, 2 replies; 27+ messages in thread
From: Nix @ 2008-04-26 18:31 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 25 Apr 2008, nix@esperi.org.uk uttered the following:

> On 25 Apr 2008, nix@esperi.org.uk told this:
>
>> I suspect this can go wrong anywhere, but it happens to have been a
>> CBQ-triggered gettimeofday() while sending an arp that did it. (My ADSL
>> router pretty much bombs the poor damn machine with ARP packets all the
>> time.)
>
> Woo, it's happening a lot tonight.

The cause of this is almost certainly time-skewing on the *host* via
adjtimex(). I stopped ntpd and there were no problems for half a day: I
restarted it, and as soon as ntpd had synched and begun slewing the time
(within a second of slewing beginning, probably less than that), *wham*:

(gdb) bt
#0  getnstimeofday (ts=0x8217d10) at include/linux/time.h:176
#1  0x080824b4 in ktime_get_ts (ts=0x8217d10) at kernel/hrtimer.c:122
#2  0x0808244b in ktime_get () at kernel/hrtimer.c:58
#3  0x08148d93 in cbq_dequeue (sch=0xddfe000) at include/net/pkt_sched.h:53
#4  0x08143355 in __qdisc_run (dev=0xded1800) at net/sched/sch_generic.c:85
#5  0x08137991 in dev_queue_xmit (skb=0xdf1b240) at include/net/pkt_sched.h:89
#6  0x08185537 in arp_xmit (skb=0xdf1b240) at net/ipv4/arp.c:668
#7  0x08185574 in arp_send (type=1, ptype=2054, dest_ip=3536983368, dev=0xded1800, src_ip=2668538048, dest_hw=0xddf2e44 "", src_hw=0xded192c "\002`\227yâÁ", target_hw=0x0) at net/ipv4/arp.c:694
#8  0x08184fb2 in arp_solicit (neigh=0xddf2e20, skb=0x0) at net/ipv4/arp.c:379
#9  0x0813c898 in neigh_timer_handler (arg=232730144) at net/core/neighbour.c:866
#10 0x08077afd in run_timer_softirq (h=0x823a3ec) at kernel/timer.c:664
#11 0x080740a4 in __do_softirq () at kernel/softirq.c:234
#12 0x0807412e in do_softirq () at kernel/softirq.c:271
#13 0x08074186 in irq_exit () at kernel/softirq.c:310
#14 0x080576fd in do_IRQ (irq=0, regs=0x8217ec4) at arch/um/kernel/irq.c:336
#15 0x080590a0 in timer_handler (sig=26, regs=0x8217ec4) at arch/um/kernel/time.c:28
#16 0x08064c84 in real_alarm_handler (sc=0x0) at arch/um/os-Linux/signal.c:93
#17 0x08064cac in alarm_handler (sig=26, sc=0x0) at arch/um/os-Linux/signal.c:108
#18 0x080654a0 in deliver_alarm () at arch/um/os-Linux/time.c:117
#19 0x08065640 in idle_sleep (nsecs=<value optimized out>) at arch/um/os-Linux/time.c:181
#20 0x080580d4 in default_idle () at arch/um/kernel/process.c:248
#21 0x080580fe in cpu_idle () at arch/um/kernel/process.c:256
#22 0x081afdee in rest_init () at init/main.c:453
#23 0x0804953b in start_kernel () at init/main.c:650
#24 0x0804a7e1 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:46
#25 0x08064261 in run_kernel_thread (fn=0x804a7b5 <start_kernel_proc>, arg=0x0, jmp_ptr=0x821d184) at arch/um/os-Linux/process.c:267
#26 0x08057ee0 in new_thread_handler () at arch/um/kernel/process.c:151
#27 0x00000000 in ?? ()

Obviously, given that it's now busy decrementing that 64-bit variable,
stopping ntpd didn't help: UML was looping and would continue looping
until we were fossilized or I killed it.

UML seems to be vulnerable to microsecond-level time slews on the
host. This is not ideal :/

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-04-26 18:31   ` Nix
@ 2008-04-27 14:02     ` Nix
  2008-04-28 16:44     ` Jeff Dike
  1 sibling, 0 replies; 27+ messages in thread
From: Nix @ 2008-04-27 14:02 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 26 Apr 2008, nix@esperi.org.uk uttered the following:

> On 25 Apr 2008, nix@esperi.org.uk uttered the following:
>
>> On 25 Apr 2008, nix@esperi.org.uk told this:
>>
>>> I suspect this can go wrong anywhere, but it happens to have been a
>>> CBQ-triggered gettimeofday() while sending an arp that did it. (My ADSL
>>> router pretty much bombs the poor damn machine with ARP packets all the
>>> time.)
>>
>> Woo, it's happening a lot tonight.
>
> The cause of this is almost certainly time-skewing on the *host* via
> adjtimex(). I stopped ntpd and there were no problems for half a day: I
> restarted it, and as soon as ntpd had synched and begun slewing the time
> (within a second of slewing beginning, probably less than that), *wham*:

Time-stepping on the host is enough to cause trouble. I switched to
using ntpdate -b to keep time in synch, and since then my UML instances
have gone into mad loops only on the hour, when the ntpdate time-step
should take place.

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-04-26 18:31   ` Nix
  2008-04-27 14:02     ` Nix
@ 2008-04-28 16:44     ` Jeff Dike
  2008-04-30 21:49       ` Nix
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-04-28 16:44 UTC (permalink / raw)
  To: Nix; +Cc: user-mode-linux-devel

On Sat, Apr 26, 2008 at 07:31:44PM +0100, Nix wrote:
> The cause of this is almost certainly time-skewing on the *host* via
> adjtimex(). I stopped ntpd and there were no problems for half a day: I
> restarted it, and as soon as ntpd had synched and begun slewing the time
> (within a second of slewing beginning, probably less than that), *wham*:

Can you try the patch below?  Time should definitely not go backwards
- it's a bug if it happens.  If you want more a more definite
indication about whether this is having an effect, stick a printk in
that test and see if it shows up.

     	      	  Jeff

-- 
Work email - jdike at linux dot intel dot com

Index: linux-2.6.22/arch/um/os-Linux/time.c
===================================================================
--- linux-2.6.22.orig/arch/um/os-Linux/time.c	2008-04-24 13:21:28.000000000 -0400
+++ linux-2.6.22/arch/um/os-Linux/time.c	2008-04-28 12:41:37.000000000 -0400
@@ -107,6 +107,10 @@ static void deliver_alarm(void)
 	unsigned long long this_tick = os_nsecs();
 	int one_tick = UM_NSEC_PER_SEC / UM_HZ;
 
+	/* Protection against the host's time going backwards */
+	if ((last_tick != 0) && (this_tick < last_tick))
+		this_tick = last_tick;
+
 	if (last_tick == 0)
 		last_tick = this_tick - one_tick;
 

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-04-28 16:44     ` Jeff Dike
@ 2008-04-30 21:49       ` Nix
  2008-05-01 15:13         ` Jeff Dike
  0 siblings, 1 reply; 27+ messages in thread
From: Nix @ 2008-04-30 21:49 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 28 Apr 2008, Jeff Dike told this:

> On Sat, Apr 26, 2008 at 07:31:44PM +0100, Nix wrote:
>> The cause of this is almost certainly time-skewing on the *host* via
>> adjtimex(). I stopped ntpd and there were no problems for half a day: I
>> restarted it, and as soon as ntpd had synched and begun slewing the time
>> (within a second of slewing beginning, probably less than that), *wham*:
>
> Can you try the patch below?  Time should definitely not go backwards
> - it's a bug if it happens.

I suspect that just a slowing of the rate of passing of time had nasty
negative consequences, because when ntpd is running resident all the
time, it should slew, not skew: yet we were getting loops on a regular
basis...

>                              If you want more a more definite
> indication about whether this is having an effect, stick a printk in
> that test and see if it shows up.

Done, and tested by stepping the time (five seconds --- five seconds per
day! I have pendulum clocks that keep better time than that!), and,
oops, instant loop as before, with this debugging patch applied, which
did not fire...

Index: i686-esperi/arch/um/os-Linux/time.c
===================================================================
--- i686-esperi.orig/arch/um/os-Linux/time.c    2008-04-29 19:34:07.000000000 +0100
+++ i686-esperi/arch/um/os-Linux/time.c 2008-04-29 20:08:50.000000000 +0100
@@ -108,7 +108,11 @@

        /* Protection against the host's time going backwards */
        if ((last_tick != 0) && (this_tick < last_tick))
+       {
+               printk(UM_KERN_ERR "Time went backwards: %llu -> %llu",
+                      last_tick, this_tick);
                this_tick = last_tick;
+       }

        if (last_tick == 0)
                last_tick = this_tick - one_tick;

getnstimeofday (ts=0x821ba2c) at include/linux/time.h:176
176             while(unlikely(ns >= NSEC_PER_SEC)) {
(gdb) bt
#0  getnstimeofday (ts=0x821ba2c) at include/linux/time.h:176
#1  0x080824e4 in ktime_get_ts (ts=0x821ba2c) at kernel/hrtimer.c:122
#2  0x0808247b in ktime_get () at kernel/hrtimer.c:58
#3  0x0814757c in tcf_act_police (skb=0xdc89800, a=0xded0e40, res=0x821bb68) at include/net/pkt_sched.h:53
#4  0x08146123 in tcf_action_exec (skb=0xdc89800, act=0xded0e40, res=0x821bb68) at net/sched/act_api.c:370
#5  0x08153779 in u32_classify (skb=0xdc89800, tp=0xde83f60, res=0x821bb68) at include/net/pkt_cls.h:127
#6  0x08144fed in tc_classify_compat (skb=0xdc89800, tp=0xde83f60, res=0x821bb68) at net/sched/sch_api.c:1203
#7  0x08145032 in tc_classify (skb=0xdc89800, tp=0xde83f60, res=0x821bb68) at net/sched/sch_api.c:1228
#8  0x08150a1a in ingress_enqueue (skb=0xdc89800, sch=0xdedc760) at net/sched/sch_ingress.c:77
#9  0x08137c31 in ing_filter (skb=0xfffffffe) at net/core/dev.c:1973
#10 0x08137db0 in netif_receive_skb (skb=0xdc89800) at net/core/dev.c:1994
#11 0x08137ecc in process_backlog (napi=0x8240650, quota=64) at net/core/dev.c:2132
#12 0x08137fc3 in net_rx_action (h=0x823a3fc) at net/core/dev.c:2202
#13 0x080740d4 in __do_softirq () at kernel/softirq.c:234
#14 0x0807415e in do_softirq () at kernel/softirq.c:271
#15 0x080741b6 in irq_exit () at kernel/softirq.c:310
#16 0x080576fd in do_IRQ (irq=5, regs=0x821bc6c) at arch/um/kernel/irq.c:336
#17 0x08057367 in sigio_handler (sig=29, regs=0x821bc6c) at arch/um/kernel/irq.c:95
#18 0x08064c07 in sig_handler_common (sig=29, sc=0x821bd24) at arch/um/os-Linux/signal.c:48
#19 0x08064c44 in sig_handler (sig=29, sc=0x821bd24) at arch/um/os-Linux/signal.c:80
#20 0x08064d67 in handle_signal (sig=-2, sc=0x821bd24) at arch/um/os-Linux/signal.c:157
#21 0x08066263 in hard_handler (sig=29) at arch/um/os-Linux/sys-i386/signal.c:12
#22 <signal handler called>
#23 getnstimeofday (ts=0xde3feec) at include/linux/time.h:182
#24 0x08084306 in do_gettimeofday (tv=0xde3ff04) at kernel/time/timekeeping.c:118
#25 0x0807396c in sys_gettimeofday (tv=0xbfdd7008, tz=0x0) at kernel/time.c:103
#26 0x0805a55c in handle_syscall (r=0xde78628) at arch/um/kernel/skas/syscall.c:35
#27 0x08066cdf in handle_trap (pid=26156, regs=0xde78628, local_using_sysemu=2) at arch/um/os-Linux/skas/process.c:202
#28 0x0806716b in userspace (regs=0xde78628) at arch/um/os-Linux/skas/process.c:418
#29 0x08057f5a in fork_handler () at arch/um/kernel/process.c:179
#30 0x00000000 in ?? ()

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-04-30 21:49       ` Nix
@ 2008-05-01 15:13         ` Jeff Dike
  2008-05-01 23:34           ` Nix
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-05-01 15:13 UTC (permalink / raw)
  To: Nix; +Cc: user-mode-linux-devel

On Wed, Apr 30, 2008 at 10:49:27PM +0100, Nix wrote:
> Done, and tested by stepping the time (five seconds --- five seconds per
> day! I have pendulum clocks that keep better time than that!), and,
> oops, instant loop as before, with this debugging patch applied, which
> did not fire...

Hmmm.  I stepped the time (date -set "now+5sec") and nothing bad
happened.  Is that more or less what you did?

Can you send me your config?

    	     	Jeff

-- 
Work email - jdike at linux dot intel dot com

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-01 15:13         ` Jeff Dike
@ 2008-05-01 23:34           ` Nix
  2008-05-02 16:30             ` Jeff Dike
  0 siblings, 1 reply; 27+ messages in thread
From: Nix @ 2008-05-01 23:34 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 1 May 2008, Jeff Dike outgrape:

> On Wed, Apr 30, 2008 at 10:49:27PM +0100, Nix wrote:
>> Done, and tested by stepping the time (five seconds --- five seconds per
>> day! I have pendulum clocks that keep better time than that!), and,
>> oops, instant loop as before, with this debugging patch applied, which
>> did not fire...
>
> Hmmm.  I stepped the time (date -set "now+5sec") and nothing bad
> happened.  Is that more or less what you did?

loki:/tmp# /usr/bin/ntpdate -bv hades
30 Apr 22:45:52 ntpdate[8833]: ntpdate 4.2.4p4@1.1520-o Fri Feb 22 18:37:11 UTC 2008 (1)
30 Apr 22:45:52 ntpdate[8833]: step time server 192.168.14.18 offset -5.728539 sec

so in effect yes :)

Hm. A five-second skip forwards via date --set works fine. A five-second
skip *backwards*, and instant boom.

(I suspect from earlier failures that sometimes time-slewing can cause
this too, because normally I run ntpd all the time, and that should only
slew the time under normal circumstances, never skip it: yet UML
2.6.24+ visited hang city every few days.)

I've been busy birthdaying today, but tomorrow I'll instrument things
and figure out just where a backward skip hangs and why (assuming you
haven't already done so: it's my own damn fault for being so laggardly,
I should have done this weeks ago).

> Can you send me your config?

Here:

CONFIG_DEFCONFIG_LIST="arch/$ARCH/defconfig"
CONFIG_GENERIC_HARDIRQS=y
CONFIG_UML=y
CONFIG_MMU=y
CONFIG_NO_IOMEM=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_IRQ_RELEASE_METHOD=y
CONFIG_HZ=100
CONFIG_MPENTIUMIII=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_P6_NOP=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=6
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_UML_X86=y
CONFIG_X86_32=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_ARCH_HAS_SC_SIGNALS=y
CONFIG_ARCH_REUSE_HOST_VSYSCALL_AREA=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_SUPPORTS_AOUT=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_ZONE_DMA_FLAG=0
CONFIG_VIRT_TO_BUS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_LD_SCRIPT_DYN=y
CONFIG_BINFMT_ELF=y
CONFIG_MCONSOLE=y
CONFIG_KERNEL_STACK_ORDER=2
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=128
CONFIG_LOCALVERSION=""
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_CGROUPS=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_USER_SCHED=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_BLOCK=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_DEFAULT_DEADLINE=y
CONFIG_DEFAULT_IOSCHED="deadline"
CONFIG_CLASSIC_RCU=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_UBD=y
CONFIG_BLK_DEV_COW_COMMON=y
CONFIG_STDERR_CONSOLE=y
CONFIG_STDIO_CONSOLE=y
CONFIG_SSL=y
CONFIG_NULL_CHAN=y
CONFIG_PORT_CHAN=y
CONFIG_PTY_CHAN=y
CONFIG_TTY_CHAN=y
CONFIG_XTERM_CHAN=y
CONFIG_CON_ZERO_CHAN="fd:0,fd:1"
CONFIG_CON_CHAN="tty"
CONFIG_SSL_CHAN="pty"
CONFIG_UNIX98_PTYS=y
CONFIG_UML_RANDOM=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_SYN_COOKIES=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_NETFILTER=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_NF_CONNTRACK=y
CONFIG_NF_CT_ACCT=y
CONFIG_NF_CONNTRACK_MARK=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_NF_CONNTRACK_FTP=y
CONFIG_NF_CONNTRACK_IRC=y
CONFIG_NF_CONNTRACK_SIP=y
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
CONFIG_NETFILTER_XT_TARGET_MARK=y
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
CONFIG_NETFILTER_XT_MATCH_HELPER=y
CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
CONFIG_NETFILTER_XT_MATCH_LIMIT=y
CONFIG_NETFILTER_XT_MATCH_MARK=y
CONFIG_NETFILTER_XT_MATCH_OWNER=y
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=y
CONFIG_NETFILTER_XT_MATCH_QUOTA=y
CONFIG_NETFILTER_XT_MATCH_STATE=y
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
CONFIG_NF_CONNTRACK_IPV4=y
CONFIG_NF_CONNTRACK_PROC_COMPAT=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_RECENT=y
CONFIG_IP_NF_MATCH_ADDRTYPE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_LOG=y
CONFIG_NF_NAT=y
CONFIG_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_REDIRECT=y
CONFIG_NF_NAT_FTP=y
CONFIG_NF_NAT_IRC=y
CONFIG_NF_NAT_SIP=y
CONFIG_IP_NF_MANGLE=y
CONFIG_IP_NF_TARGET_ECN=y
CONFIG_NET_SCHED=y
CONFIG_NET_SCH_CBQ=y
CONFIG_NET_SCH_HTB=y
CONFIG_NET_SCH_HFSC=y
CONFIG_NET_SCH_PRIO=y
CONFIG_NET_SCH_RED=y
CONFIG_NET_SCH_SFQ=y
CONFIG_NET_SCH_TEQL=y
CONFIG_NET_SCH_TBF=y
CONFIG_NET_SCH_GRED=y
CONFIG_NET_SCH_DSMARK=y
CONFIG_NET_SCH_INGRESS=y
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=y
CONFIG_NET_CLS_TCINDEX=y
CONFIG_NET_CLS_ROUTE4=y
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=y
CONFIG_NET_CLS_U32=y
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=y
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=y
CONFIG_NET_EMATCH_NBYTE=y
CONFIG_NET_EMATCH_U32=y
CONFIG_NET_EMATCH_META=y
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
CONFIG_NET_ACT_GACT=y
CONFIG_NET_ACT_PEDIT=y
CONFIG_NET_SCH_FIFO=y
CONFIG_FIB_RULES=y
CONFIG_UML_NET=y
CONFIG_UML_NET_TUNTAP=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_TUN=y
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_FS_MBCACHE=y
CONFIG_FS_POSIX_ACL=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
CONFIG_PRINT_QUOTA_WARNING=y
CONFIG_QUOTACTL=y
CONFIG_GENERIC_ACL=y
CONFIG_PROC_FS=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_CONFIGFS_FS=y
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_MSDOS_PARTITION=y
CONFIG_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITY_CAPABILITIES=y
CONFIG_SECURITY_FILE_CAPABILITIES=y
CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=16384
CONFIG_CRYPTO=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_SHA1=y
CONFIG_BITREVERSE=y
CONFIG_CRC32=y
CONFIG_PLIST=y
CONFIG_HAS_DMA=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_FRAME_POINTER=y

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-01 23:34           ` Nix
@ 2008-05-02 16:30             ` Jeff Dike
  2008-05-02 18:55               ` Nix
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-05-02 16:30 UTC (permalink / raw)
  To: Nix; +Cc: user-mode-linux-devel

On Fri, May 02, 2008 at 12:34:54AM +0100, Nix wrote:
> loki:/tmp# /usr/bin/ntpdate -bv hades
> 30 Apr 22:45:52 ntpdate[8833]: ntpdate 4.2.4p4@1.1520-o Fri Feb 22 18:37:11 UTC 2008 (1)
> 30 Apr 22:45:52 ntpdate[8833]: step time server 192.168.14.18 offset -5.728539 sec
> 
> so in effect yes :)
> 
> Hm. A five-second skip forwards via date --set works fine. A five-second
> skip *backwards*, and instant boom.

I tried both directions for the hell of it.

With your config, I'm seeing a hang until the system time catches up
to what UML thought it should have been in the first place.  But it's
only a few seconds, not forever.

However, stracing it did reveal a bogus interval trying to be set,
which the patch below fixes.  It doesn't cause any behavior change
here, so YMMV.

This includes the previous patch, which I think is a good idea anyway,
so back that out and drop this in its place.

			Jeff

Index: linux-2.6.22/arch/um/os-Linux/time.c
===================================================================
--- linux-2.6.22.orig/arch/um/os-Linux/time.c	2008-04-24 13:21:28.000000000 -0400
+++ linux-2.6.22/arch/um/os-Linux/time.c	2008-05-02 12:21:01.000000000 -0400
@@ -107,6 +107,10 @@ static void deliver_alarm(void)
 	unsigned long long this_tick = os_nsecs();
 	int one_tick = UM_NSEC_PER_SEC / UM_HZ;
 
+	/* Protection against the host's time going backwards */
+	if ((last_tick != 0) && (this_tick < last_tick))
+		this_tick = last_tick;
+
 	if (last_tick == 0)
 		last_tick = this_tick - one_tick;
 
@@ -149,6 +153,9 @@ static int after_sleep_interval(struct t
 		start_usecs = usec;
 
 	start_usecs -= skew / UM_NSEC_PER_USEC;
+	if (start_usecs < 0)
+		start_usecs = 0;
+
 	tv = ((struct timeval) { .tv_sec  = start_usecs / UM_USEC_PER_SEC,
 				 .tv_usec = start_usecs % UM_USEC_PER_SEC });
 	interval = ((struct itimerval) { { 0, usec }, tv });

-- 
Work email - jdike at linux dot intel dot com

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-02 16:30             ` Jeff Dike
@ 2008-05-02 18:55               ` Nix
  2008-05-02 19:57                 ` Jeff Dike
  0 siblings, 1 reply; 27+ messages in thread
From: Nix @ 2008-05-02 18:55 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 2 May 2008, Jeff Dike verbalised:
> With your config, I'm seeing a hang until the system time catches up
> to what UML thought it should have been in the first place.  But it's
> only a few seconds, not forever.

This is true sometimes, but not always: I just tried twice and got
a rapid recovery the first time, but not the second. Trace the second
time:

0x08084b89 in update_wall_time () at kernel/time/timekeeping.c:475
475                     clock->error -= clock->xtime_interval << (TICK_LENGTH_SHIFT - clock->shift);
(gdb) bt
#0  0x08084b89 in update_wall_time () at kernel/time/timekeeping.c:475
#1  0x08077bd1 in do_timer (ticks=1) at kernel/timer.c:929
#2  0x08086314 in tick_periodic (cpu=0) at kernel/time/tick-common.c:66
#3  0x08086346 in tick_handle_periodic (dev=0x821d384) at kernel/time/tick-common.c:82
#4  0x080590ed in um_timer (irq=0, dev=0x0) at arch/um/kernel/time.c:70
#5  0x0808d3f3 in handle_IRQ_event (irq=0, action=0xdc49480) at kernel/irq/handle.c:140
#6  0x0808d471 in __do_IRQ (irq=0) at kernel/irq/handle.c:236
#7  0x080576f8 in do_IRQ (irq=0, regs=0x821bc80) at arch/um/kernel/irq.c:335
#8  0x080590a0 in timer_handler (sig=26, regs=0x821bc80) at arch/um/kernel/time.c:28
#9  0x08064c84 in real_alarm_handler (sc=0x821bd24) at arch/um/os-Linux/signal.c:93
#10 0x08064cac in alarm_handler (sig=26, sc=0x821bd24) at arch/um/os-Linux/signal.c:108
#11 0x08064d67 in handle_signal (sig=32, sc=0x821bd24) at arch/um/os-Linux/signal.c:157
#12 0x08066263 in hard_handler (sig=26) at arch/um/os-Linux/sys-i386/signal.c:12
#13 <signal handler called>
#14 getnstimeofday (ts=0xde3beec) at include/linux/time.h:182
#15 0x08084306 in do_gettimeofday (tv=0xde3bf04) at kernel/time/timekeeping.c:118
#16 0x0807396c in sys_gettimeofday (tv=0xbf8a4ad8, tz=0x0) at kernel/time.c:103
#17 0x0805a55c in handle_syscall (r=0xde66e28) at arch/um/kernel/skas/syscall.c:35
#18 0x08066cdf in handle_trap (pid=13310, regs=0xde66e28, local_using_sysemu=2) at arch/um/os-Linux/skas/process.c:202
#19 0x0806716b in userspace (regs=0xde66e28) at arch/um/os-Linux/skas/process.c:418
#20 0x08057f5a in fork_handler () at arch/um/kernel/process.c:179
#21 0x00000000 in ?? ()
(gdb) quit

> However, stracing it did reveal a bogus interval trying to be set,
> which the patch below fixes.  It doesn't cause any behavior change
> here, so YMMV.
>
> This includes the previous patch, which I think is a good idea anyway,
> so back that out and drop this in its place.

No behavioural change :(

I'm trying something else now, arranging for os_nsecs() itself to do the
never-backwards stuff on the assumption that something depends on
monotonic timers not skipping backwards which presently they might
(there are callers of os_nsecs() outside of os-Linux/time.c, notably in
kernel/time.c, and currently they see an unadjusted time, jumping
backwards and forwards and whatever).

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-02 18:55               ` Nix
@ 2008-05-02 19:57                 ` Jeff Dike
  2008-05-02 23:21                   ` Nix
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-05-02 19:57 UTC (permalink / raw)
  To: Nix; +Cc: user-mode-linux-devel

On Fri, May 02, 2008 at 07:55:11PM +0100, Nix wrote:
> No behavioural change :(

Oh well.

> I'm trying something else now, arranging for os_nsecs() itself to do the
> never-backwards stuff on the assumption that something depends on
> monotonic timers not skipping backwards which presently they might
> (there are callers of os_nsecs() outside of os-Linux/time.c, notably in
> kernel/time.c, and currently they see an unadjusted time, jumping
> backwards and forwards and whatever).

This sounds like a reasonable idea.

Another thing you might try is the following:
	get two extra windows
	in one, get a "strace -p uml-pid -o strace.out -tt" ready
	in two, get a "date ; sudo date -s "now-5 sec" ; date" ready
	hit return in one, then in two, then in your UML window
	when the UML gives you another prompt (or you've decided it's
hung), ^C the strace

Look at the times reported by the dates, find the time change in the
strace log, and send the section following that (and a bit before) to
me.

			Jeff

-- 
Work email - jdike at linux dot intel dot com

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-02 19:57                 ` Jeff Dike
@ 2008-05-02 23:21                   ` Nix
  2008-05-03  0:56                     ` Jeff Dike
  0 siblings, 1 reply; 27+ messages in thread
From: Nix @ 2008-05-02 23:21 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 2 May 2008, Jeff Dike stated:

> On Fri, May 02, 2008 at 07:55:11PM +0100, Nix wrote:
>> I'm trying something else now, arranging for os_nsecs() itself to do the
>> never-backwards stuff on the assumption that something depends on
>> monotonic timers not skipping backwards which presently they might
>> (there are callers of os_nsecs() outside of os-Linux/time.c, notably in
>> kernel/time.c, and currently they see an unadjusted time, jumping
>> backwards and forwards and whatever).
>
> This sounds like a reasonable idea.

With this patch (migrating most of the work into os_nsecs(), with a
non-NO_HZ version doing skew computations too, atop your first patch,
and making a couple of variables static for good measure), I still had
no luck:

Index: linux/arch/um/os-Linux/time.c
===================================================================
--- linux.orig/arch/um/os-Linux/time.c	2008-05-02 16:06:14.000000000 +0100
+++ linux/arch/um/os-Linux/time.c	2008-05-02 16:28:29.000000000 +0100
@@ -73,15 +73,29 @@
 	return remain;
 }
 
+static unsigned long long last_tick;
+
+#ifdef UML_CONFIG_NO_HZ
 long long os_nsecs(void)
 {
 	struct timeval tv;
+	long long this_tick;
 
 	gettimeofday(&tv, NULL);
-	return timeval_to_ns(&tv);
+	this_tick = timeval_to_ns(&tv);
+
+	/* Ensure that host time going backwards is treated as if it had stood
+	 * still.
+	 */
+
+	if ((last_tick != 0) && (this_tick < last_tick))
+		this_tick = last_tick;
+
+	last_tick = this_tick;
+
+	return this_tick;
 }
 
-#ifdef UML_CONFIG_NO_HZ
 static int after_sleep_interval(struct timespec *ts)
 {
 	return 0;
@@ -98,15 +112,22 @@
 }
 
 #else
-unsigned long long last_tick;
-unsigned long long skew;
+static unsigned long long skew;
+static int one_tick = UM_NSEC_PER_SEC / UM_HZ;
 
-static void deliver_alarm(void)
+long long os_nsecs(void)
 {
-	unsigned long long this_tick = os_nsecs();
-	int one_tick = UM_NSEC_PER_SEC / UM_HZ;
+	struct timeval tv;
+	long long this_tick;
+
+	gettimeofday(&tv, NULL);
+	this_tick = timeval_to_ns(&tv);
+
+	/* Keep a running computation of the number of ticks lost due to host
+	 * load, and ensure that host time going backwards is treated as if it
+	 * had stood still.
+	 */
 
-	/* Protection against the host's time going backwards */
 	if ((last_tick != 0) && (this_tick < last_tick))
 		this_tick = last_tick;
 
@@ -114,13 +135,20 @@
 		last_tick = this_tick - one_tick;
 
 	skew += this_tick - last_tick;
+	last_tick = this_tick;
+
+	return this_tick;
+}
 
+static void deliver_alarm(void)
+{
+	/* Recompute the skew values and wait until skew is near-zero again. */
+	os_nsecs();
 	while (skew >= one_tick) {
 		alarm_handler(SIGVTALRM, NULL);
 		skew -= one_tick;
 	}
 
-	last_tick = this_tick;
 }
 
 static unsigned long long sleep_time(unsigned long long nsecs)

> Another thing you might try is the following:
> 	get two extra windows
> 	in one, get a "strace -p uml-pid -o strace.out -tt" ready
> 	in two, get a "date ; sudo date -s "now-5 sec" ; date" ready
> 	hit return in one, then in two, then in your UML window
> 	when the UML gives you another prompt (or you've decided it's
> hung), ^C the strace

*slap* why didn't I think of that?

... unfortunately it's not terribly informative. Here, with an earlier
gettimeofday() included for context:

00:15:10.606808 gettimeofday({1209770110, 606883}, NULL) = 0
[...]
00:15:10.804151 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], WSTOPPED|__WALL) = 9963
00:15:10.804271 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.804391 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.804499 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:10.804671 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:10.804754 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], WSTOPPED|__WALL) = 9963
00:15:10.804876 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.804994 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.805103 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:10.805249 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:10.805332 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], WSTOPPED|__WALL) = 9963
00:15:10.805453 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.805572 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.805680 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:10.805925 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:10.806020 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED|__WALL) = 9963
00:15:10.806141 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.806257 ptrace(0x34 /* PTRACE_??? */, 9963, 0, 0xde42a6c) = 0
00:15:10.806393 write(36, "6\0\0\0\0000\255*\0\20\0\0\5\0\0\0\21\0\0\0\3\0\0\0\0000\244\0", 28) = 28
00:15:10.806613 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:10.806732 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:10.806867 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:10.806949 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED|__WALL) = 9963
00:15:05.811501 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.811674 ptrace(0x34 /* PTRACE_??? */, 9963, 0, 0xde42a6c) = 0
00:15:05.812544 write(36, "6\0\0\0\0\0\255*\0\20\0\0\5\0\0\0\21\0\0\0\3\0\0\0\0\0\244\0", 28) = 28
00:15:05.812853 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.812983 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:05.813467 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:05.813640 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED|__WALL) = 9963
00:15:05.813797 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.813918 ptrace(0x34 /* PTRACE_??? */, 9963, 0, 0xde42a6c) = 0
00:15:05.814055 write(36, "6\0\0\0\0\220\257*\0\20\0\0\5\0\0\0\21\0\0\0\3\0\0\0\0\220\243\0", 28) = 28
00:15:05.814276 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.814400 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:05.814559 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:05.814645 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED|__WALL) = 9963
00:15:05.814764 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.814879 ptrace(0x34 /* PTRACE_??? */, 9963, 0, 0xde42a6c) = 0
00:15:05.815003 write(36, "6\0\0\0\0\20\255*\0\20\0\0\5\0\0\0\21\0\0\0\3\0\0\0\0 \244\0", 28) = 28
00:15:05.815198 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.815316 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:05.815453 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:05.815535 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED|__WALL) = 9963
00:15:05.815654 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.815768 ptrace(0x34 /* PTRACE_??? */, 9963, 0, 0xde42a6c) = 0
00:15:05.815890 write(36, "6\0\0\0\0 \255*\0\20\0\0\5\0\0\0\21\0\0\0\3\0\0\0\0@\244\0", 28) = 28
00:15:05.816081 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.816234 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:05.816381 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:05.816463 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED|__WALL) = 9963
00:15:05.816582 ptrace(PTRACE_GETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.816695 ptrace(0x34 /* PTRACE_??? */, 9963, 0, 0xde42a6c) = 0
00:15:05.816819 write(36, "6\0\0\0\0\300\260*\0\20\0\0\5\0\0\0\21\0\0\0\3\0\0\0\0p\241\0", 28) = 28
00:15:05.817013 ptrace(PTRACE_SETREGS, 9963, 0, 0xde42a28) = 0
00:15:05.817132 ptrace(0x1f /* PTRACE_??? */, 9963, 0, 0) = 0
00:15:05.817274 --- SIGCHLD (Child exited) @ 0 (0) ---
00:15:05.817357 waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}], WSTOPPED|__WALL) = 9963
00:15:05.817574 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
00:15:05.817842 gettimeofday({1209770105, 817925}, NULL) = 0

So there's basically nothing unusual here. The first gettimeofday() call
after the jump, and *whoompf* we're off into the magic land of looping
with no extra syscalls to speak of at all.

Next attempt, tomorrow: run it under gdb with an LD_PRELOADed wrapper
around gettimeofday() that raises some unusual signal (SIGILL maybe)
when it spots time going backwards, then run it under gdb and trap that,
and trace forwards from there...

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-02 23:21                   ` Nix
@ 2008-05-03  0:56                     ` Jeff Dike
  2008-05-03 10:08                       ` Nix
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-05-03  0:56 UTC (permalink / raw)
  To: Nix; +Cc: user-mode-linux-devel

On Sat, May 03, 2008 at 12:21:15AM +0100, Nix wrote:
> With this patch (migrating most of the work into os_nsecs(), with a
> non-NO_HZ version doing skew computations too, atop your first patch,
> and making a couple of variables static for good measure), I still had
> no luck:

From a quick look, this seems right.  And this would be the gold
standard of preventing UML from seeing time going backwards.

> *slap* why didn't I think of that?
  ....
> ... unfortunately it's not terribly informative. Here, with an earlier
> gettimeofday() included for context:
> 
> So there's basically nothing unusual here. The first gettimeofday() call
> after the jump, and *whoompf* we're off into the magic land of looping
> with no extra syscalls to speak of at all.

This was with a solid hang?  The thing is still handling page faults
from something at the end of that trace.  I would try again, making
sure it's a solid hang, and including enough of the trace so that you
stop seeing page faults:
     waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}])
and system calls (WSTOPSIG(s) == 133) and see what's happening at that
point.

And if you never stop seeing page faults or system calls, then that's
interesting too.

				Jeff

-- 
Work email - jdike at linux dot intel dot com

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-03  0:56                     ` Jeff Dike
@ 2008-05-03 10:08                       ` Nix
  2008-05-09 15:16                         ` Jeff Dike
  0 siblings, 1 reply; 27+ messages in thread
From: Nix @ 2008-05-03 10:08 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 3 May 2008, Jeff Dike told this:
> On Sat, May 03, 2008 at 12:21:15AM +0100, Nix wrote:
>> With this patch (migrating most of the work into os_nsecs(), with a
>> non-NO_HZ version doing skew computations too, atop your first patch,
>> and making a couple of variables static for good measure), I still had
>> no luck:
>
>>From a quick look, this seems right.  And this would be the gold
> standard of preventing UML from seeing time going backwards.

Oh. And it still doesn't work. Damn.

>> So there's basically nothing unusual here. The first gettimeofday() call
>> after the jump, and *whoompf* we're off into the magic land of looping
>> with no extra syscalls to speak of at all.
>
> This was with a solid hang?  The thing is still handling page faults
> from something at the end of that trace.  I would try again, making
> sure it's a solid hang, and including enough of the trace so that you
> stop seeing page faults:
>      waitpid(9963, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSEGV}])
> and system calls (WSTOPSIG(s) == 133) and see what's happening at that
> point.

I gave it thirty seconds after the time change. After that
gettimeofday() there is *nothing* (which is unsurprising; the kernel
thread is stuck in a tight loop trying to decrement 2^64 down to 0 by
repeated subtraction... there's not much time for syscalls there.)

> And if you never stop seeing page faults or system calls, then that's
> interesting too.

Howzabout if we never see any after that gettimeofday()? :/

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-03 10:08                       ` Nix
@ 2008-05-09 15:16                         ` Jeff Dike
  2008-05-09 17:13                           ` vincent-perrier
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-05-09 15:16 UTC (permalink / raw)
  To: Nix; +Cc: user-mode-linux-devel

On Sat, May 03, 2008 at 11:08:24AM +0100, Nix wrote:
> >>From a quick look, this seems right.  And this would be the gold
> > standard of preventing UML from seeing time going backwards.
> 
> Oh. And it still doesn't work. Damn.

Is it possible for me to get my hands on a UML which is doing this?

      	       	   Jeff

-- 
Work email - jdike at linux dot intel dot com

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-09 15:16                         ` Jeff Dike
@ 2008-05-09 17:13                           ` vincent-perrier
  2008-05-09 20:32                             ` Nix
  2008-05-14 17:23                             ` Jeff Dike
  0 siblings, 2 replies; 27+ messages in thread
From: vincent-perrier @ 2008-05-09 17:13 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Nix, user-mode-linux-devel

For the "date --set" back 5 sec on the host that 
freezes an uml indefinitely 
(in kernel/time/timekeeping.c, update_wall_time,)
then download 

plug_and_play_clownix_network_03

at http://clownix.net

On my host, it does get stuck, but it may depend also on
host and host kernel, I have: 

uml_clownix_net$ uname -a
Linux localhost 2.6.22.15.tex1 #1 SMP Sat Dec 15 13:15:05 CST 2007 i686
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz GNU/Linux


The download is long, the untar too, but the freeze is garanteed!

to start after download: 

"./start_clownix_net virtual_platform_configs/single_machine"

Hope you try it!




On Fri, 2008-05-09 at 11:16 -0400, Jeff Dike wrote:
> On Sat, May 03, 2008 at 11:08:24AM +0100, Nix wrote:
> > >>From a quick look, this seems right.  And this would be the gold
> > > standard of preventing UML from seeing time going backwards.
> > 
> > Oh. And it still doesn't work. Damn.
> 
> Is it possible for me to get my hands on a UML which is doing this?
> 
>       	       	   Jeff
> 


-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-09 17:13                           ` vincent-perrier
@ 2008-05-09 20:32                             ` Nix
  2008-05-14 17:23                             ` Jeff Dike
  1 sibling, 0 replies; 27+ messages in thread
From: Nix @ 2008-05-09 20:32 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 9 May 2008, vincent-perrier@club-internet.fr verbalised:
> The download is long, the untar too, but the freeze is garanteed!

If this doesn't work I can give you an account on, hm, the box which
freezes uses a UML for its network link so if you flip the time on it
you'd get cut off... and on the other machine it doesn't freeze at all,
even if I slam the time backwards and forwards in 5s increments
constantly.

I wonder if I have to be pushing network packets around or something
before it goes wrong? (I can certainly send you the binary that's going
wrong for me, and induce a core dump when it's gone wrong and send you
that: it'll be 90Mb+ though. I'm not sure how useful that is.)

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-09 17:13                           ` vincent-perrier
  2008-05-09 20:32                             ` Nix
@ 2008-05-14 17:23                             ` Jeff Dike
  2008-05-14 18:03                               ` vincent-perrier
                                                 ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Jeff Dike @ 2008-05-14 17:23 UTC (permalink / raw)
  To: vincent-perrier; +Cc: Nix, user-mode-linux-devel

On Fri, May 09, 2008 at 07:13:22PM +0200, vincent-perrier wrote:
> The download is long, the untar too, but the freeze is garanteed!
> 
> to start after download: 
> 
> "./start_clownix_net virtual_platform_configs/single_machine"
> 
> Hope you try it!

Why is it mucking with root's ssh keys:

  RSA_PUB=/root/.ssh/id_rsa.pub
  RSA_PRIV=/root/.ssh/id_rsa

			Jeff

-- 
Work email - jdike at linux dot intel dot com

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 17:23                             ` Jeff Dike
@ 2008-05-14 18:03                               ` vincent-perrier
  2008-05-14 18:18                               ` vincent-perrier
  2008-05-14 18:29                               ` vincent-perrier
  2 siblings, 0 replies; 27+ messages in thread
From: vincent-perrier @ 2008-05-14 18:03 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Nix, user-mode-linux-devel

Hello,
I know that I should not work as root, but I am too used to it,
so my example is done with the root user.

I am not sure that the network simulator works with another user
for the version you have, but I have checked that for next one, 
a normal user will do as long as the daemon is owned by root and
is suid root.

The daemon has to create taps, open a socket to ping, and configure
an ip address for tap, so it will always have to be suid root even
for a normal user.

The RSA files are copied into the machines at startup so as to be 
able to scp or ssh the machine without any password.

I can garantee that the software does nothing nasty with those files.

On Wed, 2008-05-14 at 13:23 -0400, Jeff Dike wrote:
> On Fri, May 09, 2008 at 07:13:22PM +0200, vincent-perrier wrote:
> > The download is long, the untar too, but the freeze is garanteed!
> > 
> > to start after download: 
> > 
> > "./start_clownix_net virtual_platform_configs/single_machine"
> > 
> > Hope you try it!
> 
> Why is it mucking with root's ssh keys:
> 
>   RSA_PUB=/root/.ssh/id_rsa.pub
>   RSA_PRIV=/root/.ssh/id_rsa
> 
> 			Jeff
> 


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 17:23                             ` Jeff Dike
  2008-05-14 18:03                               ` vincent-perrier
@ 2008-05-14 18:18                               ` vincent-perrier
  2008-05-14 18:29                               ` vincent-perrier
  2 siblings, 0 replies; 27+ messages in thread
From: vincent-perrier @ 2008-05-14 18:18 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Nix, user-mode-linux-devel

I forgot to tell you, if you launch the daemon as a normal user,
the RSA files of this normal user will be used, and not the root
ones.
And also, the daemon has to mount a file when it creates the ubdb 
config file, one more reason to have the root sticky bit.

This tool is made for network study and tests, a standalone PC
is good enough to use it and observe routing daemons at work.
 

On Wed, 2008-05-14 at 13:23 -0400, Jeff Dike wrote:
> On Fri, May 09, 2008 at 07:13:22PM +0200, vincent-perrier wrote:
> > The download is long, the untar too, but the freeze is garanteed!
> > 
> > to start after download: 
> > 
> > "./start_clownix_net virtual_platform_configs/single_machine"
> > 
> > Hope you try it!
> 
> Why is it mucking with root's ssh keys:
> 
>   RSA_PUB=/root/.ssh/id_rsa.pub
>   RSA_PRIV=/root/.ssh/id_rsa
> 
> 			Jeff
> 


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 17:23                             ` Jeff Dike
  2008-05-14 18:03                               ` vincent-perrier
  2008-05-14 18:18                               ` vincent-perrier
@ 2008-05-14 18:29                               ` vincent-perrier
  2008-05-14 19:39                                 ` Jeff Dike
  2008-05-19 18:54                                 ` [uml-devel] plug_and_play_clownix_network vincent-perrier
  2 siblings, 2 replies; 27+ messages in thread
From: vincent-perrier @ 2008-05-14 18:29 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Nix, user-mode-linux-devel

Sorry if you receive this message twice, but I sent 2 messages
following each other by a few minutes, and received only the 
second one, this was the first one:


Hello,
I know that I should not work as root, but I am too used to it,
so my example is done with the root user.

I am not sure that the network simulator works with another user
for the version you have, but I have checked that for next one, 
a normal user will do as long as the daemon is owned by root and
is suid root.

The daemon has to create taps, open a socket to ping, and configure
an ip address for tap, so it will always have to be suid root even
for a normal user.

The RSA files are copied into the machines at startup so as to be 
able to scp or ssh the machine without any password.
Those files are given to the machines through an ubdb config file
at uml start.

I can garantee that the software does nothing nasty with those files.



On Wed, 2008-05-14 at 13:23 -0400, Jeff Dike wrote:
> On Fri, May 09, 2008 at 07:13:22PM +0200, vincent-perrier wrote:
> > The download is long, the untar too, but the freeze is garanteed!
> > 
> > to start after download: 
> > 
> > "./start_clownix_net virtual_platform_configs/single_machine"
> > 
> > Hope you try it!
> 
> Why is it mucking with root's ssh keys:
> 
>   RSA_PUB=/root/.ssh/id_rsa.pub
>   RSA_PRIV=/root/.ssh/id_rsa
> 
> 			Jeff
> 


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 18:29                               ` vincent-perrier
@ 2008-05-14 19:39                                 ` Jeff Dike
  2008-05-14 20:12                                   ` Nix
  2008-05-19 18:54                                 ` [uml-devel] plug_and_play_clownix_network vincent-perrier
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-05-14 19:39 UTC (permalink / raw)
  To: vincent-perrier; +Cc: Nix, user-mode-linux-devel

I finally reproduced this using 2.6.25.1 (2.6.25-mm1 was no good) with
your config.

The patch below fixes it for me.  You'll notice a certain similarity
between this and a previous patch that you posted.  It's not clear to
me why yours didn't work.

       	     	    Jeff

-- 
Work email - jdike at linux dot intel dot com

Index: 2.6/stable/arch/um/os-Linux/time.c
===================================================================
--- 2.6.orig/stable/arch/um/os-Linux/time.c	2008-05-14 14:55:56.000000000 -0400
+++ 2.6/stable/arch/um/os-Linux/time.c	2008-05-14 15:30:48.000000000 -0400
@@ -66,12 +66,21 @@ long long disable_timer(void)
 	return timeval_to_ns(&time.it_value);
 }
 
+static long long last_time;
+
 long long os_nsecs(void)
 {
 	struct timeval tv;
+	long long ret;
 
 	gettimeofday(&tv, NULL);
-	return timeval_to_ns(&tv);
+	ret = timeval_to_ns(&tv);
+
+	if((last_time != 0) && (last_time > ret))
+		ret = last_time;
+
+	last_time = ret;
+	return ret;
 }
 
 #ifdef UML_CONFIG_NO_HZ

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 19:39                                 ` Jeff Dike
@ 2008-05-14 20:12                                   ` Nix
  2008-05-14 20:31                                     ` Nix
  2008-05-14 20:43                                     ` Jeff Dike
  0 siblings, 2 replies; 27+ messages in thread
From: Nix @ 2008-05-14 20:12 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 14 May 2008, Jeff Dike verbalised:

> I finally reproduced this using 2.6.25.1 (2.6.25-mm1 was no good) with
> your config.

YAY! (I wonder why this was so .config-dependent? You'd think it would
trigger on anything, but I couldn't even make it happen on all my
hosts...)

> The patch below fixes it for me.  You'll notice a certain similarity
> between this and a previous patch that you posted.  It's not clear to
> me why yours didn't work.

As far as I can tell it's identical in behaviour in the !NO_HZ
case.

Annoyingly, now I've upgraded the host to 2.6.25 (hence sans skas3),
timings or something have changed such that I now can't make the freeze
happen at all. I'll reboot the host into 2.6.24.x shortly to get skas
back and try again.


Oh, and, it's pedantic of me, I know, but what does this do if time goes
backwards in the NO_HZ case? (Or is handling that a 2.6.26 thing?)

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 20:12                                   ` Nix
@ 2008-05-14 20:31                                     ` Nix
  2008-05-14 20:43                                     ` Jeff Dike
  1 sibling, 0 replies; 27+ messages in thread
From: Nix @ 2008-05-14 20:31 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

[vincent-perrier <vincent-perrier@club-internet.fr> removed from Cc;,
 his MTA says `Client host rejected: AP0002 Please use your ISP mailserver'
 only I don't *have* an ISP mailserver.]

On 14 May 2008, nix@esperi.org.uk verbalised:
> Annoyingly, now I've upgraded the host to 2.6.25 (hence sans skas3),
> timings or something have changed such that I now can't make the freeze
> happen at all. I'll reboot the host into 2.6.24.x shortly to get skas
> back and try again.

It seems to fix it with a 2.6.24 host too, but I still can't see why it
works when my previous fix didn't. (Actually, while I'm trying it with
lots of differently mangled variations on a host .config theme, is there
a skas3 or skas4 against 2.6.25? I may as well add that into the mix and
see if things still work.)

I'll keep ntpd running here for a day or so as well, and see if it stalls
out. (The host loses nearly half a minute a day without ntpd, it's
awful.)

-- 
`If you are having a "ua luea luea le ua le" kind of day, I can only
 assume that you are doing no work due [to] incapacitating nausea caused 
 by numerous lazy demons.' --- Frossie

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 20:12                                   ` Nix
  2008-05-14 20:31                                     ` Nix
@ 2008-05-14 20:43                                     ` Jeff Dike
  2008-05-14 22:02                                       ` Nix
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Dike @ 2008-05-14 20:43 UTC (permalink / raw)
  To: Nix; +Cc: user-mode-linux-devel

On Wed, May 14, 2008 at 09:12:12PM +0100, Nix wrote:
> Oh, and, it's pedantic of me, I know, but what does this do if time goes
> backwards in the NO_HZ case? (Or is handling that a 2.6.26 thing?)

In all cases, it holds time steady until the host catches up with what
the guest thinks the time should be.

I noticed you wrapped last_tick business in one of the ifdefs - I
didn't look carefully to see whether that would explain the behavior
difference.  I just made it unconditional.

	       	    	 Jeff

-- 
Work email - jdike at linux dot intel dot com

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [uml-devel] actually useful backtrace from a CPU-chewing hang
  2008-05-14 20:43                                     ` Jeff Dike
@ 2008-05-14 22:02                                       ` Nix
  0 siblings, 0 replies; 27+ messages in thread
From: Nix @ 2008-05-14 22:02 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

On 14 May 2008, Jeff Dike verbalised:

> On Wed, May 14, 2008 at 09:12:12PM +0100, Nix wrote:
>> Oh, and, it's pedantic of me, I know, but what does this do if time goes
>> backwards in the NO_HZ case? (Or is handling that a 2.6.26 thing?)
>
> In all cases, it holds time steady until the host catches up with what
> the guest thinks the time should be.

Yeah, that's what I was doing too: but mine didn't work. Odd.

> I noticed you wrapped last_tick business in one of the ifdefs - I
> didn't look carefully to see whether that would explain the behavior
> difference.  I just made it unconditional.

The only reason I wrapped it in the ifdefs was because the !NO_HZ case
had to keep the skew calculations up to date, while the NO_HZ case
didn't have to bother. It shouldn't have affected anything else.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [uml-devel] plug_and_play_clownix_network
  2008-05-14 18:29                               ` vincent-perrier
  2008-05-14 19:39                                 ` Jeff Dike
@ 2008-05-19 18:54                                 ` vincent-perrier
  1 sibling, 0 replies; 27+ messages in thread
From: vincent-perrier @ 2008-05-19 18:54 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Nix, user-mode-linux-devel

You are right, the private key is useless for my purposes, the clownix
daemon just has to copy the public key to the ssh authorized keys file
inside the UML guest machines, then the access to the UML guests can
be done without password.
In my next version of the clownix network, I will not "muck" with the
private rsa key. 
> > 
> > Why is it mucking with root's ssh keys:
> > 
> >   RSA_PUB=/root/.ssh/id_rsa.pub
> >   RSA_PRIV=/root/.ssh/id_rsa
> > 
> > 			Jeff
> > 


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft 
Defy all challenges. Microsoft(R) Visual Studio 2008. 
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-05-19 18:54 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-25 19:59 [uml-devel] actually useful backtrace from a CPU-chewing hang Nix
2008-04-25 22:06 ` Nix
2008-04-26 18:31   ` Nix
2008-04-27 14:02     ` Nix
2008-04-28 16:44     ` Jeff Dike
2008-04-30 21:49       ` Nix
2008-05-01 15:13         ` Jeff Dike
2008-05-01 23:34           ` Nix
2008-05-02 16:30             ` Jeff Dike
2008-05-02 18:55               ` Nix
2008-05-02 19:57                 ` Jeff Dike
2008-05-02 23:21                   ` Nix
2008-05-03  0:56                     ` Jeff Dike
2008-05-03 10:08                       ` Nix
2008-05-09 15:16                         ` Jeff Dike
2008-05-09 17:13                           ` vincent-perrier
2008-05-09 20:32                             ` Nix
2008-05-14 17:23                             ` Jeff Dike
2008-05-14 18:03                               ` vincent-perrier
2008-05-14 18:18                               ` vincent-perrier
2008-05-14 18:29                               ` vincent-perrier
2008-05-14 19:39                                 ` Jeff Dike
2008-05-14 20:12                                   ` Nix
2008-05-14 20:31                                     ` Nix
2008-05-14 20:43                                     ` Jeff Dike
2008-05-14 22:02                                       ` Nix
2008-05-19 18:54                                 ` [uml-devel] plug_and_play_clownix_network vincent-perrier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.