diff for duplicates of <20160930134835.GT14933@linux.vnet.ibm.com> diff --git a/a/1.txt b/N1/1.txt index 4153bdd..b2d6877 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -12,8 +12,8 @@ Some RCU commentary, on the off-chance that it helps... > into a NOHZ idle sleep. > > > <idle>-0 [001] d... 109.953436: tick_stop: success=1 dependency=NONE -> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x109880000000 softexpires\x109880000000 +> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109880000000 softexpires=109880000000 > > which is (using the 0.087621us delta between the trace clock and clock > MONO) at: 109.880 + 0.087621 = 109.968 @@ -34,40 +34,40 @@ the rcuo callback-offload kthreads are constrained. And yes, in the most constrained case where all CPUs except for CPU 0 are nohz CPUs, they will by default all run on CPU 0. -> > rcu_sched-7 [001] d... 109.952633: timer_start: timer\x160a9eb0 function=process_timeout expiresB94948284 [timeout=1] flags=0x00000001 +> > rcu_sched-7 [001] d... 109.952633: timer_start: timer=160a9eb0 function=process_timeout expires=4294948284 [timeout=1] flags=0x00000001 > > The CPU nr encoded in flags is: 1 > > Now we cancel and restart the timer w/o seing the interrupt expiring > it. And that expiry should have happened at 109.968000 !?! > -> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x109890000000 softexpires\x109890000000 +> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109890000000 softexpires=109890000000 > > So this advances the next tick even further out. And CPU 0 sets the timer to > the exact smae value: > -> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer\x109e949c function=tick_sched_timer expires\x109890000000 softexpires\x109890000000 +> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer=109e949c function=tick_sched_timer expires=109890000000 softexpires=109890000000 > > -> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq\x16 name=jcore_pit -> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer\x109e949c -> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer\x109e949c function=tick_sched_timer now\x109890434160 +> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq=16 name=jcore_pit +> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer=109e949c +> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109890434160 > > Which expires here. And CPU1 instead of getting an interrupt and expiring > the timer does the cancel/restart to the next jiffie again: > -> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x109900000000 softexpires\x109900000000 +> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109900000000 softexpires=109900000000 > > And this repeats; > -> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq\x16 name=jcore_pit -> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer\x109e949c -> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer\x109e949c function=tick_sched_timer now\x109900474620 +> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq=16 name=jcore_pit +> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer=109e949c +> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109900474620 > -> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer\x109f449c fun9c +> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer=109f449c fun9c > > There is something badly wrong here. > @@ -75,7 +75,7 @@ by default all run on CPU 0. > > <idle>-0 [000] ..s. 110.019617: softirq_exit: vec=1 [action=TIMER] > > <idle>-0 [000] ..s. 110.019730: softirq_entry: vec=7 [action=SCHED] > > <idle>-0 [000] ..s. 110.020174: softirq_exit: vec=7 [action=SCHED] -> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq\x16 name=jcore_pit +> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq=16 name=jcore_pit > > > > The rcu_sched process does not run again after the tick_stop until > > 132s, and only a few RCU softirqs happen (all shown above). During @@ -98,22 +98,22 @@ by default all run on CPU 0. > Please put a tracepoint into your set_next_event() callback as well. SO > this changes here: > -> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x132120000000 softexpires\x132120000000 +> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000 > -> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq name=ipi -> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq ret=handle +> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq=20 name=ipi +> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq=20 ret=handle > > So CPU1 gets an IPI > -> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer\x109f449c -> 49c function=tick_sched_timer now\x132119115200 -> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x132120000000 softexpires\x132120000000 +> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer=109f449c +> 49c function=tick_sched_timer now=132119115200 +> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000 > > And rcu-sched-7 gets running magically, but we don't know what woke it > up. Definitely not the timer, because that did not fire. > -> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer\x160a9eb0 +> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer=160a9eb0 It could have been an explicit wakeup at the end of a grace period. That would explain its cancelling the timer, anyway. @@ -139,8 +139,13 @@ would explain its cancelling the timer, anyway. > obviously either getting out of idle and then moves the tick ahead for some > unknown reason. -And a one-jiffy timeout is in fact expected behavior when HZ\x100. -You have to be running HZ%0 or better to have two-jiffy timeouts, -and HZP0 or better for three-jiffy timeouts. +And a one-jiffy timeout is in fact expected behavior when HZ=100. +You have to be running HZ=250 or better to have two-jiffy timeouts, +and HZ=500 or better for three-jiffy timeouts. Thanx, Paul + +-- +To unsubscribe from this list: send the line "unsubscribe devicetree" in +the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org +More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/a/content_digest b/N1/content_digest index 3bae75f..56545b5 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -7,9 +7,9 @@ "ref\020160927004258.GF19318@brightrain.aerifal.cx\0" "ref\020160927220820.GH19318@brightrain.aerifal.cx\0" "ref\0alpine.DEB.2.20.1609301319410.4799@nanos\0" - "From\0Paul E. McKenney <paulmck@linux.vnet.ibm.com>\0" + "From\0Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>\0" "Subject\0Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver\0" - "Date\0Fri, 30 Sep 2016 13:48:35 +0000\0" + "Date\0Fri, 30 Sep 2016 06:48:35 -0700\0" "To\0Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>\0" "Cc\0Rich Felker <dalias-8zAoT0mYgF4@public.gmane.org>" Daniel Lezcano <daniel.lezcano-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> @@ -34,8 +34,8 @@ "> into a NOHZ idle sleep.\n" "> \n" "> > <idle>-0 [001] d... 109.953436: tick_stop: success=1 dependency=NONE\n" - "> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0209880000000 softexpires\0209880000000\n" + "> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109880000000 softexpires=109880000000\n" "> \n" "> which is (using the 0.087621us delta between the trace clock and clock\n" "> MONO) at: 109.880 + 0.087621 = 109.968\n" @@ -56,40 +56,40 @@ "constrained case where all CPUs except for CPU 0 are nohz CPUs, they will\n" "by default all run on CPU 0.\n" "\n" - "> > rcu_sched-7 [001] d... 109.952633: timer_start: timer\0260a9eb0 function=process_timeout expiresB94948284 [timeout=1] flags=0x00000001\n" + "> > rcu_sched-7 [001] d... 109.952633: timer_start: timer=160a9eb0 function=process_timeout expires=4294948284 [timeout=1] flags=0x00000001\n" "> \n" "> The CPU nr encoded in flags is: 1\n" "> \n" "> Now we cancel and restart the timer w/o seing the interrupt expiring\n" "> it. And that expiry should have happened at 109.968000 !?!\n" "> \n" - "> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0209890000000 softexpires\0209890000000\n" + "> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109890000000 softexpires=109890000000\n" "> \n" "> So this advances the next tick even further out. And CPU 0 sets the timer to\n" "> the exact smae value:\n" "> \n" - "> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer\0209e949c function=tick_sched_timer expires\0209890000000 softexpires\0209890000000\n" + "> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer=109e949c function=tick_sched_timer expires=109890000000 softexpires=109890000000\n" "> \n" "> \n" - "> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq\026 name=jcore_pit\n" - "> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer\0209e949c\n" - "> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer\0209e949c function=tick_sched_timer now\0209890434160\n" + "> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq=16 name=jcore_pit\n" + "> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer=109e949c\n" + "> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109890434160\n" "> \n" "> Which expires here. And CPU1 instead of getting an interrupt and expiring\n" "> the timer does the cancel/restart to the next jiffie again:\n" "> \n" - "> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0209900000000 softexpires\0209900000000\n" + "> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109900000000 softexpires=109900000000\n" "> \n" "> And this repeats;\n" "> \n" - "> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq\026 name=jcore_pit\n" - "> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer\0209e949c\n" - "> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer\0209e949c function=tick_sched_timer now\0209900474620\n" + "> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq=16 name=jcore_pit\n" + "> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer=109e949c\n" + "> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109900474620\n" "> \n" - "> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer\0209f449c fun9c\n" + "> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer=109f449c fun9c\n" "> \n" "> There is something badly wrong here.\n" "> \n" @@ -97,7 +97,7 @@ "> > <idle>-0 [000] ..s. 110.019617: softirq_exit: vec=1 [action=TIMER]\n" "> > <idle>-0 [000] ..s. 110.019730: softirq_entry: vec=7 [action=SCHED]\n" "> > <idle>-0 [000] ..s. 110.020174: softirq_exit: vec=7 [action=SCHED]\n" - "> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq\026 name=jcore_pit\n" + "> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq=16 name=jcore_pit\n" "> > \n" "> > The rcu_sched process does not run again after the tick_stop until\n" "> > 132s, and only a few RCU softirqs happen (all shown above). During\n" @@ -120,22 +120,22 @@ "> Please put a tracepoint into your set_next_event() callback as well. SO\n" "> this changes here:\n" "> \n" - "> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0232120000000 softexpires\0232120000000\n" + "> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000\n" "> \n" - "> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq name=ipi\n" - "> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq ret=handle\n" + "> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq=20 name=ipi\n" + "> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq=20 ret=handle\n" "> \n" "> So CPU1 gets an IPI\n" "> \n" - "> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer\0209f449c\n" - "> 49c function=tick_sched_timer now\0232119115200\n" - "> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0232120000000 softexpires\0232120000000\n" + "> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer=109f449c\n" + "> 49c function=tick_sched_timer now=132119115200\n" + "> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000\n" "> \n" "> And rcu-sched-7 gets running magically, but we don't know what woke it\n" "> up. Definitely not the timer, because that did not fire.\n" "> \n" - "> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer\0260a9eb0\n" + "> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer=160a9eb0\n" "\n" "It could have been an explicit wakeup at the end of a grace period. That\n" "would explain its cancelling the timer, anyway.\n" @@ -161,10 +161,15 @@ "> obviously either getting out of idle and then moves the tick ahead for some\n" "> unknown reason.\n" "\n" - "And a one-jiffy timeout is in fact expected behavior when HZ\0200.\n" - "You have to be running HZ%0 or better to have two-jiffy timeouts,\n" - "and HZP0 or better for three-jiffy timeouts.\n" + "And a one-jiffy timeout is in fact expected behavior when HZ=100.\n" + "You have to be running HZ=250 or better to have two-jiffy timeouts,\n" + "and HZ=500 or better for three-jiffy timeouts.\n" "\n" - "\t\t\t\t\t\t\tThanx, Paul" + "\t\t\t\t\t\t\tThanx, Paul\n" + "\n" + "--\n" + "To unsubscribe from this list: send the line \"unsubscribe devicetree\" in\n" + "the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org\n" + More majordomo info at http://vger.kernel.org/majordomo-info.html -690b395e651aa16d2d6a63d09cc60d7be279e4779e2eb7231176f9f8d93d0001 +4d01b4044a544488635053f0d02fdec7a4eb3d064eb130d4550523064d3deb4c
diff --git a/a/1.txt b/N2/1.txt index 4153bdd..7403bfa 100644 --- a/a/1.txt +++ b/N2/1.txt @@ -12,8 +12,8 @@ Some RCU commentary, on the off-chance that it helps... > into a NOHZ idle sleep. > > > <idle>-0 [001] d... 109.953436: tick_stop: success=1 dependency=NONE -> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x109880000000 softexpires\x109880000000 +> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109880000000 softexpires=109880000000 > > which is (using the 0.087621us delta between the trace clock and clock > MONO) at: 109.880 + 0.087621 = 109.968 @@ -34,40 +34,40 @@ the rcuo callback-offload kthreads are constrained. And yes, in the most constrained case where all CPUs except for CPU 0 are nohz CPUs, they will by default all run on CPU 0. -> > rcu_sched-7 [001] d... 109.952633: timer_start: timer\x160a9eb0 function=process_timeout expiresB94948284 [timeout=1] flags=0x00000001 +> > rcu_sched-7 [001] d... 109.952633: timer_start: timer=160a9eb0 function=process_timeout expires=4294948284 [timeout=1] flags=0x00000001 > > The CPU nr encoded in flags is: 1 > > Now we cancel and restart the timer w/o seing the interrupt expiring > it. And that expiry should have happened at 109.968000 !?! > -> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x109890000000 softexpires\x109890000000 +> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109890000000 softexpires=109890000000 > > So this advances the next tick even further out. And CPU 0 sets the timer to > the exact smae value: > -> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer\x109e949c function=tick_sched_timer expires\x109890000000 softexpires\x109890000000 +> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer=109e949c function=tick_sched_timer expires=109890000000 softexpires=109890000000 > > -> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq\x16 name=jcore_pit -> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer\x109e949c -> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer\x109e949c function=tick_sched_timer now\x109890434160 +> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq=16 name=jcore_pit +> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer=109e949c +> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109890434160 > > Which expires here. And CPU1 instead of getting an interrupt and expiring > the timer does the cancel/restart to the next jiffie again: > -> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x109900000000 softexpires\x109900000000 +> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109900000000 softexpires=109900000000 > > And this repeats; > -> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq\x16 name=jcore_pit -> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer\x109e949c -> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer\x109e949c function=tick_sched_timer now\x109900474620 +> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq=16 name=jcore_pit +> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer=109e949c +> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109900474620 > -> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer\x109f449c fun9c +> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer=109f449c fun9c > > There is something badly wrong here. > @@ -75,7 +75,7 @@ by default all run on CPU 0. > > <idle>-0 [000] ..s. 110.019617: softirq_exit: vec=1 [action=TIMER] > > <idle>-0 [000] ..s. 110.019730: softirq_entry: vec=7 [action=SCHED] > > <idle>-0 [000] ..s. 110.020174: softirq_exit: vec=7 [action=SCHED] -> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq\x16 name=jcore_pit +> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq=16 name=jcore_pit > > > > The rcu_sched process does not run again after the tick_stop until > > 132s, and only a few RCU softirqs happen (all shown above). During @@ -98,22 +98,22 @@ by default all run on CPU 0. > Please put a tracepoint into your set_next_event() callback as well. SO > this changes here: > -> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer\x109f449c -> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x132120000000 softexpires\x132120000000 +> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer=109f449c +> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000 > -> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq name=ipi -> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq ret=handle +> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq=20 name=ipi +> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq=20 ret=handle > > So CPU1 gets an IPI > -> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer\x109f449c -> 49c function=tick_sched_timer now\x132119115200 -> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer\x109f449c function=tick_sched_timer expires\x132120000000 softexpires\x132120000000 +> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer=109f449c +> 49c function=tick_sched_timer now=132119115200 +> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000 > > And rcu-sched-7 gets running magically, but we don't know what woke it > up. Definitely not the timer, because that did not fire. > -> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer\x160a9eb0 +> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer=160a9eb0 It could have been an explicit wakeup at the end of a grace period. That would explain its cancelling the timer, anyway. @@ -139,8 +139,8 @@ would explain its cancelling the timer, anyway. > obviously either getting out of idle and then moves the tick ahead for some > unknown reason. -And a one-jiffy timeout is in fact expected behavior when HZ\x100. -You have to be running HZ%0 or better to have two-jiffy timeouts, -and HZP0 or better for three-jiffy timeouts. +And a one-jiffy timeout is in fact expected behavior when HZ=100. +You have to be running HZ=250 or better to have two-jiffy timeouts, +and HZ=500 or better for three-jiffy timeouts. Thanx, Paul diff --git a/a/content_digest b/N2/content_digest index 3bae75f..bddbe67 100644 --- a/a/content_digest +++ b/N2/content_digest @@ -9,15 +9,15 @@ "ref\0alpine.DEB.2.20.1609301319410.4799@nanos\0" "From\0Paul E. McKenney <paulmck@linux.vnet.ibm.com>\0" "Subject\0Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver\0" - "Date\0Fri, 30 Sep 2016 13:48:35 +0000\0" - "To\0Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>\0" - "Cc\0Rich Felker <dalias-8zAoT0mYgF4@public.gmane.org>" - Daniel Lezcano <daniel.lezcano-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> - devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org - linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org - linux-sh-u79uwXL29TY76Z2rM5mHXA@public.gmane.org - Rob Herring <robh+dt-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> - " Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org>\0" + "Date\0Fri, 30 Sep 2016 06:48:35 -0700\0" + "To\0Thomas Gleixner <tglx@linutronix.de>\0" + "Cc\0Rich Felker <dalias@libc.org>" + Daniel Lezcano <daniel.lezcano@linaro.org> + devicetree@vger.kernel.org + linux-kernel@vger.kernel.org + linux-sh@vger.kernel.org + Rob Herring <robh+dt@kernel.org> + " Mark Rutland <mark.rutland@arm.com>\0" "\00:1\0" "b\0" "On Fri, Sep 30, 2016 at 03:15:11PM +0200, Thomas Gleixner wrote:\n" @@ -34,8 +34,8 @@ "> into a NOHZ idle sleep.\n" "> \n" "> > <idle>-0 [001] d... 109.953436: tick_stop: success=1 dependency=NONE\n" - "> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0209880000000 softexpires\0209880000000\n" + "> > <idle>-0 [001] d... 109.953617: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.953818: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109880000000 softexpires=109880000000\n" "> \n" "> which is (using the 0.087621us delta between the trace clock and clock\n" "> MONO) at: 109.880 + 0.087621 = 109.968\n" @@ -56,40 +56,40 @@ "constrained case where all CPUs except for CPU 0 are nohz CPUs, they will\n" "by default all run on CPU 0.\n" "\n" - "> > rcu_sched-7 [001] d... 109.952633: timer_start: timer\0260a9eb0 function=process_timeout expiresB94948284 [timeout=1] flags=0x00000001\n" + "> > rcu_sched-7 [001] d... 109.952633: timer_start: timer=160a9eb0 function=process_timeout expires=4294948284 [timeout=1] flags=0x00000001\n" "> \n" "> The CPU nr encoded in flags is: 1\n" "> \n" "> Now we cancel and restart the timer w/o seing the interrupt expiring\n" "> it. And that expiry should have happened at 109.968000 !?!\n" "> \n" - "> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0209890000000 softexpires\0209890000000\n" + "> > <idle>-0 [001] d... 109.968225: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.968526: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109890000000 softexpires=109890000000\n" "> \n" "> So this advances the next tick even further out. And CPU 0 sets the timer to\n" "> the exact smae value:\n" "> \n" - "> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer\0209e949c function=tick_sched_timer expires\0209890000000 softexpires\0209890000000\n" + "> > <idle>-0 [000] d.h. 109.969104: hrtimer_start: hrtimer=109e949c function=tick_sched_timer expires=109890000000 softexpires=109890000000\n" "> \n" "> \n" - "> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq\026 name=jcore_pit\n" - "> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer\0209e949c\n" - "> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer\0209e949c function=tick_sched_timer now\0209890434160\n" + "> > <idle>-0 [000] d.h. 109.977690: irq_handler_entry: irq=16 name=jcore_pit\n" + "> > <idle>-0 [000] d.h. 109.977911: hrtimer_cancel: hrtimer=109e949c\n" + "> > <idle>-0 [000] d.h. 109.978053: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109890434160\n" "> \n" "> Which expires here. And CPU1 instead of getting an interrupt and expiring\n" "> the timer does the cancel/restart to the next jiffie again:\n" "> \n" - "> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0209900000000 softexpires\0209900000000\n" + "> > <idle>-0 [001] d... 109.978206: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.978495: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=109900000000 softexpires=109900000000\n" "> \n" "> And this repeats;\n" "> \n" - "> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq\026 name=jcore_pit\n" - "> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer\0209e949c\n" - "> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer\0209e949c function=tick_sched_timer now\0209900474620\n" + "> > <idle>-0 [000] d.h. 109.987726: irq_handler_entry: irq=16 name=jcore_pit\n" + "> > <idle>-0 [000] d.h. 109.987954: hrtimer_cancel: hrtimer=109e949c\n" + "> > <idle>-0 [000] d.h. 109.988095: hrtimer_expire_entry: hrtimer=109e949c function=tick_sched_timer now=109900474620\n" "> \n" - "> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer\0209f449c fun9c\n" + "> > <idle>-0 [001] d... 109.988243: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 109.988537: hrtimer_start: hrtimer=109f449c fun9c\n" "> \n" "> There is something badly wrong here.\n" "> \n" @@ -97,7 +97,7 @@ "> > <idle>-0 [000] ..s. 110.019617: softirq_exit: vec=1 [action=TIMER]\n" "> > <idle>-0 [000] ..s. 110.019730: softirq_entry: vec=7 [action=SCHED]\n" "> > <idle>-0 [000] ..s. 110.020174: softirq_exit: vec=7 [action=SCHED]\n" - "> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq\026 name=jcore_pit\n" + "> > <idle>-0 [000] d.h. 110.027674: irq_handler_entry: irq=16 name=jcore_pit\n" "> > \n" "> > The rcu_sched process does not run again after the tick_stop until\n" "> > 132s, and only a few RCU softirqs happen (all shown above). During\n" @@ -120,22 +120,22 @@ "> Please put a tracepoint into your set_next_event() callback as well. SO\n" "> this changes here:\n" "> \n" - "> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer\0209f449c\n" - "> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0232120000000 softexpires\0232120000000\n" + "> > <idle>-0 [001] d... 132.198170: hrtimer_cancel: hrtimer=109f449c\n" + "> > <idle>-0 [001] d... 132.198451: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000\n" "> \n" - "> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq name=ipi\n" - "> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq ret=handle\n" + "> > <idle>-0 [001] dnh. 132.205860: irq_handler_entry: irq=20 name=ipi\n" + "> > <idle>-0 [001] dnh. 132.206041: irq_handler_exit: irq=20 ret=handle\n" "> \n" "> So CPU1 gets an IPI\n" "> \n" - "> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer\0209f449c\n" - "> 49c function=tick_sched_timer now\0232119115200\n" - "> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer\0209f449c function=tick_sched_timer expires\0232120000000 softexpires\0232120000000\n" + "> > <idle>-0 [001] dn.. 132.206650: hrtimer_cancel: hrtimer=109f449c\n" + "> 49c function=tick_sched_timer now=132119115200\n" + "> > <idle>-0 [001] dn.. 132.206936: hrtimer_start: hrtimer=109f449c function=tick_sched_timer expires=132120000000 softexpires=132120000000\n" "> \n" "> And rcu-sched-7 gets running magically, but we don't know what woke it\n" "> up. Definitely not the timer, because that did not fire.\n" "> \n" - "> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer\0260a9eb0\n" + "> > rcu_sched-7 [001] d... 132.207710: timer_cancel: timer=160a9eb0\n" "\n" "It could have been an explicit wakeup at the end of a grace period. That\n" "would explain its cancelling the timer, anyway.\n" @@ -161,10 +161,10 @@ "> obviously either getting out of idle and then moves the tick ahead for some\n" "> unknown reason.\n" "\n" - "And a one-jiffy timeout is in fact expected behavior when HZ\0200.\n" - "You have to be running HZ%0 or better to have two-jiffy timeouts,\n" - "and HZP0 or better for three-jiffy timeouts.\n" + "And a one-jiffy timeout is in fact expected behavior when HZ=100.\n" + "You have to be running HZ=250 or better to have two-jiffy timeouts,\n" + "and HZ=500 or better for three-jiffy timeouts.\n" "\n" "\t\t\t\t\t\t\tThanx, Paul" -690b395e651aa16d2d6a63d09cc60d7be279e4779e2eb7231176f9f8d93d0001 +0f6e5710a83aa19d8ff3392cb83a20e430d73f5af6090cfe6a8d95c20204ea23
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.