* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
@ 2014-10-20 9:51 ` Thomas Meyer
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Meyer @ 2014-10-20 9:51 UTC (permalink / raw)
To: Richard Weinberger; +Cc: user-mode-linux-devel, Linux Kernel Mailing List
On 20.10.2014 at 10:27, Richard Weinberger <richard.weinberger@gmail.com> wrote:
>
> On Sun, Oct 19, 2014 at 2:39 PM, Thomas Meyer <thomas@m3y3r.de> wrote:
> > Hello,
> >
> > in a UML kernel I get a long, CPU-consuming loop in __getnstimeofday()
> > (kernel/time/timekeeping.c:315), in its call to timespec_add_ns(),
> > after I leave the host kernel suspended to RAM for a few hours and then
> > resume it.
> > This seems to happen because tk->xtime_sec has not been updated yet, while
> > the nsecs already have been. nsecs can be as high as 8111000111000111000l.
> >
> > The function timespec_add_ns() (include/linux/time.h:266) calls
> > __iter_div_u64_rem(), which uses a loop that is optimized for the case
> > where the dividend is not much bigger than the divisor.
> > That is not the case after the host kernel resumes from RAM.
> >
> > Any ideas how to fix this? Is it possible to intercept the resume from
> > RAM and update timekeeper->xtime_sec somehow?
> > Or can the um arch somehow override timespec_add_ns() to always use
> > div_u64_rem() instead?
>
> Hmm, does this always happen?
Yes, my single core system seems to trigger this every time after resume from ram.
> At least on my notebook it did not happen. I've started an UML yesterday
> suspended it and after more than 12h it worked fine today.
>
> BTW: Do you see the issue also when freezing UML using the freezer cgroup?
I'm not sure what you mean by this. Do I need to enable any special config options for this in the host or UML kernel?
> Would be easier to debug. :)
>
> --
> Thanks,
> //richard
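To make the cost described above concrete, here is a minimal userspace sketch of an iterative divide. It is not the kernel's __iter_div_u64_rem(); the name iter_div_rem() is made up for illustration. It only shows that the loop runs once per unit of the quotient.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U

/*
 * Userspace imitation of an iterative divide (hypothetical helper, not
 * kernel code): subtract the divisor until the rest is smaller than it.
 * The loop body runs exactly `quotient` times, so it is only cheap when
 * the dividend is close to the divisor.
 */
uint64_t iter_div_rem(uint64_t dividend, uint32_t divisor, uint64_t *remainder)
{
    uint64_t quotient = 0;

    assert(divisor != 0);
    while (dividend >= divisor) {
        dividend -= divisor;
        quotient++;
    }
    *remainder = dividend;
    return quotient;
}
```

For the value reported above (8111000111000111000 ns) the quotient is 8111000111, so a loop of this shape would need roughly 8.1 billion passes for a single call.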
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2014-10-20 9:51 ` Thomas Meyer
(?)
@ 2014-10-20 9:56 ` Richard Weinberger
2014-10-20 19:19 ` Thomas Meyer
2015-04-24 19:58 ` Thomas Meyer
-1 siblings, 2 replies; 16+ messages in thread
From: Richard Weinberger @ 2014-10-20 9:56 UTC (permalink / raw)
To: Thomas Meyer; +Cc: user-mode-linux-devel, Linux Kernel Mailing List
On 20.10.2014 at 11:51, Thomas Meyer wrote:
>> Hmm, does this always happen?
>
> Yes, my single core system seems to trigger this every time after resume from ram.
What is your host kernel?
>> At least on my notebook it did not happen. I've started an UML yesterday
>> suspended it and after more than 12h it worked fine today.
>>
>> BTW: Do you see the issue also when freezing UML using the freezer cgroup?
>
> I'm not sure what you mean by this. Do I need to enable any special config options for this in the host or UML kernel?
On the host side, create a new freezer cgroup, put UML into it, and freeze/thaw it,
e.g. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks
In that shell, run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state
Later, thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
Thanks,
//richard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2014-10-20 9:56 ` Richard Weinberger
@ 2014-10-20 19:19 ` Thomas Meyer
2015-04-24 19:58 ` Thomas Meyer
1 sibling, 0 replies; 16+ messages in thread
From: Thomas Meyer @ 2014-10-20 19:19 UTC (permalink / raw)
To: Richard Weinberger; +Cc: Linux Kernel Mailing List, user-mode-linux-devel
On Monday, 20.10.2014, at 11:56 +0200, Richard Weinberger wrote:
> On 20.10.2014 at 11:51, Thomas Meyer wrote:
> >> Hmm, does this always happen?
> >
> > Yes, my single core system seems to trigger this every time after resume from ram.
>
> What is your host kernel?
The standard Fedora kernel: 3.16.4-200.fc20.x86_64
>
> >> At least on my notebook it did not happen. I've started an UML yesterday
> >> suspended it and after more than 12h it worked fine today.
> >>
> >> BTW: Do you see the issue also when freezing UML using the freezer cgroup?
> >
> > I'm not sure what you mean by this. Do I need to enable any special config options for this in the host or UML kernel?
>
> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>
I'll try this.
Thanks for this tip.
with kind regards
thomas
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2014-10-20 9:56 ` Richard Weinberger
2014-10-20 19:19 ` Thomas Meyer
@ 2015-04-24 19:58 ` Thomas Meyer
2015-04-26 18:32 ` Richard Weinberger
1 sibling, 1 reply; 16+ messages in thread
From: Thomas Meyer @ 2015-04-24 19:58 UTC (permalink / raw)
To: Richard Weinberger; +Cc: Linux Kernel Mailing List, user-mode-linux-devel
On Monday, 20.10.2014, at 11:56 +0200, Richard Weinberger wrote:
> On 20.10.2014 at 11:51, Thomas Meyer wrote:
> >> Hmm, does this always happen?
> >
> > Yes, my single core system seems to trigger this every time after resume from ram.
>
> What is your host kernel?
>
> >> At least on my notebook it did not happen. I've started an UML yesterday
> >> suspended it and after more than 12h it worked fine today.
> >>
> >> BTW: Do you see the issue also when freezing UML using the freezer cgroup?
> >
> > I'm not sure what you mean by this. Do I need to enable any special config options for this in the host or UML kernel?
>
> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>
Sadly, this also happens with a cgroup freezer group :-(
bt
#0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:127
#1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
#2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
#3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
#4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
#5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
#6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
#7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
#8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:174
#9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
#10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
#11 0x0000000000000000 in ?? ()
It seems that only very few people run UML kernels and suspend their host systems...
Any ideas?
I'll go with this patch for now:
diff --git a/include/linux/time.h b/include/linux/time.h
index beebe3a..3486050 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -232,7 +232,7 @@ extern struct timeval ns_to_timeval(const s64 nsec);
*/
static __always_inline void timespec_add_ns(struct timespec *a, u64 ns)
{
- a->tv_sec += __iter_div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
+ a->tv_sec += div_u64_rem(a->tv_nsec + ns, NSEC_PER_SEC, &ns);
a->tv_nsec = ns;
}
kind regards
thomas
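The idea behind the workaround above can be sketched in userspace: one real 64-bit division costs the same no matter how large the nanosecond value is. The struct and function names below are invented for illustration and are not kernel code. As far as I recall, the kernel's div_u64_rem() takes a u32 * for the remainder, so this sketch keeps the remainder in a separate 32-bit local instead of writing it back into ns; treat that detail as my assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U

/* Minimal stand-in for a timespec (assumed layout, for illustration). */
struct ts_sketch {
    int64_t tv_sec;
    long    tv_nsec;
};

/*
 * Add ns to *a using a single 64-bit division and modulo, so the cost
 * does not grow with ns. The remainder is always below NSEC_PER_SEC and
 * therefore fits into 32 bits.
 */
void ts_add_ns_div(struct ts_sketch *a, uint64_t ns)
{
    uint64_t total;
    uint32_t rem;

    assert(a->tv_nsec >= 0 && a->tv_nsec < (long)NSEC_PER_SEC);
    total = (uint64_t)a->tv_nsec + ns;
    rem = (uint32_t)(total % NSEC_PER_SEC);
    a->tv_sec += (int64_t)(total / NSEC_PER_SEC);
    a->tv_nsec = (long)rem;
}
```

Fed with the dividend from the backtrace (14641577537827850536), this yields 14641577537 whole seconds and 827850536 ns in one step. It hides the symptom only; the bogus nanosecond value is still there.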
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-24 19:58 ` Thomas Meyer
@ 2015-04-26 18:32 ` Richard Weinberger
2015-04-26 20:20 ` Richard Weinberger
2015-04-26 20:57 ` Thomas Meyer
0 siblings, 2 replies; 16+ messages in thread
From: Richard Weinberger @ 2015-04-26 18:32 UTC (permalink / raw)
To: Thomas Meyer
Cc: Richard Weinberger, Linux Kernel Mailing List,
user-mode-linux-devel
[-- Attachment #1: Type: text/plain, Size: 2528 bytes --]
On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <thomas@m3y3r.de> wrote:
> On Monday, 20.10.2014, at 11:56 +0200, Richard Weinberger wrote:
>> On 20.10.2014 at 11:51, Thomas Meyer wrote:
>> >> Hmm, does this always happen?
>> >
>> > Yes, my single core system seems to trigger this every time after resume from ram.
>>
>> What is your host kernel?
>>
>> >> At least on my notebook it did not happen. I've started an UML yesterday
>> >> suspended it and after more than 12h it worked fine today.
>> >>
>> >> BTW: Do you see the issue also when freezing UML using the freezer cgroup?
>> >
>> > I'm not sure what you mean by this. Do I need to enable any special config options for this in the host or UML kernel?
>>
>> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
>> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
>> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
>> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>>
>
> Sadly, this also happens with a cgroup freezer group :-(
>
> bt
> #0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:127
> #1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
> #2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
> #3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
> #4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
> #5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
> #6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
> #7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
> #8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:174
> #9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
> #10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
> #11 0x0000000000000000 in ?? ()
>
> It seems that only very few people run UML kernels and suspend their host systems...
>
> Any ideas?
Can you give the attached patch a try?
Let's see if it proves my theory.
Looks like UML's clocksource needs fixing.
--
Thanks,
//richard
[-- Attachment #2: uml_clock_fix.diff --]
[-- Type: text/plain, Size: 966 bytes --]
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 117568d..3936948 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -67,14 +67,14 @@ static irqreturn_t um_timer(int irq, void *dev)
static cycle_t itimer_read(struct clocksource *cs)
{
- return os_nsecs() / 1000;
+ return os_nsecs();
}
static struct clocksource itimer_clocksource = {
.name = "itimer",
.rating = 300,
.read = itimer_read,
- .mask = CLOCKSOURCE_MASK(64),
+ .mask = CLOCKSOURCE_MASK(32),
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
@@ -92,7 +92,7 @@ static void __init setup_itimer(void)
clockevent_delta2ns(60 * HZ, &itimer_clockevent);
itimer_clockevent.min_delta_ns =
clockevent_delta2ns(1, &itimer_clockevent);
- err = clocksource_register_hz(&itimer_clocksource, USEC_PER_SEC);
+ err = clocksource_register_hz(&itimer_clocksource, NSEC_PER_SEC);
if (err) {
printk(KERN_ERR "clocksource_register_hz returned %d\n", err);
return;
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-26 18:32 ` Richard Weinberger
@ 2015-04-26 20:20 ` Richard Weinberger
2015-04-26 20:57 ` Thomas Meyer
1 sibling, 0 replies; 16+ messages in thread
From: Richard Weinberger @ 2015-04-26 20:20 UTC (permalink / raw)
To: Thomas Meyer; +Cc: Linux Kernel Mailing List, user-mode-linux-devel
[-- Attachment #1: Type: text/plain, Size: 357 bytes --]
On 26.04.2015 at 20:32, Richard Weinberger wrote:
> On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <thomas@m3y3r.de> wrote:
>> Any ideas?
>
> Can you give the attached patch a try?
> Let's see if it proves my theory.
> Looks like UML's clocksource needs fixing.
Please also give this patch a try.
It should fix your issue in a sane way.
Thanks,
//richard
[-- Attachment #2: uml_mono.diff --]
[-- Type: text/x-patch, Size: 1334 bytes --]
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index d824528..b386cee 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -244,6 +244,7 @@ extern int timer_one_shot(int ticks);
extern long long disable_timer(void);
extern void uml_idle_timer(void);
extern long long os_nsecs(void);
+extern long long os_nsecs_monotonic(void);
/* skas/mem.c */
extern long run_syscall_stub(struct mm_id * mm_idp,
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 117568d..399687c 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -67,7 +67,7 @@ static irqreturn_t um_timer(int irq, void *dev)
static cycle_t itimer_read(struct clocksource *cs)
{
- return os_nsecs() / 1000;
+ return os_nsecs_monotonic() / 1000;
}
static struct clocksource itimer_clocksource = {
diff --git a/arch/um/os-Linux/time.c b/arch/um/os-Linux/time.c
index e9824d5..0ef8faa 100644
--- a/arch/um/os-Linux/time.c
+++ b/arch/um/os-Linux/time.c
@@ -79,6 +79,15 @@ long long os_nsecs(void)
return timeval_to_ns(&tv);
}
+long long os_nsecs_monotonic(void)
+{
+ struct timespec tp;
+
+ clock_gettime(CLOCK_MONOTONIC, &tp);
+
+ return ((long long)tp.tv_sec * UM_NSEC_PER_SEC) + tp.tv_nsec;
+}
+
#ifdef UML_CONFIG_NO_HZ_COMMON
static int after_sleep_interval(struct timespec *ts)
{
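For readers who want to try the clock used by os_nsecs_monotonic() outside of UML, here is a small userspace sketch. The names clock_nsecs(), nsecs_monotonic() and nsecs_wall() are made up for illustration; only clock_gettime() and the two clock IDs are real POSIX interfaces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Read a POSIX clock and return its value in nanoseconds. */
long long clock_nsecs(clockid_t id)
{
    struct timespec tp;
    int ret = clock_gettime(id, &tp);

    assert(ret == 0);
    (void)ret; /* avoid an unused-variable warning when NDEBUG is set */
    return (long long)tp.tv_sec * SKETCH_NSEC_PER_SEC + tp.tv_nsec;
}

/* Never steps backwards and is not affected by setting the host time. */
long long nsecs_monotonic(void)
{
    return clock_nsecs(CLOCK_MONOTONIC);
}

/* Wall-clock time; it can jump whenever the host time is changed. */
long long nsecs_wall(void)
{
    return clock_nsecs(CLOCK_REALTIME);
}
```

On Linux, CLOCK_MONOTONIC also does not advance while the system is suspended, so a reader of this clock should not see a jump of several hours after resume.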
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-26 18:32 ` Richard Weinberger
@ 2015-04-26 20:57 ` Thomas Meyer
2015-04-26 20:57 ` Thomas Meyer
1 sibling, 0 replies; 16+ messages in thread
From: Thomas Meyer @ 2015-04-26 20:57 UTC (permalink / raw)
To: Richard Weinberger; +Cc: Linux Kernel Mailing List, user-mode-linux-devel
On Sunday, 26.04.2015, at 20:32 +0200, Richard Weinberger wrote:
> On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <thomas@m3y3r.de> wrote:
> > On Monday, 20.10.2014, at 11:56 +0200, Richard Weinberger wrote:
> >> On 20.10.2014 at 11:51, Thomas Meyer wrote:
> >> >> Hmm, does this always happen?
> >> >
> >> > Yes, my single core system seems to trigger this every time after resume from ram.
> >>
> >> What is your host kernel?
> >>
> >> >> At least on my notebook it did not happen. I've started an UML yesterday
> >> >> suspended it and after more than 12h it worked fine today.
> >> >>
> >> >> BTW: Do you see the issue also when freezing UML using the freezer cgroup?
> >> >
> >> > I'm not sure what you mean by this. Do I need to enable any special config options for this in the host or UML kernel?
> >>
> >> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
> >> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
> >> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
> >> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
> >>
> >
> > Sadly, this also happens with a cgroup freezer group :-(
> >
> > bt
> > #0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:127
> > #1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
> > #2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
> > #3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
> > #4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
> > #5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
> > #6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
> > #7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
> > #8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:174
> > #9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
> > #10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
> > #11 0x0000000000000000 in ?? ()
> >
> > It seems that only very few people run UML kernels and suspend their host systems...
> >
> > Any ideas?
>
> Can you give the attached patch a try?
> Let's see if it proves my theory.
> Looks like UML's clocksource needs fixing.
Hi Richard,
I ran this for an hour and did 4 suspend/resume cycles, and it no longer
seems to hang!
I'll test your other patch next week, but AFAIU using clock_gettime
should solve these hangs in a sane way.
Thanks for your support.
kind regards
thomas
------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-26 20:57 ` Thomas Meyer
(?)
@ 2015-04-26 21:00 ` Richard Weinberger
2015-04-27 5:47 ` Anton Ivanov
-1 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2015-04-26 21:00 UTC (permalink / raw)
To: Thomas Meyer; +Cc: Linux Kernel Mailing List, user-mode-linux-devel
Hi!
On 26.04.2015 at 22:57, Thomas Meyer wrote:
> On Sunday, 26.04.2015, at 20:32 +0200, Richard Weinberger wrote:
>> On Fri, Apr 24, 2015 at 9:58 PM, Thomas Meyer <thomas@m3y3r.de> wrote:
>>> On Monday, 20.10.2014, at 11:56 +0200, Richard Weinberger wrote:
>>>> On 20.10.2014 at 11:51, Thomas Meyer wrote:
>>>>>> Hmm, does this always happen?
>>>>>
>>>>> Yes, my single core system seems to trigger this every time after resume from ram.
>>>>
>>>> What is your host kernel?
>>>>
>>>>>> At least on my notebook it did not happen. I've started an UML yesterday
>>>>>> suspended it and after more than 12h it worked fine today.
>>>>>>
>>>>>> BTW: Do you see the issue also when freezing UML using the freezer cgroup?
>>>>>
>>>>> I'm not sure what you mean by this. Do I need to enable any special config options for this in the host or UML kernel?
>>>>
>>>> Create on the host side a new freezer cgroup, put UML into it and freeze/thaw it.
>>>> i.e. mkdir /sys/fs/cgroup/freezer/uml ; echo <pid of a shell> > /sys/fs/cgroup/freezer/uml/tasks.
>>>> In the said shell run UML and then freeze it using echo FROZEN > /sys/fs/cgroup/freezer/uml/freezer.state.
>>>> Later thaw it: echo THAWED > /sys/fs/cgroup/freezer/uml/freezer.state
>>>>
>>>
>>> Sadly, this also happens with a cgroup freezer group :-(
>>>
>>> bt
>>> #0 __iter_div_u64_rem (remainder=<optimized out>, divisor=<optimized out>, dividend=14641577537827850536) at include/linux/math64.h:127
>>> #1 timespec_add_ns (ns=<optimized out>, a=<optimized out>) at include/linux/time.h:235
>>> #2 __getnstimeofday64 (ts=0xffffffffffffffff) at kernel/time/timekeeping.c:658
>>> #3 0x0000000060098a00 in getnstimeofday64 (ts=<optimized out>) at kernel/time/timekeeping.c:678
>>> #4 0x0000000060098a4c in do_gettimeofday (tv=0xab359e50) at kernel/time/timekeeping.c:897
>>> #5 0x0000000060090d66 in SYSC_gettimeofday (tz=<optimized out>, tv=<optimized out>) at kernel/time/time.c:107
>>> #6 SyS_gettimeofday (tv=-1, tz=2097152000) at kernel/time/time.c:102
>>> #7 0x0000000060032cf3 in handle_syscall (r=0xa39db9e8) at arch/um/kernel/skas/syscall.c:35
>>> #8 0x000000006004a247 in handle_trap (local_using_sysemu=<optimized out>, regs=<optimized out>, pid=<optimized out>) at arch/um/os-Linux/skas/process.c:174
>>> #9 userspace (regs=0xa39db9e8) at arch/um/os-Linux/skas/process.c:399
>>> #10 0x000000006002f125 in fork_handler () at arch/um/kernel/process.c:149
>>> #11 0x0000000000000000 in ?? ()
>>>
>>> It seems that only very few people run UML kernels and suspend their host systems...
>>>
>>> Any ideas?
>>
>> Can you give the attached patch a try?
>> Let's see if it proves my theory.
>> Looks like UML's clocksource needs fixing.
>
> Hi Richard,
>
> I did run this for an hour and did 4 suspend/resume cycles and it seems
> not to hang any more!
Yay!
BTW: Changing the host's time should also work for testing...
> I'll test your other patch the next week, but AFAIU using clock_gettime
> should solve this hangs in a sane way.
Yep. I have no idea why UML is currently using gettimeofday() as clocksource,
this is completely bogus. ;-\
Thanks,
//richard
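Why a wall-clock value makes a risky clocksource can be sketched in userspace. Timekeeping code works on the masked difference between two counter reads and then scales it to nanoseconds. If the counter steps (host time changed, or a long suspend), that difference is either very large or, for a backwards step, wraps around to a value near 2^64. The helper names and the mult/shift values below are made up for illustration, and this is not a claim about the exact path UML took.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Masked difference between two raw counter readings, in the style of
 * (now - last) & mask. Hypothetical helper, for illustration only.
 */
uint64_t counter_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/*
 * Scale a counter delta to nanoseconds with a mult/shift pair. The
 * multiplication is done in 64 bits and wraps silently on overflow, so a
 * huge delta produces a meaningless nanosecond value.
 */
uint64_t delta_to_ns(uint64_t delta, uint32_t mult, uint32_t shift)
{
    assert(shift < 64);
    return (delta * mult) >> shift;
}
```

With a full 64-bit mask, a counter that goes from 200 back to 100 gives a delta of 2^64 - 100 rather than a small negative number. A value of that size then lands in the nanosecond field that timespec_add_ns() has to divide.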
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-26 21:00 ` Richard Weinberger
@ 2015-04-27 5:47 ` Anton Ivanov
2015-04-27 7:23 ` Richard Weinberger
0 siblings, 1 reply; 16+ messages in thread
From: Anton Ivanov @ 2015-04-27 5:47 UTC (permalink / raw)
To: user-mode-linux-devel
On 26/04/15 22:00, Richard Weinberger wrote:
>
>>> Can you give the attached patch a try?
>>> Let's see if it proves my theory.
>>> Looks like UML's clocksource needs fixing.
>> Hi Richard,
>>
>> I did run this for an hour and did 4 suspend/resume cycles and it seems
>> not to hang any more!
> Yay!
> BTW: Changing the host's time should also work for testing...
>
>> I'll test your other patch the next week, but AFAIU using clock_gettime
>> should solve this hangs in a sane way.
> Yep. I have no idea why UML is currently using gettimeofday() as clocksource,
> this is completely bogus. ;-\
It is even more bogus than you think. Read time.c: for extra spice, it uses
an itimer on the virtual clock for all timers, so the timers depend on how
much CPU UML uses at any given time.
It does not need to be that way. I wrote a patchset that makes it use POSIX
timers and CLOCK_MONOTONIC. It went into Richard's queue about a year ago,
along with an epoll-based IRQ controller and some vNIC drivers.
I have not heard of it since.
A.
>
> Thanks,
> //richard
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-27 5:47 ` Anton Ivanov
@ 2015-04-27 7:23 ` Richard Weinberger
2015-04-27 8:20 ` Anton Ivanov
2015-04-30 16:40 ` Thomas Meyer
0 siblings, 2 replies; 16+ messages in thread
From: Richard Weinberger @ 2015-04-27 7:23 UTC (permalink / raw)
To: Anton Ivanov; +Cc: user-mode-linux-devel@lists.sourceforge.net
On Mon, Apr 27, 2015 at 7:47 AM, Anton Ivanov
<anton.ivanov@kot-begemot.co.uk> wrote:
> On 26/04/15 22:00, Richard Weinberger wrote:
>>
>>>> Can you give the attached patch a try?
>>>> Let's see if it proves my theory.
>>>> Looks like UML's clocksource needs fixing.
>>> Hi Richard,
>>>
>>> I did run this for an hour and did 4 suspend/resume cycles and it seems
>>> not to hang any more!
>> Yay!
>> BTW: Changing the host's time should also work for testing...
>>
>>> I'll test your other patch the next week, but AFAIU using clock_gettime
>>> should solve this hangs in a sane way.
>> Yep. I have no idea why UML is currently using gettimeofday() as clocksource,
>> this is completely bogus. ;-\
>
> It is even more bogus than you think - read time.c, it is using itimer
> on the virtual clock for all timers for extra spice so the timers are
> dependent on how much cpu it uses at any given time.
Yep.
> It does not need to be - I wrote a patchset for it to use posix timers
> and CLOCK_MONOTONIC. It went into Richard's queue ~ a year ago along
> with an epoll based IRQ controller and some vNIC drivers.
>
> Have not heard of it since.
Mea culpa!
If I forget a patch, just shout at me.
Can you please rebase your patches against 4.1-rc1,
so that I can merge them ASAP?
--
Thanks,
//richard
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-27 7:23 ` Richard Weinberger
@ 2015-04-27 8:20 ` Anton Ivanov
2015-04-30 16:40 ` Thomas Meyer
1 sibling, 0 replies; 16+ messages in thread
From: Anton Ivanov @ 2015-04-27 8:20 UTC (permalink / raw)
To: Richard Weinberger; +Cc: user-mode-linux-devel@lists.sourceforge.net
On 27/04/15 08:23, Richard Weinberger wrote:
> On Mon, Apr 27, 2015 at 7:47 AM, Anton Ivanov
> <anton.ivanov@kot-begemot.co.uk> wrote:
>> On 26/04/15 22:00, Richard Weinberger wrote:
>>>>> Can you give the attached patch a try?
>>>>> Let's see if it proves my theory.
>>>>> Looks like UML's clocksource needs fixing.
>>>> Hi Richard,
>>>>
>>>> I did run this for an hour and did 4 suspend/resume cycles and it seems
>>>> not to hang any more!
>>> Yay!
>>> BTW: Changing the host's time should also work for testing...
>>>
>>>> I'll test your other patch next week, but AFAIU using clock_gettime()
>>>> should solve these hangs in a sane way.
>>> Yep. I have no idea why UML is currently using gettimeofday() as clocksource,
>>> this is completely bogus. ;-\
>> It is even more bogus than you think - read time.c: it uses an itimer
>> on the virtual clock for all timers, so for extra spice the timers
>> depend on how much CPU it uses at any given time.
> Yep.
>
>> It does not need to be - I wrote a patchset for it to use posix timers
>> and CLOCK_MONOTONIC. It went into Richard's queue ~ a year ago along
>> with an epoll based IRQ controller and some vNIC drivers.
>>
>> Have not heard of it since.
> mea culpa!
> If I forget a patch, just shout at me.
>
> Can you please rebase your patches against 4.1-rc1?
Sure.
I will do my best to sort it out by Tuesday next week at the latest. I am
working mostly on OpenDaylight now, so I will have to do it in my free time.
> Such that I can merge them ASAP...
>
There were several patches in that patchset. I suggest we do it
step by step: merge only the timer and the epoll controller in the first
pass (while they do not depend on each other, they have not seen extensive
testing separately). This is what I will resubmit as a "first pass".
These are prerequisites for the network interfaces, etc., which we can do
later. The last tests I did on them in my previous job showed a forwarding
rate of more than 3 Gbit/s across a UML instance pinned to a single core,
measured to a remote network device (real forwarding rate on real
network traffic, not moving packets locally on the host).
A.
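For readers unfamiliar with the approach mentioned above, the following is a rough userspace sketch of a POSIX timer on CLOCK_MONOTONIC. It is not taken from the patchset; the names on_tick(), arm_monotonic_timer() and tick_count(), and the choice of SIGALRM, are assumptions made for illustration. On older glibc versions the program may need to be linked with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Number of timer expirations seen so far (updated from the handler). */
static volatile sig_atomic_t ticks;

static void on_tick(int sig)
{
	(void)sig;
	ticks++;
}

/*
 * Create a one-shot timer on CLOCK_MONOTONIC that raises SIGALRM after
 * `ms` milliseconds. Returns 0 on success, -1 on error.
 */
int arm_monotonic_timer(long ms, timer_t *out)
{
	struct sigaction sa;
	struct sigevent sev;
	struct itimerspec its;

	assert(ms > 0); /* an all-zero it_value would disarm the timer */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_tick;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) == -1)
		return -1;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_SIGNAL;
	sev.sigev_signo = SIGALRM;
	if (timer_create(CLOCK_MONOTONIC, &sev, out) == -1)
		return -1;

	memset(&its, 0, sizeof(its)); /* it_interval stays 0: one-shot */
	its.it_value.tv_sec = ms / 1000;
	its.it_value.tv_nsec = (ms % 1000) * 1000000L;
	if (timer_settime(*out, 0, &its, NULL) == -1) {
		timer_delete(*out);
		return -1;
	}
	return 0;
}

/* Read the expiration counter. */
int tick_count(void)
{
	return ticks;
}
```

Because the timer is bound to CLOCK_MONOTONIC, its expiry depends on elapsed time rather than on how much CPU time the process has consumed.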
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2015-04-27 7:23 ` Richard Weinberger
2015-04-27 8:20 ` Anton Ivanov
@ 2015-04-30 16:40 ` Thomas Meyer
1 sibling, 0 replies; 16+ messages in thread
From: Thomas Meyer @ 2015-04-30 16:40 UTC (permalink / raw)
To: Richard Weinberger; +Cc: user-mode-linux-devel@lists.sourceforge.net
> On 27.04.2015 at 09:23, Richard Weinberger <richard.weinberger@gmail.com> wrote:
>
Hi Richard,
> mea culpa!
> If I forget a patch, just shout at me.
I crawled through the mailing list; you may also want to check these:
http://sourceforge.net/p/user-mode-linux/mailman/message/32922307/
http://sourceforge.net/p/user-mode-linux/mailman/message/32922306/
http://sourceforge.net/p/user-mode-linux/mailman/message/32922305/
http://sourceforge.net/p/user-mode-linux/mailman/message/32896700/
>
>
> --
> Thanks,
> //richard
* [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
@ 2014-10-19 12:39 Thomas Meyer
2014-10-20 8:27 ` Richard Weinberger
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Meyer @ 2014-10-19 12:39 UTC (permalink / raw)
To: user-mode-linux-devel, Linux Kernel Mailing List
Hello,
In a UML kernel I get a long, CPU-consuming loop in __getnstimeofday()
(kernel/time/timekeeping.c:315), in the call to timespec_add_ns(),
when I leave the host kernel suspended to RAM for a few hours and then
resume it.
This seems to happen because tk->xtime_sec has not been updated yet, but
the nsecs have. nsecs can be as high as 8111000111000111000l.
The function timespec_add_ns() (include/linux/time.h:266) calls
__iter_div_u64_rem(), which has a loop optimized for the case
where the dividend is not much bigger than the divisor.
But that is not the case after the host kernel resumes from RAM.
Any ideas how to fix this? Is it possible to intercept the resume from
RAM and update timekeeper->xtime_sec somehow?
Or can the um arch somehow override timespec_add_ns() to always use
div_u64_rem() instead?
with kind regards
PS: repost on these lists, because nobody did respond to my original
email.
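The cost difference described above can be sketched in userspace as follows. This is a simplified model, not the kernel's code: iter_div_rem() and div_rem() are hypothetical stand-ins for __iter_div_u64_rem() and div_u64_rem(), and an optimizing compiler may be free to turn the loop into a division here.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/*
 * Iterative approach: subtract the divisor until what is left is smaller
 * than it. Cheap when the quotient is 0 or 1, but the number of loop
 * iterations equals the quotient.
 */
uint64_t iter_div_rem(uint64_t dividend, uint32_t divisor, uint64_t *remainder)
{
	uint64_t iterations = 0;

	assert(divisor != 0);
	while (dividend >= divisor) {
		dividend -= divisor;
		iterations++;
	}
	*remainder = dividend;
	return iterations;
}

/* Real division: the cost does not depend on the size of the dividend. */
uint64_t div_rem(uint64_t dividend, uint32_t divisor, uint64_t *remainder)
{
	assert(divisor != 0);
	*remainder = dividend % divisor;
	return dividend / divisor;
}
```

With nsecs = 8111000111000111000 and a divisor of NSEC_PER_SEC, the quotient is 8111000111, so the iterative version would need roughly 8.1 billion subtractions, while the real division is a single operation.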
* Re: [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram
2014-10-19 12:39 Thomas Meyer
@ 2014-10-20 8:27 ` Richard Weinberger
0 siblings, 0 replies; 16+ messages in thread
From: Richard Weinberger @ 2014-10-20 8:27 UTC (permalink / raw)
To: Thomas Meyer; +Cc: user-mode-linux-devel, Linux Kernel Mailing List
On Sun, Oct 19, 2014 at 2:39 PM, Thomas Meyer <thomas@m3y3r.de> wrote:
> Hello,
>
> In a UML kernel I get a long, CPU-consuming loop in __getnstimeofday()
> (kernel/time/timekeeping.c:315), in the call to timespec_add_ns(),
> when I leave the host kernel suspended to RAM for a few hours and then
> resume it.
> This seems to happen because tk->xtime_sec has not been updated yet, but
> the nsecs have. nsecs can be as high as 8111000111000111000l.
>
> The function timespec_add_ns() (include/linux/time.h:266) calls
> __iter_div_u64_rem(), which has a loop optimized for the case
> where the dividend is not much bigger than the divisor.
> But that is not the case after the host kernel resumes from RAM.
>
> Any ideas how to fix this? Is it possible to intercept the resume from
> RAM and update timekeeper->xtime_sec somehow?
> Or can the um arch somehow override timespec_add_ns() to always use
> div_u64_rem() instead?
Hmm, does this always happen?
At least on my notebook it did not happen. I started a UML instance yesterday,
suspended it, and after more than 12 hours it worked fine today.
BTW: Do you also see the issue when freezing UML using the freezer cgroup?
That would be easier to debug. :)
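As a rough sketch of what freezing via the freezer cgroup involves: with the cgroup v1 freezer controller, writing FROZEN or THAWED to a group's freezer.state file stops or resumes every task in that group. The helper below is illustrative only. The name set_freezer_state() is made up, and it assumes the freezer hierarchy is mounted on the host (probably requiring CONFIG_CGROUP_FREEZER there) and that the UML processes were already added to a group, for example a hypothetical /sys/fs/cgroup/freezer/uml.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "FROZEN" or "THAWED" to a cgroup v1 freezer.state file.
 * state_path is e.g. /sys/fs/cgroup/freezer/uml/freezer.state, where
 * the group name "uml" is a hypothetical example.
 * Returns 0 on success, -1 on failure.
 */
int set_freezer_state(const char *state_path, const char *state)
{
	FILE *f;
	int ok;

	assert(strcmp(state, "FROZEN") == 0 || strcmp(state, "THAWED") == 0);

	f = fopen(state_path, "w");
	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Freezing the group, waiting a while, and then thawing it should give the guest a time gap similar to a host suspend, without having to suspend the whole machine.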
--
Thanks,
//richard
end of thread, other threads:[~2015-04-30 16:40 UTC | newest]
Thread overview: 16+ messages
2014-10-20 9:51 [uml-devel] [UM] Long loop in __getnsdayoftime() after resume from ram Thomas Meyer
2014-10-20 9:51 ` Thomas Meyer
2014-10-20 9:56 ` Richard Weinberger
2014-10-20 19:19 ` Thomas Meyer
2015-04-24 19:58 ` Thomas Meyer
2015-04-26 18:32 ` Richard Weinberger
2015-04-26 20:20 ` Richard Weinberger
2015-04-26 20:57 ` Thomas Meyer
2015-04-26 20:57 ` Thomas Meyer
2015-04-26 21:00 ` Richard Weinberger
2015-04-27 5:47 ` Anton Ivanov
2015-04-27 7:23 ` Richard Weinberger
2015-04-27 8:20 ` Anton Ivanov
2015-04-30 16:40 ` Thomas Meyer
-- strict thread matches above, loose matches on Subject: below --
2014-10-19 12:39 Thomas Meyer
2014-10-20 8:27 ` Richard Weinberger