* [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Glauber de Oliveira Costa @ 2006-11-24 13:10 UTC
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 469 bytes --]
After a VCPU has been offline for a long time, the softlockup watchdog
throws a BUG() in our faces. This is expected: we did in fact spend more
than the fixed 10*HZ (10-second) limit without touching the watchdog.
However, by inspecting the time accumulated in RUNSTATE_offline, we can
detect that this happened and do better. This patch fixes it.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
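A minimal user-space sketch of the offline-time check described above, for illustration only. The names offline_tracker, check_offline, touch_watchdog_stub and OFFLINE_THRESHOLD_NS are hypothetical stand-ins for the per-CPU offline_time variable and touch_softlockup_watchdog(). Because the runstate->time[] counters are in nanoseconds, the sketch states its 10-second threshold in nanoseconds rather than as 10*HZ, which is a count of jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Hypothetical threshold: 10 seconds, expressed in nanoseconds to match
 * the units of the runstate counters. */
#define OFFLINE_THRESHOLD_NS (10ULL * NSEC_PER_SEC)

/* Hypothetical stand-in for one CPU's state. */
struct offline_tracker {
    uint64_t runstate_offline_ns; /* like runstate->time[RUNSTATE_offline] */
    uint64_t consumed_offline_ns; /* like per_cpu(offline_time, cpu) */
};

/* Counts calls in place of touch_softlockup_watchdog(). */
static unsigned int watchdog_touches;

static void touch_watchdog_stub(void)
{
    watchdog_touches++;
}

/* Run once per timer interrupt; returns true if the watchdog was touched. */
static bool check_offline(struct offline_tracker *t)
{
    /* The consumed amount never exceeds the hypervisor's counter. */
    assert(t->runstate_offline_ns >= t->consumed_offline_ns);
    uint64_t offline = t->runstate_offline_ns - t->consumed_offline_ns;

    if (offline > OFFLINE_THRESHOLD_NS) {
        touch_watchdog_stub();
        /* Mark this offline period as handled so it is not reported again. */
        t->consumed_offline_ns += offline;
        return true;
    }
    return false;
}
```

As in the patch, the consumed counter only advances when the threshold is crossed, so several short offline periods can accumulate until they trigger a single touch.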
--
Glauber de Oliveira Costa
Red Hat Inc.
"Free as in Freedom"
[-- Attachment #2: xen-safepause.patch --]
[-- Type: text/plain, Size: 2788 bytes --]
# HG changeset patch
# User gcosta@redhat.com
# Date 1164376767 18000
# Node ID 0f235d94eeabbca64c14ae6d5ae3708870522f60
# Parent 47fcd5f768fef50cba2fc6dbadc7b75de55e88a5
[LINUX] Avoid triggering the softlockup BUG when offline for too long.
After being offline for a long time, the softlockup watchdog triggers
a BUG() on our faces. This is expected, as in fact, we spent more than
a fixed 10*HZ amount of time without touching the watchdog.
However, by inspecting the contents of RUNSTATE_offline, we can gain
awareness of the fact, and do better than that. This patch fixes it.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
diff -r 47fcd5f768fe -r 0f235d94eeab linux-2.6-xen-sparse/arch/i386/kernel/time-xen.c
--- a/linux-2.6-xen-sparse/arch/i386/kernel/time-xen.c Fri Nov 17 08:30:43 2006 -0500
+++ b/linux-2.6-xen-sparse/arch/i386/kernel/time-xen.c Fri Nov 24 08:59:27 2006 -0500
@@ -129,6 +129,8 @@ static DEFINE_PER_CPU(u64, processed_sys
/* How much CPU time was spent blocked and how much was 'stolen'? */
static DEFINE_PER_CPU(u64, processed_stolen_time);
static DEFINE_PER_CPU(u64, processed_blocked_time);
+/* How much time did we spend offline? */
+static DEFINE_PER_CPU(u64, offline_time);
/* Current runstate of each CPU (updated automatically by the hypervisor). */
static DEFINE_PER_CPU(struct vcpu_runstate_info, runstate);
@@ -607,7 +609,7 @@ EXPORT_SYMBOL(profile_pc);
irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
- s64 delta, delta_cpu, stolen, blocked;
+ s64 delta, delta_cpu, stolen, blocked, offline;
u64 sched_time;
int i, cpu = smp_processor_id();
struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
@@ -636,6 +638,8 @@ irqreturn_t timer_interrupt(int irq, voi
per_cpu(processed_stolen_time, cpu);
blocked = runstate->time[RUNSTATE_blocked] -
per_cpu(processed_blocked_time, cpu);
+ offline = runstate->time[RUNSTATE_offline] -
+ per_cpu(offline_time, cpu);
barrier();
} while (sched_time != runstate->state_entry_time);
} while (!time_values_up_to_date(cpu));
@@ -710,6 +714,13 @@ irqreturn_t timer_interrupt(int irq, voi
(cputime_t)delta_cpu);
}
+ /* We know we were offline for too long, avoid triggering the
+ * softlockup_tick bug */
+ if ((offline > 10*HZ)) {
+ touch_softlockup_watchdog();
+ per_cpu(offline_time, cpu) += offline;
+ }
+
/* Local timer processing (see update_process_times()). */
run_local_timers();
if (rcu_pending(cpu))
@@ -734,6 +745,8 @@ static void init_missing_ticks_accountin
runstate->time[RUNSTATE_blocked];
per_cpu(processed_stolen_time, cpu) =
runstate->time[RUNSTATE_runnable] +
+ runstate->time[RUNSTATE_offline];
+ per_cpu(offline_time, cpu) =
runstate->time[RUNSTATE_offline];
}
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Keir Fraser @ 2006-11-27 10:21 UTC
To: Glauber de Oliveira Costa, xen-devel
On 24/11/06 13:10, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:
> After a VCPU has been offline for a long time, the softlockup watchdog
> throws a BUG() in our faces. This is expected: we did in fact spend more
> than the fixed 10*HZ (10-second) limit without touching the watchdog.
>
> However, by inspecting the time accumulated in RUNSTATE_offline, we can
> detect that this happened and do better. This patch fixes it.
>
> Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Would 'stolen' not be a good enough thing to test? Presumably this is really
just dealing with xm pause/unpause (a single long offline) so this simpler
fix would work just as well?
-- Keir
* Re: [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Glauber de Oliveira Costa @ 2006-11-27 15:31 UTC
To: Keir Fraser; +Cc: xen-devel
On Mon, Nov 27, 2006 at 10:21:54AM +0000, Keir Fraser wrote:
> On 24/11/06 13:10, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:
>
> > After a VCPU has been offline for a long time, the softlockup watchdog
> > throws a BUG() in our faces. This is expected: we did in fact spend more
> > than the fixed 10*HZ (10-second) limit without touching the watchdog.
> >
> > However, by inspecting the time accumulated in RUNSTATE_offline, we can
> > detect that this happened and do better. This patch fixes it.
> >
> > Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
>
> Would 'stolen' not be a good enough thing to test? Presumably this is really
> just dealing with xm pause/unpause (a single long offline) so this simpler
> fix would work just as well?
I thought about it, but I'm not 100% sure. My reasons for not using
stolen were basically:
* Conceptually (maybe not in practice), stolen could grow due to
runnable time alone.
* Stolen time, like blocked time, does not have its corresponding
per-processor variable updated all at once, but in multiples of
NS_PER_TICK-sized chunks. If we're out for too long, we could detect
stolen being too large multiple times, leading to far more calls to the
softlockup watchdog than we want.
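To make the chunked accounting concrete, here is a simplified user-space model, assuming HZ=100 (10 ms ticks) and ignoring any other adjustments the real timer handler makes. struct stolen_acct and account_stolen_ticks() are hypothetical names. In this model all whole ticks are consumed in a single call, and only a sub-tick remainder is carried over to the next interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tick length for illustration: HZ = 100, i.e. 10 ms per tick. */
#define NS_PER_TICK 10000000ULL

/* Hypothetical model of one CPU's stolen-time bookkeeping. */
struct stolen_acct {
    uint64_t runstate_stolen_ns;  /* cumulative stolen time from the hypervisor */
    uint64_t processed_stolen_ns; /* like per_cpu(processed_stolen_time, cpu) */
};

/* Account pending stolen time in whole ticks and return how many ticks
 * were accounted. The sub-tick remainder stays pending. */
static uint64_t account_stolen_ticks(struct stolen_acct *a)
{
    uint64_t stolen_ns = a->runstate_stolen_ns - a->processed_stolen_ns;
    uint64_t ticks = stolen_ns / NS_PER_TICK;

    a->processed_stolen_ns += ticks * NS_PER_TICK;
    /* Whatever is left over is always less than one tick. */
    assert(a->runstate_stolen_ns - a->processed_stolen_ns < NS_PER_TICK);
    return ticks;
}
```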
Waiting for your comments on this,
--
Glauber de Oliveira Costa
* Re: [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Glauber de Oliveira Costa @ 2006-11-27 16:47 UTC
To: Glauber de Oliveira Costa; +Cc: xen-devel, Keir Fraser
> * Stolen time, like blocked time, does not have its corresponding
> per-processor variable updated all at once, but in multiples of
> NS_PER_TICK-sized chunks. If we're out for too long, we could detect
> stolen being too large multiple times, leading to far more calls to the
> softlockup watchdog than we want.
FYI, I just ran a simple test checking stolen time instead of offline
time, and touch_softlockup_watchdog() does in fact get called far too often.
--
Glauber de Oliveira Costa.
* Re: [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Keir Fraser @ 2006-11-27 18:54 UTC
To: Glauber de Oliveira Costa, Glauber de Oliveira Costa
Cc: xen-devel, Keir Fraser
On 27/11/06 4:47 pm, "Glauber de Oliveira Costa" <glommer@gmail.com> wrote:
>> * Stolen time, like blocked time, does not have its corresponding
>> per-processor variable updated all at once, but in multiples of
>> NS_PER_TICK-sized chunks. If we're out for too long, we could detect
>> stolen being too large multiple times, leading to far more calls to the
>> softlockup watchdog than we want.
>
> FYI, I just ran a simple test checking stolen time instead of offline
> time, and touch_softlockup_watchdog() does in fact get called far too often.
That doesn't make sense. processed_stolen_time should lag at most 1 jiffy
behind actual stolen time. So you still need to accumulate at least 10*HZ-1
jiffies of stolen time in one go to end up touching the softlockup watchdog.
As far as I can see, anyway. What workload did you run to test using stolen
time?
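The lag argument can be checked with a small replay, again a sketch under an assumed HZ=100 and with hypothetical helper names (stolen_exceeds, count_touches): cumulative stolen-time samples are fed through whole-tick accounting, and each interrupt that sees more than 10*HZ ticks at once counts as a watchdog touch. After a single long pause, only the first interrupt sees the large backlog, because all of its whole ticks are consumed there.

```c
#include <assert.h>
#include <stdint.h>

#define HZ 100ULL                        /* assumed tick rate */
#define NS_PER_TICK (1000000000ULL / HZ) /* 10 ms per tick */

/* Account whole ticks of pending stolen time and report whether this
 * interrupt saw more than threshold_ticks of it in one go. */
static int stolen_exceeds(uint64_t total_stolen_ns, uint64_t *processed_ns,
                          uint64_t threshold_ticks)
{
    assert(total_stolen_ns >= *processed_ns);
    uint64_t ticks = (total_stolen_ns - *processed_ns) / NS_PER_TICK;

    *processed_ns += ticks * NS_PER_TICK;
    return ticks > threshold_ticks;
}

/* Replay cumulative stolen-time samples, one per timer interrupt, and
 * count how many interrupts would touch the watchdog. */
static unsigned count_touches(const uint64_t *samples, unsigned n,
                              uint64_t threshold_ticks)
{
    uint64_t processed = 0;
    unsigned touches = 0;

    for (unsigned i = 0; i < n; i++)
        touches += stolen_exceeds(samples[i], &processed, threshold_ticks);
    return touches;
}
```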
-- Keir
* Re: [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Glauber de Oliveira Costa @ 2006-11-29 11:46 UTC
To: Keir Fraser; +Cc: Glauber de Oliveira Costa, xen-devel
On Mon, Nov 27, 2006 at 06:54:26PM +0000, Keir Fraser wrote:
> > FYI, I just ran a simple test checking stolen time instead of offline
> > time, and touch_softlockup_watchdog() does in fact get called far too often.
>
> That doesn't make sense. processed_stolen_time should lag at most 1 jiffy
> behind actual stolen time. So you still need to accumulate at least 10*HZ-1
> jiffies of stolen time in one go to end up touching the softlockup watchdog.
> As far as I can see, anyway. What workload did you run to test using stolen
> time?
Thanks for pointing that out, Keir.
After going back to it, I found a small mistake of mine: I was
touching the softlockup watchdog before the stolen ticks were accounted.
Doing it afterwards does the trick.
I'll resend the patch soon.
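The ordering mistake can be illustrated as a unit mismatch. This is a sketch assuming, as the message implies, that the stolen-tick accounting step turns stolen from a nanosecond delta into a tick count; it is modelled here with plain division, HZ=250 is an assumed value, and both function names are hypothetical. Before that step, stolen > 10*HZ compares nanoseconds against a number of ticks and is true for almost any stolen time; after it, the comparison means roughly ten seconds.

```c
#include <stdint.h>

#define HZ 250ULL                        /* assumed tick rate */
#define NS_PER_TICK (1000000000ULL / HZ) /* 4,000,000 ns per tick */

/* Check placed before accounting: stolen is still a nanosecond delta,
 * so 10*HZ is effectively read as 2500 ns. */
static int check_before_accounting(uint64_t stolen)
{
    return stolen > 10 * HZ;
}

/* Check placed after accounting: stolen has been converted to whole
 * ticks, so 10*HZ now means ten seconds' worth of ticks. */
static int check_after_accounting(uint64_t stolen)
{
    stolen /= NS_PER_TICK;
    return stolen > 10 * HZ;
}
```

With these values, a single tick's worth of stolen time (4 ms) already passes the misplaced check, while only a multi-second backlog passes the correctly placed one.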
--
Glauber de Oliveira Costa
* [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Glauber de Oliveira Costa @ 2006-11-29 12:08 UTC
To: Keir Fraser, xen-devel
[-- Attachment #1: Type: text/plain, Size: 555 bytes --]
[LINUX] Avoid triggering the softlockup BUG when offline for too long.
After a VCPU has been offline for a long time, the softlockup watchdog
throws a BUG() in our faces. This is expected: we did in fact spend more
than the fixed 10*HZ (10-second) limit without touching the watchdog.
However, by inspecting the value of stolen inside the timer IRQ handler,
we can detect that this happened and do better.
This patch fixes it.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
--
Glauber de Oliveira Costa
[-- Attachment #2: xen-safepause.patch --]
[-- Type: text/plain, Size: 1186 bytes --]
# HG changeset patch
# User gcosta@redhat.com
# Date 1164805648 18000
# Node ID 24d24bca629bbd8f37319de34e12202814ccdde1
# Parent 47fcd5f768fef50cba2fc6dbadc7b75de55e88a5
[LINUX] Avoid triggering the softlockup BUG when offline for too long.
After being offline for a long time, the softlockup watchdog triggers
a BUG() on our faces. This is expected, as in fact, we spent more than
a fixed 10*HZ amount of time without touching the watchdog.
However, by inspecting the contents of stolen inside timer irq handler,
we can gain awareness of the fact, and do better than that.
This patch fixes it.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
diff -r 47fcd5f768fe -r 24d24bca629b linux-2.6-xen-sparse/arch/i386/kernel/time-xen.c
--- a/linux-2.6-xen-sparse/arch/i386/kernel/time-xen.c Fri Nov 17 08:30:43 2006 -0500
+++ b/linux-2.6-xen-sparse/arch/i386/kernel/time-xen.c Wed Nov 29 08:07:28 2006 -0500
@@ -710,6 +710,9 @@ irqreturn_t timer_interrupt(int irq, voi
(cputime_t)delta_cpu);
}
+ if (stolen > 10*HZ)
+ touch_softlockup_watchdog();
+
/* Local timer processing (see update_process_times()). */
run_local_timers();
if (rcu_pending(cpu))
* Re: [PATCH] Avoid triggering the softlockup BUG when offline for too long.
From: Keir Fraser @ 2006-11-29 12:18 UTC
To: Glauber de Oliveira Costa, xen-devel
On 29/11/06 12:08, "Glauber de Oliveira Costa" <gcosta@redhat.com> wrote:
> [LINUX] Avoid triggering the softlockup BUG when offline for too long.
>
> After a VCPU has been offline for a long time, the softlockup watchdog
> throws a BUG() in our faces. This is expected: we did in fact spend more
> than the fixed 10*HZ (10-second) limit without touching the watchdog.
>
> However, by inspecting the value of stolen inside the timer IRQ handler,
> we can detect that this happened and do better.
> This patch fixes it.
Thanks. I changed the threshold to 5*HZ to avoid marginal cases: we might
be offlined for just under 10 seconds, and if the per-CPU watchdog thread
had not run for a second or two before we were offlined, that would push
us over the edge and print a warning. 5*HZ leaves a much more comfortable
margin.
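The marginal case above can be put in numbers with a rough model only. It assumes the watchdog warns once more than 10 seconds have passed since it was last touched, and that the timer interrupt touches it on resume whenever the offline period exceeds the threshold. softlockup_warns() and SOFTLOCKUP_LIMIT_MS are hypothetical names, and times are in milliseconds.

```c
#include <stdbool.h>

/* Assumed softlockup limit: 10 s, in milliseconds. */
#define SOFTLOCKUP_LIMIT_MS 10000u

/* last_touch_age_ms: time between the last watchdog touch and going offline.
 * offline_ms:        how long the VCPU stayed offline.
 * threshold_ms:      offline time above which the timer interrupt touches
 *                    the watchdog on resume (10*HZ or 5*HZ in the thread). */
static bool softlockup_warns(unsigned int last_touch_age_ms,
                             unsigned int offline_ms,
                             unsigned int threshold_ms)
{
    if (offline_ms > threshold_ms)
        return false; /* watchdog touched on resume, so no warning */
    return last_touch_age_ms + offline_ms > SOFTLOCKUP_LIMIT_MS;
}
```

With a 10-second threshold, a 9.5-second pause plus 1.5 seconds of prior watchdog inactivity crosses the limit; with a 5-second threshold the same pause is caught on resume, and a pause that stays under 5 seconds would need more than 5 seconds of prior inactivity to produce a warning.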
-- Keir