From: linux@arm.linux.org.uk (Russell King - ARM Linux)
To: linux-arm-kernel@lists.infradead.org
Subject: [BUG] 2.6.37-rc3 massive interactivity regression on ARM
Date: Sun, 5 Dec 2010 14:19:21 +0000
Message-ID: <20101205141921.GF9138@n2100.arm.linux.org.uk>
In-Reply-To: <20101205131702.GE9138@n2100.arm.linux.org.uk>

On Sun, Dec 05, 2010 at 01:17:02PM +0000, Russell King - ARM Linux wrote:
> On Sun, Dec 05, 2010 at 01:32:37PM +0100, Mikael Pettersson wrote:
> > Mikael Pettersson writes:
> > > The scenario is that I do a remote login to an ARM build server,
> > > use screen to start a sub-shell, in that shell start a largish
> > > compile job, detach from that screen, and from the original login
> > > shell I occasionally monitor the compile job with top or ps or
> > > by attaching to the screen.
> > >
> > > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
> > > very sluggish: top takes forever to start, once started it shows no
> > > activity from the compile job (it's as if it's sleeping on a lock),
> > > and ps also takes forever and shows no activity from the compile job.
> > >
> > > Rebooting into 2.6.36 eliminates these issues.
> > >
> > > I do pretty much the same thing (remote login -> screen -> compile job)
> > > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
> > > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
> > > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
> > >
> > > Has anyone else seen this? Any ideas about the cause?
> >
> > (Re-followup since I just realised my previous followups were to Rafael's
> > regressions mailbot rather than the original thread.)
> >
> > > The bug is still present in 2.6.37-rc4. I'm currently trying to bisect it.
> >
> > git bisect identified
> >
> > [305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task
> >
> > as the cause of this regression. Reverting it from 2.6.37-rc4 (requires some
> > hackery due to subsequent changes in the same area) restores sane behaviour.
> >
> > The original patch submission talks about irq-heavy scenarios. My case is the
> > exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
> > bound in userspace but expected to schedule quickly when needed (e.g. running
> > top or ps or just hitting CR in one shell while another runs a compile job).
> >
> > I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
> > ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
> > (x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.
> >
> > So it looks like an ARM-only issue, possibly depending on platform specifics.
> >
> > One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
> > machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
> > much higher on Kirkwood, even when the machine is idle.
>
> The above patch you point out is fundamentally broken.
>
> + rq->clock = sched_clock_cpu(cpu);
> + irq_time = irq_time_cpu(cpu);
> + if (rq->clock - irq_time > rq->clock_task)
> + rq->clock_task = rq->clock - irq_time;
>
> This means that we will only update rq->clock_task if it is smaller than
> rq->clock. So, eventually over time, rq->clock_task becomes the maximum
> value that rq->clock can ever be. Or in other words, the maximum value
> of sched_clock_cpu().
>
> Once that has been reached, although rq->clock will wrap back to zero,
> rq->clock_task will not, and so (I think) task execution time accounting
> effectively stops dead.
>
> I guess this hasn't been noticed on x86 as they have a 64-bit sched_clock,
> and so need to wait a long time for this to be noticed. However, on ARM
> where we tend to have 32-bit counters feeding sched_clock(), this value
> will wrap far sooner.
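
To make the quoted argument concrete, here is a minimal user-space C model of the quoted update rule. It is a sketch under stated assumptions, not kernel code. For illustration only, it assumes a sched_clock() that wraps at 2^32 ns and is sampled every 2^28 ns (about 268 ms). On real hardware the wrap point depends on the counter frequency. The names `CLOCK_WRAP_NS`, `struct rq_model`, `update_clock_conditional()` and `accounted_task_time()` are hypothetical. irq_time is passed as zero, matching the observation below that irq_time_cpu() is always zero here.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: a 32-bit-counter-based sched_clock() that wraps at 2^32 ns.
 * Illustrative only; real hardware wraps at a frequency-dependent value. */
#define CLOCK_WRAP_NS (1ULL << 32)

/* Hypothetical stand-in for the two runqueue clock fields. */
struct rq_model {
	uint64_t clock;      /* models rq->clock */
	uint64_t clock_task; /* models rq->clock_task */
};

/* Same update rule as the quoted hunk of commit 305e6835. */
static void update_clock_conditional(struct rq_model *rq, uint64_t now,
				     uint64_t irq_time)
{
	rq->clock = now;
	if (rq->clock - irq_time > rq->clock_task)
		rq->clock_task = rq->clock - irq_time;
}

/* Feed `steps` readings, `step_ns` apart, from a clock that wraps at
 * CLOCK_WRAP_NS, and return the total task time the rule accounted. */
static uint64_t accounted_task_time(uint64_t step_ns, int steps)
{
	struct rq_model rq = { 0, 0 };
	uint64_t raw = 0, total = 0;
	int i;

	assert(step_ns > 0 && step_ns < CLOCK_WRAP_NS);
	for (i = 0; i < steps; i++) {
		uint64_t prev = rq.clock_task;

		raw += step_ns;
		/* irq_time is zero, as irq_time_cpu() is in this configuration */
		update_clock_conditional(&rq, raw % CLOCK_WRAP_NS, 0);
		/* never negative: the rule only ever moves clock_task forward */
		total += rq.clock_task - prev;
	}
	return total;
}
```

With those parameters, accounted_task_time(1ULL << 28, 10) returns 10 << 28, which is all the elapsed time. accounted_task_time(1ULL << 28, 64) returns only 15 << 28, and so does any longer run. Once clock_task reaches the largest value the wrapped clock ever takes, it never moves again, which matches the "stops dead" description.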

I'm not so sure about this, though that if() statement above certainly
looks very suspicious. As irq_time_cpu() will always be zero here, can
you try removing the conditional?
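
The suggested experiment can be modelled in the same way. This hypothetical sketch keeps the illustrative 2^32 ns wrap and uses made-up names (`update_clock_unconditional()`, `count_steps()`). With the if() dropped, clock_task simply follows clock, so it never freezes. Instead, each wrap of the underlying clock shows up as a backward step, and whatever consumes these clock values then has to tolerate that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: same illustrative wrap point as before, 2^32 ns. */
#define CLOCK_WRAP_NS (1ULL << 32)

/* Hypothetical stand-in for the two runqueue clock fields. */
struct rq_model {
	uint64_t clock;      /* models rq->clock */
	uint64_t clock_task; /* models rq->clock_task */
};

/* The suggested experiment: no if(), clock_task always follows clock. */
static void update_clock_unconditional(struct rq_model *rq, uint64_t now,
				       uint64_t irq_time)
{
	rq->clock = now;
	rq->clock_task = rq->clock - irq_time;
}

/* Count updates where clock_task stood still (*frozen) or moved
 * backwards (*backward) while the underlying time kept advancing. */
static void count_steps(uint64_t step_ns, int steps, int *frozen,
			int *backward)
{
	struct rq_model rq = { 0, 0 };
	uint64_t raw = 0;
	int i;

	assert(step_ns > 0 && step_ns < CLOCK_WRAP_NS);
	assert(frozen != NULL && backward != NULL);
	*frozen = 0;
	*backward = 0;
	for (i = 0; i < steps; i++) {
		uint64_t prev = rq.clock_task;

		raw += step_ns;
		update_clock_unconditional(&rq, raw % CLOCK_WRAP_NS, 0);
		if (rq.clock_task == prev)
			(*frozen)++;
		else if (rq.clock_task < prev)
			(*backward)++;
	}
}
```

Calling count_steps(1ULL << 28, 64, &frozen, &backward) leaves frozen at 0 and backward at 4, one backward step per wrap of the modelled clock.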

In any case, sched_clock_cpu() should be resilient against sched_clock()
wrapping. However, your comments about it being iop32x and ixp4xx
(both of which are 32-bit-counter-to-ns based implementations) and
kirkwood being a 32-bit-extended-to-63-bit-counter-to-ns implementation
do make me wonder...
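
For contrast, the sketch below shows the general idea of extending a 32-bit counter in software so that the value behind sched_clock() effectively never wraps. It is a simplified, hypothetical helper (`struct cnt_ext`, `cnt_extend()`), not the kernel's cnt32_to_63(). It keeps a full 64-bit result rather than 63 bits, and it assumes a single caller that reads the counter at least once per wrap period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for extending a free-running 32-bit counter. */
struct cnt_ext {
	uint32_t last; /* previous raw 32-bit reading */
	uint64_t high; /* number of wraps seen, already shifted left by 32 */
};

/* Return a 64-bit count that keeps increasing across 32-bit wraps.
 * ASSUMPTION: called at least once per wrap period, never concurrently;
 * a missed wrap or a racing caller would corrupt the result. */
static uint64_t cnt_extend(struct cnt_ext *c, uint32_t raw)
{
	assert(c != NULL);
	if (raw < c->last)          /* counter wrapped since the last read */
		c->high += 1ULL << 32;
	c->last = raw;
	return c->high | raw;       /* low 32 bits of high are always zero */
}
```

Fed 0xF0000000 and then 0x10, it returns 0x100000010 instead of dropping back to 0x10. A clock built this way would practically never present the quoted if() with a smaller rq->clock. That would be consistent with kirkwood not misbehaving, although the model by itself proves nothing about the real kernel.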