Re: Processes spinning forever, apparently in lock_timer_base()?

public inbox for linux-kernel@vger.kernel.org

From: richard kennedy <richard@rsk.demon.co.uk>
To: linux-kernel@vger.kernel.org
Subject: Re: Processes spinning forever, apparently in lock_timer_base()?
Date: Thu, 2 Aug 2007 10:37:56 +0000 (UTC)
Message-ID: <f8sc63$aia$1@sea.gmane.org>
In-Reply-To: 46B10BB7.60900@redhat.com

On Wed, 01 Aug 2007 18:39:51 -0400, Chuck Ebbert wrote:

> Looks like the same problem with spinlock unfairness we've seen
> elsewhere: it seems to be looping here? Or is everyone stuck just
> waiting for writeout?
> 
> lock_timer_base():
>         for (;;) {
>                 tvec_base_t *prelock_base = timer->base;
>                 base = tbase_get_base(prelock_base);
>                 if (likely(base != NULL)) {
>                         spin_lock_irqsave(&base->lock, *flags);
>                         if (likely(prelock_base == timer->base))
>                                 return base;
>                         /* The timer has migrated to another CPU */
>                         spin_unlock_irqrestore(&base->lock, *flags);
>                 }
>                 cpu_relax();
>         }
> 
> The problem goes away completely if filesystems are mounted *without*
> noatime. Has happened in 2.6.20 through 2.6.22...
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249563
> 
> Part of sysrq-t listing:
> 
> mysqld        D 000017c0  2196 23162   1562
>        e383fcb8 00000082 61650954 000017c0 e383fc9c 00000000 c0407208
>        e383f000 a12b0434 00004d1d c6ed2c00 c6ed2d9c c200fa80 00000000
>        c0724640 f6c60540 c4ff3c70 00000508 00000286 c042ffcb e383fcc8
>        00014926 00000000 00000286
> Call Trace:
>  [<c0407208>] do_IRQ+0xbd/0xd1
>  [<c042ffcb>] lock_timer_base+0x19/0x35
>  [<c04300df>] __mod_timer+0x9a/0xa4
>  [<c060bb55>] schedule_timeout+0x70/0x8f
>  [<c042fd37>] process_timeout+0x0/0x5
>  [<c060bb50>] schedule_timeout+0x6b/0x8f
>  [<c060b67c>] io_schedule_timeout+0x39/0x5d
>  [<c0465eea>] congestion_wait+0x50/0x64
>  [<c0438539>] autoremove_wake_function+0x0/0x35
>  [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
>  [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3
> 
> 
> named         D 000017c0  2024  1454      1
>        f722acb0 00000082 6165ed96 000017c0 c1523e80 c16f0c00 c16f20e0
>        f722a000 a12be87d 00004d1d f768ac00 f768ad9c c200fa80 00000000
>        00000000 f75bda80 c0407208 00000508 00000286 c042ffcb f722acc0
>        00020207 00000000 00000286
> Call Trace:
>  [<c0407208>] do_IRQ+0xbd/0xd1
>  [<c042ffcb>] lock_timer_base+0x19/0x35
>  [<c04300df>] __mod_timer+0x9a/0xa4
>  [<c060bb55>] schedule_timeout+0x70/0x8f
>  [<c042fd37>] process_timeout+0x0/0x5
>  [<c060bb50>] schedule_timeout+0x6b/0x8f
>  [<c060b67c>] io_schedule_timeout+0x39/0x5d
>  [<c0465eea>] congestion_wait+0x50/0x64
>  [<c0438539>] autoremove_wake_function+0x0/0x35
>  [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
>  [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3
> 
> 
> mysqld        D 000017c0  2196 23456   1562
>        e9293cb8 00000082 616692ed 000017c0 e9293c9c 00000000 e9293cc8
>        e9293000 a12c8dd0 00004d1d c3d5ac00 c3d5ad9c c200fa80 00000000
>        c0724640 f6c60540 e9293d10 c07e1f00 00000286 c042ffcb e9293cc8
>        0002b57f 00000000 00000286
> Call Trace:
>  [<c042ffcb>] lock_timer_base+0x19/0x35
>  [<c04300df>] __mod_timer+0x9a/0xa4
>  [<c060bb55>] schedule_timeout+0x70/0x8f
>  [<c042fd37>] process_timeout+0x0/0x5
>  [<c060bb50>] schedule_timeout+0x6b/0x8f
>  [<c060b67c>] io_schedule_timeout+0x39/0x5d
>  [<c0465eea>] congestion_wait+0x50/0x64
>  [<c0438539>] autoremove_wake_function+0x0/0x35
>  [<c04620e2>] balance_dirty_pages_ratelimited_nr+0x148/0x193
>  [<c045e7fd>] generic_file_buffered_write+0x4c7/0x5d3

I'm not sure if this is related to your problem, but I posted a patch the
other day that fixes an issue in balance_dirty_pages().

If a drive is under light load, it can get stuck looping, waiting for
enough pages to become available to be written. In my test case this
occurs when one drive is under heavy load and the other isn't.

I sent it with the subject "[PATCH] balance_dirty_pages - exit loop when
no more pages available". Here's the patch again in case it helps.

Cheers
Richard

------
--- linux-2.6.22.1/mm/page-writeback.c.orig	2007-07-30 16:36:09.000000000 +0100
+++ linux-2.6.22.1/mm/page-writeback.c	2007-07-31 16:26:43.000000000 +0100
@@ -250,6 +250,8 @@ static void balance_dirty_pages(struct a
 			pages_written += write_chunk - wbc.nr_to_write;
 			if (pages_written >= write_chunk)
 				break;		/* We've done our duty */
+			if (!wbc.encountered_congestion && wbc.nr_to_write > 0)
+				break;	/* didn't find enough to do */
 		}
 		congestion_wait(WRITE, HZ/10);
 	}
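
To make the condition in the patch concrete, here is a rough user-space
sketch of the loop. It is not kernel code: toy_wbc, toy_bdi,
toy_writeback() and toy_balance() are made-up names, and the model simply
assumes that the global dirty total stays over the threshold because
another, busy drive holds most of the dirty pages, so the threshold-based
exits never fire. The max_iters cap stands in for "spins forever".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two writeback_control fields the patch looks at. */
struct toy_wbc {
	long nr_to_write;
	bool encountered_congestion;
};

/* Hypothetical model of one lightly loaded drive. */
struct toy_bdi {
	long dirty_pages;
};

/* Write out as many dirty pages as the drive has, up to nr_to_write. */
static void toy_writeback(struct toy_bdi *bdi, struct toy_wbc *wbc)
{
	long n = bdi->dirty_pages < wbc->nr_to_write ?
			bdi->dirty_pages : wbc->nr_to_write;

	bdi->dirty_pages -= n;
	wbc->nr_to_write -= n;
}

/*
 * Returns the iteration on which the loop exits, or -1 if it is still
 * looping after max_iters passes.
 */
long toy_balance(struct toy_bdi *bdi, long write_chunk, bool patched,
		 long max_iters)
{
	long pages_written = 0;
	long i;

	assert(write_chunk > 0);

	for (i = 1; i <= max_iters; i++) {
		struct toy_wbc wbc = {
			.nr_to_write = write_chunk,
			.encountered_congestion = false,
		};

		toy_writeback(bdi, &wbc);
		pages_written += write_chunk - wbc.nr_to_write;
		if (pages_written >= write_chunk)
			return i;	/* We've done our duty */
		if (patched && !wbc.encountered_congestion &&
		    wbc.nr_to_write > 0)
			return i;	/* didn't find enough to do */
		/* the kernel would call congestion_wait(WRITE, HZ/10) here */
	}
	return -1;
}
```

With 10 dirty pages on the drive and a write_chunk of 32, the unpatched
variant never accumulates a full chunk and runs until the cap, while the
patched variant leaves on the first pass because nr_to_write is still
positive and no congestion was seen. A drive with plenty of dirty pages
exits on the first pass either way.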
