public inbox for linux-kernel@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com
Subject: Re: XFS / writeback invoking soft lockup.
Date: Fri, 13 Dec 2013 21:48:53 +1100
Message-ID: <20131213104853.GS10988@dastard>
In-Reply-To: <20131213071407.GA6527@redhat.com>

On Fri, Dec 13, 2013 at 02:14:07AM -0500, Dave Jones wrote:
> I can hit this pretty reliably on one of my slower test machines.
> (8gb ram, 1 slow sata disk)
> 
> the machine is pretty responsive, and recovers after a while.
> anything we can do to shut it up ?

Actually, I think this indicates a problem.

> BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u8:2:8479]
...
> Call Trace:
>  [<c112f8f8>] lru_add_drain+0x1c/0x39
>  [<c112f934>] __pagevec_release+0x10/0x26
>  [<c112baba>] write_cache_pages+0x2f9/0x486

That code in write_cache_pages():

        while (!done && (index <= end)) {
                int i;

                nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
                              min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
                if (nr_pages == 0)
                        break;

                for (i = 0; i < nr_pages; i++) {
                        struct page *page = pvec.pages[i];
....
....
                }
                pagevec_release(&pvec);
                cond_resched();
        }

So after all the pages in a pagevec are processed, we call
cond_resched() and give up the CPU if something else needs it before
we grab the next pagevec. This softlockup implies we have been
processing this one pagevec for 22s. That tells me the code is
actually stuck spinning on something, not that this is a false
positive, i.e. it should not take 22s to process 14 pages.

[ Yes, I know XFS can process more than that per ->writepage call,
but it's still only a millisecond of work if it doesn't block on
anything. And it can't be blocking, otherwise we wouldn't be firing
the softlockup warning. ]
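
To make that reasoning concrete, here is a minimal userspace sketch of
the same pattern: do a bounded batch of work, then offer up the CPU
before taking the next batch. It is not kernel code. run_batches(),
struct batch_stats, count_item() and the threshold parameter are all
hypothetical names made up for illustration, sched_yield() merely
stands in for cond_resched(), and the elapsed-time check only mimics
what the softlockup watchdog measures.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <time.h>

#define BATCH_SIZE 14	/* items per batch; mirrors the 14 pages per pagevec above */

typedef void (*item_fn)(size_t item);

struct batch_stats {
	size_t batches;	/* number of batches processed */
	size_t stalled;	/* batches that ran longer than the threshold */
};

static double seconds_between(const struct timespec *a,
			      const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Process nr_items in batches of at most BATCH_SIZE, yielding the CPU
 * between batches. A batch that takes longer than threshold_sec is
 * counted as "stalled": the yield point was not reached in time, so
 * something inside the batch must have been spinning.
 */
static struct batch_stats run_batches(size_t nr_items, item_fn fn,
				      double threshold_sec)
{
	struct batch_stats st = { 0, 0 };
	size_t index = 0;

	assert(fn != NULL);

	while (index < nr_items) {
		size_t n = nr_items - index;
		struct timespec start, end;
		size_t i;

		if (n > BATCH_SIZE)
			n = BATCH_SIZE;

		clock_gettime(CLOCK_MONOTONIC, &start);
		for (i = 0; i < n; i++)
			fn(index + i);
		clock_gettime(CLOCK_MONOTONIC, &end);

		if (seconds_between(&start, &end) > threshold_sec)
			st.stalled++;
		st.batches++;
		index += n;

		sched_yield();	/* userspace stand-in for cond_resched() */
	}
	return st;
}

/* Hypothetical per-item work: just count how many items were seen. */
static size_t processed;

static void count_item(size_t item)
{
	(void)item;
	processed++;
}
```

Calling run_batches(30, count_item, 20.0) walks the 30 items as three
batches (14, 14 and 2). With per-item work this cheap, no single batch
should come anywhere near a 20 second threshold, which is the point:
a batch of 14 items only exceeds it if something inside the batch is
stuck.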

The page cache LRU code is a maze of twisty per-cpu passages that go
deep into the mm subsystem and memcg code - I'm not really sure what
all that code is doing, so you'll probably have to ask someone who
knows about that code.

All I can say is that there don't look to be any obvious signs
that this is an XFS or writeback problem from the stack trace, and
without more information or a reproducible test case I'm not going
to be able to understand the cause.

Is the problem reproducible, or is it just a one-off?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

