From: Joshua Schmid <jschmid@suse.de>
To: linux-bcache@vger.kernel.org
Subject: Re: bcache_gc: BUG: soft lockup - CPU#2 stuck for 23s!
Date: Tue, 03 Feb 2015 12:26:29 +0100
Message-ID: <54D0B065.4010205@suse.de>
In-Reply-To: <54D0A4CF.4040204@suse.de>

[-- Attachment #1: Type: text/plain, Size: 1126 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 02/03/2015 11:37 AM, Joshua Schmid wrote:
> Hi,
> 
> 
> 
> I tested this patch for some time and it really helps to fix the
> gc issue we are running into. Since it might have gotten lost, I
> will resend it.

patch attached*

> 
> 
> 
> Best Regards, Joshua
> 
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJU0LBlAAoJEPUnwXO0u5uWMVgP/iUGNroCzUSW0lGvV+awLyPX
+p7O5tERuzs7Uq+yKBoE8Zvad76WuS1vnrLqjZgfI+3P4dGJPpf1DlLHlFU7dHkF
/UBgqQFf6OT5MgXk+TXKLXitH+55Chl7WaVe3RMgXEMP3Q5R/rjvLM8Po5gLfRc8
uru7k0oFi+B7cnOmxXruD74baCXY+nwuXoE7PTyOzH+1Wf7SHsz8ygGNVbDhVWI4
A1zaTqoBMNXribBd3wN5uRV7XDHL5un+zF7sR8P9BiXHn1GKPwH7qLesHHTT2nvs
joN7+xMx8In0cR4uloSDqgmwT/6OdeFCujKErwIkVIl57YqEUimByxcA4lGzXvXZ
C5n760Ei/nw0X70qeA2Jkfg/Zk1HfwPcfkHVIi5KB4LelfVa1jbIEaSzdnclH2aD
GlakovGJOkrcPhBgH0xc+7h2eH1kv6Vw8Al1AbTNho92vzVCKmcmGMUyuBpkmxnD
zdxoXUKZLEuGWCt0ExYBU8tIw3twpatCA15sZ9kbgwkggqri7HHqjA8M/cMjPXKC
VNStPPnscMbMjxDvZiJyNrCchi2kInJWLKJWNofk5RZs7f2eQcT1bXKYWxXPGIxn
QZhKsrItZ/k0g2GrxWCRxH6fymHVKkokzt6R0FZcs/1Quf+CnwrhJ5ZPQivkdKLX
BUIG+BeOeYwkLm27/VoH
=XQtd
-----END PGP SIGNATURE-----

[-- Attachment #2: 0001-3.17-rc6-bcache_gc-BUG-soft-lockup-CPU-2-stuck-for-2.patch --]
[-- Type: text/x-patch, Size: 2379 bytes --]

From ab8c276a4997f394e252688f855e1b35374aedee Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kmo@daterainc.com>
Date: Sat, 1 Nov 2014 13:44:47 -0700
Subject: [PATCH] 3.17-rc6: bcache_gc: BUG: soft lockup - CPU#2 stuck for 23s!

On Sun, Sep 28, 2014 at 05:25:37PM -0700, Eric Wheeler wrote:
> Hello Kent, Ross, all:
>
> We're getting bcache_gc backtraces and soft lockups; the system continues to
> be responsive and eventually recovers.  We are running 3.17-rc6. (This
> appears to be a continuation of the thread from 2014-09-15)
>
> Please see the following two backtraces.  The first shows up in
> btree_gc_count_keys(), the other is triggered somehow by rcu_sched.  We will
> test with -rc7 this week, though I didn't see any bcache commits in rc7.
>
> The server is quite busy:
>   dd in userspace from dm-thinp snapshots to another server
>   two DRBD verify's active backed by dm-thinp volumes
>   note that, dd fills up the buffers so this could be operating with few
>   pages free. (Though we have min-mem set to 256MB.)
>
> I see we are hitting functions like bch_ptr_bad() and bch_extent_bad().
> Could that indicate a cache corruption on our volume?

No - those are the normal "check the validity of metadata" functions.

> I'm happy to test patches if you have any suggestions or tests that I should
> run it through.

I think it might just be a missing cond_resched()... there's a check during
garbage collection for need_resched() but it appears we might not actually be
calling schedule() then.

Try this patch:

Hi, I tested this patch and it fixes our hangups. I'm afraid it got lost, so I am resending it.

commit a64afc92e17e709bdd1618edd04bc608f6a44c55
Author: Kent Overstreet <kmo@daterainc.com>
Date:   Sat Nov 1 13:44:13 2014 -0700

    bcache: Add a cond_resched() call to gc

    Change-Id: Id4f18c533b80ddb40df94ed0bb5e2a236a4bc325

Tested-by: Joshua Schmid <jschmid@suse.com>
---
 drivers/md/bcache/btree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 00cde40..218f21a 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1741,6 +1741,7 @@ static void bch_btree_gc(struct cache_set *c)
 	do {
 		ret = btree_root(gc_root, c, &op, &writes, &stats);
 		closure_sync(&writes);
+		cond_resched();
 
 		if (ret && ret != -EAGAIN)
 			pr_warn("gc failed!");
-- 
2.1.2
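
For readers without a kernel tree at hand, the idea behind the one-line
change can be sketched in userspace. This is only an analogy, not the bcache
code: cond_resched() exists only inside the kernel, so sched_yield() stands
in for it here, and gc_pass() and run_gc() are hypothetical names made up for
illustration. The loop condition is also an assumption, since the diff above
does not show it. The point is that the long-running loop gives up the CPU
once per pass instead of only noticing that it should.

```c
#include <assert.h>
#include <sched.h>   /* sched_yield(): userspace stand-in for cond_resched() */

/*
 * Hypothetical stand-in for one gc pass (the role btree_root(gc_root, ...)
 * plays in the patch). Consumes one unit of work and returns nonzero
 * while more passes are still needed.
 */
static int gc_pass(int *remaining)
{
	if (*remaining > 0)
		(*remaining)--;
	return *remaining > 0;
}

/*
 * Run passes until the work is done, yielding the CPU after every pass.
 * Returns the number of times the loop yielded.
 */
int run_gc(int passes_needed)
{
	int remaining = passes_needed;
	int yields = 0;
	int again;

	assert(passes_needed >= 0);

	do {
		again = gc_pass(&remaining);
		sched_yield();	/* analogous to the added cond_resched() */
		yields++;
	} while (again);

	return yields;
}
```

Calling run_gc(3) from a test program performs three passes and yields after
each one. Because it is a do/while loop, even run_gc(0) runs one pass and
yields once.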
