From: Vojtech Pavlik <vojtech@suse.com>
To: "Jens-U. Mozdzen" <jmozdzen@nde.ag>
Cc: Kent Overstreet <kent.overstreet@gmail.com>,
linux-bcache@vger.kernel.org
Subject: Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
Date: Mon, 7 Sep 2015 17:52:17 +0200
Message-ID: <20150907155217.GA27227@suse.com>
In-Reply-To: <20150907171318.Horde.LCssHlKH7jqwmHt1ENeCSCd@www3.nde.ag>
On Mon, Sep 07, 2015 at 05:13:18PM +0200, Jens-U. Mozdzen wrote:
> and the patches
>
> bcache001.eml:Subject: [PATCH] bcache: [BUG] clear
> BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
> bcache002.eml:Subject: [PATCH] bcache: fix a livelock in btree lock
> bcache003.eml:Subject: [PATCH] bcache: unregister reboot notifier
> when bcache fails to register a block device
> bcache004.eml:Subject: [PATCH] fix a leak in bch_cached_dev_run()
> bcache005.eml:Subject: [PATCH] bcache: Fix writeback_thread never
> writing back incomplete stripes.
>
> I can confirm that running with writeback_percent set to zero now works
> much more smoothly (or "at all", under certain circumstances).
I'm glad to hear that.
> >>PS: We're still facing random reboots (of unknown cause), which may
> >>correlate with bcache's "amount dirty" being near the limit set by
> >>writeback_percent.
>
> For a test, after a few hours of running the latest patch, I switched
> from writeback_percent==0 to writeback_percent==1, and had a full
> kernel crash within an hour! Luckily, I still had a console open on
> the machine, so for the first time I could see a hint (but not much
> more) of what is going on:
I'm running the most recent openSUSE stable kernel, available here:
http://download.opensuse.org/repositories/Kernel:/stable/standard/
It's currently at 4.2.0 and contains all of the above patches. On older
kernels I've seen crashes in __find_stripe a couple of times, a few months
apart, but these are unlikely to be related to bcache. They looked similar
to this one:
https://bugzilla.kernel.org/show_bug.cgi?id=100321
Apart from these, the system has been running stably (with
writeback_percent=40 for the last few months), so I would bet your
crashes come from a source other than bcache.
> --- cut here ---
> Message from syslogd@san02 at Sep 7 14:56:15 ...
> kernel:[74182.424659] Kernel panic - not syncing: stack-protector:
> Kernel stack is corrupted in: ffffffffa001a815
>
> Message from syslogd@san02 at Sep 7 14:56:15 ...
> kernel:[74182.424659]
>
> Message from syslogd@san02 at Sep 7 14:56:15 ...
> kernel:[74182.474050] Kernel Offset: 0x0 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> --- cut here ---
Maybe you could set up a serial console? That way you'd be able to catch
all the kernel messages.
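The panic does narrow things down a little: ffffffffa001a815 lies above the end of the relocation range printed in the log (0xffffffff9fffffff), so the function whose stack canary check failed most likely lives in a loadable module rather than in the core kernel image, though not necessarily in bcache. A minimal sketch for mapping such an address to a module, assuming /proc/modules-style lines (name, size, refcount, dependencies, state, load address) saved from the same boot. The module names and addresses in SAMPLE_MODULES, and the helper names, are made up for illustration:

```python
from typing import Iterable, List, Optional, Tuple

# Core-kernel relocation range, as printed in the panic log above.
KERNEL_RANGE = (0xFFFFFFFF80000000, 0xFFFFFFFF9FFFFFFF)

# Hypothetical /proc/modules lines: name size refcount deps state address.
SAMPLE_MODULES = [
    "examplemod_a 131072 2 - Live 0xffffffffa0000000",
    "examplemod_b 16384 0 - Live 0xffffffffa0020000",
]


def in_core_kernel(addr: int) -> bool:
    """True if addr falls inside the core kernel's relocation range."""
    lo, hi = KERNEL_RANGE
    return lo <= addr <= hi


def parse_modules(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (name, load_address, size) for each /proc/modules-style line."""
    mods = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip malformed or truncated lines
        mods.append((fields[0], int(fields[5], 16), int(fields[1])))
    return mods


def find_module(addr: int,
                mods: List[Tuple[str, int, int]]) -> Optional[Tuple[str, int]]:
    """Return (module name, offset into the module), or None if unmapped."""
    for name, base, size in mods:
        if base <= addr < base + size:
            return name, addr - base
    return None
```

Module load addresses differ between boots, so the listing has to come from the boot that crashed; with the module identified, /proc/kallsyms from that same boot can narrow the offset down to a function.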
> Since there's no stack trace, this leaves much room for speculation.
> But at least I now have an idea where the reboots (and two other
> "full stops") might stem from: stack corruption. I have run
> scripts/checkstack.pl on the bcache module and found no excessive
> stack use, but checking for memset() and memcpy() in bcache's code
> gave a number of hits; I'll have to look at them one by one and
> hope to find my way around.
>
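As background on what the panic means: with the stack protector enabled, the compiler places a guard value (canary) between a function's local buffers and its saved return state, and the function epilogue calls __stack_chk_fail() if that value has changed. A memcpy() or memset() running past the end of a local array is the classic way to trip it, which is why auditing those calls makes sense. The toy model below simulates one frame as a flat byte array (buffer followed by canary); the sizes, the fixed canary value and the names Frame, unchecked_copy and checked_copy are hypothetical, and it only illustrates the audit criterion: every copy length into a stack buffer must be bounded by the buffer size.

```python
BUF_SIZE = 16                               # simulated local array
CANARY = bytes.fromhex("deadbeefcafebabe")  # fixed here for reproducibility;
                                            # real canaries are random


class Frame:
    """Toy stack frame: BUF_SIZE bytes of local buffer, then the canary."""

    def __init__(self) -> None:
        self.mem = bytearray(BUF_SIZE) + bytearray(CANARY)

    def canary_intact(self) -> bool:
        """Models the epilogue check that would call __stack_chk_fail()."""
        return bytes(self.mem[BUF_SIZE:]) == CANARY


def unchecked_copy(frame: Frame, src: bytes) -> None:
    """Buggy pattern: trusts len(src), so extra bytes spill into the canary."""
    n = min(len(src), len(frame.mem))  # cap only keeps the toy model in bounds
    frame.mem[:n] = src[:n]


def checked_copy(frame: Frame, src: bytes) -> bool:
    """Fixed pattern: refuse copies larger than the destination buffer."""
    if len(src) > BUF_SIZE:
        return False
    frame.mem[:len(src)] = src
    return True
```

Copying 20 bytes with unchecked_copy() clobbers the first four canary bytes, so canary_intact() reports the corruption; checked_copy() rejects the same copy and leaves the frame untouched.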
> I'll give my servers at least two weeks to run with your patch and
> writeback_percent==0 to see if we're hit by reboots with that code
> as well. If not, I'll take that as an indicator that the
> implementation of the "PID regulator" may need a closer look.
>
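On the "PID regulator": assuming the controller works roughly like __update_writeback_rate() in drivers/md/bcache/writeback.c, it is a proportional-derivative loop. It computes a dirty-data target as writeback_percent of the cache, scaled by the backing device's share of all attached backing devices, then periodically nudges the writeback rate by a term proportional to (dirty - target) plus a smoothed derivative term, and clamps the result to at least 1. If, as also assumed here, writeback_percent=0 bypasses the rate limiting altogether, that would fit the observation that the problem does not show up at 0. The sketch below is a simplified Python model; the gains, units, clamp and all names are hypothetical, not bcache's actual code or defaults.

```python
from dataclasses import dataclass


@dataclass
class WritebackRate:
    """Hypothetical controller state; names loosely follow bcache's tunables."""
    rate: int = 1024           # current writeback rate (hypothetical units)
    last_dirty: int = 0        # dirty sectors seen at the previous update
    deriv_avg: float = 0.0     # smoothed derivative term
    p_term_inverse: int = 64   # larger value => weaker proportional response
    d_term: int = 30           # derivative gain
    update_seconds: int = 5    # controller period
    max_rate: int = 1_000_000  # upper clamp


def dirty_target(cache_sectors: int, writeback_percent: int,
                 bdev_sectors: int, total_bdev_sectors: int) -> int:
    """writeback_percent of the cache, scaled by this device's share."""
    cache_target = cache_sectors * writeback_percent // 100
    return cache_target * bdev_sectors // total_bdev_sectors


def update_rate(st: WritebackRate, dirty: int, target: int,
                ewma_weight: float = 0.25) -> int:
    """One proportional-derivative step; returns the new, clamped rate."""
    proportional = (dirty - target) * st.update_seconds / st.p_term_inverse
    derivative = (dirty - st.last_dirty) / st.update_seconds
    st.last_dirty = dirty
    # Exponentially weighted moving average smooths the noisy derivative.
    st.deriv_avg += ewma_weight * (derivative - st.deriv_avg)
    change = proportional + st.deriv_avg * st.d_term / st.p_term_inverse
    st.rate = int(min(max(st.rate + change, 1), st.max_rate))
    return st.rate


def writeback_delay(st: WritebackRate, sectors: int,
                    writeback_percent: int) -> float:
    """Seconds to wait before writing back `sectors`; 0 means unthrottled."""
    if writeback_percent == 0:
        return 0.0  # assumption: rate limiting is bypassed at 0%
    return sectors / st.rate
```

With dirty data above the target the rate climbs; below it, the rate falls toward the floor of 1, so writeback slows to a trickle while the dirty amount hovers near the limit, which is the regime Jens associates with the reboots.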
> Kent, do you remember having fixed anything that might explain this
> stack corruption behavior, in code later than what's included in
> kernel 3.18.8?
--
Vojtech Pavlik
Director SUSE Labs