From mboxrd@z Thu Jan 1 00:00:00 1970
From: Emmanuel Florac
Subject: Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
Date: Mon, 31 Aug 2015 16:39:37 +0200
Message-ID: <20150831163937.00ca3f7a@harpe.intellique.com>
References: <20150830085442.GA31722@suse.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Received: from smtp5-g21.free.fr ([212.27.42.5]:41773 "EHLO smtp5-g21.free.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751445AbbHaOjd convert rfc822-to-8bit (ORCPT ); Mon, 31 Aug 2015 10:39:33 -0400
In-Reply-To: <20150830085442.GA31722@suse.com>
Sender: linux-bcache-owner@vger.kernel.org
List-Id: linux-bcache@vger.kernel.org
To: Vojtech Pavlik
Cc: kmo@daterainc.com, linux-bcache@vger.kernel.org

On Sun, 30 Aug 2015 10:54:42 +0200, Vojtech Pavlik wrote:

> Then I noticed that during those situations where the system was
> slow, and processes stuck in D, bcache_writeback CPU usage was
> soaring all the way to saturating a core,

In my experience, bcache_writeback stays in a wait state and therefore
always saturates a core: every machine I run bcache on shows a constant
load of 1.00 even when completely idle.

> showing this backtrace,
> spending time in refill_keybuf_fn():
> Changing the configuration to writeback_percent=40 helped. For some
> time at least.
>
> When the issue returned, without any further changes to the system, I
> started investigating deeper. Since writeback_percent was large, also
> the amount of dirty data was large.

In my case, when dirty data reaches the upper limit (i.e. when the
amount of dirty data equals writeback_percent * backing device size),
which happens regularly, the system just freezes...

> Before poking deeper, I decided I
> want to clear the dirty data entirely. So I set the system to
> cache_mode=writethrough and watched the dirty data trickle to the
> backing device.
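
For reference, watching the dirty data drain as described in the quoted
paragraph above can be scripted. Below is a minimal Python sketch, not a
tested tool. It assumes the device is exposed as /sys/block/bcache0/bcache
and that dirty_data is printed with 1024-based suffixes such as "2.8G";
parse_size, read_attr and watch_dirty are hypothetical helper names, not
part of bcache.

```python
import time
from pathlib import Path

# Assumption: bcache prints sizes with binary (1024-based) suffixes, e.g. "2.8G".
UNITS = {"k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a sysfs size string such as '2.8G' or '0' to bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))


def read_attr(device, name):
    """Read one bcache sysfs attribute of a backing device."""
    # Assumption: the device is registered as /sys/block/<device>/bcache.
    return (Path("/sys/block") / device / "bcache" / name).read_text().strip()


def watch_dirty(device="bcache0", interval=10.0, rounds=6):
    """Print dirty_data every `interval` seconds; return True once it hits zero."""
    for _ in range(rounds):
        dirty = parse_size(read_attr(device, "dirty_data"))
        print(f"{device}: dirty_data = {dirty} bytes")
        if dirty == 0:
            return True
        time.sleep(interval)
    return False
```

Calling watch_dirty("bcache0") on an affected machine would show whether
the value is still shrinking or has stopped moving.
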
>
> But then it stopped at 2.8G and didn't progress any further. The
> bcache_writeback thread was at 100% CPU usage again and system was
> near unusable. Reverting to writeback made the system responsive
> again.

Alas, bcache_writeback stays at 100% _even_ in writethrough mode, so
this looks normal. However, dirty_data definitely should drop to zero...

> I consider this a rather serious bug, even though it is most likely
> caused by the cache device being corrupted. Any hints?

Did you check what "smartctl -a" has to say about your backing device,
and maybe your spinning drives too? Just in case...

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------