From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vojtech Pavlik
Subject: Re: Bcache stuck at writeback of a key, consuming 100% CPU, not possible to detach
Date: Mon, 31 Aug 2015 18:45:31 +0200
Message-ID: <20150831164531.GA9810@suse.com>
References: <20150830085442.GA31722@suse.com> <20150831163937.00ca3f7a@harpe.intellique.com> <20150831144949.GA3276@suse.com> <20150831150429.GA27538@kmo-pixel>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from mx2.suse.de ([195.135.220.15]:36313 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753302AbbHaQpd (ORCPT ); Mon, 31 Aug 2015 12:45:33 -0400
Content-Disposition: inline
In-Reply-To: <20150831150429.GA27538@kmo-pixel>
Sender: linux-bcache-owner@vger.kernel.org
List-Id: linux-bcache@vger.kernel.org
To: Kent Overstreet
Cc: Emmanuel Florac , kmo@daterainc.com, linux-bcache@vger.kernel.org

On Mon, Aug 31, 2015 at 07:04:29AM -0800, Kent Overstreet wrote:

> I suspect there's two different bugs here.
>
> - I'm starting to suspect there's a bug in the dirty data accounting, and it's
>   getting out of sync - i.e. reading 2.8 GB or whatever when it's actually 0.
>   that would explain it spinning when there actually isn't any work for it to
>   do.

That may be the case, but it doesn't quite match my observation. Using this
command line:

    echo 2 > writeback_percent
    echo 0 > writeback_percent
    echo 100000 > writeback_rate
    echo none > cache_mode
    while true; do
        # act only when top shows a bcache_write* thread in state R
        if top -b -n 1 | grep 'R.*bcache_write'; then
            date; echo looping
            # briefly re-enable writeback with a high threshold ...
            echo writeback > cache_mode
            echo 40 > writeback_percent
            sleep 1
            # ... then go back to the flushing settings
            echo 2 > writeback_percent
            echo 0 > writeback_percent
            echo 100000 > writeback_rate
            echo none > cache_mode
            echo fixed
            cat /sys/block/bcache0/bcache/dirty_data
        fi
    done

I managed to get down to about 200 MB of dirty data reported. If the
reporting were off by a fixed offset, I wouldn't already be seeing 100% CPU
and a running bcache_writeback at 5 GB of dirty data. At least not unless
the accounting of dirty data is very wrong and fluctuating.

> - with a large enough amount of data, the 30 second writeback_delay may be
>   insufficient; if it takes longer than that just to scan the entire keyspace
>   it'll never get a chance to sleep. try bumping writeback_delay up and see if
>   that helps.

That shouldn't be the case when the amount of dirty data is below a
gigabyte, or is it?

> the ratelimiting on scanning for dirty data needs to be changed to something
> more sophisticated, the existing fixed delay is problematic.

-- 
Vojtech Pavlik
Director SuSE Labs