From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754132AbcBBIG4 (ORCPT );
	Tue, 2 Feb 2016 03:06:56 -0500
Received: from dbmail.hebserv.net ([78.40.121.80]:52624 "EHLO
	dbmail.hebserv.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753111AbcBBIGx convert rfc822-to-8bit (ORCPT );
	Tue, 2 Feb 2016 03:06:53 -0500
X-Greylist: delayed 506 seconds by postgrey-1.27 at vger.kernel.org; Tue, 02 Feb 2016 03:06:53 EST
Mime-Version: 1.0
Date: Tue, 02 Feb 2016 07:58:24 +0000
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8BIT
X-Mailer: RainLoop/1.9.3.365
From: "Yannis Aribaud"
Subject: Re: bcache_writeback: bch_writeback_thread...blk_queue_bio IO hang [was BUG: soft lockup]
To: "Eric Wheeler", "Johannes Thumshirn"
Cc: linux-bcache@vger.kernel.org, "Kent Overstreet",
	linux-kernel@vger.kernel.org, vojtech@suse.com
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 1 February 2016 at 20:42, "Eric Wheeler" wrote:
> [ changed subject, more below ]
> Are you using md-based raid5/raid6?

Not at all. In my setup, bcache is used on two bare drives (one HDD and
one SSD) behind a hardware RAID controller, with the disks configured in
non-RAID mode.

>
> If so, it could be the raid_partial_stripes_expensive bug.
>
> If it *is* the bcache optimizations around partial_stripes_expensive, then
> please try the patch below and then do this *before* loading the bcache
> module (or at least before registering the bcache backing volume):
>     echo 0 > /sys/block/BLK/queue/limits/raid_partial_stripes_expensive
> where BLK is your backing volume.
>
> I wrote this patch because we use hardware RAID5/6 and wanted to get the
> partial_stripes_expensive optimizations on by setting
> raid_partial_stripes_expensive=1 and io_opt=our_stride_width.
> Unfortunately it caused the backtrace in the LKML thread below, so we
> stopped using it.
>
> See also this thread, however, it shows a backtrace prior to
> removing the bio splitting code:
> https://lkml.org/lkml/2016/1/7/844

I'll take a look.

> Note that commit 749b61dab30736eb95b1ee23738cae90973d4fc3 might not
> exactly address this issue, but it might prevent full hangs. Make sure
> you cherry-pick commit 749b61dab30736eb95b1ee23738cae90973d4fc3 and
> hand-clean-up as necessary. It simplifies the bio code and deletes a bunch
> of stuff. It showed up in 4.4 and isn't in my patchset.
>
> After the commit above and setting
> raid_partial_stripes_expensive=1
> we still get errors like this:
> bcache: bch_count_io_errors() dm-3: IO error on writing data to cache, recovering
> but they don't lock up the system. Ultimately we run with
> raid_partial_stripes_expensive=0 because of these related problems and
> haven't had any issues. Also, fwiw, we run 4.1.y as our stable branch
> backed by hardware raid.

I'll give it a try if I find some time to do so.
Thanks.

-- 
Open is better