From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from aserp1040.oracle.com ([141.146.126.69]:16549 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750849AbbLDAwy (ORCPT ); Thu, 3 Dec 2015 19:52:54 -0500
Date: Thu, 3 Dec 2015 16:52:37 -0800
From: Liu Bo
To: Codebird
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs crashing the kernel with Seagate 8TB SMR drives.
Message-ID: <20151204005237.GE19589@localhost.localdomain>
Reply-To: bo.li.liu@oracle.com
References: <566084F8.5050705@birds-are-nice.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <566084F8.5050705@birds-are-nice.me>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

On Thu, Dec 03, 2015 at 06:07:52PM +0000, Codebird wrote:
> I've got a nice bug for you - because I can offer you what everyone likes to
> see, a precise error message.
>
> I've got a btrfs filesystem spread over six devices, RAID1 mode. Four of
> these are Seagate 8TB archive drives - those SMR ones that a few others have
> reported failing when used with btrfs. I've had that issue too, and I just
> can't explain why, other than to say that it only occurs when using them on
> my mainboard SATA ports, not via USB dock. But that's not what I'm reporting
> - that's just the source of the problem that causes the crash I am
> reporting.
>
> The crash occurs when scrubbing, after some time and some terabytes - or
> possibly just when reading a certain place, I'm not sure - and it gives this
> helpful error left on the screen along with a system so unresponsive numlock
> won't flash:
>
> BTRFS: Error (device sdg1) in __btrfs_free_extent:6360: errno=-5 IO failure
> BTRFS: Error (device sdg1) in __btrfs_free_extent:6360: errno=-5 IO failure
> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO
> failure
> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO
> failure
> BTRFS: Error (device sdg1) in btrfs_run_delayed_refs:2851: errno=-5 IO
> failure
> BTRFS: assertion failed:
> f(fs_info->sb->s_flags & MS
> -----------[ cut here ]------------
> kernel BUG at ../fs/btrfs/ctree.h:4057!
>
> Not sure if some of those 5 might be 6, as I was in a hurry to get it back
> up both times and just got a blurry photo. But it looks to me like there
> might be a chunk of code that doesn't handle a hardware fault - rather than
> cleanly return an error it's causing the kernel to hang entirely. I've
> managed to get this to happen twice now, so it's certainly something worth
> looking into. This is on SUSE tumbleweed, with kernel 4.3.0-2-default.

We do set btrfs to a read-only state when handling this EIO error, but what's
happening here is that btrfs failed to stop the scrub workers from calling
repair_io_failure(), and one of them hit that ASSERT.

I'll send you a patch.

Thanks,

-liubo
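
For readers following along, here is a rough user-space model of the sequence described above. It is only a sketch, not kernel code and not the patch: FsState, repair_block() and the two scrub worker functions are hypothetical names, and it assumes the ASSERT checks that the filesystem is not read-only, which the truncated message in the photo suggests but does not confirm.

```python
import errno


class FsState:
    """Hypothetical stand-in for the filesystem state (not a btrfs structure)."""

    def __init__(self):
        self.read_only = False

    def handle_io_error(self, err):
        # As described above: an EIO during the transaction forces read-only.
        if err == errno.EIO:
            self.read_only = True


def repair_block(fs, block):
    # Stand-in for the repair path. Repairing means writing, so this models
    # the assumed assertion that the filesystem must still be writable.
    if fs.read_only:
        raise AssertionError("repair attempted on a read-only filesystem")
    return "repaired " + block


def scrub_worker_unguarded(fs, blocks):
    # Keeps repairing without looking at the read-only flag; this is the
    # behaviour that trips the assertion once the flag has been set.
    return [repair_block(fs, b) for b in blocks]


def scrub_worker_guarded(fs, blocks):
    # Checks the flag before each repair and stops cleanly instead.
    repaired = []
    for b in blocks:
        if fs.read_only:
            break
        repaired.append(repair_block(fs, b))
    return repaired


if __name__ == "__main__":
    fs = FsState()
    fs.handle_io_error(errno.EIO)
    print(scrub_worker_guarded(fs, ["block-a", "block-b"]))
    try:
        scrub_worker_unguarded(fs, ["block-a"])
    except AssertionError as exc:
        print("assertion:", exc)
```

In the kernel the failed assertion ends in BUG(), which would explain the hard hang rather than a clean error return. How the real fix stops the workers may well differ from the simple flag check used here.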