From: Andrew Ryder
To: Dave Chinner
Cc: Brian Foster, xfs@oss.sgi.com
Subject: Re: xfs_repair fails after trying to format log cycle?
Date: Fri, 3 Jun 2016 22:28:54 -0400
Message-ID: <57523CE6.7020906@shaw.ca>
In-Reply-To: <20160413045129.GO567@dastard>
References: <56F6DE67.60403@shaw.ca> <20160328085541.GA27040@bfoster.bfoster> <570C8D4D.3060304@shaw.ca> <20160412140512.GA59690@bfoster.bfoster> <570D578D.5010706@shaw.ca> <570DB6CD.1000007@shaw.ca> <20160413045129.GO567@dastard>
List-Id: XFS Filesystem from SGI

Sorry to dig up an old thread, but after replacing hardware and pulling
my hair out to no end, I tracked down the source of the problem.

On every drive in the array, the PCB pads that the contacts for the
drive's motor and head/actuator press against were oxidized enough to
cause the issue. I ended up pulling the PCB off each drive and cleaning
the pads, as well as cleaning all the other cable and drive connectors
from the HBA outward, and everything is happy again.

http://s33.postimg.org/uhjmvw4dr/Not_Cleaned.jpg
http://s33.postimg.org/xo94ieii7/Partial_Cleaned_1.jpg
http://s33.postimg.org/hoqgyumgf/Partial_Cleaned_2.jpg
http://s33.postimg.org/68k20t8a7/Partial_Cleaned_3.jpg

On 04/13/2016 12:51 AM, Dave Chinner wrote:
> On Tue, Apr 12, 2016 at 11:02:37PM -0400, Andrew Ryder wrote:
>> Is it possible the location it's searching for at block
>>
>> 02:34:43.887528 pread64(4, 0x7fb8f53e0200, 2097152, 3001552175104) =
>> -1 EIO (Input/output error)
>
> so offset is 3001552175104, or roughly around the 3TB mark. Given
> the log is always placed in the middle of the filesystem and you have
> a 6TB device, then the above definitely looks like a valid place to
> be reading from the log.
>
>> xfs_logprint:
>> data device: 0x902
>> log device: 0x902 daddr: 5860130880 length: 4173824
>
> daddr converted to offset is 5860130880 * 512 = 3001552175104, which
> tells us that the above pread64 failure was definitely coming from
> an attempt to read the log.
>
> That this is coming from the block device from userspace indicates a
> problem below XFS. There is something going wrong with your
> underlying block device and/or hardware here; AFAICT it's not
> related to XFS at all.
>
>>> GNU Parted 3.2
>>> Using /dev/sdk
>>> Welcome to GNU Parted! Type 'help' to view a list of commands.
>>> (parted) p
>>> Model: ATA ST2000DL001-9VT1 (scsi)
>>> Disk /dev/sdk: 2000GB
>>> Sector size (logical/physical): 512B/512B
>>> Partition Table: msdos
>>> Disk Flags:
>>>
>>> Number  Start  End     Size    Type     File system  Flags
>>>  1      512B   2000GB  2000GB  primary               raid
>>>
>>> Number  Start  End          Size         Type     File system  Flags
>>>  1      1s     3907029167s  3907029167s  primary               raid
>
> Compared to the other devices, it has a different start sector, a
> different size, and an msdos partition table rather than gpt.
> Definitely a red flag...
>
>>>>> This all began when the RR2722 driver running under 3.18.15
>>>>> complained and
>
> Reported physical IO errors to a write command.
> Really, this looks like a hardware issue, not something that can be
> fixed by running xfs_repair...
>
> Cheers,
>
> Dave.
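
The daddr arithmetic in the quoted reply can be checked with a few lines
of Python. This is only a sketch: it assumes that both the daddr and the
length in the xfs_logprint output are in 512-byte units (the unit of the
length field is an assumption, it is not stated in the thread). Under
that assumption the log starts at byte 3000387010560, slightly below the
figure given in the quoted reply, and the failing pread64 at
3001552175104 lands 1165164544 bytes into the log. The conclusion that
the read was hitting the log region still holds.

```python
# Sanity check of the daddr-to-offset arithmetic from the quoted reply.
# Assumption: daddr and length from xfs_logprint are both in 512-byte units.
SECTOR_SIZE = 512

log_daddr = 5860130880      # "daddr" from the xfs_logprint output
log_length = 4173824        # "length" from xfs_logprint (assumed 512-byte units)
read_start = 3001552175104  # file offset of the failing pread64
read_size = 2097152         # byte count of the failing pread64

# Byte range covered by the log on the block device.
log_start = log_daddr * SECTOR_SIZE
log_end = log_start + log_length * SECTOR_SIZE

# Byte range covered by the failing read.
read_end = read_start + read_size

# True if the whole read lies inside the log's byte range.
read_in_log = log_start <= read_start and read_end <= log_end

print("log start      :", log_start)
print("log end        :", log_end)
print("read start     :", read_start)
print("bytes into log :", read_start - log_start)
print("read inside log:", read_in_log)
```

If the length field turns out to use a different unit, only log_end
changes; the start offset of the log does not depend on it.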