From: Rob Harris
Subject: Re: Custom driver FS brokenness at 4GB?
Date: Thu, 28 May 2015 14:30:58 -0400
Message-ID: <55675EE2.80609@gmail.com>
References: <5565CD0D.4080408@gmail.com> <20150528105931.GA31813@quack.suse.cz> <4DDF59F1-C0CA-43E2-BB66-4868A09C3081@dilger.ca>
In-Reply-To: <4DDF59F1-C0CA-43E2-BB66-4868A09C3081@dilger.ca>
To: Andreas Dilger, Jan Kara
Cc: linux-ext4@vger.kernel.org

Thanks for the pointers, everyone. After further testing and code review, I
found that I was boneheadedly truncating a u64 to a u32 for the sector
address: a function signature declared the parameter with a typedef that
obscured its 32-bit width. *facepalm* All seems well now.

Thanks for the help!

-R

On 05/28/2015 01:43 PM, Andreas Dilger wrote:
> On May 28, 2015, at 4:59 AM, Jan Kara wrote:
>> On Wed 27-05-15 09:56:29, Rob Harris wrote:
>>> Greetings. I have an odd issue and need some ideas of where to go
>>> next -- I'm out of hair to rip out.
>>>
>>> I'm writing a custom block device driver talking to some custom RAID
>>> hardware (>32TB) using DMA scatter-gather, with no partitions, and am
>>> using make_request() to service all the BIO requests to simplify
>>> debugging. I have the driver working to the point where using DD
>>> against the block device seems to work fine (I'm setting
>>> iflag|oflag=direct to ensure it's writing to the disk). I also have
>>> the blk_queue set to only request a single 4k I/O per BIO (again to
>>> simplify debugging for now).
>>> Also, again to debug, I have a mutex
>>> wrapping the entire make_request call to ensure that only a single
>>> request is being serviced at a time. So, this should be as "simple"
>>> as I can make the environment to debug this problem.
>>>
>>> Once the driver is loaded, when I try to create a file system (ext4,
>>> but the same thing happens with xfs), it seems like there is some
>>> corruption occurring, but only when I set the size of the block
>>> device over 4GB. For instance, when I set the size to 4G, I can
>>> mkfs.ext4, but after 2 or 3 mount/umount cycles the FS refuses to
>>> mount anymore and the kernel log complains that the journal is
>>> missing. This was discovered by running this loop...
>> Hard to tell exactly, but with 4GB being the 32-bit limit, I would first
>> look for some int / unsigned int overflow. You could probably debug
>> this better by writing a pattern via DD that is different for each
>> block, to verify that each block indeed lands in the expected location...
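Jan's suggestion can be sketched in userspace C. This is a minimal sketch under stated assumptions, not llverdev and not code from the thread: it assumes 4 KiB blocks, and the helper names and the STAMP_MAGIC marker value are hypothetical. Each block is filled with (marker, block number) pairs and written at a 64-bit byte offset; a second pass reads every block back and counts those whose stamp does not match the position they were read from.

```c
/* Per-block stamp-and-verify sketch (hypothetical helpers, not llverdev).
 * Assumptions: 4 KiB blocks; the marker value is arbitrary.
 * WARNING: verify_device() overwrites every block it touches. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t even on 32-bit hosts */

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK_SIZE 4096u
#define STAMP_MAGIC UINT64_C(0x5354414D50424C4B)   /* hypothetical marker */

/* Fill one block with (marker, block number) pairs, one pair every 16 bytes. */
void stamp_block(unsigned char *buf, uint64_t blkno)
{
    const uint64_t magic = STAMP_MAGIC;

    assert(buf != NULL);
    for (size_t off = 0; off < BLK_SIZE; off += 16) {
        memcpy(buf + off, &magic, sizeof magic);
        memcpy(buf + off + 8, &blkno, sizeof blkno);
    }
}

/* Return the block number stamped in buf, or UINT64_MAX if any pair lacks
 * the marker or disagrees with the first pair (unwritten or torn block). */
uint64_t read_stamp(const unsigned char *buf)
{
    uint64_t first, magic, blkno;

    memcpy(&first, buf + 8, sizeof first);
    for (size_t off = 0; off < BLK_SIZE; off += 16) {
        memcpy(&magic, buf + off, sizeof magic);
        memcpy(&blkno, buf + off + 8, sizeof blkno);
        if (magic != STAMP_MAGIC || blkno != first)
            return UINT64_MAX;
    }
    return first;
}

/* Stamp blocks [0, nblocks) through fd, then read each one back and count
 * blocks whose stamp does not match their position. Returns the mismatch
 * count, or -1 on a failed or short read/write. */
long verify_device(int fd, uint64_t nblocks)
{
    unsigned char buf[BLK_SIZE];
    long bad = 0;

    for (uint64_t b = 0; b < nblocks; b++) {
        stamp_block(buf, b);
        /* b is 64-bit, so b * BLK_SIZE does not wrap at 4 GiB. */
        if (pwrite(fd, buf, BLK_SIZE, (off_t)(b * BLK_SIZE)) != (ssize_t)BLK_SIZE)
            return -1;
    }

    /* Flush, then ask the kernel to drop the now-clean cached pages so the
     * read pass goes to the device rather than the page cache (a hint only). */
    if (fsync(fd) != 0)
        return -1;
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    for (uint64_t b = 0; b < nblocks; b++) {
        if (pread(fd, buf, BLK_SIZE, (off_t)(b * BLK_SIZE)) != (ssize_t)BLK_SIZE)
            return -1;
        if (read_stamp(buf) != b)
            bad++;
    }
    return bad;
}
```

Writing every block before reading any back is what exposes aliasing: if two block numbers map to the same physical location, the later write wins and the earlier block reads back carrying the wrong number. Because posix_fadvise() is only advisory, opening the device with O_DIRECT and a 4096-byte-aligned buffer (what dd's iflag/oflag=direct does) is the more reliable way to make sure the reads really reach the hardware.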
> We have a tool "llverdev" which does exactly this - write a pattern
> to each block in the block device (or in sparse regions covering the
> device) with a timestamp and block number to track down sources of
> block addressing errors:
>
> http://git.hpdd.intel.com/fs/lustre-release.git/blob/HEAD:/lustre/utils/llverdev.c
>
> Cheers, Andreas
>
>> Honza
>>> #!/bin/sh
>>> COUNT=4032
>>>
>>> while [ 1 ] ; do
>>>
>>>     figlet ${COUNT}
>>>
>>>     ( umount /mnt ; rmmod smc ) || true
>>>     modprobe smc capacity_in_mb=${COUNT} debug=1
>>>     mkfs.ext4 -m 0 /dev/smcd
>>>
>>>     mount /dev/smcd /mnt
>>>     cp count_512m.dat /mnt/test
>>>     umount /mnt
>>>     mount /dev/smcd /mnt
>>>     umount /mnt
>>>     mount /dev/smcd /mnt
>>>     cmp count_512m.dat /mnt/test
>>>     umount /mnt
>>>     mount /dev/smcd /mnt   # ***
>>>     sync
>>>     umount /mnt
>>>     mount /dev/smcd /mnt
>>>     sleep 1
>>>     umount /mnt
>>>
>>>     COUNT=$(( COUNT + 64 ))
>>>     sleep 1
>>>
>>> done
>>>
>>> Sometimes I get this in the kernel log:
>>> May 27 09:39:01 febtober kernel: [64547.304695] EXT4-fs (smcd): ext4_check_descriptors: Checksum for group 0 failed (7009!=0)
>>> May 27 09:39:01 febtober kernel: [64547.305744] EXT4-fs (smcd): group descriptors corrupted!
>>>
>>> Other times I get:
>>> May 27 09:46:49 ryftone-smcdrv kernel: [65014.342850] EXT4-fs (smcd): no journal found
>>>
>>> I've seen this loop fail as early as COUNT=4096, but as late as
>>> COUNT=4220; removing the sync changes the behavior.
>>> When it fails, it usually does so on the 3rd mount (***).
>>> FYI, I effectively call: set_capacity(disk, capacity_in_mb * 2048);
>>> (2048 * 512b kernel sectors = 1M)
>>>
>>> Another example: if I set the size of the disk to 16G, I can run
>>> mkfs.ext4, but the first mount fails and I see:
>>> May 27 09:07:27 febtober kernel: [62653.269387] EXT4-fs (smcd): ext4_check_descriptors: Block bitmap for group 0 not in group (block 4294967295)!
>>>
>>> But, again, if I set the size below 4G, everything seems fine.
>>> I can currently DD read and write across that 4G boundary without
>>> issue -- it's ONLY the filesystem accesses. My gut is screaming
>>> there's a 32/64-bit overflow condition somewhere, but for the life of
>>> me I can't find it. Is there something I need to set to tell the
>>> block layer I have a 64-bit addressable device? set_capacity is
>>> always the number of LINUX KERNEL sectors (not what I set
>>> blk_queue_logical|physical_block_size to), correct?
>>>
>>> I'm currently on 3.16.0 (Ubuntu 14.04.2 LTS) if it matters.
>>>
>>> Any help/pointers would be greatly appreciated.
>>>
>>> --Rob Harris
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> --
>> Jan Kara
>> SUSE Labs, CR
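Rob's eventual diagnosis, a u64 sector address truncated to u32 by a function signature whose parameter type was hidden behind a typedef, can be reproduced in a few lines of userspace C. This is a hypothetical sketch: hw_addr_t, hw_sees(), to_hw_addr32(), wrap_point_bytes() and mb_to_sectors() are illustrative names, not Rob's code and not a kernel API. It also shows where a 32-bit address field runs out depending on its unit, and does the set_capacity() MiB-to-sector arithmetic from the original message in 64 bits.

```c
/* Sketch of 32-bit truncation hazards in a block driver's address path.
 * All names here are hypothetical stand-ins for illustration only. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t hw_addr_t;   /* hypothetical: width hidden behind a typedef */

/* Returns the address the "hardware" actually receives. A uint64_t argument
 * is silently reduced modulo 2^32 at the call site; for a non-constant
 * argument, gcc only warns when built with -Wconversion. */
uint64_t hw_sees(hw_addr_t addr)
{
    return addr;
}

/* Checked narrowing: stores v in *out and returns 0 if it fits in 32 bits;
 * returns -1 and leaves *out untouched if the upper bits would be lost. */
int to_hw_addr32(uint64_t v, uint32_t *out)
{
    assert(out != NULL);
    if (v > UINT32_MAX)
        return -1;
    *out = (uint32_t)v;
    return 0;
}

/* First byte offset a 32-bit address field cannot reach when each address
 * unit covers unit_bytes bytes: 4 GiB for a byte address, 2 TiB for a
 * 512-byte sector number. */
uint64_t wrap_point_bytes(uint32_t unit_bytes)
{
    return (UINT64_C(1) << 32) * unit_bytes;
}

/* MiB to 512-byte kernel sectors (2048 per MiB), computed in 64 bits.
 * If capacity_in_mb were a plain int (the thread does not say), the product
 * capacity_in_mb * 2048 would overflow from 1 TiB upward (2^20 * 2^11 = 2^31). */
uint64_t mb_to_sectors(uint64_t mb)
{
    return mb * 2048u;
}
```

In the driver itself, the equivalent fix is to keep the sector number as a sector_t (or u64) from the bio all the way to the hardware descriptor, and to narrow it only through an explicit, checked conversion such as to_hw_addr32().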