* resize2fs problem with stride calc @ 2014-09-20 20:46 TR Reardon 2014-09-21 16:13 ` Eric Sandeen 0 siblings, 1 reply; 5+ messages in thread From: TR Reardon @ 2014-09-20 20:46 UTC (permalink / raw) To: linux-ext4@vger.kernel.org resize2fs seems to come up with some crazy default stride numbers. This occurs with and without bigalloc. I was testing enabling/disabling 64bit using latest patches from DJW, and noticed that s_raid_stride was being written with nonsensical values, in particular determine_fs_stride() is coming up with overly large values. The code is old (2006) and lacks comment so I'm not sure what the intended operation is. Does this just need to be updated for flex_bg? Should s_raid_stride ever be auto-changed on resize? If it should change, should stripe also change? +Reardon ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: resize2fs problem with stride calc 2014-09-20 20:46 resize2fs problem with stride calc TR Reardon @ 2014-09-21 16:13 ` Eric Sandeen 2014-09-21 22:32 ` TR Reardon 0 siblings, 1 reply; 5+ messages in thread From: Eric Sandeen @ 2014-09-21 16:13 UTC (permalink / raw) To: TR Reardon, linux-ext4@vger.kernel.org On 9/20/14 3:46 PM, TR Reardon wrote: > resize2fs seems to come up with some crazy default stride numbers. > This occurs with and without bigalloc. > > > I was testing enabling/disabling 64bit using latest patches from DJW, > and noticed that s_raid_stride was being written with nonsensical > values, in particular determine_fs_stride() is coming up with overly > large values. The code is old (2006) and lacks comment so I'm not > sure what the intended operation is. Does this just need to be > updated for flex_bg? Should s_raid_stride ever be auto-changed on > resize? If it should change, should stripe also change? That old commit says: + In addition, add code so that resize2fs can automatically + determine the RAID stride parameter that had been + previously used on the filesystem. but a year later, in 2007, this: commit 96c6a3acd377698cb99ffd9925bec9b20ca4f6f9 Author: Theodore Ts'o <tytso@mit.edu> Date: Fri May 18 22:06:53 2007 -0400 Store the RAID stride value in the superblock and take advantage of it stored it properly in the superblock (this hit e2fsprogs-1.40). So maybe the whole heuristic could just be removed now, but from a simple test, it's working here. What was the geometry (dumpe2fs -h) of your filesystem before the resize? -Eric ^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: resize2fs problem with stride calc 2014-09-21 16:13 ` Eric Sandeen @ 2014-09-21 22:32 ` TR Reardon 2014-09-29 20:59 ` Darrick J. Wong 0 siblings, 1 reply; 5+ messages in thread From: TR Reardon @ 2014-09-21 22:32 UTC (permalink / raw) To: Eric Sandeen, linux-ext4@vger.kernel.org, darrick.wong@oracle.com > Date: Sun, 21 Sep 2014 11:13:06 -0500 > From: sandeen@redhat.com > > On 9/20/14 3:46 PM, TR Reardon wrote: >> resize2fs seems to come up with some crazy default stride numbers. >> This occurs with and without bigalloc. >> >> >> I was testing enabling/disabling 64bit using latest patches from DJW, >> and noticed that s_raid_stride was being written with nonsensical >> values, in particular determine_fs_stride() is coming up with overly >> large values. The code is old (2006) and lacks comment so I'm not >> sure what the intended operation is. Does this just need to be >> updated for flex_bg? Should s_raid_stride ever be auto-changed on >> resize? If it should change, should stripe also change? > > That old commit says: > > + In addition, add code so that resize2fs can automatically > + determine the RAID stride parameter that had been > + previously used on the filesystem. > > but a year later, in 2007, this: > > commit 96c6a3acd377698cb99ffd9925bec9b20ca4f6f9 > Author: Theodore Ts'o <tytso@mit.edu> > Date: Fri May 18 22:06:53 2007 -0400 > > Store the RAID stride value in the superblock and take advantage of it > > stored it properly in the superblock (this hit e2fsprogs-1.40). > > So maybe the whole heuristic could just be removed now, but from a simple > test, it's working here. Have you tried to test with flex_bg? I think that's what raises the problem. > > What was the geometry (dumpe2fs -h) of your filesystem before the resize? Here is one example. I am changing _only_ the bitness, geometry stays the same other than changes to reserved GDT. Prior to resize2fs -b : ... Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file dir_nlink extra_isize bigalloc metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 715776 Block count: 732566384 Reserved block count: 0 Free blocks: 13977472 Free inodes: 714665 First block: 0 Block size: 4096 Cluster size: 65536 Reserved GDT blocks: 53 Blocks per group: 524288 Clusters per group: 32768 Inodes per group: 512 Inode blocks per group: 32 Flex block group size: 16 ... Inode size: 256 Required extra isize: 28 Desired extra isize: 28 ... And following resize2fs -b: Filesystem revision #: 1 (dynamic) Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize bigalloc metadata_csum Filesystem flags: signed_directory_hash Default mount options: user_xattr acl Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 715776 Block count: 732566384 Reserved block count: 0 Free blocks: 13977472 Free inodes: 714665 First block: 0 Block size: 4096 Cluster size: 65536 Group descriptor size: 64 Reserved GDT blocks: 42 Blocks per group: 524288 Clusters per group: 32768 Inodes per group: 512 Inode blocks per group: 32 RAID stride: 65520 Flex block group size: 16 ... Inode size: 256 Required extra isize: 28 Desired extra isize: 28 ... -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: resize2fs problem with stride calc 2014-09-21 22:32 ` TR Reardon @ 2014-09-29 20:59 ` Darrick J. Wong 2014-09-29 22:46 ` Andreas Dilger 0 siblings, 1 reply; 5+ messages in thread From: Darrick J. Wong @ 2014-09-29 20:59 UTC (permalink / raw) To: TR Reardon; +Cc: Eric Sandeen, linux-ext4@vger.kernel.org On Sun, Sep 21, 2014 at 06:32:53PM -0400, TR Reardon wrote: > > Date: Sun, 21 Sep 2014 11:13:06 -0500 > > From: sandeen@redhat.com > > > > On 9/20/14 3:46 PM, TR Reardon wrote: > >> resize2fs seems to come up with some crazy default stride numbers. > >> This occurs with and without bigalloc. > >> > >> > >> I was testing enabling/disabling 64bit using latest patches from DJW, > >> and noticed that s_raid_stride was being written with nonsensical > >> values, in particular determine_fs_stride() is coming up with overly > >> large values. The code is old (2006) and lacks comment so I'm not > >> sure what the intended operation is. Does this just need to be > >> updated for flex_bg? Should s_raid_stride ever be auto-changed on > >> resize? If it should change, should stripe also change? > > > > That old commit says: > > > > + In addition, add code so that resize2fs can automatically > > + determine the RAID stride parameter that had been > > + previously used on the filesystem. > > > > but a year later, in 2007, this: > > > > commit 96c6a3acd377698cb99ffd9925bec9b20ca4f6f9 > > Author: Theodore Ts'o <tytso@mit.edu> > > Date: Fri May 18 22:06:53 2007 -0400 > > > > Store the RAID stride value in the superblock and take advantage of it > > > > stored it properly in the superblock (this hit e2fsprogs-1.40). > > > > So maybe the whole heuristic could just be removed now, but from a simple > > test, it's working here. > > Have you tried to test with flex_bg? I think that's what raises the problem. It'll end up recalculating stride for any flexbg FS with more than 12 BGs and more than 3 flexbgs. This piece is neither a part of nor is used for 32>64bit conversion. AFAICT, the point of determine_fs_stride() is to try to recover the RAID stride by inferring it from minor variations in the block/inode bitmap locations between successive block groups. This explodes when flexbg is turned on because bitmap blocks are stored in "other" bgs and there's a "big jump" between the bitmaps in the last bg of one flexbg and the bitmaps of the first bg of the next flexbg. Between bgs in a single flexbg the *_stride values are "negative" and don't contribute to the stride calculation. I /think/ the solution is to ignore first blockgroup when crossing a flexbg boundary when there are flexbgs. Can you give the following patch a spin? It shouldn't spit out "group XXX has stride..." messages after that. I'm not sure that "negative" stride ought to be ignored either, but.... Honestly I'd rather just kill the whole thing, but someone must've had a reason to put it there? Ted? --D diff --git a/resize/main.c b/resize/main.c index 060e67d..b993dfb 100644 --- a/resize/main.c +++ b/resize/main.c @@ -105,6 +105,7 @@ static void determine_fs_stride(ext2_filsys fs) unsigned long long sum; unsigned int has_sb, prev_has_sb = 0, num; int i_stride, b_stride; + int flexbg_size = 1 << fs->super->s_log_groups_per_flex; if (fs->stride) return; @@ -120,10 +121,11 @@ static void determine_fs_stride(ext2_filsys fs) ext2fs_inode_bitmap_loc(fs, group - 1) - fs->super->s_blocks_per_group; if (b_stride != i_stride || - b_stride < 0) + b_stride < 0 || + (flexbg_size > 1 && (group % flexbg_size == 0))) goto next; - /* printf("group %d has stride %d\n", group, b_stride); */ + printf("group %d has stride %d %d\n", group, b_stride, i_stride); sum += b_stride; num++; @@ -133,7 +135,7 @@ static void determine_fs_stride(ext2_filsys fs) if (fs->group_desc_count > 12 && num < 3) sum = 0; ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: resize2fs problem with stride calc 2014-09-29 20:59 ` Darrick J. Wong @ 2014-09-29 22:46 ` Andreas Dilger 0 siblings, 0 replies; 5+ messages in thread From: Andreas Dilger @ 2014-09-29 22:46 UTC (permalink / raw) To: Darrick J. Wong; +Cc: TR Reardon, Eric Sandeen, linux-ext4@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3232 bytes --] On Sep 29, 2014, at 2:59 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > > It'll end up recalculating stride for any flexbg FS with more than > 12 BGs and more than 3 flexbgs. This piece is neither a part of nor > is used for 32>64bit conversion. > > AFAICT, the point of determine_fs_stride() is to try to recover the > RAID stride by inferring it from minor variations in the block/inode > bitmap locations between successive block groups. This explodes when > flexbg is turned on because bitmap blocks are stored in "other" bgs > and there's a "big jump" between the bitmaps in the last bg of one > flexbg and the bitmaps of the first bg of the next flexbg. Between > bgs in a single flexbg the *_stride values are "negative" and don't > contribute to the stride calculation. > > I /think/ the solution is to ignore first blockgroup when crossing a > flexbg boundary when there are flexbgs. Can you give the following > patch a spin? It shouldn't spit out "group XXX has stride..." messages > after that. I'm not sure that "negative" stride ought to be ignored > either, but.... > > Honestly I'd rather just kill the whole thing, but someone must've had > a reason to put it there? Ted? I added this to try and preserve the RAID stride while doing the resize, to avoid making one disk in a RAID be a hot-spot for bitmap updates. With flex_bg the RAID stride becomes less critical, because the bitmaps are contiguous and will naturally span the RAID stripes if the flex_bg factor is large enough to have blocks on every stripe. We normally specify the flex_bg factor to be 256 (== 1MB of contiguous bitmaps) to exactly match the RAID stripe width when formatting Lustre, so we don't need the stride for this, but still specify it to help the mballoc align the IO blocks properly. I haven't given much though about what happens if these two do not line up evenly. Cheers, Andreas > --D > > diff --git a/resize/main.c b/resize/main.c > index 060e67d..b993dfb 100644 > --- a/resize/main.c > +++ b/resize/main.c > @@ -105,6 +105,7 @@ static void determine_fs_stride(ext2_filsys fs) > unsigned long long sum; > unsigned int has_sb, prev_has_sb = 0, num; > int i_stride, b_stride; > + int flexbg_size = 1 << fs->super->s_log_groups_per_flex; > > if (fs->stride) > return; > @@ -120,10 +121,11 @@ static void determine_fs_stride(ext2_filsys fs) > ext2fs_inode_bitmap_loc(fs, group - 1) - > fs->super->s_blocks_per_group; > if (b_stride != i_stride || > - b_stride < 0) > + b_stride < 0 || > + (flexbg_size > 1 && (group % flexbg_size == 0))) > goto next; > > - /* printf("group %d has stride %d\n", group, b_stride); */ > + printf("group %d has stride %d %d\n", group, b_stride, i_stride); > sum += b_stride; > num++; > > @@ -133,7 +135,7 @@ static void determine_fs_stride(ext2_filsys fs) > > if (fs->group_desc_count > 12 && num < 3) > sum = 0; > - > +printf("sum is %llu %u\n", sum, num); > if (num) > fs->stride = sum / num; > else > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-09-29 22:46 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-09-20 20:46 resize2fs problem with stride calc TR Reardon 2014-09-21 16:13 ` Eric Sandeen 2014-09-21 22:32 ` TR Reardon 2014-09-29 20:59 ` Darrick J. Wong 2014-09-29 22:46 ` Andreas Dilger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox