* problem(?) in ext4 or mke2fs
@ 2011-04-03 18:07 Zeev Tarantov
2011-04-03 18:40 ` Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Zeev Tarantov @ 2011-04-03 18:07 UTC
To: linux-ext4
While testing zram I ran a script that creates a block device,
creates a filesystem on it and untars Qt on that filesystem.
I was surprised to find ext4_mb_scan_aligned near the top of the profile output.
This was evidently because the command "mke2fs -t ext4 -m 0 -I 128 -O
^has_journal,^ext_attr <block device>"
created a filesystem with (output of tune2fs):
RAID stride: 1
RAID stripe width: 1
I thought this strange and removed these values using debugfs:
set_super_value raid_stride 0
set_super_value raid_stripe_width 0
With this "fix" the symbol ext4_mb_scan_aligned disappeared from perf's output:
 18.98%   -3.66%  gzip                [.] zip
  0.00%  +14.84%  [kernel.kallsyms]   [k] ext4_mb_scan_aligned
 17.91%   -3.44%  gzip                [.] treat_file.part.4.2264
 13.73%   -2.47%  [csnappy_compress]  [k] snappy_compress_fragment
  3.96%   -0.77%  [kernel.kallsyms]   [k] copy_user_generic_string
  3.05%   -0.41%  libc-2.13.so        [.] __memcpy_ssse3
  0.89%   +1.49%  [kernel.kallsyms]   [k] _raw_spin_lock
  2.63%   -0.49%  [kernel.kallsyms]   [k] __memcpy
  1.61%   -0.17%  [kernel.kallsyms]   [k] __memset
  0.78%   -0.11%  [kernel.kallsyms]   [k] ext4_mark_iloc_dirty
  0.63%   -0.11%  [kernel.kallsyms]   [k] system_call
  0.66%   -0.14%  gzip                [.] treat_stdin.2262
  0.58%   -0.12%  libc-2.13.so        [.] _int_malloc
This is using mke2fs 1.41.14 (22-Dec-2010) on Linux 2.6.38.2.
Is this expected behavior? Do you need me to provide more information?
regards,
-Z.T.

* Re: problem(?) in ext4 or mke2fs
2011-04-03 18:07 problem(?) in ext4 or mke2fs Zeev Tarantov
@ 2011-04-03 18:40 ` Eric Sandeen
2011-04-03 18:52 ` Zeev Tarantov
0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2011-04-03 18:40 UTC
To: Zeev Tarantov; +Cc: linux-ext4

On 4/3/11 11:07 AM, Zeev Tarantov wrote:
> While testing zram I ran a script that creates a block device,
> creates a filesystem on it and untars Qt on that filesystem.
> I was surprised to find ext4_mb_scan_aligned near the top of the profile output.
> This was evidently because the command "mke2fs -t ext4 -m 0 -I 128 -O
> ^has_journal,^ext_attr <block device>"
> created a filesystem with (output of tune2fs):
> RAID stride: 1
> RAID stripe width: 1

mke2fs queries the block device for its geometry, based on what is
reported via sysfs:

/*
 * Sets the geometry of a device (stripe/stride), and returns the
 * device's alignment offset, if any, or a negative error.
 */
static int get_device_geometry( ...
...
        min_io = blkid_topology_get_minimum_io_size(tp);
        opt_io = blkid_topology_get_optimal_io_size(tp);
...

        fs_param->s_raid_stride = min_io / blocksize;
        fs_param->s_raid_stripe_width = opt_io / blocksize;

What does

# blockdev --getiomin --getioopt /dev/<yourdevice>

say for your device?

The device may be reporting odd values, but mke2fs probably
should be smart enough not to set block-sized stripe unit and width...

-Eric

* Re: problem(?) in ext4 or mke2fs
2011-04-03 18:40 ` Eric Sandeen
@ 2011-04-03 18:52 ` Zeev Tarantov
2011-04-03 18:56 ` Eric Sandeen
0 siblings, 1 reply; 8+ messages in thread
From: Zeev Tarantov @ 2011-04-03 18:52 UTC
To: Eric Sandeen; +Cc: linux-ext4

On Sun, Apr 3, 2011 at 21:40, Eric Sandeen <sandeen@redhat.com> wrote:
> On 4/3/11 11:07 AM, Zeev Tarantov wrote:
>> While testing zram I ran a script that creates a block device,
>> creates a filesystem on it and untars Qt on that filesystem.
>> I was surprised to find ext4_mb_scan_aligned near the top of the profile output.
>> This was evidently because the command "mke2fs -t ext4 -m 0 -I 128 -O
>> ^has_journal,^ext_attr <block device>"
>> created a filesystem with (output of tune2fs):
>> RAID stride: 1
>> RAID stripe width: 1
>
> mke2fs queries the block device for its geometry, based on what is
> reported via sysfs:
>
> /*
>  * Sets the geometry of a device (stripe/stride), and returns the
>  * device's alignment offset, if any, or a negative error.
>  */
> static int get_device_geometry( ...
> ...
>         min_io = blkid_topology_get_minimum_io_size(tp);
>         opt_io = blkid_topology_get_optimal_io_size(tp);
> ...
>
>         fs_param->s_raid_stride = min_io / blocksize;
>         fs_param->s_raid_stripe_width = opt_io / blocksize;
>
> What does
>
> # blockdev --getiomin --getioopt /dev/<yourdevice>
>
> say for your device?

get logical block (sector) size: 4096
get physical block (sector) size: 4096
get minimum I/O size: 4096
get optimal I/O size: 4096
get alignment offset in bytes: 0
get max sectors per request: 255
get blocksize: 4096
get readahead: 256

> The device may be reporting odd values, but mke2fs probably
> should be smart enough not to set block-sized stripe unit and width...

If the filesystem created with the default options is slow or has
higher cpu usage, it should be changed.

> -Eric

-Z.T.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
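
The arithmetic behind the "RAID stride: 1" and "RAID stripe width: 1" values can be sketched as follows. This is only an illustration of the two divisions quoted from get_device_geometry(), not the e2fsprogs code itself; the struct raid_geometry type and the geometry_from_topology() helper are hypothetical names introduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for the two superblock fields that mke2fs fills in. */
struct raid_geometry {
    unsigned long stride;        /* in filesystem blocks */
    unsigned long stripe_width;  /* in filesystem blocks */
};

/*
 * Mirrors the two divisions quoted in the thread: the byte-sized
 * minimum and optimal I/O hints are converted to filesystem blocks.
 */
static struct raid_geometry geometry_from_topology(unsigned long min_io,
                                                   unsigned long opt_io,
                                                   unsigned long blocksize)
{
    struct raid_geometry g;

    assert(blocksize > 0);
    g.stride = min_io / blocksize;
    g.stripe_width = opt_io / blocksize;
    return g;
}
```

With the values reported above (minimum I/O size 4096, optimal I/O size 4096, blocksize 4096), both divisions yield 1, which matches the tune2fs output from the first message.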

* Re: problem(?) in ext4 or mke2fs
2011-04-03 18:52 ` Zeev Tarantov
@ 2011-04-03 18:56 ` Eric Sandeen
2011-04-03 19:01 ` Zeev Tarantov
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Eric Sandeen @ 2011-04-03 18:56 UTC
To: Zeev Tarantov; +Cc: linux-ext4

On 4/3/11 11:52 AM, Zeev Tarantov wrote:
> On Sun, Apr 3, 2011 at 21:40, Eric Sandeen <sandeen@redhat.com> wrote:

...

>> What does
>>
>> # blockdev --getiomin --getioopt /dev/<yourdevice>
>>
>> say for your device?
>
> get logical block (sector) size: 4096
> get physical block (sector) size: 4096
> get minimum I/O size: 4096
> get optimal I/O size: 4096
> get alignment offset in bytes: 0
> get max sectors per request: 255
> get blocksize: 4096
> get readahead: 256
>
>> The device may be reporting odd values, but mke2fs probably
>> should be smart enough not to set block-sized stripe unit and width...
>
> If the filesystem created with the default options is slow or has
> higher cpu usage, it should be changed.

I agree. For actual striped storage, this makes it faster, but this
case is a problem; block-sized stripe width is never going to be good.
What device is this, exactly?

-Eric (losing my free airport wifi in about 8 minutes, so I may have
to continue this later...!)

* Re: problem(?) in ext4 or mke2fs
2011-04-03 18:56 ` Eric Sandeen
@ 2011-04-03 19:01 ` Zeev Tarantov
2011-04-03 19:21 ` Eric Sandeen
2011-04-04 0:40 ` Andreas Dilger
2011-04-04 0:50 ` Andreas Dilger
2 siblings, 1 reply; 8+ messages in thread
From: Zeev Tarantov @ 2011-04-03 19:01 UTC
To: Eric Sandeen; +Cc: linux-ext4

On Sun, Apr 3, 2011 at 21:56, Eric Sandeen <sandeen@redhat.com> wrote:
> On 4/3/11 11:52 AM, Zeev Tarantov wrote:
>> If the filesystem created with the default options is slow or has
>> higher cpu usage, it should be changed.
>
> I agree. For actual striped storage, this makes it faster, but this
> case is a problem; block-sized stripe width is never going to be good.
> What device is this, exactly?

Look in linux-2.6/drivers/staging/zram/zram.txt

* Re: problem(?) in ext4 or mke2fs
2011-04-03 19:01 ` Zeev Tarantov
@ 2011-04-03 19:21 ` Eric Sandeen
0 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2011-04-03 19:21 UTC
To: Zeev Tarantov; +Cc: linux-ext4

On 4/3/11 12:01 PM, Zeev Tarantov wrote:
> On Sun, Apr 3, 2011 at 21:56, Eric Sandeen <sandeen@redhat.com> wrote:
>> On 4/3/11 11:52 AM, Zeev Tarantov wrote:
>>> If the filesystem created with the default options is slow or has
>>> higher cpu usage, it should be changed.
>>
>> I agree. For actual striped storage, this makes it faster, but this
>> case is a problem; block-sized stripe width is never going to be good.
>> What device is this, exactly?
>
> Look in linux-2.6/drivers/staging/zram/zram.txt

OK, so it does:

        /*
         * To ensure that we always get PAGE_SIZE aligned
         * and n*PAGE_SIZED sized I/O requests.
         */
        blk_queue_physical_block_size(zram->disk->queue, PAGE_SIZE);
        blk_queue_logical_block_size(zram->disk->queue,
                                     ZRAM_LOGICAL_BLOCK_SIZE);
        blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
        blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);

These are all documented in Documentation/ABI/testing/sysfs-block.

I don't think that setting all those values in zram is necessary
and/or sufficient to achieve what is desired in the comment.
io_min/io_opt generally are set only for striped devices.

Still, mke2fsprogs should probably sanity-check for this; I'll make
sure this seems right, and send a patch.

Thanks,
-Eric
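
One possible shape for the sanity check mentioned above is sketched here. It is a guess at the idea only, not the patch that was eventually sent: stripe_hint_blocks() is a hypothetical helper, and the rule "a hint of one block or less counts as unset" is an assumption drawn from the remark that a block-sized stripe width is never going to be good.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert an I/O hint in bytes to filesystem
 * blocks, but report 0 ("unset") when the hint is no larger than a
 * single block, since such a value carries no striping information.
 */
static unsigned long stripe_hint_blocks(unsigned long io_bytes,
                                        unsigned long blocksize)
{
    unsigned long blocks;

    assert(blocksize > 0);
    blocks = io_bytes / blocksize;
    return blocks > 1 ? blocks : 0;
}
```

Under this assumption a zram device reporting 4096-byte io_min and io_opt on a 4096-byte-block filesystem would end up with stride and stripe width left at 0, while a hint of 65536 bytes would still give 16 blocks.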

* Re: problem(?) in ext4 or mke2fs
2011-04-03 18:56 ` Eric Sandeen
2011-04-03 19:01 ` Zeev Tarantov
@ 2011-04-04 0:40 ` Andreas Dilger
2011-04-04 0:50 ` Andreas Dilger
2 siblings, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2011-04-04 0:40 UTC
To: Eric Sandeen; +Cc: Zeev Tarantov, linux-ext4@vger.kernel.org

Cheers, Andreas

* Re: problem(?) in ext4 or mke2fs
2011-04-03 18:56 ` Eric Sandeen
2011-04-03 19:01 ` Zeev Tarantov
2011-04-04 0:40 ` Andreas Dilger
@ 2011-04-04 0:50 ` Andreas Dilger
2 siblings, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2011-04-04 0:50 UTC
To: Eric Sandeen; +Cc: Zeev Tarantov, linux-ext4@vger.kernel.org

Sorry for the previous empty message...

I was just going to write that it makes sense to have mballoc use a
reasonably large stripe size (e.g. 1MB) that is an even multiple of the
underlying device blocksize. Something like the following might work:

        if (ioopt != 0)
                stripe = max(1, (1048576 + ioopt - 1) / ioopt) * ioopt;
        else
                stripe = 0;

And let the kernel decide what to do if unspecified. I'd prefer to leave
it unset if there is nothing provided by the device, so we don't confuse
the default value with a value specified by the admin.

Cheers, Andreas
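
The rounding in the snippet above can be written out as a small standalone function to see what it produces for a few inputs. This is a sketch under assumptions: suggested_stripe_bytes() and MB_TARGET_BYTES are hypothetical names, and ioopt and the result are both taken to be in bytes, which the message does not state explicitly.

```c
#include <assert.h>
#include <limits.h>

#define MB_TARGET_BYTES 1048576UL  /* the 1MB target named in the message */

/*
 * Round the 1MB target up to a whole multiple of the device's optimal
 * I/O size. Returns 0 ("unset") when the device reports no hint.
 */
static unsigned long suggested_stripe_bytes(unsigned long ioopt)
{
    unsigned long multiples;

    if (ioopt == 0)
        return 0;  /* nothing reported: leave it to the kernel */

    /* Guard the addition below against unsigned wrap-around. */
    assert(ioopt <= ULONG_MAX - MB_TARGET_BYTES);

    /* Ceiling division: how many ioopt-sized units cover 1MB. */
    multiples = (MB_TARGET_BYTES + ioopt - 1) / ioopt;
    if (multiples < 1)
        multiples = 1;

    return multiples * ioopt;
}
```

For the 4096-byte hint reported by the zram device this gives 256 * 4096 = 1048576 bytes. A hint that is already larger than 1MB, such as 3145728 bytes, is returned unchanged.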

Thread overview: 8+ messages
2011-04-03 18:07 problem(?) in ext4 or mke2fs Zeev Tarantov
2011-04-03 18:40 ` Eric Sandeen
2011-04-03 18:52 ` Zeev Tarantov
2011-04-03 18:56 ` Eric Sandeen
2011-04-03 19:01 ` Zeev Tarantov
2011-04-03 19:21 ` Eric Sandeen
2011-04-04 0:40 ` Andreas Dilger
2011-04-04 0:50 ` Andreas Dilger