linux-ext4.vger.kernel.org archive mirror
* Bad ext4 sync performance on 16 TB GPT partition
@ 2010-02-26 10:18 Karsten Weiss
  2010-02-26 11:33 ` Karsten Weiss
  2010-02-26 11:46 ` Dmitry Monakhov
  0 siblings, 2 replies; 9+ messages in thread
From: Karsten Weiss @ 2010-02-26 10:18 UTC (permalink / raw)
  To: linux-ext4

Hi,

(please Cc: me, I'm not subscribed to the list)

we were performing some ext4 tests on a 16 TB GPT partition and ran into 
this issue when writing a single large file with dd and syncing 
afterwards.

The problem: dd is fast (cached) but the following sync is *very* slow.

# /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 15.9423 seconds, 658 MB/s
0.01user 441.40system 7:26.10elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+794minor)pagefaults 0swaps

dd: ~16 seconds
sync: ~7 minutes

(The same test finishes in 57s with xfs!)
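
For anyone reproducing this, a small sketch that times the cached write and
the flush separately (TARGET and COUNT are placeholders; the original test
wrote a 10 GB file to /mnt/large):

```shell
#!/bin/sh
# Time the (cached) dd write and the subsequent sync separately.
# TARGET and COUNT are placeholders -- the original test used
# of=/mnt/large/10GB and count=10000 (10 GB in 1 MiB blocks).
TARGET=${TARGET:-/tmp/ext4-sync-test}
COUNT=${COUNT:-10}

time dd if=/dev/zero of="$TARGET" bs=1M count="$COUNT"
time sync
```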

Here's the "iostat -m /dev/sdb 1" output during dd write:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,00    0,00    6,62   19,35    0,00   74,03

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb             484,00         0,00       242,00          0        242

"iostat -m /dev/sdb 1" during the sync looks like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,00    0,00   12,48    0,00    0,00   87,52

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb              22,00         0,00         8,00          0          8
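
A crude way to watch the flush from the VM side is to sample the dirty and
writeback counters while sync runs (a sketch; limited here to a few samples
instead of an endless loop):

```shell
#!/bin/sh
# Sample the amount of dirty and in-flight writeback data once per second.
# During the slow sync these numbers should drain only very slowly.
for i in 1 2 3; do
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sleep 1
done
```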

However, the sync performance is fine if we ...

* use xfs or
* disable the ext4 journal or
* disable ext4 extents (but with enabled journal)
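
For reference, the two ext4 variants above can be created with mke2fs
feature flags along these lines (a sketch; destructive, and /dev/sdb1 is the
device from this report):

```shell
# ext4 without a journal:
mkfs.ext4 -O ^has_journal /dev/sdb1

# ext4 with the journal but without extents (falls back to
# indirect block mapping, as in ext3):
mkfs.ext4 -O ^extent /dev/sdb1
```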

Here's a kernel profile of the test:

# readprofile -r
# /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB_3 bs=1M count=10000 && sync"
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 15.8261 seconds, 663 MB/s
0.01user 448.55system 7:32.89elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+788minor)pagefaults 0swaps
# readprofile -m /boot/System.map-2.6.18-190.el5 | sort -nr -k 3 | head -15
3450304 default_idle                             43128.8000
  9532 mod_zone_page_state                      733.2308
 58594 find_get_pages                           537.5596
 58499 find_get_pages_tag                       427.0000
 72404 __set_page_dirty_nobuffers               310.7468
 10740 __wake_up_bit                            238.6667
  7786 unlock_page                              165.6596
  1996 dec_zone_page_state                      153.5385
 12230 clear_page_dirty_for_io                   63.0412
  5938 page_waitqueue                            60.5918
 14440 release_pages                             41.7341
 12664 __mark_inode_dirty                        34.6011
  5281 copy_user_generic_unrolled                30.7035
   323 redirty_page_for_writepage                26.9167
 15537 write_cache_pages                         18.9939

Here are three call traces from the "sync" command:

sync          R  running task       0  5041   5032                     (NOTLB)
 00000000ffffffff 000000000000000c ffff81022bb16510 00000000001ec2a3
 ffff81022bb16548 ffffffffffffff10 ffffffff800d1964 0000000000000010
 0000000000000286 ffff8101ff56bb48 0000000000000018 ffff81033686e970
Call Trace:
 [<ffffffff800d1964>] page_mkclean+0x255/0x281
 [<ffffffff8000eab5>] find_get_pages_tag+0x34/0x89
 [<ffffffff800f4467>] write_cache_pages+0x21b/0x332
 [<ffffffff886fad9d>] :ext4:__mpage_da_writepage+0x0/0x162
 [<ffffffff886fc2f1>] :ext4:ext4_da_writepages+0x317/0x4fe
 [<ffffffff8005b1f0>] do_writepages+0x20/0x2f
 [<ffffffff8002fefc>] __writeback_single_inode+0x1ae/0x328
 [<ffffffff8004a667>] wait_on_page_writeback_range+0xd6/0x12e
 [<ffffffff80020f0d>] sync_sb_inodes+0x1b5/0x26f
 [<ffffffff800f3b0b>] sync_inodes_sb+0x99/0xa9
 [<ffffffff800f3b78>] __sync_inodes+0x5d/0xaa
 [<ffffffff800f3bd6>] sync_inodes+0x11/0x29
 [<ffffffff800e1bb0>] do_sync+0x12/0x5a
 [<ffffffff800e1c06>] sys_sync+0xe/0x12
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

sync          R  running task       0  5041   5032                     (NOTLB)
 ffff81022a3e4e88 ffffffff8000e930 ffff8101ff56bee8 ffff8101ff56bcd8
 0000000000000000 ffffffff886fbdbc ffff8101ff56bc68 ffff8101ff56bc68
 00000000002537c3 000000000001dfcd 0000000000000007 ffff8101ff56bc90
Call Trace:
 [<ffffffff8000e9c4>] __set_page_dirty_nobuffers+0xde/0xe9
 [<ffffffff8001b694>] find_get_pages+0x2f/0x6d
 [<ffffffff886f8170>] :ext4:mpage_da_submit_io+0xd0/0x12c
 [<ffffffff886fc31d>] :ext4:ext4_da_writepages+0x343/0x4fe
 [<ffffffff8005b1f0>] do_writepages+0x20/0x2f
 [<ffffffff8002fefc>] __writeback_single_inode+0x1ae/0x328
 [<ffffffff8004a667>] wait_on_page_writeback_range+0xd6/0x12e
 [<ffffffff80020f0d>] sync_sb_inodes+0x1b5/0x26f
 [<ffffffff800f3b0b>] sync_inodes_sb+0x99/0xa9
 [<ffffffff800f3b78>] __sync_inodes+0x5d/0xaa
 [<ffffffff800f3bd6>] sync_inodes+0x11/0x29
 [<ffffffff800e1bb0>] do_sync+0x12/0x5a
 [<ffffffff800e1c06>] sys_sync+0xe/0x12
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

sync          R  running task       0  5353   5348                     (NOTLB)
 ffff810426e04048 0000000000001200 0000000100000001 0000000000000001
 0000000100000000 ffff8103dcecadf8 ffff810426dfd000 ffff8102ebd81b48
 ffff810426e04048 ffff810239bbbc40 0000000000000008 00000000008447f8
Call Trace:
 [<ffffffff801431db>] elv_merged_request+0x1e/0x26
 [<ffffffff8000c02b>] __make_request+0x324/0x401
 [<ffffffff8005c6cf>] cache_alloc_refill+0x106/0x186
 [<ffffffff886f6bf9>] :ext4:walk_page_buffers+0x65/0x8b
 [<ffffffff8000e9c4>] __set_page_dirty_nobuffers+0xde/0xe9
 [<ffffffff886fbd42>] :ext4:ext4_writepage+0x9b/0x333
 [<ffffffff886f8170>] :ext4:mpage_da_submit_io+0xd0/0x12c
 [<ffffffff886fc31d>] :ext4:ext4_da_writepages+0x343/0x4fe
 [<ffffffff8005b1f0>] do_writepages+0x20/0x2f
 [<ffffffff8002fefc>] __writeback_single_inode+0x1ae/0x328
 [<ffffffff8004a667>] wait_on_page_writeback_range+0xd6/0x12e
 [<ffffffff80020f0d>] sync_sb_inodes+0x1b5/0x26f
 [<ffffffff800f3b0b>] sync_inodes_sb+0x99/0xa9
 [<ffffffff800f3b78>] __sync_inodes+0x5d/0xaa
 [<ffffffff800f3bd6>] sync_inodes+0x11/0x29
 [<ffffffff800e1bb0>] do_sync+0x12/0x5a
 [<ffffffff800e1c06>] sys_sync+0xe/0x12
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0
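
(For reference, one way to capture backtraces like the above is the magic
SysRq task dump; this is an assumption about method -- the report does not
say how the traces were taken.)

```shell
# Dump the backtrace of every task to the kernel log (needs root and
# CONFIG_MAGIC_SYSRQ), then pull the sync process' trace out of dmesg:
echo t > /proc/sysrq-trigger
dmesg | grep -A 18 'sync.*R  running'
```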

I've tried some more options. These do *not* influence the (bad) result:

* data=writeback and data=ordered
* disabling/enabling uninit_bg 
* max_sectors_kb=512 or 4096
* io scheduler: cfq or noop
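
The knobs above were toggled with commands along these lines (a sketch;
device and mount point taken from this report):

```shell
# Journal data mode is selected at mount time:
mount -o noatime,data=writeback /dev/sdb1 /mnt/large   # or data=ordered

# uninit_bg is a filesystem feature, toggled on the unmounted device:
tune2fs -O ^uninit_bg /dev/sdb1      # disable (use -O uninit_bg to enable)

# Block layer request size and I/O scheduler:
echo 512  > /sys/block/sdb/queue/max_sectors_kb
echo noop > /sys/block/sdb/queue/scheduler
```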

Some background information about the system:

OS: CentOS 5.4
Memory: 16 GB
CPUs: 2x Quad-Core Opteron 2356
IO scheduler: CFQ
Kernels:
* 2.6.18-164.11.1.el5 x86_64 (latest CentOS 5.4 kernel)
* 2.6.18-190.el5 x86_64 (latest Red Hat EL5 test kernel I've found, from
  http://people.redhat.com/jwilson/el5/; it contains an ext4 version
  which, according to the rpm's changelog, was updated from the 2.6.32
  ext4 codebase)
* I did not try a vanilla kernel so far.

# df -h /dev/sdb{1,2}
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1              15T  9.9G   14T   1% /mnt/large
/dev/sdb2             7.3T  179M  7.0T   1% /mnt/small

(parted) print
Model: easyRAID easyRAID_Q16PS (scsi)
Disk /dev/sdb: 24,0TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name   Flags
 1      1049kB  16,0TB  16,0TB  ext3         large
 2      16,0TB  24,0TB  8003GB  ext3         small

("start 1049kB" is at sector 2048)

sdb is a FC HW-RAID (easyRAID_Q16PS) and consists of a RAID6 volume 
created from 14 disks with chunk size 128kb.

QLogic Fibre Channel HBA Driver: 8.03.01.04.05.05-k
 QLogic QLE2462 - PCI-Express Dual Channel 4Gb Fibre Channel HBA
 ISP2432: PCIe (2.5Gb/s x4) @ 0000:01:00.1 hdma+, host#=8, fw=4.04.09 (486)

The ext4 filesystem was created with

mkfs.ext4 -T ext4,largefile4 -E stride=32,stripe-width=$((32*(14-2))) /dev/sdb1
or
mkfs.ext4 -T ext4 /dev/sdb1
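
The stride/stripe-width values follow from the RAID geometry; a quick sanity
check of the arithmetic (assuming 4 KiB filesystem blocks, the mkfs default
for a filesystem this size):

```shell
#!/bin/sh
# stride       = RAID chunk size / fs block size (in blocks)
# stripe-width = stride * number of data disks (RAID6: 14 disks - 2 parity)
chunk_kb=128
block_kb=4
data_disks=$((14 - 2))

stride=$((chunk_kb / block_kb))
stripe_width=$((stride * data_disks))
echo "stride=$stride stripe-width=$stripe_width"   # prints stride=32 stripe-width=384
```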

Mount options: defaults,noatime,data=writeback

Any ideas?

-- 
Karsten Weiss

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-02-26 10:18 Bad ext4 sync performance on 16 TB GPT partition Karsten Weiss
@ 2010-02-26 11:33 ` Karsten Weiss
  2010-02-26 11:46 ` Dmitry Monakhov
  1 sibling, 0 replies; 9+ messages in thread
From: Karsten Weiss @ 2010-02-26 11:33 UTC (permalink / raw)
  To: linux-ext4

On Fri, 26 Feb 2010, Karsten Weiss wrote:

> However, the sync performance is fine if we ...
> 
> * use xfs or
> * disable the ext4 journal or
> * disable ext4 extents (but with enabled journal)

ext3 is okay, too:

# /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB_1 bs=1M count=10000 && sync"
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 39.7188 seconds, 264 MB/s
0.03user 21.00system 1:47.24elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+788minor)pagefaults 0swaps

Compare:

XFS:    57s
ext4: 7m26s

-- 
Karsten Weiss


* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-02-26 10:18 Bad ext4 sync performance on 16 TB GPT partition Karsten Weiss
  2010-02-26 11:33 ` Karsten Weiss
@ 2010-02-26 11:46 ` Dmitry Monakhov
  2010-02-26 15:47   ` Karsten Weiss
  1 sibling, 1 reply; 9+ messages in thread
From: Dmitry Monakhov @ 2010-02-26 11:46 UTC (permalink / raw)
  To: Karsten Weiss; +Cc: linux-ext4@vger.kernel.org

Karsten Weiss <K.Weiss@science-computing.de> writes:

> Hi,
>
> (please Cc: me, I'm not subscribed to the list)
>
> we were performing some ext4 tests on a 16 TB GPT partition and ran into 
> this issue when writing a single large file with dd and syncing 
> afterwards.
>
> The problem: dd is fast (cached) but the following sync is *very* slow.
>
> # /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 15.9423 seconds, 658 MB/s
> 0.01user 441.40system 7:26.10elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+794minor)pagefaults 0swaps
>
> dd: ~16 seconds
> sync: ~7 minutes
>
> (The same test finishes in 57s with xfs!)
>
> Here's the "iostat -m /dev/sdb 1" output during dd write:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0,00    0,00    6,62   19,35    0,00   74,03
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb             484,00         0,00       242,00          0        242
>
> "iostat -m /dev/sdb 1" during the sync looks like this:
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            0,00    0,00   12,48    0,00    0,00   87,52
>
> Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
> sdb              22,00         0,00         8,00          0          8
>
> However, the sync performance is fine if we ...
>
> * use xfs or
> * disable the ext4 journal or
> * disable ext4 extents (but with enabled journal)
>
> Here's a kernel profile of the test:
>
> # readprofile -r
> # /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB_3 bs=1M count=10000 && sync"
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 15.8261 seconds, 663 MB/s
> 0.01user 448.55system 7:32.89elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+788minor)pagefaults 0swaps
> # readprofile -m /boot/System.map-2.6.18-190.el5 | sort -nr -k 3 | head -15
> 3450304 default_idle                             43128.8000
>   9532 mod_zone_page_state                      733.2308
>  58594 find_get_pages                           537.5596
>  58499 find_get_pages_tag                       427.0000
>  72404 __set_page_dirty_nobuffers               310.7468
>  10740 __wake_up_bit                            238.6667
>   7786 unlock_page                              165.6596
>   1996 dec_zone_page_state                      153.5385
>  12230 clear_page_dirty_for_io                   63.0412
>   5938 page_waitqueue                            60.5918
>  14440 release_pages                             41.7341
>  12664 __mark_inode_dirty                        34.6011
>   5281 copy_user_generic_unrolled                30.7035
>    323 redirty_page_for_writepage                26.9167
>  15537 write_cache_pages                         18.9939
>
> Here are three call traces from the "sync" command:
>
> sync          R  running task       0  5041   5032                     (NOTLB)
>  00000000ffffffff 000000000000000c ffff81022bb16510 00000000001ec2a3
>  ffff81022bb16548 ffffffffffffff10 ffffffff800d1964 0000000000000010
>  0000000000000286 ffff8101ff56bb48 0000000000000018 ffff81033686e970
> Call Trace:
>  [<ffffffff800d1964>] page_mkclean+0x255/0x281
>  [<ffffffff8000eab5>] find_get_pages_tag+0x34/0x89
>  [<ffffffff800f4467>] write_cache_pages+0x21b/0x332
>  [<ffffffff886fad9d>] :ext4:__mpage_da_writepage+0x0/0x162
>  [<ffffffff886fc2f1>] :ext4:ext4_da_writepages+0x317/0x4fe
>  [<ffffffff8005b1f0>] do_writepages+0x20/0x2f
>  [<ffffffff8002fefc>] __writeback_single_inode+0x1ae/0x328
>  [<ffffffff8004a667>] wait_on_page_writeback_range+0xd6/0x12e
>  [<ffffffff80020f0d>] sync_sb_inodes+0x1b5/0x26f
>  [<ffffffff800f3b0b>] sync_inodes_sb+0x99/0xa9
>  [<ffffffff800f3b78>] __sync_inodes+0x5d/0xaa
>  [<ffffffff800f3bd6>] sync_inodes+0x11/0x29
>  [<ffffffff800e1bb0>] do_sync+0x12/0x5a
>  [<ffffffff800e1c06>] sys_sync+0xe/0x12
>  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
>
> sync          R  running task       0  5041   5032                     (NOTLB)
>  ffff81022a3e4e88 ffffffff8000e930 ffff8101ff56bee8 ffff8101ff56bcd8
>  0000000000000000 ffffffff886fbdbc ffff8101ff56bc68 ffff8101ff56bc68
>  00000000002537c3 000000000001dfcd 0000000000000007 ffff8101ff56bc90
> Call Trace:
>  [<ffffffff8000e9c4>] __set_page_dirty_nobuffers+0xde/0xe9
>  [<ffffffff8001b694>] find_get_pages+0x2f/0x6d
>  [<ffffffff886f8170>] :ext4:mpage_da_submit_io+0xd0/0x12c
>  [<ffffffff886fc31d>] :ext4:ext4_da_writepages+0x343/0x4fe
>  [<ffffffff8005b1f0>] do_writepages+0x20/0x2f
>  [<ffffffff8002fefc>] __writeback_single_inode+0x1ae/0x328
>  [<ffffffff8004a667>] wait_on_page_writeback_range+0xd6/0x12e
>  [<ffffffff80020f0d>] sync_sb_inodes+0x1b5/0x26f
>  [<ffffffff800f3b0b>] sync_inodes_sb+0x99/0xa9
>  [<ffffffff800f3b78>] __sync_inodes+0x5d/0xaa
>  [<ffffffff800f3bd6>] sync_inodes+0x11/0x29
>  [<ffffffff800e1bb0>] do_sync+0x12/0x5a
>  [<ffffffff800e1c06>] sys_sync+0xe/0x12
>  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
>
> sync          R  running task       0  5353   5348                     (NOTLB)
>  ffff810426e04048 0000000000001200 0000000100000001 0000000000000001
>  0000000100000000 ffff8103dcecadf8 ffff810426dfd000 ffff8102ebd81b48
>  ffff810426e04048 ffff810239bbbc40 0000000000000008 00000000008447f8
> Call Trace:
>  [<ffffffff801431db>] elv_merged_request+0x1e/0x26
>  [<ffffffff8000c02b>] __make_request+0x324/0x401
>  [<ffffffff8005c6cf>] cache_alloc_refill+0x106/0x186
>  [<ffffffff886f6bf9>] :ext4:walk_page_buffers+0x65/0x8b
>  [<ffffffff8000e9c4>] __set_page_dirty_nobuffers+0xde/0xe9
>  [<ffffffff886fbd42>] :ext4:ext4_writepage+0x9b/0x333
>  [<ffffffff886f8170>] :ext4:mpage_da_submit_io+0xd0/0x12c
>  [<ffffffff886fc31d>] :ext4:ext4_da_writepages+0x343/0x4fe
>  [<ffffffff8005b1f0>] do_writepages+0x20/0x2f
>  [<ffffffff8002fefc>] __writeback_single_inode+0x1ae/0x328
>  [<ffffffff8004a667>] wait_on_page_writeback_range+0xd6/0x12e
>  [<ffffffff80020f0d>] sync_sb_inodes+0x1b5/0x26f
>  [<ffffffff800f3b0b>] sync_inodes_sb+0x99/0xa9
>  [<ffffffff800f3b78>] __sync_inodes+0x5d/0xaa
>  [<ffffffff800f3bd6>] sync_inodes+0x11/0x29
>  [<ffffffff800e1bb0>] do_sync+0x12/0x5a
>  [<ffffffff800e1c06>] sys_sync+0xe/0x12
>  [<ffffffff8005e28d>] tracesys+0xd5/0xe0
>
> I've tried some more options. These do *not* influence the (bad) result:
>
> * data=writeback and data=ordered
> * disabling/enabling uninit_bg 
> * max_sectors_kb=512 or 4096
> * io scheduler: cfq or noop
>
> Some background information about the system:
>
> OS: CentOS 5.4
> Memory: 16 GB
> CPUs: 2x Quad-Core Opteron 2356
> IO scheduler: CFQ
> Kernels:
> * 2.6.18-164.11.1.el5 x86_64 (latest CentOS 5.4 kernel)
> * 2.6.18-190.el5 x86_64 (latest Red Hat EL5 test kernel I've found, from
>   http://people.redhat.com/jwilson/el5/; it contains an ext4 version
>   which, according to the rpm's changelog, was updated from the 2.6.32
>   ext4 codebase)
Hmm... it is hard to predict the differences from the vanilla tree.
This is not only ext4-related; the writeback path has changed dramatically.
It is not easy to backport the writeback code to 2.6.18 with the full
performance improvements but without introducing new issues.
> * I did not try a vanilla kernel so far.
IMHO it would be really good to know the vanilla kernel's stats.
>
> # df -h /dev/sdb{1,2}
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sdb1              15T  9.9G   14T   1% /mnt/large
> /dev/sdb2             7.3T  179M  7.0T   1% /mnt/small
>
> (parted) print
> Model: easyRAID easyRAID_Q16PS (scsi)
> Disk /dev/sdb: 24,0TB
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start   End     Size    File system  Name   Flags
>  1      1049kB  16,0TB  16,0TB  ext3         large
>  2      16,0TB  24,0TB  8003GB  ext3         small
>
> ("start 1049kB" is at sector 2048)
>
> sdb is a FC HW-RAID (easyRAID_Q16PS) and consists of a RAID6 volume 
> created from 14 disks with chunk size 128kb.
>
> QLogic Fibre Channel HBA Driver: 8.03.01.04.05.05-k
>  QLogic QLE2462 - PCI-Express Dual Channel 4Gb Fibre Channel HBA
>  ISP2432: PCIe (2.5Gb/s x4) @ 0000:01:00.1 hdma+, host#=8, fw=4.04.09 (486)
>
> The ext4 filesystem was created with
>
> mkfs.ext4 -T ext4,largefile4 -E stride=32,stripe-width=$((32*(14-2))) /dev/sdb1
> or
> mkfs.ext4 -T ext4 /dev/sdb1
>
> Mount options: defaults,noatime,data=writeback
>
> Any ideas?


* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-02-26 11:46 ` Dmitry Monakhov
@ 2010-02-26 15:47   ` Karsten Weiss
  2010-02-26 17:49     ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Karsten Weiss @ 2010-02-26 15:47 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-ext4@vger.kernel.org

Hi Dmitry!

On Fri, 26 Feb 2010, Dmitry Monakhov wrote:

> > Kernels:
> > * 2.6.18-164.11.1.el5 x86_64 (latest CentOS 5.4 kernel)
> > * 2.6.18-190.el5 x86_64 (latest Red Hat EL5 test kernel I've found, from
> >   http://people.redhat.com/jwilson/el5/; it contains an ext4 version
> >   which, according to the rpm's changelog, was updated from the 2.6.32
> >   ext4 codebase)
> Hmm... it is hard to predict the differences from the vanilla tree.
> This is not only ext4-related; the writeback path has changed dramatically.
> It is not easy to backport the writeback code to 2.6.18 with the full
> performance improvements but without introducing new issues.
> > * I did not try a vanilla kernel so far.
> IMHO it would be really good to know the vanilla kernel's stats.

I did a quick&dirty compilation of vanilla kernel 2.6.33 and repeated the 
test:

# /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 50.044 seconds, 210 MB/s
0.01user 13.76system 1:04.75elapsed 21%CPU (0avgtext+0avgdata 6224maxresident)k
0inputs+0outputs (0major+1049minor)pagefaults 0swaps

=> The problem shows up only with the CentOS / Red Hat 5.4 kernels (including 
RH's test kernel 2.6.18-190.el5). Admittedly, ext4 is only a technology 
preview in 5.4...

I've also tried the latest CentOS 5.3 kernel-2.6.18-128.7.1.el5 but 
couldn't mount the device (with -t ext4dev).

2.6.18-164.el5 (the initial CentOS 5.4 kernel) has the bug, too.

I'm willing to test patches if somebody wants to debug the problem.

-- 
Karsten Weiss


* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-02-26 15:47   ` Karsten Weiss
@ 2010-02-26 17:49     ` Eric Sandeen
  2010-03-01  8:57       ` Karsten Weiss
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2010-02-26 17:49 UTC (permalink / raw)
  To: Karsten Weiss; +Cc: Dmitry Monakhov, linux-ext4@vger.kernel.org

Karsten Weiss wrote:
> Hi Dmitry!
> 
> On Fri, 26 Feb 2010, Dmitry Monakhov wrote:
> 

...

>>> * I did not try a vanilla kernel so far.
>> IMHO it would be really good to know the vanilla kernel's stats.
> 
> I did a quick&dirty compilation of vanilla kernel 2.6.33 and repeated the 
> test:
> 
> # /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 50.044 seconds, 210 MB/s
> 0.01user 13.76system 1:04.75elapsed 21%CPU (0avgtext+0avgdata 6224maxresident)k
> 0inputs+0outputs (0major+1049minor)pagefaults 0swaps
> 
> => The problem shows up only with the CentOS / Red Hat 5.4 kernels (including 
> RH's test kernel 2.6.18-190.el5). Admittedly, ext4 is only a technology 
> preview in 5.4...
> 
> I've also tried the latest CentOS 5.3 kernel-2.6.18-128.7.1.el5 but 
> couldn't mount the device (with -t ext4dev).
> 
> 2.6.18-164.el5 (the initial CentOS 5.4 kernel) has the bug, too.
> 
> I'm willing to test patches if somebody wants to debug the problem.

Ok, that's interesting.  We've not had bona-fide RHEL customers report
the problem, but then maybe it hasn't been tested this way.

2.6.18-178.el5 and beyond is based on the 2.6.32 codebase for ext4.

Testing generic 2.6.32 might also be interesting as a datapoint,
if you're willing.

-Eric


* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-02-26 17:49     ` Eric Sandeen
@ 2010-03-01  8:57       ` Karsten Weiss
  2010-03-01 16:22         ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Karsten Weiss @ 2010-03-01  8:57 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dmitry Monakhov, linux-ext4@vger.kernel.org

Hi Eric,

On Fri, 26 Feb 2010, Eric Sandeen wrote:

> > => The problem shows up only with the CentOS / Red Hat 5.4 kernels (including 
> > RH's test kernel 2.6.18-190.el5). Admittedly, ext4 is only a technology 
> > preview in 5.4...
> > 
> > I've also tried the latest CentOS 5.3 kernel-2.6.18-128.7.1.el5 but 
> > couldn't mount the device (with -t ext4dev).
> > 
> > 2.6.18-164.el5 (the initial CentOS 5.4 kernel) has the bug, too.
> > 
> > I'm willing to test patches if somebody wants to debug the problem.
> 
> Ok, that's interesting.  We've not had bona-fide RHEL customers report
> the problem, but then maybe it hasn't been tested this way.

I suspect so, because, as I mentioned, the issue can also be reproduced with
the RH test kernel 2.6.18-190.el5 x86_64 (http://people.redhat.com/jwilson/el5/).

> 2.6.18-178.el5 and beyond is based on the 2.6.32 codebase for ext4.
> 
> Testing generic 2.6.32 might also be interesting as a datapoint,
> if you're willing.

Sorry for the delay, here's the (good) 2.6.32 result:

# /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 46.3369 seconds, 226 MB/s
0.00user 14.17system 0:59.53elapsed 23%CPU (0avgtext+0avgdata 6224maxresident)k
0inputs+0outputs (0major+1045minor)pagefaults 0swaps

To summarize:

Bad:  2.6.18-164.el5 (CentOS)
Bad:  2.6.18-164.11.1.el5 (CentOS)
Bad:  2.6.18-190.el5 (RH)
Good: 2.6.32
Good: 2.6.33

-- 
Karsten Weiss


* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-03-01  8:57       ` Karsten Weiss
@ 2010-03-01 16:22         ` Eric Sandeen
  2010-03-12 11:37           ` Karsten Weiss
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2010-03-01 16:22 UTC (permalink / raw)
  To: Karsten Weiss; +Cc: Dmitry Monakhov, linux-ext4@vger.kernel.org

Karsten Weiss wrote:
> Hi Eric,
> 
> On Fri, 26 Feb 2010, Eric Sandeen wrote:
> 
>>> => The problem shows up only with the CentOS / Red Hat 5.4 kernels (including 
>>> RH's test kernel 2.6.18-190.el5). Admittedly, ext4 is only a technology 
>>> preview in 5.4...
>>>
>>> I've also tried the latest CentOS 5.3 kernel-2.6.18-128.7.1.el5 but 
>>> couldn't mount the device (with -t ext4dev).
>>>
>>> 2.6.18-164.el5 (the initial CentOS 5.4 kernel) has the bug, too.
>>>
>>> I'm willing to test patches if somebody wants to debug the problem.
>> Ok, that's interesting.  We've not had bona-fide RHEL customers report
>> the problem, but then maybe it hasn't been tested this way.
> 
> I suspect so, because, as I mentioned, the issue can also be reproduced with
> the RH test kernel 2.6.18-190.el5 x86_64 (http://people.redhat.com/jwilson/el5/).
> 
>> 2.6.18-178.el5 and beyond is based on the 2.6.32 codebase for ext4.
>>
>> Testing generic 2.6.32 might also be interesting as a datapoint,
>> if you're willing.
> 
> Sorry for the delay, here's the (good) 2.6.32 result:
> 
> # /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 46.3369 seconds, 226 MB/s
> 0.00user 14.17system 0:59.53elapsed 23%CPU (0avgtext+0avgdata 6224maxresident)k
> 0inputs+0outputs (0major+1045minor)pagefaults 0swaps
> 
> To summarize:
> 
> Bad:  2.6.18-164.el5 (CentOS)
> Bad:  2.6.18-164.11.1.el5 (CentOS)
> Bad:  2.6.18-190.el5 (RH)
> Good: 2.6.32
> Good: 2.6.33
> 

Thanks, I'll have to investigate that.  I guess something may have gotten lost
in translation in the 2.6.32->2.6.18 backport.....

-Eric


* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-03-01 16:22         ` Eric Sandeen
@ 2010-03-12 11:37           ` Karsten Weiss
  2010-03-12 15:20             ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Karsten Weiss @ 2010-03-12 11:37 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Dmitry Monakhov, linux-ext4@vger.kernel.org

Hi Eric!

On Mon, 1 Mar 2010, Eric Sandeen wrote:

> > Sorry for the delay, here's the (good) 2.6.32 result:
> > 
> > # /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
> > 10000+0 records in
> > 10000+0 records out
> > 10485760000 bytes (10 GB) copied, 46.3369 seconds, 226 MB/s
> > 0.00user 14.17system 0:59.53elapsed 23%CPU (0avgtext+0avgdata 6224maxresident)k
> > 0inputs+0outputs (0major+1045minor)pagefaults 0swaps
> > 
> > To summarize:
> > 
> > Bad:  2.6.18-164.el5 (CentOS)
> > Bad:  2.6.18-164.11.1.el5 (CentOS)
> > Bad:  2.6.18-190.el5 (RH)
> > Good: 2.6.32
> > Good: 2.6.33

In the meantime I've also reproduced the problem on another machine with a 
Red Hat 5.5 Beta (x86_64) installation and decided to open a bug on RH's 
bugzilla:

Bad ext4 sync performance on 16 TB GPT partition
https://bugzilla.redhat.com/show_bug.cgi?id=572930

> Thanks, I'll have to investigate that.  I guess something may have gotten lost
> in translation in the 2.6.32->2.6.18 backport.....

Did you come up with anything I could test?

Is anyone else able to reproduce the problem?

-- 
Karsten Weiss


* Re: Bad ext4 sync performance on 16 TB GPT partition
  2010-03-12 11:37           ` Karsten Weiss
@ 2010-03-12 15:20             ` Eric Sandeen
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Sandeen @ 2010-03-12 15:20 UTC (permalink / raw)
  To: Karsten Weiss; +Cc: Dmitry Monakhov, linux-ext4@vger.kernel.org

Karsten Weiss wrote:
> Hi Eric!
> 
> On Mon, 1 Mar 2010, Eric Sandeen wrote:
> 
>>> Sorry for the delay, here's the (good) 2.6.32 result:
>>>
>>> # /usr/bin/time bash -c "dd if=/dev/zero of=/mnt/large/10GB bs=1M count=10000 && sync"
>>> 10000+0 records in
>>> 10000+0 records out
>>> 10485760000 bytes (10 GB) copied, 46.3369 seconds, 226 MB/s
>>> 0.00user 14.17system 0:59.53elapsed 23%CPU (0avgtext+0avgdata 6224maxresident)k
>>> 0inputs+0outputs (0major+1045minor)pagefaults 0swaps
>>>
>>> To summarize:
>>>
>>> Bad:  2.6.18-164.el5 (CentOS)
>>> Bad:  2.6.18-164.11.1.el5 (CentOS)
>>> Bad:  2.6.18-190.el5 (RH)
>>> Good: 2.6.32
>>> Good: 2.6.33
> 
> In the meantime I've also reproduced the problem on another machine with a 
> Red Hat 5.5 Beta (x86_64) installation and decided to open a bug on RH's 
> bugzilla:
> 
> Bad ext4 sync performance on 16 TB GPT partition
> https://bugzilla.redhat.com/show_bug.cgi?id=572930

Thanks, and thanks for double-checking upstream.

>> Thanks, I'll have to investigate that.  I guess something may have gotten lost
>> in translation in the 2.6.32->2.6.18 backport.....
> 
> Did you come up with anything I could test?

I'll look into this...
 
> Is anyone else able to reproduce the problem?

Since it's not an upstream problem, this issue is probably best discussed
in the RHEL bug, now.

Thanks,
-Eric

