* 2.6.16-rc6-mm2: slow writes on reiser4.
From: Laurent Riffard @ 2006-03-21 21:16 UTC
To: reiserfs-list
Hello,
Writing big files is very slow on reiser4 now.
"dd if=/dev/zero of=toto bs=1k count=102400; sync" takes more than 2 minutes on
reiser4 fs, but only 15 seconds on reiserfs fs.
Actually, writing on reiser4 is not uniformly slow; it seems to stall for
long periods from time to time. I monitored the number of dirty pages in /proc/meminfo
and I hit sysrq-T while the system was stalling:
dd D 000017DE 0 21930 21929 (NOTLB)
d7169c74 e0c98b05 00000246 000017de 00000000 f396aa00 003d1249 d0b68140
d0b68030 f396aa00 003d1249 6d519e00 00000002 c0396434 d8bf8e30 d8bf8e38
00000246 d7169ca0 c0270f08 d0b68030 00000001 d0b68030 c0113b25 d8bf8e38
Call Trace:
[<c0270f08>] __down+0x81/0xdc
[<c026f3ba>] __down_failed+0xa/0x10
[<e0c91a62>] .text.lock.lock+0x15/0x1b [reiser4]
[<e0c90faf>] longterm_lock_znode+0x5b4/0x7b0 [reiser4]
[<e0cba16a>] cbk_level_lookup+0x8a/0x954 [reiser4]
[<e0cbb186>] traverse_tree+0x752/0xa0d [reiser4]
[<e0cbbbc2>] coord_by_handle+0x781/0x789 [reiser4]
[<e0cbbdb5>] object_lookup+0x1eb/0x230 [reiser4]
[<e0cdb201>] find_file_item+0x18d/0x1b7 [reiser4]
[<e0cdd873>] write_flow+0x208/0x6e1 [reiser4]
[<e0cde208>] write_unix_file+0x3d9/0x5b0 [reiser4]
[<c0147d36>] vfs_write+0x8a/0x133
[<c0148569>] sys_write+0x3b/0x60
[<c01029bb>] sysenter_past_esp+0x54/0x75
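Laurent mentions watching the number of dirty pages in /proc/meminfo while dd was running. A minimal Python sketch of that kind of monitor follows. It assumes a Linux /proc/meminfo with `Dirty:` and `Writeback:` lines in kB; the function names are hypothetical and not part of any tool used in this thread.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; values are usually kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sample_dirty(path="/proc/meminfo"):
    """Return (Dirty, Writeback) in kB read from the live file."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty", 0), info.get("Writeback", 0)


def monitor(interval=1.0, samples=10):
    """Print a timestamped Dirty/Writeback line once per interval."""
    for _ in range(samples):
        dirty, writeback = sample_dirty()
        print(f"{time.strftime('%H:%M:%S')} Dirty={dirty} kB Writeback={writeback} kB")
        time.sleep(interval)
```

Running `monitor(samples=120)` in a second terminal while dd runs gives timestamps that can be lined up with a sysrq-T trace like the one above.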
Below are the detailed tests I ran. Feel free to ask for more information.
Reiser4 FS
==========
Desktop$ cd ~/kernel
kernel$ df .
Filesystem Size Used Avail Use% Mounted on
/dev/hda8 925M 825M 101M 90% /home/laurent/kernel
kernel$ grep hda8 /proc/mounts
/dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 records in
102400+0 records out
0.06user 13.95system 1:42.09elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.00system 1:22.90elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 records in
102400+0 records out
0.08user 14.01system 1:45.57elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+249minor)pagefaults 0swaps
0.00user 0.00system 0:09.78elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 records in
102400+0 records out
0.06user 14.13system 2:18.27elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.00system 0:08.48elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 records in
102400+0 records out
0.06user 14.27system 1:56.34elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.00system 0:10.46elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
Reiserfs FS
===========
kernel$ cd
~$ df .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vglinux1-lvhome
7,0G 4,8G 2,3G 68% /home
[/dev/mapper/vglinux1-lvhome resides on /dev/hda4]
~$ grep lvhome /proc/mounts
/dev/vglinux1/lvhome /home reiserfs rw 0 0
~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 records in
102400+0 records out
0.04user 1.75system 0:02.05elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+249minor)pagefaults 0swaps
0.00user 0.10system 0:12.93elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 records in
102400+0 records out
0.04user 1.83system 0:01.98elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.16system 0:14.45elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 records in
102400+0 records out
0.04user 1.79system 0:01.95elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.10system 0:13.47elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
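Because much of the dirty data is only written out by the trailing sync, a fair per-run figure counts dd and sync together. The Python sketch below assumes GNU time's default output, where the elapsed field is `[h:]m:ss.ss` followed by `elapsed`, and turns the lines above into an effective throughput. Each run writes 100 MiB, since bs=1k count=102400 is 102400 KiB. The helper names are hypothetical.

```python
import re

# GNU time default format: "...user ...system [h:]m:ss.ssElapsed ..."
ELAPSED_RE = re.compile(r"((?:\d+:)?\d+:\d+(?:\.\d+)?)elapsed")


def elapsed_seconds(time_line):
    """Extract the elapsed field from a GNU time output line, in seconds."""
    m = ELAPSED_RE.search(time_line)
    if m is None:
        raise ValueError("no elapsed field in: " + time_line)
    seconds = 0.0
    for part in m.group(1).split(":"):
        seconds = seconds * 60 + float(part)  # fold h:m:s into seconds
    return seconds


def throughput_mib_s(total_mib, dd_line, sync_line):
    """Effective MiB/s when the dd run and the following sync are both counted."""
    return total_mib / (elapsed_seconds(dd_line) + elapsed_seconds(sync_line))


# First reiser4 run and first reiserfs run from the results above.
reiser4 = throughput_mib_s(100,
                           "0.06user 13.95system 1:42.09elapsed 13%CPU",
                           "0.00user 0.00system 1:22.90elapsed 0%CPU")
reiserfs = throughput_mib_s(100,
                            "0.04user 1.75system 0:02.05elapsed 87%CPU",
                            "0.00user 0.10system 0:12.93elapsed 0%CPU")
print(f"reiser4: {reiser4:.2f} MiB/s, reiserfs: {reiserfs:.2f} MiB/s")
# prints "reiser4: 0.54 MiB/s, reiserfs: 6.68 MiB/s"
```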
~~
laurent
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Hans Reiser @ 2006-03-22 7:41 UTC
To: Laurent Riffard; +Cc: reiserfs-list, vs, Alexander Zarochentcev
Laurent Riffard wrote:
>Hello,
>
>Writing big files is very slow on reiser4 now.
>
>"dd if=/dev/zero of=toto bs=1k count=102400; sync"
>
try bs=4M, and tell me what happens. also try an empty fs, and an fs
that is equally full to reiserfs. Note that reiserfs in your test is
68% full vs. 90% full for V4. It may be that we need to port some of
the block allocation optimizations from V3 to V4 (Jeff's work) to help
with 90% full filesystems. Thanks for doing this. Real users always
teach me a lot when they test things differently from how I did.
Hans
> takes more than 2 minutes on
>reiser4 fs, but only 15 seconds on reiserfs fs.
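Hans's 68% vs. 90% comparison comes straight from df's Use% column. GNU df computes it, as far as I know, as used / (used + available) rounded up, so space reserved for root is left out of the denominator. The sketch below assumes that rule and reapplies it to the rounded figures printed in this thread, so it is only an approximation of what df computes from exact block counts.

```python
def df_use_percent(used, avail):
    """Use% as GNU df is assumed to report it: ceil(100 * used / (used + avail)).

    used and avail only need to share a unit; integers avoid float rounding.
    """
    total = used + avail
    return -(-used * 100 // total)  # integer ceiling division


# Figures from the df output earlier in the thread.
print(df_use_percent(825, 101))  # reiser4 /dev/hda8, in MiB -> 90
print(df_use_percent(48, 23))    # reiserfs /home, tenths of GiB (4.8G, 2.3G) -> 68
```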
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Laurent Riffard @ 2006-03-22 18:51 UTC
To: Hans Reiser; +Cc: reiserfs-list, vs, Alexander Zarochentcev
On 22.03.2006 08:41, Hans Reiser wrote:
> try bs=4M, and tell me what happens. also try an empty fs, and an fs
> that is equally full to reiserfs. Note that reiserfs in your test is
> 68% full vs. 90% full for V4. It may be that we need to port some of
> the block allocation optimizations from V3 to V4 (Jeff's work) to help
> with 90% full filesystems. Thanks for doing this. Real users always
> teach me a lot when they test things differently from how I did.
>
> Hans
Hello Hans,
Yesterday, I realized that my tests were not fair. So I did some
further tests, trying to reproduce the same situation on 3 different FS
(reiserfs/ext2/reiser4), and I sent the results to the list, but this
mail never reached the list. I have resent it.
As per your request, I replayed my dd test on my 90% full
reiser4 FS, using a 4M block size. Here are the results:
---------------------
> Desktop$ cd ~/kernel
>
> kernel$ rm toto
> rm: remove regular file `toto'? y
>
> kernel$ df .
> Filesystem Size Used Avail Use% Mounted on
> /dev/hda8 925M 748M 177M 81% /home/laurent/kernel
>
> kernel$ grep /dev/hda8 /rpoc/mounts
> grep: /rpoc/mounts: No such file or directory
>
> kernel$ grep /dev/hda8 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
>
> kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
> 25+0 records in
> 25+0 records out
> 0.00user 2.89system 0:17.18elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+252minor)pagefaults 0swaps
> 0.00user 0.00system 2:19.91elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+191minor)pagefaults 0swaps
>
> kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
> 25+0 records in
> 25+0 records out
> 0.00user 2.96system 1:16.42elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+252minor)pagefaults 0swaps
> 0.00user 0.00system 0:08.70elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+190minor)pagefaults 0swaps
---------------------
I ran an "iostat 10" simultaneously with dd+sync. I
attached the output. Hope this helps.
~~
laurent
[-- Attachment #2: typescript --]
Script started on Wed 22 Mar 2006 19:12:56 CET
Desktop$ cd ~/kernel
kernel$
kernel$ sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero of=toto bs=4M count=25 && echo SYNC && time sync && echo END &
[1] 4657
kernel$ iostat -t 10 /dev/hda8
Linux 2.6.16-rc6-mm2 (antares.localdomain) 22.03.2006

Time: 19:13:32
avg-cpu: %user %nice %system %iowait %idle
5,01 0,02 11,07 4,45 79,46
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 5,34 0,27 217,58 1297 1026592

Time: 19:13:42
avg-cpu: %user %nice %system %iowait %idle
0,10 0,00 0,20 0,20 99,50
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 0,00 0,00 0,00 0 0

SYNC
DD
Time: 19:13:52
avg-cpu: %user %nice %system %iowait %idle
1,50 0,00 79,32 8,29 10,89
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 20,38 3,20 1202,00 32 12032

Time: 19:14:02
avg-cpu: %user %nice %system %iowait %idle
2,30 0,00 81,08 16,62 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 33,53 0,00 1398,20 0 13968

Time: 19:14:12
avg-cpu: %user %nice %system %iowait %idle
1,90 0,00 88,51 9,59 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 25,27 0,00 893,51 0 8944

Time: 19:14:22
avg-cpu: %user %nice %system %iowait %idle
3,19 0,00 85,63 11,18 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 27,35 0,00 1288,62 0 12912

Time: 19:14:32
avg-cpu: %user %nice %system %iowait %idle
0,80 0,00 90,01 9,19 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 25,17 0,00 800,00 0 8008

Time: 19:14:42
avg-cpu: %user %nice %system %iowait %idle
0,30 0,00 74,93 24,78 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 54,35 0,00 3138,46 0 31416

Time: 19:14:52
avg-cpu: %user %nice %system %iowait %idle
0,20 0,00 81,62 18,18 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 50,75 0,00 1324,28 0 13256

Time: 19:15:02
avg-cpu: %user %nice %system %iowait %idle
0,60 0,00 71,60 27,80 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 76,30 0,00 2363,20 0 23632

Time: 19:15:12
avg-cpu: %user %nice %system %iowait %idle
1,10 0,00 29,77 68,93 0,20
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 123,78 0,00 3275,12 0 32784

25+0 records in
25+0 records out
0.00user 2.94system 1:29.83elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+252minor)pagefaults 0swaps
SYNC
Time: 19:15:22
avg-cpu: %user %nice %system %iowait %idle
2,90 0,00 76,60 19,10 1,40
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 55,10 0,80 1435,20 8 14352

0.00user 0.00system 0:17.41elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
END
Time: 19:15:32
avg-cpu: %user %nice %system %iowait %idle
0,10 0,00 31,73 42,14 26,03
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 94,19 0,00 3402,60 0 33992

Time: 19:15:42
avg-cpu: %user %nice %system %iowait %idle
0,10 0,00 0,00 0,10 99,80
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 0,00 0,00 0,00 0 0

^C
[1]+ Done sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero of=toto bs=4M count=25 && echo SYNC && time sync && echo END
kernel$ exit
Script done on Wed 22 Mar 2006 19:15:46 CET
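To relate the attached iostat output to the 100 MiB written by dd, the Blk_wrtn column can be summed. iostat counts these blocks as 512-byte sectors, and its first report covers the time since boot, so that report is skipped. For this run the per-interval values add up to about 100.2 MiB, only slightly more than the file itself, spread across eleven 10-second intervals. A rough Python sketch (the function name is hypothetical):

```python
SECTOR_BYTES = 512  # iostat's "Blk" unit is a 512-byte sector


def total_written_mib(iostat_text, device="hda8"):
    """Sum Blk_wrtn (last column) for `device`, skipping the first since-boot report."""
    per_report = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            per_report.append(int(fields[-1]))
    return sum(per_report[1:]) * SECTOR_BYTES / (1024 * 1024)


# Blk_wrtn values of the hda8 lines in the typescript, in order.
blk_wrtn = [1026592, 0, 12032, 13968, 8944, 12912, 8008,
            31416, 13256, 23632, 32784, 14352, 33992, 0]
typescript = "\n".join(f"hda8 0,00 0,00 0,00 0 {v}" for v in blk_wrtn)
print(f"{total_written_mib(typescript):.1f} MiB")  # -> 100.2 MiB
```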
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Hans Reiser @ 2006-03-22 19:04 UTC
To: Laurent Riffard; +Cc: reiserfs-list, vs, Alexander Zarochentcev
Instead of using sync, could you increase the size of the files you
write so that they are 10x ram size?
I have a suspicion we are slow at sync.... I am not sure why, but I
have seen other data where sync was slow for us, and maybe we need to
optimize that code path.
Hans
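Hans's "10x ram size" request translates directly into a dd count. This sketch assumes the 512M of RAM that Laurent reports later in the thread and keeps bs=4M; it also checks that the original bs=1k and the bs=4M command lines write the same 100 MiB. The helper name is hypothetical.

```python
MIB = 1024 * 1024


def dd_count(target_bytes, block_bytes):
    """Smallest dd count= such that count * bs >= target_bytes."""
    return -(-target_bytes // block_bytes)


# The original test and the bs=4M test write the same amount of data.
assert 1024 * 102400 == 4 * MIB * 25 == 100 * MIB

ram_bytes = 512 * MIB    # assumption: the 512M of RAM reported later in the thread
target = 10 * ram_bytes  # "10x ram size"
print(f"dd if=/dev/zero of=toto bs=4M count={dd_count(target, 4 * MIB)}")  # count=1280
```

A 5 GiB file would not fit on the 925M reiser4 partition used so far, so this variant of the test needs a larger volume.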
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Jindrich Makovicka @ 2006-03-23 18:44 UTC
To: Hans Reiser; +Cc: Laurent Riffard, reiserfs-list, vs, Alexander Zarochentcev
Hans Reiser wrote:
> Instead of using sync, could you increase the size of the files you
> write so that they are 10x ram size?
>
> I have a suspicion we are slow at sync.... I am not sure why, but I
> have seen other data where sync was slow for us, and maybe we need to
> optimize that code path.
My impression is rather that the bottleneck is the amount of seeking the
sync causes - would it be possible to reorder the write operations
somehow, still preserving atomicity?
Also, a comparison of Reiser4 performance on NCQ vs. non-NCQ drive could
be interesting (I don't have NCQ, maybe that's the problem).
Regards,
--
Jindrich Makovicka
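Jindrich's point about seeking can be illustrated with a toy model: total head travel when dirty blocks are written in arrival order versus sorted into a single ascending sweep (C-SCAN style). The block numbers are made up, and the model ignores the atomicity constraints he raises as well as whatever ordering reiser4 and the I/O scheduler actually apply.

```python
def seek_distance(blocks, start=0):
    """Total head travel, in blocks, to visit `blocks` in the given order."""
    total, pos = 0, start
    for b in blocks:
        total += abs(b - pos)
        pos = b
    return total


def sweep_order(blocks, start=0):
    """One-way sweep (C-SCAN style): ascending from `start`, then wrap to the lowest."""
    ahead = sorted(b for b in blocks if b >= start)
    behind = sorted(b for b in blocks if b < start)
    return ahead + behind


# Hypothetical dirty block numbers, in the order they were queued.
dirty = [900, 10, 850, 20, 870, 30]
print(seek_distance(dirty))               # arrival order -> 5150
print(seek_distance(sweep_order(dirty)))  # sorted sweep -> 900
```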
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Nate Diller @ 2006-03-23 21:32 UTC
To: Jindrich Makovicka
Cc: Hans Reiser, Laurent Riffard, reiserfs-list, vs, Alexander Zarochentcev
On 3/23/06, Jindrich Makovicka <makovick@kmlinux.fjfi.cvut.cz> wrote:
> My impression is rather that the bottleneck is the amount of seeking the
> sync causes - would it be possible to reorder the write operations
> somehow, still preserving atomicity?
yeah, the kernel is not good at ordering flush during sync, it would work
much better if Reiser4 could just be told to do a full sync, and then have
only one thread that climbs through the fake inode and squallocs
everything.
> Also, a comparison of Reiser4 performance on NCQ vs. non-NCQ drive could
> be interesting (I don't have NCQ, maybe that's the problem).
the scheduler could make a difference too, most likely in the area of
'congestion' threshold and handling.
NATE
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Laurent Riffard @ 2006-03-28 20:19 UTC
To: Hans Reiser; +Cc: reiserfs-list, vs, Alexander Zarochentcev
On 22.03.2006 20:04, Hans Reiser wrote:
> Instead of using sync, could you increase the size of the files you
> write so that they are 10x ram size?
>
> I have a suspicion we are slow at sync.... I am not sure why, but I
> have seen other data where sync was slow for us, and maybe we need to
> optimize that code path.
>
> Hans
Hello Hans,
sorry for the long delay in replying.
I'm not sure this is a problem with _sync_. I had concerns with sync on
reiser4, but I thought it was related to the FS policy, which tries to do
a lot of work in memory, so that when syncing time comes, there is a huge
amount of data to write back to disk. Well, I'm not a filesystems expert;
this is a wild guess...
Anyway, I didn't try to "write a file of size 10x ram size". My test case
is a 925M FS with 100M free, and I have 512M of RAM.
And I guess there is a problem with the Reiser4 internal data. It's an old
FS; I have done thousands of kernel builds on it. I allocated a new logical
volume (about the same size, same HD), made it a reiser4 FS and copied all
my data onto it.
> [root@antares ~]# grep reiser4 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e22,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
> /dev/vglinux1/test /mnt/disk reiser4 rw,atom_max_size=0x7e22,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
> [root@antares ~]# grep -e hda8 -e dm-5 /proc/partitions
> 3 8 995998 hda8
> 254 5 1003520 dm-5
> [root@antares ~]# cp -pRL /home/laurent/kernel/. /mnt/disk
[cut errors with symbolic links]
> [root@antares ~]# df /home/laurent/kernel /mnt/disk
> Filesystem Size Used Avail Use% Mounted on
> /dev/hda8 925M 822M 103M 89% /home/laurent/kernel
> /dev/mapper/vglinux1-test
> 932M 800M 132M 86% /mnt/disk
These filesystems are quite similar. Now guess what? I filled both of them
with dd.
Original FS
===========
# sync
# time dd if=/dev/zero of=toto bs=1M count=150
103+0 records in
102+0 records out
Command exited with non-zero status 1
0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
# time sync
0inputs+0outputs (0major+279minor)pagefaults 0swaps
0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
Copy FS
=======
# sync
# time dd if=/dev/zero of=toto bs=1M count=150
dd: writing `toto': No space left on device
132+0 records in
131+0 records out
Command exited with non-zero status 1
0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+279minor)pagefaults 0swaps
# time sync
0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
disk$
See? 3'30" versus 16".
I packed the metadata of my original FS into a file; you can grab it
from http://laurent.riffard.free.fr/kernel.reiser4.bz2 (6.7M).
Note I was unable to unpack it:
> # bunzip2 -c /tmp/kernel.reiser4.bz2 | debugfs.reiser4 -U /dev/vglinux1/test
> debugfs.reiser4 1.0.5
> Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by reiser4progs/COPYING.
>
> Info : The metadata were packed with the reiser4progs 1.0.5.
> Error: Can't unpack filesystem.
~~
laurent
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Hans Reiser @ 2006-03-28 20:34 UTC
To: Laurent Riffard; +Cc: reiserfs-list, vs, Alexander Zarochentcev, E. Gryaznova
Laurent Riffard wrote:
>See ? 3'30" versus 16".
>
>I packed the metadata of my original FS to a file, you can grab it
>from http://laurent.riffard.free.fr/kernel.reiser4.bz2 (6.7M).
Wow. We need to do the repacker. We might also need to examine whether
there are optimizations in V3 block allocation we should apply to V4, but
mostly we need the repacker. Ok, well, right after we go into the kernel
it will be done.
Thanks much Laurent, you did a great job of analyzing this for us.
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
From: Hans Reiser @ 2006-03-28 20:56 UTC
To: Hans Reiser
Cc: Laurent Riffard, reiserfs-list, vs, Alexander Zarochentcev, E. Gryaznova
I think what this means is that after we have a repacker, we should gain
performance advantages over our competition as a result. It is far easier
for us to code an online repacker than it is for them.
Hans
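One way to put a number on what a repacker would fix is to count how many contiguous extents a file's blocks form: a freshly copied file, like the ones on Laurent's new volume, should need few, while files on an aged filesystem may be scattered across many. This sketch assumes the physical block numbers, in file order, are already available from some other tool; the block lists are invented, and this is not how the reiser4 repacker itself works.

```python
def count_extents(blocks):
    """Number of contiguous runs in a file's physical block list (in file order)."""
    extents = 0
    prev = None
    for b in blocks:
        if prev is None or b != prev + 1:
            extents += 1  # a gap (or the first block) starts a new extent
        prev = b
    return extents


fresh = list(range(100, 108))                  # hypothetical freshly copied file
aged = [100, 101, 530, 531, 532, 77, 78, 900]  # hypothetical aged file
print(count_extents(fresh), count_extents(aged))  # -> 1 4
```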
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-28 20:19 ` Laurent Riffard
2006-03-28 20:34 ` Hans Reiser
@ 2006-03-28 22:49 ` Philippe Gramoullé
2006-03-29 6:16 ` Laurent Riffard
1 sibling, 1 reply; 14+ messages in thread
From: Philippe Gramoullé @ 2006-03-28 22:49 UTC (permalink / raw)
To: reiserfs-list

Hello Laurent,

On Tue, 28 Mar 2006 22:19:01 +0200
Laurent Riffard <laurent.riffard@free.fr> wrote:

| These FS are quite similars. Now guess what ? I filled these FS with
| dd.
|
| Original FS
| ===========
| # sync
| # time dd if=/dev/zero of=toto bs=1M count=150
| 103+0 enregistrements lus.
| 102+0 enregistrements écrits.
| Command exited with non-zero status 1

Well, at least on my system, such a command exits with a 0 status.
Also, none of your other posts in this thread shows this error; only this
one and the one below do.

| 0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata
| 0maxresident)k
| # time sync
| 0inputs+0outputs (0major+279minor)pagefaults 0swaps
| 0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (0major+191minor)pagefaults 0swaps
|
| Copy FS
| =======
| # sync
| # time dd if=/dev/zero of=toto bs=1M count=150
| dd: écriture de `toto': Aucun espace disponible sur le périphérique
| 132+0 enregistrements lus.
| 131+0 enregistrements écrits.
| Command exited with non-zero status 1

Here, I can understand the "exited with non-zero status 1", as
"Aucun espace disponible sur le périphérique" is French for
"No space left on device".

| 0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (1major+279minor)pagefaults 0swaps
| # time sync
| 0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (0major+190minor)pagefaults 0swaps
| disk$
|
| See ? 3'30" versus 16".

Is the 16" due to the above command exiting earlier than it should have?

Thanks,

Philippe
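Philippe's question can be checked by normalizing each run by the amount of data actually written instead of comparing raw elapsed times. A minimal sketch using only the figures quoted above, assuming each `bs=1M` record is exactly 1 MiB (the `+0` in dd's counts means there were no partial records); the dictionary and function names are illustrative:

```python
# Figures quoted in the thread for `dd if=/dev/zero of=toto bs=1M count=150`.
# Each full dd record is 1 MiB because of bs=1M.
RUNS = {
    "original FS": {"records_written": 102, "elapsed_s": 3 * 60 + 32.18},  # 3:32.18
    "copy FS": {"records_written": 131, "elapsed_s": 15.95},               # 0:15.95
}


def throughput_mib_s(records_written: int, elapsed_s: float, record_mib: float = 1.0) -> float:
    """Average write rate in MiB/s over the dd run itself (the separate sync is excluded)."""
    return records_written * record_mib / elapsed_s


rates = {name: throughput_mib_s(r["records_written"], r["elapsed_s"]) for name, r in RUNS.items()}
for name, rate in rates.items():
    print(f"{name:12s} {rate:6.2f} MiB/s")
print(f"copy FS is about {rates['copy FS'] / rates['original FS']:.0f}x faster")
```

The copy FS wrote 29 more records in well under a tenth of the time, so its shorter elapsed time is not explained by it writing less data.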
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-28 22:49 ` Philippe Gramoullé
@ 2006-03-29 6:16 ` Laurent Riffard
2006-03-29 14:30 ` Philippe Gramoullé
0 siblings, 1 reply; 14+ messages in thread
From: Laurent Riffard @ 2006-03-29 6:16 UTC (permalink / raw)
To: Philippe Gramoullé, reiserfs-list

On 29.03.2006 00:49, Philippe Gramoullé wrote:
> Hello Laurent,
>
> On Tue, 28 Mar 2006 22:19:01 +0200
> Laurent Riffard <laurent.riffard@free.fr> wrote:
>
> | These FS are quite similars. Now guess what ? I filled these FS with
> | dd.
> |
> | Original FS
> | ===========
> | # sync
> | # time dd if=/dev/zero of=toto bs=1M count=150
> | 103+0 enregistrements lus.
> | 102+0 enregistrements écrits.
> | Command exited with non-zero status 1
>
> Well, at least on my system , such a command exits with a 0 status

Oops! I trimmed a line when I cut and pasted. dd exits with the message
"Aucun espace disponible sur le périphérique", which means "No space left
on device".

> Also, not a single of your posts in this thread has this error except this one
> and the one below

Yes, I changed my test somewhat. In the previous test, I dd'd 100M to the
FS. As the original FS and its copy have different amounts of free space,
writing 100M on each FS leaves 3M free versus 30M free. I ran this test
and it takes about 2'20" versus 15". But I feared someone would object:
"It's because you have less free space on the first FS". So I found it
more conclusive to write 150M and thus fill up both FS.

> | 0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | # time sync
> | 0inputs+0outputs (0major+279minor)pagefaults 0swaps
> | 0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (0major+191minor)pagefaults 0swaps
> |
> | Copy FS
> | =======
> | # sync
> | # time dd if=/dev/zero of=toto bs=1M count=150
> | dd: écriture de `toto': Aucun espace disponible sur le périphérique
> | 132+0 enregistrements lus.
> | 131+0 enregistrements écrits.
> | Command exited with non-zero status 1
>
> Here, i can understand the "exited with non-zero status 1" as
> "Aucun espace disponible sur le périphérique" is french for
> "No space left on device"

Yes, see above.

> | 0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (1major+279minor)pagefaults 0swaps
> | # time sync
> | 0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (0major+190minor)pagefaults 0swaps
> | disk$
> |
> | See ? 3'30" versus 16".
>
> Are the 16" due to the fact that the above command exited earlier than it should have ?

No (see above): both FS were filled up to 0M free space.

> Thanks,
>
> Philippe
>

Thanks for your comments. I hope this makes it clear.

To be fair, there are some differences between the 2 FS:
- the copy is larger than the original one: 995998 vs 1003520 (presumably
  1K blocks rather than bytes), which is 0.75% larger.
- the original FS resides on a regular partition (/dev/hda8, inside the
  extended partition), while the copy is on a logical volume
  (/dev/vglinux1/test). This LV is hosted on /dev/hda4.

I hope these differences do not have a significant impact on the results.
I'll try dd of=/dev/hda8 if=/dev/vglinux1/test, and see whether it makes a
difference when I dd a 100M file onto the FS.

~~ laurent
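The 0.75% figure can be reproduced directly from the two sizes. A small sketch; the MiB conversion assumes the sizes are counted in 1K blocks (an assumption, since the unit is not stated reliably in the post), which puts them in the same range as the 925M that df reports for /dev/hda8 in the first post:

```python
ORIGINAL_SIZE = 995998   # /dev/hda8, as given in the post
COPY_SIZE = 1003520      # /dev/vglinux1/test, as given in the post


def relative_increase_pct(old: int, new: int) -> float:
    """How much larger `new` is than `old`, in percent."""
    return (new - old) / old * 100


print(f"copy is {relative_increase_pct(ORIGINAL_SIZE, COPY_SIZE):.3f}% larger")
# Assumption: sizes are in 1 KiB blocks, so dividing by 1024 gives MiB.
print(f"{ORIGINAL_SIZE / 1024:.1f} MiB vs {COPY_SIZE / 1024:.1f} MiB")
```

Under that assumption the copy is exactly 980 MiB against roughly 972.7 MiB for the original.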
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-29 6:16 ` Laurent Riffard
@ 2006-03-29 14:30 ` Philippe Gramoullé
0 siblings, 0 replies; 14+ messages in thread
From: Philippe Gramoullé @ 2006-03-29 14:30 UTC (permalink / raw)
To: Laurent Riffard; +Cc: reiserfs-list

Hello Laurent,

On Wed, 29 Mar 2006 08:16:55 +0200
Laurent Riffard <laurent.riffard@free.fr> wrote:

| So I found more conclusive to write 150M and thus to fill up the 2 FS.

Thanks for the explanations.

Truly yours,

Philippe
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-22 7:41 ` Hans Reiser
2006-03-22 18:51 ` Laurent Riffard
@ 2006-04-01 23:15 ` Pierre Etchemaïté
1 sibling, 0 replies; 14+ messages in thread
From: Pierre Etchemaïté @ 2006-04-01 23:15 UTC (permalink / raw)
To: reiserfs-list

On Tue, 21 Mar 2006 23:41:22 -0800, Hans Reiser <reiser@namesys.com> wrote:

> It may be that we need to port some of
> the block allocation optimizations from V3 to V4 (Jeff's work) to help
> with 90% full filesystems.

Speaking of which, I've read about a localized performance problem of
reiserfs 3 on backuppc's mailing list (for that task, reiserfs 3 otherwise
performs similarly to xfs). I wonder whether it was ever reported to you,
as suggested on that mailing list...
http://sourceforge.net/mailarchive/message.php?msg_id=8646808

My understanding is that backuppc hits the worst case of reiserfs3 hard
links. Backuppc creates a huge pool of all versions of all files from all
backups, compressed, organized using MD5 hashing (handling collisions, of
course), and hardlinked from their different backup views. [Some metadata
is stored separately, so that several files with the same content but
different metadata can still be shared on disk. But I digress.]

At night, a sweeping process runs to remove backups that are too old
(according to user policy), and possibly to check whether more background
sharing/compression can be done.

If I remember correctly, v3 puts directory entries and their corresponding
inodes next to each other on disk. When hard links are created, new
directory entries are created, pointing to the same inode. If the first
directory entry is removed, the inode may no longer be stored near any of
the entries pointing to it. Since backuppc routinely removes directory
entries in FIFO order, this is almost guaranteed to happen every time,
hence a very poor distribution of inodes on disk after a while...

I don't know exactly what xfs does (blocks of preallocated inodes?), but
it does better in this case.

Hope it helps,
Pierre.
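Pierre's FIFO argument can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not reiserfs or backuppc code: it assumes (as Pierre recalls, without confirmation) that an inode is placed next to the directory entry that first created it, that every pooled file stays hardlinked from every retained backup, and that the nightly sweep always expires the oldest backup first. It counts how many live inodes end up stored away from every directory that still points to them.

```python
from collections import deque


def misplaced_fraction(num_backups: int, retention: int, new_files_per_backup: int) -> float:
    """Toy model of FIFO backup expiry over a hardlinked pool.

    Hypothetical assumptions, for illustration only:
    - each backup introduces `new_files_per_backup` new pooled files, and the
      inode of each is placed next to the backup directory that created it;
    - every pooled file stays hardlinked from every retained backup;
    - only the newest `retention` backups are kept, oldest removed first.
    Returns the fraction of live inodes whose "home" backup directory is gone.
    """
    if retention < 1:
        raise ValueError("retention must be at least 1")
    home = []           # home[i] = index of the backup whose entry first created inode i
    retained = deque()  # retained backup indices, oldest on the left
    for backup in range(num_backups):
        home.extend([backup] * new_files_per_backup)
        retained.append(backup)
        if len(retained) > retention:
            retained.popleft()  # nightly sweep: expire the oldest backup (FIFO)
    if not home:
        return 0.0
    oldest_kept = retained[0]
    misplaced = sum(1 for h in home if h < oldest_kept)
    return misplaced / len(home)


for n in (3, 10, 30):
    print(n, "backups:", misplaced_fraction(n, retention=3, new_files_per_backup=100))
```

In this model, once at least `retention` backups exist, the misplaced share is exactly 1 - retention/num_backups, so it keeps growing as backups accumulate rather than appearing only occasionally, which is the systematic effect Pierre describes.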
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-21 21:16 2.6.16-rc6-mm2: slow writes on reiser4 Laurent Riffard
2006-03-22 7:41 ` Hans Reiser
@ 2006-03-22 17:48 ` Laurent Riffard
1 sibling, 0 replies; 14+ messages in thread
From: Laurent Riffard @ 2006-03-22 17:48 UTC (permalink / raw)
To: reiserfs-list; +Cc: Hans Reiser

[-- Attachment #1: Type: text/plain, Size: 1632 bytes --]

[this is a second post; the first one never seemed to reach the list]

On 21.03.2006 22:16, Laurent Riffard wrote:
> Hello,
>
> Writing big files is very slow on reiser4 now.
>
> "dd if=/dev/zero of=toto bs=1k count=102400; sync" takes more than 2 minutes on
> reiser4 fs, but only 15 seconds on reiserfs fs.

Oops! My tests were not fair: my reiser4 FS was almost full while my
reiserfs FS had plenty of free space.

> kernel$ df .
> Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
> /dev/hda8 925M 825M 101M 90% /home/laurent/kernel
> kernel$ grep hda8 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
[snip]
> ~$ df .
> Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
> /dev/mapper/vglinux1-lvhome
> 7,0G 4,8G 2,3G 68% /home
> ~$ grep lvhome /proc/mounts
> /dev/vglinux1/lvhome /home reiserfs rw 0 0

So I did some tests with a 2GB logical volume. I formatted it
(reiserfs/ext2/reiser4), untarred a copy of a kernel tree onto this FS,
and wrote a 100 MB file 3 times.

FS          Elapsed time for dd + sync
reiserfs:   14.22s
ext2:       11.12s
reiser4:    19.71s

I won't discuss here why reiser4 is slow. Maybe my tests are not very good.
The interesting point of this thread is that reiser4 does not seem to like
situations with little space available. I should rerun these tests with a
90% full FS (but it's time to go to bed now...).

The full logs of my tests are attached below.
~~ laurent

[-- Attachment #2: typescript --]
[-- Type: text/plain, Size: 9561 bytes --]

Le script a débuté sur mar 21 mar 2006 22:40:11 CET
[root@antares ~]# lvdisplay /dev/vglinux1/test
  --- Logical volume ---
  LV Name                /dev/vglinux1/test
  VG Name                vglinux1
  LV UUID                1IdmIn-9Ne8-IZDS-PUYF-IyLP-Xz54-c50H2E
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                2,00 GB
  Current LE             512
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           254:5

[root@antares ~]# mkfs.reiserfs /dev/vglinux1/test
mkfs.reiserfs 3.6.19 (2003 www.namesys.com)

A pair of credits:
Yury Umanets (aka Umka) developed libreiser4, userspace plugins, and all
userspace tools (reiser4progs) except of fsck.

Hans Reiser was the project initiator, source of all funding for the first 5.5
years. He is the architect and official maintainer.

Guessing about desired format.. Kernel 2.6.16-rc6-mm2 is running.
Format 3.6 with standard journal
Count of blocks on the device: 524288
Number of blocks consumed by mkreiserfs formatting process: 8227
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 9f9b271b-1ed6-4ffb-9cde-243d3859b221
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
ALL DATA WILL BE LOST ON '/dev/vglinux1/test'!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok

Tell your friends to use a kernel based on 2.4.18 or later, and especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.

ReiserFS is successfully created on /dev/vglinux1/test.
[root@antares ~]# mount /dev/vglinux1/test /mnt/disk
[root@antares ~]# cd /mnt/disk
[root@antares disk]# tar -xjf ~laurent/.ketchup/linux-2.6.15.tar.bz2
[root@antares disk]# df .
Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-test
                      2,0G  260M  1,8G  13% /mnt/disk
[root@antares disk]# ls
linux-2.6.15
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.60system 0:01.73elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.06system 0:15.53elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.02user 1.60system 0:01.65elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.04system 0:09.72elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.63system 0:01.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.06system 0:15.58elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+192minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.03user 1.64system 0:01.70elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.05system 0:09.49elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# cd
[root@antares ~]# umount /mnt/disk
[root@antares ~]#
[root@antares ~]# mkfs.ext2 /dev/vglinux1/test
mke2fs 1.38 (30-Jun-2005)
Étiquette de système de fichiers=
Type de système d'exploitation: Linux
Taille de bloc=4096 (log=2)
Taille de fragment=4096 (log=2)
262144 inodes, 524288 blocs
26214 blocs (5.00%) réservé pour le super usager
Premier bloc de données=0
16 bloc de groupes
32768 blocs par groupe, 32768 fragments par groupe
16384 inodes par groupe
Archive du superbloc stockée sur les blocs:
        32768, 98304, 163840, 229376, 294912

Écriture des tables d'inodes: complété
Écriture des superblocs et de l'information de comptabilité du système de fichiers: complété

Le système de fichiers sera automatiquement vérifié tous les 35 montages ou
après 180 jours, selon la première éventualité. Utiliser tune2fs -c ou -i
pour écraser la valeur.
[root@antares ~]# mount /dev/vglinux1/test /mnt/disk
[root@antares ~]# cd /mnt/disk
[root@antares disk]# tar -xjf ~laurent/.ketchup/linux-2.6.15.tar.bz2
[root@antares disk]# df .
Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-test
                      2,0G  253M  1,7G  14% /mnt/disk
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.05user 0.68system 0:00.78elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.03system 0:10.43elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 0.67system 0:00.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.02system 0:10.47elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.01user 0.69system 0:00.71elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+252minor)pagefaults 0swaps
0.00user 0.02system 0:10.26elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# grep /mnt/disk /proc/mounts
/dev/vglinux1/test /mnt/disk ext2 rw 0 0
[root@antares disk]# cd -
[root@antares disk]# umount /mnt/disk
[root@antares ~]# mkfs.reiser4 /dev/vglinux1/test
mkfs.reiser4 1.0.5
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by
reiser4progs/COPYING.

Block size 4096 will be used.
Linux 2.6.16-rc6-mm2 is detected.
Uuid 500241b7-0035-4254-91f4-cd6fb6c556a0 will be used.
Reiser4 is going to be created on /dev/vglinux1/test.
(Yes/No): Yes
Creating reiser4 on /dev/vglinux1/test ... done
[root@antares ~]# mount /dev/vglinux1/test /mnt/disk
[root@antares ~]# cd /mnt/disk
[root@antares disk]# tar -xjf ~laurent/.ketchup/linux-2.6.15.tar.bz2
[root@antares disk]# grep /mnt/disk /proc/mounts
/dev/vglinux1/test /mnt/disk reiser4 rw,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
[root@antares disk]# df .
Sys. de fich.         Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-test
                      2,0G  220M  1,7G  12% /mnt/disk
[root@antares disk]# time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.10user 13.06system 0:18.88elapsed 69%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.05system 0:03.42elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.08user 12.88system 0:13.19elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+249minor)pagefaults 0swaps
0.00user 0.00system 0:05.19elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.09user 12.88system 0:13.17elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.00system 0:05.29elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]#
[root@antares disk]#
[root@antares disk]# exit

Script complété sur mar 21 mar 2006 22:58:31 CET
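The summary table in the post can be cross-checked against the attached typescript. A short sketch that averages (dd elapsed + sync elapsed) per run, with the values copied by hand from the log above; the names are illustrative. Note that the reiserfs section of the log contains four runs, and averaging all four gives about 14.27s, slightly above the 14.22s in the post, while the ext2 and reiser4 averages match the table.

```python
from statistics import mean

# (dd elapsed, sync elapsed) in seconds, copied from the typescript.
TYPESCRIPT_RUNS = {
    "reiserfs": [(1.73, 15.53), (1.65, 9.72), (1.69, 15.58), (1.70, 9.49)],
    "ext2": [(0.78, 10.43), (0.72, 10.47), (0.71, 10.26)],
    "reiser4": [(18.88, 3.42), (13.19, 5.19), (13.17, 5.29)],
}


def mean_dd_plus_sync(runs):
    """Mean of (dd elapsed + sync elapsed) over all runs of one filesystem."""
    return mean(dd + sync for dd, sync in runs)


for fs, runs in TYPESCRIPT_RUNS.items():
    print(f"{fs:9s} {len(runs)} runs, mean dd+sync = {mean_dd_plus_sync(runs):.2f}s")
```

The same log also shows where the time goes: on reiserfs and ext2, dd itself returns in under 2 seconds and most of the time is spent in sync, whereas on reiser4 most of the time is spent inside dd (about 13 seconds of system CPU per run).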
Thread overview: 14+ messages
2006-03-21 21:16 2.6.16-rc6-mm2: slow writes on reiser4 Laurent Riffard
2006-03-22 7:41 ` Hans Reiser
2006-03-22 18:51 ` Laurent Riffard
2006-03-22 19:04 ` Hans Reiser
2006-03-23 18:44 ` Jindrich Makovicka
2006-03-23 21:32 ` Nate Diller
2006-03-28 20:19 ` Laurent Riffard
2006-03-28 20:34 ` Hans Reiser
2006-03-28 20:56 ` Hans Reiser
2006-03-28 22:49 ` Philippe Gramoullé
2006-03-29 6:16 ` Laurent Riffard
2006-03-29 14:30 ` Philippe Gramoullé
2006-04-01 23:15 ` Pierre Etchemaïté
2006-03-22 17:48 ` Laurent Riffard