* 2.6.16-rc6-mm2: slow writes on reiser4.
@ 2006-03-21 21:16 Laurent Riffard
2006-03-22 7:41 ` Hans Reiser
2006-03-22 17:48 ` Laurent Riffard
0 siblings, 2 replies; 14+ messages in thread
From: Laurent Riffard @ 2006-03-21 21:16 UTC
To: reiserfs-list
Hello,
Writing big files is very slow on reiser4 now.
"dd if=/dev/zero of=toto bs=1k count=102400; sync" takes more than 2 minutes on
reiser4 fs, but only 15 seconds on reiserfs fs.
Actually, writing on reiser4 is not uniformly slow; it seems to block for
long periods from time to time. I monitored the number of dirty pages in
/proc/meminfo and hit sysrq-T while the system was stalling:
dd D 000017DE 0 21930 21929 (NOTLB)
d7169c74 e0c98b05 00000246 000017de 00000000 f396aa00 003d1249 d0b68140
d0b68030 f396aa00 003d1249 6d519e00 00000002 c0396434 d8bf8e30 d8bf8e38
00000246 d7169ca0 c0270f08 d0b68030 00000001 d0b68030 c0113b25 d8bf8e38
Call Trace:
[<c0270f08>] __down+0x81/0xdc
[<c026f3ba>] __down_failed+0xa/0x10
[<e0c91a62>] .text.lock.lock+0x15/0x1b [reiser4]
[<e0c90faf>] longterm_lock_znode+0x5b4/0x7b0 [reiser4]
[<e0cba16a>] cbk_level_lookup+0x8a/0x954 [reiser4]
[<e0cbb186>] traverse_tree+0x752/0xa0d [reiser4]
[<e0cbbbc2>] coord_by_handle+0x781/0x789 [reiser4]
[<e0cbbdb5>] object_lookup+0x1eb/0x230 [reiser4]
[<e0cdb201>] find_file_item+0x18d/0x1b7 [reiser4]
[<e0cdd873>] write_flow+0x208/0x6e1 [reiser4]
[<e0cde208>] write_unix_file+0x3d9/0x5b0 [reiser4]
[<c0147d36>] vfs_write+0x8a/0x133
[<c0148569>] sys_write+0x3b/0x60
[<c01029bb>] sysenter_past_esp+0x54/0x75
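The dirty-page monitoring mentioned above can be approximated with a small polling script. This is a minimal sketch, not the method actually used for this report: the names `parse_meminfo` and `watch_dirty` are hypothetical, and it assumes the `Dirty` and `Writeback` fields that /proc/meminfo exposes on 2.6 kernels.

```python
import time

def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values (kB where a unit is given)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def watch_dirty(interval=1.0, samples=10, path="/proc/meminfo"):
    """Print the Dirty and Writeback counters every `interval` seconds."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(time.strftime("%H:%M:%S"),
              "Dirty:", info.get("Dirty"), "kB",
              "Writeback:", info.get("Writeback"), "kB")
        time.sleep(interval)

# Usage (hypothetical): call watch_dirty() in one terminal while dd runs in another.
```

A stall of the kind described would show up as a Dirty value that stops changing for several samples while dd is still running.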
Below are the detailed tests I ran. Feel free to ask for more information.
Reiser4 FS
==========
Desktop$ cd ~/kernel
kernel$ df .
Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
/dev/hda8 925M 825M 101M 90% /home/laurent/kernel
kernel$ grep hda8 /proc/mounts
/dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.06user 13.95system 1:42.09elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.00system 1:22.90elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.08user 14.01system 1:45.57elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+249minor)pagefaults 0swaps
0.00user 0.00system 0:09.78elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.06user 14.13system 2:18.27elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.00system 0:08.48elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
kernel$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.06user 14.27system 1:56.34elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.00system 0:10.46elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
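The reiser4 mount options shown in /proc/mounts above are printed in hexadecimal. A minimal sketch for reading them as decimal numbers follows; `decode_mount_options` is a hypothetical helper, and since the units of each option are not stated in this thread, none are assumed.

```python
def decode_mount_options(options):
    """Return {name: int} for every key=value mount option; flags such as 'rw' are skipped."""
    decoded = {}
    for option in options.split(","):
        key, sep, value = option.partition("=")
        if sep:
            decoded[key] = int(value, 0)  # base 0 accepts the 0x prefix
    return decoded

# Option string copied from the /proc/mounts line for /dev/hda8.
OPTIONS = ("rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,"
           "atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10")

decoded = decode_mount_options(OPTIONS)
for name, value in decoded.items():
    print("%-18s %d" % (name, value))
```

In decimal the values are atom_max_size=32268, atom_max_age=150000, atom_min_size=256, atom_max_flushers=1 and cbk_cache_slots=16.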
Reiserfs FS
===========
kernel$ cd
~$ df .
Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-lvhome
7,0G 4,8G 2,3G 68% /home
[/dev/mapper/vglinux1-lvhome resides on /dev/hda4]
~$ grep lvhome /proc/mounts
/dev/vglinux1/lvhome /home reiserfs rw 0 0
~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.75system 0:02.05elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+249minor)pagefaults 0swaps
0.00user 0.10system 0:12.93elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.83system 0:01.98elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.16system 0:14.45elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
~$ sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.79system 0:01.95elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.10system 0:13.47elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
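To compare the runs above, the elapsed field of each `time` line has to be converted to seconds and the dd and sync figures added. The following is a minimal sketch under the assumption that the lines use GNU time's default output format, as they appear to here; `elapsed_seconds` is a hypothetical helper name.

```python
import re

# Matches the elapsed field, e.g. "1:42.09elapsed" or "1:02:03elapsed"
# (the hours part is optional).
ELAPSED_RE = re.compile(r"(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)elapsed")

def elapsed_seconds(line):
    """Return the elapsed wall-clock time of a GNU time output line, in seconds."""
    match = ELAPSED_RE.search(line)
    if match is None:
        raise ValueError("no elapsed field in: %r" % line)
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    seconds = float(match.group(3))
    return hours * 3600 + minutes * 60 + seconds

# (dd line, sync line) pairs copied from the first run on each filesystem.
RUNS = {
    "reiser4": ("0.06user 13.95system 1:42.09elapsed 13%CPU",
                "0.00user 0.00system 1:22.90elapsed 0%CPU"),
    "reiserfs": ("0.04user 1.75system 0:02.05elapsed 87%CPU",
                 "0.00user 0.10system 0:12.93elapsed 0%CPU"),
}

for fs, (dd_line, sync_line) in RUNS.items():
    total = elapsed_seconds(dd_line) + elapsed_seconds(sync_line)
    print("%-9s dd + sync = %.2f s" % (fs, total))
```

For these first runs the totals come to roughly 185 s on reiser4 against roughly 15 s on reiserfs.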
~~
laurent
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-21 21:16 2.6.16-rc6-mm2: slow writes on reiser4 Laurent Riffard
@ 2006-03-22 7:41 ` Hans Reiser
2006-03-22 18:51 ` Laurent Riffard
2006-04-01 23:15 ` Pierre Etchemaïté
2006-03-22 17:48 ` Laurent Riffard
1 sibling, 2 replies; 14+ messages in thread
From: Hans Reiser @ 2006-03-22 7:41 UTC
To: Laurent Riffard; +Cc: reiserfs-list, vs, Alexander Zarochentcev
Laurent Riffard wrote:
>Hello,
>
>Writing big files is very slow on reiser4 now.
>
>"dd if=/dev/zero of=toto bs=1k count=102400; sync"
>
Try bs=4M, and tell me what happens. Also try an empty fs, and an fs
that is as full as the reiserfs one. Note that reiserfs in your test is
68% full vs. 90% full for V4. It may be that we need to port some of
the block allocation optimizations from V3 to V4 (Jeff's work) to help
with 90% full filesystems. Thanks for doing this. Real users always
teach me a lot when they test things differently from how I did.
Hans
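The effect of the suggested bs=4M can be worked out with simple arithmetic: the amount of data stays the same, but dd issues far fewer write calls. A minimal sketch, assuming GNU dd's binary suffixes (k = 1024, M = 1024 * 1024) and one write per block; count=25 is the value that keeps the file at 100 MiB, and the helper names are hypothetical.

```python
# Binary multipliers for the size suffixes used in this thread.
SUFFIXES = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def dd_block_size(bs):
    """Convert a dd block size such as '1k' or '4M' to bytes."""
    if bs[-1] in SUFFIXES:
        return int(bs[:-1]) * SUFFIXES[bs[-1]]
    return int(bs)

def dd_total_bytes(bs, count):
    """Total bytes written by dd for a given bs and count."""
    return dd_block_size(bs) * count

for bs, count in (("1k", 102400), ("4M", 25)):
    total = dd_total_bytes(bs, count)
    print("bs=%s count=%d: %d bytes (%d MiB) in %d writes"
          % (bs, count, total, total // 2 ** 20, count))
```

Both command lines write 104857600 bytes; the first needs 102400 writes and the second 25, so any per-write overhead in the filesystem weighs much more heavily on the bs=1k test.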
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-21 21:16 2.6.16-rc6-mm2: slow writes on reiser4 Laurent Riffard
2006-03-22 7:41 ` Hans Reiser
@ 2006-03-22 17:48 ` Laurent Riffard
1 sibling, 0 replies; 14+ messages in thread
From: Laurent Riffard @ 2006-03-22 17:48 UTC
To: reiserfs-list; +Cc: Hans Reiser
[-- Attachment #1: Type: text/plain, Size: 1632 bytes --]
[this is a second post, the first post seemed to never reach the list]
On 21.03.2006 22:16, Laurent Riffard wrote:
> Hello,
>
> Writing big files is very slow on reiser4 now.
>
> "dd if=/dev/zero of=toto bs=1k count=102400; sync" takes more than 2 minutes on
> reiser4 fs, but only 15 seconds on reiserfs fs.
Oops! My tests were not fair: my reiser4 FS was almost full while my
reiserfs FS had plenty of free space.
> kernel$ df .
> Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
> /dev/hda8 925M 825M 101M 90% /home/laurent/kernel
> kernel$ grep hda8 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
[snip]
> ~$ df .
> Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
> /dev/mapper/vglinux1-lvhome
> 7,0G 4,8G 2,3G 68% /home
> ~$ grep lvhome /proc/mounts
> /dev/vglinux1/lvhome /home reiserfs rw 0 0
So I ran some tests on a 2 GB logical volume. For each FS (reiserfs,
ext2, reiser4), I formatted the volume, untarred a copy of a kernel tree
onto it and wrote a 100 MB file 3 times.
FS          Elapsed time for dd + sync
reiserfs    14.22s
ext2        11.12s
reiser4     19.71s
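The figures above appear to be means of the dd + sync elapsed times over the runs in the attached log; that reading is an inference, not something stated in the message. A minimal sketch that recomputes them, with the elapsed times copied by hand from the log and a hypothetical helper `mean_total`:

```python
# (dd elapsed, sync elapsed) in seconds for each run in the attached log.
RUNS = {
    "reiserfs": [(1.73, 15.53), (1.65, 9.72), (1.69, 15.58), (1.70, 9.49)],
    "ext2":     [(0.78, 10.43), (0.72, 10.47), (0.71, 10.26)],
    "reiser4":  [(18.88, 3.42), (13.19, 5.19), (13.17, 5.29)],
}

def mean_total(runs):
    """Mean of dd + sync elapsed time over a list of (dd, sync) pairs."""
    return sum(dd + sync for dd, sync in runs) / len(runs)

for fs, runs in RUNS.items():
    print("%-9s %d runs, mean dd + sync = %.2f s" % (fs, len(runs), mean_total(runs)))
```

The ext2 and reiser4 means reproduce the reported 11.12 s and 19.71 s. The log contains four reiserfs runs rather than three, and their mean comes out at about 14.27 s, slightly above the reported 14.22 s.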
I won't discuss why reiser4 is slow here. Maybe my tests are not so
good. The interesting point of this thread is that reiser4 seems not to
like situations with little space available. I should replay these tests
with a 90% full FS (but it's time to go to bed now...).
The full logs of my tests are attached below.
~~
laurent
[-- Attachment #2: typescript --]
[-- Type: text/plain, Size: 9561 bytes --]
Le script a débuté sur mar 21 mar 2006 22:40:11 CET
[root@antares ~]# lvdisplay /dev/vglinux1/test
--- Logical volume ---
LV Name /dev/vglinux1/test
VG Name vglinux1
LV UUID 1IdmIn-9Ne8-IZDS-PUYF-IyLP-Xz54-c50H2E
LV Write Access read/write
LV Status available
# open 0
LV Size 2,00 GB
Current LE 512
Segments 2
Allocation inherit
Read ahead sectors 0
Block device 254:5
[root@antares ~]# mkfs.reiserfs /dev/vglinux1/test
mkfs.reiserfs 3.6.19 (2003 www.namesys.com)
A pair of credits:
Yury Umanets (aka Umka) developed libreiser4, userspace plugins, and all
userspace tools (reiser4progs) except of fsck.
Hans Reiser was the project initiator, source of all funding for the first 5.5
years. He is the architect and official maintainer.
Guessing about desired format.. Kernel 2.6.16-rc6-mm2 is running.
Format 3.6 with standard journal
Count of blocks on the device: 524288
Number of blocks consumed by mkreiserfs formatting process: 8227
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 9f9b271b-1ed6-4ffb-9cde-243d3859b221
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
ALL DATA WILL BE LOST ON '/dev/vglinux1/test'!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok
Tell your friends to use a kernel based on 2.4.18 or later, and especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.
ReiserFS is successfully created on /dev/vglinux1/test.
[root@antares ~]# mount /dev/vglinux1/test /mnt/disk
[root@antares ~]# cd /mnt/disk
[root@antares disk]# tar -xjf ~laurent/.ketchup/linux-2.6.15.tar.bz2
[root@antares disk]# df .
Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-test
2,0G 260M 1,8G 13% /mnt/disk
[root@antares disk]# ls
linux-2.6.15
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.60system 0:01.73elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.06system 0:15.53elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.02user 1.60system 0:01.65elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.04system 0:09.72elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 1.63system 0:01.69elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.06system 0:15.58elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+192minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.03user 1.64system 0:01.70elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.05system 0:09.49elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# cd
[root@antares ~]# umount /mnt/disk
[root@antares ~]#
[root@antares ~]# mkfs.ext2 /dev/vglinux1/test
mke2fs 1.38 (30-Jun-2005)
Étiquette de système de fichiers=
Type de système d'exploitation: Linux
Taille de bloc=4096 (log=2)
Taille de fragment=4096 (log=2)
262144 inodes, 524288 blocs
26214 blocs (5.00%) réservé pour le super usager
Premier bloc de données=0
16 bloc de groupes
32768 blocs par groupe, 32768 fragments par groupe
16384 inodes par groupe
Archive du superbloc stockée sur les blocs:
32768, 98304, 163840, 229376, 294912
Écriture des tables d'inodes: complété
Écriture des superblocs et de l'information de comptabilité du système de fichiers: complété
Le système de fichiers sera automatiquement vérifié tous les 35 montages ou après
180 jours, selon la première éventualité. Utiliser tune2fs -c ou -i pour écraser la valeur.
[root@antares ~]# mount /dev/vglinux1/test /mnt/disk
[root@antares ~]# cd /mnt/disk
[root@antares disk]# tar -xjf ~laurent/.ketchup/linux-2.6.15.tar.bz2
[root@antares disk]# df .
Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-test
2,0G 253M 1,7G 14% /mnt/disk
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.05user 0.68system 0:00.78elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.03system 0:10.43elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.04user 0.67system 0:00.72elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.02system 0:10.47elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# sync; time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.01user 0.69system 0:00.71elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+252minor)pagefaults 0swaps
0.00user 0.02system 0:10.26elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
[root@antares disk]# grep /mnt/disk /proc/mounts
/dev/vglinux1/test /mnt/disk ext2 rw 0 0
[root@antares disk]# cd -
[root@antares disk]# umount /mnt/disk
[root@antares ~]# mkfs.reiser4 /dev/vglinux1/test
mkfs.reiser4 1.0.5
Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by
reiser4progs/COPYING.
Block size 4096 will be used.
Linux 2.6.16-rc6-mm2 is detected.
Uuid 500241b7-0035-4254-91f4-cd6fb6c556a0 will be used.
Reiser4 is going to be created on /dev/vglinux1/test.
(Yes/No): Yes
Creating reiser4 on /dev/vglinux1/test ... done
[root@antares ~]# mount /dev/vglinux1/test /mnt/disk
[root@antares ~]# cd /mnt/disk
[root@antares disk]# tar -xjf ~laurent/.ketchup/linux-2.6.15.tar.bz2
[root@antares disk]# grep /mnt/disk /proc/mounts
/dev/vglinux1/test /mnt/disk reiser4 rw,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
[root@antares disk]# df .
Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vglinux1-test
2,0G 220M 1,7G 12% /mnt/disk
[root@antares disk]# time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.10user 13.06system 0:18.88elapsed 69%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+250minor)pagefaults 0swaps
0.00user 0.05system 0:03.42elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.08user 12.88system 0:13.19elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+249minor)pagefaults 0swaps
0.00user 0.00system 0:05.19elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]# time dd if=/dev/zero of=toto bs=1k count=102400; time sync
102400+0 enregistrements lus.
102400+0 enregistrements écrits.
0.09user 12.88system 0:13.17elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+251minor)pagefaults 0swaps
0.00user 0.00system 0:05.29elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
[root@antares disk]#
[root@antares disk]#
[root@antares disk]# exit
Script complété sur mar 21 mar 2006 22:58:31 CET
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-22 7:41 ` Hans Reiser
@ 2006-03-22 18:51 ` Laurent Riffard
2006-03-22 19:04 ` Hans Reiser
2006-04-01 23:15 ` Pierre Etchemaïté
1 sibling, 1 reply; 14+ messages in thread
From: Laurent Riffard @ 2006-03-22 18:51 UTC
To: Hans Reiser; +Cc: reiserfs-list, vs, Alexander Zarochentcev
[-- Attachment #1: Type: text/plain, Size: 2572 bytes --]
On 22.03.2006 08:41, Hans Reiser wrote:
> Laurent Riffard wrote:
>
>
>>Hello,
>>
>>Writing big files is very slow on reiser4 now.
>>
>>"dd if=/dev/zero of=toto bs=1k count=102400; sync"
>>
>
> try bs=4M, and tell me what happens. also try an empty fs, and an fs
> that is equally full to reiserfs. Note that reiserfs in your test is
> 68% full vs. 90% full for V4. It may be that we need to port some of
> the block allocation optimizations from V3 to V4 (Jeff's work) to help
> with 90% full filesystems. Thanks for doing this. Real users always
> teach me a lot when they test things differently from how I did.
>
> Hans
Hello Hans,
Yesterday, I realized that my tests were not fair. So I did some
further tests trying to have the same situation for 3 different FS
(reiserfs/ext2/reiser4) and I sent the result to the list, but this
mail never reached the list. I have resent it.
As per your request, I tried to replay my dd test on my 90% full
reiser4 FS, using a 4M block size. Here are the results:
---------------------
> Desktop$ cd ~/kernel
>
> kernel$ rm toto
> rm: détruire fichier régulier `toto'? o
>
> kernel$ df .
> Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
> /dev/hda8 925M 748M 177M 81% /home/laurent/kernel
>
> kernel$ grep /dev/hda8 /rpoc/mounts
> grep: /rpoc/mounts: Aucun fichier ou répertoire de ce type
>
> kernel$ grep /dev/hda8 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
>
> kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
> 25+0 enregistrements lus.
> 25+0 enregistrements écrits.
> 0.00user 2.89system 0:17.18elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+252minor)pagefaults 0swaps
> 0.00user 0.00system 2:19.91elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+191minor)pagefaults 0swaps
>
> kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
> 25+0 enregistrements lus.
> 25+0 enregistrements écrits.
> 0.00user 2.96system 1:16.42elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+252minor)pagefaults 0swaps
> 0.00user 0.00system 0:08.70elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+190minor)pagefaults 0swaps
---------------------
I tried to run an "iostat 10" simultaneously with dd+sync. I
attached the output. Hope this helps.
~~
laurent
[-- Attachment #2: typescript --]
[-- Type: text/plain, Size: 4489 bytes --]
Le script a débuté sur mer 22 mar 2006 19:12:56 CET
Desktop$ cd ~/kernel
kernel$
kernel$ sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero of=toto bs=4M count=25 && echo SYNC && time sync && echo END &
[1] 4657
kernel$ iostat -t 10 /dev/hda8
Linux 2.6.16-rc6-mm2 (antares.localdomain) 22.03.2006
Heure: 19:13:32
avg-cpu: %user %nice %system %iowait %idle
5,01 0,02 11,07 4,45 79,46
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 5,34 0,27 217,58 1297 1026592
Heure: 19:13:42
avg-cpu: %user %nice %system %iowait %idle
0,10 0,00 0,20 0,20 99,50
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 0,00 0,00 0,00 0 0
SYNC
DD
Heure: 19:13:52
avg-cpu: %user %nice %system %iowait %idle
1,50 0,00 79,32 8,29 10,89
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 20,38 3,20 1202,00 32 12032
Heure: 19:14:02
avg-cpu: %user %nice %system %iowait %idle
2,30 0,00 81,08 16,62 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 33,53 0,00 1398,20 0 13968
Heure: 19:14:12
avg-cpu: %user %nice %system %iowait %idle
1,90 0,00 88,51 9,59 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 25,27 0,00 893,51 0 8944
Heure: 19:14:22
avg-cpu: %user %nice %system %iowait %idle
3,19 0,00 85,63 11,18 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 27,35 0,00 1288,62 0 12912
Heure: 19:14:32
avg-cpu: %user %nice %system %iowait %idle
0,80 0,00 90,01 9,19 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 25,17 0,00 800,00 0 8008
Heure: 19:14:42
avg-cpu: %user %nice %system %iowait %idle
0,30 0,00 74,93 24,78 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 54,35 0,00 3138,46 0 31416
Heure: 19:14:52
avg-cpu: %user %nice %system %iowait %idle
0,20 0,00 81,62 18,18 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 50,75 0,00 1324,28 0 13256
Heure: 19:15:02
avg-cpu: %user %nice %system %iowait %idle
0,60 0,00 71,60 27,80 0,00
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 76,30 0,00 2363,20 0 23632
Heure: 19:15:12
avg-cpu: %user %nice %system %iowait %idle
1,10 0,00 29,77 68,93 0,20
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 123,78 0,00 3275,12 0 32784
25+0 enregistrements lus.
25+0 enregistrements écrits.
0.00user 2.94system 1:29.83elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+252minor)pagefaults 0swaps
SYNC
Heure: 19:15:22
avg-cpu: %user %nice %system %iowait %idle
2,90 0,00 76,60 19,10 1,40
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 55,10 0,80 1435,20 8 14352
0.00user 0.00system 0:17.41elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
END
Heure: 19:15:32
avg-cpu: %user %nice %system %iowait %idle
0,10 0,00 31,73 42,14 26,03
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 94,19 0,00 3402,60 0 33992
Heure: 19:15:42
avg-cpu: %user %nice %system %iowait %idle
0,10 0,00 0,00 0,10 99,80
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda8 0,00 0,00 0,00 0 0
^C
[1]+ Done sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero of=toto bs=4M count=25 && echo SYNC && time sync && echo END
kernel$ exit
Script complété sur mer 22 mar 2006 19:15:46 CET
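One way to read the iostat log above is to add up the Blk_wrtn column over the intervals in which dd and sync were active and compare the result with the 100 MiB file. This is a minimal sketch; it assumes that one iostat block is a 512-byte sector, which is how sysstat documents it for 2.4 and later kernels, and `blocks_written` is a hypothetical helper name.

```python
SECTOR_BYTES = 512  # assumption: one iostat "block" is a 512-byte sector

# Device lines copied from the log, from the interval in which dd started
# (19:13:52) to the last interval showing write activity (19:15:32).
DEVICE_LINES = """\
hda8 20,38 3,20 1202,00 32 12032
hda8 33,53 0,00 1398,20 0 13968
hda8 25,27 0,00 893,51 0 8944
hda8 27,35 0,00 1288,62 0 12912
hda8 25,17 0,00 800,00 0 8008
hda8 54,35 0,00 3138,46 0 31416
hda8 50,75 0,00 1324,28 0 13256
hda8 76,30 0,00 2363,20 0 23632
hda8 123,78 0,00 3275,12 0 32784
hda8 55,10 0,80 1435,20 8 14352
hda8 94,19 0,00 3402,60 0 33992
"""

def blocks_written(line):
    """Return the Blk_wrtn column (the last field) of an iostat device line."""
    return int(line.split()[-1])

total_blocks = sum(blocks_written(line) for line in DEVICE_LINES.splitlines())
total_bytes = total_blocks * SECTOR_BYTES
print(total_blocks, "blocks =", round(total_bytes / 2 ** 20, 2), "MiB")
```

Under that assumption the eleven intervals add up to 205296 blocks, about 100.2 MiB, which is close to the size of the file itself. Spread over roughly 110 seconds, that is an average write rate of around 1 MiB/s.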
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-22 18:51 ` Laurent Riffard
@ 2006-03-22 19:04 ` Hans Reiser
2006-03-23 18:44 ` Jindrich Makovicka
2006-03-28 20:19 ` Laurent Riffard
0 siblings, 2 replies; 14+ messages in thread
From: Hans Reiser @ 2006-03-22 19:04 UTC
To: Laurent Riffard; +Cc: reiserfs-list, vs, Alexander Zarochentcev
Instead of using sync, could you increase the size of the files you
write so that they are 10x ram size?
I have a suspicion we are slow at sync.... I am not sure why, but I
have seen other data where sync was slow for us, and maybe we need to
optimize that code path.
Hans
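The file size for such a test can be derived from MemTotal in /proc/meminfo. A minimal sketch, assuming the same bs=4M as in the earlier test; the helper names are hypothetical, and the 512 MiB machine in the example is an illustration, not the tester's actual configuration.

```python
def memtotal_kib(meminfo_text):
    """Return the MemTotal value (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")

def dd_count_for_ram_multiple(mem_kib, multiple=10, bs_bytes=4 * 1024 * 1024):
    """Number of dd blocks needed to write at least `multiple` times the RAM size."""
    target_bytes = mem_kib * 1024 * multiple
    return -(-target_bytes // bs_bytes)  # ceiling division on integers

# Hypothetical example: a machine reporting 512 MiB of RAM.
example = "MemTotal:       524288 kB\nMemFree:         10000 kB\n"
count = dd_count_for_ram_multiple(memtotal_kib(example))
print("dd if=/dev/zero of=toto bs=4M count=%d" % count)
```

On a real system the text would come from reading /proc/meminfo, and the target filesystem needs enough free space to hold the resulting file.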
Laurent Riffard wrote:
>Le 22.03.2006 08:41, Hans Reiser a écrit :
>
>
>>Laurent Riffard wrote:
>>
>>
>>
>>
>>>Hello,
>>>
>>>Writing big files is very slow on reiser4 now.
>>>
>>>"dd if=/dev/zero of=toto bs=1k count=102400; sync"
>>>
>>>
>>>
>>try bs=4M, and tell me what happens. also try an empty fs, and an fs
>>that is equally full to reiserfs. Note that reiserfs in your test is
>>68% full vs. 90% full for V4. It may be that we need to port some of
>>the block allocation optimizations from V3 to V4 (Jeff's work) to help
>>with 90% full filesystems. Thanks for doing this. Real users always
>>teach me a lot when they test things differently from how I did.
>>
>>Hans
>>
>>
>
>Hello Hans,
>
>Yesterday, I realized that my tests were not fair. So I did some
>further tests trying to have the same situation for 3 different FS
>(reiserfs/ext2/reiser4) and I sent the result to the list, but this
>mail never reached the list. I have resent it.
>
>As per your request, I tried to replay my dd test on my 90% full
>reiser4 FS, using a 4M block size. Here are the results:
>
>---------------------
>
>
>>Desktop$ cd ~/kernel
>>
>>kernel$ rm toto
>>rm: détruire fichier régulier `toto'? o
>>
>>kernel$ df .
>>Sys. de fich. Tail. Occ. Disp. %Occ. Monté sur
>>/dev/hda8 925M 748M 177M 81% /home/laurent/kernel
>>
>>kernel$ grep /dev/hda8 /rpoc/mounts
>>grep: /rpoc/mounts: Aucun fichier ou répertoire de ce type
>>
>>kernel$ grep /dev/hda8 /proc/mounts
>>/dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e0c,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
>>
>>kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
>>25+0 enregistrements lus.
>>25+0 enregistrements écrits.
>>0.00user 2.89system 0:17.18elapsed 16%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+252minor)pagefaults 0swaps
>>0.00user 0.00system 2:19.91elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+191minor)pagefaults 0swaps
>>
>>kernel$ sync; time dd if=/dev/zero of=toto bs=4M count=25; time sync
>>25+0 enregistrements lus.
>>25+0 enregistrements écrits.
>>0.00user 2.96system 1:16.42elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+252minor)pagefaults 0swaps
>>0.00user 0.00system 0:08.70elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>>0inputs+0outputs (0major+190minor)pagefaults 0swaps
>>
>>
>---------------------
>
>I tried to run an "iostat 10" simultaneously with dd+sync. I
>attached the output. Hope this helps.
>~~
>laurent
>
>
>------------------------------------------------------------------------
>
>Le script a débuté sur mer 22 mar 2006 19:12:56 CET
>Desktop$ cd ~/kernel
>kernel$
>kernel$ sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero of=toto bs=4M count=25 && echo SYNC && time sync && echo END &
>[1] 4657
>kernel$ iostat -t 10 /dev/hda8
>Linux 2.6.16-rc6-mm2 (antares.localdomain) 22.03.2006
>
>Heure: 19:13:32
>avg-cpu: %user %nice %system %iowait %idle
> 5,01 0,02 11,07 4,45 79,46
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 5,34 0,27 217,58 1297 1026592
>
>Heure: 19:13:42
>avg-cpu: %user %nice %system %iowait %idle
> 0,10 0,00 0,20 0,20 99,50
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 0,00 0,00 0,00 0 0
>
>SYNC
>DD
>Heure: 19:13:52
>avg-cpu: %user %nice %system %iowait %idle
> 1,50 0,00 79,32 8,29 10,89
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 20,38 3,20 1202,00 32 12032
>
>Heure: 19:14:02
>avg-cpu: %user %nice %system %iowait %idle
> 2,30 0,00 81,08 16,62 0,00
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 33,53 0,00 1398,20 0 13968
>
>Heure: 19:14:12
>avg-cpu: %user %nice %system %iowait %idle
> 1,90 0,00 88,51 9,59 0,00
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 25,27 0,00 893,51 0 8944
>
>Heure: 19:14:22
>avg-cpu: %user %nice %system %iowait %idle
> 3,19 0,00 85,63 11,18 0,00
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 27,35 0,00 1288,62 0 12912
>
>Heure: 19:14:32
>avg-cpu: %user %nice %system %iowait %idle
> 0,80 0,00 90,01 9,19 0,00
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 25,17 0,00 800,00 0 8008
>
>Heure: 19:14:42
>avg-cpu: %user %nice %system %iowait %idle
> 0,30 0,00 74,93 24,78 0,00
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 54,35 0,00 3138,46 0 31416
>
>Heure: 19:14:52
>avg-cpu: %user %nice %system %iowait %idle
> 0,20 0,00 81,62 18,18 0,00
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 50,75 0,00 1324,28 0 13256
>
>Heure: 19:15:02
>avg-cpu: %user %nice %system %iowait %idle
> 0,60 0,00 71,60 27,80 0,00
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 76,30 0,00 2363,20 0 23632
>
>Heure: 19:15:12
>avg-cpu: %user %nice %system %iowait %idle
> 1,10 0,00 29,77 68,93 0,20
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 123,78 0,00 3275,12 0 32784
>
>25+0 enregistrements lus.
>25+0 enregistrements écrits.
>0.00user 2.94system 1:29.83elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+252minor)pagefaults 0swaps
>SYNC
>Heure: 19:15:22
>avg-cpu: %user %nice %system %iowait %idle
> 2,90 0,00 76,60 19,10 1,40
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 55,10 0,80 1435,20 8 14352
>
>0.00user 0.00system 0:17.41elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
>0inputs+0outputs (0major+190minor)pagefaults 0swaps
>END
>Heure: 19:15:32
>avg-cpu: %user %nice %system %iowait %idle
> 0,10 0,00 31,73 42,14 26,03
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 94,19 0,00 3402,60 0 33992
>
>Heure: 19:15:42
>avg-cpu: %user %nice %system %iowait %idle
> 0,10 0,00 0,00 0,10 99,80
>
>Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
>hda8 0,00 0,00 0,00 0 0
>
>
>^C
>[1]+ Done sleep 15 && echo SYNC && sync && echo DD && time dd if=/dev/zero of=toto bs=4M count=25 && echo SYNC && time sync && echo END
>kernel$ exit
>
>Script complété sur mer 22 mar 2006 19:15:46 CET
>
>
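As a rough cross-check of the iostat log above, the Blk_wrtn column can be summed over the intervals from "DD" to the first interval after "END". This is only a sketch: it assumes one iostat block is a 512-byte sector, the values are copied by hand from the log, and the function names are made up for illustration.

```python
# Blk_wrtn values for hda8, copied from the iostat intervals between "DD" and
# the first interval reported after "END" in the log above.
BLK_WRTN = [12032, 13968, 8944, 12912, 8008, 31416,
            13256, 23632, 32784, 14352, 33992]

BLOCK_BYTES = 512          # assumption: one iostat block = one 512-byte sector
MIB = 1024 * 1024


def total_written_mib(blocks, block_bytes=BLOCK_BYTES):
    """Return the total volume written, in MiB, for a list of block counts."""
    return sum(blocks) * block_bytes / MIB


def average_rate_mib_s(blocks, interval_s=10, block_bytes=BLOCK_BYTES):
    """Average write rate in MiB/s over equally spaced iostat intervals."""
    return total_written_mib(blocks, block_bytes) / (len(blocks) * interval_s)


if __name__ == "__main__":
    print(f"written: {total_written_mib(BLK_WRTN):.1f} MiB")
    print(f"average: {average_rate_mib_s(BLK_WRTN):.2f} MiB/s")
```

Under that assumption the sum is close to the 100M that dd wrote (25 x 4M), which would suggest the time goes into a low write rate rather than into extra data being written. The first interval partly overlaps the initial sleep, so the average is only indicative.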
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-22 19:04 ` Hans Reiser
@ 2006-03-23 18:44 ` Jindrich Makovicka
2006-03-23 21:32 ` Nate Diller
2006-03-28 20:19 ` Laurent Riffard
1 sibling, 1 reply; 14+ messages in thread
From: Jindrich Makovicka @ 2006-03-23 18:44 UTC (permalink / raw)
To: Hans Reiser; +Cc: Laurent Riffard, reiserfs-list, vs, Alexander Zarochentcev
Hans Reiser wrote:
> Instead of using sync, could you increase the size of the files you
> write so that they are 10x ram size?
>
> I have a suspicion we are slow at sync.... I am not sure why, but I
> have seen other data where sync was slow for us, and maybe we need to
> optimize that code path.
My impression is rather that the bottleneck is the amount of seeking the
sync causes - would it be possible to reorder the write operations
somehow, still preserving atomicity?
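To illustrate why write ordering matters for seeking, here is a toy model, not a description of what Reiser4 or the block layer actually does. It treats the disk as a line of block numbers and measures how far the head travels for a given order; the function name and the random data are invented for the example, and atomicity constraints are ignored entirely.

```python
import random


def head_travel(block_numbers, start=0):
    """Total distance the head moves when blocks are visited in the given order."""
    travel, position = 0, start
    for block in block_numbers:
        travel += abs(block - position)
        position = block
    return travel


if __name__ == "__main__":
    rng = random.Random(42)                       # fixed seed, arbitrary data
    pending = [rng.randrange(1_000_000) for _ in range(1000)]
    print("submission order:", head_travel(pending))
    print("sorted order:    ", head_travel(sorted(pending)))
```

In this model, starting from block 0, the sorted order never travels further than the highest block number, while an arbitrary order can travel many times that.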
Also, a comparison of Reiser4 performance on an NCQ vs. a non-NCQ drive could
be interesting (I don't have NCQ, maybe that's the problem).
Regards,
--
Jindrich Makovicka
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-23 18:44 ` Jindrich Makovicka
@ 2006-03-23 21:32 ` Nate Diller
0 siblings, 0 replies; 14+ messages in thread
From: Nate Diller @ 2006-03-23 21:32 UTC (permalink / raw)
To: Jindrich Makovicka
Cc: Hans Reiser, Laurent Riffard, reiserfs-list, vs,
Alexander Zarochentcev
On 3/23/06, Jindrich Makovicka <makovick@kmlinux.fjfi.cvut.cz> wrote:
> Hans Reiser wrote:
> > Instead of using sync, could you increase the size of the files you
> > write so that they are 10x ram size?
> >
> > I have a suspicion we are slow at sync.... I am not sure why, but I
> > have seen other data where sync was slow for us, and maybe we need to
> > optimize that code path.
>
> My impression is rather that the bottleneck is the amount of seeking the
> sync causes - would it be possible to reorder the write operations
> somehow, still preserving atomicity?
yeah, the kernel is not good at ordering flush during sync, it would
work much better if Reiser4 could just be told to do a full sync, and
then have only one thread that climbs through the fake inode and
squallocs everything.
> Also, a comparison of Reiser4 performance on NCQ vs. non-NCQ drive could
> be interesting (I don't have NCQ, maybe that's the problem).
the scheduler could make a difference too, most likely in the area of
'congestion' threshold and handling.
NATE
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-22 19:04 ` Hans Reiser
2006-03-23 18:44 ` Jindrich Makovicka
@ 2006-03-28 20:19 ` Laurent Riffard
2006-03-28 20:34 ` Hans Reiser
2006-03-28 22:49 ` Philippe Gramoullé
1 sibling, 2 replies; 14+ messages in thread
From: Laurent Riffard @ 2006-03-28 20:19 UTC (permalink / raw)
To: Hans Reiser; +Cc: reiserfs-list, vs, Alexander Zarochentcev
On 22.03.2006 20:04, Hans Reiser wrote:
> Instead of using sync, could you increase the size of the files you
> write so that they are 10x ram size?
>
> I have a suspicion we are slow at sync.... I am not sure why, but I
> have seen other data where sync was slow for us, and maybe we need to
> optimize that code path.
>
> Hans
>
Hello Hans, sorry for the long delay in replying.
I'm not sure this is a problem with _sync_. I did have concerns about sync
on reiser4, but I thought they were related to the FS policy of
doing a lot of work in memory, so that when sync time
comes there is a huge amount of data to write back to disk.
Well, I'm not a filesystem expert; this is a wild guess...
Anyway, I didn't try to "write a file of size 10x ram size". My test
case is a 925M FS with 100M free, and I have 512M of RAM. I suspect
there is a problem with the Reiser4 internal data. It's an old FS; I
have done thousands of kernel builds on it.
I allocated a new logical volume (about the same size, on the same HD), made
a reiser4 FS on it and copied all my data to it.
> [root@antares ~]# grep reiser4 /proc/mounts
> /dev/hda8 /home/laurent/kernel reiser4 rw,nosuid,nodev,atom_max_size=0x7e22,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
> /dev/vglinux1/test /mnt/disk reiser4 rw,atom_max_size=0x7e22,atom_max_age=0x249f0,atom_min_size=0x100,atom_max_flushers=0x1,cbk_cache_slots=0x10 0 0
> [root@antares ~]# grep -e hda8 -e dm-5 /proc/partitions
> 3 8 995998 hda8
> 254 5 1003520 dm-5
> [root@antares ~]# cp -pRL /home/laurent/kernel/. /mnt/disk
[cut errors with symbolic links]
> [root@antares ~]# df /home/laurent/kernel /mnt/disk
> Filesystem Size Used Avail Use% Mounted on
> /dev/hda8 925M 822M 103M 89% /home/laurent/kernel
> /dev/mapper/vglinux1-test
> 932M 800M 132M 86% /mnt/disk
These filesystems are quite similar. Now guess what? I filled both of them
with dd.
Original FS
===========
# sync
# time dd if=/dev/zero of=toto bs=1M count=150
103+0 records in
102+0 records out
Command exited with non-zero status 1
0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+279minor)pagefaults 0swaps
# time sync
0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+191minor)pagefaults 0swaps
Copy FS
=======
# sync
# time dd if=/dev/zero of=toto bs=1M count=150
dd: writing `toto': No space left on device
132+0 records in
131+0 records out
Command exited with non-zero status 1
0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+279minor)pagefaults 0swaps
# time sync
0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+190minor)pagefaults 0swaps
disk$
See ? 3'30" versus 16".
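To put the two timings on a common scale, they can be turned into write rates. The sketch below is only illustrative: it takes the number of full 1M records that dd reports as written (102 and 131) and the elapsed field printed by time, and the helper names are invented here.

```python
def elapsed_seconds(text):
    """Convert an elapsed field such as '3:32.18' (m:ss.cc) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def throughput_mib_s(mib_written, elapsed):
    """Average write rate in MiB/s for a dd run."""
    return mib_written / elapsed_seconds(elapsed)


# (MiB written, elapsed time) taken from the two dd runs above.
runs = {"original": (102, "3:32.18"), "copy": (131, "0:15.95")}

if __name__ == "__main__":
    for name, (mib, elapsed) in runs.items():
        print(f"{name}: {throughput_mib_s(mib, elapsed):.2f} MiB/s")
```

With these figures the original FS comes out at roughly 0.48 MiB/s and the copy at roughly 8.2 MiB/s, a factor of about 17.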
I packed the metadata of my original FS into a file; you can grab it
from http://laurent.riffard.free.fr/kernel.reiser4.bz2 (6.7M).
Note that I was unable to unpack it:
> # bunzip2 -c /tmp/kernel.reiser4.bz2 | debugfs.reiser4 -U /dev/vglinux1/test
> debugfs.reiser4 1.0.5
> Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by reiser4progs/COPYING.
>
> Info : The metadata were packed with the reiser4progs 1.0.5.
> Error: Can't unpack filesystem.
~~
laurent
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-28 20:19 ` Laurent Riffard
@ 2006-03-28 20:34 ` Hans Reiser
2006-03-28 20:56 ` Hans Reiser
2006-03-28 22:49 ` Philippe Gramoullé
1 sibling, 1 reply; 14+ messages in thread
From: Hans Reiser @ 2006-03-28 20:34 UTC (permalink / raw)
To: Laurent Riffard; +Cc: reiserfs-list, vs, Alexander Zarochentcev, E. Gryaznova
Laurent Riffard wrote:
>
>
>See ? 3'30" versus 16".
>
>I packed the metadata of my original FS to a file, you can grab it
>from http://laurent.riffard.free.fr/kernel.reiser4.bz2 (6.7M).
>
>
Wow. We need to do the repacker. We might also need to examine whether
there are optimizations in V3 block allocation we should apply to V4,
but mostly we need the repacker. Ok, well, right after we go into the
kernel it will be done.
Thanks much Laurent, you did a great job of analyzing this for us.
>Note I was unable to unpack it :
>
>
>># bunzip2 -c /tmp/kernel.reiser4.bz2 | debugfs.reiser4 -U /dev/vglinux1/test
>>debugfs.reiser4 1.0.5
>>Copyright (C) 2001, 2002, 2003, 2004 by Hans Reiser, licensing governed by reiser4progs/COPYING.
>>
>>Info : The metadata were packed with the reiser4progs 1.0.5.
>>Error: Can't unpack filesystem.
>>
>>
>
>~~
>laurent
>
>
>
>
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-28 20:34 ` Hans Reiser
@ 2006-03-28 20:56 ` Hans Reiser
0 siblings, 0 replies; 14+ messages in thread
From: Hans Reiser @ 2006-03-28 20:56 UTC (permalink / raw)
To: Hans Reiser
Cc: Laurent Riffard, reiserfs-list, vs, Alexander Zarochentcev,
E. Gryaznova
I think what this means is that after we have a repacker, we should gain
performance advantages over our competition as a result. It is far
easier for us to code an online repacker than it is for them.
Hans
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-28 20:19 ` Laurent Riffard
2006-03-28 20:34 ` Hans Reiser
@ 2006-03-28 22:49 ` Philippe Gramoullé
2006-03-29 6:16 ` Laurent Riffard
1 sibling, 1 reply; 14+ messages in thread
From: Philippe Gramoullé @ 2006-03-28 22:49 UTC (permalink / raw)
To: reiserfs-list
Hello Laurent,
On Tue, 28 Mar 2006 22:19:01 +0200
Laurent Riffard <laurent.riffard@free.fr> wrote:
| These FS are quite similars. Now guess what ? I filled these FS with
| dd.
|
| Original FS
| ===========
| # sync
| # time dd if=/dev/zero of=toto bs=1M count=150
| 103+0 enregistrements lus.
| 102+0 enregistrements écrits.
| Command exited with non-zero status 1
Well, at least on my system, such a command exits with status 0.
Also, this error does not appear in any of your other posts in this thread,
only here and in the run below.
| 0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata
| 0maxresident)k
| # time sync
| 0inputs+0outputs (0major+279minor)pagefaults 0swaps
| 0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (0major+191minor)pagefaults 0swaps
|
| Copy FS
| =======
| # sync
| # time dd if=/dev/zero of=toto bs=1M count=150
| dd: écriture de `toto': Aucun espace disponible sur le périphérique
| 132+0 enregistrements lus.
| 131+0 enregistrements écrits.
| Command exited with non-zero status 1
Here, I can understand the "exited with non-zero status 1", as
"Aucun espace disponible sur le périphérique" is French for
"No space left on device".
| 0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (1major+279minor)pagefaults 0swaps
| # time sync
| 0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata
| 0maxresident)k
| 0inputs+0outputs (0major+190minor)pagefaults 0swaps
| disk$
|
| See ? 3'30" versus 16".
Is the 16" due to the above command exiting earlier than it should have?
Thanks,
Philippe
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-28 22:49 ` Philippe Gramoullé
@ 2006-03-29 6:16 ` Laurent Riffard
2006-03-29 14:30 ` Philippe Gramoullé
0 siblings, 1 reply; 14+ messages in thread
From: Laurent Riffard @ 2006-03-29 6:16 UTC (permalink / raw)
To: Philippe Gramoullé, reiserfs-list
On 29.03.2006 00:49, Philippe Gramoullé wrote:
> Hello Laurent,
>
> On Tue, 28 Mar 2006 22:19:01 +0200
> Laurent Riffard <laurent.riffard@free.fr> wrote:
>
> | These FS are quite similars. Now guess what ? I filled these FS with
> | dd.
> |
> | Original FS
> | ===========
> | # sync
> | # time dd if=/dev/zero of=toto bs=1M count=150
> | 103+0 enregistrements lus.
> | 102+0 enregistrements écrits.
> | Command exited with non-zero status 1
>
> Well, at least on my system , such a command exits with a 0 status
Oops! I dropped a line when I cut and pasted. dd exits with the
message "Aucun espace disponible sur le périphérique", which means
"No space left on device".
> Also, not a single of your posts in this thread has this error except this one
> and the one below
Yes, I changed my test somewhat. In the previous test, I dd'd 100M to
the FS.
As the original FS and its copy have different amounts of free space,
writing 100M to each FS leaves 3M free versus 30M free. I ran this test
and it takes about 2'20" versus 15". But I feared that someone would object,
"It's because you have less free space on the first FS".
So I found it more conclusive to write 150M and thus fill up both FS.
> | 0.00user 2.94system 3:32.18elapsed 1%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | # time sync
> | 0inputs+0outputs (0major+279minor)pagefaults 0swaps
> | 0.00user 0.01system 0:00.18elapsed 6%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (0major+191minor)pagefaults 0swaps
> |
> | Copy FS
> | =======
> | # sync
> | # time dd if=/dev/zero of=toto bs=1M count=150
> | dd: écriture de `toto': Aucun espace disponible sur le périphérique
> | 132+0 enregistrements lus.
> | 131+0 enregistrements écrits.
> | Command exited with non-zero status 1
>
> Here, i can understand the "exited with non-zero status 1" as
> "Aucun espace disponible sur le périphérique" is french for
> "No space left on device"
yes, see above.
> | 0.00user 4.08system 0:15.95elapsed 25%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (1major+279minor)pagefaults 0swaps
> | # time sync
> | 0.00user 0.00system 0:00.17elapsed 0%CPU (0avgtext+0avgdata
> | 0maxresident)k
> | 0inputs+0outputs (0major+190minor)pagefaults 0swaps
> | disk$
> |
> | See ? 3'30" versus 16".
>
> Are the 16" due to the fact that the above command exited earlier than it should have ?
No (see above): both FS were filled up to 0M free space.
> Thanks,
>
> Philippe
>
Thanks for your comments. I hope this makes it clear.
To be fair, there are some differences between the 2 FS:
- the copy is larger than the original one: 1003520 vs 995998 1 KiB
blocks (as listed in /proc/partitions), which is 0.75% larger.
- the original FS resides on an extended partition (/dev/hda8) while
the copy is on a logical volume (/dev/vglinux1/test). This LV is
hosted on /dev/hda4.
I hope these differences do not have a high impact on the results.
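For reference, the size difference can be worked out from the /proc/partitions figures quoted earlier in the thread, which are counted in 1 KiB blocks. A minimal sketch, with names chosen only for this example:

```python
KIB_PER_MIB = 1024

# Block counts (1 KiB each) as listed in /proc/partitions.
sizes_kib = {"hda8": 995998, "dm-5": 1003520}


def percent_larger(new, old):
    """How much larger 'new' is than 'old', as a percentage of 'old'."""
    return (new - old) / old * 100


if __name__ == "__main__":
    for name, kib in sizes_kib.items():
        print(f"{name}: {kib / KIB_PER_MIB:.1f} MiB")
    diff = percent_larger(sizes_kib["dm-5"], sizes_kib["hda8"])
    print(f"dm-5 is {diff:.2f}% larger than hda8")
```

This gives about 972.7 MiB for hda8 and 980 MiB for dm-5, a difference of roughly 0.75%.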
I'll try to dd of=/dev/hda8 if=/dev/vglinux1/test, and see whether it
makes a difference when I dd a 100M file onto the FS.
~~
laurent
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-29 6:16 ` Laurent Riffard
@ 2006-03-29 14:30 ` Philippe Gramoullé
0 siblings, 0 replies; 14+ messages in thread
From: Philippe Gramoullé @ 2006-03-29 14:30 UTC (permalink / raw)
To: Laurent Riffard; +Cc: reiserfs-list
Hello Laurent,
On Wed, 29 Mar 2006 08:16:55 +0200
Laurent Riffard <laurent.riffard@free.fr> wrote:
| So I found more conclusive to write 150M and thus to fill up the 2 FS.
Thanks for the explanations.
Truly yours,
Philippe
* Re: 2.6.16-rc6-mm2: slow writes on reiser4.
2006-03-22 7:41 ` Hans Reiser
2006-03-22 18:51 ` Laurent Riffard
@ 2006-04-01 23:15 ` Pierre Etchemaïté
1 sibling, 0 replies; 14+ messages in thread
From: Pierre Etchemaïté @ 2006-04-01 23:15 UTC (permalink / raw)
To: reiserfs-list
On Tue, 21 Mar 2006 23:41:22 -0800, Hans Reiser <reiser@namesys.com> wrote:
> It may be that we need to port some of
> the block allocation optimizations from V3 to V4 (Jeff's work) to help
> with 90% full filesystems.
Speaking of which, I've read on backuppc's mailing list about a localized
performance problem with reiserfs 3 (which otherwise performs similarly
to xfs for that task). I wonder if it was ever reported
to you, as was suggested on that mailing list...
http://sourceforge.net/mailarchive/message.php?msg_id=8646808
My understanding is that backuppc is hitting the reiserfs3 worst case
for hard links.
Backuppc creates a huge pool of all versions of all files from all
backups, compressed, organized using MD5 hashing (handling collisions
of course), and hardlinked from their different backup views. [Some
metadata is stored separately, so that several files with the same content
but different metadata can still be shared on disk. But I digress]
At night, a sweeping process takes place to remove backups that are too old
(according to user policy), and maybe to check whether some more background
sharing/compression can be done.
If I remember correctly, v3 puts directory entries and their corresponding
inodes next to each other on disk. When hardlinks are created, new
directory entries are created, pointing to the same inode. If the first
directory entry is removed, the inode may no longer be stored near
any of the entries pointing to it.
Since backuppc routinely removes directory entries in FIFO order,
this is almost guaranteed to happen every time. Hence a very bad
distribution of inodes on disk after some time...
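The access pattern described above can be reproduced in miniature. This is a simplified sketch and not backuppc's actual code: it ignores compression and hash collisions, the directory layout and function names are invented, and it only shows link counts, not where anything lands on disk.

```python
import hashlib
import os
import tempfile


def store(pool_dir, view_dir, name, data):
    """Store data once in the pool (keyed by MD5) and hardlink it into a view."""
    digest = hashlib.md5(data).hexdigest()
    pooled = os.path.join(pool_dir, digest)
    if not os.path.exists(pooled):
        with open(pooled, "wb") as handle:
            handle.write(data)
    link = os.path.join(view_dir, name)
    os.link(pooled, link)          # new directory entry, same inode
    return pooled, link


def demo():
    """Create three backup views of one file, then expire the two oldest."""
    with tempfile.TemporaryDirectory() as root:
        pool = os.path.join(root, "pool")
        views = [os.path.join(root, f"backup{i}") for i in range(3)]
        for directory in [pool, *views]:
            os.mkdir(directory)
        stored = [store(pool, view, "file.txt", b"same content") for view in views]
        pooled = stored[0][0]
        links = [link for _, link in stored]
        inode = os.stat(pooled).st_ino
        counts = [os.stat(pooled).st_nlink]
        # Expire backups oldest first (FIFO), like the nightly sweep.
        for link in links[:2]:
            os.remove(link)
            counts.append(os.stat(pooled).st_nlink)
        same_inode = os.stat(links[2]).st_ino == inode
        return counts, same_inode


if __name__ == "__main__":
    print(demo())
```

The inode outlives the directory entries it was created alongside, which is the situation the placement heuristic described above would handle badly.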
I don't know exactly what xfs does (blocks of preallocated inodes?), but
it does better in this case.
Hope it helps,
Pierre.
Thread overview: 14+ messages
2006-03-21 21:16 2.6.16-rc6-mm2: slow writes on reiser4 Laurent Riffard
2006-03-22 7:41 ` Hans Reiser
2006-03-22 18:51 ` Laurent Riffard
2006-03-22 19:04 ` Hans Reiser
2006-03-23 18:44 ` Jindrich Makovicka
2006-03-23 21:32 ` Nate Diller
2006-03-28 20:19 ` Laurent Riffard
2006-03-28 20:34 ` Hans Reiser
2006-03-28 20:56 ` Hans Reiser
2006-03-28 22:49 ` Philippe Gramoullé
2006-03-29 6:16 ` Laurent Riffard
2006-03-29 14:30 ` Philippe Gramoullé
2006-04-01 23:15 ` Pierre Etchemaïté
2006-03-22 17:48 ` Laurent Riffard