* reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-26 16:45 UTC
To: reiserfs-list
Hi, folks
I am not sure whether this is normal.
I am trying to create and use a reiserfs on an 8-disk RAID0. I found that mkfs
needs ~90 seconds and mount needs ~70 seconds.
Is there anything wrong on my side?
Thanks!
Ming
Detailed info follows.
---------------------------------------------------------------------
[root@bakstor2u root]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10] [faulty]
md0 : active raid0 sda[0] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      3125690368 blocks 64k chunks
unused devices: <none>
[root@bakstor2u root]# time mkfs.reiserfs /dev/md0 -ff
mkfs.reiserfs 3.6.13 (2003 www.namesys.com)
<...>
Guessing about desired format.. Kernel 2.6.11.12 is running.
Format 3.6 with standard journal
Count of blocks on the device: 781422592
Number of blocks consumed by mkreiserfs formatting process: 32059
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 98d990f3-d54f-43e3-9fde-8c9c9a6d3481
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok
Tell your friends to use a kernel based on 2.4.18 or later, and especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.
ReiserFS is successfully created on /dev/md0.
real 1m28.783s
user 0m0.151s
sys 0m0.398s
[root@bakstor2u root]# time mount /dev/md0 t
real 1m11.448s
user 0m0.000s
sys 0m0.225s

* Re: reiser fs slow on mksf and mount
From: Vladimir V. Saveliev @ 2005-08-26 17:04 UTC
To: mingz; +Cc: reiserfs-list

Hello

Ming Zhang wrote:
> I try to create&use a reiserfs on a 8 disk raid0. Then I found that mkfs
> need ~90 sec and mount need ~70 seconds.
>
> Is there anything wrong on my side?
>

Your device is too big.

> [root@bakstor2u root]# time mkfs.reiserfs /dev/md0 -ff
> [...]
> Count of blocks on the device: 781422592
> Number of blocks consumed by mkreiserfs formatting process: 32059
> Blocksize: 4096
> [...]
> real 1m28.783s
> user 0m0.151s
> sys 0m0.398s
>

Hmm, mkfs.reiserfs had to write 32059 blocks. That is about 131 MB.
1m28s is too much for that. Could it be that some of the disks used in that RAID were not spinning when you started mkreiserfs?

> [root@bakstor2u root]# time mount /dev/md0 t
>
> real 1m11.448s
> user 0m0.000s
> sys 0m0.225s
>

There is a patch to cure this problem:
http://www.mail-archive.com/reiserfs-list@namesys.com/msg18442.html
Please note that it is experimental.
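
The "about 131 MB" figure and the implied write rate can be checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the mkfs.reiserfs output and the `time` result quoted in the thread, the variable names are made up for illustration, and MB is taken to mean 10^6 bytes.

```python
# Values copied from the mkfs.reiserfs output and `time` result in the thread.
BLOCK_SIZE = 4096               # "Blocksize: 4096"
BLOCKS_WRITTEN = 32059          # "Number of blocks consumed by mkreiserfs formatting process"
ELAPSED_SECONDS = 60 + 28.783   # "real 1m28.783s"

bytes_written = BLOCKS_WRITTEN * BLOCK_SIZE
megabytes_written = bytes_written / 1e6
throughput_mb_s = megabytes_written / ELAPSED_SECONDS

print(f"written: {bytes_written} bytes (~{megabytes_written:.0f} MB)")
print(f"effective write rate: ~{throughput_mb_s:.2f} MB/s")
```

An effective rate of about 1.5 MB/s on an 8-disk RAID0 is far below what sequential writes would give, which is presumably why the reply calls 1m28s "too much".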

* Re: reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-26 17:08 UTC
To: Vladimir V. Saveliev; +Cc: reiserfs-list

I think a 3.2TB partition is not that big these days, right?

I would think some people who hold millions of files will have much
larger partitions than this.

So you think this is a normal speed for a partition of this size?

Also, iostat 1 -k shows that during mount there are small reads
happening each second. Is this because it is reading metadata, and the metadata is
not contiguous on disk?

ming

On Fri, 2005-08-26 at 21:04 +0400, Vladimir V. Saveliev wrote:
> Your device is too big.
> [...]
> Hmm, mkfs.reiserfs had to write 32059 blocks. It is about 131mb. 1m28s is too much for that.
> Could it be that some of disks used in that raid were not spinning when you started mkreiserfs?
>
> > [root@bakstor2u root]# time mount /dev/md0 t
> >
> > real 1m11.448s
> > user 0m0.000s
> > sys 0m0.225s
> >
>
> There is a patch to cure this problem.
> http://www.mail-archive.com/reiserfs-list@namesys.com/msg18442.html
> Please note that it is experimental one.

* Re: reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-26 17:15 UTC
To: Vladimir V. Saveliev; +Cc: reiserfs-list

I forgot to mention that the original mount was immediately after mkfs, so there were no
files in the fs at all.

Now I have created 1048576 4KB files,

then umounted and remounted:

[root@bakstor2u root]# time mount /dev/md0 t

real 1m10.971s
user 0m0.001s
sys 0m0.188s

That is almost the same as the first one.

This is from dmesg:

ReiserFS: md0: found reiserfs format "3.6" with standard journal
ReiserFS: md0: using ordered data mode
ReiserFS: md0: journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md0: checking transaction log (md0)
ReiserFS: md0: Using r5 hash to sort names

So I do not understand why mounting a fs with 0 files takes the same time as
mounting a fs with 1M files.

Thanks!

Ming

On Fri, 2005-08-26 at 13:08 -0400, Ming Zhang wrote:
> i think 3.2TB partition is not that big these days right?
> [...]
> also iostat 1 -k shows that during mount, there are small size read
> happen each second. so this is because read meta data and metadata is
> not continuous on disk?
> [...]

* Re: reiser fs slow on mksf and mount
From: Vladimir V. Saveliev @ 2005-08-26 17:32 UTC
To: mingz; +Cc: reiserfs-list

Hello

Ming Zhang wrote:
> forget to mention that original mount is immediately after mkfs. so no
> files in fs at all.
> [...]
> so i could not understand why mount a fs with 0 files is same time with
> mount a fs with 1M files.
> [...]
> On Fri, 2005-08-26 at 13:08 -0400, Ming Zhang wrote:
>>i think 3.2TB partition is not that big these days right?
>>
>>i would think some people that hold millions of files will have much
>>larger partition than this.
>>
>>so you think this is a normal speed for such size partition?
>>

No. That was a joke, actually.
The speed of mkfs.reiserfs confuses me. As you did not answer whether all disks were spinning, I suppose that they were.
Can you please run vmstat 1 or iostat 1 while mkreiserfs is running?

>>also iostat 1 -k shows that during mount, there are small size read
>>happen each second. so this is because read meta data and metadata is
>>not continuous on disk?
>>

Yes. On mount, reiserfs reads all bitmap blocks into memory. Those blocks are spread over the whole disk. There is a workaround for this problem.

>>>There is a patch to cure this problem.
>>>http://www.mail-archive.com/reiserfs-list@namesys.com/msg18442.html
>>>Please note that it is experimental one.
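
The cost of reading every bitmap block at mount can be estimated from the numbers already posted. The sketch below assumes one bit per filesystem block, so that a single 4096-byte bitmap block tracks 4096 * 8 = 32768 blocks; that layout is an assumption made for illustration and is not stated in the thread. The other values come from the mkfs output and the mount timing.

```python
import math

# Values copied from the thread.
BLOCK_SIZE = 4096               # bytes, "Blocksize: 4096"
TOTAL_BLOCKS = 781422592        # "Count of blocks on the device"
MOUNT_SECONDS = 60 + 11.448     # "real 1m11.448s"

# Assumption: one bit per block, so one bitmap block covers BLOCK_SIZE * 8 blocks.
blocks_per_bitmap = BLOCK_SIZE * 8
bitmap_blocks = math.ceil(TOTAL_BLOCKS / blocks_per_bitmap)

device_tb = TOTAL_BLOCKS * BLOCK_SIZE / 1e12
bitmap_mb = bitmap_blocks * BLOCK_SIZE / 1e6
ms_per_bitmap_read = MOUNT_SECONDS / bitmap_blocks * 1000

print(f"device size: ~{device_tb:.1f} TB")
print(f"bitmap blocks: {bitmap_blocks} (~{bitmap_mb:.0f} MB held in memory)")
print(f"mount time per bitmap block: ~{ms_per_bitmap_read:.1f} ms")
```

Under that assumption the mount spends a few milliseconds per bitmap block, which is at least consistent with many small scattered reads, and it would not depend on how many files the filesystem holds.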

* Re: reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-26 18:07 UTC
To: Vladimir V. Saveliev; +Cc: reiserfs-list

On Fri, 2005-08-26 at 21:32 +0400, Vladimir V. Saveliev wrote:
> > On Fri, 2005-08-26 at 13:08 -0400, Ming Zhang wrote:
> >>i think 3.2TB partition is not that big these days right?
> >>
> >>i would think some people that hold millions of files will have much
> >>larger partition than this.
> >>
> >>so you think this is a normal speed for such size partition?
> >>
>
> No. That was joke actually.
> Speed of mkfs.reiserfs confuses me.

:) I captured vmstat 1 and iostat 1. I bet you will be even more confused. The output is pretty
big for email, so I attached it.

> As you did answer about whether all disks were spinning - I suppose that they were.

no

> Can you please run vmstat 1 or iostat 1 while mkreiserfs is running?
>
> >>also iostat 1 -k shows that during mount, there are small size read
> >>happen each second. so this is because read meta data and metadata is
> >>not continuous on disk?
> >>
>
> Yes. On mount reiserfs reads all bitmap blocks to memory. Those blocks are spread over whole disk. There is workaround for this problem.
>
> >>>There is a patch to cure this problem.
> >>>http://www.mail-archive.com/reiserfs-list@namesys.com/msg18442.html
> >>>Please note that it is experimental one.

Does reiserfs4 have any improvement on this?

Thanks!

Ming

[-- Attachment #2: iostat-log.gz --]
[-- Type: application/x-gzip, Size: 3293 bytes --]

[-- Attachment #3: vmstat-log.gz --]
[-- Type: application/x-gzip, Size: 1207 bytes --]

* Re: reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-26 18:16 UTC
To: Vladimir V. Saveliev; +Cc: reiserfs-list

On Fri, 2005-08-26 at 21:32 +0400, Vladimir V. Saveliev wrote:
> Yes. On mount reiserfs reads all bitmap blocks to memory. Those blocks are spread over whole disk. There is workaround for this problem.

One more question about these bitmap blocks:

Is this bitmap data pinned in memory, so that it will not be swapped out?

Thanks!

Ming

* Re: reiser fs slow on mksf and mount
From: Jeff Mahoney @ 2005-08-27 19:29 UTC
To: mingz; +Cc: Vladimir V. Saveliev, reiserfs-list

Ming Zhang wrote:
> On Fri, 2005-08-26 at 21:32 +0400, Vladimir V. Saveliev wrote:
>
> one more question about this bitmap blocks
>
> are this bitmap data is pinned into system thus will not be swapped out?

Yes, any buffers/pages with active reference counts are kept in memory.
Since the current reiserfs bitmap implementation keeps a reference until
filesystem umount, the bitmaps are pinned.

My dynamic bitmap patch fixes both of the problems you've posed so far.
Mount time is reduced to O(1) time, since only the superblock and root
node are read at mount time. On my system, it's something along the
lines of 0.2s. Memory consumption is reduced also, because the bitmap
block is released after the allocation/free that required it is complete.

It's a relatively straightforward patch - the error handling I refer to
is how to handle block read failures, which would only occur if your
disk is failing.

-Jeff

--
Jeff Mahoney
SuSE Labs
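
The difference between bitmaps pinned at mount and bitmaps read on demand can be illustrated with a toy model. This is not the patch discussed above and does not use any reiserfs code: `FakeDisk`, `PinnedBitmaps`, and `OnDemandBitmaps` are hypothetical names invented for this sketch, and the small bitmap count only keeps the demo light.

```python
BLOCK_SIZE = 4096


class FakeDisk:
    """Hypothetical block device that counts bitmap block reads."""

    def __init__(self):
        self.stored = {}   # bitmap index -> bytes written back so far
        self.reads = 0

    def read_bitmap_block(self, index):
        self.reads += 1
        return bytearray(self.stored.get(index, bytes(BLOCK_SIZE)))

    def write_bitmap_block(self, index, block):
        self.stored[index] = bytes(block)


def set_bit(block, bit):
    """Mark one filesystem block as used in a bitmap block."""
    block[bit // 8] |= 1 << (bit % 8)


class PinnedBitmaps:
    """Reads every bitmap block at mount and keeps all of them until umount."""

    def __init__(self, disk, bitmap_count):
        self.blocks = [disk.read_bitmap_block(i) for i in range(bitmap_count)]

    def mark_used(self, bitmap_index, bit):
        set_bit(self.blocks[bitmap_index], bit)

    def resident_bytes(self):
        return len(self.blocks) * BLOCK_SIZE


class OnDemandBitmaps:
    """Reads nothing at mount; fetches a bitmap block only when it is needed."""

    def __init__(self, disk, bitmap_count):
        self.disk = disk
        self.bitmap_count = bitmap_count

    def mark_used(self, bitmap_index, bit):
        block = self.disk.read_bitmap_block(bitmap_index)
        set_bit(block, bit)
        # The block is written back and the reference dropped right away.
        self.disk.write_bitmap_block(bitmap_index, block)

    def resident_bytes(self):
        return 0


BITMAP_COUNT = 1000   # illustrative; far smaller than a multi-TB device needs
pinned_disk, lazy_disk = FakeDisk(), FakeDisk()
pinned = PinnedBitmaps(pinned_disk, BITMAP_COUNT)
lazy = OnDemandBitmaps(lazy_disk, BITMAP_COUNT)
reads_at_mount = (pinned_disk.reads, lazy_disk.reads)

pinned.mark_used(5, 100)
lazy.mark_used(5, 100)

print("reads at mount (pinned, on demand):", reads_at_mount)
print("resident bytes (pinned, on demand):", pinned.resident_bytes(), lazy.resident_bytes())
```

In this model the pinned variant pays one read per bitmap block at mount and holds all of them in memory, while the on-demand variant pays one read per allocation instead. A real implementation would rely on the kernel's buffer cache rather than re-reading from disk every time.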

* Re: reiser fs slow on mksf and mount
From: Christian Iversen @ 2005-08-27 21:45 UTC
To: reiserfs-list

On Saturday 27 August 2005 21:29, Jeff Mahoney wrote:
> Yes, any buffers/pages with active reference counts are kept in memory.
> Since the current reiserfs bitmap implementation keeps a reference until
> filesystem umount, the bitmaps are pinned.
>
> My dynamic bitmap patch fixes both of the problems you've posed so far.
> Mount time is reduced to O(1) time, since only the superblock and root
> node are read at mount time. On my system, it's something along the
> lines of 0.2s. Memory consumption is reduced also, because the bitmap
> block is released after the allocation/free that required it is complete.

I've been reading about this patch with quite some interest. Would you say
it's stable enough for daily use? I have a terabyte array that takes forever
to mount, and probably uses quite a bit of memory too.

Another thing is that it can easily take several seconds to do "ls -l" on a
directory with 0-10 GB of data in it. Is that normal? There are usually fewer
than 50 files of test data, ranging in size from 200MB to 900MB. I've
disabled atime updates, but that didn't help much. The controller and disks
are plenty fast, so I feel something is amiss.

--
Regards,
Christian Iversen

* Re: reiser fs slow on mksf and mount
From: David Masover @ 2005-08-27 21:55 UTC
To: Christian Iversen; +Cc: reiserfs-list

Christian Iversen wrote:
> Another thing is that it can easily take several seconds to do "ls -l" on a
> directory with a 0-10 GB data in it. Is that normal? There's usually less
> than 50 files of test data, ranging in size from 200MB to 900MB. I've
> disabled atime updates, but that didn't help much. The controller and disks
> are plenty fast, so I feel something is amiss.

Interesting, I'd always assumed this was an issue with the lazy
allocation. On my box, this meant that occasionally, I'd run into a
situation where some random FS operation would take 5-10 seconds,
because (I assumed) it would have been the random operation that used up
enough RAM that the FS decided to flush.

* Re: reiser fs slow on mksf and mount
From: Hans Reiser @ 2005-08-29 19:44 UTC
To: David Masover; +Cc: Christian Iversen, reiserfs-list

David Masover wrote:
> Christian Iversen wrote:
>> Another thing is that it can easily take several seconds to do "ls
>> -l" on a directory with a 0-10 GB data in it. Is that normal? There's
>> usually less than 50 files of test data, ranging in size from 200MB
>> to 900MB. I've disabled atime updates, but that didn't help much. The
>> controller and disks are plenty fast, so I feel something is amiss.
>
> Interesting, I'd always assumed this was an issue with the lazy
> allocation. On my box, this meant that occasionally, I'd run into a
> situation where some random FS operation would take 5-10 seconds,
> because (I assumed) it would have been the random operation that used
> up enough RAM that the FS decided to flush.

This performance issue should be fixed in reiser4, please give it a try.
It has to do with where stat data get stored.

* Re: reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-27 22:54 UTC
To: Christian Iversen; +Cc: reiserfs-list

On Sat, 2005-08-27 at 23:45 +0200, Christian Iversen wrote:
> I've been reading about this patch with quite some interest. Would you say
> it's stable enough for daily use? I have a terabyte array that takes forever
> to mount, and probably uses quite a bit of memory too.

Yes, I think this will be a problem for a 32-bit box with a large amount of storage.

> Another thing is that it can easily take several seconds to do "ls -l" on a
> directory with a 0-10 GB data in it. Is that normal?
> [...]

* Re: reiser fs slow on mksf and mount
From: Jeff Mahoney @ 2005-08-29 15:07 UTC
To: Christian Iversen; +Cc: reiserfs-list

Christian Iversen wrote:
> On Saturday 27 August 2005 21:29, Jeff Mahoney wrote:
>>My dynamic bitmap patch fixes both of the problems you've posed so far.
>>Mount time is reduced to O(1) time, since only the superblock and root
>>node are read at mount time. On my system, it's something along the
>>lines of 0.2s. Memory consumption is reduced also, because the bitmap
>>block is released after the allocation/free that required it is complete.
>
> I've been reading about this patch with quite some interest. Would you say
> it's stable enough for daily use? I have a terabyte array that takes forever
> to mount, and probably uses quite a bit of memory too.

I've done testing with it, and it's been ok. I haven't heard any bug
reports, but I haven't heard any "hey it works" comments either. It
should be pretty stable - the only thing I'm concerned about is how it
deals with I/O errors. That part of the patch will be dependent on my
developing better i/o handling for reiserfs in general. That patch is
also mostly done, but needs more testing.

-Jeff

--
Jeff Mahoney
SuSE Labs

* Re: reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-27 22:53 UTC
To: Jeff Mahoney; +Cc: Vladimir V. Saveliev, reiserfs-list

On Sat, 2005-08-27 at 15:29 -0400, Jeff Mahoney wrote:
> Ming Zhang wrote:
> > one more question about this bitmap blocks
> >
> > are this bitmap data is pinned into system thus will not be swapped out?
>
> Yes, any buffers/pages with active reference counts are kept in memory.
> Since the current reiserfs bitmap implementation keeps a reference until
> filesystem umount, the bitmaps are pinned.

So you always keep a reference on all bitmap pages, and thus they cannot
be umounted. Yes, this way any metadata operation can be pretty fast.
But what if the available RAM cannot hold these bitmaps? Maybe you
think that if I want to use tens of TB of storage, I will of course have 32GB
of RAM. :P

> My dynamic bitmap patch fixes both of the problems you've posed so far.
> Mount time is reduced to O(1) time, since only the superblock and root
> node are read at mount time. On my system, it's something along the
> lines of 0.2s. Memory consumption is reduced also, because the bitmap
> block is released after the allocation/free that required it is complete.

I see. This is like a delayed lazy allocation.

> It's a relatively straightforward patch - the error handling I refer to
> is how to handle block read failures, which would only occur if your
> disk is failing.

Yes, understandable. Thanks!

Ming

* Re: reiser fs slow on mksf and mount
From: Jeff Mahoney @ 2005-08-28 0:01 UTC
To: mingz; +Cc: Vladimir V. Saveliev, reiserfs-list

Ming Zhang wrote:
> On Sat, 2005-08-27 at 15:29 -0400, Jeff Mahoney wrote:
>>>are this bitmap data is pinned into system thus will not be swapped out?
>> Yes, any buffers/pages with active reference counts are kept in memory.
>> Since the current reiserfs bitmap implementation keeps a reference until
>> filesystem umount, the bitmaps are pinned.
> so u always keep a reference on all bitmap pages and thus they can not
> be umounted. yes, by this way it can be pretty fast to do any meta-data
> operation. but what if current RAM can not hold these bitmap. maybe u
> think if i want to use tens of TB storage, i of course will have 32GB
> RAM. :P

That's been the argument Hans has been presenting so far. I tend to
disagree with it for several reasons:
* It used to be unheard of for huge filesystems to be accessible to
  users without high priced RAID arrays. Now, with individual disk
  capacities over 500 GB and the ease of software raid, multiple-TB
  filesystems are quite possible on a desktop machine. These machines, as
  desktops, have no need for 32 GB of RAM, but have a very real demand for
  large storage.
* We don't cache any other metadata (other than the superblock, which is
  standard practice) specially. In a mostly-reader environment, bitmaps
  would rank very low in importance for caching.

In short, we shouldn't be demanding that users of large storage also
have loads of memory for what I feel is a very shaky argument in the
first place.

>> My dynamic bitmap patch fixes both of the problems you've posed so far.
>> Mount time is reduced to O(1) time, since only the superblock and root
>> node are read at mount time. On my system, it's something along the
>> lines of 0.2s. Memory consumption is reduced also, because the bitmap
>> block is released after the allocation/free that required it is complete.
> ic. this is like a delayed lazy allocation.

If I understand you correctly, you're referring to allocate-on-flush,
which is a different idea entirely. What the dynamic bitmap patch does
is similar to what most other filesystems do -- treat the bitmaps as any
other kind of metadata is treated and read it on-demand.

Allocate-on-flush allows the filesystem to wait until the last possible
moment to allocate the space on disk, which makes performance a little
nicer, but more importantly, allows the allocator to allocate entire
chunks of a file rather than a block-at-a-time.

-Jeff

--
Jeff Mahoney
SuSE Labs
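
The distinction drawn above between block-at-a-time allocation and allocate-on-flush can be shown with a toy model. It is a sketch only, with no relation to the actual reiserfs or reiser4 allocators: `BumpAllocator`, `write_block_at_a_time`, `write_allocate_on_flush`, and `extents` are hypothetical names, and the allocator simply hands out the next free block numbers.

```python
from collections import defaultdict


class BumpAllocator:
    """Hypothetical allocator that hands out consecutive free block numbers."""

    def __init__(self):
        self.next_free = 0

    def allocate(self, count):
        start = self.next_free
        self.next_free += count
        return list(range(start, start + count))


def write_block_at_a_time(writes):
    """Allocate a disk block immediately for every block written."""
    allocator = BumpAllocator()
    layout = defaultdict(list)
    for name in writes:
        layout[name] += allocator.allocate(1)
    return dict(layout)


def write_allocate_on_flush(writes):
    """Only count dirty blocks per file while writing; allocate at flush time."""
    allocator = BumpAllocator()
    dirty = defaultdict(int)
    for name in writes:
        dirty[name] += 1
    # Flush: each file gets its whole chunk in a single allocation.
    return {name: allocator.allocate(count) for name, count in dirty.items()}


def extents(blocks):
    """Number of contiguous runs in a list of block numbers."""
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


# Two files written in an interleaved pattern, one block at a time.
writes = ["a", "b"] * 4
eager = write_block_at_a_time(writes)
delayed = write_allocate_on_flush(writes)

print("block-at-a-time:  ", eager, "extents for a:", extents(eager["a"]))
print("allocate-on-flush:", delayed, "extents for a:", extents(delayed["a"]))
```

With interleaved writers, the immediate allocator scatters each file across alternating blocks, while deferring allocation to flush time lets each file receive one contiguous run.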

* Re: reiser fs slow on mksf and mount
From: Ming Zhang @ 2005-08-28 15:40 UTC
To: Jeff Mahoney; +Cc: Vladimir V. Saveliev, reiserfs

On Sat, 2005-08-27 at 20:01 -0400, Jeff Mahoney wrote:
> That's been the argument Hans has been presenting so far. I tend to
> disagree with it for several reasons:
> * It used to be unheard of for huge filesystems to be accessible to
> users without high priced RAID arrays. Now, with individual disk
> capacities over 500 GB and the ease of software raid, multiple-TB
> filesystems are quite possible on a desktop machine. These machines, as
> desktops, have no need for 32 GB of RAM, but have a very real demand for
> large storage.

Yes, I have a 12*400GB SATA MD RAID on which I want to store my huge number of
pictures (I am not a good photographer, but a quick shooter). All these
files are named DSCxxxxxx.jpg, but the numbering is not continuous, since some
were really bad and so were deleted. So I raised these questions to stress the
scalability of the fs.

> * We don't cache any other metadata (other than the superblock, which is
> standard practice) specially. In a mostly-reader environment, bitmaps
> would rank very low in importance for caching.

Could you explain a bit more about the purpose of these bitmaps? What
is the relationship between these bitmaps and other metadata?

> In short, we shouldn't be demanding that users of large storage also
> have loads of memory for what I feel is a very shaky argument in the
> first place.

Assume I have 2GB or 4GB of RAM, which is not unbelievable for a desktop
now. Can this RAM be used by a 32-bit arch?

> >> My dynamic bitmap patch fixes both of the problems you've posed so far.
> >> Mount time is reduced to O(1) time, since only the superblock and root
> >> node are read at mount time. On my system, it's something along the
> >> lines of 0.2s. Memory consumption is reduced also, because the bitmap
> >> block is released after the allocation/free that required it is complete.
> > ic. this is like a delayed lazy allocation.
>
> If I understand you correctly, you're referring to allocate-on-flush,
> which is a different idea entirely. What the dynamic bitmap patch does
> is similar to what most other filesystems do -- treat the bitmaps as any
> other kind of metadata is treated and read it on-demand.

Yes, sorry, it is not lazy allocation. It is load (into RAM) on demand, or
lazy load.

> Allocate-on-flush allows the filesystem to wait until the last possible
> moment to allocate the space on disk, which makes performance a little
> nicer, but more importantly, allows the allocator to allocate entire
> chunks of a file rather than a block-at-a-time.

Are you talking about allocating space after the file content is cached
in RAM and before it needs to be flushed? That is then like a write-any file
system, where you can write to a place that is still contiguous and near
the current disk head position (though the latter is hard to achieve, since it is
hidden by LVM/MD/...).
* Re: reiser fs slow on mksf and mount
2005-08-28 15:40 ` Ming Zhang
@ 2005-08-28 18:44 ` Jeff Mahoney
2005-08-29 12:39 ` Ming Zhang
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Mahoney @ 2005-08-28 18:44 UTC (permalink / raw)
To: mingz; +Cc: Vladimir V. Saveliev, reiserfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ming Zhang wrote:
> On Sat, 2005-08-27 at 20:01 -0400, Jeff Mahoney wrote:
>
>> yes, i have a 12*400GB SATA MD raid that want to store my huge number of
>> pictures (i am not a good photographer, but a quick shooter.) all these
>> files are named as DSCxxxxxx.jpg but not continuous since some are
>> really bad so deleted. so i jump out these questions to stress the
>> scalability of fs.

ReiserFS is very well suited for lots of files, so you should be all set
in that respect.

> * We don't cache any other metadata (other than the superblock, which is
> standard practice) specially. In a mostly-reader environment, bitmaps
> would rank very low in importance for caching.
>
>> could u explain a bit more on what is the purpose of these bitmaps? what
>> is the relationship between these bitmap and other metadata?

The bitmaps are used to keep track of which blocks on disk are used, and
which are available for allocation. Every (blocksize * 8) blocks, there
is a block reserved to keep track of which blocks in that range are
allocated or not. On a 4k block filesystem, that boils down to 1 4k
block for every 128 MB. If a block is used, the bit corresponding to it
is set. When the block is freed, the bit is cleared.

Well, there are several kinds of metadata on the filesystem: The super
block, the bitmaps, the journal, and the reiserfs s-tree itself. The
journal and bitmaps are only used when writing to the filesystem. The
superblock and s-tree are used for any filesystem access. The
relationship is that before a file data block or an s-tree node can be
allocated on disk, the bitmaps must be checked to see where the block
can be allocated.

>> assumed i have 2GB or 4GB ram, which is not unbelievable for a desktop
>> now. but can these RAM be used by 32BIT arch?

The RAM can be used, sure, but not for the bitmaps. I believe the buffer
heads for the bitmaps need to come out of the memory < 1 GB. It would be
possible to put the bitmaps in high memory (like any other data), but
the patch to do so would likely be more involved than the dynamic bitmap
patch, and still waste the memory anyway.

> Allocate-on-flush allows the filesystem to wait until the last possible
> moment to allocate the space on disk, which makes performance a little
> nicer, but more importantly, allows the allocator to allocate entire
> chunks of a file rather than a block-at-a-time.
>
>> are u talking about allocating space after that file content is cached
>> in RAM and before need to be flushed? this is then like a write-any file
>> system that you can write to a place where it is still continuous and
>> near to current disk head (though latter is hard to achieve since it is
>> hidden by LVM/MD/...).

Well those are two different issues. Allocate on flush would try to keep
the file as contiguous as possible, whether by appending (ideal) or by
keeping the new chunk of data all together as a separate fragment rather
than individual blocks scattered everywhere. As for writing near the
current disk head, that is an operation that is performed by the block
layer. It can make the best decisions on that, since it is at the
lowest level of abstraction. It's entirely possible that a filesystem be
mounted via file-loopback on an NFS mount. In that case, the local
system has no information at all about where the disk head would be.

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDEgYNLPWxlyuTD7IRAlkgAKCM8evk+X3FSAw9IzEbeRKyo+N2tgCffyNi
yNcc2G2Uy09X5zMI97AKaJc=
=UzK+
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount 2005-08-28 18:44 ` Jeff Mahoney @ 2005-08-29 12:39 ` Ming Zhang 2005-08-29 14:26 ` Jeff Mahoney 0 siblings, 1 reply; 28+ messages in thread From: Ming Zhang @ 2005-08-29 12:39 UTC (permalink / raw) To: Jeff Mahoney; +Cc: Vladimir V. Saveliev, reiserfs On Sun, 2005-08-28 at 14:44 -0400, Jeff Mahoney wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Ming Zhang wrote: > > On Sat, 2005-08-27 at 20:01 -0400, Jeff Mahoney wrote: > > > >> yes, i have a 12*400GB SATA MD raid that want to store my huge number of > >> pictures (i am not a good photographer, but a quick shooter.) all these > >> files are named as DSCxxxxxx.jpg but not continuous since some are > >> really bad so deleted. so i jump out these questions to stress the > >> scalability of fs. > > ReiserFS is very well suited for lots of files, so you should be all set > in that respect. ic. thx. > > > * We don't cache any other metadata (other than the superblock, which is > > standard practice) specially. In a mostly-reader environment, bitmaps > > would rank very low in importance for caching. > > > >> could u explain a bit more on what is the purpose of these bitmaps? what > >> is the relationship between these bitmap and other metadata? > > The bitmaps are used to keep track of which blocks on disk are used, and > which are available for allocation. Every (blocksize * 8) blocks, there here blocksize is 512bytes right from followed data? this comes from sector size? so what is the on disk layout? i asked this because when i have a slow mount reiserfs on top of RAID1, I saw many small write each second. I guess they scatter over whole disk. > is a block reserved to keep track of which blocks in that range are > allocated or not. On a 4k block filesystem, that boils down to 1 4k > block for every 128 MB. If a block is used, the bit corresponding to it > is set. When the block is freed, the bit is cleared. 
> > Well there are a several kinds of metadata on the filesystem: The super > block, the bitmaps, the journal, and the reiserfs s-tree itself. The > journal and bitmaps are only used when writing to the filesystem. The > superblock and s-tree are used for any filesystem access. The > relationship is that before a file data block or an s-tree node can be > allocated on disk, the bitmaps must be checked to see where the block > can be allocated. ic. so other meta-data is checked as other file systems. > > >> assumed i have 2GB or 4GB ram, which is not unbelievable for a desktop > >> now. but can these RAM be used by 32BIT arch? > > The RAM can be used, sure, but not for the bitmaps. I believe the buffer > heads for the bitmaps need to come out of the memory < 1 GB. It would be > possible to put the bitmaps in high memory (like any other data), but > the patch to do so would likely be more involved than the dynamic bitmap > patch, and still waste the memory anyway. yes, i also suspect this 1GB limit. So 64bit is the way and AMD64 is cheap anyway rite? > > > Allocate-on-flush allows the filesystem to wait until the last possible > > moment to allocate the space on disk, which makes performance a little > > nicer, but more importantly, allows the allocator to allocate entire > > chunks of a file rather than a block-at-a-time. > > > >> are u talking about allocating space after that file content is cached > >> in RAM and before need to be flushed? this is then like a write-any file > >> system that you can write to a place where it is still continuous and > >> near to current disk head (though latter is hard to achieve since it is > >> hidden by LVM/MD/...). > > Well those are two different issues. Allocate on flush would try to keep > the file as contiguous as possible, whether by appending (ideal) or by > keeping the new chunk of data all together as a separate fragment rather > than individual blocks scattered everywhere. As for writing near the ic. 
yes, delay to that point will have best knowledge. > current disk head, that is an operation that is performed by the block > layer. It can make the best decisions on that, since it its at the > lowest level of abstraction. It's entirely possible that a filesystem be > mounted via file-loopback on an NFS mount. In that case, the local > system has no information at all about where the disk head would be. yes, but then block layer will need another bitmap to track which block is used or not and also do a mapping again... the cost of layering? ming > > - -Jeff > > - -- > Jeff Mahoney > SuSE Labs > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.0 (GNU/Linux) > > iD8DBQFDEgYNLPWxlyuTD7IRAlkgAKCM8evk+X3FSAw9IzEbeRKyo+N2tgCffyNi > yNcc2G2Uy09X5zMI97AKaJc= > =UzK+ > -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount
2005-08-29 12:39 ` Ming Zhang
@ 2005-08-29 14:26 ` Jeff Mahoney
2005-08-29 14:41 ` Ming Zhang
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Mahoney @ 2005-08-29 14:26 UTC (permalink / raw)
To: Ming Zhang; +Cc: Vladimir V. Saveliev, reiserfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ming Zhang wrote:
> On Sun, 2005-08-28 at 14:44 -0400, Jeff Mahoney wrote:
>>* We don't cache any other metadata (other than the superblock, which is
>>standard practice) specially. In a mostly-reader environment, bitmaps
>>would rank very low in importance for caching.
>
>>>could u explain a bit more on what is the purpose of these bitmaps? what
>>>is the relationship between these bitmap and other metadata?
> The bitmaps are used to keep track of which blocks on disk are used, and
> which are available for allocation. Every (blocksize * 8) blocks, there
>
>> here blocksize is 512bytes right from followed data? this comes from
>> sector size?

No. Block size is the declared filesystem blocksize, not the hardware
sector size. It must be a power of 2, and 512-8192 bytes. The "standard"
filesystem blocksize is 4k. If you've declared your block size as 512
bytes (using mkreiserfs -b 512), that would certainly be another source
of performance issues.

>> so what is the on disk layout? i asked this because when i have a slow
>> mount reiserfs on top of RAID1, I saw many small write each second. I
>> guess they scatter over whole disk.

Well two things occur on mount: Reading the bitmaps causes a read every
128M to occur, and replaying the journal can cause up to 8192 block
writes to occur. Replaying the journal is generally pretty quick.
Reading the bitmaps on a large filesystem can take a while. This is the
issue you originally asked about.

> is a block reserved to keep track of which blocks in that range are
> allocated or not. On a 4k block filesystem, that boils down to 1 4k
> block for every 128 MB. If a block is used, the bit corresponding to it
> is set. When the block is freed, the bit is cleared.
>
> Well, there are several kinds of metadata on the filesystem: The super
> block, the bitmaps, the journal, and the reiserfs s-tree itself. The
> journal and bitmaps are only used when writing to the filesystem. The
> superblock and s-tree are used for any filesystem access. The
> relationship is that before a file data block or an s-tree node can be
> allocated on disk, the bitmaps must be checked to see where the block
> can be allocated.
>
>> ic. so other meta-data is checked as other file systems.

No. The bitmaps and journal are still part of the same filesystem. They
are just not part of the s-tree.

>>>assumed i have 2GB or 4GB ram, which is not unbelievable for a desktop
>>>now. but can these RAM be used by 32BIT arch?
> The RAM can be used, sure, but not for the bitmaps. I believe the buffer
> heads for the bitmaps need to come out of the memory < 1 GB. It would be
> possible to put the bitmaps in high memory (like any other data), but
> the patch to do so would likely be more involved than the dynamic bitmap
> patch, and still waste the memory anyway.
>
>> yes, i also suspect this 1GB limit. So 64bit is the way and AMD64 is
>> cheap anyway rite?

Personally, I think so.

> current disk head, that is an operation that is performed by the block
> layer. It can make the best decisions on that, since it is at the
> lowest level of abstraction. It's entirely possible that a filesystem be
> mounted via file-loopback on an NFS mount. In that case, the local
> system has no information at all about where the disk head would be.
>
>> yes, but then block layer will need another bitmap to track which block
>> is used or not and also do a mapping again...
>
>> the cost of layering?

The ideas of "in use" and "available" are purely filesystem abstractions
to keep track of where we already have filesystem data/metadata. The
block layer doesn't know or care about them - it's just a collection of
blocks that the user may do whatever they please with. Now, not to
confuse the issue, but the example of a loopback-mounted filesystem can
cause an allocation if the host file is sparse, but that's really a
corner case.

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDExsrLPWxlyuTD7IRAvGmAJ9QU16I2oz/kkCbqwdeGcIgkey8TgCgqS8s
lI6YzJEJ20j5LiheAqw6eoE=
=YD9V
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 28+ messages in thread
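[Editorial sketch, not part of the original message.] The bitmap arithmetic described above (one bit per block, so one bitmap block per blocksize * 8 blocks, with block sizes limited to powers of 2 from 512 to 8192 bytes) can be worked through for the two devices reported in this thread. The function name `bitmap_geometry` is hypothetical and exists only for this illustration; the block counts are taken from the mkfs.reiserfs output quoted in the thread.

```python
def bitmap_geometry(total_blocks, block_size=4096):
    """Return (blocks tracked per bitmap block, bitmap block count, bitmap bytes).

    Hypothetical helper for illustration; it only restates the arithmetic
    given in the thread and is not taken from reiserfsprogs or the kernel.
    """
    # Constraint quoted in the thread: a power of 2 between 512 and 8192 bytes.
    is_power_of_two = (block_size & (block_size - 1)) == 0
    if not (512 <= block_size <= 8192 and is_power_of_two):
        raise ValueError("block size must be a power of 2 between 512 and 8192")
    blocks_per_bitmap = block_size * 8  # one bit per tracked block
    bitmap_blocks = -(-total_blocks // blocks_per_bitmap)  # ceiling division
    return blocks_per_bitmap, bitmap_blocks, bitmap_blocks * block_size


# Block counts as printed by mkfs.reiserfs in this thread (4096-byte blocks).
for name, blocks in (("/dev/md0", 781422592), ("/dev/sdh1", 4421872)):
    per_bitmap, count, nbytes = bitmap_geometry(blocks)
    covered_mib = per_bitmap * 4096 // 2**20
    print(f"{name}: {count} bitmap blocks ({nbytes / 2**20:.1f} MiB), "
          f"one per {covered_mib} MiB of disk")
```

Under these assumptions the large array needs 23848 bitmap blocks, roughly 93 MiB held in memory while the filesystem is mounted, and the 18GB disk needs 135.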
* Re: reiser fs slow on mksf and mount 2005-08-29 14:26 ` Jeff Mahoney @ 2005-08-29 14:41 ` Ming Zhang 2005-08-29 14:51 ` Jeff Mahoney 0 siblings, 1 reply; 28+ messages in thread From: Ming Zhang @ 2005-08-29 14:41 UTC (permalink / raw) To: Jeff Mahoney; +Cc: Vladimir V. Saveliev, reiserfs On Mon, 2005-08-29 at 10:26 -0400, Jeff Mahoney wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Ming Zhang wrote: > > On Sun, 2005-08-28 at 14:44 -0400, Jeff Mahoney wrote: > >>* We don't cache any other metadata (other than the superblock, which is > >>standard practice) specially. In a mostly-reader environment, bitmaps > >>would rank very low in importance for caching. > > > >>>could u explain a bit more on what is the purpose of these bitmaps? what > >>>is the relationship between these bitmap and other metadata? > > The bitmaps are used to keep track of which blocks on disk are used, and > > which are available for allocation. Every (blocksize * 8) blocks, there > > > >> here blocksize is 512bytes right from followed data? this comes from > >> sector size? > > No. Block size is the declared filesystem blocksize, not the hardware > sector size. It must be a power of 2, and 512-8192 bytes. The "standard" > filesystem blocksize is 4k. If you've declared your block size as 512 > bytes (using mkreiserfs -b 512), that would certainly be another source > of performance issues. so 1 block per bit, thus (blocksize * 8) block per block. > > >> so what is the on disk layout? i asked this because when i have a slow > >> mount reiserfs on top of RAID1, I saw many small write each second. I > >> guess they scatter over whole disk. > > Well two things occur on mount: Reading the bitmaps causes a read every > 128M to occur, and replaying the journal can cause up to 8192 block > writes to occur. Replaying the journal is generally pretty quick. > Reading the bitmaps on a large filesystem can take a while. This is the > issue you originally asked about. 
since that is a newly formatted fs, there is no journal to replay. because that FS is big with 3.2TB, if bitmap is not continuous on disks, then the read is like a random read to read around total ~100MB 4K piece from disk. so this is why it is slow? any way to store these bitmap together? > > > is a block reserved to keep track of which blocks in that range are > > allocated or not. On a 4k block filesystem, that boils down to 1 4k > > block for every 128 MB. If a block is used, the bit corresponding to it > > is set. When the block is freed, the bit is cleared. > > > > Well there are a several kinds of metadata on the filesystem: The super > > block, the bitmaps, the journal, and the reiserfs s-tree itself. The > > journal and bitmaps are only used when writing to the filesystem. The > > superblock and s-tree are used for any filesystem access. The > > relationship is that before a file data block or an s-tree node can be > > allocated on disk, the bitmaps must be checked to see where the block > > can be allocated. > > > >> ic. so other meta-data is checked as other file systems. > > No. The bitmaps and journal are still part of the same filesystem. They > are just not part of the s-tree. yes. sorry i should say that file system still use s-tree to locate file data while bitmap is to assist the block allocation and journal is for consistency. > > >>>assumed i have 2GB or 4GB ram, which is not unbelievable for a desktop > >>>now. but can these RAM be used by 32BIT arch? > > The RAM can be used, sure, but not for the bitmaps. I believe the buffer > > heads for the bitmaps need to come out of the memory < 1 GB. It would be > > possible to put the bitmaps in high memory (like any other data), but > > the patch to do so would likely be more involved than the dynamic bitmap > > patch, and still waste the memory anyway. > > > >> yes, i also suspect this 1GB limit. So 64bit is the way and AMD64 is > >> cheap anyway rite? > > Personally, I think so. 
> > > current disk head, that is an operation that is performed by the block > > layer. It can make the best decisions on that, since it its at the > > lowest level of abstraction. It's entirely possible that a filesystem be > > mounted via file-loopback on an NFS mount. In that case, the local > > system has no information at all about where the disk head would be. > > > >> yes, but then block layer will need another bitmap to track which block > >> is used or not and also do a mapping again... > > > >> the cost of layering? > > The ideas of "in use" and "available" are purely filesystem abstractions > to keep track of where we already have filesystem data/metadata. The > block layer doesn't know or care about them - it's just a collection of > blocks that the user may do whatever they please with. Now, not to > confuse the issue, but the example of a loopback-mounted filesystem can > cause an allocation if the host file is sparse, but that's really a > corner case. > yes, that is cost worthy being paid. file system just need a set of blocks to working on... > - -Jeff > > - -- > Jeff Mahoney > SuSE Labs > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.0 (GNU/Linux) > > iD8DBQFDExsrLPWxlyuTD7IRAvGmAJ9QU16I2oz/kkCbqwdeGcIgkey8TgCgqS8s > lI6YzJEJ20j5LiheAqw6eoE= > =YD9V > -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount
2005-08-29 14:41 ` Ming Zhang
@ 2005-08-29 14:51 ` Jeff Mahoney
2005-08-29 15:20 ` Ming Zhang
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Mahoney @ 2005-08-29 14:51 UTC (permalink / raw)
To: mingz; +Cc: Vladimir V. Saveliev, reiserfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ming Zhang wrote:
> On Mon, 2005-08-29 at 10:26 -0400, Jeff Mahoney wrote:
> No. Block size is the declared filesystem blocksize, not the hardware
> sector size. It must be a power of 2, and 512-8192 bytes. The "standard"
> filesystem blocksize is 4k. If you've declared your block size as 512
> bytes (using mkreiserfs -b 512), that would certainly be another source
> of performance issues.
>
>> so 1 block per bit, thus (blocksize * 8) block per block.

Exactly.

>> since that is a newly formatted fs, there is no journal to replay.
>> because that FS is big with 3.2TB, if bitmap is not continuous on disks,
>> then the read is like a random read to read around total ~100MB 4K piece
>> from disk. so this is why it is slow?

I need to look into this some more, but I suspect it may be related to
congestion avoidance. The requests don't bind up in waiting for the data
to come back, but, rather, allocating the request in the first place.

>> any way to store these bitmap together?

The "old" reiserfs disk format did exactly that. However, the gain
realized (if any, see above) at mount time is quickly lost when the
filesystem can no longer be dynamically expanded/shrunk, and if the
bitmaps are actually read on-demand, then it causes needless seeks to
the "bitmap section."

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDEyDlLPWxlyuTD7IRArjZAJoCxQCJ8Qs4AM1OQZEJIhz1BvYwDQCeIRk+
VvRxXcyH1puW2vq1xDYygL0=
=FVcM
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 28+ messages in thread
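[Editorial sketch, not part of the original message.] To make the mount-time read pattern discussed here concrete, the following lists where the bitmap blocks would sit if they are spread one per (blocksize * 8) blocks, as the thread says ("a read every 128M"). Two things are assumptions not stated in the thread: that the superblock sits at byte offset 65536 with the first bitmap block immediately after it, and that every later bitmap block sits at the first block of its group. The function name `bitmap_block_numbers` is hypothetical.

```python
def bitmap_block_numbers(total_blocks, block_size=4096):
    """Block numbers of the bitmap blocks under the assumed spread layout."""
    blocks_per_bitmap = block_size * 8
    count = -(-total_blocks // blocks_per_bitmap)  # ceiling division
    # Assumption: superblock at byte 65536, first bitmap block right after it.
    first_bitmap = 65536 // block_size + 1
    # Assumption: each later bitmap block is the first block of its group.
    return [first_bitmap if i == 0 else i * blocks_per_bitmap
            for i in range(count)]


blocks = bitmap_block_numbers(781422592)       # /dev/md0 from this thread
offsets = [b * 4096 for b in blocks]           # byte offsets of each read
gaps = {later - earlier for earlier, later in zip(offsets[1:], offsets[2:])}

print("bitmap reads at mount:", len(blocks))
print("first block numbers:", blocks[:4])
print("distinct gaps between reads (MiB):", sorted(g // 2**20 for g in gaps))
```

If the layout assumptions hold, mounting the 3.2TB array means tens of thousands of separate 4 KiB reads spaced 128 MiB apart, which is the "random read" pattern Ming describes rather than one sequential ~100MB read.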
* Re: reiser fs slow on mksf and mount 2005-08-29 14:51 ` Jeff Mahoney @ 2005-08-29 15:20 ` Ming Zhang 2005-08-29 15:28 ` Jeff Mahoney 0 siblings, 1 reply; 28+ messages in thread From: Ming Zhang @ 2005-08-29 15:20 UTC (permalink / raw) To: Jeff Mahoney; +Cc: Vladimir V. Saveliev, reiserfs On Mon, 2005-08-29 at 10:51 -0400, Jeff Mahoney wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Ming Zhang wrote: > > On Mon, 2005-08-29 at 10:26 -0400, Jeff Mahoney wrote: > > No. Block size is the declared filesystem blocksize, not the hardware > > sector size. It must be a power of 2, and 512-8192 bytes. The "standard" > > filesystem blocksize is 4k. If you've declared your block size as 512 > > bytes (using mkreiserfs -b 512), that would certainly be another source > > of performance issues. > > > >> so 1 block per bit, thus (blocksize * 8) block per block. > > Exactly. > > >> since that is a newly formatted fs, there is no journal to replay. > >> because that FS is big with 3.2TB, if bitmap is not continuous on disks, > >> then the read is like a random read to read around total ~100MB 4K piece > >> from disk. so this is why it is slow? > > I need to look into this some more, but I suspect it may be related to > congestion avoidance. The requests don't bind up in waiting for the data > to come back, but, rather, allocating the request in the first place. > > >> any way to store these bitmap together? > > The "old" reiserfs disk format did exactly that. However, the gain > realized (if any, see above) at mount time is quickly lost when the > filesystem can no longer be dynamically expanded/shrunk, and if the > bitmaps are actually read on-demand, then it causes needless seeks to > the "bitmap secion." but anyway the bitmap will not scattered all around the disk rite? so where i could find a document about this bitmap layout? also detailed information on whole file system layout? thanks. 
ming > > - -Jeff > > - -- > Jeff Mahoney > SuSE Labs > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.0 (GNU/Linux) > > iD8DBQFDEyDlLPWxlyuTD7IRArjZAJoCxQCJ8Qs4AM1OQZEJIhz1BvYwDQCeIRk+ > VvRxXcyH1puW2vq1xDYygL0= > =FVcM > -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount
2005-08-29 15:20 ` Ming Zhang
@ 2005-08-29 15:28 ` Jeff Mahoney
2005-08-29 15:37 ` Ming Zhang
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Mahoney @ 2005-08-29 15:28 UTC (permalink / raw)
To: mingz; +Cc: Vladimir V. Saveliev, reiserfs, Hans Reiser

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ming Zhang wrote:
>>>any way to store these bitmap together?
> The "old" reiserfs disk format did exactly that. However, the gain
> realized (if any, see above) at mount time is quickly lost when the
> filesystem can no longer be dynamically expanded/shrunk, and if the
> bitmaps are actually read on-demand, then it causes needless seeks to
> the "bitmap section."
>
>> but anyway the bitmap will not scattered all around the disk rite?

If they were grouped together, no. As I said, though, there are other
reasons not to do that.

>> so where i could find a document about this bitmap layout? also detailed
>> information on whole file system layout?

Since Reiser4 became the focus of Namesys development, Reiser3
information has been somewhat difficult to find on the web site. It's
possible to find it using archive.org, however.

Hans - would you consider restoring reiser3 information to the namesys
web site?

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDEymZLPWxlyuTD7IRAv5wAJ9coGB6bChWSLyK1x7lB1LF2E4mIwCfUsEN
4nZnXzED8ytA5jiTkv0LbcE=
=Tocq
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount 2005-08-29 15:28 ` Jeff Mahoney @ 2005-08-29 15:37 ` Ming Zhang 0 siblings, 0 replies; 28+ messages in thread From: Ming Zhang @ 2005-08-29 15:37 UTC (permalink / raw) To: Jeff Mahoney; +Cc: Vladimir V. Saveliev, reiserfs, Hans Reiser On Mon, 2005-08-29 at 11:28 -0400, Jeff Mahoney wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Ming Zhang wrote: > >>>any way to store these bitmap together? > > The "old" reiserfs disk format did exactly that. However, the gain > > realized (if any, see above) at mount time is quickly lost when the > > filesystem can no longer be dynamically expanded/shrunk, and if the > > bitmaps are actually read on-demand, then it causes needless seeks to > > the "bitmap secion." > > > >> but anyway the bitmap will not scattered all around the disk rite? > > If they were grouped together, no. As I said, though, there are other > reasons not to do that. > > >> so where i could find a document about this bitmap layout? also detailed > >> information on whole file system layout? > > Since Reiser4 became the focus of Namesys development, Reiser3 > information has been somewhat difficult to find on the web site. It's > possible to find it using archive.org, however. > > Hans - would you consider restoring reiser3 information to the namesys > web site? V4 information is great as well. but the namesys.com does not have detailed info. ming > > - -Jeff > > - -- > Jeff Mahoney > SuSE Labs > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.0 (GNU/Linux) > > iD8DBQFDEymZLPWxlyuTD7IRAv5wAJ9coGB6bChWSLyK1x7lB1LF2E4mIwCfUsEN > 4nZnXzED8ytA5jiTkv0LbcE= > =Tocq > -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount
2005-08-27 19:29 ` Jeff Mahoney
2005-08-27 21:45 ` Christian Iversen
2005-08-27 22:53 ` Ming Zhang
@ 2005-08-29 19:40 ` Hans Reiser
2005-08-29 19:44 ` Jeff Mahoney
2 siblings, 1 reply; 28+ messages in thread
From: Hans Reiser @ 2005-08-29 19:40 UTC (permalink / raw)
To: Jeff Mahoney; +Cc: mingz, Vladimir V. Saveliev, reiserfs-list

Did you ever look into my question about device congestion, and whether
raising that limit would fix the bitmap loading time issue?

Hans

Jeff Mahoney wrote:
> Ming Zhang wrote:
>
> >On Fri, 2005-08-26 at 21:32 +0400, Vladimir V. Saveliev wrote:
> >
> >one more question about this bitmap blocks
>
> >are this bitmap data is pinned into system thus will not be swapped out?
>
>
> Yes, any buffers/pages with active reference counts are kept in memory.
> Since the current reiserfs bitmap implementation keeps a reference until
> filesystem umount, the bitmaps are pinned.
>
> My dynamic bitmap patch fixes both of the problems you've posed so far.
> Mount time is reduced to O(1) time, since only the superblock and root
> node are read at mount time. On my system, it's something along the
> lines of 0.2s. Memory consumption is reduced also, because the bitmap
> block is released after the allocation/free that required it is complete.
>
> It's a relatively straightforward patch - the error handling I refer to
> is how to handle block read failures, which would only occur if your
> disk is failing.
>
> -Jeff
>
> --
> Jeff Mahoney
> SuSE Labs
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount
2005-08-29 19:40 ` Hans Reiser
@ 2005-08-29 19:44 ` Jeff Mahoney
2005-08-29 19:53 ` Hans Reiser
0 siblings, 1 reply; 28+ messages in thread
From: Jeff Mahoney @ 2005-08-29 19:44 UTC (permalink / raw)
To: Hans Reiser; +Cc: mingz, Vladimir V. Saveliev, reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hans Reiser wrote:
> Did you ever look into my question about device congestion, and whether
> raising that limit would fix the bitmap loading time issue?

I haven't really had the time recently. I'll look into it, but it
doesn't change the fact that we're wasting RAM on huge filesystems.

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFDE2WqLPWxlyuTD7IRAt8SAJ0b9JGq+vLlc4OXB+RSQUJsJIsfigCdFPVh
6UHE2ZO/z2cdXu/sQrkEYSY=
=Gvd6
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: reiser fs slow on mksf and mount
2005-08-29 19:44 ` Jeff Mahoney
@ 2005-08-29 19:53 ` Hans Reiser
0 siblings, 0 replies; 28+ messages in thread
From: Hans Reiser @ 2005-08-29 19:53 UTC (permalink / raw)
To: Jeff Mahoney; +Cc: mingz, Vladimir V. Saveliev, reiserfs-list, Nate Diller

Jeff Mahoney wrote:
> Hans Reiser wrote:
>
> >Did you ever look into my question about device congestion, and whether
> >raising that limit would fix the bitmap loading time issue?
>
>
> I haven't really had the time recently. I'll look into it, but it
> doesn't change the fact that we're wasting RAM on huge filesystems.
>
> -Jeff
>
> --
> Jeff Mahoney
> SuSE Labs

Do you understand the argument, namely that it does not do any good to
have N spindles if the device congestion limit prevents them from going
in parallel?

An easy way to test this would be to see if striped devices mount faster
than concatenated ones. If yes, then there is crap io scheduler and/or
raid device driver code to fix, and it matters more than the original
issue at question.
^ permalink raw reply [flat|nested] 28+ messages in thread
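[Editorial sketch, not part of the original message.] One rough way to reason about the "N spindles in parallel" question is to map the bitmap read offsets onto the members of the array reported at the start of the thread (8 disks, 64k chunks). This is a deliberately simplified model and an assumption throughout: it treats the array as a single zone of equal-sized members where chunk number modulo member count selects the disk, and it assumes reads fall at multiples of 128 MiB. The name `raid0_member` is hypothetical, and nothing here was measured on the reported machine.

```python
from collections import Counter


def raid0_member(byte_offset, chunk_size=64 * 1024, members=8):
    """Member index holding byte_offset in a simplified one-zone RAID0 model."""
    return (byte_offset // chunk_size) % members


# One bitmap block per (4096 * 8) blocks of 4096 bytes, i.e. every 128 MiB.
group_bytes = 4096 * 8 * 4096

# Which member would serve each of the first 1000 spread bitmap reads?
hits = Counter(raid0_member(k * group_bytes) for k in range(1, 1001))
print(hits)
```

In this model every 128 MiB boundary falls on the same member, because 128 MiB divided by the 64 KiB chunk size is a multiple of 8, so striping alone would not spread these particular reads across spindles. Whether the real md layout and I/O path behave this way is exactly what the striped-versus-concatenated comparison suggested above would show.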
* Re: reiser fs slow on mksf and mount
2005-08-26 17:04 ` Vladimir V. Saveliev
2005-08-26 17:08 ` Ming Zhang
@ 2005-08-29 16:44 ` Ming Zhang
1 sibling, 0 replies; 28+ messages in thread
From: Ming Zhang @ 2005-08-29 16:44 UTC
To: Vladimir V. Saveliev; +Cc: reiserfs
I just reproduced this on an 18GB SCSI disk. Mount is still slow, so this
is not related to RAID but only to the bitmap.
I just did modprobe aic7xxx, mkfs, then mount, so the disk should already
be spun up.
ming
--------------------------------------
[root@sc420 root]# time mkfs.reiserfs -ff /dev/sdh1
mkfs.reiserfs 3.6.13 (2003 www.namesys.com)
...
Guessing about desired format.. Kernel 2.6.12.4 is running.
Format 3.6 with standard journal
Count of blocks on the device: 4421872
Number of blocks consumed by mkreiserfs formatting process: 8346
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: b3de310a-b494-4415-a921-090d94f2f211
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok
Tell your friends to use a kernel based on 2.4.18 or later, and
especially not a
kernel based on 2.4.9, when you use reiserFS. Have fun.
ReiserFS is successfully created on /dev/sdh1.
real 0m5.018s
user 0m0.028s
sys 0m0.134s
[root@sc420 root]# time mount /dev/sdh1 t
real 1m3.608s
user 0m0.000s
sys 0m0.052s
[root@sc420 root]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda5             20161172   6417940  12719092  34% /
/dev/sda3               194449     18990    165419  11% /boot
none                    127544         0    127544   0% /dev/shm
/dev/sdh1             17686944     32840  17654104   1% /root/t
[root@sc420 root]# rpm -q reiserfs-utils
reiserfs-utils-3.6.13-1
On Fri, 2005-08-26 at 21:04 +0400, Vladimir V. Saveliev wrote:
> Hello
>
> Ming Zhang wrote:
> > Hi, folks
> >
> > I am not sure if this is normal or not.
> >
> > I try to create&use a reiserfs on a 8 disk raid0. Then I found that mkfs
> > need ~90 sec and mount need ~70 seconds.
> >
> > Is there anything wrong on my side?
> >
>
> Your device is too big.
>
> > Thanks!
> >
> >
> > Ming
> >
> >
> >
> > Detailed info followed.
> > ---------------------------------------------------------------------
> > [root@bakstor2u root]# cat /proc/mdstat
> > Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6]
> > [raid10] [faulty]
> > md0 : active raid0 sda[0] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb
> > [1]
> > 3125690368 blocks 64k chunks
> >
> > unused devices: <none>
> >
> > [root@bakstor2u root]# time mkfs.reiserfs /dev/md0 -ff
> > mkfs.reiserfs 3.6.13 (2003 www.namesys.com)
> >
> > <...>
> >
> > Guessing about desired format.. Kernel 2.6.11.12 is running.
> > Format 3.6 with standard journal
> > Count of blocks on the device: 781422592
> > Number of blocks consumed by mkreiserfs formatting process: 32059
> > Blocksize: 4096
> > Hash function used to sort names: "r5"
> > Journal Size 8193 blocks (first block 18)
> > Journal Max transaction length 1024
> > inode generation number: 0
> > UUID: 98d990f3-d54f-43e3-9fde-8c9c9a6d3481
> > Initializing journal - 0%....20%....40%....60%....80%....100%
> > Syncing..ok
> >
> > Tell your friends to use a kernel based on 2.4.18 or later, and
> > especially not a
> > kernel based on 2.4.9, when you use reiserFS. Have fun.
> >
> > ReiserFS is successfully created on /dev/md0.
> >
> > real 1m28.783s
> > user 0m0.151s
> > sys 0m0.398s
> >
>
> Hmm, mkfs.reiserfs had to write 32059 blocks. It is about 131mb. 1m28s is too much for that.
> Could it be that some of disks used in that raid were not spinning when you started mkreiserfs?
>
> > [root@bakstor2u root]# time mount /dev/md0 t
> >
> > real 1m11.448s
> > user 0m0.000s
> > sys 0m0.225s
> >
>
> There is a patch to cure this problem.
> http://www.mail-archive.com/reiserfs-list@namesys.com/msg18442.html
> Please note that it is experimental one.
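The block counts in the two mkfs transcripts can be cross-checked with a little arithmetic. The Python sketch below assumes one bitmap block per blocksize*8 filesystem blocks (one bit per block). The function names are hypothetical, and the breakdown of the "consumed" figure as first journal block + journal size + bitmap count is an assumption that happens to reproduce both reported totals. It is not taken from the mkreiserfs source.

```python
def bitmap_blocks(block_count, blocksize=4096):
    """Number of bitmap blocks, assuming one bit per filesystem block."""
    blocks_per_bitmap = blocksize * 8
    # Integer ceiling division.
    return -(-block_count // blocks_per_bitmap)


def formatting_blocks(block_count, blocksize=4096,
                      journal_first_block=18, journal_blocks=8193):
    """Assumed breakdown of the blocks consumed by formatting.

    journal_first_block and journal_blocks are the values printed in
    both transcripts ("Journal Size 8193 blocks (first block 18)").
    """
    return (journal_first_block + journal_blocks
            + bitmap_blocks(block_count, blocksize))


for name, count in (("18GB SCSI disk", 4421872),
                    ("8-disk raid0", 781422592)):
    print(name, bitmap_blocks(count), formatting_blocks(count))
# 18GB SCSI disk 135 8346
# 8-disk raid0 23848 32059
```

Both totals match the "Number of blocks consumed by mkreiserfs formatting process" lines (8346 and 32059). At 4096 bytes per block, 32059 blocks is 131,313,664 bytes, which agrees with the "about 131mb" estimate quoted above.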