From: Ming Zhang <mingz@ele.uri.edu>
To: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>,
reiserfs <reiserfs-list@namesys.com>
Subject: Re: reiser fs slow on mksf and mount
Date: Mon, 29 Aug 2005 10:41:29 -0400
Message-ID: <1125326490.5544.56.camel@localhost.localdomain>
In-Reply-To: <43131B2B.8020406@suse.com>
On Mon, 2005-08-29 at 10:26 -0400, Jeff Mahoney wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Ming Zhang wrote:
> > On Sun, 2005-08-28 at 14:44 -0400, Jeff Mahoney wrote:
> >>* We don't cache any other metadata (other than the superblock, which is
> >>standard practice) specially. In a mostly-reader environment, bitmaps
> >>would rank very low in importance for caching.
> >
> >>>could u explain a bit more on what is the purpose of these bitmaps? what
> >>>is the relationship between these bitmap and other metadata?
> > The bitmaps are used to keep track of which blocks on disk are used, and
> > which are available for allocation. Every (blocksize * 8) blocks, there
> >
> >> here blocksize is 512bytes right from followed data? this comes from
> >> sector size?
>
> No. Block size is the declared filesystem blocksize, not the hardware
> sector size. It must be a power of 2, and 512-8192 bytes. The "standard"
> filesystem blocksize is 4k. If you've declared your block size as 512
> bytes (using mkreiserfs -b 512), that would certainly be another source
> of performance issues.
so 1 bit per block, thus (blocksize * 8) blocks per bitmap block.
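
to check my understanding of the arithmetic, a small sketch in Python (the
function name is only for illustration, it is not from reiserfs):

```python
def bitmap_coverage(blocksize):
    """Return (blocks tracked, bytes tracked) by a single bitmap block."""
    # Block size limits as stated in the thread: a power of 2, 512-8192 bytes.
    if blocksize < 512 or blocksize > 8192 or blocksize & (blocksize - 1):
        raise ValueError("blocksize must be a power of 2 in 512..8192")
    blocks = blocksize * 8  # one bit per filesystem block
    return blocks, blocks * blocksize


for bs in (512, 1024, 2048, 4096, 8192):
    blocks, nbytes = bitmap_coverage(bs)
    print(f"{bs:5d}-byte blocks: {blocks:6d} blocks, "
          f"{nbytes // 2**20} MiB per bitmap block")
```

with 4k blocks this gives the 128 MB per bitmap block mentioned above; with
512-byte blocks one bitmap block would cover only 2 MiB.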
>
> >> so what is the on disk layout? i asked this because when i have a slow
> >> mount reiserfs on top of RAID1, I saw many small write each second. I
> >> guess they scatter over whole disk.
>
> Well two things occur on mount: Reading the bitmaps causes a read every
> 128M to occur, and replaying the journal can cause up to 8192 block
> writes to occur. Replaying the journal is generally pretty quick.
> Reading the bitmaps on a large filesystem can take a while. This is the
> issue you originally asked about.
since this is a newly formatted fs, there is no journal to replay.
because the FS is big (3.2TB), if the bitmaps are not contiguous on disk,
then reading them at mount is like random reads of ~100MB in total, in 4K
pieces scattered across the disk. is this why it is slow?
is there any way to store these bitmaps together?
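
a rough sketch of where the ~100MB estimate comes from, assuming 4k blocks,
one bitmap block per (blocksize * 8) blocks, and 3.2TB taken as 3.2 * 2^40
bytes (the function name is made up for this example):

```python
def bitmap_read_cost(fs_bytes, blocksize=4096):
    """Return (number of bitmap blocks, total bitmap bytes) for a filesystem."""
    covered = blocksize * 8 * blocksize      # bytes tracked by one bitmap block
    count = -(-fs_bytes // covered)          # ceiling division
    return count, count * blocksize


# Assumption: "3.2TB" is interpreted as 3.2 * 2**40 bytes.
count, total = bitmap_read_cost(int(3.2 * 2**40))
print(f"{count} bitmap blocks, {total / 2**20:.1f} MiB read in 4K pieces")
```

that is roughly 26 thousand separate 4K reads, one per 128 MB of disk, so the
seek time would dominate rather than the amount of data.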
>
> > is a block reserved to keep track of which blocks in that range are
> > allocated or not. On a 4k block filesystem, that boils down to 1 4k
> > block for every 128 MB. If a block is used, the bit corresponding to it
> > is set. When the block is freed, the bit is cleared.
> >
> > Well there are several kinds of metadata on the filesystem: The super
> > block, the bitmaps, the journal, and the reiserfs s-tree itself. The
> > journal and bitmaps are only used when writing to the filesystem. The
> > superblock and s-tree are used for any filesystem access. The
> > relationship is that before a file data block or an s-tree node can be
> > allocated on disk, the bitmaps must be checked to see where the block
> > can be allocated.
> >
> >> ic. so other meta-data is checked as other file systems.
>
> No. The bitmaps and journal are still part of the same filesystem. They
> are just not part of the s-tree.
yes. sorry, i should have said that the file system still uses the s-tree
to locate file data, while the bitmaps assist block allocation and the
journal is for consistency.
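
as i understand the bitmap's role, it behaves like this toy in-memory sketch.
this is NOT the actual reiserfs allocator: the class name and the first-fit
linear scan are my own assumptions, only the set-on-allocate and
clear-on-free behaviour comes from the description above.

```python
class BlockBitmap:
    """Toy allocation bitmap: one bit per block, set = used, clear = free."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        # Assumption: simple first-fit scan; a real allocator is smarter.
        for block in range(self.nblocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("no free blocks")

    def free(self, block):
        # Clear the bit, masking to keep the value within one byte.
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


bm = BlockBitmap(16)
first = bm.allocate()
second = bm.allocate()
bm.free(first)
print(first, second, bm.is_used(first), bm.is_used(second))
```

so the bitmap is only consulted to find a free block before a data block or
s-tree node is written; locating existing data never needs it.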
>
> >>>assumed i have 2GB or 4GB ram, which is not unbelievable for a desktop
> >>>now. but can these RAM be used by 32BIT arch?
> > The RAM can be used, sure, but not for the bitmaps. I believe the buffer
> > heads for the bitmaps need to come out of the memory < 1 GB. It would be
> > possible to put the bitmaps in high memory (like any other data), but
> > the patch to do so would likely be more involved than the dynamic bitmap
> > patch, and still waste the memory anyway.
> >
> >> yes, i also suspect this 1GB limit. So 64bit is the way and AMD64 is
> >> cheap anyway rite?
>
> Personally, I think so.
>
> > current disk head, that is an operation that is performed by the block
> > layer. It can make the best decisions on that, since it is at the
> > lowest level of abstraction. It's entirely possible that a filesystem be
> > mounted via file-loopback on an NFS mount. In that case, the local
> > system has no information at all about where the disk head would be.
> >
> >> yes, but then block layer will need another bitmap to track which block
> >> is used or not and also do a mapping again...
> >
> >> the cost of layering?
>
> The ideas of "in use" and "available" are purely filesystem abstractions
> to keep track of where we already have filesystem data/metadata. The
> block layer doesn't know or care about them - it's just a collection of
> blocks that the user may do whatever they please with. Now, not to
> confuse the issue, but the example of a loopback-mounted filesystem can
> cause an allocation if the host file is sparse, but that's really a
> corner case.
>
yes, that is a cost worth paying. the file system just needs a set of
blocks to work on...
> - -Jeff
>
> - --
> Jeff Mahoney
> SuSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.0 (GNU/Linux)
>
> iD8DBQFDExsrLPWxlyuTD7IRAvGmAJ9QU16I2oz/kkCbqwdeGcIgkey8TgCgqS8s
> lI6YzJEJ20j5LiheAqw6eoE=
> =YD9V
> -----END PGP SIGNATURE-----