From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Jan Koss <kossjan@gmail.com>
Cc: kernelnewbies@nl.linux.org, linux-fsdevel@vger.kernel.org
Subject: Re: fragmentation && blocks "realloc"
Date: Sun, 22 Jan 2006 21:32:20 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0601222108070.17838@hermes-2.csi.cam.ac.uk>
In-Reply-To: <b97d23040601221258n45af03e5kfec8e96132a83462@mail.gmail.com>
Hi,
On Sun, 22 Jan 2006, Jan Koss wrote:
> >(They access the block device directly, completely
> >bypassing the page cache so you are breaking cache coherency and are 100%
> >broken by design.)
>
> Oh... I thought that starting from 2.4.x there is no longer a separate
> implementation for working with blocks and pages: when you read a block,
> the kernel reads the whole page. Am I wrong?
There is a very big difference. If you do sb_bread() you are reading a
block from the block device. And yes this block is attached to a page but
it is a page belonging to the block device address space mapping. You
cannot do anything to this block other than read/write it.
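To make that concrete, here is a rough sketch (not taken from any real
file system; the function name, the block number, and the modification are
all placeholders) of touching one metadata block through the block device
mapping:

#include <linux/fs.h>
#include <linux/string.h>
#include <linux/buffer_head.h>

/* Hedged sketch: read and dirty one metadata block via the block device
 * mapping.  The buffer lives in the bdev's page cache, not in any file's. */
static int touch_metadata_block(struct super_block *sb, sector_t blocknr)
{
        struct buffer_head *bh = sb_bread(sb, blocknr);

        if (!bh)
                return -EIO;
        /* bh->b_data points at the raw on-disk block contents. */
        memset(bh->b_data, 0, sb->s_blocksize);  /* placeholder modification */
        mark_buffer_dirty(bh);
        brelse(bh);
        return 0;
}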
If you use the page cache to access the contents of a file, then that file
(or more precisely the inode of that file) will have an address space
mapping of its own, completely independent of the address space mapping of
the block device inode. Those pages will (or will not) have buffers
attached to them (your getblock() callback is there exactly to allow the
buffers to be created and mapped if they are not there). Those buffers
will be part of the file page cache page, thus part of the inode's address
space mapping, and those buffers have no meaning other than to say "the
data in this part of the page belongs to block device so-and-so, at block
number so-and-so on that device".  So you can change the b_blocknr on
those buffers to your heart's content (well, you need to observe the
necessary locking so that buffers under i/o don't get screwed up) and
that is no problem.
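For reference, the shape of a get_block callback is roughly this (a sketch
only; my_lookup_or_alloc_block() is a made-up stand-in for your own block
lookup/allocation code):

#include <linux/fs.h>
#include <linux/buffer_head.h>

/* Sketch of a get_block_t callback: map file block `iblock` of `inode`
 * to an on-disk block and record the mapping in bh_result. */
static int example_get_block(struct inode *inode, sector_t iblock,
                             struct buffer_head *bh_result, int create)
{
        /* my_lookup_or_alloc_block() is hypothetical: it returns the
         * on-disk block for this file block, allocating one if asked. */
        sector_t phys = my_lookup_or_alloc_block(inode, iblock, create);

        if (!phys)
                return create ? -ENOSPC : 0;  /* unmapped == hole when reading */
        map_bh(bh_result, inode->i_sb, phys); /* sets b_bdev, b_blocknr, mapped */
        /* for a freshly allocated block you would also set_buffer_new(bh_result) */
        return 0;
}

The generic helpers such as block_read_full_page() and block_prepare_write()
create the buffers on the file's page cache page and call a callback like
this to map them.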
Note that the buffers from the block device address space mapping are
COMPLETELY separate from the buffers from a file inode address space
mapping. So writes from one are NOT seen in the other, and you can NEVER
mix the two forms of i/o and expect to have a working file system. You
will get random results and tons of weird data corruption that way.
> > They only way to help you
> > is to see your whole file system code
>
> If we need something concrete for the discussion, let's talk about minix
> v.1 (my file system derives from this code).
> Let's suppose I want to make the block allocation algorithm in
> fs/minix/bitmap.c: minix_new_block more intelligent.
>
> I should say that the minix code uses sb_bread/brelse and also works with
> pages (for example fs/minix/dir.c).
Er, not on current kernels:
$ grep bread linux-2.6/fs/minix/*
bitmap.c: *bh = sb_bread(sb, block);
bitmap.c: *bh = sb_bread(sb, block);
inode.c: if (!(bh = sb_bread(s, 1)))
inode.c: if (!(sbi->s_imap[i]=sb_bread(s, block)))
inode.c: if (!(sbi->s_zmap[i]=sb_bread(s, block)))
itree_common.c: bh = sb_bread(sb, block_to_cpu(p->key));
itree_common.c: bh = sb_bread(inode->i_sb, nr);
Are you working on 2.4 by any chance? If you are writing a new fs I would
strongly recommend that you work on the 2.6 kernels, otherwise you are
writing something that is already out of date...
The only thing minix in the current 2.6 kernel uses bread for is to read
the on-disk inodes themselves. It never uses it to access file data at
all, and I very much doubt that even the old 2.4 kernels ever used bread
for anything that is not strictly metadata rather than file data.
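For illustration, this is roughly the shape of how fs/minix/dir.c gets at
directory data purely through the page cache in 2.6 (a simplified sketch,
not the literal kernel source; the helpers and error handling vary a bit
between kernel versions):

#include <linux/fs.h>
#include <linux/err.h>
#include <linux/pagemap.h>

static struct page *example_dir_get_page(struct inode *dir, unsigned long n)
{
        struct address_space *mapping = dir->i_mapping;
        /* The directory's data arrives through its own mapping and its
         * own readpage method, never through sb_bread(). */
        struct page *page = read_cache_page(mapping, n,
                        (filler_t *)mapping->a_ops->readpage, NULL);

        if (IS_ERR(page))
                return page;
        wait_on_page_locked(page);
        if (!PageUptodate(page)) {
                page_cache_release(page);
                return ERR_PTR(-EIO);
        }
        return page;
}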
> So instead of allocating one additional block, I want to "realloc" the
> blocks, so the whole file will occupy several consecutive blocks.
>
> And we end up at code like this:
> bh->b_blocknr = newblk;
> unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
> mark_buffer_dirty (bh);
>
> And the question is: how should I get this _bh_, if I cannot use sb_bread?
That depends entirely on which function / which call path you are
in at present. Taking minix as an example, tell me the call path where
you end up wanting to do the above and I will tell you where to get the bh
from... (-:
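Just to sketch the shape of it (this is NOT a complete or safe relocation
routine, and remap_page_buffers() is a name I am making up): if you already
hold a locked, up-to-date page of the file somewhere in the write path, the
buffers hang off that page and you can walk them like this:

#include <linux/mm.h>
#include <linux/buffer_head.h>

/* Illustrative only: reassign the buffers of one locked file page to a
 * run of newly allocated blocks starting at first_newblk.  Locking,
 * i/o in flight, and the on-disk metadata update are all ignored here. */
static void remap_page_buffers(struct page *page, sector_t first_newblk)
{
        struct buffer_head *head = page_buffers(page);
        struct buffer_head *bh = head;
        sector_t newblk = first_newblk;

        do {
                if (buffer_mapped(bh)) {
                        bh->b_blocknr = newblk;
                        /* kill any stale bdev-mapping alias of the new block */
                        unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
                        mark_buffer_dirty(bh);
                }
                newblk++;
                bh = bh->b_this_page;
        } while (bh != head);
}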
Btw, don't think this is all that easy. If you want to keep whole files,
rather than just whole pages of buffers, in consecutive blocks, you are in
for some very serious fun with multi-page locking and/or complete i/o
serialisation, i.e. while a write is happening, all other writes to the
same file will simply block...
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/