From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Jan Koss <kossjan@gmail.com>
Cc: kernelnewbies@nl.linux.org, linux-fsdevel@vger.kernel.org
Subject: Re: fragmentation && blocks "realloc"
Date: Sun, 22 Jan 2006 21:32:20 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0601222108070.17838@hermes-2.csi.cam.ac.uk>
In-Reply-To: <b97d23040601221258n45af03e5kfec8e96132a83462@mail.gmail.com>
Hi,
On Sun, 22 Jan 2006, Jan Koss wrote:
> >(They access the block device directly, completely
> >bypassing the page cache so you are breaking cache coherency and are 100%
> >broken by design.)
>
> Oh... I thought that starting from 2.4.x there is no longer a separate
> implementation for working with blocks and pages: when you read a block,
> the kernel reads the whole page. Am I wrong?
There is a very big difference. If you do sb_bread() you are reading a
block from the block device. And yes this block is attached to a page but
it is a page belonging to the block device address space mapping. You
cannot do anything to this block other than read/write it.
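To make that concrete, here is a rough sketch (not taken from any real
file system; the function name, the block number, and the modification are
all placeholders) of touching one metadata block through the block device
mapping:

#include <linux/fs.h>
#include <linux/string.h>
#include <linux/buffer_head.h>

/* Hedged sketch: read and dirty one metadata block via the block device
 * mapping.  The buffer lives in the bdev's page cache, not in any file's. */
static int touch_metadata_block(struct super_block *sb, sector_t blocknr)
{
        struct buffer_head *bh = sb_bread(sb, blocknr);

        if (!bh)
                return -EIO;
        /* bh->b_data points at the raw on-disk block contents. */
        memset(bh->b_data, 0, sb->s_blocksize);  /* placeholder modification */
        mark_buffer_dirty(bh);
        brelse(bh);
        return 0;
}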
If you use the page cache to access the contents of a file, then that file
(or more precisely the inode of that file) will have an address space
mapping of its own, completely independent of the address space mapping of
the block device inode. Those pages will (or will not) have buffers
attached to them (your getblock() callback is there exactly to allow the
buffers to be created and mapped if they are not there). Those buffers
will be part of the file page cache page, thus part of the inode's address
space mapping, and those buffers have no meaning other than to say "the
data in this part of the page belongs to block device so-and-so, at block
number so-and-so on that device".  So you can change the b_blocknr on
those buffers to your heart's content (well, you need to observe the
necessary locking so that buffers under i/o don't get screwed up) and
that is no problem.
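For reference, the shape of a get_block callback is roughly this (a sketch
only; my_lookup_or_alloc_block() is a made-up stand-in for your own block
lookup/allocation code):

#include <linux/fs.h>
#include <linux/buffer_head.h>

/* Sketch of a get_block_t callback: map file block `iblock` of `inode`
 * to an on-disk block and record the mapping in bh_result. */
static int example_get_block(struct inode *inode, sector_t iblock,
                             struct buffer_head *bh_result, int create)
{
        /* my_lookup_or_alloc_block() is hypothetical: it returns the
         * on-disk block for this file block, allocating one if asked. */
        sector_t phys = my_lookup_or_alloc_block(inode, iblock, create);

        if (!phys)
                return create ? -ENOSPC : 0;  /* unmapped == hole when reading */
        map_bh(bh_result, inode->i_sb, phys); /* sets b_bdev, b_blocknr, mapped */
        /* for a freshly allocated block you would also set_buffer_new(bh_result) */
        return 0;
}

The generic helpers such as block_read_full_page() and block_prepare_write()
create the buffers on the file's page cache page and call a callback like
this to map them.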
Note that the buffers from the block device address space mapping are
COMPLETELY separate from the buffers from a file inode address space
mapping. So writes from one are NOT seen in the other, and you can NEVER
mix the two forms of i/o and expect to have a working file system. You
will get random results and tons of weird data corruption that way.
> > They only way to help you
> > is to see your whole file system code
>
> If we need something concrete for the discussion, let's talk about minix
> v.1 (my file system derives from this code).
> Let's suppose I want to make the block allocation algorithm in
> fs/minix/bitmap.c: minix_new_block more intelligent.
>
> I should say that the minix code uses sb_bread/brelse and also works with
> pages (for example fs/minix/dir.c).
Er, not on current kernels:
$ grep bread linux-2.6/fs/minix/*
bitmap.c: *bh = sb_bread(sb, block);
bitmap.c: *bh = sb_bread(sb, block);
inode.c: if (!(bh = sb_bread(s, 1)))
inode.c: if (!(sbi->s_imap[i]=sb_bread(s, block)))
inode.c: if (!(sbi->s_zmap[i]=sb_bread(s, block)))
itree_common.c: bh = sb_bread(sb, block_to_cpu(p->key));
itree_common.c: bh = sb_bread(inode->i_sb, nr);
Are you working on 2.4 by any chance? If you are writing a new fs I would
strongly recommend that you work on the 2.6 kernels, otherwise you are
writing something that is already out of date...
The only thing minix in the current 2.6 kernel uses bread for is to read
the on-disk inodes themselves. It never uses it to access file data at
all, and I very much doubt that even the old 2.4 kernels ever used bread
for anything that is not strictly metadata rather than file data.
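For illustration, this is roughly the shape of how fs/minix/dir.c gets at
directory data purely through the page cache in 2.6 (a simplified sketch,
not the literal kernel source; the helpers and error handling vary a bit
between kernel versions):

#include <linux/fs.h>
#include <linux/err.h>
#include <linux/pagemap.h>

static struct page *example_dir_get_page(struct inode *dir, unsigned long n)
{
        struct address_space *mapping = dir->i_mapping;
        /* The directory's data arrives through its own mapping and its
         * own readpage method, never through sb_bread(). */
        struct page *page = read_cache_page(mapping, n,
                        (filler_t *)mapping->a_ops->readpage, NULL);

        if (IS_ERR(page))
                return page;
        wait_on_page_locked(page);
        if (!PageUptodate(page)) {
                page_cache_release(page);
                return ERR_PTR(-EIO);
        }
        return page;
}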
> So instead of allocating one additional block, I want to "realloc" the
> blocks, so the whole file will occupy several consecutive blocks.
>
> And we end up at code like this:
> bh->b_blocknr = newblk;
> unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
> mark_buffer_dirty (bh);
>
> And the question is: how should I get this _bh_, if I cannot use sb_bread?
That depends entirely on which function / which call path you are
in at present. Taking minix as an example, tell me the call path where
you end up wanting to do the above and I will tell you where to get the bh
from... (-:
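Just to sketch the shape of it (this is NOT a complete or safe relocation
routine, and remap_page_buffers() is a name I am making up): if you already
hold a locked, up-to-date page of the file somewhere in the write path, the
buffers hang off that page and you can walk them like this:

#include <linux/mm.h>
#include <linux/buffer_head.h>

/* Illustrative only: reassign the buffers of one locked file page to a
 * run of newly allocated blocks starting at first_newblk.  Locking,
 * i/o in flight, and the on-disk metadata update are all ignored here. */
static void remap_page_buffers(struct page *page, sector_t first_newblk)
{
        struct buffer_head *head = page_buffers(page);
        struct buffer_head *bh = head;
        sector_t newblk = first_newblk;

        do {
                if (buffer_mapped(bh)) {
                        bh->b_blocknr = newblk;
                        /* kill any stale bdev-mapping alias of the new block */
                        unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
                        mark_buffer_dirty(bh);
                }
                newblk++;
                bh = bh->b_this_page;
        } while (bh != head);
}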
Btw, don't think this is all that easy. If you want to keep whole files,
rather than just whole pages of buffers, in consecutive blocks, you are in
for some very serious fun with multi-page locking and/or complete i/o
serialisation, i.e. while a write is happening, all other writes to the
same file will simply block...
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/