From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Jan Koss <kossjan@gmail.com>
Cc: kernelnewbies@nl.linux.org, linux-fsdevel@vger.kernel.org
Subject: Re: fragmentation && blocks "realloc"
Date: Tue, 24 Jan 2006 10:37:41 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0601241012150.7092@hermes-2.csi.cam.ac.uk>
In-Reply-To: <b97d23040601221405p228451a7n18c69780b8772710@mail.gmail.com>
On Mon, 23 Jan 2006, Jan Koss wrote:
> On 1/23/06, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > That depends entirely on which function / call path you are in at
> > present.  Taking minix as an example, tell me the call path where you
> > end up wanting to do the above and I will tell you where to get the bh
> > from... (-:
>
> I was talking about 2.6.15.
>
> In fs/minix/bitmap.c there is minix_new_block; we reach it from get_block
> in fs/minix/itree_common.c.
>
> After analyzing the blocks<->file mapping I want to move some blocks to
> another location and update the page cache correspondingly; what should I do?
<Argh I just spent ages writing an email and it got lost when the internet
connection died... I only have what was visible on the terminal screen,
so starting again on the rest...>
You cannot do what you want from such a low level because the upper layers
hold locks that you need.  For example, a readpage/writepage/prepare_write
can be running concurrently with get_block(), and even other instances of
get_block() can be running at the same time, so it would be unsafe to do
any sort of reallocation there.  So you have to scrap that idea.
You could do it at a higher level, i.e. in the file ->write itself, but
again this introduces a lot of complexity into your file system.
Basically, what you are trying to do is much harder than you think and
involves a lot of work...
There is a possible alternative, however.  Your get_block function could
take a reference on the inode (i_count), set a "need realloc" flag in the
file system specific inode, and add the inode to the queue of a "realloc
daemon" for your fs, which is just a kernel thread that runs periodically,
say every five seconds, and takes inodes one after the other from its
queue.  For each inode it then takes all the locks necessary to do the
reallocation (e.g. i_mutex on the inode as well as i_alloc_sem/
i_alloc_mutex - whatever it is called now).  Note that you will probably
need an extra lock to prevent entry into readpage/writepage whilst this is
happening: your readpage/writepage take that lock for reading whilst your
daemon takes it for writing, so multiple read/writepage instances can run
simultaneously but the daemon runs exclusively.
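To make that concrete, here is a minimal sketch of the flag-and-queue side,
i.e. what your get_block could call when it notices the file has become
fragmented.  All the myfs_* names, the MYFS_NEED_REALLOC bit and the
realloc_sem "realloc lock" are made up for illustration - your fs will have
its own inode container and naming:

#include <linux/fs.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/rwsem.h>

/* Queue of inodes waiting for the realloc daemon, and its guarding lock. */
static LIST_HEAD(myfs_realloc_queue);
static DEFINE_SPINLOCK(myfs_realloc_queue_lock);

/* Hypothetical fs specific inode; your fs will already have one of these. */
struct myfs_inode_info {
	unsigned long flags;		/* MYFS_NEED_REALLOC lives here. */
	struct list_head realloc_list;	/* Entry in myfs_realloc_queue. */
	struct rw_semaphore realloc_sem;/* The extra "realloc lock":
					 * read/writepage take it shared,
					 * the daemon takes it exclusive. */
	struct inode vfs_inode;
};

#define MYFS_NEED_REALLOC	0	/* Bit number in ->flags. */

static inline struct myfs_inode_info *MYFS_I(struct inode *inode)
{
	return container_of(inode, struct myfs_inode_info, vfs_inode);
}

/*
 * Called from get_block() when it decides the file wants reallocating.
 * Does no real work itself - everything heavy is deferred to the daemon.
 */
static void myfs_mark_inode_for_realloc(struct inode *inode)
{
	struct myfs_inode_info *mi = MYFS_I(inode);

	/* Only queue the inode once. */
	if (test_and_set_bit(MYFS_NEED_REALLOC, &mi->flags))
		return;
	/* Pin i_count so the inode cannot go away while it is queued. */
	if (!igrab(inode))
		return;
	spin_lock(&myfs_realloc_queue_lock);
	list_add_tail(&mi->realloc_list, &myfs_realloc_queue);
	spin_unlock(&myfs_realloc_queue_lock);
}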
Then, if the inode is marked "need realloc", the daemon allocates a
contiguous chunk of space equal to the file size, clears the "need realloc"
bit, and does the reallocation by starting at the first page (index 0) and
working upwards.  It gets each page (warning: a deadlock is possible with a
read/writepage that holds the page's lock and is blocked on your "realloc
lock", so better to trylock and, if that fails, abort and requeue the inode
at the end of the daemon's queue).  Once it has a page, it loops around its
buffers and for each buffer moves it from the old allocation to the new one
as I described earlier (i.e. just change b_blocknr, invalidate the
underlying metadata, and mark the buffer dirty).
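Here is what the per-page part could look like - a sketch only, assuming
the daemon already holds i_mutex, the allocation lock and the realloc_sem
from the previous sketch, and that new_block is the block in the new
contiguous run which this page's first buffer should map to.  The exact
helper names (e.g. the page trylock) differ between kernel versions:

#include <linux/fs.h>
#include <linux/buffer_head.h>
#include <linux/pagemap.h>

static int myfs_realloc_page(struct inode *inode, pgoff_t index,
			     sector_t new_block)
{
	struct page *page;
	struct buffer_head *bh, *head;

	page = find_get_page(inode->i_mapping, index);
	if (!page)
		return 0;	/* Not cached - nothing to redirect here. */

	/*
	 * Trylock only: a read/writepage may hold this page's lock and be
	 * blocked on realloc_sem, so blocking here would deadlock.  On
	 * failure the caller requeues the inode and gives up for now.
	 * (Later kernels spell this trylock_page().)
	 */
	if (TestSetPageLocked(page)) {
		page_cache_release(page);
		return -EAGAIN;
	}

	if (!page_has_buffers(page))
		goto out;	/* Nothing mapped yet for this page. */

	bh = head = page_buffers(page);
	do {
		/*
		 * Only redirect buffers that are mapped and hold valid
		 * data; anything else would first have to be read in from
		 * the old location (left out of this sketch).
		 */
		if (buffer_mapped(bh) && buffer_uptodate(bh)) {
			bh->b_blocknr = new_block;
			/* Kill any metadata buffer aliasing the new block. */
			unmap_underlying_metadata(bh->b_bdev, new_block);
			/* Dirty it so writeback goes to the new location. */
			mark_buffer_dirty(bh);
		}
		new_block++;
		bh = bh->b_this_page;
	} while (bh != head);
out:
	unlock_page(page);
	page_cache_release(page);
	return 0;
}

Updating the on-disk block pointers (inode/indirect blocks) to the new
locations and freeing the old blocks in the bitmap is of course fs specific
and not shown.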
That or something similar should work with minimal impact on your existing
fs code.  And it has the huge benefit of performing the reallocations in
the background.  Otherwise your original idea would be disastrous for
performance.  Imagine an 8G file that you are appending data to.  Every
time you append a new block you may end up having to reallocate the whole
file from inside your get_block (you do not know that more writes are
coming in a second), and each time that will take a few minutes, so each
little write will hang the system for a few minutes - hardly what you
want...
And the daemon at least batches things up in 5 second intervals, so
multiple "need realloc" settings on an inode will be handled in one go
every 5 seconds.
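The daemon itself can then be as simple as this sketch - again all the
names are hypothetical and build on the two sketches above; it would be
started with kthread_run() at module init or mount time:

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/mutex.h>

/*
 * Allocates the contiguous run and calls myfs_realloc_page() for each
 * page, requeueing the inode itself if a page trylock fails.
 */
static int myfs_realloc_inode(struct inode *inode);

static int myfs_realloc_thread(void *unused)
{
	while (!kthread_should_stop()) {
		/* Batch the queued work up in ~5 second intervals. */
		msleep_interruptible(5000);

		for (;;) {
			struct myfs_inode_info *mi;
			struct inode *inode;

			spin_lock(&myfs_realloc_queue_lock);
			if (list_empty(&myfs_realloc_queue)) {
				spin_unlock(&myfs_realloc_queue_lock);
				break;
			}
			mi = list_entry(myfs_realloc_queue.next,
					struct myfs_inode_info, realloc_list);
			list_del_init(&mi->realloc_list);
			spin_unlock(&myfs_realloc_queue_lock);
			inode = &mi->vfs_inode;

			/* i_mutex (i_sem on older kernels) plus whatever
			 * allocation lock your fs uses would go here... */
			mutex_lock(&inode->i_mutex);
			/* ...and exclude read/writepage for the duration. */
			down_write(&mi->realloc_sem);

			clear_bit(MYFS_NEED_REALLOC, &mi->flags);
			myfs_realloc_inode(inode);

			up_write(&mi->realloc_sem);
			mutex_unlock(&inode->i_mutex);
			iput(inode);	/* Drop the ref taken when queueing. */
		}
	}
	return 0;
}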
You know, if it were that easy to keep fragmentation close to or even
equal to zero at all times without impacting performance, all file systems
would already be doing it. (-;
Hope this gives you a starting point if nothing else.
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/