From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Jan Koss <kossjan@gmail.com>
Cc: kernelnewbies@nl.linux.org, linux-fsdevel@vger.kernel.org
Subject: Re: fragmentation && blocks "realloc"
Date: Tue, 24 Jan 2006 10:37:41 +0000 (GMT)
Message-ID: <Pine.LNX.4.64.0601241012150.7092@hermes-2.csi.cam.ac.uk>
In-Reply-To: <b97d23040601221405p228451a7n18c69780b8772710@mail.gmail.com>
On Mon, 23 Jan 2006, Jan Koss wrote:
> On 1/23/06, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > That depends entirely on which function / call path you are in at
> > present.  Taking minix as an example, tell me the call path where you
> > end up wanting to do the above and I will tell you where to get the bh
> > from... (-:
>
> I was talking about 2.6.15.
>
> In fs/minix/bitmap.c there is minix_new_block; we reach it from get_block
> in fs/minix/itree_common.c.
>
> After analyzing the blocks<->file mapping I want to move some blocks to
> another location and update the page cache correspondingly; what should I do?
<Argh I just spent ages writing an email and it got lost when the internet
connection died... I only have what was visible on the terminal screen,
so starting again on the rest...>
You cannot do what you want from such a low level because the upper layers
hold locks that you need.  For example, a readpage/writepage/prepare_write
can be running concurrently with get_block(), and even other instances of
get_block() can be running at the same time, so it would be unsafe to do
any sort of reallocation there.  So you have to scrap that idea.
You could do it at a higher level, i.e. in the file ->write itself, but
again this introduces a lot of complexity into your file system.
Basically, what you are trying to do is much harder than you think and
involves a lot of work...
There is a possible alternative, however.  Your get_block function could
take a reference on the inode (i_count), set a "need realloc" flag in the
file system specific inode, and add the inode to the queue of a "realloc
daemon" for your fs, which is just a kernel thread that runs periodically,
say every five seconds, and takes inodes one after the other from its
queue.  For each inode it then takes all the locks necessary to do the
reallocation (e.g. i_mutex on the inode as well as i_alloc_sem/
i_alloc_mutex - whatever it is called now).  Note that you will probably
need an extra lock to prevent entry into readpage/writepage whilst this is
happening: your readpage/writepage take that lock for reading whilst your
daemon takes it for writing, so multiple read/writepage instances can run
simultaneously but the daemon runs exclusively.
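To make that concrete, here is a minimal sketch of the flag-and-queue side,
i.e. what your get_block could call when it notices the file has become
fragmented.  All the myfs_* names, the MYFS_NEED_REALLOC bit and the
realloc_sem "realloc lock" are made up for illustration - your fs will have
its own inode container and naming:

#include <linux/fs.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/rwsem.h>

/* Queue of inodes waiting for the realloc daemon, and its guarding lock. */
static LIST_HEAD(myfs_realloc_queue);
static DEFINE_SPINLOCK(myfs_realloc_queue_lock);

/* Hypothetical fs specific inode; your fs will already have one of these. */
struct myfs_inode_info {
	unsigned long flags;		/* MYFS_NEED_REALLOC lives here. */
	struct list_head realloc_list;	/* Entry in myfs_realloc_queue. */
	struct rw_semaphore realloc_sem;/* The extra "realloc lock":
					 * read/writepage take it shared,
					 * the daemon takes it exclusive. */
	struct inode vfs_inode;
};

#define MYFS_NEED_REALLOC	0	/* Bit number in ->flags. */

static inline struct myfs_inode_info *MYFS_I(struct inode *inode)
{
	return container_of(inode, struct myfs_inode_info, vfs_inode);
}

/*
 * Called from get_block() when it decides the file wants reallocating.
 * Does no real work itself - everything heavy is deferred to the daemon.
 */
static void myfs_mark_inode_for_realloc(struct inode *inode)
{
	struct myfs_inode_info *mi = MYFS_I(inode);

	/* Only queue the inode once. */
	if (test_and_set_bit(MYFS_NEED_REALLOC, &mi->flags))
		return;
	/* Pin i_count so the inode cannot go away while it is queued. */
	if (!igrab(inode))
		return;
	spin_lock(&myfs_realloc_queue_lock);
	list_add_tail(&mi->realloc_list, &myfs_realloc_queue);
	spin_unlock(&myfs_realloc_queue_lock);
}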
Then, if the inode is marked "need realloc", the daemon allocates a
contiguous chunk of space equal to the file size, clears the "need realloc"
bit, and does the reallocation by starting at the first page (index 0) and
working upwards.  It gets each page (warning: a deadlock is possible with a
read/writepage that holds the page's lock and is blocked on your "realloc
lock", so better to trylock and, if that fails, abort and requeue the inode
at the end of the daemon's queue).  Once it has a page, it loops around its
buffers and for each buffer moves it from the old allocation to the new one
as I described earlier (i.e. just change b_blocknr, invalidate the
underlying metadata, and mark the buffer dirty).
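Here is what the per-page part could look like - a sketch only, assuming
the daemon already holds i_mutex, the allocation lock and the realloc_sem
from the previous sketch, and that new_block is the block in the new
contiguous run which this page's first buffer should map to.  The exact
helper names (e.g. the page trylock) differ between kernel versions:

#include <linux/fs.h>
#include <linux/buffer_head.h>
#include <linux/pagemap.h>

static int myfs_realloc_page(struct inode *inode, pgoff_t index,
			     sector_t new_block)
{
	struct page *page;
	struct buffer_head *bh, *head;

	page = find_get_page(inode->i_mapping, index);
	if (!page)
		return 0;	/* Not cached - nothing to redirect here. */

	/*
	 * Trylock only: a read/writepage may hold this page's lock and be
	 * blocked on realloc_sem, so blocking here would deadlock.  On
	 * failure the caller requeues the inode and gives up for now.
	 * (Later kernels spell this trylock_page().)
	 */
	if (TestSetPageLocked(page)) {
		page_cache_release(page);
		return -EAGAIN;
	}

	if (!page_has_buffers(page))
		goto out;	/* Nothing mapped yet for this page. */

	bh = head = page_buffers(page);
	do {
		/*
		 * Only redirect buffers that are mapped and hold valid
		 * data; anything else would first have to be read in from
		 * the old location (left out of this sketch).
		 */
		if (buffer_mapped(bh) && buffer_uptodate(bh)) {
			bh->b_blocknr = new_block;
			/* Kill any metadata buffer aliasing the new block. */
			unmap_underlying_metadata(bh->b_bdev, new_block);
			/* Dirty it so writeback goes to the new location. */
			mark_buffer_dirty(bh);
		}
		new_block++;
		bh = bh->b_this_page;
	} while (bh != head);
out:
	unlock_page(page);
	page_cache_release(page);
	return 0;
}

Updating the on-disk block pointers (inode/indirect blocks) to the new
locations and freeing the old blocks in the bitmap is of course fs specific
and not shown.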
That or something similar should work with minimal impact on your existing
fs code.  And it has the huge benefit of performing the reallocations in
the background.  Otherwise your original idea would be disastrous for
performance.  Imagine an 8G file that you are appending data to.  Every
time you append a new block you may end up having to reallocate the whole
file from inside your get_block (you do not know that more writes are
coming in a second), and each time that will take a few minutes, so each
little write will hang the system for a few minutes - hardly what you
want...
And the daemon at least batches things up in 5 second intervals, so
multiple "need realloc" settings on an inode will be handled in one go
every 5 seconds.
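The daemon itself can then be as simple as this sketch - again all the
names are hypothetical and build on the two sketches above; it would be
started with kthread_run() at module init or mount time:

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/mutex.h>

/*
 * Allocates the contiguous run and calls myfs_realloc_page() for each
 * page, requeueing the inode itself if a page trylock fails.
 */
static int myfs_realloc_inode(struct inode *inode);

static int myfs_realloc_thread(void *unused)
{
	while (!kthread_should_stop()) {
		/* Batch the queued work up in ~5 second intervals. */
		msleep_interruptible(5000);

		for (;;) {
			struct myfs_inode_info *mi;
			struct inode *inode;

			spin_lock(&myfs_realloc_queue_lock);
			if (list_empty(&myfs_realloc_queue)) {
				spin_unlock(&myfs_realloc_queue_lock);
				break;
			}
			mi = list_entry(myfs_realloc_queue.next,
					struct myfs_inode_info, realloc_list);
			list_del_init(&mi->realloc_list);
			spin_unlock(&myfs_realloc_queue_lock);
			inode = &mi->vfs_inode;

			/* i_mutex (i_sem on older kernels) plus whatever
			 * allocation lock your fs uses would go here... */
			mutex_lock(&inode->i_mutex);
			/* ...and exclude read/writepage for the duration. */
			down_write(&mi->realloc_sem);

			clear_bit(MYFS_NEED_REALLOC, &mi->flags);
			myfs_realloc_inode(inode);

			up_write(&mi->realloc_sem);
			mutex_unlock(&inode->i_mutex);
			iput(inode);	/* Drop the ref taken when queueing. */
		}
	}
	return 0;
}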
You know, if it were that easy to keep fragmentation close to or even
equal to zero at all times without impacting performance, all file systems
would already be doing it. (-;
Hope this gives you a starting point if nothing else.
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/