From: Hans Reiser <reiser@namesys.com>
To: Shawn Starr <spstarr@sh0n.net>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Possible Idea with filesystem buffering.
Date: Wed, 23 Jan 2002 01:02:02 +0300
Message-ID: <3C4DE15A.8060700@namesys.com>
In-Reply-To: <Pine.LNX.4.40.0201202100340.455-100000@coredump.sh0n.net> <1011640576.21632.0.camel@unaropia>
Shawn, I didn't respond to this because you seem to be mixing issues
relating to the elevator code into this, so I don't really
understand you.
Hans
Shawn Starr wrote:
>Nobody wants to comment on this? :(
>
>Shawn.
>
>On Sun, 2002-01-20 at 21:29, Shawn Starr wrote:
>
>>On Mon, 21 Jan 2002, Anton Altaparmakov wrote:
>>
>>>[snip]
>>>At 00:57 21/01/02, Hans Reiser wrote:
>>>[snip]
>>> > Would be best if VM told us if we really must write that page.
>>>
>>>In theory the VM should never call writepage unless the page must be written
>>>out...
>>>
>>>But I agree with you that it would be good to be able to distinguish the
>>>two cases. I have been thinking about this a bit in the context of NTFS TNG
>>>but I think that it would be better to have a generic solution rather than
>>>every fs doing its own copy of the same thing. I envisage that there is a
>>>flush daemon which just walks around writing pages to disk in the
>>>background (there could be one per fs, or a generic one which fs register
>>>with, at their option they could have their own of course) in order to keep
>>>the number of dirty pages low and in order to minimize data loss in the
>>>event of system/power failure.
>>>
>>>This daemon requires several interfaces though, with regard to journalling
>>>fs. The daemon should have an interface where the fs can say "commit pages
>>>in this list NOW and do not return before done", also a barrier operation
>>>would be required in journalling context. A transactions interface would be
>>>ideal, where the fs can submit whole transactions consisting of writing out
>>>a list of pages and optional write barriers; e.g. write journal pages x, y,
>>>z, barrier, write metadata, perhaps barrier, finally write data pages a, b,
>>>c. Simple file systems could just not bother at all and rely on the flush
>>>daemon calling the fs to write the pages.
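A minimal sketch of the transaction idea quoted above, assuming a transaction is just an ordered list of page writes and barriers. The names Transaction, Write, Barrier and phases() are hypothetical and are not an existing kernel interface. Writes between two barriers form one phase; a phase must complete before the next one starts, while writes inside a phase may be issued in any order.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Write:
    """Hypothetical request to write one page, identified by a label."""
    page: str


class Barrier:
    """Hypothetical marker: writes before it finish before later ones start."""


@dataclass
class Transaction:
    ops: List[Union[Write, Barrier]] = field(default_factory=list)

    def write(self, *pages: str) -> "Transaction":
        self.ops.extend(Write(p) for p in pages)
        return self

    def barrier(self) -> "Transaction":
        self.ops.append(Barrier())
        return self

    def phases(self) -> List[List[str]]:
        """Group pages into phases separated by barriers, in submission order."""
        phases: List[List[str]] = []
        current: List[str] = []
        for op in self.ops:
            if isinstance(op, Barrier):
                if current:  # ignore empty phases (e.g. doubled barriers)
                    phases.append(current)
                    current = []
            else:
                current.append(op.page)
        if current:
            phases.append(current)
        return phases


# The example from the quoted text: journal pages, barrier, metadata,
# barrier, data pages.
txn = (
    Transaction()
    .write("journal-x", "journal-y", "journal-z")
    .barrier()
    .write("metadata")
    .barrier()
    .write("data-a", "data-b", "data-c")
)

for number, pages in enumerate(txn.phases(), start=1):
    print(f"phase {number}: {pages}")
```

A simple file system would never build a Transaction at all and would just wait for the flush daemon to call it.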
>>>
>>>Obviously when this daemon writes pages the pages will continue being
>>>there. OTOH, if the VM calls writepage because it needs to free memory
>>>then writepage must write and clean the page.
>>>
>>If pages are dirty and written immediately to the disk, they can be cleaned
>>from the queue. It would be nice if there was some way to have a checksum
>>verify the data was written back and then wipe it from the queue.
>>
>>As an example: 5 operations requested, 2 already in queue.
>>
>>In queue) DIRTY write to disk (this task has been in the queue for a
>>while)
>>
>>In queue) not 'old' memory but must be written to disk
>>
>>pending queue:
>>
>>1) read operation
>>2) read operation
>>3) write operation
>>4) write operation
>>
>>The daemon should re-sort by priority: write dirty pages to disk first, then
>>write any other pages that are left on the queue, then get to the reads.
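A small sketch of the ordering described above, assuming each request carries a kind ("read" or "write") and a dirty flag. The Request type and the three priority classes are illustrative only. A stable sort keeps arrival order within each class.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    """Hypothetical queue entry; field names are illustrative."""
    name: str
    kind: str            # "read" or "write"
    dirty: bool = False  # True once the request has aged in the queue


def priority(req: Request) -> int:
    """Lower value is served first: dirty writes, other writes, then reads."""
    if req.kind == "write" and req.dirty:
        return 0
    if req.kind == "write":
        return 1
    return 2


def reorder(queue: List[Request]) -> List[Request]:
    # sorted() is stable, so requests of equal priority keep arrival order.
    return sorted(queue, key=priority)


# Reads arrive ahead of the writes here so the reordering is visible; the
# entries mirror the example: two already queued, four pending.
queue = [
    Request("pending-1", "read"),
    Request("pending-2", "read"),
    Request("pending-3", "write"),
    Request("queued-2", "write"),              # not old, but must be written
    Request("queued-1", "write", dirty=True),  # has been waiting a while
    Request("pending-4", "write"),
]

for req in reorder(queue):
    print(req.name, req.kind, "dirty" if req.dirty else "")
```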
>>
>>
>>Notes:
>>
>>If there is only one operation in the queue (say write) and nothing else
>>comes along, then the daemon should force-write the data back to disk
>>after a timeout period (the memory in the slot becomes dirty)
>>
>>If there are too many tasks in the queue and another one requires more
>>memory than what's left in the buffer/cache, the daemon could ask to
>>store the request in swap memory and put it in the queue. If the request
>>is a write request, it would still have higher priority than any read
>>requests and get completed quickly, allowing the remaining queue events
>>to complete.
>>
>>Example:
>>
>>ReiserFS:
>> Operation A. Write (10K)
>> Operation B. Read (200K)
>> Operation C. Write (160K)
>>
>>
>>XFS:
>> Operation A. Read (63K)
>> Operation B. Read (3K)
>> Operation C. Write (10K)
>>
>>
>>EXT3:
>> Operation A. Write (290K)
>> Operation B. Write (90K)
>> Operation C. Read (3K)
>>
>>The kpagebuf daemon (or whatever name) would get all these requests and sort out
>>what needs to be done first. As long as there's buffer/cache memory free,
>>the write operations would be done as fast as possible, verified by some
>>checksum, and purged from the queue. If there's no cache/buffer memory
>>free, then all queued writes, regardless of being in swap or cache/buffer, need to be
>>written to disk.
>>
>>So:
>>kpagebuf queue (total available buffer/cache memory is say 512K)
>>
>> EXT3 Write (290K)
>> ReiserFS Write (160K)
>> ReiserFS Write (10K)
>> XFS Write (10K)
>> EXT3 Write (90K) - Goes in swap because total > 512K (Dirty x2 state)
>> ReiserFS Read (200K) - Swap (dirty x2)
>> XFS Read (63K) - Swap (dirty x2)
>> XFS Read (3K) - Swap (dirty x2)
>> EXT3 Read (3K) - Swap (dirty x2)
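A rough sketch of how the listing above splits at the 512K limit. It assumes the queue order is taken exactly as listed (the text does not say how the writes were ordered among themselves) and that once the running total exceeds the budget, that request and everything behind it go to swap. split_queue and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# (filesystem, kind, size in K); layout is illustrative only.
Request = Tuple[str, str, int]


def split_queue(
    queue: List[Request], budget_kb: int
) -> Tuple[List[Request], List[Request]]:
    """Walk the queue in order; spill to swap from the first overflow onward."""
    in_cache: List[Request] = []
    in_swap: List[Request] = []
    total = 0
    spilled = False
    for req in queue:
        total += req[2]
        if total > budget_kb:
            spilled = True  # this request and all later ones go to swap
        (in_swap if spilled else in_cache).append(req)
    return in_cache, in_swap


kpagebuf_queue: List[Request] = [
    ("EXT3", "write", 290),
    ("ReiserFS", "write", 160),
    ("ReiserFS", "write", 10),
    ("XFS", "write", 10),
    ("EXT3", "write", 90),
    ("ReiserFS", "read", 200),
    ("XFS", "read", 63),
    ("XFS", "read", 3),
    ("EXT3", "read", 3),
]

in_cache, in_swap = split_queue(kpagebuf_queue, budget_kb=512)
print("buffer/cache:", in_cache)
print("swap:", in_swap)
```

With these numbers the first four writes total 470K and stay in buffer/cache; adding the 90K write brings the total to 560K, so it and the reads behind it land in swap, matching the listing.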
>>
>>* The daemon would check, in order of filesystem registration, whose
>>requests should be in the read queue first.
>>
>>* The daemon should maximize the amount of memory stored in buffer/cache to
>>try to prevent write requests having to go into swap.
>>
>>In the above queue, we have a lot of read operations and one write
>>operation in swap. Clean out the write operations since they are now dirty
>>(because there's no room for more operations in the buffer/cache). Move
>>the swapped write operation to the top of the queue and get rid of it.
>>Move the read operations from swap to queue since there is room again.
>>** NOTE ** Because those read requests are now dirty they MUST be dealt with
>>or they'll get stuck in the queue with more write requests overtaking
>>them.
>>
>>Maybe I've lost it but that's how I see it ;)
>>
>>Shawn.
>>
>>-
>>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>the body of a message to majordomo@vger.kernel.org
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>Please read the FAQ at http://www.tux.org/lkml/
>>